From bf7b00c69c89ad8d2a947d0e654210ef68026e09 Mon Sep 17 00:00:00 2001
From: "serhii.nechyporchuk"
Date: Wed, 26 Dec 2018 16:00:56 +0200
Subject: [PATCH] iss45: update README to have information on metadata

---
 README.md | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 451ce476..a157a8c7 100644
--- a/README.md
+++ b/README.md
@@ -115,7 +115,7 @@ After creating a project, you will see the "Import Data" page, or click `Import
 Upload project
 
 You can upload two types of files:
-- `TXT file`: each line contains a text and no line breaks (`\n`).
+- `CSV file`: the file must contain a header with a `text` column or be a one-column CSV file.
 - `JSON file`: each line contains a JSON object with a `text` key. JSON format supports line breaks rendering.
 
 > Notice: Doccano won't render line breaks in annotation page for sequence labeling task due to the indent problem, but the exported JSON file still contains line breaks.
@@ -135,6 +135,8 @@ He lives in Newark, Ohio.
 ...
 ```
 
+Any other columns (for CSV) or keys (for JSON) are preserved and exported as is in the `metadata` column or key.
+
 Once you select a TXT/JSON file on your computer, click `Upload dataset` button. After uploading the dataset file, we will see the `Dataset` page (or click `Dataset` button list in the left bar). This page displays all the documents we uploaded in one project.
 
 ### Define labels
@@ -156,7 +158,22 @@ After the annotation step, you can download the annotated data. Click the `Edit
 
 Edit label
 
-You can export data as CSV file or JSON file by clicking the button. As for the export file format, you can check it here: [Export File Formats](https://github.com/chakki-works/doccano/wiki/Export-File-Formats)
+You can export data as CSV file or JSON file by clicking the button. As for the export file format, you can check it here: [Export File Formats](https://github.com/chakki-works/doccano/wiki/Export-File-Formats).
+
+Each exported document will have a `metadata` column or key, which will contain
+any additional columns or keys from the imported document. The primary use case for metadata is to let you match exported data with another system
+by adding an `external_id` to the imported file. For example:
+
+The input file may look like this:
+`import.json`
+```JSON
+{"text": "EU rejects German call to boycott British lamb.", "external_id": 1}
+```
+and the exported file will look like this:
+`output.json`
+```JSON
+{"doc_id": 2023, "text": "EU rejects German call to boycott British lamb.", "labels": ["news"], "username": "root", "metadata": {"external_id": 1}}
+```
 
 ### Tutorial
 