From 9625deed575e98dfdcee9f68ef795c43cda2d080 Mon Sep 17 00:00:00 2001 From: Stephanie Blotner Date: Mon, 12 Jun 2023 17:02:12 -0700 Subject: [PATCH] Edit doccano docs --- README.md | 20 ++++---- docs/developer_guide.md | 8 +-- docs/faq.md | 109 ++++++++++++++++++++++------------------ docs/index.md | 56 ++++++++++++--------- docs/mkdocs.yml | 6 +-- docs/tutorial.md | 60 ++++++++++++++-------- 6 files changed, 147 insertions(+), 112 deletions(-) diff --git a/README.md b/README.md index 5f6f7187..30058019 100644 --- a/README.md +++ b/README.md @@ -7,17 +7,17 @@ [![Codacy Badge](https://app.codacy.com/project/badge/Grade/35ac8625a2bc4eddbff23dbc61bc6abb)](https://www.codacy.com/gh/doccano/doccano/dashboard?utm_source=github.com&utm_medium=referral&utm_content=doccano/doccano&utm_campaign=Badge_Grade) [![doccano CI](https://github.com/doccano/doccano/actions/workflows/ci.yml/badge.svg)](https://github.com/doccano/doccano/actions/workflows/ci.yml) -doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequence to sequence tasks. So, you can create labeled data for sentiment analysis, named entity recognition, text summarization and so on. Just create a project, upload data and start annotating. You can build a dataset in hours. +doccano is an open-source text annotation tool for humans. It provides annotation features for text classification, sequence labeling, and sequence to sequence tasks. You can create labeled data for sentiment analysis, named entity recognition, text summarization, and so on. Just create a project, upload data, and start annotating. You can build a dataset in hours. ## Demo -You can try the [annotation demo](http://doccano.herokuapp.com). +Try the [annotation demo](http://doccano.herokuapp.com). ![Demo image](https://raw.githubusercontent.com/doccano/doccano/master/docs/images/demo/demo.gif) ## Documentation -Read the documentation at the . 
+Read the documentation at . ## Features @@ -30,7 +30,7 @@ Read the documentation at the . ## Usage -Three options to run doccano: +There are three options to run doccano: - pip (Python 3.8+) - Docker @@ -38,7 +38,7 @@ Three options to run doccano: ### pip -To install doccano, simply run: +To install doccano, run: ```bash pip install doccano @@ -50,7 +50,7 @@ By default, SQLite 3 is used for the default database. If you want to use Postgr pip install 'doccano[postgresql]' ``` -and set `DATABASE_URL` environment variable according to your PostgreSQL credentials: +and set the `DATABASE_URL` environment variable according to your PostgreSQL credentials: ```bash DATABASE_URL="postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DB}?sslmode=disable" @@ -67,7 +67,7 @@ doccano createuser --username admin --password pass doccano webserver --port 8000 ``` -In another terminal, run the following command: +In another terminal, run the command: ```bash # Start the task queue to handle file upload/download. @@ -100,7 +100,7 @@ Go to . To stop the container, run `docker container stop doccano -t 5`. All data created in the container will persist across restarts. -If you want to use the latest features, please specify `nightly` tag: +If you want to use the latest features, specify the `nightly` tag: ```bash docker pull doccano/doccano:nightly @@ -108,7 +108,7 @@ docker pull doccano/doccano:nightly ### Docker Compose -You need to install Git and to clone the repository: +You need to install Git and clone the repository: ```bash git clone https://github.com/doccano/doccano.git @@ -189,4 +189,4 @@ Here are some tips might be helpful. [How to Contribute to Doccano Project](http ## Contact -For help and feedback, please feel free to contact [the author](https://github.com/Hironsan). +For help and feedback, feel free to contact [the author](https://github.com/Hironsan). 
diff --git a/docs/developer_guide.md b/docs/developer_guide.md index 7260f420..85652b56 100644 --- a/docs/developer_guide.md +++ b/docs/developer_guide.md @@ -1,6 +1,6 @@ # Developer Guide -The important directories are as follows: +The important doccano directories are: ```bash ├── backend/ @@ -11,7 +11,7 @@ The important directories are as follows: ## backend -The `backend/` directory includes the backend's REST API code. These APIs are built by [Python 3.8+](https://www.python.org/) and [Django 4.0+](https://www.djangoproject.com). The all of the packages are managed by Poetry, Python packaging and dependency management software. The directory structure of the backend follows mainly [Django](https://www.djangoproject.com) one. The following table shows the main files and directories: +The `backend/` directory includes the backend's REST API code. These APIs are built with [Python 3.8+](https://www.python.org/) and [Django 4.0+](https://www.djangoproject.com). All of the packages are managed by Poetry, a Python packaging and dependency management tool. The directory structure of the backend mainly follows the [Django](https://www.djangoproject.com) structure. The following table shows the main files and directories: | file or directory | description | | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | @@ -32,7 +32,7 @@ The `backend/` directory includes the backend's REST API code. These APIs are bu | poetry.lock | Related to Poetry. This file prevents you from automatically getting the latest versions of your dependencies. See [Basic usage](https://python-poetry.org/docs/basic-usage/) in Poetry documentation. 
| | pyproject.toml | This file contains build system requirements and information, which are used by pip to build the package. See [pyproject.toml](https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/) and [The pyproject.toml file in Poetry](https://python-poetry.org/docs/pyproject/) in detail. | -If you want to setup the backend environment, please see [Installation guide](./install_and_upgrade_doccano.md#install-from-source). +If you want to set up the backend environment, see the [Installation guide](./install_and_upgrade_doccano.md#install-from-source). Also, you can set the following environment variables: @@ -68,7 +68,7 @@ On the other hand, the one of the `Dockerfile` is as follows: ## frontend -The `frontend/` directory contains frontend code. The `frontent` directory structure follows [Nuxt.js](https://ru.nuxtjs.org) one. See the [Nuxt.js documentation](https://nuxtjs.org/guide/directory-structure/) in details. +The `frontend/` directory contains frontend code. The `frontend` directory structure follows the [Nuxt.js](https://ru.nuxtjs.org) structure. See the [Nuxt.js documentation](https://nuxtjs.org/guide/directory-structure/) for details. ## tools diff --git a/docs/faq.md b/docs/faq.md index 13e4a857..fe1c7fe7 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -1,70 +1,79 @@ # FAQ ## How to create a user +To create a new doccano user: +1. Run the doccano webserver. +2. Log in to the admin site (in the case of pip installation) via . The example below uses the port `8000` and username `admin`. If you set your own port or username and password on running the server, use those values to log in. -After running doccano webserver, login to the admin site(in the case of pip installation) via . The below is the example of port `8000` and username `admin`. If you set your own port or username and password on running the server, please change to your one. + ![](images/faq/user_creation/login.png) -![](images/faq/user_creation/login.png) +3. 
After logging in to the admin site, click **Users**: -After login to the admin site, select `Users`: + ![](images/faq/user_creation/select_users.png) -![](images/faq/user_creation/select_users.png) +4. Click the **ADD USER** button in the upper right corner: -Select the ADD USER button in the upper right corner: + ![](images/faq/user_creation/select_add_user.png) -![](images/faq/user_creation/select_add_user.png) +5. After entering the username and password for the new user, click **SAVE**: -After entering the username and password for the new user, select the `SAVE` button: + ![](images/faq/user_creation/create_user.png) -![](images/faq/user_creation/create_user.png) - -Congratulations. Now you are able to log in to doccano as a new user. After logging out of the admin site, try logging in as a new user. +Congratulations. Now you can log in to doccano as a new user. After logging out of the admin site, try logging in to doccano as a new user. ## How to add a user to your project -Note: This step assumes you have already created a new user. See [How to create a user](#how-to-create-a-user) in detail. +**Note**: You must be the administrator of the project to add new users to it. These instructions also assume that you have already created a new user. See [How to create a user](#how-to-create-a-user) above. -After logging in to doccano, select your project. Note that you must be the administrator of the project to add users to the project. +To add a user to your project: -Select `Members` from the left side menu. If you are not the administrator of the project, `Members` will not be displayed. +1. Log in to doccano. +2. Click on your project. +3. From the left side menu, click **Members**. If you are not the administrator of the project, **Members** will not appear. -![](images/faq/add_annotator/select_members.png) + ![](images/faq/add_annotator/select_members.png) -Select the `Add` button to display the form. 
Fill in this form with the user name and role you want to add to the project. Then, select the `Save` button. +4. Click **Add** and fill in the Add Member form with the user name and role you want to add to the project. +5. Click **Save**. ![](images/faq/add_annotator/select_user.png) -Congratulations. Now the new user are able to access the project. +Now the new user can access the project. ## How to change the password -After running doccano webserver, login to the admin site(in the case of pip installation) via . Note that you need to have a staff permission to login to the admin site. If you don't have it, please ask the administrator to change your password. +To change a user's password: + +1. Run the doccano webserver. +2. Log in to the admin site (in the case of pip installation) via . -![](images/faq/user_creation/login.png) + **Note**: You need to have a staff permission to log in to the admin site. If you don't have the right permissions, ask the administrator to change your password. -After login to the admin site, select `Users`: + ![](images/faq/user_creation/login.png) -![](images/faq/user_creation/select_users.png) +3. Click **Users**. -Select the user you want to change the password: + ![](images/faq/user_creation/select_users.png) -![](images/faq/how_to_change_password/user_list.png) +4. Click on the name of the user whose password you want to change: -Click `this form` link: + ![](images/faq/how_to_change_password/user_list.png) -![](images/faq/how_to_change_password/user_page.png) +5. Click the link that says **this form** in the password section. -After showing a form below, change password there: + ![](images/faq/how_to_change_password/user_page.png) -![](images/faq/how_to_change_password/change_password.png) +6. Fill out the form and change the password. + + ![](images/faq/how_to_change_password/change_password.png) ## I can't upload my data -Please check the following list. 
+To troubleshoot, review this list: - File encoding: `UTF-8` is appropriate. - Filename: alphabetic file name is suitable. -- File format selection: File format radio button should be selected properly. +- File format selection: the correct file format radio button should be selected. - When you are using JSON/JSONL: Confirm JSON data is valid. - You can use [JSONLint](https://jsonlint.com/) or some other tool (when JSONL, pick one data and check it). - When you are using CSV: Confirm CSV data is valid. @@ -72,29 +81,31 @@ Please check the following list. - Lack of line: Data file should not contain blank line. - Lack of field: Data file should not contain blank field. -**You don't need your real & all data to validate file format. The picked data & masked data is suitable if your data is large or secret.** +**You don't need your complete real data to validate the file format. A small sample or masked data is enough if your data is large or confidential.** -## I want to change port number +## I want to change the port number -In the case of Docker Compose, you can change the port number by editing `docker-compose.prod.yml`. First, you change `80:8080` to `:8080` in `nginx`/`ports` section as follows: +In the case of Docker Compose, you can change the port number by editing `docker-compose.prod.yml`. -```yaml -nginx: - image: doccano/doccano:frontend - ports: - - :8080 -``` +1. Change `80:8080` to `:8080` in the `nginx`/`ports` section as follows: -Then, you need to add `CSRF_TRUSTED_ORIGINS` environment variable to `backend`/`environment` section as follows: + ```yaml + nginx: + image: doccano/doccano:frontend + ports: + - :8080 + ``` -```yaml -backend: - image: doccano/doccano:backend - environment: - ... - DJANGO_SETTINGS_MODULE: "config.settings.production" - CSRF_TRUSTED_ORIGINS: "http://127.0.0.1:" -``` +2. 
Add the `CSRF_TRUSTED_ORIGINS` environment variable to the `backend`/`environment` section as follows: + + ```yaml + backend: + image: doccano/doccano:backend + environment: + ... + DJANGO_SETTINGS_MODULE: "config.settings.production" + CSRF_TRUSTED_ORIGINS: "http://127.0.0.1:" + ``` ## I want to update to the latest doccano image @@ -121,13 +132,11 @@ local doccano_www doccano uses JSONField on SQLite. So you need to enable the JSON1 extension on Python's sqlite3 library. If the extension is not enabled on your installation, a system error will be raised. This is especially related to the user who uses macOS and Python which is less than 3.7, Windows and Python which is less than 3.9. -If you have this problem, please try the following: - -- [Enabling JSON1 extension on SQLite](https://code.djangoproject.com/wiki/JSON1Extension) +If you have this problem, try [enabling JSON1 extension on SQLite](https://code.djangoproject.com/wiki/JSON1Extension). ## CSRF failed -If you have this problem, please set `CSRF_TRUSTED_ORIGINS` environment variable to your domain name. For example, if your domain name is `example.com`, please set `CSRF_TRUSTED_ORIGINS=example.com`. In the debug mode, the default value is `http://127.0.0.1:3000`, `http://0.0.0.0:3000`, and `http://localhost:3000`. If you are using Docker Compose, please set `CSRF_TRUSTED_ORIGINS` in `docker-compose.prod.yml`: +If you have this problem, set the `CSRF_TRUSTED_ORIGINS` environment variable to your domain name. For example, if your domain name is `example.com`, set `CSRF_TRUSTED_ORIGINS=example.com`. In the debug mode, the default value is `http://127.0.0.1:3000`, `http://0.0.0.0:3000`, and `http://localhost:3000`. If you are using Docker Compose, set `CSRF_TRUSTED_ORIGINS` in `docker-compose.prod.yml`: ```yaml backend: diff --git a/docs/index.md b/docs/index.md index ab7cbbe8..3a4d5d85 100644 --- a/docs/index.md +++ b/docs/index.md @@ -2,49 +2,54 @@ ## What is doccano? 
-doccano is an open-source data labeling tool for machine learning practitioners. You can perform different types of labeling tasks with many data formats. You can try doccano from the [demo page](http://doccano.herokuapp.com). +**doccano** is an open-source data labeling tool for machine learning practitioners. You can use doccano to perform different types of labeling tasks with many data formats. To see what doccano can do, try the [doccano demo](http://doccano.herokuapp.com). ![Demo image](https://raw.githubusercontent.com/doccano/doccano/master/docs/images/demo/demo.gif) -You can also integrate doccano with your script because it exposes the features as REST APIs. By using the APIs, you can label your data by using some machine learning model. See API documentation in detail. +You can also integrate doccano with your script via the doccano REST APIs. By using the doccano APIs, you can label your data by using some machine learning model. -## Labeling workflow with doccano +## Doccano labeling workflow -Start and finish a labeling project with doccano by the following steps: +To complete a labeling project with doccano: 1. Install doccano. 2. Run doccano. 3. Set up the labeling project. Select the type of labeling project and configure project settings. -4. Import dataset. You can also import labeled datasets. +4. Import your dataset. You can also import labeled datasets. 5. Add users to the project. 6. Define the annotation guideline. 7. Start labeling the data. 8. Export the labeled dataset. -## Quick start +## Quickstart -1. Install doccano: +1. Install doccano with pip (Python 3.8+): -```bash -pip install doccano -``` + ```bash + pip install doccano + ``` 2. Run doccano: -```bash -doccano init -doccano createuser -doccano webserver -# In another terminal, run the following command: -doccano task -``` - -3. Open doccano UI at . -4. Sign up with a username and password created by the `doccano createuser`. -5. 
Click `Create` to create a project and start labeling data. -6. Click `Import dataset` on the dataset page and import the dataset you want to use. -7. Click `Start annotation` and label the data. -8. Click `Export dataset` on the dataset page and export the labeled dataset. + ```bash + doccano init + doccano createuser + doccano webserver + + # In another terminal, run the command: + doccano task + ``` + +3. Open the doccano UI at . +4. Sign in with the username and password created by `doccano createuser`. The default is **username:** admin, **password:** password. + +5. Change the default admin password at . + +6. Return to the doccano UI at . +7. Create a project for labeling data. Click **Create**, select a project type, and fill out project details. +8. Import a dataset. Go to the **Dataset** page and click **Actions** > **Import Dataset** and import the dataset you want to use. +9. Click **Annotate** and label the data. +10. When you're finished, export the labeled dataset. Go to the **Dataset** page and click **Actions** > **Export dataset**. ## Architecture @@ -56,5 +61,6 @@ You can customize doccano to suit your needs. The architecture of doccano consis | [doccano frontend](https://github.com/doccano/doccano/tree/master/frontend) | Javascript web app using [Vue.js](https://vuejs.org/) and [Nuxt.js](https://nuxtjs.org/) | Perform data labeling in a user interface. | ## Contact +If you get stuck, check the [FAQ](../docs/faq.md). -For help and feedback, please feel free to contact [the author](https://github.com/Hironsan). +For help and feedback, feel free to contact [the author](https://github.com/Hironsan). 
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index ed38b76e..31939d2e 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -38,8 +38,8 @@ nav: - Advanced: - AWS HTTPS settings: advanced/aws_https_settings.md - OAuth2 settings: advanced/oauth2_settings.md - - Auto Labeling settings: advanced/auto_labelling_config.md - - Developer Guide: developer_guide.md + - Auto labeling settings: advanced/auto_labelling_config.md + - Developer guide: developer_guide.md - FAQ: faq.md - - Code of Conduct: CODE_OF_CONDUCT.md + - Code of conduct: CODE_OF_CONDUCT.md - Roadmap: roadmap.md diff --git a/docs/tutorial.md b/docs/tutorial.md index 70e77ca9..7adc0230 100644 --- a/docs/tutorial.md +++ b/docs/tutorial.md @@ -1,8 +1,10 @@ # Tutorial +This tutorial demonstrates how to use doccano to complete a named entity recognition annotation task for an example science fiction dataset. + ## Dataset -Here we take named entity recognition annotation task for science fiction to give you a brief tutorial on doccano. Below is a JSON file named `books.json` containing lots of science fictions description with different languages. We need to annotate some entities like person name, book title, date and so on. +Here is a JSON file named `books.json` containing lots of science fiction book descriptions in different languages. We need to annotate some entities like names, book titles, dates, and so on. ```json {"text": "The Hitchhiker's Guide to the Galaxy (sometimes referred to as HG2G, HHGTTGor H2G2) is a comedy science fiction series created by Douglas Adams. Originally a radio comedy broadcast on BBC Radio 4 in 1978, it was later adapted to other formats, including stage shows, novels, comic books, a 1981 TV series, a 1984 video game, and 2005 feature film."} @@ -12,55 +14,73 @@ Here we take named entity recognition annotation task for science fiction to giv ## Create a project -We need to create a new project for this task. Log in with the superuser account. 
+To start, let's create a new project for this task. + +1. Log in to doccano with the superuser account. -![Sign in as a superuser.](./images/tutorial/signin.png) + ![Sign in as a superuser.](./images/tutorial/signin.png) -To create your project, make sure you're in the project list page and click `Create` button. As for this tutorial, we name the project as `sequence labeling for books`, write some description, choose the sequence labeling task type. +2. To create your project, go to the project list page and click **Create**. +3. Fill out the project details. For this tutorial, name the project `sequence labeling for books`, write a description, and choose the sequence labeling task type. -![Creating a project.](./images/tutorial/create_project.png) + ![Creating a project.](./images/tutorial/create_project.png) ## Import a dataset -After creating a project, we will see the `Dataset` page, and click `Import dataset` button in the `Actions` menu. We should see the following screen: +After creating a project, the **Dataset** page appears. + +To import a dataset: + +1. Click **Actions** > **Import Dataset**. You should see the following screen: -![Importing a dataset.](./images/tutorial/import_dataset.png) + ![Importing a dataset.](./images/tutorial/import_dataset.png) -We choose `JSON` and click `Select a file` button. Select `books.json` and it would be loaded automatically. +2. Choose **JSON** and click **Select a file**. +3. Click **books.json** and it will load automatically. ## Define labels -Click `Labels` button in the left left side menu to define our labels. We should see the label editor page. In label editor page, you can create labels by specifying label text, shortcut key, background color and text color. +Define the labels to use for your annotation project: -![Defining labels.](./images/tutorial/define_labels.png) +1. Click **Labels** in the left side menu. You should see the label editor page. +2. 
On the label editor page, create labels by specifying label text, a shortcut key, background color, and text color. For this tutorial, let's create some entities related to science fiction, as shown below. -As for the tutorial, we created some entities related to science fictions. + ![Defining labels.](./images/tutorial/define_labels.png) ## Add members -Click `Members` button in the left side menu. If you are not the project administrator, the button won't be displayed. +Members are users who can participate in labeling activities. To add members: -![](images/faq/add_annotator/select_members.png) +1. Click **Members** in the left side menu. If you are not the project administrator, the button won't appear. -Then, select the `Add` button to display the form. Fill in this form with the user name and role you want to add to the project. Then, select the `Save` button. + ![](images/faq/add_annotator/select_members.png) -![](images/faq/add_annotator/select_user.png) +2. Click **Add** to display the Add Member form. -If there is no user to select, please create users(see [FAQ](./faq.md)). + ![](images/faq/add_annotator/select_user.png) + +3. Fill in the form with the user name and role you want to add to the project. If there is no user to select, you need to create the user first. See the [FAQ](./faq.md) for instructions. +4. Click **Save**. ## Annotation -Next, we are ready to annotate the texts. Just click the `Start annotation` button in the navigation bar, we can start to annotate the documents. +Next, let's annotate the texts. + +Click **Start annotation** in the navigation bar to start annotating the documents. ![Annotating named entities.](./images/tutorial/annotation.png) ## Export the dataset -After the annotation step, we can download the annotated data. Go to the `Dataset` page and click the `Export dataset` button in the `Action` menu. After selecting an export format, click `Export`. 
You should see the following screen: +After finishing the annotation step, let's download the annotated data. + +1. Go to the **Dataset** page and click **Actions** > **Export Dataset**. +2. Select an export format. For this tutorial, choose the JSONL format. +3. Click **Export**. You should see this screen: ![Exporting a dataset.](./images/tutorial/export_dataset.png) -Here we choose JSONL file to download the data by clicking the button. Below is the annotated result for our tutorial project. + Below is the annotated result for this tutorial. `sequence_labeling_for_books.json` @@ -71,4 +91,4 @@ Here we choose JSONL file to download the data by clicking the button. Below is "username": "admin"} ``` -Congratulations! You just explored how to use doccano for a sequence labeling project.
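The upload checklist in the `docs/faq.md` changes above (UTF-8 encoding, valid JSON, no blank lines, no blank fields) could be run as a small check before importing a file such as the tutorial's `books.json`. The following is a minimal sketch under stated assumptions, not part of doccano: the helper names `check_jsonl` and `check_file` are hypothetical, and the required `text` key is assumed from the tutorial's example records.

```python
import json


def check_jsonl(lines, required_key="text"):
    """Return (line_number, problem) tuples for an iterable of JSONL lines.

    `required_key` defaults to "text", an assumption taken from the
    tutorial's books.json example rather than from doccano itself.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            # The FAQ says the data file should not contain blank lines.
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append((number, f"invalid JSON: {error.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
        elif not record.get(required_key):
            # The FAQ says the data file should not contain blank fields.
            problems.append((number, f"missing or empty '{required_key}' field"))
    return problems


def check_file(path, required_key="text"):
    """Check a JSONL file on disk (hypothetical convenience wrapper)."""
    # Reading with encoding="utf-8" raises UnicodeDecodeError if the file
    # is not valid UTF-8, which covers the FAQ's encoding item.
    with open(path, encoding="utf-8") as handle:
        return check_jsonl(handle, required_key)
```

Calling `check_file("books.json")` returns an empty list when every line passes. Otherwise each tuple names the line to inspect, which matches the FAQ's advice to pick one record and check it.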