Edit doccano docs

pull/2216/head
Stephanie Blotner 1 year ago
parent
commit
9625deed57
6 changed files with 147 additions and 112 deletions
  1. README.md (20 lines changed)
  2. docs/developer_guide.md (8 lines changed)
  3. docs/faq.md (109 lines changed)
  4. docs/index.md (56 lines changed)
  5. docs/mkdocs.yml (6 lines changed)
  6. docs/tutorial.md (60 lines changed)

20
README.md

@ -7,17 +7,17 @@
[![Codacy Badge](https://app.codacy.com/project/badge/Grade/35ac8625a2bc4eddbff23dbc61bc6abb)](https://www.codacy.com/gh/doccano/doccano/dashboard?utm_source=github.com&utm_medium=referral&utm_content=doccano/doccano&utm_campaign=Badge_Grade)
[![doccano CI](https://github.com/doccano/doccano/actions/workflows/ci.yml/badge.svg)](https://github.com/doccano/doccano/actions/workflows/ci.yml)
doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequence to sequence tasks. So, you can create labeled data for sentiment analysis, named entity recognition, text summarization and so on. Just create a project, upload data and start annotating. You can build a dataset in hours.
doccano is an open-source text annotation tool for humans. It provides annotation features for text classification, sequence labeling, and sequence to sequence tasks. You can create labeled data for sentiment analysis, named entity recognition, text summarization, and so on. Just create a project, upload data, and start annotating. You can build a dataset in hours.
## Demo
You can try the [annotation demo](http://doccano.herokuapp.com).
Try the [annotation demo](http://doccano.herokuapp.com).
![Demo image](https://raw.githubusercontent.com/doccano/doccano/master/docs/images/demo/demo.gif)
## Documentation
Read the documentation at the <https://doccano.github.io/doccano/>.
Read the documentation at <https://doccano.github.io/doccano/>.
## Features
@ -30,7 +30,7 @@ Read the documentation at the <https://doccano.github.io/doccano/>.
## Usage
Three options to run doccano:
There are three options to run doccano:
- pip (Python 3.8+)
- Docker
@ -38,7 +38,7 @@ Three options to run doccano:
### pip
To install doccano, simply run:
To install doccano, run:
```bash
pip install doccano
@ -50,7 +50,7 @@ By default, SQLite 3 is used for the default database. If you want to use Postgr
pip install 'doccano[postgresql]'
```
and set `DATABASE_URL` environment variable according to your PostgreSQL credentials:
and set the `DATABASE_URL` environment variable according to your PostgreSQL credentials:
```bash
DATABASE_URL="postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DB}?sslmode=disable"
@ -67,7 +67,7 @@ doccano createuser --username admin --password pass
doccano webserver --port 8000
```
In another terminal, run the following command:
In another terminal, run the command:
```bash
# Start the task queue to handle file upload/download.
@ -100,7 +100,7 @@ Go to <http://127.0.0.1:8000/>.
To stop the container, run `docker container stop doccano -t 5`. All data created in the container will persist across restarts.
If you want to use the latest features, please specify `nightly` tag:
If you want to use the latest features, specify the `nightly` tag:
```bash
docker pull doccano/doccano:nightly
@ -108,7 +108,7 @@ docker pull doccano/doccano:nightly
### Docker Compose
You need to install Git and to clone the repository:
You need to install Git and clone the repository:
```bash
git clone https://github.com/doccano/doccano.git
@ -189,4 +189,4 @@ Here are some tips might be helpful. [How to Contribute to Doccano Project](http
## Contact
For help and feedback, please feel free to contact [the author](https://github.com/Hironsan).
For help and feedback, feel free to contact [the author](https://github.com/Hironsan).

8
docs/developer_guide.md

@ -1,6 +1,6 @@
# Developer Guide
The important directories are as follows:
The important doccano directories are:
```bash
├── backend/
@ -11,7 +11,7 @@ The important directories are as follows:
## backend
The `backend/` directory includes the backend's REST API code. These APIs are built by [Python 3.8+](https://www.python.org/) and [Django 4.0+](https://www.djangoproject.com). The all of the packages are managed by Poetry, Python packaging and dependency management software. The directory structure of the backend follows mainly [Django](https://www.djangoproject.com) one. The following table shows the main files and directories:
The `backend/` directory includes the backend's REST API code. These APIs are built with [Python 3.8+](https://www.python.org/) and [Django 4.0+](https://www.djangoproject.com). All of the packages are managed by Poetry, a Python packaging and dependency management tool. The directory structure of the backend mainly follows the [Django](https://www.djangoproject.com) structure. The following table shows the main files and directories:
| file or directory | description |
| ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@ -32,7 +32,7 @@ The `backend/` directory includes the backend's REST API code. These APIs are bu
| poetry.lock | Related to Poetry. This file prevents you from automatically getting the latest versions of your dependencies. See [Basic usage](https://python-poetry.org/docs/basic-usage/) in Poetry documentation. |
| pyproject.toml | This file contains build system requirements and information, which are used by pip to build the package. See [pyproject.toml](https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/) and [The pyproject.toml file in Poetry](https://python-poetry.org/docs/pyproject/) in detail. |
If you want to setup the backend environment, please see [Installation guide](./install_and_upgrade_doccano.md#install-from-source).
If you want to set up the backend environment, see the [Installation guide](./install_and_upgrade_doccano.md#install-from-source).
Also, you can set the following environment variables:
@ -68,7 +68,7 @@ On the other hand, the one of the `Dockerfile` is as follows:
## frontend
The `frontend/` directory contains frontend code. The `frontent` directory structure follows [Nuxt.js](https://ru.nuxtjs.org) one. See the [Nuxt.js documentation](https://nuxtjs.org/guide/directory-structure/) in details.
The `frontend/` directory contains frontend code. The `frontend` directory structure follows the [Nuxt.js](https://ru.nuxtjs.org) structure. See the [Nuxt.js documentation](https://nuxtjs.org/guide/directory-structure/) for details.
## tools

109
docs/faq.md

@ -1,70 +1,79 @@
# FAQ
## How to create a user
To create a new doccano user:
1. Run the doccano webserver.
2. Log in to the admin site (in the case of pip installation) via <http://localhost:{port}/admin/>. The example below uses the port `8000` and username `admin`. If you set your own port or username and password on running the server, use those values to log in.
After running doccano webserver, login to the admin site(in the case of pip installation) via <http://localhost:{port}/admin/>. The below is the example of port `8000` and username `admin`. If you set your own port or username and password on running the server, please change to your one.
![](images/faq/user_creation/login.png)
![](images/faq/user_creation/login.png)
3. After logging in to the admin site, click **Users**:
After login to the admin site, select `Users`:
![](images/faq/user_creation/select_users.png)
![](images/faq/user_creation/select_users.png)
4. Click the **ADD USER** button in the upper right corner:
Select the ADD USER button in the upper right corner:
![](images/faq/user_creation/select_add_user.png)
![](images/faq/user_creation/select_add_user.png)
5. After entering the username and password for the new user, click **SAVE**:
After entering the username and password for the new user, select the `SAVE` button:
![](images/faq/user_creation/create_user.png)
![](images/faq/user_creation/create_user.png)
Congratulations. Now you are able to log in to doccano as a new user. After logging out of the admin site, try logging in as a new user.
Congratulations. Now you can log in to doccano as a new user. After logging out of the admin site, try logging in to doccano as a new user.
## How to add a user to your project
Note: This step assumes you have already created a new user. See [How to create a user](#how-to-create-a-user) in detail.
**Note**: You must be the administrator of the project to add new users to it. These instructions also assume that you have already created a new user. See [How to create a user](#how-to-create-a-user) above.
After logging in to doccano, select your project. Note that you must be the administrator of the project to add users to the project.
To add a user to your project:
Select `Members` from the left side menu. If you are not the administrator of the project, `Members` will not be displayed.
1. Log in to doccano.
2. Click on your project.
3. From the left side menu, click **Members**. If you are not the administrator of the project, **Members** will not appear.
![](images/faq/add_annotator/select_members.png)
![](images/faq/add_annotator/select_members.png)
Select the `Add` button to display the form. Fill in this form with the user name and role you want to add to the project. Then, select the `Save` button.
4. Click **Add** and fill in the Add Member form with the user name and role you want to add to the project.
5. Click **Save**.
![](images/faq/add_annotator/select_user.png)
Congratulations. Now the new user are able to access the project.
Now the new user can access the project.
## How to change the password
After running doccano webserver, login to the admin site(in the case of pip installation) via <http://localhost:{port}/admin/>. Note that you need to have a staff permission to login to the admin site. If you don't have it, please ask the administrator to change your password.
To change a user's password:
1. Run the doccano webserver.
2. Log in to the admin site (in the case of pip installation) via <http://localhost:{port}/admin/>.
![](images/faq/user_creation/login.png)
**Note**: You need staff permission to log in to the admin site. If you don't have it, ask the administrator to change your password.
After login to the admin site, select `Users`:
![](images/faq/user_creation/login.png)
![](images/faq/user_creation/select_users.png)
3. Click **Users**.
Select the user you want to change the password:
![](images/faq/user_creation/select_users.png)
![](images/faq/how_to_change_password/user_list.png)
4. Click on the name of the user whose password you want to change:
Click `this form` link:
![](images/faq/how_to_change_password/user_list.png)
![](images/faq/how_to_change_password/user_page.png)
5. Click the link that says **this form** in the password section.
After showing a form below, change password there:
![](images/faq/how_to_change_password/user_page.png)
![](images/faq/how_to_change_password/change_password.png)
6. Fill out the form and change the password.
![](images/faq/how_to_change_password/change_password.png)
## I can't upload my data
Please check the following list.
To troubleshoot, review this list:
- File encoding: `UTF-8` is appropriate.
- Filename: alphabetic file name is suitable.
- File format selection: File format radio button should be selected properly.
- File format selection: file format radio button should be selected properly.
- When you are using JSON/JSONL: Confirm JSON data is valid.
- You can use [JSONLint](https://jsonlint.com/) or some other tool (when JSONL, pick one data and check it).
- When you are using CSV: Confirm CSV data is valid.
@ -72,29 +81,31 @@ Please check the following list.
- Lack of line: Data file should not contain blank line.
- Lack of field: Data file should not contain blank field.
**You don't need your real & all data to validate file format. The picked data & masked data is suitable if your data is large or secret.**
**You don't need your real, complete data to validate the file format. A small sample or masked data is enough if your data is large or confidential.**
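The JSON/JSONL checks in the list above can also be scripted. The following is a minimal sketch, not part of doccano: the function name `find_invalid_jsonl_lines` is hypothetical, and it assumes each line of a JSONL file should be a single JSON object. It flags blank lines and lines that do not parse.

```python
import json


def find_invalid_jsonl_lines(lines):
    """Return (line_number, reason) pairs for lines that look problematic."""
    problems = []
    for number, line in enumerate(lines, start=1):
        # Blank lines are listed in the FAQ as a cause of upload failures.
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append((number, f"invalid JSON: {error.msg}"))
            continue
        # Assumption: every record is a JSON object, e.g. {"text": "..."}.
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
    return problems


if __name__ == "__main__":
    # Hypothetical sample: line 2 is blank, line 3 has a trailing comma.
    sample = [
        '{"text": "Terrible customer service."}',
        "",
        '{"text": "Really great transaction.",}',
    ]
    for number, reason in find_invalid_jsonl_lines(sample):
        print(f"line {number}: {reason}")
```

To check a real file, pass an open file object, for example `find_invalid_jsonl_lines(open("data.jsonl", encoding="utf-8"))`. Opening the file as UTF-8 also surfaces encoding problems early.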
## I want to change port number
## I want to change the port number
In the case of Docker Compose, you can change the port number by editing `docker-compose.prod.yml`. First, you change `80:8080` to `<your_port>:8080` in `nginx`/`ports` section as follows:
In the case of Docker Compose, you can change the port number by editing `docker-compose.prod.yml`.
```yaml
nginx:
image: doccano/doccano:frontend
ports:
- <your_port>:8080
```
1. Change `80:8080` to `<your_port>:8080` in the `nginx`/`ports` section as follows:
Then, you need to add `CSRF_TRUSTED_ORIGINS` environment variable to `backend`/`environment` section as follows:
```yaml
nginx:
image: doccano/doccano:frontend
ports:
- <your_port>:8080
```
```yaml
backend:
image: doccano/doccano:backend
environment:
...
DJANGO_SETTINGS_MODULE: "config.settings.production"
CSRF_TRUSTED_ORIGINS: "http://127.0.0.1:<your_port>"
```
2. Add the `CSRF_TRUSTED_ORIGINS` environment variable to the `backend`/`environment` section as follows:
```yaml
backend:
image: doccano/doccano:backend
environment:
...
DJANGO_SETTINGS_MODULE: "config.settings.production"
CSRF_TRUSTED_ORIGINS: "http://127.0.0.1:<your_port>"
```
## I want to update to the latest doccano image
@ -121,13 +132,11 @@ local doccano_www
doccano uses JSONField on SQLite, so you need to enable the JSON1 extension in Python's sqlite3 library. If the extension is not enabled on your installation, a system error is raised. This mainly affects users on macOS with a Python version earlier than 3.7 and users on Windows with a Python version earlier than 3.9.
If you have this problem, please try the following:
- [Enabling JSON1 extension on SQLite](https://code.djangoproject.com/wiki/JSON1Extension)
If you have this problem, try [enabling JSON1 extension on SQLite](https://code.djangoproject.com/wiki/JSON1Extension).
## CSRF failed
If you have this problem, please set `CSRF_TRUSTED_ORIGINS` environment variable to your domain name. For example, if your domain name is `example.com`, please set `CSRF_TRUSTED_ORIGINS=example.com`. In the debug mode, the default value is `http://127.0.0.1:3000`, `http://0.0.0.0:3000`, and `http://localhost:3000`. If you are using Docker Compose, please set `CSRF_TRUSTED_ORIGINS` in `docker-compose.prod.yml`:
If you have this problem, set the `CSRF_TRUSTED_ORIGINS` environment variable to your domain name. For example, if your domain name is `example.com`, set `CSRF_TRUSTED_ORIGINS=example.com`. In the debug mode, the default value is `http://127.0.0.1:3000`, `http://0.0.0.0:3000`, and `http://localhost:3000`. If you are using Docker Compose, set `CSRF_TRUSTED_ORIGINS` in `docker-compose.prod.yml`:
```yaml
backend:

56
docs/index.md

@ -2,49 +2,54 @@
## What is doccano?
doccano is an open-source data labeling tool for machine learning practitioners. You can perform different types of labeling tasks with many data formats. You can try doccano from the [demo page](http://doccano.herokuapp.com).
**doccano** is an open-source data labeling tool for machine learning practitioners. You can use doccano to perform different types of labeling tasks with many data formats. To see what doccano can do, try the [doccano demo](http://doccano.herokuapp.com).
![Demo image](https://raw.githubusercontent.com/doccano/doccano/master/docs/images/demo/demo.gif)
You can also integrate doccano with your script because it exposes the features as REST APIs. By using the APIs, you can label your data by using some machine learning model. See API documentation in detail.
You can also integrate doccano with your script via the doccano REST APIs. For example, you can use the APIs to label your data with a machine learning model.
## Labeling workflow with doccano
## Doccano labeling workflow
Start and finish a labeling project with doccano by the following steps:
To complete a labeling project with doccano:
1. Install doccano.
2. Run doccano.
3. Set up the labeling project. Select the type of labeling project and configure project settings.
4. Import dataset. You can also import labeled datasets.
4. Import your dataset. You can also import labeled datasets.
5. Add users to the project.
6. Define the annotation guideline.
7. Start labeling the data.
8. Export the labeled dataset.
## Quick start
## Quickstart
1. Install doccano:
1. Install doccano with pip (Python 3.8+):
```bash
pip install doccano
```
```bash
pip install doccano
```
2. Run doccano:
```bash
doccano init
doccano createuser
doccano webserver
# In another terminal, run the following command:
doccano task
```
3. Open doccano UI at <http://localhost:8000>.
4. Sign up with a username and password created by the `doccano createuser`.
5. Click `Create` to create a project and start labeling data.
6. Click `Import dataset` on the dataset page and import the dataset you want to use.
7. Click `Start annotation` and label the data.
8. Click `Export dataset` on the dataset page and export the labeled dataset.
```bash
doccano init
doccano createuser
doccano webserver
# In another terminal, run the command:
doccano task
```
3. Open the doccano UI at <http://localhost:8000/auth>.
4. Sign in with the username and password created by `doccano createuser`. The default is **username:** admin, **password:** password.
5. Change the default admin password at <http://localhost:8000/admin/password_change/>.
6. Return to the doccano UI at <http://localhost:8000/projects?>.
7. Create a project for labeling data. Click **Create**, select a project type, and fill out project details.
8. Import a dataset. Go to the **Dataset** page and click **Actions** > **Import Dataset** and import the dataset you want to use.
9. Click **Annotate** and label the data.
10. When you're finished, export the labeled dataset. Go to the **Dataset** page and click **Actions** > **Export dataset**.
## Architecture
@ -56,5 +61,6 @@ You can customize doccano to suit your needs. The architecture of doccano consis
| [doccano frontend](https://github.com/doccano/doccano/tree/master/frontend) | Javascript web app using [Vue.js](https://vuejs.org/) and [Nuxt.js](https://nuxtjs.org/) | Perform data labeling in a user interface. |
## Contact
If you get stuck, check the [FAQ](../docs/faq.md).
For help and feedback, please feel free to contact [the author](https://github.com/Hironsan).
For help and feedback, feel free to contact [the author](https://github.com/Hironsan).

6
docs/mkdocs.yml

@ -38,8 +38,8 @@ nav:
- Advanced:
- AWS HTTPS settings: advanced/aws_https_settings.md
- OAuth2 settings: advanced/oauth2_settings.md
- Auto Labeling settings: advanced/auto_labelling_config.md
- Developer Guide: developer_guide.md
- Auto labeling settings: advanced/auto_labelling_config.md
- Developer guide: developer_guide.md
- FAQ: faq.md
- Code of Conduct: CODE_OF_CONDUCT.md
- Code of conduct: CODE_OF_CONDUCT.md
- Roadmap: roadmap.md

60
docs/tutorial.md

@ -1,8 +1,10 @@
# Tutorial
This tutorial demonstrates how to use doccano to complete a named entity recognition annotation task for an example science fiction dataset.
## Dataset
Here we take named entity recognition annotation task for science fiction to give you a brief tutorial on doccano. Below is a JSON file named `books.json` containing lots of science fictions description with different languages. We need to annotate some entities like person name, book title, date and so on.
Here is a JSON file named `books.json` containing lots of science fiction book descriptions in different languages. We need to annotate some entities like names, book titles, dates, and so on.
```json
{"text": "The Hitchhiker's Guide to the Galaxy (sometimes referred to as HG2G, HHGTTGor H2G2) is a comedy science fiction series created by Douglas Adams. Originally a radio comedy broadcast on BBC Radio 4 in 1978, it was later adapted to other formats, including stage shows, novels, comic books, a 1981 TV series, a 1984 video game, and 2005 feature film."}
@ -12,55 +14,73 @@ Here we take named entity recognition annotation task for science fiction to giv
## Create a project
We need to create a new project for this task. Log in with the superuser account.
To start, let's create a new project for this task.
1. Log in to doccano with the superuser account.
![Sign in as a superuser.](./images/tutorial/signin.png)
![Sign in as a superuser.](./images/tutorial/signin.png)
To create your project, make sure you're in the project list page and click `Create` button. As for this tutorial, we name the project as `sequence labeling for books`, write some description, choose the sequence labeling task type.
2. To create your project, go to the project list page and click **Create**.
3. Fill out the project details. For this tutorial, name the project `sequence labeling for books`, write a description, and choose the sequence labeling task type.
![Creating a project.](./images/tutorial/create_project.png)
![Creating a project.](./images/tutorial/create_project.png)
## Import a dataset
After creating a project, we will see the `Dataset` page, and click `Import dataset` button in the `Actions` menu. We should see the following screen:
After creating a project, the **Dataset** page appears.
To import a dataset:
1. Click **Actions** > **Import Dataset**. You should see the following screen:
![Importing a dataset.](./images/tutorial/import_dataset.png)
![Importing a dataset.](./images/tutorial/import_dataset.png)
We choose `JSON` and click `Select a file` button. Select `books.json` and it would be loaded automatically.
2. Choose **JSON** and click **Select a file**.
3. Click **books.json** and it will load automatically.
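If you want to build a file like `books.json` from your own texts, the sketch below shows one way to do it. It assumes the layout shown in the Dataset section, with one JSON object per line and a `text` key. The helper name `to_jsonl` and the sample descriptions are hypothetical.

```python
import json


def to_jsonl(texts):
    """Serialize each text as one {"text": ...} JSON object per line."""
    # ensure_ascii=False keeps non-English characters readable in the file.
    rows = (json.dumps({"text": text}, ensure_ascii=False) for text in texts)
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Hypothetical sample descriptions in two languages.
    descriptions = [
        "Dune is a 1965 science fiction novel by Frank Herbert.",
        "Solaris est un roman de Stanisław Lem publié en 1961.",
    ]
    print(to_jsonl(descriptions), end="")
```

Write the returned string to a file opened with `encoding="utf-8"`, since the FAQ recommends UTF-8 for uploads.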
## Define labels
Click `Labels` button in the left left side menu to define our labels. We should see the label editor page. In label editor page, you can create labels by specifying label text, shortcut key, background color and text color.
Define the labels to use for your annotation project:
![Defining labels.](./images/tutorial/define_labels.png)
1. Click **Labels** in the left side menu. You should see the label editor page.
2. On the label editor page, create labels by specifying label text, a shortcut key, background color, and text color. For this tutorial, let's create some entities related to science fiction, as shown below.
As for the tutorial, we created some entities related to science fictions.
![Defining labels.](./images/tutorial/define_labels.png)
## Add members
Click `Members` button in the left side menu. If you are not the project administrator, the button won't be displayed.
Members are users who can participate in labeling activities. To add members:
![](images/faq/add_annotator/select_members.png)
1. Click **Members** in the left side menu. If you are not the project administrator, the button won't appear.
Then, select the `Add` button to display the form. Fill in this form with the user name and role you want to add to the project. Then, select the `Save` button.
![](images/faq/add_annotator/select_members.png)
![](images/faq/add_annotator/select_user.png)
2. Click **Add** to display the Add Member form.
If there is no user to select, please create users(see [FAQ](./faq.md)).
![](images/faq/add_annotator/select_user.png)
3. Fill in the form with the user name and role you want to add to the project. If there is no user to select, you need to create the user first. See the [FAQ](./faq.md) for instructions.
4. Click **Save**.
## Annotation
Next, we are ready to annotate the texts. Just click the `Start annotation` button in the navigation bar, we can start to annotate the documents.
Next, let's annotate the texts.
Click **Start annotation** in the navigation bar to start annotating the documents.
![Annotating named entities.](./images/tutorial/annotation.png)
## Export the dataset
After the annotation step, we can download the annotated data. Go to the `Dataset` page and click the `Export dataset` button in the `Action` menu. After selecting an export format, click `Export`. You should see the following screen:
After finishing the annotation step, let's download the annotated data.
1. Go to the **Dataset** page and click **Actions** > **Export Dataset**.
2. Select an export format. For this tutorial, choose the JSONL format.
3. Click **Export**. You should see this screen:
![Exporting a dataset.](./images/tutorial/export_dataset.png)
Here we choose JSONL file to download the data by clicking the button. Below is the annotated result for our tutorial project.
Below is the annotated result for this tutorial.
`sequence_labeling_for_books.json`
@ -71,4 +91,4 @@ Here we choose JSONL file to download the data by clicking the button. Below is
"username": "admin"}
```
Congratulation! You just mastered how to use doccano for a sequence labeling project.
Congratulations! You just explored how to use doccano for a sequence labeling project.