mirror of https://github.com/doccano/doccano.git
Merge pull request #1699 from doccano/enhancement/documentation
[Enhancement] documentation (pull/1709/head)
committed by
GitHub
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 268 additions and 173 deletions
-
129docs/getting-started.md
-
82docs/index.md
-
224docs/install-and-upgrade-doccano.md
-
6docs/mkdocs.yml
@ -1,129 +0,0 @@ |
|||
# Getting started |
|||
|
|||
## Usage |
|||
|
|||
doccano has two options to run: |
|||
|
|||
- (Recommended) Docker Compose |
|||
- Docker |
|||
|
|||
The Docker Compose setup is explained in the [README.md](https://github.com/doccano/doccano/blob/master/README.md#usage), and we highly recommend it. For reference, this page also covers the plain Docker setup and the Python/Node.js development setup. |
|||
|
|||
### Docker |
|||
|
|||
As a one-time setup, create a Docker container for Doccano: |
|||
|
|||
```bash |
|||
docker pull doccano/doccano |
|||
docker container create --name doccano \ |
|||
-e "ADMIN_USERNAME=admin" \ |
|||
-e "ADMIN_EMAIL=admin@example.com" \ |
|||
-e "ADMIN_PASSWORD=password" \ |
|||
-p 8000:8000 doccano/doccano |
|||
``` |
|||
|
|||
Next, start Doccano by running the container: |
|||
|
|||
```bash |
|||
docker container start doccano |
|||
``` |
|||
|
|||
To stop the container, run `docker container stop doccano -t 5`. |
|||
All data created in the container will persist across restarts. |
|||
|
|||
Go to <http://127.0.0.1:8000/>. |
|||
|
|||
### Setup development environment |
|||
|
|||
You can set up a development environment with Python and Node.js. First, install Git and clone the repository: |
|||
|
|||
```bash |
|||
git clone https://github.com/doccano/doccano.git |
|||
cd doccano |
|||
``` |
|||
|
|||
### Backend |
|||
|
|||
The doccano backend is built in Python 3.8+ and uses [Poetry](https://github.com/python-poetry/poetry) as a dependency manager. If you haven't installed them yet, please see [Python](https://www.python.org/downloads/) and [Poetry](https://python-poetry.org/docs/) documentation. |
|||
|
|||
First, install the project's dependencies with the `install` command. After that, activate the virtual environment by running the `shell` command: |
|||
|
|||
```bash |
|||
cd backend |
|||
poetry install |
|||
poetry shell |
|||
``` |
|||
|
|||
Second, set up the database and run the development server. Doccano uses [Django](https://www.djangoproject.com/) and [Django Rest Framework](https://www.django-rest-framework.org/) as a backend. We can set them up by using Django commands: |
|||
|
|||
```bash |
|||
python manage.py migrate |
|||
python manage.py create_roles |
|||
python manage.py create_admin --noinput --username "admin" --email "admin@example.com" --password "password" |
|||
python manage.py runserver |
|||
``` |
|||
|
|||
In another terminal, run Celery to enable the dataset import/export feature: |
|||
|
|||
```bash |
|||
cd doccano/backend |
|||
celery --app=config worker --loglevel=INFO --concurrency=1 |
|||
``` |
|||
|
|||
After you change the code, don't forget to run [mypy](https://mypy.readthedocs.io/en/stable/index.html), [flake8](https://flake8.pycqa.org/en/latest/), [black](https://github.com/psf/black), and [isort](https://github.com/PyCQA/isort). These ensure code consistency. To run them, use the following commands: |
|||
|
|||
```bash |
|||
poetry run task mypy |
|||
poetry run task flake8 |
|||
poetry run task black |
|||
poetry run task isort |
|||
``` |
|||
|
|||
Similarly, you can run the tests by executing the following command: |
|||
|
|||
```bash |
|||
poetry run task test |
|||
``` |
|||
|
|||
Did you pass the test? Great! |
|||
|
|||
### Frontend |
|||
|
|||
The doccano frontend is built in Node.js and uses [Yarn](https://yarnpkg.com/) as a package manager. If you haven't installed them yet, please see [Node.js](https://nodejs.org/en/) and [Yarn](https://yarnpkg.com/) documentation. |
|||
|
|||
First, to install the defined dependencies for our project, just run the `install` command. |
|||
|
|||
```bash |
|||
cd frontend |
|||
yarn install |
|||
``` |
|||
|
|||
Then run the `dev` command to serve with hot reload at <http://localhost:3000>: |
|||
|
|||
```bash |
|||
yarn dev |
|||
``` |
|||
|
|||
## How to create a Python package |
|||
|
|||
During development, you may want to create a Python package and verify it works correctly. In such a case, you can create a package by running the following command in the root directory of your project: |
|||
|
|||
```bash |
|||
./tools/create-package.sh |
|||
``` |
|||
|
|||
This command builds the frontend, copies the files, and packages them. This will take a few minutes. After finishing the command, you will find `sdist` and `wheel` in `backend/dist`: |
|||
|
|||
```bash |
|||
Building doccano (1.5.5.post335.dev0+6be6d198) |
|||
- Building sdist |
|||
- Built doccano-1.5.5.post335.dev0+6be6d198.tar.gz |
|||
- Building wheel |
|||
- Built doccano-1.5.5.post335.dev0+6be6d198-py3-none-any.whl |
|||
``` |
|||
|
|||
Then, you can install the package via `pip install` command: |
|||
|
|||
```bash |
|||
pip install doccano-1.5.5.post335.dev0+6be6d198-py3-none-any.whl |
|||
``` |
@ -1,58 +1,60 @@ |
|||
# Welcome to doccano |
|||
# Get started with doccano |
|||
|
|||
## Text Annotation for Humans |
|||
## What is doccano? |
|||
|
|||
doccano is an open source text annotation tool built for human beings. It provides annotation features for text classification, sequence labeling and sequence to sequence. So, you can create labeled data for sentiment analysis, named entity recognition, text summarization and so on. Just create project, upload your data and start annotating. You can build a dataset in hours. |
|||
doccano is an open-source data labeling tool for machine learning practitioners. You can perform different types of labeling tasks with many data formats. You can try doccano from the [demo page](http://doccano.herokuapp.com). |
|||
|
|||
## Demo |
|||
 |
|||
|
|||
You can enjoy this [annotation demo](http://doccano.herokuapp.com). |
|||
You can also integrate doccano with your own scripts because it exposes its features as REST APIs. For example, you can use the APIs to label your data with a machine learning model. See the API documentation for details. |
|||
|
|||
### [Named entity recognition](https://doccano.herokuapp.com/demo/named-entity-recognition/) |
|||
## Labeling workflow with doccano |
|||
|
|||
First demo is one of the sequence labeling tasks, named-entity recognition. You just select text spans and annotate them. Since doccano supports shortcut keys, you can quickly annotate text spans. |
|||
To run a labeling project with doccano from start to finish, follow these steps: |
|||
|
|||
 |
|||
1. Install doccano. |
|||
2. Run doccano. |
|||
3. Set up the labeling project. Select the type of labeling project and configure project settings. |
|||
4. Import dataset. You can also import labeled datasets. |
|||
5. Add users to the project. |
|||
6. Define the annotation guideline. |
|||
7. Start labeling the data. |
|||
8. Export the labeled dataset. |
|||
|
|||
### [Text Classification](https://doccano.herokuapp.com/demo/text-classification/) |
|||
## Quick start |
|||
|
|||
Second demo is one of the text classification tasks, topic classification. Since there may be more than one category, you can annotate multi-labels. |
|||
1. Install doccano: |
|||
|
|||
 |
|||
|
|||
### [Machine translation](https://doccano.herokuapp.com/demo/translation/) |
|||
|
|||
Final demo is one of the sequence to sequence tasks, machine translation. Since there may be more than one responses in sequence to sequence tasks, you can create multiple responses. |
|||
|
|||
 |
|||
|
|||
## Quick Deployment |
|||
|
|||
<!-- ### Azure |
|||
|
|||
Doccano can be deployed to Azure ([Web App for Containers](https://azure.microsoft.com/en-us/services/app-service/containers/) + |
|||
[PostgreSQL database](https://azure.microsoft.com/en-us/services/postgresql/)) by clicking on the button below: |
|||
|
|||
[](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fdoccano%2Fdoccano%2Fmaster%2Fazuredeploy.json) --> |
|||
|
|||
### Heroku |
|||
|
|||
Doccano can be deployed to [Heroku](https://www.heroku.com/) by clicking on the button below: |
|||
|
|||
[](https://heroku.com/deploy) |
|||
```bash |
|||
pip install doccano |
|||
``` |
|||
|
|||
Of course, you can deploy doccano by using [heroku-cli](https://devcenter.heroku.com/articles/heroku-cli). |
|||
2. Run doccano: |
|||
|
|||
```bash |
|||
heroku create |
|||
heroku stack:set container |
|||
git push heroku master |
|||
doccano init |
|||
doccano createuser |
|||
doccano webserver |
|||
# In another terminal, run the following command: |
|||
doccano task |
|||
``` |
|||
|
|||
### AWS |
|||
3. Open doccano UI at <http://localhost:8000>. |
|||
4. Sign in with the username and password created by `doccano createuser`. |
|||
5. Click `Create` to create a project and start labeling data. |
|||
6. Click `Import dataset` on the dataset page and import the dataset you want to use. |
|||
7. Click `Start annotation` and label the data. |
|||
8. Click `Export dataset` on the dataset page and export the labeled dataset. |
|||
|
|||
## Architecture |
|||
|
|||
You can customize doccano to suit your needs. The architecture of doccano consists of two parts: backend and frontend. |
|||
|
|||
Doccano can be deployed to AWS ([Cloudformation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html)) by clicking on the button below: |
|||
| Module | Technology | Description | |
|||
| ---------------- | ------------------------------------------- | ------------------------------------------ | |
|||
| [doccano backend](https://github.com/doccano/doccano/tree/master/backend) | Python, [Django](https://www.djangoproject.com/), and [Django Rest Framework](https://www.django-rest-framework.org/) | Perform data labeling via REST APIs. | |
|||
| [doccano frontend](https://github.com/doccano/doccano/tree/master/frontend) | JavaScript web app using [Vue.js](https://vuejs.org/) and [Nuxt.js](https://nuxtjs.org/) | Perform data labeling in a user interface. | |
|||
|
|||
[](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?templateURL=https://s3-external-1.amazonaws.com/cf-templates-10vry9l3mp71r-us-east-1/20190732wl-new.templatexloywxxyimi&stackName=doccano) |
|||
## Contact |
|||
|
|||
> Notice: (1) EC2 KeyPair cannot be created automatically, so make sure you have an existing EC2 KeyPair in one region. Or [create one yourself](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair). (2) If you want to access doccano via HTTPS in AWS, here is an [instruction](https://github.com/doccano/doccano/wiki/HTTPS-setting-for-doccano-in-AWS). |
|||
For help and feedback, please feel free to contact [the author](https://github.com/Hironsan). |
@ -0,0 +1,224 @@ |
|||
# Install doccano |
|||
|
|||
Install doccano locally or in the cloud. Choose the installation method that works best for your environment: |
|||
|
|||
- [Install with pip](#install-with-pip) |
|||
- [Install with Docker](#install-with-docker) |
|||
- [Install with Docker Compose](#install-with-docker-compose) |
|||
- [Install from source](#install-from-source) |
|||
- [Install to cloud](#install-to-cloud) |
|||
- [Upgrade doccano](#upgrade-doccano) |
|||
|
|||
## System requirements |
|||
|
|||
You can install doccano on a Linux, Windows, or macOS machine running Python 3.8+. |
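If you are not sure whether your interpreter meets the 3.8+ requirement, a quick check like the following can help. This is only a sketch: the `version_at_least` helper is a hypothetical name introduced here, and it compares the major and minor numbers only.

```shell
# Succeeds when "major.minor" version $1 is at least version $2.
# (version_at_least is a hypothetical helper, not part of doccano.)
version_at_least() {
  have_major=${1%%.*}; have_minor=${1#*.}; have_minor=${have_minor%%.*}
  want_major=${2%%.*}; want_minor=${2#*.}; want_minor=${want_minor%%.*}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Ask the interpreter for its own version, e.g. "3.10".
if command -v python3 >/dev/null 2>&1; then
  current=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
  if version_at_least "$current" "3.8"; then
    echo "Python $current is new enough for doccano"
  else
    echo "Python $current is too old; doccano needs 3.8+"
  fi
else
  echo "python3 not found on PATH"
fi
```

The comparison is numeric rather than textual, so `3.10` is correctly treated as newer than `3.8`.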
|||
|
|||
### Web browser support |
|||
|
|||
doccano is tested with the latest version of Google Chrome and is expected to work in the latest versions of: |
|||
|
|||
- Google Chrome |
|||
- Apple Safari |
|||
|
|||
If using other web browsers, or older versions of supported web browsers, unexpected behavior could occur. |
|||
|
|||
### Port requirements |
|||
|
|||
doccano uses port 8000 by default. To use a different port, pass the `--port` option when running `doccano webserver`. |
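As a rough sketch of choosing another port, the snippet below validates a port number and prints the matching `doccano webserver` command instead of running it. The `PORT` variable and the `valid_port` helper are assumptions made for this example; only the `--port` option comes from the install instructions.

```shell
# Accept only whole numbers in the TCP port range 1-65535.
# (valid_port is a hypothetical helper, not part of doccano.)
valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# PORT is an assumed variable name; 8080 is just an example value.
PORT="${PORT:-8080}"
if valid_port "$PORT"; then
  # Printed, not executed, so you can review the command first.
  echo "doccano webserver --port $PORT"
else
  echo "invalid port: $PORT" >&2
fi
```

If you change the port, remember to open the UI at the same port in your browser.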
|||
|
|||
## Install with pip |
|||
|
|||
To install doccano with pip, you need Python 3.8+. Run the following: |
|||
|
|||
```bash |
|||
pip install doccano |
|||
``` |
|||
|
|||
After you install doccano, start the server with the following command: |
|||
|
|||
```bash |
|||
# Initialize database. First time only. |
|||
doccano init |
|||
# Create a super user. First time only. |
|||
doccano createuser --username admin --password pass |
|||
# Start a web server. |
|||
doccano webserver --port 8000 |
|||
``` |
|||
|
|||
In another terminal, run the following command: |
|||
|
|||
```bash |
|||
# Start the task queue to handle file upload/download. |
|||
doccano task |
|||
``` |
|||
|
|||
Open <http://localhost:8000/>. |
|||
|
|||
## Install with Docker |
|||
|
|||
doccano is also available as a [Docker](https://www.docker.com/) container. Make sure you have Docker installed on your machine. |
|||
|
|||
To install doccano and create a container that serves it at <http://localhost:8000>, run the following commands: |
|||
|
|||
```bash |
|||
docker pull doccano/doccano |
|||
docker container create --name doccano \ |
|||
-e "ADMIN_USERNAME=admin" \ |
|||
-e "ADMIN_EMAIL=admin@example.com" \ |
|||
-e "ADMIN_PASSWORD=password" \ |
|||
-p 8000:8000 doccano/doccano |
|||
``` |
|||
|
|||
Next, start doccano by running the container: |
|||
|
|||
```bash |
|||
docker container start doccano |
|||
``` |
|||
|
|||
To stop the container, run `docker container stop doccano -t 5`. |
|||
All data created in the container persist across restarts. |
|||
|
|||
### Build a local image with Docker |
|||
|
|||
If you want to build a local image, run: |
|||
|
|||
```bash |
|||
docker build -t doccano:latest . -f docker/Dockerfile |
|||
``` |
|||
|
|||
## Install with Docker Compose |
|||
|
|||
First, install Git and clone the repository: |
|||
|
|||
```bash |
|||
git clone https://github.com/doccano/doccano.git |
|||
cd doccano |
|||
``` |
|||
|
|||
To install and start doccano at <http://localhost>, run the following command: |
|||
|
|||
```bash |
|||
docker-compose -f docker/docker-compose.prod.yml --env-file ./docker/.env.example up |
|||
``` |
|||
|
|||
You can override the default setting by rewriting the `.env` file. |
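One possible way to override a setting is to work on a copy of the example file and change a single key. The sketch below assumes that `docker/.env.example` contains an `ADMIN_PASSWORD` entry (the key name is taken from the Docker instructions, not verified against the file), and `set_env_value` is a hypothetical helper.

```shell
# Replace (or append) KEY=VALUE in an env file, leaving other lines untouched.
# (set_env_value is a hypothetical helper, not part of doccano.)
set_env_value() {
  file=$1 key=$2 value=$3
  tmp="$file.tmp.$$"
  # Drop any existing assignment of the key, then append the new one.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Work on a private copy so docker/.env.example stays pristine.
if [ -f docker/.env.example ]; then
  cp docker/.env.example docker/.env
  # ADMIN_PASSWORD is an assumed key name; check your .env.example.
  set_env_value docker/.env ADMIN_PASSWORD "change-me"
  # Printed, not executed: point --env-file at the edited copy.
  echo "docker-compose -f docker/docker-compose.prod.yml --env-file ./docker/.env up"
fi
```

Keeping your changes in a separate `docker/.env` means a later `git pull` will not conflict with your local settings.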
|||
|
|||
## Install from source |
|||
|
|||
If you want to develop doccano, consider downloading the source code using Git and running doccano locally. First of all, clone the repository: |
|||
|
|||
```bash |
|||
git clone https://github.com/doccano/doccano.git |
|||
cd doccano |
|||
``` |
|||
|
|||
### Backend |
|||
|
|||
The doccano backend is built in Python 3.8+ and uses [Poetry](https://github.com/python-poetry/poetry) as a dependency manager. If you haven't installed them yet, please see [Python](https://www.python.org/downloads/) and [Poetry](https://python-poetry.org/docs/) documentation. |
|||
|
|||
First, install the project's dependencies with the `install` command. After that, activate the virtual environment by running the `shell` command: |
|||
|
|||
```bash |
|||
cd backend |
|||
poetry install |
|||
poetry shell |
|||
``` |
|||
|
|||
Second, set up the database and run the development server. Doccano uses [Django](https://www.djangoproject.com/) and [Django Rest Framework](https://www.django-rest-framework.org/) as a backend. We can set them up by using Django commands: |
|||
|
|||
```bash |
|||
python manage.py migrate |
|||
python manage.py create_roles |
|||
python manage.py create_admin --noinput --username "admin" --email "admin@example.com" --password "password" |
|||
python manage.py runserver |
|||
``` |
|||
|
|||
In another terminal, run Celery to enable the dataset import/export feature: |
|||
|
|||
```bash |
|||
cd doccano/backend |
|||
celery --app=config worker --loglevel=INFO --concurrency=1 |
|||
``` |
|||
|
|||
After you change the code, don't forget to run [mypy](https://mypy.readthedocs.io/en/stable/index.html), [flake8](https://flake8.pycqa.org/en/latest/), [black](https://github.com/psf/black), and [isort](https://github.com/PyCQA/isort). These ensure code consistency. To run them, just run the following commands: |
|||
|
|||
```bash |
|||
poetry run task mypy |
|||
poetry run task flake8 |
|||
poetry run task black |
|||
poetry run task isort |
|||
``` |
|||
|
|||
Similarly, you can run the tests by executing the following command: |
|||
|
|||
```bash |
|||
poetry run task test |
|||
``` |
|||
|
|||
Did you pass the test? Great! |
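If you prefer a single command, the checks and the tests above can be chained in a small loop. This is a sketch, not part of the repository: `run` and `run_checks` are hypothetical helpers, and by default the script only prints the commands. Set `DRY_RUN=0` and run it from `backend/` to execute them.

```shell
# Print commands by default; set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Run every task from the sections above and remember whether any failed.
run_checks() {
  status=0
  for task in mypy flake8 black isort test; do
    run poetry run task "$task" || status=1
  done
  return "$status"
}

run_checks
```

The loop keeps going after a failure, so one run reports the problems from every tool instead of stopping at the first.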
|||
|
|||
### Frontend |
|||
|
|||
The doccano frontend is built in Node.js and uses [Yarn](https://yarnpkg.com/) as a package manager. If you haven't installed them yet, please see [Node.js](https://nodejs.org/en/) and [Yarn](https://yarnpkg.com/) documentation. |
|||
|
|||
First, to install the defined dependencies for our project, just run the `install` command. |
|||
|
|||
```bash |
|||
cd frontend |
|||
yarn install |
|||
``` |
|||
|
|||
Then run the `dev` command to serve with hot reload at <http://localhost:3000>: |
|||
|
|||
```bash |
|||
yarn dev |
|||
``` |
|||
|
|||
### How to create a Python package |
|||
|
|||
During development, you may want to create a Python package and verify it works correctly. In such a case, you can create a package by running the following command in the root directory of your project: |
|||
|
|||
```bash |
|||
./tools/create-package.sh |
|||
``` |
|||
|
|||
This command builds the frontend, copies the files, and packages them. This will take a few minutes. After finishing the command, you will find `sdist` and `wheel` in `backend/dist`: |
|||
|
|||
```bash |
|||
Building doccano (1.5.5.post335.dev0+6be6d198) |
|||
- Building sdist |
|||
- Built doccano-1.5.5.post335.dev0+6be6d198.tar.gz |
|||
- Building wheel |
|||
- Built doccano-1.5.5.post335.dev0+6be6d198-py3-none-any.whl |
|||
``` |
|||
|
|||
Then, you can install the package with the `pip install` command: |
|||
|
|||
```bash |
|||
pip install doccano-1.5.5.post335.dev0+6be6d198-py3-none-any.whl |
|||
``` |
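Because the version string changes with every build, it can be convenient to pick up the newest wheel automatically instead of typing the file name. The following is a minimal sketch: `latest_wheel` is a hypothetical helper, and the script prints the `pip install` command rather than running it.

```shell
# Print the most recently modified wheel in a directory (empty output if none).
# (latest_wheel is a hypothetical helper, not part of doccano.)
latest_wheel() {
  ls -t "$1"/*.whl 2>/dev/null | head -n 1
}

# backend/dist is where create-package.sh places its output.
wheel=$(latest_wheel backend/dist)
if [ -n "$wheel" ]; then
  echo "pip install $wheel"
else
  echo "no wheel found in backend/dist; run ./tools/create-package.sh first"
fi
```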
|||
|
|||
## Install to cloud |
|||
|
|||
doccano also supports one-click deployment to cloud providers. Click one of the following buttons, configure the environment, and access the UI. |
|||
|
|||
| Service | Button | |
|||
|---------|---| |
|||
| AWS | [](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=doccano&templateURL=https://doccano.s3.amazonaws.com/public/cloudformation/template.aws.yaml) | |
|||
| Heroku | [](https://dashboard.heroku.com/new?template=https%3A%2F%2Fgithub.com%2Fdoccano%2Fdoccano) | |
|||
|
|||
## Upgrade doccano |
|||
|
|||
Caution: If you use SQLite3 as a database, upgrading the package can cause you to lose your database. |
|||
|
|||
To upgrade to the latest version of doccano, reinstall or upgrade using pip. |
|||
|
|||
```bash |
|||
pip install -U doccano |
|||
``` |
|||
|
|||
If you need to update the database schema, run the following: |
|||
|
|||
```bash |
|||
doccano migrate |
|||
``` |
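Given the SQLite3 caution above, you may want to copy the database file before upgrading. The sketch below is an assumption-heavy example: `backup_file` is a hypothetical helper, and `DB_PATH` must be set by you to wherever your installation keeps its SQLite file, because that location is not stated in this document.

```shell
# Copy a file to "<name>.bak-<timestamp>" and print the backup path.
# (backup_file is a hypothetical helper, not part of doccano.)
backup_file() {
  src=$1
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  dest="$src.bak-$(date +%Y%m%d%H%M%S)"
  cp -p "$src" "$dest" && echo "$dest"
}

# DB_PATH is an assumption: point it at your own SQLite database file.
DB_PATH="${DB_PATH:-}"
if [ -n "$DB_PATH" ]; then
  backup_file "$DB_PATH" &&
    echo "backup done; now run: pip install -U doccano && doccano migrate"
fi
```

If the upgrade goes wrong, you can stop doccano and copy the backup over the database file.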