
Merge pull request #1699 from doccano/enhancement/documentation

[Enhancement] documentation
pull/1709/head
Hiroki Nakayama, 3 years ago (committed by GitHub)
commit bfdecb92fe
No known key found for this signature in database GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 268 additions and 173 deletions
  1. docs/getting-started.md: 129 changed lines
  2. docs/index.md: 82 changed lines
  3. docs/install-and-upgrade-doccano.md: 224 changed lines
  4. docs/mkdocs.yml: 6 changed lines

129
docs/getting-started.md

@@ -1,129 +0,0 @@
# Getting started
## Usage
doccano has two options to run:
- (Recommended) Docker Compose
- Docker
The usage of the Docker Compose version is explained in the [README.md](https://github.com/doccano/doccano/blob/master/README.md#usage). We highly recommend the Docker Compose version. For additional information, this page also explains how to run doccano with Docker and with Python/Node.js.
### Docker
As a one-time setup, create a Docker container for Doccano:
```bash
docker pull doccano/doccano
docker container create --name doccano \
-e "ADMIN_USERNAME=admin" \
-e "ADMIN_EMAIL=admin@example.com" \
-e "ADMIN_PASSWORD=password" \
-p 8000:8000 doccano/doccano
```
Next, start Doccano by running the container:
```bash
docker container start doccano
```
To stop the container, run `docker container stop doccano -t 5`.
All data created in the container will persist across restarts.
Go to <http://127.0.0.1:8000/>.
### Set up the development environment
You can set up a development environment with Python and Node.js. First, install Git and clone the repository:
```bash
git clone https://github.com/doccano/doccano.git
cd doccano
```
### Backend
The doccano backend is built in Python 3.8+ and uses [Poetry](https://github.com/python-poetry/poetry) as a dependency manager. If you haven't installed them yet, please see [Python](https://www.python.org/downloads/) and [Poetry](https://python-poetry.org/docs/) documentation.
First, install the dependencies defined for the project by running the `install` command. Then activate the virtual environment by running the `shell` command:
```bash
cd backend
poetry install
poetry shell
```
Second, set up the database and run the development server. Doccano uses [Django](https://www.djangoproject.com/) and [Django Rest Framework](https://www.django-rest-framework.org/) as a backend. You can set them up with Django management commands:
```bash
python manage.py migrate
python manage.py create_roles
python manage.py create_admin --noinput --username "admin" --email "admin@example.com" --password "password"
python manage.py runserver
```
In another terminal, run Celery to enable the dataset import/export feature:
```bash
cd doccano/backend
celery --app=config worker --loglevel=INFO --concurrency=1
```
After you change the code, don't forget to run [mypy](https://mypy.readthedocs.io/en/stable/index.html), [flake8](https://flake8.pycqa.org/en/latest/), [black](https://github.com/psf/black), and [isort](https://github.com/PyCQA/isort). These tools keep the code consistent. To run them, use the following commands:
```bash
poetry run task mypy
poetry run task flake8
poetry run task black
poetry run task isort
```
Similarly, you can run the tests with the following command:
```bash
poetry run task test
```
Did you pass the test? Great!
### Frontend
The doccano frontend is built in Node.js and uses [Yarn](https://yarnpkg.com/) as a package manager. If you haven't installed them yet, please see [Node.js](https://nodejs.org/en/) and [Yarn](https://yarnpkg.com/) documentation.
First, to install the defined dependencies for our project, just run the `install` command.
```bash
cd frontend
yarn install
```
Then run the `dev` command to serve with hot reload at <http://localhost:3000>:
```bash
yarn dev
```
## How to create a Python package
During development, you may want to create a Python package and verify it works correctly. In such a case, you can create a package by running the following command in the root directory of your project:
```bash
./tools/create-package.sh
```
This command builds the frontend, copies the files, and packages them. It takes a few minutes. When it finishes, you will find the sdist and wheel files in `backend/dist`:
```bash
Building doccano (1.5.5.post335.dev0+6be6d198)
- Building sdist
- Built doccano-1.5.5.post335.dev0+6be6d198.tar.gz
- Building wheel
- Built doccano-1.5.5.post335.dev0+6be6d198-py3-none-any.whl
```
Then install the package with `pip install`:
```bash
pip install doccano-1.5.5.post335.dev0+6be6d198-py3-none-any.whl
```

82
docs/index.md

@@ -1,58 +1,60 @@
# Welcome to doccano
# Get started with doccano
## Text Annotation for Humans
## What is doccano?
doccano is an open source text annotation tool built for human beings. It provides annotation features for text classification, sequence labeling and sequence to sequence. So, you can create labeled data for sentiment analysis, named entity recognition, text summarization and so on. Just create project, upload your data and start annotating. You can build a dataset in hours.
doccano is an open-source data labeling tool for machine learning practitioners. You can perform different types of labeling tasks with many data formats. You can try doccano from the [demo page](http://doccano.herokuapp.com).
## Demo
![Demo image](https://raw.githubusercontent.com/doccano/doccano/master/docs/images/demo/demo.gif)
You can enjoy this [annotation demo](http://doccano.herokuapp.com).
You can also integrate doccano with your scripts, because it exposes its features as REST APIs. For example, you can use the APIs to label your data with a machine learning model. See the API documentation for details.
### [Named entity recognition](https://doccano.herokuapp.com/demo/named-entity-recognition/)
## Labeling workflow with doccano
First demo is one of the sequence labeling tasks, named-entity recognition. You just select text spans and annotate them. Since doccano supports shortcut keys, you can quickly annotate text spans.
Start and finish a labeling project with doccano by following these steps:
![Named Entity Recognition](./images/demo/named_entity_annotation.gif)
1. Install doccano.
2. Run doccano.
3. Set up the labeling project. Select the type of labeling project and configure project settings.
4. Import a dataset. You can also import datasets that are already labeled.
5. Add users to the project.
6. Define the annotation guideline.
7. Start labeling the data.
8. Export the labeled dataset.
### [Text Classification](https://doccano.herokuapp.com/demo/text-classification/)
## Quick start
Second demo is one of the text classification tasks, topic classification. Since there may be more than one category, you can annotate multi-labels.
1. Install doccano:
![Text Classification](./images/demo/text_classification.gif)
### [Machine translation](https://doccano.herokuapp.com/demo/translation/)
Final demo is one of the sequence to sequence tasks, machine translation. Since there may be more than one responses in sequence to sequence tasks, you can create multiple responses.
![Machine Translation](./images/demo/translation.gif)
## Quick Deployment
<!-- ### Azure
Doccano can be deployed to Azure ([Web App for Containers](https://azure.microsoft.com/en-us/services/app-service/containers/) +
[PostgreSQL database](https://azure.microsoft.com/en-us/services/postgresql/)) by clicking on the button below:
[![Deploy to Azure](https://azuredeploy.net/deploybutton.svg)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fdoccano%2Fdoccano%2Fmaster%2Fazuredeploy.json) -->
### Heroku
Doccano can be deployed to [Heroku](https://www.heroku.com/) by clicking on the button below:
[![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://heroku.com/deploy)
```bash
pip install doccano
```
Of course, you can deploy doccano by using [heroku-cli](https://devcenter.heroku.com/articles/heroku-cli).
2. Run doccano:
```bash
heroku create
heroku stack:set container
git push heroku master
doccano init
doccano createuser
doccano webserver
# In another terminal, run the following command:
doccano task
```
### AWS
3. Open doccano UI at <http://localhost:8000>.
4. Log in with the username and password you created with `doccano createuser`.
5. Click `Create` to create a project and start labeling data.
6. Click `Import dataset` on the dataset page and import the dataset you want to use.
7. Click `Start annotation` and label the data.
8. Click `Export dataset` on the dataset page and export the labeled dataset.
## Architecture
You can customize doccano to suit your needs. The architecture of doccano consists of two parts: backend and frontend.
Doccano can be deployed to AWS ([Cloudformation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html)) by clicking on the button below:
| Module | Technology | Description |
| ---------------- | ------------------------------------------- | ------------------------------------------ |
| [doccano backend](https://github.com/doccano/doccano/tree/master/backend) | Python, [Django](https://www.djangoproject.com/), and [Django Rest Framework](https://www.django-rest-framework.org/) | Perform data labeling via REST APIs. |
| [doccano frontend](https://github.com/doccano/doccano/tree/master/frontend) | JavaScript web app using [Vue.js](https://vuejs.org/) and [Nuxt.js](https://nuxtjs.org/) | Perform data labeling in a user interface. |
[![AWS CloudFormation Launch Stack SVG Button](https://cdn.rawgit.com/buildkite/cloudformation-launch-stack-button-svg/master/launch-stack.svg)](https://us-east-1.console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?templateURL=https://s3-external-1.amazonaws.com/cf-templates-10vry9l3mp71r-us-east-1/20190732wl-new.templatexloywxxyimi&stackName=doccano)
## Contact
> Notice: (1) EC2 KeyPair cannot be created automatically, so make sure you have an existing EC2 KeyPair in one region. Or [create one yourself](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair). (2) If you want to access doccano via HTTPS in AWS, here is an [instruction](https://github.com/doccano/doccano/wiki/HTTPS-setting-for-doccano-in-AWS).
For help and feedback, please feel free to contact [the author](https://github.com/Hironsan).

224
docs/install-and-upgrade-doccano.md

@@ -0,0 +1,224 @@
# Install doccano
Install doccano locally or in the cloud. Choose the installation method that works best for your environment:
- [Install with pip](#install-with-pip)
- [Install with Docker](#install-with-docker)
- [Install with Docker Compose](#install-with-docker-compose)
- [Install from source](#install-from-source)
- [Install to cloud](#install-to-cloud)
- [Upgrade doccano](#upgrade-doccano)
## System requirements
You can install doccano on a Linux, Windows, or macOS machine running Python 3.8+.
### Web browser support
doccano is tested with the latest version of Google Chrome and is expected to work in the latest versions of:
- Google Chrome
- Apple Safari
If you use other web browsers, or older versions of the supported browsers, unexpected behavior may occur.
### Port requirements
doccano uses port 8000 by default. To use a different port, specify it with the `--port` option when running `doccano webserver`.
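Before starting the server, you may want to check whether the port is already taken. The following is a minimal sketch that relies on Bash's `/dev/tcp` redirection (a Bash feature, not available in plain `sh`); `port_in_use` is a hypothetical helper, not part of doccano.

```bash
# port_in_use PORT: succeed (exit 0) if something on 127.0.0.1 accepts
# TCP connections on PORT, and fail otherwise.
# Hypothetical helper, not part of doccano. Requires Bash, because it uses
# Bash's /dev/tcp/HOST/PORT redirection to attempt a connection.
port_in_use() {
  local port="$1"
  # The subshell opens file descriptor 3 as a TCP socket and closes it on exit;
  # errors such as "Connection refused" are discarded.
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}
```

For example, if `port_in_use 8000` succeeds, choose another port and pass it to `doccano webserver --port`.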
## Install with pip
To install doccano with pip, you need Python 3.8+. Run the following:
```bash
pip install doccano
```
After you install doccano, start the server with the following command:
```bash
# Initialize database. First time only.
doccano init
# Create a super user. First time only.
doccano createuser --username admin --password pass
# Start a web server.
doccano webserver --port 8000
```
In another terminal, run the following command:
```bash
# Start the task queue to handle file upload/download.
doccano task
```
Open <http://localhost:8000/>.
## Install with Docker
doccano is also available as a [Docker](https://www.docker.com/) container. Make sure you have Docker installed on your machine.
To install doccano and create a container that serves it at <http://localhost:8000>, run the following commands:
```bash
docker pull doccano/doccano
docker container create --name doccano \
-e "ADMIN_USERNAME=admin" \
-e "ADMIN_EMAIL=admin@example.com" \
-e "ADMIN_PASSWORD=password" \
-p 8000:8000 doccano/doccano
```
Next, start doccano by running the container:
```bash
docker container start doccano
```
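`docker container start` returns once the container is running, but the web server inside it may need a few more seconds before it answers requests. The following minimal sketch polls a URL with `curl` until it responds or a timeout expires. The `wait_for_url` name and the 60-second default are illustrative assumptions, not part of doccano.

```bash
# wait_for_url URL [TIMEOUT_SECONDS]: poll URL about once per second until it
# returns a successful HTTP response, or give up after roughly TIMEOUT_SECONDS.
# Hypothetical helper, not part of doccano; requires curl.
wait_for_url() {
  local url="$1" timeout="${2:-60}" waited=0
  # -f: treat HTTP errors (4xx/5xx) as failures; -s: no progress output;
  # -o /dev/null: discard the body; --max-time: bound each attempt.
  until curl -fs -o /dev/null --max-time 5 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example: `docker container start doccano && wait_for_url http://localhost:8000/ && echo "doccano is up"`.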
To stop the container, run `docker container stop doccano -t 5`.
All data created in the container persists across restarts.
### Build a local image with Docker
If you want to build a local image, run the following from the root of the repository:
```bash
docker build -t doccano:latest . -f docker/Dockerfile
```
## Install with Docker Compose
Install Git and clone the repository:
```bash
git clone https://github.com/doccano/doccano.git
cd doccano
```
To install and start doccano at <http://localhost>, run the following command:
```bash
docker-compose -f docker/docker-compose.prod.yml --env-file ./docker/.env.example up
```
You can override the default settings by editing the environment file passed with `--env-file`, for example a copy of `./docker/.env.example`.
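A minimal sketch for changing one setting from the shell, assuming the environment file uses one `KEY=value` pair per line (check your copy of `docker/.env.example` for the actual variable names). `set_env_var` is a hypothetical helper, not part of doccano.

```bash
# set_env_var KEY VALUE FILE: replace the KEY=... line in FILE, or append
# KEY=VALUE if the key is not present yet.
# Hypothetical helper, not part of doccano. VALUE must not contain '|', '&',
# or backslashes, because it is inserted into a sed replacement.
set_env_var() {
  local key="$1" value="$2" file="$3"
  if grep -q "^${key}=" "$file"; then
    # '|' is the sed delimiter so values may contain '/'; -i.bak works with
    # both GNU and BSD sed, and the backup file is removed afterwards.
    sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file" && rm -f "${file}.bak"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, copy the defaults with `cp docker/.env.example docker/.env`, run `set_env_var ADMIN_PASSWORD 'a-stronger-password' docker/.env` (the `ADMIN_PASSWORD` key is assumed here), and pass `--env-file ./docker/.env` to `docker-compose` instead of the example file.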
## Install from source
If you want to develop doccano, consider downloading the source code using Git and running doccano locally. First of all, clone the repository:
```bash
git clone https://github.com/doccano/doccano.git
cd doccano
```
### Backend
The doccano backend is built in Python 3.8+ and uses [Poetry](https://github.com/python-poetry/poetry) as a dependency manager. If you haven't installed them yet, see the [Python](https://www.python.org/downloads/) and [Poetry](https://python-poetry.org/docs/) documentation.
First, install the dependencies defined for the project by running the `install` command. Then activate the virtual environment by running the `shell` command:
```bash
cd backend
poetry install
poetry shell
```
Second, set up the database and run the development server. Doccano uses [Django](https://www.djangoproject.com/) and [Django Rest Framework](https://www.django-rest-framework.org/) as a backend. You can set them up with Django management commands:
```bash
python manage.py migrate
python manage.py create_roles
python manage.py create_admin --noinput --username "admin" --email "admin@example.com" --password "password"
python manage.py runserver
```
In another terminal, run Celery to enable the dataset import/export feature:
```bash
cd doccano/backend
celery --app=config worker --loglevel=INFO --concurrency=1
```
After you change the code, don't forget to run [mypy](https://mypy.readthedocs.io/en/stable/index.html), [flake8](https://flake8.pycqa.org/en/latest/), [black](https://github.com/psf/black), and [isort](https://github.com/PyCQA/isort). These tools keep the code consistent. To run them, use the following commands:
```bash
poetry run task mypy
poetry run task flake8
poetry run task black
poetry run task isort
```
Similarly, you can run the tests with the following command:
```bash
poetry run task test
```
Did you pass the test? Great!
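If you prefer a single command, the five Poetry tasks listed above can be chained in a small function that stops at the first failure. This is a sketch, not a script shipped with doccano; `run_checks` is a hypothetical name, and it should be run from the `backend` directory like the commands above.

```bash
# run_checks: run the type-check, lint, format, and test tasks in order and
# stop at the first one that fails.
# Hypothetical helper, not part of doccano; run it from the backend directory,
# where the Poetry project lives.
run_checks() {
  local task
  for task in mypy flake8 black isort test; do
    echo "==> poetry run task ${task}"
    if ! poetry run task "$task"; then
      echo "task '${task}' failed" >&2
      return 1
    fi
  done
  echo "all checks passed"
}
```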
### Frontend
The doccano frontend is built in Node.js and uses [Yarn](https://yarnpkg.com/) as a package manager. If you haven't installed them yet, see the [Node.js](https://nodejs.org/en/) and [Yarn](https://yarnpkg.com/) documentation.
First, install the dependencies defined for the project by running the `install` command:
```bash
cd frontend
yarn install
```
Then run the `dev` command to serve with hot reload at <http://localhost:3000>:
```bash
yarn dev
```
### How to create a Python package
During development, you may want to create a Python package and verify it works correctly. In such a case, you can create a package by running the following command in the root directory of your project:
```bash
./tools/create-package.sh
```
This command builds the frontend, copies the files, and packages them. It takes a few minutes. When it finishes, you will find the sdist and wheel files in `backend/dist`:
```bash
Building doccano (1.5.5.post335.dev0+6be6d198)
- Building sdist
- Built doccano-1.5.5.post335.dev0+6be6d198.tar.gz
- Building wheel
- Built doccano-1.5.5.post335.dev0+6be6d198-py3-none-any.whl
```
Then install the package with `pip install`:
```bash
pip install doccano-1.5.5.post335.dev0+6be6d198-py3-none-any.whl
```
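The version string in the wheel name changes with every build, so you may prefer not to type it by hand. A minimal sketch that picks the most recently built doccano wheel in a directory (by modification time); `latest_doccano_wheel` is a hypothetical helper, not part of the packaging script.

```bash
# latest_doccano_wheel [DIST_DIR]: print the path of the newest
# doccano-*-py3-none-any.whl in DIST_DIR (default: backend/dist), or fail
# if there is none. Hypothetical helper, not part of doccano.
latest_doccano_wheel() {
  local dist_dir="${1:-backend/dist}" newest
  # ls -t sorts by modification time, newest first; if the glob matches
  # nothing, ls reports an error (hidden here) and prints no path.
  newest="$(ls -t "$dist_dir"/doccano-*-py3-none-any.whl 2>/dev/null | head -n 1)"
  if [ -z "$newest" ]; then
    echo "no doccano wheel found in $dist_dir" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}
```

For example, from the repository root: `pip install "$(latest_doccano_wheel backend/dist)"`.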
## Install to cloud
doccano also supports one-click deployment to cloud providers. Click a button below, configure the environment, and open the UI.
| Service | Button |
|---------|---|
| AWS | [![AWS CloudFormation Launch Stack SVG Button](https://cdn.rawgit.com/buildkite/cloudformation-launch-stack-button-svg/master/launch-stack.svg)](https://console.aws.amazon.com/cloudformation/home?#/stacks/new?stackName=doccano&templateURL=https://doccano.s3.amazonaws.com/public/cloudformation/template.aws.yaml) |
| Heroku | [![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://dashboard.heroku.com/new?template=https%3A%2F%2Fgithub.com%2Fdoccano%2Fdoccano) |
## Upgrade doccano
Caution: If you use SQLite3 as the database, upgrading the package can wipe your database, so back it up before upgrading.
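A minimal sketch for copying an SQLite database file to a timestamped backup before upgrading. It assumes you know where your `db.sqlite3` file lives (the location depends on your installation and is not covered here) and that doccano is stopped, so the file is not written during the copy. `backup_sqlite` is a hypothetical helper, not part of doccano.

```bash
# backup_sqlite DB_FILE: copy DB_FILE to DB_FILE.YYYYmmddHHMMSS.bak next to it
# and print the backup path. Hypothetical helper, not part of doccano.
# Stop doccano first so the database file is not modified during the copy.
backup_sqlite() {
  local db="$1" backup
  if [ ! -f "$db" ]; then
    echo "database file not found: $db" >&2
    return 1
  fi
  backup="${db}.$(date +%Y%m%d%H%M%S).bak"
  # -p keeps the original timestamps and permissions on the copy.
  cp -p "$db" "$backup"
  printf '%s\n' "$backup"
}
```

For example, `backup_sqlite /path/to/db.sqlite3`, where the path is a placeholder for your actual database file.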
To upgrade to the latest version of doccano, reinstall or upgrade using pip.
```bash
pip install -U doccano
```
If you need to update the database schema, run the following:
```bash
doccano migrate
```

6
docs/mkdocs.yml

@@ -31,15 +31,13 @@ plugins:
# Page tree
nav:
- Doccano: index.md
- Getting started: getting-started.md
- Get started: index.md
- Install and upgrade doccano: install-and-upgrade-doccano.md
- Tutorial: tutorial.md
- Project Structure: project_structure.md
- Advanced:
    - AWS HTTPS settings: advanced/aws_https_settings.md
    - OAuth2 settings: advanced/oauth2_settings.md
#- Release notes: release-notes.md
#- Author's notes: authors-notes.md
- FAQ: faq.md
- Code of Conduct: CODE_OF_CONDUCT.md
- Roadmap: roadmap.md