Install doccano

Install doccano locally or in the cloud. Choose the installation method that works best for your environment.

System requirements

You can install doccano on a Linux, Windows, or macOS machine running Python 3.8+.
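
Before installing, it can help to confirm that the interpreter you plan to use meets the 3.8+ requirement. The following is a minimal sketch; the helper name meets_requirement is hypothetical and is not part of doccano.

```python
import sys

# Minimum version stated in the system requirements above.
MINIMUM_PYTHON = (3, 8)


def meets_requirement(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_requirement():
        print(f"Python {current} is new enough for doccano.")
    else:
        print(f"Python {current} is too old; doccano needs 3.8 or later.")
```

Run the script with the same interpreter that pip will use, since several Python versions can coexist on one machine.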

Web browser support

doccano is tested with the latest version of Google Chrome and is expected to work in the latest versions of:

  • Google Chrome
  • Apple Safari

If using other web browsers, or older versions of supported web browsers, unexpected behavior could occur.

Port requirements

doccano uses port 8000 by default. To use a different port, pass the --port option when running doccano webserver.
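
To decide whether the default port is usable, you can probe it before starting the server. This is a rough sketch using only the standard library; port_is_free is a hypothetical helper, and it only detects listeners reachable on the given host (127.0.0.1 is an assumption).

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection.
        return probe.connect_ex((host, port)) != 0


if __name__ == "__main__":
    port = 8000  # doccano's default port
    state = "free" if port_is_free(port) else "already in use"
    print(f"Port {port} is {state}.")
```

If the port is already in use, choose another one and pass it with --port.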

Install with pip

To install doccano with pip, you need Python 3.8+. Run the following:

pip install doccano

After you install doccano, start the server with the following command:

# Initialize database. First time only.
doccano init
# Create a super user. First time only.
doccano createuser --username admin --password pass
# Start a web server.
doccano webserver --port 8000

In another terminal, run the following command:

# Start the task queue to handle file upload/download.
doccano task

Open http://localhost:8000/.

Use PostgreSQL as a database

By default, doccano uses SQLite 3 as its database. You can also use other database systems such as PostgreSQL or MySQL. This section shows how to use PostgreSQL.

First, you need to install psycopg2-binary as an additional dependency:

pip install psycopg2-binary

Next, set up PostgreSQL. You can install PostgreSQL directly, but this guide uses Docker. Run the docker run command with the user name (POSTGRES_USER), password (POSTGRES_PASSWORD), and database name (POSTGRES_DB). For other options, refer to the official documentation.

docker run -d \
  --name doccano-postgres \
  -e POSTGRES_USER=doccano_admin \
  -e POSTGRES_PASSWORD=doccano_pass \
  -e POSTGRES_DB=doccano \
  -v doccano-db:/var/lib/postgresql/data \
  -p 5432:5432 \
  postgres:13.8-alpine

Then, set the DATABASE_URL environment variable according to your PostgreSQL credentials. The URL format follows dj-database-url. Refer to its official documentation for details.

# export DATABASE_URL="postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DB}?sslmode=disable"
export DATABASE_URL="postgres://doccano_admin:doccano_pass@localhost:5432/doccano?sslmode=disable"
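
If your credentials contain characters such as "@" or "/", writing the URL by hand is error-prone. The sketch below assembles a URL in the same shape as the export line above and percent-encodes the user name and password. build_database_url is a hypothetical helper, and the sketch assumes that the URL parser decodes percent-encoded credentials; check the dj-database-url documentation for your version.

```python
from urllib.parse import quote


def build_database_url(user, password, host, port, database, sslmode="disable"):
    """Assemble a postgres:// URL in the shape used in this section."""
    # Percent-encode credentials so characters such as "@" or "/" in a
    # password do not break the URL structure.
    user_part = quote(user, safe="")
    password_part = quote(password, safe="")
    return (
        f"postgres://{user_part}:{password_part}@{host}:{port}/{database}"
        f"?sslmode={sslmode}"
    )


# Same values as the docker run example above.
print(build_database_url("doccano_admin", "doccano_pass", "localhost", 5432, "doccano"))
```

With the example credentials, the printed value matches the DATABASE_URL shown above, because none of those characters need encoding.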

That's it. Now you can start by running the doccano init command.

Use RabbitMQ as a message broker

doccano uses Celery and a message broker to handle long-running tasks such as importing and exporting datasets. By default, SQLite3 is used as the message broker. You can also use other message brokers such as RabbitMQ or Redis. This section shows how to use RabbitMQ.

First, set up RabbitMQ. You can install RabbitMQ directly, but this guide uses Docker. Run the docker run command with the user name (RABBITMQ_DEFAULT_USER) and password (RABBITMQ_DEFAULT_PASS). For other options, refer to the official documentation.

docker run -d \
  --hostname doccano \
  --name doccano-rabbit \
  -e RABBITMQ_DEFAULT_USER=doccano_rabit \
  -e RABBITMQ_DEFAULT_PASS=doccano_pass \
  -p 5672:5672 \
  rabbitmq:3.10.7-alpine

Then, set the CELERY_BROKER_URL environment variable according to your RabbitMQ credentials. For the URL format, refer to the official documentation.

# export CELERY_BROKER_URL="amqp://${RABBITMQ_DEFAULT_USER}:${RABBITMQ_DEFAULT_PASS}@localhost:5672//"
export CELERY_BROKER_URL='amqp://doccano_rabit:doccano_pass@localhost:5672//'

That's it. Now you can start the web server and the task queue by running the doccano webserver and doccano task commands. Note that both commands need the DATABASE_URL and CELERY_BROKER_URL environment variables if you changed them from the defaults.
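
Because the web server and the task queue usually run in separate terminals, it is easy to export the variables in one and forget the other. A small check like the following, run in each terminal before starting the command, can catch that. missing_variables is a hypothetical helper, not part of doccano.

```python
import os

# The two variables this guide changes from their defaults.
CUSTOM_SETTINGS = ("DATABASE_URL", "CELERY_BROKER_URL")


def missing_variables(environ=None, names=CUSTOM_SETTINGS):
    """Return the names from `names` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Not set in this terminal: " + ", ".join(missing))
    else:
        print("DATABASE_URL and CELERY_BROKER_URL are both set.")
```

A variable reported as missing is not an error in itself: doccano falls back to its default for that setting, which matters only if the other process uses a custom value.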

Install with Docker

doccano is also available as a Docker container. Make sure you have Docker installed on your machine.

To pull the image and create a container that serves doccano at http://localhost:8000, run the following commands:

docker pull doccano/doccano
docker container create --name doccano \
  -e "ADMIN_USERNAME=admin" \
  -e "ADMIN_EMAIL=admin@example.com" \
  -e "ADMIN_PASSWORD=password" \
  -v doccano-db:/data \
  -p 8000:8000 doccano/doccano

Next, start doccano by running the container:

docker container start doccano

To stop the container, run docker container stop doccano -t 5. All data created in the container persist across restarts.
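
After docker container start returns, the web server inside the container may need some time before it answers requests. The sketch below polls a URL until any HTTP response arrives. wait_until_up is a hypothetical helper, and the 60-second timeout is an arbitrary assumption.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url, timeout=60.0, interval=1.0):
    """Poll `url` until it returns any HTTP response or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server replied, even if with an error status.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet; wait and try again.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    if wait_until_up("http://localhost:8000/"):
        print("doccano is answering on port 8000.")
    else:
        print("No response yet; check the container logs.")
```

The same check works for the pip installation, since both serve on port 8000 by default.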

Build a local image with Docker

If you want to build a local image, run:

docker build -t doccano:latest . -f docker/Dockerfile

Install with Docker Compose

Install Git, then clone the repository:

git clone https://github.com/doccano/doccano.git
cd doccano

To install and start doccano at http://localhost, run the following command:

docker-compose -f docker/docker-compose.prod.yml --env-file .env up

You can override the default settings by editing the .env file. See ./docker/.env.example for details.

Install from source

If you want to develop doccano, consider downloading the source code using Git and running doccano locally. First of all, clone the repository:

git clone https://github.com/doccano/doccano.git
cd doccano

Backend

The doccano backend is built with Python 3.8+ and uses Poetry as a dependency manager. If you haven't installed them yet, see the Python and Poetry documentation.

First, install the project's dependencies with the poetry install command. After that, activate the virtual environment with the poetry shell command:

cd backend
poetry install
poetry shell

Second, set up the database and run the development server. doccano uses Django and Django REST Framework for the backend. You can set them up with the following Django commands:

python manage.py migrate
python manage.py create_roles
python manage.py create_admin --noinput --username "admin" --email "admin@example.com" --password "password"
python manage.py runserver

In another terminal, run Celery to use the dataset import/export feature:

cd doccano/backend
celery --app=config worker --loglevel=INFO --concurrency=1

After you change the code, don't forget to run mypy, flake8, black, and isort. These ensure code consistency. To run them, just run the following commands:

poetry run task mypy
poetry run task flake8
poetry run task black
poetry run task isort

Similarly, you can run the tests by executing the following command:

poetry run task test

Did you pass the test? Great!

Frontend

The doccano frontend is built with Node.js and uses Yarn as a package manager. If you haven't installed them yet, see the Node.js and Yarn documentation.

First, to install the defined dependencies for our project, just run the install command.

cd frontend
yarn install

Then run the dev command to serve with hot reload at localhost:3000:

yarn dev

After you change the code, don't forget to run the following commands to ensure code consistency:

yarn lintfix
yarn precommit
yarn fix:prettier

How to create a Python package

During development, you may want to create a Python package and verify it works correctly. In such a case, you can create a package by running the following command in the root directory of your project:

./tools/create-package.sh

This command builds the frontend, copies the files, and packages them. This takes a few minutes. When the command finishes, you will find the sdist and wheel in backend/dist:

Building doccano (1.5.5.post335.dev0+6be6d198)
  - Building sdist
  - Built doccano-1.5.5.post335.dev0+6be6d198.tar.gz
  - Building wheel
  - Built doccano-1.5.5.post335.dev0+6be6d198-py3-none-any.whl

Then, you can install the package with the pip install command:

pip install doccano-1.5.5.post335.dev0+6be6d198-py3-none-any.whl
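
The version string in the file name changes with every build, so typing it by hand gets tedious. As a sketch, the following helper picks the most recently modified wheel from backend/dist. newest_wheel is a hypothetical helper, and the doccano-*.whl pattern is an assumption based on the file name shown above.

```python
from pathlib import Path


def newest_wheel(dist_dir="backend/dist"):
    """Return the most recently modified doccano wheel in dist_dir, or None."""
    wheels = sorted(
        Path(dist_dir).glob("doccano-*.whl"),
        key=lambda path: path.stat().st_mtime,
    )
    return wheels[-1] if wheels else None


if __name__ == "__main__":
    wheel = newest_wheel()
    if wheel is None:
        print("No wheel found; run ./tools/create-package.sh first.")
    else:
        print(f"pip install {wheel}")
```

The script only prints the pip command, so you can review the chosen file before installing it.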

Install to cloud

doccano also supports one-click deployment to cloud providers. Click the following button, configure the environment, and access the UI.

Service   Button
AWS       AWS CloudFormation Launch Stack
Heroku    Deploy

Upgrade doccano

Caution: If you use SQLite3 as the database, you will lose your database when you upgrade the package.

The migrate command has been supported since v1.6.0.

After v1.6.0

To upgrade to the latest version of doccano, reinstall or upgrade using pip.

pip install -U doccano

If you need to update the database schema, run the following:

doccano migrate

Before v1.6.0

First, you need to copy the database file and media directory in the case of SQLite3:

mkdir -p ~/doccano
# Replace your path.
cp venv/lib/python3.8/site-packages/backend/db.sqlite3 ~/doccano/
cp -r venv/lib/python3.8/site-packages/backend/media ~/doccano/
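
The site-packages path above differs between environments and Python versions. As a sketch, the following script performs the same two copies and shows one way to locate the backend directory of the active environment. Both helper names are hypothetical, and the assumption that the data lives under site-packages/backend comes from the paths shown above.

```python
import shutil
import sysconfig
from pathlib import Path


def installed_backend_dir():
    """Guess the backend directory inside the active environment."""
    # Assumption: doccano's data sits in site-packages/backend.
    return Path(sysconfig.get_paths()["purelib"]) / "backend"


def backup_sqlite_data(backend_dir, dest_dir):
    """Copy db.sqlite3 and media/ from backend_dir into dest_dir."""
    backend_dir, dest_dir = Path(backend_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    copied = []
    database = backend_dir / "db.sqlite3"
    if database.is_file():
        shutil.copy2(database, dest_dir / database.name)
        copied.append(database.name)
    media = backend_dir / "media"
    if media.is_dir():
        # dirs_exist_ok requires Python 3.8+, matching doccano's requirement.
        shutil.copytree(media, dest_dir / media.name, dirs_exist_ok=True)
        copied.append(media.name)
    return copied
```

Calling backup_sqlite_data(installed_backend_dir(), Path.home() / "doccano") from the environment where doccano is installed mirrors the cp commands above. An empty return value means nothing was found at the guessed location, so verify the path before upgrading.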

Then, upgrade the package:

pip install -U doccano

Finally, run the migration:

doccano migrate