@ -13,24 +13,23 @@ doccano is an open source text annotation tool for humans. It provides annotatio
You can try the [annotation demo](http://doccano.herokuapp.com).


## Features
- Collaborative annotation
- Multi-language support
- Mobile support
- Emoji :smile: support
- Dark theme
- RESTful API
## Usage
Two options to run doccano:
- (Recommended) Docker Compose
- Docker
### Docker Compose
@ -42,13 +41,14 @@ $ docker-compose -f docker-compose.prod.yml up
Go to <http://0.0.0.0/>.
Note the superuser account credentials located in the `docker-compose.prod.yml` file:
```yml
ADMIN_USERNAME: "admin"
ADMIN_PASSWORD: "password"
```
> Note: If you want to add annotators, see [Frequently Asked Questions](https://github.com/doccano/doccano/wiki/Frequently-Asked-Questions#i-want-to-add-annotators)
> Note: If you want to add annotators, see [Frequently Asked Questions](./docs/faq.md)
_Note for Windows developers: Be sure to configure git to correctly handle line endings or you may encounter `status code 127` errors while running the services in future steps. Running with the git config options below will ensure your git directory correctly handles line endings._
@ -112,7 +112,7 @@ Here are some tips might be helpful. [How to Contribute to Doccano Project](http
## Citation
```
```tex
@misc{doccano,
title={{doccano}: Text Annotation Tool for Human},
Click the `Create Load Balancer` button and select `Application Load Balancer`.
Enter a name, change the protocol to HTTPS, and remember to add at least two availability zones. Make sure the zone in which the EC2 instance was created is included.
# How to use OAuth
This document explains how to set up OAuth for doccano. doccano supports social login via GitHub and Active Directory as of [#75](https://github.com/doccano/doccano/pull/75). This document uses GitHub OAuth as an example.
## Create OAuth App
@ -15,7 +16,7 @@ This document aims to instruct how to setup OAuth for doccano. doccano now suppo
## Set environment variables
Once the application is registered, your app's `Client ID` and `Client Secret` will be displayed on the following page:
The following list is ordered from easiest to hardest. If you are not familiar with Python development, consider one of the easier setups.
1. [One click deployment to Cloud Service.](https://github.com/doccano/doccano#deployment)
* All you have to do is create an account. [Heroku](https://www.heroku.com/home) in particular does not require a credit card on the free plan.
* [Deploy to Azure](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Fdoccano%2Fdoccano%2Fmaster%2Fazuredeploy.json)
* > Notice: (1) An EC2 KeyPair cannot be created automatically, so make sure you have an existing EC2 KeyPair in one region, or [create one yourself](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair). (2) If you want to access doccano via HTTPS in AWS, see these [instructions](https://github.com/doccano/doccano/wiki/HTTPS-setting-for-doccano-in-AWS).
2. [Use Docker](https://docs.docker.com/install/)
* With Docker, you don't have to worry about OS or Python version issues, because the application's environment is packaged as a container.
* Get doccano's image: `docker pull doccano/doccano`
* Create & Run doccano container: `docker run -d --name doccano -p 8000:80 doccano/doccano`
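
After the container starts, the server may take a moment to become reachable. The following is a minimal Python sketch for waiting on it, assuming the `-p 8000:80` mapping from the command above, so that doccano answers on `http://localhost:8000/`. The helper names `is_up` and `wait_for_server` are hypothetical and not part of doccano.

```python
import time
import urllib.error
import urllib.request


def is_up(url):
    """Return True if an HTTP request to `url` receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return False


def wait_for_server(url, attempts=30, delay=2.0, probe=is_up, sleep=time.sleep):
    """Poll `url` until `probe` succeeds.

    Returns the number of the successful attempt, or None if every
    attempt failed. `probe` and `sleep` are injectable for testing.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        sleep(delay)
    return None


# Example usage (assumes the port mapping 8000:80 shown above):
# wait_for_server("http://localhost:8000/")
```

A return value of `None` suggests the container did not start correctly; `docker logs doccano` is a reasonable next place to look.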
doccano is an open source text annotation tool built for human beings. It provides annotation features for text classification, sequence labeling and sequence to sequence. So, you can create labeled data for sentiment analysis, named entity recognition, text summarization and so on. Just create a project, upload your data and start annotating. You can build a dataset in hours.
## Demo
You can enjoy this [annotation demo](http://doccano.herokuapp.com).
@ -13,19 +12,19 @@ You can enjoy this [annotation demo](http://doccano.herokuapp.com).
The first demo is a sequence labeling task, named-entity recognition. You just select text spans and annotate them. Since doccano supports shortcut keys, you can quickly annotate text spans.
The final demo is a sequence to sequence task, machine translation. Since there may be more than one valid response in sequence to sequence tasks, you can create multiple responses.
@ -38,7 +38,6 @@ This is a list of features on the short term roadmap and beyond:
* Implement RBAC and enable project administrators to assign roles to users.
* Enhance annotation statistics.
### Annotation
* Increase the number of annotation tasks such as relation extraction, entity linking, aspect-based sentiment analysis, visual question answering and so on.
@ -48,7 +47,6 @@ This is a list of features on the short term roadmap and beyond:
* Control sort order on the frontend.
* More documentation and tutorials.
### Upload and download
* Enable importing data from cloud storage such as S3.
@ -63,8 +61,7 @@ This is a list of features on the short term roadmap and beyond:
* Enable customization of font and font-family.
* Enable customization of label color per user.
* Enable customization of site theme per user.
### Entire project
* Design Vue components and use them to implement the frontend.
@ -82,4 +79,3 @@ This is a list of features on the short term roadmap and beyond:
* Gather and highlight novel doccano use cases.
Track the progress of these features in the GitHub project tracker.
This brief tutorial walks through a named-entity recognition (NER) annotation task on science fiction works. Below is a JSON file named `books.json` containing many descriptions of science fiction works in different languages. We need to annotate entities such as person names, book titles and dates.
`books.json`
```JSON
```json
{"text": "The Hitchhiker's Guide to the Galaxy (sometimes referred to as HG2G, HHGTTGor H2G2) is a comedy science fiction series created by Douglas Adams. Originally a radio comedy broadcast on BBC Radio 4 in 1978, it was later adapted to other formats, including stage shows, novels, comic books, a 1981 TV series, a 1984 video game, and 2005 feature film."}
@ -16,18 +14,17 @@ Below is a JSON file containing lots of science fictions description with differ
We need to create a new project for this task. Log in with the superuser account.


To create your project, make sure you’re on the project list page and click the `Create` button. For this tutorial, we name the project `sequence labeling for books`, write a short description, and choose the sequence labeling task type.


## Import Data
After creating a project, we will see the `Dataset` page. Click the `Import dataset` button in the `Actions` menu. We should see the following screen:


Choose `JSON` and click the `Select a file` button. Select `books.json` and it will be loaded automatically.
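
Before importing, it can help to confirm that the file has the shape of the `books.json` sample: one JSON object per line, each with a string `text` field. The following is a minimal Python sketch under that assumption. doccano's exact import requirements may differ, and `check_jsonl` is a hypothetical helper name, not part of doccano.

```python
import json


def check_jsonl(lines, required_key="text"):
    """Return (line_number, problem) pairs for lines that do not match
    the one-object-per-line shape of the books.json sample."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        # Each record is assumed to be an object with a string `text` field.
        if not isinstance(record, dict) or not isinstance(record.get(required_key), str):
            problems.append((number, f"missing string field '{required_key}'"))
    return problems


# Illustrative in-memory sample; the second line lacks a `text` field.
sample = [
    '{"text": "Dune is a 1965 novel by Frank Herbert."}',
    '{"title": "no text field"}',
]
for number, problem in check_jsonl(sample):
    print(f"line {number}: {problem}")
```

To check a real file, pass an open file handle instead of the sample list, for example `check_jsonl(open("books.json", encoding="utf-8"))`.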
@ -35,7 +32,7 @@ We choose `JSON` and click `Select a file` button. Select `books.json` and it wo
Click the `Labels` button in the left bar to define our own labels. We should see the label editor page. On the label editor page, you can create labels by specifying label text, shortcut key, background color and text color.
For this tutorial, we created some entity labels related to science fiction.
@ -43,18 +40,19 @@ As for the tutorial, we created some entities related to science fictions.
Next, we are ready to annotate the texts. Click the `Start annotation` button in the navigation bar to start annotating the documents.


## Export Data
After the annotation step, we can download the annotated data. Go to the `Dataset` page and click the `Export dataset` button in the `Actions` menu. After selecting an export format, click `Export`. You should see the screen below:


Here we choose the JSONL format and download the data by clicking the button. Below is the annotated result for our tutorial project.
`sequence_labeling_for_books.json`
```JSON
```json
{"doc_id": 33,
"text": "The Hitchhiker's Guide to the Galaxy (sometimes referred to as HG2G, HHGTTGor H2G2) is a comedy science fiction series created by Douglas Adams. Originally a radio comedy broadcast on BBC Radio 4 in 1978, it was later adapted to other formats, including stage shows, novels, comic books, a 1981 TV series, a 1984 video game, and 2005 feature film.",