# Setup Auto Labeling

In this tutorial, you will learn how to set up and use the auto-labeling feature. Auto-labeling automates labeling by calling a Web API. It is not required to use doccano, but once it is set up, you can label data more efficiently.

The tutorial is divided into several sections:

- Select a Template gives you a starting point for the tutorial.
- Set Request Parameters explains how to set the parameters required to send a request.
- Specify Response Mapping explains how to extract the label information from the response.
- Specify Label Mapping explains how to map the extracted labels to the internal labels of doccano.
- Enable the Feature shows how to enable auto-labeling.

This tutorial uses Amazon Comprehend Sentiment Analysis as an example. We therefore assume that you have a text classification project in doccano and an AWS account, and that you can generate access keys.

## Use pre-defined service

### Select a Template

First, move to the "settings" page and open the "Auto Labeling" tab. The tab should display a "Create" button and an empty table. Click the button and select "Amazon Comprehend Sentiment Analysis" from the dropdown menu:

![](../images/auto-labeling/select_template.png)

### Set Request Parameters

Next, set the parameters needed to send an API request. Amazon Comprehend Sentiment Analysis requires the following parameters:

- aws_access_key
- aws_secret_access_key
- region_name
- language_code

In the following example, we set `us-west-2` as the `region_name` and `en` as the `language_code`:

![](../images/auto-labeling/set_parameters.png)

Then, test the parameters with a sample text to confirm that they are set correctly. Here, we enter "I like you" as the sample text and receive a response from Amazon Comprehend Sentiment Analysis. The `Sentiment` field in the response has the value `POSITIVE`:

![](../images/auto-labeling/test_parameters.png)
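
For reference, the request behind this test can be approximated outside doccano with `boto3`. This is only a sketch: how doccano passes the four parameters to AWS internally is an assumption here, and the helper names `make_client` and `detect_sentiment` are hypothetical. Only `boto3.client("comprehend", ...)` and its `detect_sentiment` call are the real AWS SDK API.

```python
def make_client(aws_access_key, aws_secret_access_key, region_name="us-west-2"):
    """Create a Comprehend client (hypothetical helper, not part of doccano)."""
    import boto3  # imported lazily so the rest of the sketch runs without boto3

    return boto3.client(
        "comprehend",
        region_name=region_name,
        aws_access_key_id=aws_access_key,
        aws_secret_access_key=aws_secret_access_key,
    )


def detect_sentiment(client, text, language_code="en"):
    """Send one text to Comprehend and return the value of its Sentiment field."""
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    return response["Sentiment"]


# Usage (requires valid AWS credentials, so it is left commented out):
# client = make_client("YOUR_ACCESS_KEY", "YOUR_SECRET_KEY")
# print(detect_sentiment(client, "I like you"))
```

If the call fails here with the same keys, the problem is likely in the credentials or region rather than in the doccano configuration.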

### Specify Response Mapping

Now that you can fetch the API response, you need to convert it to the doccano format (shown below) with a mapping template written in [Jinja2](https://jinja.palletsprojects.com/en/2.11.x/) format.

```plain
Text Classification
[{ "label": "Cat" }, ...]

Sequence Labeling
[{ "label": "Cat", "start_offset": 0, "end_offset": 5 }, ...]

Sequence to sequence
[{ "text": "Cat" }, ...]
```

In the case of Amazon Comprehend Sentiment Analysis, we want the `Sentiment` value from the response. Because the entire response is available through the `input` variable, the mapping template looks like this:

```json
[
    {
        "label": "{{ input.Sentiment }}"
    }
]
```

After setting the template, test it with the sample response. This is the same response we fetched in the `Set Request Parameters` section.

![](../images/auto-labeling/test_mapping_template.png)
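
To see what the mapping template does, you can render it yourself with the `jinja2` package. The sketch below assumes doccano renders the template with the API response bound to `input` and then parses the result as JSON; the helper name `render_mapping` is hypothetical, and the sample response is cut down to the one field the template reads.

```python
import json

from jinja2 import Template

# The same mapping template as in the tutorial.
MAPPING_TEMPLATE = """
[
    {
        "label": "{{ input.Sentiment }}"
    }
]
"""


def render_mapping(template_source, response):
    """Render the template with the response as `input`, then parse the JSON."""
    rendered = Template(template_source).render(input=response)
    return json.loads(rendered)


# A real Comprehend response contains more fields; only Sentiment is used here.
sample_response = {"Sentiment": "POSITIVE"}
labels = render_mapping(MAPPING_TEMPLATE, sample_response)
print(labels)
```

Jinja2 resolves `input.Sentiment` against a dictionary key, which is why the dotted form works on a JSON response.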

### Specify Label Mapping

Once you have specified the mapping template, you need to convert each label in the response into one of the labels you defined on the label page.

Click the `Add` button and fill in the `From` and `To` fields. `From` is the label string in the response; in this case, `POSITIVE`. `To` is the label in this project; in this case, `positive`:

![](../images/auto-labeling/add_label_mapping.png)

After adding the label mapping, test it with the sample response:

![](../images/auto-labeling/test_label_mapping.png)
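
Conceptually, the label mapping is a lookup from `From` strings to `To` strings. The sketch below illustrates the idea with a plain dictionary. The function name `apply_label_mapping` is hypothetical, and passing unmapped labels through unchanged is an assumption of this sketch, not documented doccano behavior.

```python
def apply_label_mapping(labels, mapping):
    """Replace each item's "label" using a From -> To dictionary."""
    mapped = []
    for item in labels:
        new_item = dict(item)  # copy so the input list is not modified
        if "label" in new_item:
            # Assumption: labels without a mapping entry are kept as they are.
            new_item["label"] = mapping.get(new_item["label"], new_item["label"])
        mapped.append(new_item)
    return mapped


label_mapping = {"POSITIVE": "positive"}  # From: POSITIVE, To: positive
print(apply_label_mapping([{"label": "POSITIVE"}], label_mapping))
```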

### Enable the Feature

Finally, move to the "annotation" page and click the "Auto Labeling" button. It should display a "Slide" button that switches the auto-labeling feature on or off. Try enabling it:

![](../images/auto-labeling/enable.png)

Each time you view a new document, it will be labeled automatically.

## Use your own API

First, select "Custom REST Request":

![](../images/auto-labeling/custom_rest_request_template.png)

Next, you need to build your own API. Any framework can be used. Here we use [Flask](https://flask.palletsprojects.com/en/2.2.x/) to create a minimal application. This application always returns the same label (`{"label": "NEG"}`). It also calls the `get_json` method and prints the return value so that we can confirm the data was received.

```python
from flask import Flask, request

app = Flask(__name__)


@app.route("/", methods=["POST"])
def predict():
    # Print the received JSON body to confirm that the data arrived.
    print(request.get_json())
    # Always return the same label, regardless of the input.
    return {"label": "NEG"}
```

Save it as `hello.py` or something similar. Do not name your application `flask.py`, because this would conflict with Flask itself.

To run the application, use the `flask` command. You need to tell Flask where your application is with the `--app` option.

```bash
$ flask --app hello run
 * Serving Flask app 'hello'
 * Debug mode: off
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
```
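
Before connecting doccano, you may want to check the endpoint on its own. The following sketch uses Flask's built-in test client, so it does not need the server to be running. It repeats the application from above to stay self-contained; the helper name `check_app` is hypothetical.

```python
from flask import Flask, request

app = Flask(__name__)


@app.route("/", methods=["POST"])
def predict():
    print(request.get_json())
    return {"label": "NEG"}


def check_app():
    """Send one POST and one GET to the app without starting a server."""
    client = app.test_client()
    post = client.post("/", json={"text": "This is a test sentence."})
    get = client.get("/")  # the route only allows POST
    return post.status_code, post.get_json(), get.status_code


print(check_app())
```

Because the route is registered with `methods=["POST"]` only, a GET request is rejected with status 405, while a POST returns the fixed label.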

OK. Let's return to doccano.

Next, you need to set the parameters (`url` and `method`). Enter the Flask application's URL and method:

![](../images/auto-labeling/custom_rest_request_parameters.png)

Next, select the add button next to Body. Enter `text` as the key and `{{ text }}` as the value. This value is a placeholder that is replaced by the text of your document when the request is sent.

![](../images/auto-labeling/custom_rest_request_body.png)

Then, test the parameters with the sample text. If everything is working correctly, the API should return `{"label": "NEG"}`.

![](../images/auto-labeling/custom_rest_request_test_parameters.png)

You should also see the following in the console:

```bash
127.0.0.1 - - [13/Sep/2022 15:19:57] "GET / HTTP/1.1" 405 -
{'text': 'This is a test sentence.'}
```
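
The request configured above can be approximated from Python using only the standard library. This is a sketch under assumptions: it presumes that doccano sends the body as JSON with `{{ text }}` already replaced by the document text, and the helper names `build_request` and `fetch_label` are hypothetical.

```python
import json
import urllib.request


def build_request(url, text):
    """Build a POST request whose JSON body has `text` filled in."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_label(url, text):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(url, text)) as response:
        return json.load(response)


# Usage (requires the Flask application to be running, so it is commented out):
# print(fetch_label("http://127.0.0.1:5000/", "This is a test sentence."))
```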

Next, convert the response from your API into a format that doccano can handle. The response is available through the `input` variable, so the mapping template looks like this:

```json
[
    {
        "label": "{{ input.label }}"
    }
]
```

Test the mapping template. If it is working correctly, the result should be as follows:

```json
[
    {
        "label": "NEG"
    }
]
```

The rest is the same as when using a predefined service.