aboutsummaryrefslogtreecommitdiff
path: root/README.md
blob: 7331ed55c1a4fef63c7210cd4b667e8a2a83e191 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
# AnnotateChange

Welcome to the repository of the "AnnotateChange" application. This 
application was created to collect annotations of time series data in order to 
construct the [Turing Change Point 
Dataset](https://github.com/alan-turing-institute/TCPD) (TCPD). The TCPD is a 
dataset of real-world time series used to evaluate change point detection 
algorithms. For the change point detection benchmark that was created using 
this dataset, see the [Turing Change Point Detection 
Benchmark](https://github.com/alan-turing-institute/TCPDBench) repository.

Any work that uses this repository should cite our paper: [**Van den Burg & 
Williams - An Evaluation of Change Point Detection Algorithms 
(2020)**](https://arxiv.org/abs/2003.06222). You can use the following BibTeX 
entry:

```bib
@article{vandenburg2020evaluation,
        title={An Evaluation of Change Point Detection Algorithms},
        author={{Van den Burg}, G. J. J. and Williams, C. K. I.},
        journal={arXiv preprint arXiv:2003.06222},
        year={2020}
}
```

Here's a screenshot of what the application looks like during the annotation 
process:

<p align="center">
<img height="500px" src="./annotatechange_wide.png" alt="screenshot of 
AnnotateChange" />
</p>

Some of the features of AnnotateChange include:

* Admin panel to add/remove datasets, add/remove annotation tasks, add/remove 
  users, and inspect incoming annotations.

* Basic user management: authentication, email confirmation, forgotten 
  password, automatic log out after inactivity, etc. Users are only allowed to 
  register using an email address from an approved domain.

* Task assignment of time series to user is done on the fly, ensuring no user 
  ever annotates the same dataset twice, and prioritising datasets that are 
  close to a desired number of annotations.

* Interactive graph of a time series that supports pan and zoom, support for 
  multidimensional time series.

* Mandatory "demo" to onboard the user to change point annotation.

* Backup of annotations to the admin via email.

* Time series datasets are verified upon upload acccording to a strict schema.

## Notes

This codebase is provided "as is". If you find any problems, please raise an 
issue [on GitHub](https://github.com/alan-turing-institute/annotatechange).

The code is licensed under the [MIT License](./LICENSE).

This code was written by [Gertjan van den Burg](https://gertjan.dev) with 
helpful comments provided by [Chris 
Williams](https://homepages.inf.ed.ac.uk/ckiw/).

## Some implementation details

Below are some thoughts that may help make sense of the codebase.

* AnnotateChange is a web application build on the Flask framework. See [this 
  excellent 
  tutorial](https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world) 
  for an introduction to Flask. The [flask.sh](./flask.sh) shell script loads 
  the appropriate environment variables and runs the application in a virtual 
  environment managed by Poetry.

* The application handles user management and is centered around the idea of a 
  "task" which links a particular user to a particular time series to 
  annotate.

* An admin role is available, and the admin user can manually assign and 
  delete tasks as well as add/delete users, datasets, etc. The admin user is 
  created using the [cli](./app/cli.py).

* All datasets must adhere to a specific dataset schema (see 
  [utils/dataset_schema.json](app/utils/dataset_schema.json)).

* Annotations are stored in the database using 0-based indexing. Tasks are 
  assigned on the fly when a user requests a time series to annotate (see 
  [utils/tasks.py](app/utils/tasks.py)).

* Users can only begin annotating when they have successfully passed the 
  introduction.

* Configuration of the app is done through environment variables, see the 
  [.env.example](.env.example) file for an example.

* [Poetry](https://python-poetry.org/) is used for dependency management.

* Docker is used for deployment (see the deployment documentation in 
  [docs](docs)), and [Traefik](https://containo.us/traefik/) is used for SSL, 
  etc.

* The time series graph is plotted using [d3.js](https://d3js.org/).