# AnnotateChange
Welcome to the repository of the "AnnotateChange" application. This
application was created to collect annotations of time series data in order to
construct the [Turing Change Point
Dataset](https://github.com/alan-turing-institute/TCPD) (TCPD). The TCPD is a
dataset of real-world time series used to evaluate change point detection
algorithms. For the change point detection benchmark that was created using
this dataset, see the [Turing Change Point Detection
Benchmark](https://github.com/alan-turing-institute/TCPDBench) repository.
Any work that uses this repository should cite our paper: [**Van den Burg &
Williams - An Evaluation of Change Point Detection Algorithms
(2020)**](https://arxiv.org/abs/2003.06222). You can use the following BibTeX
entry:
```bib
@article{vandenburg2020evaluation,
title={An Evaluation of Change Point Detection Algorithms},
author={{Van den Burg}, G. J. J. and Williams, C. K. I.},
journal={arXiv preprint arXiv:2003.06222},
year={2020}
}
```
Here's a screenshot of what the application looks like during the annotation
process:
<p align="center">
<img height="500px" src="./annotatechange_wide.png" alt="screenshot of
AnnotateChange" />
</p>
Some of the features of AnnotateChange include:
* Admin panel to add/remove datasets, add/remove annotation tasks, add/remove
users, and inspect incoming annotations.
* Basic user management: authentication, email confirmation, forgotten
password, automatic log out after inactivity, etc. Users are only allowed to
register using an email address from an approved domain.
* Time series are assigned to users on the fly, ensuring that no user ever
annotates the same dataset twice and prioritising datasets that are close to
the desired number of annotations.
* Interactive graph of a time series that supports pan and zoom, including
multidimensional time series.
* Mandatory "demo" to onboard the user to change point annotation.
* Backup of annotations to the admin via email.
* Time series datasets are verified upon upload according to a strict schema.
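
To illustrate the on-the-fly task assignment described in the list above, here
is a minimal sketch in Python. It is not the code from this repository: the
function `pick_next_dataset`, its parameters, and the example dataset names are
all hypothetical, and the sketch assumes that "close to a desired number of
annotations" means "fewest annotations still missing".

```python
from typing import Dict, Optional, Set


def pick_next_dataset(
    annotation_counts: Dict[str, int],
    already_annotated: Set[str],
    desired: int,
) -> Optional[str]:
    """Return the dataset a user should annotate next, or None if none is left.

    Hypothetical helper: the names and the exact priority rule are assumptions.
    """
    candidates = [
        name
        for name, count in annotation_counts.items()
        # Skip datasets this user has already annotated and datasets that
        # have already reached the desired number of annotations.
        if name not in already_annotated and count < desired
    ]
    if not candidates:
        return None
    # Prefer the dataset with the fewest annotations still missing; ties are
    # broken by name so that the result is deterministic.
    return min(candidates, key=lambda name: (desired - annotation_counts[name], name))


# Example with made-up dataset names and counts.
counts = {"series_a": 4, "series_b": 1, "series_c": 5, "series_d": 3}
next_dataset = pick_next_dataset(counts, already_annotated={"series_a"}, desired=5)
print(next_dataset)
```

In this example `series_a` is skipped because the user has already annotated
it, `series_c` is skipped because it is complete, and `series_d` is chosen over
`series_b` because it needs fewer additional annotations.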
## Notes
This codebase is provided "as is". If you find any problems, please raise an
issue [on GitHub](https://github.com/alan-turing-institute/annotatechange).
The code is licensed under the [MIT License](./LICENSE).
This code was written by [Gertjan van den Burg](https://gertjan.dev) with
helpful comments provided by [Chris
Williams](https://homepages.inf.ed.ac.uk/ckiw/).
## Some implementation details
Below are some thoughts that may help make sense of the codebase.
* AnnotateChange is a web application built on the Flask framework. See [this
excellent
tutorial](https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world)
for an introduction to Flask. The [flask.sh](./flask.sh) shell script loads
the appropriate environment variables and runs the application in a virtual
environment managed by Poetry.
* The application handles user management and is centered around the idea of a
"task" which links a particular user to a particular time series to
annotate.
* An admin role is available, and the admin user can manually assign and
delete tasks as well as add/delete users, datasets, etc. The admin user is
created using the [cli](./app/cli.py).
* All datasets must adhere to a specific dataset schema (see
[utils/dataset_schema.json](app/utils/dataset_schema.json)).
* Annotations are stored in the database using 0-based indexing. Tasks are
assigned on the fly when a user requests a time series to annotate (see
[utils/tasks.py](app/utils/tasks.py)).
* Users can only begin annotating when they have successfully passed the
introduction.
* The app is configured through environment variables; see the
[.env.example](.env.example) file for an example.
* [Poetry](https://python-poetry.org/) is used for dependency management.
* Docker is used for deployment (see the deployment documentation in
[docs](docs)), and [Traefik](https://containo.us/traefik/) is used for SSL,
etc.
* The time series graph is plotted using [d3.js](https://d3js.org/).
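
As a rough illustration of configuration through environment variables, the
following sketch shows one common pattern in Flask applications. It is not
taken from this codebase: the `Config` class and the variable names
`SECRET_KEY`, `DATABASE_URI`, and `APPROVED_DOMAINS` are assumptions made for
the example, so consult [.env.example](.env.example) for the variables the
application actually uses.

```python
import os
from typing import Mapping, Optional


class Config:
    """Hypothetical settings object populated from environment variables."""

    def __init__(self, env: Optional[Mapping[str, str]] = None) -> None:
        # Fall back to the process environment when no mapping is supplied;
        # passing a plain dict makes the class easy to test.
        env = os.environ if env is None else env
        # The variable names and defaults below are illustrative assumptions.
        self.secret_key = env.get("SECRET_KEY", "change-me")
        self.database_uri = env.get("DATABASE_URI", "sqlite:///app.db")
        # A comma-separated list of email domains, normalised to lowercase.
        raw_domains = env.get("APPROVED_DOMAINS", "")
        self.approved_domains = [
            domain.strip().lower()
            for domain in raw_domains.split(",")
            if domain.strip()
        ]


config = Config({"APPROVED_DOMAINS": "example.org, Example.com ,"})
print(config.approved_domains)
```

Reading the values through a mapping argument rather than calling `os.environ`
directly keeps the sketch testable without modifying the real environment.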