| field | value | date |
|---|---|---|
| author | Gertjan van den Burg <gertjanvandenburg@gmail.com> | 2020-05-08 18:34:11 +0100 |
| committer | Gertjan van den Burg <gertjanvandenburg@gmail.com> | 2020-05-08 18:34:11 +0100 |
| commit | e11bc31b55df43c0ded49672ad96fde9752e4e9e | |
| tree | 50c2e590b8d7fb529b48b67756bab526fe257fa7 /README.md | |
| parent | Add update/credit flags to user table | |
Update code for public release
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 120 |
1 files changed, 96 insertions, 24 deletions
@@ -1,33 +1,105 @@
 # AnnotateChange
 
-## Implementation Notes
+Welcome to the repository of the "AnnotateChange" application. This
+application was created to collect annotations of time series data in order to
+construct the [Turing Change Point
+Dataset](https://github.com/alan-turing-institute/TCPD) (TCPD). The TCPD is a
+dataset of real-world time series used to evaluate change point detection
+algorithms. For the change point detection benchmark that was created using
+this dataset, see the [Turing Change Point Detection
+Benchmark](https://github.com/alan-turing-institute/TCPDBench) repository.
 
-* Missing values are skipped, so that gaps occur in the graph. X-values are
-  however counted continuously, to ensure that this gap has nonzero width.
-  Thus, when a change point is selected by an annotator after such a gap, its
-  location can be found by retrieving the observation at the index *while
-  including missing values*. This is in contrast to the approach where missing
-  values are removed before the index is used to retrieve the data point.
+Any work that uses this repository should cite our paper: [**Van den Burg &
+Williams - An Evaluation of Change Point Detection Algorithms
+(2020)**](https://arxiv.org/abs/2003.06222). You can use the following BibTeX
+entry:
 
-* Task assignment flow: tasks assignment is handled upon login, completion of
-  the demo, and completion of an annotation.
+```bib
+@article{vandenburg2020evaluation,
+  title={An Evaluation of Change Point Detection Algorithms},
+  author={{Van den Burg}, G. J. J. and Williams, C. K. I.},
+  journal={arXiv preprint arXiv:2003.06222},
+  year={2020}
+}
+```
 
-  General rules:
+Here's a screenshot of what the application looks like during the annotation
+process:
 
-  - [x] Users don't get a task assigned if they still have an unfinished task
-  - [x] Users don't get a task assigned if there are no more datasets to
-        annotate
-  - [x] Users don't get a task assigned if they have reached their maximum.
-  - [x] Users never get assigned the same dataset more than once
-  - [x] Tasks are assigned on the fly, at the moment that a user requests a
-        dataset to annotate.
+<p align="center">
+<img height="500px" src="./annotatechange_wide.png" alt="screenshot of
+AnnotateChange" />
+</p>
 
-  Handled at login:
+Some of the features of AnnotateChange include:
 
-  - [x] When a user logs in, a task that was previously assigned that is not
-        finished should be removed to avoid duplication, unless this task was
-        assigned by the admin.
+* Admin panel to add/remove datasets, add/remove annotation tasks, add/remove
+  users, and inspect incoming annotations.
 
-  It may be easier to remove assigning a *specific* dataset and instead give
-  the user the option to "annotate again". If they click that, they get
-  assigned a task on the fly. This will remove a lot of the difficulties.
+* Basic user management: authentication, email confirmation, forgotten
+  password, automatic log out after inactivity, etc. Users are only allowed to
+  register using an email address from an approved domain.
+
+* Task assignment of time series to user is done on the fly, ensuring no user
+  ever annotates the same dataset twice, and prioritising datasets that are
+  close to a desired number of annotations.
+
+* Interactive graph of a time series that supports pan and zoom, support for
+  multidimensional time series.
+
+* Mandatory "demo" to onboard the user to change point annotation.
+
+* Backup of annotations to the admin via email.
+
+* Time series datasets are verified upon upload acccording to a strict schema.
+
+## Notes
+
+This codebase is provided "as is". If you find any problems, please raise an
+issue [on GitHub](https://github.com/alan-turing-institute/annotatechange).
+
+The code is licensed under the [MIT License](./LICENSE).
+
+This code was written by [Gertjan van den Burg](https://gertjan.dev) with
+helpful comments provided by [Chris
+Williams](https://homepages.inf.ed.ac.uk/ckiw/).
+
+## Some implementation details
+
+Below are some thoughts that may help make sense of the codebase.
+
+* AnnotateChange is a web application build on the Flask framework. See [this
+  excellent
+  tutorial](https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world)
+  for an introduction to Flask. The [flask.sh](./flask.sh) shell script loads
+  the appropriate environment variables and runs the application in a virtual
+  environment managed by Poetry.
+
+* The application handles user management and is centered around the idea of a
+  "task" which links a particular user to a particular time series to
+  annotate.
+
+* An admin role is available, and the admin user can manually assign and
+  delete tasks as well as add/delete users, datasets, etc. The admin user is
+  created using the [cli](./app/cli.py).
+
+* All datasets must adhere to a specific dataset schema (see
+  [utils/dataset_schema.json](utils/dataset_schema.json)).
+
+* Annotations are stored in the database using 0-based indexing. Tasks are
+  assigned on the fly when a user requests a time series to annotate (see
+  [utils/tasks.py](utils/tasks.py)).
+
+* Users can only begin annotating when they have successfully passed the
+  introduction.
+
+* Configuration of the app is done through environment variables, see the
+  [.env.example](.env.example) file for an example.
+
+* [Poetry](https://python-poetry.org/) is used for dependency management.
+
+* Docker is used for deployment (see the deployment documentation in
+  [docs](docs)), and [Traefik](https://containo.us/traefik/) is used for SSL,
+  etc.
+
+* The time series graph is plotted using [d3.js](https://d3js.org/).
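
The task-assignment rules that this commit moves out of the README (no new task while one is unfinished, a per-user maximum, never the same dataset twice) and the new feature bullet about prioritising datasets close to the desired number of annotations can be illustrated with a minimal sketch. This is not the code in `utils/tasks.py`. The function name `pick_dataset` and all of its parameters are hypothetical. Two points are assumptions rather than anything the README states: a dataset counts as "no longer available" once it has reached the desired number of annotations, and "close to the desired number" is read as "needs the fewest additional annotations".

```python
from typing import Dict, Optional, Set


def pick_dataset(
    done_by_user: Set[str],
    has_unfinished_task: bool,
    user_task_count: int,
    max_tasks_per_user: int,
    annotation_counts: Dict[str, int],
    desired_annotations: int,
) -> Optional[str]:
    """Return the next dataset to assign to a user, or None if no task applies.

    Hypothetical helper: all names and the priority rule are illustrative.
    """
    # Rule: no new task while an earlier one is still unfinished.
    if has_unfinished_task:
        return None
    # Rule: no new task once the user has reached their maximum.
    if user_task_count >= max_tasks_per_user:
        return None
    # Rule: never the same dataset twice. Assumption: datasets that already
    # have the desired number of annotations are no longer offered.
    candidates = [
        name
        for name, count in annotation_counts.items()
        if name not in done_by_user and count < desired_annotations
    ]
    if not candidates:
        return None
    # Assumed priority: fewest annotations still needed first, ties by name.
    return min(
        candidates,
        key=lambda name: (desired_annotations - annotation_counts[name], name),
    )
```

Because the function is called at the moment a user requests a dataset, it mirrors the "on the fly" assignment described in the README: nothing is precomputed, and each call sees the current annotation counts.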
