author Gertjan van den Burg <gertjanvandenburg@gmail.com> 2020-05-08 18:34:11 +0100
committer Gertjan van den Burg <gertjanvandenburg@gmail.com> 2020-05-08 18:34:11 +0100
commit e11bc31b55df43c0ded49672ad96fde9752e4e9e
tree 50c2e590b8d7fb529b48b67756bab526fe257fa7 /README.md
parent Add update/credit flags to user table
Update code for public release
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 120
1 files changed, 96 insertions, 24 deletions
diff --git a/README.md b/README.md
index ab2e138..ce922db 100644
--- a/README.md
+++ b/README.md
@@ -1,33 +1,105 @@
# AnnotateChange
-## Implementation Notes
+Welcome to the repository of the "AnnotateChange" application. This
+application was created to collect annotations of time series data in order to
+construct the [Turing Change Point
+Dataset](https://github.com/alan-turing-institute/TCPD) (TCPD). The TCPD is a
+dataset of real-world time series used to evaluate change point detection
+algorithms. For the change point detection benchmark that was created using
+this dataset, see the [Turing Change Point Detection
+Benchmark](https://github.com/alan-turing-institute/TCPDBench) repository.
-* Missing values are skipped, so that gaps occur in the graph. X-values are
- however counted continuously, to ensure that this gap has nonzero width.
- Thus, when a change point is selected by an annotator after such a gap, its
- location can be found by retrieving the observation at the index *while
- including missing values*. This is in contrast to the approach where missing
- values are removed before the index is used to retrieve the data point.
+Any work that uses this repository should cite our paper: [**Van den Burg &
+Williams - An Evaluation of Change Point Detection Algorithms
+(2020)**](https://arxiv.org/abs/2003.06222). You can use the following BibTeX
+entry:
-* Task assignment flow: task assignment is handled upon login, completion of
- the demo, and completion of an annotation.
+```bib
+@article{vandenburg2020evaluation,
+ title={An Evaluation of Change Point Detection Algorithms},
+ author={{Van den Burg}, G. J. J. and Williams, C. K. I.},
+ journal={arXiv preprint arXiv:2003.06222},
+ year={2020}
+}
+```
- General rules:
+Here's a screenshot of what the application looks like during the annotation
+process:
- - [x] Users don't get a task assigned if they still have an unfinished task
- - [x] Users don't get a task assigned if there are no more datasets to
- annotate
- - [x] Users don't get a task assigned if they have reached their maximum.
- - [x] Users never get assigned the same dataset more than once
- - [x] Tasks are assigned on the fly, at the moment that a user requests a
- dataset to annotate.
+<p align="center">
+<img height="500px" src="./annotatechange_wide.png" alt="screenshot of
+AnnotateChange" />
+</p>
- Handled at login:
+Some of the features of AnnotateChange include:
- - [x] When a user logs in, a task that was previously assigned that is not
- finished should be removed to avoid duplication, unless this task was
- assigned by the admin.
+* Admin panel to add/remove datasets, add/remove annotation tasks, add/remove
+ users, and inspect incoming annotations.
- It may be easier to remove assigning a *specific* dataset and instead give
- the user the option to "annotate again". If they click that, they get
- assigned a task on the fly. This will remove a lot of the difficulties.
+* Basic user management: authentication, email confirmation, forgotten
+ password, automatic log out after inactivity, etc. Users are only allowed to
+ register using an email address from an approved domain.
+
+* Task assignment of time series to users is done on the fly, ensuring no user
+ ever annotates the same dataset twice, and prioritising datasets that are
+ close to a desired number of annotations.
+
+* Interactive graph of a time series that supports pan and zoom, as well as
+ multidimensional time series.
+
+* Mandatory "demo" to onboard the user to change point annotation.
+
+* Backup of annotations to the admin via email.
+
+* Time series datasets are verified upon upload according to a strict schema.
+
+## Notes
+
+This codebase is provided "as is". If you find any problems, please raise an
+issue [on GitHub](https://github.com/alan-turing-institute/annotatechange).
+
+The code is licensed under the [MIT License](./LICENSE).
+
+This code was written by [Gertjan van den Burg](https://gertjan.dev) with
+helpful comments provided by [Chris
+Williams](https://homepages.inf.ed.ac.uk/ckiw/).
+
+## Some implementation details
+
+Below are some thoughts that may help make sense of the codebase.
+
+* AnnotateChange is a web application built on the Flask framework. See [this
+ excellent
+ tutorial](https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world)
+ for an introduction to Flask. The [flask.sh](./flask.sh) shell script loads
+ the appropriate environment variables and runs the application in a virtual
+ environment managed by Poetry.
+
+* The application handles user management and is centered around the idea of a
+ "task" which links a particular user to a particular time series to
+ annotate.
+
+* An admin role is available, and the admin user can manually assign and
+ delete tasks as well as add/delete users, datasets, etc. The admin user is
+ created using the [cli](./app/cli.py).
+
+* All datasets must adhere to a specific dataset schema (see
+ [utils/dataset_schema.json](utils/dataset_schema.json)).
+
+* Annotations are stored in the database using 0-based indexing. Tasks are
+ assigned on the fly when a user requests a time series to annotate (see
+ [utils/tasks.py](utils/tasks.py)).
+
+* Users can only begin annotating when they have successfully passed the
+ introduction.
+
+* Configuration of the app is done through environment variables; see the
+ [.env.example](.env.example) file for an example.
+
+* [Poetry](https://python-poetry.org/) is used for dependency management.
+
+* Docker is used for deployment (see the deployment documentation in
+ [docs](docs)), and [Traefik](https://containo.us/traefik/) is used for SSL,
+ etc.
+
+* The time series graph is plotted using [d3.js](https://d3js.org/).
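
The on-the-fly task assignment described in the README above can be illustrated with a minimal Python sketch. This is not the code in `utils/tasks.py`: the function name `pick_dataset`, its parameters, and the tie-breaking rule are all assumptions made for illustration. It also assumes that "prioritising datasets that are close to a desired number of annotations" means preferring the eligible dataset with the fewest annotations still missing.

```python
from typing import Dict, Optional, Set


def pick_dataset(
    annotation_counts: Dict[str, int],
    already_annotated: Set[str],
    desired: int,
) -> Optional[str]:
    """Return the name of the dataset to assign next, or None if none is eligible.

    Hypothetical helper, not part of AnnotateChange.
    annotation_counts: number of annotations collected so far, per dataset.
    already_annotated: datasets this user has annotated before.
    desired: target number of annotations per dataset.
    """
    # A dataset is eligible if this user has never annotated it and it
    # has not yet reached the desired number of annotations.
    candidates = [
        name
        for name, count in annotation_counts.items()
        if name not in already_annotated and count < desired
    ]
    if not candidates:
        return None
    # Prefer the dataset with the fewest annotations still missing;
    # ties are broken by name so the choice is deterministic (assumption).
    return min(
        candidates,
        key=lambda name: (desired - annotation_counts[name], name),
    )
```

Calling the function at the moment a user requests a time series, rather than precomputing assignments, keeps the choice consistent with the annotation counts at that moment.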