| Field | Value | Timestamp |
|---|---|---|
| author | Gertjan van den Burg <gertjanvandenburg@gmail.com> | 2020-03-13 12:39:50 +0000 |
| committer | Gertjan van den Burg <gertjanvandenburg@gmail.com> | 2020-03-13 12:39:50 +0000 |
| commit | 062094e7a5ea3bc56b3f5ede3ba909489749786e | |
| tree | 5e5a34b42fc469f0927b15f921ed3ae4a80b5adb /README.md | |
| parent | Initial commit | |
Add README
Diffstat (limited to 'README.md')
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | README.md | 115 |
1 files changed, 115 insertions, 0 deletions
```diff
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..41c134e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,115 @@
+# Turing Change Point Dataset
+
+Welcome to the host repository of the Turing Change Point Dataset, a set of
+time series specifically collected for the evaluation of change point
+detection algorithms on real-world data. For the repository containing the
+code and annotations, see
+[TCPDBench](https://github.com/alan-turing-institute/TCPDBench).
+
+**Useful links:**
+- [Turing Change Point Dataset](https://github.com/alan-turing-institute/TCPD)
+  on GitHub.
+- [Turing Change Point Benchmark](https://github.com/alan-turing-institute/TCPDBench)
+- [An Evaluation of Change Point Detection Algorithms](URL_TO_PAPER), a paper
+  by [Gertjan van den Burg](https://gertjan.dev) and [Chris
+  Williams](https://homepages.inf.ed.ac.uk/ckiw/).
+
+## Getting Started
+
+Many of the time series in the dataset are included in this repository.
+However, due to licensing restrictions, some series can not be redistributed
+and need to be downloaded locally. We've added a Python script and a Makefile
+to make this process as easy as possible.
+
+Note that work based on the dataset should cite [our paper](URL_TO_PAPER):
+
+```bib
+@article{vandenburg2020evaluation,
+  title={An Evaluation of Change Point Detection Algorithms},
+  author={{Van den Burg}, G. J. J. and Williams, C. K. I.},
+  journal={arXiv preprint},
+  year={2020}
+}
+```
+
+To obtain the dataset, please run the following steps:
+
+1. Clone the GitHub repository and change to the new directory:
+
+   ```
+   $ git clone https://github.com/alan-turing-institute/TCPD
+   $ cd TCPD
+   ```
+
+2. Make sure you have Python (v3.2 or newer) installed, as well as
+   [virtualenv](https://virtualenv.pypa.io/en/latest/):
+   ```
+   $ pip install virtualenv
+   ```
+
+3. Next, use either of these steps:
+   - To obtain the dataset using Make, simply run:
+
+     ```
+     $ make
+     ```
+
+     This command will download all remaining datasets and verify that they
+     match the expected checksums.
+
+   - If you don't have Make, you can obtain the dataset by manually executing
+     the following commands:
+
+     ```
+     $ virtualenv ./venv
+     $ source ./venv/bin/activate
+     $ pip install -r requirements.txt
+     $ python build_tcpd.py -v collect
+     ```
+
+     If you wish to verify the downloaded datasets you can run:
+
+     ```
+     $ python ./utils/check_checksums.py -v -c ./checksums.json -d ./datasets
+     ```
+
+4. It may be convenient to export all dataset files to a single directory.
+   This can be done using Make as follows:
+
+   ```
+   $ make export
+   ```
+
+All datasets are stored in individual directories inside the ``datasets``
+directory and each has its own README file with additional metadata and
+sources. The data format used is [JSON](https://json.org/) and each file
+follows the [JSON Schema](https://json-schema.org/) provided in
+``schema.json``.
+
+## Using the data
+
+For your convenience, example code to load a dataset from the JSON format to a
+data frame is provided in the ``examples`` directory in the following
+languages:
+
+- [Python](examples/python/)
+- [R](examples/R/)
+
+Implementations of various change point detection algorithms that use these
+datasets are available in
+[TCPDBench](https://github.com/alan-turing-institute/TCPDBench).
+
+## License
+
+The code in this repository is licensed under the MIT license. See the
+[LICENSE file](LICENSE) for more details. Individual data files are often
+distributed under different terms, see the relevant README files for more
+details. Work that uses this dataset should cite [our paper](URL_TO_PAPER).
+
+## Notes
+
+If you find any problems or have a suggestion for improvement of this
+repository, please let us know as it will help us make this resource better
+for everyone. You can open an issue on
+[GitHub](https://github.com/alan-turing-institute/TCPD) or send an email to
+``gvandenburg at turing dot ac dot uk``.
```

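The README states that each dataset is a JSON file following `schema.json` and that loaders live in `examples/python/`. Those loaders are not part of this diff, so the following is an independent, minimal sketch rather than the repository's own code. It assumes field names (a `series` list whose entries have `label` and `raw`, plus an optional `time` object with a `raw` list) that should be checked against `schema.json`; `load_series` is a hypothetical name.

```python
import json
from pathlib import Path


def load_series(path):
    """Load one dataset file into a column-oriented dict (a data-frame stand-in).

    ASSUMPTION: field names ``series``, ``label``, ``raw`` and the optional
    ``time.raw`` are guesses; verify them against ``schema.json``.
    """
    data = json.loads(Path(path).read_text())
    columns = {}
    time_info = data.get("time")
    if time_info is not None and "raw" in time_info:
        columns["time"] = list(time_info["raw"])
    for series in data["series"]:
        # JSON null (a missing observation) arrives here as Python None.
        columns[series["label"]] = list(series["raw"])
    # A data frame requires every column to have the same length.
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have unequal lengths: {sorted(lengths)}")
    return columns
```

If pandas is installed, `pandas.DataFrame(load_series(path))` turns the returned dict of equal-length lists into a data frame; the explicit length check gives a clearer error than the one that constructor raises for ragged columns.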