aboutsummaryrefslogtreecommitdiff
path: root/README.md
diff options
context:
space:
mode:
authorGertjan van den Burg <gertjanvandenburg@gmail.com>2020-03-13 12:39:50 +0000
committerGertjan van den Burg <gertjanvandenburg@gmail.com>2020-03-13 12:39:50 +0000
commit062094e7a5ea3bc56b3f5ede3ba909489749786e (patch)
tree5e5a34b42fc469f0927b15f921ed3ae4a80b5adb /README.md
parentInitial commit (diff)
downloadTCPD-062094e7a5ea3bc56b3f5ede3ba909489749786e.tar.gz
TCPD-062094e7a5ea3bc56b3f5ede3ba909489749786e.zip
Add README
Diffstat (limited to 'README.md')
-rw-r--r--README.md115
1 files changed, 115 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..41c134e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,115 @@
+# Turing Change Point Dataset
+
+Welcome to the host repository of the Turing Change Point Dataset, a set of
+time series specifically collected for the evaluation of change point
+detection algorithms on real-world data. For the repository containing the
+code and annotations, see
+[TCPDBench](https://github.com/alan-turing-institute/TCPDBench).
+
+**Useful links:**
+- [Turing Change Point Dataset](https://github.com/alan-turing-institute/TCPD)
+ on GitHub.
+- [Turing Change Point Benchmark](https://github.com/alan-turing-institute/TCPDBench)
+- [An Evaluation of Change Point Detection Algorithms](URL_TO_PAPER), a paper
+ by [Gertjan van den Burg](https://gertjan.dev) and [Chris
+ Williams](https://homepages.inf.ed.ac.uk/ckiw/).
+
+## Getting Started
+
+Many of the time series in the dataset are included in this repository.
+However, due to licensing restrictions, some series can not be redistributed
+and need to be downloaded locally. We've added a Python script and a Makefile
+to make this process as easy as possible.
+
+Note that work based on the dataset should cite [our paper](URL_TO_PAPER):
+
+```bib
+@article{vandenburg2020evaluation,
+ title={An Evaluation of Change Point Detection Algorithms},
+ author={{Van den Burg}, G. J. J. and Williams, C. K. I.},
+ journal={arXiv preprint},
+ year={2020}
+}
+```
+
+To obtain the dataset, please run the following steps:
+
+1. Clone the GitHub repository and change to the new directory:
+
+ ```
+ $ git clone https://github.com/alan-turing-institute/TCPD
+ $ cd TCPD
+ ```
+
+2. Make sure you have Python (v3.2 or newer) installed, as well as
+ [virtualenv](https://virtualenv.pypa.io/en/latest/):
+ ```
+ $ pip install virtualenv
+ ```
+
+3. Next, use either of these steps:
+ - To obtain the dataset using Make, simply run:
+
+ ```
+ $ make
+ ```
+
+ This command will download all remaining datasets and verify that they
+ match the expected checksums.
+
+ - If you don't have Make, you can obtain the dataset by manually executing
+ the following commands:
+
+ ```
+ $ virtualenv ./venv
+ $ source ./venv/bin/activate
+ $ pip install -r requirements.txt
+ $ python build_tcpd.py -v collect
+ ```
+
+ If you wish to verify the downloaded datasets you can run:
+
+ ```
+ $ python ./utils/check_checksums.py -v -c ./checksums.json -d ./datasets
+ ```
+
+4. It may be convenient to export all dataset files to a single directory.
+ This can be done using Make as follows:
+
+ ```
+ $ make export
+ ```
+
+All datasets are stored in individual directories inside the ``datasets``
+directory and each has its own README file with additional metadata and
+sources. The data format used is [JSON](https://json.org/) and each file
+follows the [JSON Schema](https://json-schema.org/) provided in
+``schema.json``.
+
+## Using the data
+
+For your convenience, example code to load a dataset from the JSON format to a
+data frame is provided in the ``examples`` directory in the following
+languages:
+
+- [Python](examples/python/)
+- [R](examples/R/)
+
+Implementations of various change point detection algorithms that use these
+datasets are available in
+[TCPDBench](https://github.com/alan-turing-institute/TCPDBench).
+
+## License
+
+The code in this repository is licensed under the MIT license. See the
+[LICENSE file](LICENSE) for more details. Individual data files are often
+distributed under different terms, see the relevant README files for more
+details. Work that uses this dataset should cite [our paper](URL_TO_PAPER).
+
+## Notes
+
+If you find any problems or have a suggestion for improvement of this
+repository, please let us know as it will help us make this resource better
+for everyone. You can open an issue on
+[GitHub](https://github.com/alan-turing-institute/TCPD) or send an email to
+``gvandenburg at turing dot ac dot uk``.