# Turing Change Point Detection Benchmark
[Build status](https://github.com/alan-turing-institute/TCPDBench/actions?query=workflow%3A%22Reproducible+Research%22)
[DOI](https://doi.org/10.5281/zenodo.3740582)
Welcome to the repository for the Turing Change Point Detection Benchmark, a
benchmark evaluation of change point detection algorithms developed at [The
Alan Turing Institute](https://turing.ac.uk). This benchmark uses the time
series from the [Turing Change Point
Dataset](https://github.com/alan-turing-institute/TCPD) (TCPD).
**Useful links:**
- [Turing Change Point Detection
Benchmark](https://github.com/alan-turing-institute/TCPDBench)
- [Turing Change Point Dataset](https://github.com/alan-turing-institute/TCPD)
- [An Evaluation of Change Point Detection Algorithms](https://arxiv.org/abs/2003.06222) by
[Gertjan van den Burg](https://gertjan.dev) and [Chris
Williams](https://homepages.inf.ed.ac.uk/ckiw/).
- [Annotation Tool](https://github.com/alan-turing-institute/annotatechange)
If you encounter a problem when using this repository or simply want to ask a
question, please don't hesitate to [open an issue on
GitHub](https://github.com/alan-turing-institute/TCPDBench/issues) or send an
email to ``gvandenburg at turing dot ac dot uk``.
## Introduction
Change point detection focuses on accurately detecting moments of abrupt
change in the behavior of a time series. While many methods for change point
detection exist, past research has paid little attention to the evaluation of
existing algorithms on real-world data. This work introduces a benchmark study
and a dataset ([TCPD](https://github.com/alan-turing-institute/TCPD)) that are
explicitly designed for the evaluation of change point detection algorithms.
We hope that our work becomes a proving ground for the comparison and
development of change point detection algorithms that work well in practice.
This repository contains the code necessary to evaluate and analyze a
significant number of change point detection algorithms on the TCPD, and
serves to reproduce the work in [Van den Burg and Williams
(2020)](https://arxiv.org/abs/2003.06222). Note that work based on either the
dataset or this benchmark should cite that paper:
```bib
@article{vandenburg2020evaluation,
    title={An Evaluation of Change Point Detection Algorithms},
    author={{Van den Burg}, G. J. J. and Williams, C. K. I.},
    journal={arXiv preprint arXiv:2003.06222},
    year={2020}
}
```
For the experiments we've used the [abed](https://github.com/GjjvdBurg/abed)
command line program, which makes it easy to organize and run the experiments.
This means that all experiments are defined through the
[abed_conf.py](abed_conf.py) file. In particular, the hyperparameters and the
command line arguments to all methods are defined in that file. Next, all
methods are called as command line scripts and they are defined in the
[execs](execs) directory. The raw results from the experiments are collected
in JSON files and placed in the [abed_results](abed_results) directory,
organized by dataset and method. Finally, we use
[Make](https://www.gnu.org/software/make/) to coordinate our analysis scripts:
first we generate [summary files](analysis/output/summaries) using
[summarize.py](analysis/scripts/summarize.py), and then use these to generate
all the tables and figures in the paper.
## Getting Started
This repository contains all the code to generate the results
(tables/figures/constants) from the paper, as well as to reproduce the
experiments entirely. You can either install the dependencies directly on your
machine or use the provided Dockerfile (see below). If you don't use Docker,
first clone this repository using:
```
$ git clone --recurse-submodules https://github.com/alan-turing-institute/TCPDBench
```
### Generating Tables/Figures
Generating the tables and figures from the paper is done through the scripts
in ``analysis/scripts`` and can be run through the provided ``Makefile``.
First, make sure you have all the requirements installed:
```
$ pip install -r ./analysis/requirements.txt
```
and then use make:
```
$ make results
```
The results will be placed in ``./analysis/output``. Note that to generate the
figures a working LaTeX and ``latexmk`` installation is needed.
### Reproducing the experiments
To fully reproduce the experiments, some additional steps are needed. Note
that the Docker procedure outlined below automates this process somewhat.
First, obtain the [Turing Change Point
Dataset](https://github.com/alan-turing-institute/TCPD) and follow the
instructions provided there. Copy the dataset files to a ``datasets``
directory in this repository.
To run all the tasks we use the [abed](https://github.com/GjjvdBurg/abed)
command line tool. This allows us to define the experiments in a single
configuration file (``abed_conf.py``) and makes it easy to keep track of which
tasks still need to be run.
Note that this repository contains all the result files, so it is not
necessary to redo all the experiments. If you still wish to do so, the
instructions are as follows:
1. Move the current result directory out of the way:

   ```
   $ mv abed_results old_abed_results
   ```

2. Install [abed](https://github.com/GjjvdBurg/abed). This requires an
   existing installation of openmpi, but otherwise should be a matter of
   running:

   ```
   $ pip install abed
   ```

3. Tell abed to rediscover all the tasks that need to be done:

   ```
   $ abed reload_tasks
   ```

   This will populate the ``abed_tasks.txt`` file and will automatically
   commit the updated file to the Git repository. You can show the number of
   tasks that need to be completed through:

   ```
   $ abed status
   ```

4. Initialize the virtual environments for Python and R, which installs all
   required dependencies:

   ```
   $ make venvs
   ```

   Note that this will also create an R virtual environment (using
   [RSimpleVenv](https://github.com/GjjvdBurg/RSimpleVenv)), which ensures
   that the exact versions of the packages used in the experiments will be
   installed. This step can take a little while (:coffee:), but is important
   to ensure reproducibility.

5. Run abed through ``mpiexec``, as follows:

   ```
   $ mpiexec -np 4 abed local
   ```

   This will run abed using 4 cores, which can of course be increased or
   decreased if desired. Note that a minimum of two cores is needed for abed
   to operate. You may want to run these experiments in parallel on a large
   number of cores, as the expected runtime is on the order of 21 days on a
   single core. Once this command starts running the experiments you will see
   result files appear in the ``staging`` directory.
### Running the experiments with Docker
If you would like to use [Docker](https://www.docker.com/) to manage the environment
and dependencies, you can do so easily with the provided Dockerfile. You can
build the Docker image using:
```
$ docker build -t alan-turing-institute/tcpdbench github.com/alan-turing-institute/TCPDBench
```
To ensure that the results created in the docker container persist to the
host, we need to create a volume first (following [these
instructions](https://stackoverflow.com/a/47528568/1154005)):
```
$ mkdir /path/to/tcpdbench/results # *absolute* path where you want the results
$ docker volume create --driver local \
      --opt type=none \
      --opt device=/path/to/tcpdbench/results \
      --opt o=bind tcpdbench_vol
```
You can then follow the same procedure as described above to reproduce the
experiments, but using the relevant docker commands to run them in the
container:
* For reproducing just the tables and figures, use:

  ```
  $ docker run -i -t -v tcpdbench_vol:/TCPDBench alan-turing-institute/tcpdbench /bin/bash -c "make results"
  ```

* For reproducing all the experiments, use:

  ```
  $ docker run -i -t -v tcpdbench_vol:/TCPDBench alan-turing-institute/tcpdbench /bin/bash -c "mv abed_results old_abed_results && mkdir abed_results && abed reload_tasks && abed status && make venvs && mpiexec --allow-run-as-root -np 4 abed local && make results"
  ```

  where ``-np 4`` sets the number of cores used for the experiments to four.
  This can be changed as desired, depending on the available hardware.
## Extending the Benchmark
It should be relatively straightforward to extend the benchmark with your own
methods and datasets. Remember to [cite our
paper](https://arxiv.org/abs/2003.06222) if you do end up using this work.
### Adding a new method
To add a new method to the benchmark, you'll need to write a script in the
``execs`` folder that takes a dataset file as input and computes the change
point locations. Currently the methods are organized by language (R and
Python), but you don't necessarily need to follow this structure when adding a
new method. Please do check the existing code for inspiration though, as
adding a new method is probably easiest when following the same structure.
Experiments are managed using the [abed](https://github.com/GjjvdBurg/abed)
command line application. This facilitates running all the methods with all
their hyperparameter settings on all datasets.
Note that the methods write their output (the result JSON) to stdout, so if you
want to print diagnostic messages from your script, use stderr.
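For example, in a Python wrapper any debugging output would go to stderr so
that it does not interfere with the result on stdout (the message below is
just an illustration):

```python
import sys

# The result JSON is written to stdout, so keep diagnostic output on stderr.
print("starting change point detection", file=sys.stderr)
```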
#### Python
When adding a method in Python, you can start with the
[cpdbench_zero.py](./execs/python/cpdbench_zero.py) file as a template, as
this contains most of the boilerplate code. A script should take command line
arguments where ``-i/--input`` marks the path to a dataset file and optionally
can take further command line arguments for hyperparameter settings.
Specifying these items from the command line facilitates reproducibility.
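As a rough sketch of the argument parsing (the ``--penalty`` flag and the
helper name are hypothetical, and the existing scripts in
[execs/python](./execs/python) may structure this differently):

```python
import argparse


def parse_args():
    parser = argparse.ArgumentParser(description="Wrapper for <yourmethod>")
    # the benchmark always passes the dataset file via -i/--input
    parser.add_argument("-i", "--input", required=True,
                        help="path to the dataset file (JSON)")
    # method-specific hyperparameters are additional, optional flags
    parser.add_argument("--penalty", type=float, default=None,
                        help="example hyperparameter (illustrative)")
    return parser.parse_args()
```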
Roughly, the main function of a Python method could look like this:
```python
# Adding a new Python method to CPDBench
import time


def main():
    args = parse_args()

    # data is the raw dataset dictionary, mat is a T x d matrix of observations
    data, mat = load_dataset(args.input)

    # set algorithm parameters that are not varied in the grid search
    defaults = {
        'param_1': value_1,
        'param_2': value_2
    }

    # combine command line arguments with defaults
    parameters = make_param_dict(args, defaults)

    # start the timer
    start_time = time.time()

    error = None
    status = 'fail'  # if not overwritten, it must have failed

    # run the algorithm in a try/except
    try:
        locations = your_custom_method(mat, parameters)
        status = 'success'
    except Exception as err:
        error = repr(err)

    stop_time = time.time()
    runtime = stop_time - start_time

    # exit with error if the run failed
    if status == 'fail':
        exit_with_error(data, args, parameters, error, __file__)

    # make sure locations are 0-based and integer!
    exit_success(data, args, parameters, locations, runtime, __file__)
```
Remember to add the following to the bottom of the script so it can be run
from the command line:
```python
if __name__ == '__main__':
    main()
```
If you need to add a timeout to your method, take a look at the
[BOCPDMS](./execs/python/cpdbench_bocpdms.py) example.
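One generic (Unix-only) way to enforce a time limit is an alarm signal; the
sketch below is only an illustration under that assumption and is not
necessarily how the BOCPDMS script implements its timeout:

```python
import signal


class DetectionTimeout(Exception):
    pass


def _raise_timeout(signum, frame):
    raise DetectionTimeout()


def run_with_timeout(func, seconds, *args, **kwargs):
    """Run func(*args, **kwargs), raising DetectionTimeout after `seconds` (whole seconds)."""
    signal.signal(signal.SIGALRM, _raise_timeout)
    signal.alarm(seconds)
    try:
        return func(*args, **kwargs)
    finally:
        signal.alarm(0)  # always cancel the pending alarm
```

A raised ``DetectionTimeout`` can then be caught in the same try/except as
other errors, so that a timed-out run is reported as a failed run.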
#### R
Adding a method implemented in R to the benchmark can be done similarly to how
it is done for Python. Again, the input file path and the hyperparameters are
specified by command line arguments, which are parsed using
[argparse](https://cran.r-project.org/web/packages/argparse/index.html). For R
scripts we use a number of utility functions in the
[utils.R](./execs/R/utils.R) file. To reliably load this file you can use the
``load.utils()`` function available in all R scripts.
The main function of a method implemented in R could be roughly as follows:
```R
main <- function()
{
    args <- parse.args()

    # load the data
    data <- load.dataset(args$input)

    # create list of default algorithm parameters
    defaults <- list(param_1=value_1, param_2=value_2)

    # combine defaults and command line arguments
    params <- make.param.list(args, defaults)

    # start the timer
    start.time <- Sys.time()

    # call the detection function in a tryCatch
    result <- tryCatch({
        locs <- your.custom.method(data$mat, params)
        list(locations=locs, error=NULL)
    }, error=function(e) {
        return(list(locations=NULL, error=e$message))
    })
    stop.time <- Sys.time()

    # compute the runtime; note that units='secs' is not optional!
    runtime <- difftime(stop.time, start.time, units='secs')

    if (!is.null(result$error))
        exit.with.error(data$original, args, params, result$error)

    # convert result$locations to 0-based if needed
    exit.success(data$original, args, params, result$locations, runtime)
}
```
Remember to add the following to the bottom of the script so it can be run
from the command line:
```R
load.utils()
main()
```
#### Adding the method to the experimental configuration
When you've written the command line script to run your method and verified
that it works correctly, it's time to add it to the experiment configuration.
For this, we'll have to edit the [abed_conf.py](./abed_conf.py) file.
1. To add your method, locate the ``METHODS`` list in the configuration file
and add an entry ``best_<yourmethod>`` and ``default_<yourmethod>``,
replacing ``<yourmethod>`` with the name of your method (without spaces or
underscores).
2. Next, add the method to the ``PARAMS`` dictionary. This is where you
specify all the hyperparameters that your method takes (for the ``best``
experiment). The hyperparameters are specified with a name and a list of
values to explore (see the current configuration for examples). For the
default experiment, add an entry ``"default_<yourmethod>" : {"no_param":
[0]}``. This ensures it will be run without any parameters.
3. Finally, add the command that needs to be executed to run your method to
the ``COMMANDS`` dictionary. You'll need an entry for ``best_<yourmethod>``
and for ``default_<yourmethod>``. Please use the existing entries as
examples. Methods implemented in R are run with Rscript. The ``{execdir}``,
``{datadir}``, and ``{dataset}`` values will be filled in by abed based on
the other settings. Use curly braces to specify hyperparameters, matching
the names of the fields in the ``PARAMS`` dictionary. A rough sketch of these
entries is shown below.
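Putting the three steps together, the new entries in ``abed_conf.py`` could
look roughly like the following sketch. The method name, the ``penalty``
hyperparameter, and the command lines are all hypothetical; mirror the
existing entries in the configuration file for the exact paths and format.

```python
# Hypothetical entries for a method called "mymethod" in abed_conf.py

METHODS = [
    # ... existing methods ...
    "best_mymethod",
    "default_mymethod",
]

PARAMS = {
    # ... existing entries ...
    "best_mymethod": {"penalty": [0.1, 1.0, 10.0]},  # grid to explore
    "default_mymethod": {"no_param": [0]},           # run without parameters
}

COMMANDS = {
    # ... existing entries ...
    "best_mymethod": (
        "python {execdir}/python/cpdbench_mymethod.py"
        " -i {datadir}/{dataset}.json --penalty {penalty}"
    ),
    "default_mymethod": (
        "python {execdir}/python/cpdbench_mymethod.py"
        " -i {datadir}/{dataset}.json"
    ),
}
```

The existing ``COMMANDS`` entries also show how virtual environments are
activated for each method, so use them as the reference for the command
strings.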
#### Dependencies
If your method needs external R or Python packages to operate, you can add
them to the respective dependency lists.
* For R, simply add the package name to the [Rpackages.txt](./Rpackages.txt)
file. Next, run ``make clean_R_venv`` and ``make R_venv`` to add the package
to the R virtual environment. It is recommended to be specific about the
version of the package you want to use in the ``Rpackages.txt`` file, for
future reference and reproducibility.
* For Python, individual methods use individual virtual environments, as can
be seen from the bocpdms and rbocpdms examples. These virtual environments
need to be activated in the ``COMMANDS`` section of the ``abed_conf.py``
file. Setting up these environments is done through the Makefile. Simply add
a ``requirements.txt`` file in your package similarly to what is done for
bocpdms and rbocpdms, copy and edit the corresponding lines in the Makefile,
and run ``make venv_<yourmethod>`` to build the virtual environment.
#### Running experiments
When you've added the method and set up the environment, run
```
$ abed reload_tasks
```
to have abed generate the new tasks for your method (see above under [Getting
Started](#getting-started)). Note that abed automatically does a Git commit
when you do this, so you may want to switch to a separate branch. You can see
the tasks that abed has generated (and thus the command that will be executed)
using the command:
```
$ abed explain_tbd_tasks
```
If you're satisfied with the commands, you can run the experiments using:
```
$ mpiexec -np 4 abed local
```
You can subsequently use the Makefile to generate updated figures and tables
with your method or dataset.
### Adding a new dataset
To add a new dataset to the benchmark you'll need both a dataset file (in JSON
format) and annotations (for evaluation). More information on how the datasets
are constructed can be found in the
[TCPD](https://github.com/alan-turing-institute/TCPD) repository, which also
includes a schema file. A high-level overview is as follows (a minimal example
sketch follows the list):
* Each dataset has a short name in the ``name`` field and a longer more
descriptive name in the ``longname`` field. The ``name`` field must be
unique.
* The number of observations and dimensions is defined in the ``n_obs`` and
``n_dim`` fields.
* The time axis is defined in the ``time`` field. This has at least an
``index`` field to mark the indices of each data point. At the moment, these
indices need to be consecutive integers. This entry mainly exists for a
future scenario where we may want to consider non-consecutive time axes. If
the time axis can be mapped to a date or time, then a type and format of
this field can be specified (see e.g. the [nile
dataset](https://github.com/alan-turing-institute/TCPD/blob/master/datasets/nile/nile.json#L8),
which has year labels).
* The actual observations are specified in the ``series`` field. This is an
ordered list of JSON objects, one for each dimension. Every dimension has a
label, a data type, and a ``"raw"`` field with the actual observations.
Missing values in the time series can be marked with ``null`` (see e.g.
[uk_coal_employ](https://github.com/alan-turing-institute/TCPD/blob/master/datasets/uk_coal_employ/uk_coal_employ.json#L236)
for an example).
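To make this concrete, here is a sketch of a minimal, hypothetical dataset
file written from Python. The dataset name, values, and the ``"float"`` type
label are illustrative; the schema file in the TCPD repository remains the
authoritative reference for the exact field definitions.

```python
import json

dataset = {
    "name": "example_series",
    "longname": "Example univariate series",
    "n_obs": 5,
    "n_dim": 1,
    # consecutive integer indices for the time axis
    "time": {"index": [0, 1, 2, 3, 4]},
    # one object per dimension; None is serialized as null (missing value)
    "series": [
        {"label": "V1", "type": "float", "raw": [1.0, 1.1, 0.9, 5.2, None]}
    ],
}

with open("example_series.json", "w") as fp:
    json.dump(dataset, fp, indent=2)
```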
If you want to evaluate the methods in the benchmark on a new dataset, you may
want to collect annotations for the dataset. These annotations can be
collected in the [annotations.json](./analysis/annotations/annotations.json)
file, which is an object that maps each dataset name to a map from the
annotator ID to the marked change points. You can collect annotations using
the [annotation tool](https://github.com/alan-turing-institute/annotatechange)
created for this project.
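For reference, the structure of that file is roughly as follows; the dataset
name, annotator IDs, and change point indices shown here are made up and only
illustrate the mapping described above:

```python
# annotations.json structure: dataset name -> annotator ID -> change points
annotations = {
    "example_series": {
        "1": [3],
        "2": [2, 3],
    },
}
```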
Finally, add your dataset to the ``DATASETS`` field in the ``abed_conf.py``
file. Proceed with running the experiments as described above.
## License
The code in this repository is licensed under the MIT license, unless
otherwise specified. See the [LICENSE file](LICENSE) for further details.
Reuse of the code in this repository is allowed, but should cite [our
paper](https://arxiv.org/abs/2003.06222).
## Notes
If you find any problems or have a suggestion for improvement of this
repository, please let us know as it will help us make this resource better
for everyone. You can open an issue on
[GitHub](https://github.com/alan-turing-institute/TCPDBench) or send an email
to ``gvandenburg at turing dot ac dot uk``.