Diffstat (limited to 'datasets/shanghai_license/README.md')
| -rw-r--r-- | datasets/shanghai_license/README.md | 29 |
1 files changed, 29 insertions, 0 deletions
diff --git a/datasets/shanghai_license/README.md b/datasets/shanghai_license/README.md
new file mode 100644
index 0000000..f4c7026
--- /dev/null
+++ b/datasets/shanghai_license/README.md
@@ -0,0 +1,29 @@
+# Shanghai License Plate Applicants
+
+Source:
+[Kaggle](https://www.kaggle.com/bogof666/shanghai-car-license-plate-auction-price).
+Data licensed under [CC0: Public
+Domain](https://creativecommons.org/publicdomain/zero/1.0/), so we can
+redistribute it as part of this repository.
+
+There seems to be a clear sudden growth in the number of applicants.
+
+Note: according to [this discussion on
+Kaggle](https://www.kaggle.com/bogof666/shanghai-car-license-plate-auction-price/discussion/73140),
+the record for 2008-02 is missing because the license plates for January and
+February were auctioned off simultaneously in January. As this represents an
+uneven measurement and a missing value, we choose to split the observation for
+January and February 2008 in two, dividing the amount equally between the
+months. An alternative would be to introduce a missing value in 2008-02, but
+since many of the algorithms we wish to evaluate are not able to handle
+missing values (and any imputation method would be incorrect), we believe this
+is a reasonable way to deal with this issue.
+
+To obtain the ``shanghai_license.json`` file from the
+``Shanghai_license_plate_price_-_Sheet3.csv`` file, simply run:
+
+```
+$ python convert.py Shanghai_license_plate_price_-_Sheet3.csv shanghai_license.json
+```
+
+
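
The README's note on January and February 2008 describes splitting one combined observation equally over two months. A minimal sketch of that step might look like the following. It is not the repository's `convert.py`, whose contents are not shown here: the function name `split_jan_feb_2008`, the `(month, applicants)` tuple layout, and the sample numbers are all assumptions made for illustration.

```python
def split_jan_feb_2008(records):
    """Split the combined 2008-01 observation equally over 2008-01 and 2008-02.

    `records` is assumed to be a list of (month, applicants) tuples with the
    month formatted as 'YYYY-MM' and no existing 2008-02 entry. This layout is
    hypothetical and is not taken from the actual CSV or convert.py.
    """
    out = []
    for month, applicants in records:
        if month == "2008-01":
            # The January auction covered both months, so each gets half.
            half = applicants / 2
            out.append(("2008-01", half))
            out.append(("2008-02", half))
        else:
            out.append((month, applicants))
    return out


if __name__ == "__main__":
    # Placeholder values, not real figures from the dataset.
    sample = [("2007-12", 10), ("2008-01", 20), ("2008-03", 30)]
    for month, applicants in split_jan_feb_2008(sample):
        print(month, applicants)
```

Dividing equally keeps the series regularly spaced and preserves the two-month total, which matches the README's rationale for avoiding a missing value.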
