| author | Gertjan van den Burg <gertjanvandenburg@gmail.com> | 2020-05-25 17:24:33 +0100 |
|---|---|---|
| committer | Gertjan van den Burg <gertjanvandenburg@gmail.com> | 2020-05-25 17:24:33 +0100 |
| commit | 292c9bf4013e3c09ba0b08470a3c974b422d3abe (patch) | |
| tree | 68ddfc0a9ced0dc7d45ad0cd5ba6a9f14807d250 | |
| parent | update readme with new R feature (diff) | |
Update README
| -rw-r--r-- | README.md | 177 | ||||
| -rw-r--r-- | README.rst | 92 |
2 files changed, 177 insertions, 92 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..1961ace
--- /dev/null
+++ b/README.md
@@ -0,0 +1,177 @@
+# SyncRNG
+
+A synchronized Tausworthe RNG usable in R and Python.
+
+## Why?
+
+This program was created because it was desired to have the same random
+numbers in both R and Python programs. Although both languages implement a
+Mersenne-Twister RNG, the implementations are so different that it is not
+possible to get the same random numbers with the same seed.
+
+SyncRNG is a Tausworthe RNG implemented in ``syncrng.c``, and linked to both R
+and Python. Since both use the same underlying C code, the random numbers will
+be the same in both languages, provided the same seed is used.
+
+You can read more about my motivations for creating this
+[here](https://gertjanvandenburg.com/blog/syncrng/).
+
+## Installation
+
+Installing the R package can be done through CRAN:
+
+```
+> install.packages('SyncRNG')
+```
+
+The Python package can be installed using pip:
+
+```
+$ pip install syncrng
+```
+
+## Usage
+
+After installing the package, you can use the basic ``SyncRNG`` random number
+generator. In Python you can do:
+
+
+```python
+>>> from SyncRNG import SyncRNG
+>>> s = SyncRNG(seed=123456)
+>>> for i in range(10):
+>>>     print(s.randi())
+```
+
+And in R you can use:
+
+```r
+> library(SyncRNG)
+> s <- SyncRNG(seed=123456)
+> for (i in 1:10) {
+>    cat(s$randi(), '\n')
+> }
+```
+
+You'll notice that the random numbers are indeed the same.
+
+### R: User defined RNG
+
+R allows the user to define a custom random number generator, which is then
+used for the common ``runif`` and ``rnorm`` functions in R. This has also been
+implemented in SyncRNG as of version 1.3.0. To enable this, run:
+
+```r
+> library(SyncRNG)
+> set.seed(123456, 'user', 'user')
+> runif(10)
+```
+
+These numbers are between [0, 1) and multiplying by ``2**32 - 1`` gives the
+same results as above.
+
+### Functionality
+
+In both R and Python the following methods are available for the ``SyncRNG``
+class:
+
+1. ``randi()``: generate a random integer on the interval [0, 2^32).
+2. ``rand()``: generate a random floating point number on the interval [0.0,
+   1.0)
+3. ``randbelow(n)``: generate a random integer below a given integer ``n``.
+4. ``shuffle(x)``: generate a permutation of a given list of numbers ``x``.
+
+### Creating the same train/test splits
+
+A common use case for this package is to create the same train and test splits
+in R and Python. Below are some code examples that illustrate how to do this.
+Both assume you have a matrix ``X`` with `100` rows.
+
+In R:
+
+```r
+
+# This function creates a list with train and test indices for each fold
+k.fold <- function(n, K, shuffle=TRUE, seed=0)
+{
+  idxs <- c(1:n)
+  if (shuffle) {
+    rng <- SyncRNG(seed=seed)
+    idxs <- rng$shuffle(idxs)
+  }
+
+  # Determine fold sizes
+  fsizes <- c(1:K)*0 + floor(n / K)
+  mod <- n %% K
+  if (mod > 0)
+    fsizes[1:mod] <- fsizes[1:mod] + 1
+
+  out <- list(n=n, num.folds=K)
+  current <- 1
+  for (f in 1:K) {
+    fs <- fsizes[f]
+    startidx <- current
+    stopidx <- current + fs - 1
+    test.idx <- idxs[startidx:stopidx]
+    train.idx <- idxs[!(idxs %in% test.idx)]
+    out$testidxs[[f]] <- test.idx
+    out$trainidxs[[f]] <- train.idx
+    current <- stopidx
+  }
+  return(out)
+}
+
+# Which you can use as follows
+folds <- k.fold(nrow(X), K=10, shuffle=T, seed=123)
+for (f in 1:folds$num.folds) {
+  X.train <- X[folds$trainidx[[f]], ]
+  X.test <- X[folds$testidx[[f]], ]
+
+  # continue using X.train and X.test here
+}
+```
+
+And in Python:
+
+```python
+def k_fold(n, K, shuffle=True, seed=0):
+    """Generator for train and test indices"""
+    idxs = list(range(n))
+    if shuffle:
+        rng = SyncRNG(seed=seed)
+        idxs = rng.shuffle(idxs)
+
+    fsizes = [n // K]*K
+    mod = n % K
+    if mod > 0:
+        fsizes[:mod] = [x+1 for x in fsizes[:mod]]
+
+    current = 0
+    for fs in fsizes:
+        startidx = current
+        stopidx = current + fs
+        test_idx = idxs[startidx:stopidx]
+        train_idx = [x for x in idxs if not x in test_idx]
+        yield train_idx, test_idx
+        current = stopidx
+
+# Which you can use as follows
+kf = k_fold(X.shape[0], K=3, shuffle=True, seed=123)
+for trainidx, testidx in kf:
+    X_train = X[trainidx, :]
+    X_test = X[testidx, :]
+
+    # continue using X_train and X_test here
+
+```
+
+## Notes
+
+The random numbers are uniformly distributed on ``[0, 2^32 - 1]``.
+
+## Questions and Issues
+
+If you have questions, comments, or suggestions about SyncRNG or you encounter
+a problem, please open an issue [on
+GitHub](https://github.com/GjjvdBurg/SyncRNG/). Please don't hesitate to
+contact me, you're helping to make this project better for everyone!
diff --git a/README.rst b/README.rst
deleted file mode 100644
index c5bf114..0000000
--- a/README.rst
+++ /dev/null
@@ -1,92 +0,0 @@
-=======
-SyncRNG
-=======
-A synchronized Tausworthe RNG usable in R and Python.
-
-Why?
-====
-
-This program was created because it was desired to have the same random
-numbers in both R and Python programs. Although both languages implement a
-Mersenne-Twister RNG, the implementations are so different that it is not
-possible to get the same random numbers with the same seed.
-
-SyncRNG is a Tausworthe RNG implemented in ``syncrng.c``, and linked to both R
-and Python. Since both use the same underlying C code, the random numbers will
-be the same in both languages, provided the same seed is used.
-
-You can read more about my motivations for creating this `here
-<https://gertjanvandenburg.com/blog/syncrng/>`_.
-
-How
-===
-
-First install the packages as stated under Installation. Then, in Python you
-can do::
-
-    from SyncRNG import SyncRNG
-
-    s = SyncRNG(seed=123456)
-    for i in range(10):
-        print(s.randi())
-
-Similarly, after installing the R library you can do in R::
-
-    library(SyncRNG)
-
-    s <- SyncRNG(seed=123456)
-    for (i in 1:10) {
-        cat(s$randi(), '\n')
-    }
-
-You'll notice that the random numbers are indeed the same.
-
-R - User defined RNG
---------------------
-
-R allows the user to define a custom random number generator, which is then
-used for the common ``runif`` and ``rnorm`` functions in R. This has also been
-implemented in SyncRNG as of version 1.3.0. To enable this, run::
-
-    library(SyncRNG)
-
-    set.seed(123456, 'user', 'user')
-    runif(10)
-
-These numbers are between [0, 1) and multiplying by ``2**32 - 1`` gives the
-same results as above.
-
-Installation
-============
-
-Installing the R package can be done through CRAN::
-
-    install.packages('SyncRNG')
-
-The Python package can be installed using pip::
-
-    pip install syncrng
-
-
-Usage
-=====
-
-In both R and Python the following methods are available for the ``SyncRNG``
-class:
-
-1. ``randi()``: generate a random integer on the interval [0, 2^32).
-2. ``rand()``: generate a random floating point number on the interval [0.0,
-   1.0)
-3. ``randbelow(n)``: generate a random integer below a given integer ``n``.
-4. ``shuffle(x)``: generate a permutation of a given list of numbers ``x``.
-
-Notes
-=====
-
-The random numbers are uniformly distributed on ``[0, 2^32 - 1]``.
-
-Questions and Issues
-====================
-
-If you have questions about SyncRNG or you encounter a problem, please open an
-`issue on GitHub <https://github.com/GjjvdBurg/SyncRNG/>`_.
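
The fold bookkeeping in the README's Python `k_fold` example can be checked without SyncRNG installed. The sketch below is not part of the commit. It assumes a hypothetical `shuffle_fn` hook in place of `SyncRNG(seed=...).shuffle`, and it uses the standard library's `random.Random` purely as a stand-in shuffler, so the permutation it produces will not match SyncRNG's. `stdlib_shuffle` and `check_partition` are likewise illustrative names.

```python
import random


def k_fold(n, K, shuffle_fn=None):
    """Yield (train_idx, test_idx) pairs for K folds over range(n).

    shuffle_fn is a hypothetical hook: any callable that takes a list and
    returns a permuted copy. With SyncRNG installed, one would presumably
    pass SyncRNG(seed=...).shuffle, as the README example does.
    """
    idxs = list(range(n))
    if shuffle_fn is not None:
        idxs = shuffle_fn(idxs)

    # The first (n % K) folds receive one extra element.
    fsizes = [n // K] * K
    for i in range(n % K):
        fsizes[i] += 1

    current = 0
    for fs in fsizes:
        test_idx = idxs[current:current + fs]
        test_set = set(test_idx)
        train_idx = [x for x in idxs if x not in test_set]
        yield train_idx, test_idx
        current += fs


def stdlib_shuffle(seed):
    """Stand-in shuffler built on random.Random (NOT SyncRNG's sequence)."""
    rng = random.Random(seed)

    def _shuffle(xs):
        out = list(xs)
        rng.shuffle(out)
        return out

    return _shuffle


def check_partition(n, K, shuffle_fn=None):
    """Return True if the test folds are disjoint and together cover range(n)."""
    seen = []
    for train_idx, test_idx in k_fold(n, K, shuffle_fn):
        # Train and test indices of one fold must not overlap.
        if set(train_idx) & set(test_idx):
            return False
        if len(train_idx) + len(test_idx) != n:
            return False
        seen.extend(test_idx)
    return sorted(seen) == list(range(n))
```

Calling `check_partition(100, 10, stdlib_shuffle(123))` exercises the 100-row case the README assumes. The same disjointness check, applied to the R `k.fold` in the diff, would likely flag `current <- stopidx`: with 1-based inclusive ranges, consecutive folds then share one index, and `current <- stopidx + 1` would avoid that.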
