diff options
Diffstat (limited to 'python/README.md')
| -rw-r--r--[l---------] | python/README.md | 192 |
1 files changed, 191 insertions, 1 deletions
diff --git a/python/README.md b/python/README.md index 32d46ee..181f72c 120000..100644 --- a/python/README.md +++ b/python/README.md @@ -1 +1,191 @@ -../README.md
\ No newline at end of file +# SyncRNG + +[](https://github.com/GjjvdBurg/SyncRNG/actions) +[](https://cran.r-project.org/web/packages/SyncRNG/index.html) +[](https://cran.r-project.org/web/packages/SyncRNG/index.html) +[](https://pypi.org/project/SyncRNG) +[](https://pepy.tech/project/SyncRNG) + +Generate the same random numbers in R and Python. + +## Why? + +This program was created because it was desired to have the same random +numbers in both R and Python programs. Although both languages implement a +Mersenne-Twister random number generator (RNG), the implementations are so +different that it is not possible to get the same random numbers, even with +the same seed. + +SyncRNG is a "Tausworthe" RNG implemented in C and linked to both R and +Python. Since both use the same underlying C code, the random numbers will be +the same in both languages when the same seed is used. + +You can read more about my motivations for creating this +[here](https://gertjanvandenburg.com/blog/syncrng/). + +## Installation + +Installing the R package can be done through CRAN: + +``` +> install.packages('SyncRNG') +``` + +The Python package can be installed using pip: + +``` +$ pip install syncrng +``` + +## Usage + +After installing the package, you can use the basic ``SyncRNG`` random number +generator. In Python you can do: + + +```python +>>> from SyncRNG import SyncRNG +>>> s = SyncRNG(seed=123456) +>>> for i in range(10): +>>> print(s.randi()) +``` + +And in R you can use: + +```r +> library(SyncRNG) +> s <- SyncRNG(seed=123456) +> for (i in 1:10) { +> cat(s$randi(), '\n') +> } +``` + +You'll notice that the random numbers are indeed the same. + +### R: User defined RNG + +R allows the user to define a custom random number generator, which is then +used for the common ``runif`` and ``rnorm`` functions in R. This has also been +implemented in SyncRNG as of version 1.3.0. To enable this, run: + +```r +> library(SyncRNG) +> set.seed(123456, 'user', 'user') +> runif(10) +``` + +These numbers are between [0, 1) and multiplying by ``2**32 - 1`` gives the +same results as above. + +### Functionality + +In both R and Python the following methods are available for the ``SyncRNG`` +class: + +1. ``randi()``: generate a random integer on the interval [0, 2^32). +2. ``rand()``: generate a random floating point number on the interval [0.0, + 1.0) +3. ``randbelow(n)``: generate a random integer below a given integer ``n``. +4. ``shuffle(x)``: generate a permutation of a given list of numbers ``x``. + +Functionality is deliberately kept minimal to make maintaining this library +easier. It is straightforward to build more advanced applications on the +existing methods, as the following example shows. + +### Creating the same train/test splits + +A common use case for this package is to create the same train and test splits +in R and Python. Below are some code examples that illustrate how to do this. +Both assume you have a matrix ``X`` with `100` rows. + +In R: + +```r + +# This function creates a list with train and test indices for each fold +k.fold <- function(n, K, shuffle=TRUE, seed=0) +{ + idxs <- c(1:n) + if (shuffle) { + rng <- SyncRNG(seed=seed) + idxs <- rng$shuffle(idxs) + } + + # Determine fold sizes + fsizes <- c(1:K)*0 + floor(n / K) + mod <- n %% K + if (mod > 0) + fsizes[1:mod] <- fsizes[1:mod] + 1 + + out <- list(n=n, num.folds=K) + current <- 1 + for (f in 1:K) { + fs <- fsizes[f] + startidx <- current + stopidx <- current + fs - 1 + test.idx <- idxs[startidx:stopidx] + train.idx <- idxs[!(idxs %in% test.idx)] + out$testidxs[[f]] <- test.idx + out$trainidxs[[f]] <- train.idx + current <- stopidx + } + return(out) +} + +# Which you can use as follows +folds <- k.fold(nrow(X), K=10, shuffle=T, seed=123) +for (f in 1:folds$num.folds) { + X.train <- X[folds$trainidx[[f]], ] + X.test <- X[folds$testidx[[f]], ] + + # continue using X.train and X.test here +} +``` + +And in Python: + +```python +def k_fold(n, K, shuffle=True, seed=0): + """Generator for train and test indices""" + idxs = list(range(n)) + if shuffle: + rng = SyncRNG(seed=seed) + idxs = rng.shuffle(idxs) + + fsizes = [n // K]*K + mod = n % K + if mod > 0: + fsizes[:mod] = [x+1 for x in fsizes[:mod]] + + current = 0 + for fs in fsizes: + startidx = current + stopidx = current + fs + test_idx = idxs[startidx:stopidx] + train_idx = [x for x in idxs if not x in test_idx] + yield train_idx, test_idx + current = stopidx + +# Which you can use as follows +kf = k_fold(X.shape[0], K=3, shuffle=True, seed=123) +for trainidx, testidx in kf: + X_train = X[trainidx, :] + X_test = X[testidx, :] + + # continue using X_train and X_test here +``` + +## Notes + +The random numbers are uniformly distributed on ``[0, 2^32 - 1]``. No +attention has been paid to thread-safety and you shouldn't use this random +number generator for cryptographic applications. + +## Questions and Issues + +If you have questions, comments, or suggestions about SyncRNG or you encounter +a problem, please open an issue [on +GitHub](https://github.com/GjjvdBurg/SyncRNG/). Please don't hesitate to +contact me, you're helping to make this project better for everyone! If you +prefer not to use Github you can email me at ``gertjanvandenburg at gmail dot +com``. |
