| field | value | date |
|---|---|---|
| author | Gertjan van den Burg <gertjanvandenburg@gmail.com> | 2021-01-14 17:30:56 +0000 |
| committer | Gertjan van den Burg <gertjanvandenburg@gmail.com> | 2021-01-14 17:30:56 +0000 |
| commit | c2058f5e5256f87ec1e79a2f3dbb358fd268a454 | |
| tree | f3fb7a16fbaec90937eec84e4ee440939822abb2 /README.md | |
| parent | Rename directories, remove extra test dir | |
Bring back updated readme
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 180 |
1 files changed, 135 insertions, 45 deletions
@@ -1,14 +1,13 @@
-SyncRNG
-=======
+# SyncRNG
+
 A synchronized Tausworthe RNG usable in R and Python.

-Why?
-====
+## Why?

-This program was created because I needed to have the same random numbers in
-both R and Python. Although both languages implement a Mersenne-Twister RNG,
-the implementations are so different that it is not possible to get the same
-random numbers with the same seed.
+This program was created because it was desired to have the same random
+numbers in both R and Python programs. Although both languages implement a
+Mersenne-Twister RNG, the implementations are so different that it is not
+possible to get the same random numbers with the same seed.

 SyncRNG is a Tausworthe RNG implemented in ``syncrng.c``, and linked to both R
 and Python. Since both use the same underlying C code, the random numbers will
@@ -17,58 +16,61 @@ be the same in both languages, provided the same seed is used.
 You can read more about my motivations for creating this
 [here](https://gertjanvandenburg.com/blog/syncrng/).

-How
-===
+## Installation
+
+Installing the R package can be done through CRAN:
+
+```
+> install.packages('SyncRNG')
+```

-First install the packages as stated under Installation. Then, in Python you
-can do::
+The Python package can be installed using pip:

-    from SyncRNG import SyncRNG
+```
+$ pip install syncrng
+```

-    s = SyncRNG(seed=123456)
-    for i in range(10):
-        print(s.randi())
+## Usage

-Similarly, after installing the R library you can do in R::
+After installing the package, you can use the basic ``SyncRNG`` random number
+generator. In Python you can do:

-    library(SyncRNG)
-    s <- SyncRNG(seed=123456)
-    for (i in 1:10) {
-        cat(s$randi(), '\n')
-    }
+```python
+>>> from SyncRNG import SyncRNG
+>>> s = SyncRNG(seed=123456)
+>>> for i in range(10):
+>>>     print(s.randi())
+```
+
+And in R you can use:
+
+```r
+> library(SyncRNG)
+> s <- SyncRNG(seed=123456)
+> for (i in 1:10) {
+>     cat(s$randi(), '\n')
+> }
+```

 You'll notice that the random numbers are indeed the same.

-R - User defined RNG
---------------------
+### R: User defined RNG

 R allows the user to define a custom random number generator, which is then
 used for the common ``runif`` and ``rnorm`` functions in R. This has also been
-implemented in SyncRNG as of version 1.3.0. To enable this, run::
-
-    library(SyncRNG)
+implemented in SyncRNG as of version 1.3.0. To enable this, run:

-    set.seed(123456, 'user', 'user')
-    runif(10)
+```r
+> library(SyncRNG)
+> set.seed(123456, 'user', 'user')
+> runif(10)
+```

 These numbers are between [0, 1) and multiplying by ``2**32 - 1`` gives the
 same results as above.

-Installation
-============
-
-Installing the R package can be done through CRAN::
-
-    install.packages('SyncRNG')
-
-The Python package can be installed using pip::
-
-    pip install syncrng
-
-
-Usage
-=====
+### Functionality

 In both R and Python the following methods are available for the ``SyncRNG``
 class:

@@ -79,9 +81,97 @@ class:
 3. ``randbelow(n)``: generate a random integer below a given integer ``n``.
 4. ``shuffle(x)``: generate a permutation of a given list of numbers ``x``.

-Notes
-=====
+### Creating the same train/test splits
+
+A common use case for this package is to create the same train and test splits
+in R and Python. Below are some code examples that illustrate how to do this.
+Both assume you have a matrix ``X`` with `100` rows.
+
+In R:
+
+```r
+
+# This function creates a list with train and test indices for each fold
+k.fold <- function(n, K, shuffle=TRUE, seed=0)
+{
+  idxs <- c(1:n)
+  if (shuffle) {
+    rng <- SyncRNG(seed=seed)
+    idxs <- rng$shuffle(idxs)
+  }
+
+  # Determine fold sizes
+  fsizes <- c(1:K)*0 + floor(n / K)
+  mod <- n %% K
+  if (mod > 0)
+    fsizes[1:mod] <- fsizes[1:mod] + 1
+
+  out <- list(n=n, num.folds=K)
+  current <- 1
+  for (f in 1:K) {
+    fs <- fsizes[f]
+    startidx <- current
+    stopidx <- current + fs - 1
+    test.idx <- idxs[startidx:stopidx]
+    train.idx <- idxs[!(idxs %in% test.idx)]
+    out$testidxs[[f]] <- test.idx
+    out$trainidxs[[f]] <- train.idx
+    current <- stopidx
+  }
+  return(out)
+}
+
+# Which you can use as follows
+folds <- k.fold(nrow(X), K=10, shuffle=T, seed=123)
+for (f in 1:folds$num.folds) {
+  X.train <- X[folds$trainidx[[f]], ]
+  X.test <- X[folds$testidx[[f]], ]
+
+  # continue using X.train and X.test here
+}
+```
+
+And in Python:
+
+```python
+def k_fold(n, K, shuffle=True, seed=0):
+    """Generator for train and test indices"""
+    idxs = list(range(n))
+    if shuffle:
+        rng = SyncRNG(seed=seed)
+        idxs = rng.shuffle(idxs)
+
+    fsizes = [n // K]*K
+    mod = n % K
+    if mod > 0:
+        fsizes[:mod] = [x+1 for x in fsizes[:mod]]
+
+    current = 0
+    for fs in fsizes:
+        startidx = current
+        stopidx = current + fs
+        test_idx = idxs[startidx:stopidx]
+        train_idx = [x for x in idxs if not x in test_idx]
+        yield train_idx, test_idx
+        current = stopidx
+
+# Which you can use as follows
+kf = k_fold(X.shape[0], K=3, shuffle=True, seed=123)
+for trainidx, testidx in kf:
+    X_train = X[trainidx, :]
+    X_test = X[testidx, :]
+
+    # continue using X_train and X_test here
+
+```
+
+## Notes

 The random numbers are uniformly distributed on ``[0, 2^32 - 1]``.

+## Questions and Issues

+If you have questions, comments, or suggestions about SyncRNG or you encounter
+a problem, please open an issue [on
+GitHub](https://github.com/GjjvdBurg/SyncRNG/). Please don't hesitate to
+contact me, you're helping to make this project better for everyone!
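
The fold-size arithmetic in the k-fold examples of this diff does not depend on the RNG, so it can be checked on its own. The following is a minimal Python sketch and is not part of the commit: `fold_bounds` and `check_partition` are hypothetical helper names, and `idxs` stands in for whatever `rng.shuffle` returns. It uses half-open `[start, stop)` ranges, as the Python example does. With 1-based inclusive ranges, as in the R example, the next fold would need to start at `stopidx + 1` for the test folds to stay disjoint.

```python
def fold_bounds(n, K):
    """Return half-open (start, stop) index ranges for K folds over n items.

    Hypothetical helper: the first n % K folds get one extra item,
    mirroring the fold-size logic of the k_fold example.
    """
    fsizes = [n // K] * K
    for i in range(n % K):
        fsizes[i] += 1

    bounds = []
    current = 0
    for fs in fsizes:
        bounds.append((current, current + fs))
        current += fs  # next fold starts where this one stops
    return bounds


def check_partition(idxs, K):
    """Return True if the K test folds cover idxs exactly once each.

    Assumes idxs contains unique values (e.g. a shuffled range).
    """
    seen = []
    for start, stop in fold_bounds(len(idxs), K):
        seen.extend(idxs[start:stop])
    return sorted(seen) == sorted(idxs)


if __name__ == "__main__":
    print(fold_bounds(10, 3))
    print(check_partition(list(range(100)), 10))
```

Running the same check against the indices produced in R (shifted to 0-based) is one way to confirm that both languages yield identical splits for a given seed.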
