# SyncRNG

[![build](https://github.com/GjjvdBurg/SyncRNG/workflows/build/badge.svg)](https://github.com/GjjvdBurg/SyncRNG/actions)
[![CRAN version](https://www.r-pkg.org/badges/version/SyncRNG)](https://cran.r-project.org/web/packages/SyncRNG/index.html)
[![CRAN package downloads](https://cranlogs.r-pkg.org/badges/grand-total/SyncRNG)](https://cran.r-project.org/web/packages/SyncRNG/index.html)
[![PyPI version](https://badge.fury.io/py/SyncRNG.svg)](https://pypi.org/project/SyncRNG)
[![Python package downloads](https://pepy.tech/badge/SyncRNG)](https://pepy.tech/project/SyncRNG)

Generate the same random numbers in R and Python.

## Why?

This program was created because it was desired to have the same random 
numbers in both R and Python programs. Although both languages implement a 
Mersenne-Twister random number generator (RNG), the implementations are so 
different that it is not possible to get the same random numbers, even with 
the same seed.

SyncRNG is a "Tausworthe" RNG implemented in C and linked to both R and 
Python. Since both use the same underlying C code, the random numbers will be 
the same in both languages when the same seed is used.

You can read more about my motivations for creating this 
[here](https://gertjanvandenburg.com/blog/syncrng/).

## Installation

Installing the R package can be done through CRAN:

```
> install.packages('SyncRNG')
```

The Python package can be installed using pip:

```
$ pip install syncrng
```

## Usage

After installing the package, you can use the basic ``SyncRNG`` random number 
generator. In Python you can do:


```python
>>> from SyncRNG import SyncRNG
>>> s = SyncRNG(seed=123456)
>>> for i in range(10):
>>>     print(s.randi())
```

And in R you can use:

```r
> library(SyncRNG)
> s <- SyncRNG(seed=123456)
> for (i in 1:10) {
>    cat(s$randi(), '\n')
> }
```

You'll notice that the random numbers are indeed the same.

### R: User defined RNG

R allows the user to define a custom random number generator, which is then 
used for the common ``runif`` and ``rnorm`` functions in R. This has also been 
implemented in SyncRNG as of version 1.3.0. To enable this, run:

```r
> library(SyncRNG)
> set.seed(123456, 'user', 'user')
> runif(10)
```

These numbers are between [0, 1) and multiplying by ``2**32 - 1`` gives the 
same results as above.

### Functionality

In both R and Python the following methods are available for the ``SyncRNG`` 
class:

1. ``randi()``: generate a random integer on the interval [0, 2^32).
2. ``rand()``: generate a random floating point number on the interval [0.0, 
   1.0)
3. ``randbelow(n)``: generate a random integer below a given integer ``n``.
4. ``shuffle(x)``: generate a permutation of a given list of numbers ``x``.

Functionality is deliberately kept minimal to make maintaining this library 
easier. It is straightforward to build more advanced applications on the 
existing methods, as the following example shows.

### Creating the same train/test splits

A common use case for this package is to create the same train and test splits 
in R and Python. Below are some code examples that illustrate how to do this. 
Both assume you have a matrix ``X`` with `100` rows.

In R:

```r

# This function creates a list with train and test indices for each fold
k.fold <- function(n, K, shuffle=TRUE, seed=0)
{
	idxs <- c(1:n)
	if (shuffle) {
		rng <- SyncRNG(seed=seed)
		idxs <- rng$shuffle(idxs)
	}

	# Determine fold sizes
        fsizes <- c(1:K)*0 + floor(n / K)
        mod <- n %% K
        if (mod > 0)
		fsizes[1:mod] <- fsizes[1:mod] + 1

        out <- list(n=n, num.folds=K)
	current <- 1
        for (f in 1:K) {
		fs <- fsizes[f]
		startidx <- current
		stopidx <- current + fs - 1
		test.idx <- idxs[startidx:stopidx]
		train.idx <- idxs[!(idxs %in% test.idx)]
		out$testidxs[[f]] <- test.idx
		out$trainidxs[[f]] <- train.idx
		current <- stopidx
	}
	return(out)
}

# Which you can use as follows
folds <- k.fold(nrow(X), K=10, shuffle=T, seed=123)
for (f in 1:folds$num.folds) {
        X.train <- X[folds$trainidx[[f]], ]
        X.test <- X[folds$testidx[[f]], ]

        # continue using X.train and X.test here
}
```

And in Python:

```python
def k_fold(n, K, shuffle=True, seed=0):
    """Generator for train and test indices"""
    idxs = list(range(n))
    if shuffle:
        rng = SyncRNG(seed=seed)
        idxs = rng.shuffle(idxs)

    fsizes = [n // K]*K
    mod = n % K
    if mod > 0:
        fsizes[:mod] = [x+1 for x in fsizes[:mod]]

    current = 0
    for fs in fsizes:
        startidx = current
        stopidx = current + fs
        test_idx = idxs[startidx:stopidx]
        train_idx = [x for x in idxs if not x in test_idx]
        yield train_idx, test_idx
        current = stopidx

# Which you can use as follows
kf = k_fold(X.shape[0], K=3, shuffle=True, seed=123)
for trainidx, testidx in kf:
    X_train = X[trainidx, :]
    X_test = X[testidx, :]

    # continue using X_train and X_test here
```

## Notes

The random numbers are uniformly distributed on ``[0, 2^32 - 1]``. No 
attention has been paid to thread-safety and you shouldn't use this random 
number generator for cryptographic applications.

## Questions and Issues

If you have questions, comments, or suggestions about SyncRNG or you encounter 
a problem, please open an issue [on 
GitHub](https://github.com/GjjvdBurg/SyncRNG/). Please don't hesitate to 
contact me, you're helping to make this project better for everyone! If you 
prefer not to use Github you can email me at ``gertjanvandenburg at gmail dot 
com``.