1 files changed, 300 insertions, 0 deletions
diff --git a/docs/README.rst b/docs/README.rst
new file mode 100644
index 0000000..70a27d2
--- /dev/null
+++ b/docs/README.rst
@@ -0,0 +1,300 @@
+
+GenSVM Python Package
+=====================
+
+
+.. image:: https://travis-ci.org/GjjvdBurg/PyGenSVM.svg?branch=master
+   :target: https://travis-ci.org/GjjvdBurg/PyGenSVM
+   :alt: Build Status
+
+
+.. image:: https://readthedocs.org/projects/gensvm/badge/?version=latest
+   :target: https://gensvm.readthedocs.io/en/latest/?badge=latest
+   :alt: Documentation Status
+
+
+This is the Python package for the GenSVM multiclass classifier by `Gerrit 
+J.J. van den Burg <https://gertjanvandenburg.com>`_ and `Patrick J.F. 
+Groenen <https://personal.eur.nl/groenen/>`_.
+
+**Useful links:**
+
+
+* `PyGenSVM on GitHub <https://github.com/GjjvdBurg/PyGenSVM>`_
+* `PyGenSVM on PyPI <https://pypi.org/project/gensvm/>`_
+* `Package documentation <https://gensvm.readthedocs.io/en/latest/>`_
+* Journal paper: `GenSVM: A Generalized Multiclass Support Vector 
+  Machine <http://www.jmlr.org/papers/v17/14-526.html>`_, JMLR, 17(225):1-42, 
+  2016.
+* There is also an `R package <https://github.com/GjjvdBurg/RGenSVM>`_
+* Or you can directly use `the C library <https://github.com/GjjvdBurg/GenSVM>`_
+
+Installation
+------------
+
+GenSVM requires a working NumPy installation **before** it can be installed, 
+so install NumPy first and then GenSVM, using the following command:
+
+.. code-block:: bash
+
+   $ pip install numpy && pip install gensvm
+
+If you encounter any errors, please `open an issue on 
+GitHub <https://github.com/GjjvdBurg/PyGenSVM>`_. Don't hesitate: you're helping 
+to make this project better!
+
+Citing
+------
+
+If you use this package in your research, please cite the paper, for instance 
+using the following BibTeX entry:
+
+.. code-block:: bib
+
+   @article{JMLR:v17:14-526,
+           author  = {{van den Burg}, G. J. J. and Groenen, P. J. F.},
+           title   = {{GenSVM}: A Generalized Multiclass Support Vector Machine},
+           journal = {Journal of Machine Learning Research},
+           year    = {2016},
+           volume  = {17},
+           number  = {225},
+           pages   = {1-42},
+           url     = {http://jmlr.org/papers/v17/14-526.html}
+   }
+
+Usage
+-----
+
+The package contains two classes to fit the GenSVM model: `GenSVM <https://gensvm.readthedocs.io/en/latest/#gensvm>`_ and 
+`GenSVMGridSearchCV <https://gensvm.readthedocs.io/en/latest/#gensvmgridsearchcv>`_.  These classes respectively fit a single GenSVM model or 
+fit a series of models for a parameter grid search. The interface to these 
+classes is the same as that of classifiers in `Scikit-Learn <http://scikit-learn.org/stable/index.html>`_  so users 
+familiar with Scikit-Learn should have no trouble using this package.  Below 
+we will show some examples of using the GenSVM classifier and the 
+GenSVMGridSearchCV class in practice.
+
+In the examples we assume that we have loaded the `iris 
+dataset <http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html>`_ 
+from Scikit-Learn as follows:
+
+.. code-block:: python
+
+   >>> from sklearn.datasets import load_iris
+   >>> from sklearn.model_selection import train_test_split
+   >>> from sklearn.preprocessing import MaxAbsScaler
+   >>> X, y = load_iris(return_X_y=True)
+   >>> X_train, X_test, y_train, y_test = train_test_split(X, y)
+   >>> scaler = MaxAbsScaler().fit(X_train)
+   >>> X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
+
+Note that we scale the data using the 
+`MaxAbsScaler <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html>`_
+class. This scales the columns of the data matrix to ``[-1, 1]`` without 
+breaking sparsity. Scaling the dataset can have a significant effect on the 
+computation time of GenSVM and is `generally recommended for 
+SVMs <https://stats.stackexchange.com/q/65094>`_.
+
+Example 1: Fitting a single GenSVM model
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Let's start by fitting the most basic GenSVM model on the training data:
+
+.. code-block:: python
+
+   >>> from gensvm import GenSVM
+   >>> clf = GenSVM()
+   >>> clf.fit(X_train, y_train)
+   GenSVM(coef=0.0, degree=2.0, epsilon=1e-06, gamma='auto', kappa=0.0,
+   kernel='linear', kernel_eigen_cutoff=1e-08, lmd=1e-05,
+   max_iter=100000000.0, p=1.0, random_state=None, verbose=0,
+   weights='unit')
+
+With the model fitted, we can predict the labels of the test dataset:
+
+.. code-block:: python
+
+   >>> y_pred = clf.predict(X_test)
+
+Next, we can compute a score for the predictions. The GenSVM class has a 
+``score`` method which computes the 
+`accuracy_score <http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html>`_
+for the predictions. In the GenSVM paper, the `adjusted Rand 
+index <https://en.wikipedia.org/wiki/Rand_index#Adjusted_Rand_index>`_ is often 
+used to compare performance. We illustrate both options below (your results 
+may be different depending on the exact train/test split):
+
+.. code-block:: python
+
+   >>> clf.score(X_test, y_test)
+   1.0
+   >>> from sklearn.metrics import adjusted_rand_score
+   >>> adjusted_rand_score(y_test, clf.predict(X_test))
+   1.0
+
+We can try this again with different model parameters. For instance, we can 
+turn on verbosity and use the Euclidean norm in the GenSVM model by setting ``p=2``\ :
+
+.. code-block:: python
+
+   >>> clf2 = GenSVM(verbose=True, p=2)
+   >>> clf2.fit(X_train, y_train)
+   Starting main loop.
+   Dataset:
+       n = 112
+       m = 4
+       K = 3
+   Parameters:
+       kappa = 0.000000
+       p = 2.000000
+       lambda = 0.0000100000000000
+       epsilon = 1e-06
+
+   iter = 0, L = 3.4499531579689533, Lbar = 7.3369415851139745, reldiff = 1.1266786095824437
+   ...
+   Optimization finished, iter = 4046, loss = 0.0230726364692517, rel. diff. = 0.0000009998645783
+   Number of support vectors: 9
+   GenSVM(coef=0.0, degree=2.0, epsilon=1e-06, gamma='auto', kappa=0.0,
+       kernel='linear', kernel_eigen_cutoff=1e-08, lmd=1e-05,
+       max_iter=100000000.0, p=2, random_state=None, verbose=True,
+       weights='unit')
+
+For other parameters that can be tuned in the GenSVM model, see `GenSVM <https://gensvm.readthedocs.io/en/latest/#gensvm>`_.
+
+Example 2: Fitting a GenSVM model with a "warm start"
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+One of the key features of the GenSVM classifier is that training can be 
+accelerated by using so-called "warm starts". This way the optimization can be 
+started in a location that is closer to the final solution than a random 
+starting position would be. To support this, the ``fit`` method of the GenSVM 
+class has an optional ``seed_V`` parameter. We'll illustrate how this can be 
+used below.
+
+We start with a relatively large value for the ``epsilon`` parameter in the 
+model. This is the stopping parameter that determines how long the 
+optimization continues (and therefore how exact the fit is).
+
+.. code-block:: python
+
+   >>> clf1 = GenSVM(epsilon=1e-3)
+   >>> clf1.fit(X_train, y_train)
+   ...
+   >>> clf1.n_iter_
+   163
+
+The ``n_iter_`` attribute tells us how many iterations the optimizer ran. Now, we 
+can use the solution of this model to start the training for the next model:
+
+.. code-block:: python
+
+   >>> clf2 = GenSVM(epsilon=1e-8)
+   >>> clf2.fit(X_train, y_train, seed_V=clf1.combined_coef_)
+   ...
+   >>> clf2.n_iter_
+   3196
+
+Compare this to a model with the same stopping parameter, but without the warm 
+start:
+
+.. code-block:: python
+
+   >>> clf2.fit(X_train, y_train)
+   ...
+   >>> clf2.n_iter_
+   3699
+
+So we saved about 500 iterations! This effect will be especially significant 
+with large datasets and when you try out many parameter configurations. 
+Therefore this technique is built into the `GenSVMGridSearchCV <https://gensvm.readthedocs.io/en/latest/#gensvmgridsearchcv>`_ class that can 
+be used to do a grid search of parameters.
+
+Example 3: Running a GenSVM grid search
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Often when we're fitting a machine learning model such as GenSVM, we have to 
+try several parameter configurations to figure out which one performs best on 
+our given dataset. This is usually combined with `cross 
+validation <http://scikit-learn.org/stable/modules/cross_validation.html>`_ to 
+avoid overfitting. To do this efficiently and to make use of warm starts, the 
+`GenSVMGridSearchCV <https://gensvm.readthedocs.io/en/latest/#gensvmgridsearchcv>`_ class is available. This class works in the same way as 
+the 
+`GridSearchCV <http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html>`_
+class of `Scikit-Learn <http://scikit-learn.org/stable/index.html>`_\ , but uses the GenSVM C library for speed.
+
+To do a grid search, we first have to define the parameters that we want to 
+vary and what values we want to try:
+
+.. code-block:: python
+
+   >>> from gensvm import GenSVMGridSearchCV
+   >>> param_grid = {'p': [1.0, 2.0], 'lmd': [1e-8, 1e-6, 1e-4, 1e-2, 1.0], 'kappa': [-0.9, 0.0] }
+
+For the values that are not varied in the parameter grid, the default values 
+will be used. This means that if you want to change a specific value (such as 
+``epsilon`` for instance), you can add this to the parameter grid as a 
+parameter with a single value to try (e.g. ``'epsilon': [1e-8]``\ ).
+
+Running the grid search is now straightforward:
+
+.. code-block:: python
+
+   >>> gg = GenSVMGridSearchCV(param_grid)
+   >>> gg.fit(X_train, y_train)
+   GenSVMGridSearchCV(cv=None, iid=True,
+         param_grid={'p': [1.0, 2.0], 'lmd': [1e-08, 1e-06, 0.0001, 0.01, 1.0], 'kappa': [-0.9, 0.0]},
+         refit=True, return_train_score=True, scoring=None, verbose=0)
+
+Note that if we have set ``refit=True`` (the default), then we can use the 
+`GenSVMGridSearchCV <https://gensvm.readthedocs.io/en/latest/#gensvmgridsearchcv>`_ instance to predict or score using the best estimator 
+found in the grid search:
+
+.. code-block:: python
+
+   >>> y_pred = gg.predict(X_test)
+   >>> gg.score(X_test, y_test)
+   1.0
+
+A nice feature borrowed from `Scikit-Learn`_ is that the results from the grid 
+search can be represented as a ``pandas`` DataFrame:
+
+.. code-block:: python
+
+   >>> from pandas import DataFrame
+   >>> df = DataFrame(gg.cv_results_)
+
+This can make it easier to explore the results of the grid search.
+
+Known Limitations
+-----------------
+
+The following are known limitations that are on the roadmap for a future 
+release of the package. If you need any of these features, please vote on them 
+on the linked GitHub issues (this can make us add them sooner!).
+
+
+#. `Support for sparse 
+   matrices <https://github.com/GjjvdBurg/PyGenSVM/issues/1>`_. SciPy supports 
+   sparse matrices, as does the GenSVM C library. Getting them to work 
+   together requires some additional effort. In the meantime, if you really 
+   want to use sparse data with GenSVM (this can lead to significant 
+   speedups!), check out the GenSVM C library.
+#. `Specification of class misclassification 
+   weights <https://github.com/GjjvdBurg/PyGenSVM/issues/3>`_. Currently, 
+   incorrectly classifying an object from class A to class C is as bad as 
+   incorrectly classifying an object from class B to class C. Depending on the 
+   application, this may not be the desired effect. Adding class 
+   misclassification weights can solve this issue.
+
+Questions and Issues
+--------------------
+
+If you have any questions or encounter any issues with using this package, 
+please raise them on `GitHub <https://github.com/GjjvdBurg/PyGenSVM>`_.
+
+License
+-------
+
+This package is licensed under the GNU General Public License version 3. 
+
+Copyright (c) G.J.J. van den Burg, excluding the sections of the code that are 
+explicitly marked to come from Scikit-Learn.