added documentation

author: Gertjan van den Burg <gertjanvandenburg@gmail.com> 2017-12-12 20:19:12 -0500
committer: Gertjan van den Burg <gertjanvandenburg@gmail.com> 2017-12-12 20:19:12 -0500
commit: 7d255c08c589a443aa72ff247b46022204a2ef22 (patch)
tree: 68c8f872966852d5627cef748da05612f693e4ef
parent: added gridsearch and extended gensvm class (diff)
download: pygensvm-7d255c08c589a443aa72ff247b46022204a2ef22.tar.gz
pygensvm-7d255c08c589a443aa72ff247b46022204a2ef22.zip
5 files changed, 526 insertions, 0 deletions
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
new file mode 100644
index 0000000..9a203e1
--- /dev/null
+++ b/CHANGELOG.rst
@@ -0,0 +1,2 @@
+Change Log
+==========
diff --git a/README.rst b/README.rst
index e69de29..0182103 100644
--- a/README.rst
+++ b/README.rst
@@ -0,0 +1,267 @@
+GenSVM Python Package
+=====================
+
+This is the documentation of the Python package for the GenSVM classifier, 
+introduced in `GenSVM: A Generalized Multiclass Support Vector Machine 
+<http://www.jmlr.org/papers/v17/14-526.html>`_ by `Gerrit J.J. van den Burg 
+<https://gertjanvandenburg.com>`_ and `Patrick J.F. Groenen 
+<https://personal.eur.nl/groenen/>`_.
+
+The source code of this package is available on GitHub at: 
+`https://github.com/GjjvdBurg/PyGenSVM 
+<https://github.com/GjjvdBurg/PyGenSVM>`_.
+
+Installation
+------------
+
+GenSVM can be easily installed through pip:
+
+.. code:: bash
+
+    pip install gensvm
+
+Usage
+-----
+
+The package contains two classes to fit the GenSVM model: :class:`GenSVM` and 
+:class:`GenSVMGridSearchCV`. These classes respectively fit a single GenSVM 
+model or fit a series of models for a parameter grid search. The interface to 
+these classes is the same as that of classifiers in `Scikit-Learn <http://scikit-learn.org/stable/index.html>`_ so users 
+familiar with `Scikit-Learn <http://scikit-learn.org/stable/index.html>`_ should have no trouble using this package. Below 
+we will show some examples of using the GenSVM classifier and the 
+GenSVMGridSearchCV class in practice.
+
+In the examples we assume that we have loaded the `iris dataset
+<http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html>`_ 
+from Scikit-Learn as follows:
+
+.. code:: python
+
+    >>> from sklearn.datasets import load_iris
+    >>> from sklearn.model_selection import train_test_split
+    >>> from sklearn.preprocessing import maxabs_scale
+    >>> X, y = load_iris(return_X_y=True)
+    >>> X = maxabs_scale(X)
+    >>> X_train, X_test, y_train, y_test = train_test_split(X, y)
+
+Note that we scale the data using the `maxabs_scale 
+<http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.maxabs_scale.html>`_ 
+function. This scales the columns of the data matrix to ``[-1, 1]`` without 
+breaking sparsity. Scaling the dataset can have a significant effect on the 
+computation time of GenSVM and is `generally recommended for SVMs 
+<https://stats.stackexchange.com/q/65094>`_.
+
+
+Example 1: Fitting a single GenSVM model
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Let's start by fitting the most basic GenSVM model on the training data:
+
+.. code:: python
+
+    >>> from gensvm import GenSVM
+    >>> clf = GenSVM()
+    >>> clf.fit(X_train, y_train)
+    GenSVM(coef=0.0, degree=2.0, epsilon=1e-06, gamma='auto', kappa=0.0,
+    kernel='linear', kernel_eigen_cutoff=1e-08, lmd=1e-05,
+    max_iter=100000000.0, p=1.0, random_state=None, verbose=0,
+    weights='unit')
+
+
+With the model fitted, we can predict the test dataset:
+
+.. code:: python
+
+    >>> y_pred = clf.predict(X_test)
+
+Next, we can compute a score for the predictions. The GenSVM class has a 
+``score`` method which computes the `accuracy_score 
+<http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html>`_ 
+for the predictions. In the GenSVM paper, the `adjusted Rand index 
+<https://en.wikipedia.org/wiki/Rand_index#Adjusted_Rand_index>`_ is often used 
+to compare performance. We illustrate both options below (your results may be 
+different depending on the exact train/test split):
+
+.. code:: python
+
+    >>> clf.score(X_test, y_test)
+    1.0
+    >>> from sklearn.metrics import adjusted_rand_score
+    >>> adjusted_rand_score(clf.predict(X_test), y_test)
+    1.0
+
+We can try this again with different model parameters. For instance, we can 
+turn on verbosity and use the Euclidean norm in the GenSVM model by setting ``p = 2``:
+
+.. code:: python
+
+    >>> clf2 = GenSVM(verbose=True, p=2)
+    >>> clf2.fit(X_train, y_train)
+    Starting main loop.
+    Dataset:
+        n = 112
+        m = 4
+        K = 3
+    Parameters:
+        kappa = 0.000000
+        p = 2.000000
+        lambda = 0.0000100000000000
+        epsilon = 1e-06
+    
+    iter = 0, L = 3.4499531579689533, Lbar = 7.3369415851139745, reldiff = 1.1266786095824437
+    ...
+    Optimization finished, iter = 4046, loss = 0.0230726364692517, rel. diff. = 0.0000009998645783
+    Number of support vectors: 9
+    GenSVM(coef=0.0, degree=2.0, epsilon=1e-06, gamma='auto', kappa=0.0,
+        kernel='linear', kernel_eigen_cutoff=1e-08, lmd=1e-05,
+        max_iter=100000000.0, p=2, random_state=None, verbose=True,
+        weights='unit')
+
+For other parameters that can be tuned in the GenSVM model, see `GenSVM`_.
+
+
+Example 2: Fitting a GenSVM model with a "warm start"
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+One of the key features of the GenSVM classifier is that training can be 
+accelerated by using so-called "warm starts". This way the optimization can be 
+started in a location that is closer to the final solution than a random 
+starting position would be. To support this, the ``fit`` method of the GenSVM 
+class has an optional ``seed_V`` parameter. We'll illustrate how this can be 
+used below.
+
+We start with a relatively large value for the ``epsilon`` parameter in the 
+model. This is the stopping parameter that determines how long the 
+optimization continues (and therefore how exact the fit is).
+
+.. code:: python
+
+    >>> clf1 = GenSVM(epsilon=1e-3)
+    >>> clf1.fit(X_train, y_train)
+    ...
+    >>> clf1.n_iter_
+    163
+
+The ``n_iter_`` attribute tells us how many iterations the optimization ran. Now, we 
+can use the solution of this model to start the training for the next model:
+
+.. code:: python
+
+    >>> clf2 = GenSVM(epsilon=1e-8)
+    >>> clf2.fit(X_train, y_train, seed_V=clf1.combined_coef_)
+    ...
+    >>> clf2.n_iter_
+    3196
+
+Compare this to a model with the same stopping parameter, but without the warm 
+start:
+
+.. code:: python
+
+    >>> clf2.fit(X_train, y_train)
+    ...
+    >>> clf2.n_iter_
+    3699
+
+So we saved about 500 iterations! This effect will be especially significant 
+with large datasets and when you try out many parameter configurations.  
+Therefore this technique is built into the `GenSVMGridSearchCV`_ class that 
+can be used to do a grid search of parameters.
+
+
+Example 3: Running a GenSVM grid search
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Often when we're fitting a machine learning model such as GenSVM, we have to 
+try several parameter configurations to figure out which one performs best on 
+our given dataset. This is usually combined with `cross validation 
+<http://scikit-learn.org/stable/modules/cross_validation.html>`_ to avoid 
+overfitting. To do this efficiently and to make use of warm starts, the 
+`GenSVMGridSearchCV`_ class is available. This class works in the same way as 
+the `GridSearchCV 
+<http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html>`_ 
+class of `Scikit-Learn <http://scikit-learn.org/stable/index.html>`_, but uses 
+the GenSVM C library for speed.
+
+To do a grid search, we first have to define the parameters that we want to 
+vary and what values we want to try:
+
+.. code:: python
+
+    >>> from gensvm import GenSVMGridSearchCV
+    >>> param_grid = {'p': [1.0, 2.0], 'lmd': [1e-8, 1e-6, 1e-4, 1e-2, 1.0], 'kappa': [-0.9, 0.0] }
+
+For the values that are not varied in the parameter grid, the default values 
+will be used. This means that if you want to change a specific value (such as 
+``epsilon`` for instance), you can add this to the parameter grid as a 
+parameter with a single value to try (e.g. ``'epsilon': [1e-8]``).
+
+Running the grid search is now straightforward:
+
+.. code:: python
+
+    >>> gg = GenSVMGridSearchCV(param_grid)
+    >>> gg.fit(X_train, y_train)
+    GenSVMGridSearchCV(cv=None, iid=True,
+          param_grid={'p': [1.0, 2.0], 'lmd': [1e-08, 1e-06, 0.0001, 0.01, 1.0], 'kappa': [-0.9, 0.0]},
+          refit=True, return_train_score=True, scoring=None, verbose=0)
+
+Note that if we have set ``refit=True`` (the default), then we can use the 
+`GenSVMGridSearchCV`_ instance to predict or score using the best estimator 
+found in the grid search:
+
+.. code:: python
+
+    >>> y_pred = gg.predict(X_test)
+    >>> gg.score(X_test, y_test)
+    1.0
+
+A nice feature borrowed from `Scikit-Learn <http://scikit-learn.org>`_ is that 
+the results from the grid search can be represented as a ``pandas`` DataFrame:
+
+.. code:: python
+
+    >>> from pandas import DataFrame
+    >>> df = DataFrame(gg.cv_results_)
+
+This can make it easier to explore the results of the grid search.
+
+Known Limitations
+-----------------
+
+The following are known limitations that are on the roadmap for a future 
+release of the package. If you need any of these features, please vote on them 
+on the linked GitHub issues (this can make us add them sooner!).
+
+1. `Support for sparse matrices 
+   <https://github.com/GjjvdBurg/PyGenSVM/issues/1>`_. SciPy supports sparse 
+   matrices, as does the GenSVM C library. Getting them to work together 
+   requires some time. In the meantime, if you really want to use sparse data 
+   with GenSVM (this can lead to significant speedups!), check out the GenSVM 
+   C library.
+2. `Specification of instance weights 
+   <https://github.com/GjjvdBurg/PyGenSVM/issues/2>`_. Currently the package 
+   allows for two modes of instance weights: ``unit`` weights where each 
+   instance gets weight 1 and ``group`` weights where instances get weights 
+   inversely proportional to the size of their class. In the future, we want 
+   to allow the user to specify a vector of weights as well.
+3. `Specification of class misclassification weights 
+   <https://github.com/GjjvdBurg/PyGenSVM/issues/3>`_. Currently, incorrectly 
+   classifying an object from class A to class C is as bad as incorrectly 
+   classifying an object from class B to class C. Depending on the 
+   application, this may not be the desired effect. Adding class 
+   misclassification weights can solve this issue.
+
+Questions and Issues
+--------------------
+
+If you have any questions or encounter any issues with using this package, 
+please ask them on `GitHub <https://github.com/GjjvdBurg/PyGenSVM>`_.
+
+License
+-------
+
+This package is licensed under the GNU General Public License version 3.  
+Copyright G.J.J. van den Burg, excluding the sections of the code that are 
+explicitly marked to come from Scikit-Learn.
+
diff --git a/docs/Makefile b/docs/Makefile
new file mode 100644
index 0000000..ac6c1f0
--- /dev/null
+++ b/docs/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line.
+SPHINXOPTS    =
+SPHINXBUILD   = python -msphinx
+SPHINXPROJ    = GenSVM
+SOURCEDIR     = .
+BUILDDIR      = ../../gensvm_docs
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/conf.py b/docs/conf.py
new file mode 100644
index 0000000..a5c06ea
--- /dev/null
+++ b/docs/conf.py
@@ -0,0 +1,212 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+#
+# GenSVM documentation build configuration file, created by
+# sphinx-quickstart on Tue Sep 26 00:11:33 2017.
+#
+# This file is execfile()d with the current directory set to its
+# containing dir.
+#
+# Note that not all possible configuration values are present in this
+# autogenerated file.
+#
+# All configuration values have a default; values that are commented out
+# serve to show the default.
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+
+import os
+import sys
+import sphinx_rtd_theme
+
+from unittest.mock import MagicMock
+
+sys.path.insert(0, os.path.abspath('..'))
+
+# mock out C extensions for ReadTheDocs 
+# (http://docs.readthedocs.io/en/latest/faq.html)
+class Mock(MagicMock):
+    @classmethod
+    def __getattr__(cls, name):
+        return MagicMock()
+
+MOCK_MODULES = ['gensvm.wrapper']
+sys.modules.update((mod_name, Mock()) for mod_name in MOCK_MODULES)
+
+
+# -- General configuration ------------------------------------------------
+
+# If your documentation needs a minimal Sphinx version, state it here.
+#
+# needs_sphinx = '1.0'
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = ['sphinx.ext.autodoc',
+    'sphinx.ext.doctest',
+    'sphinx.ext.coverage',
+    'sphinx.ext.mathjax',
+    'sphinx.ext.githubpages',
+    'sphinx.ext.napoleon',
+    'sphinx.ext.intersphinx'
+    ]
+
+# intersphinx mappings (https://kev.inburke.com/kevin/sphinx-interlinks/)
+# https://stackoverflow.com/q/46080681
+intersphinx_mapping = {
+        'sklearn': ('http://scikit-learn.org/stable', None)
+        }
+
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# The suffix(es) of source filenames.
+# You can specify multiple suffix as a list of string:
+#
+# source_suffix = ['.rst', '.md']
+source_suffix = '.rst'
+
+# The master toctree document.
+master_doc = 'index'
+
+# General information about the project.
+project = 'GenSVM'
+copyright = '2017, Gertjan van den Burg'
+author = 'Gertjan van den Burg'
+
+# The version info for the project you're documenting, acts as replacement for
+# |version| and |release|, also used in various other places throughout the
+# built documents.
+#
+# The short X.Y version.
+#version = '0.1.0'
+# The full version, including alpha/beta/rc tags.
+#release = '0.1.0'
+__version__ = "1.0.0"
+try:
+    pth = os.path.realpath(__file__)
+    dr = os.path.dirname(pth)
+    init_pth = os.path.realpath(os.path.join(dr, '..', 'gensvm', 
+        '__init__.py'))
+    line = open(init_pth).readlines()[0]
+    __version__ = line.split('=')[-1].strip("\n '")
+except:
+    pass
+
+version = __version__
+release = version
+
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+#
+# This is also used if you do content translation via gettext catalogs.
+# Usually you set "language" from the command line for these cases.
+language = None
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# These patterns also affect html_static_path and html_extra_path
+exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
+
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = 'sphinx'
+
+# If true, `todo` and `todoList` produce output, else they produce nothing.
+todo_include_todos = False
+
+
+# -- Options for HTML output ----------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'sphinx_rtd_theme'
+html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
+
+# Theme options are theme-specific and customize the look and feel of a theme
+# further.  For a list of options available for each theme, see the
+# documentation.
+#
+# html_theme_options = {}
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+# Custom sidebar templates, must be a dictionary that maps document names
+# to template names.
+#
+# This is required for the alabaster theme
+# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
+html_sidebars = {
+    '**': [
+        'about.html',
+        'navigation.html',
+        'relations.html',  # needs 'show_related': True theme option to display
+        'searchbox.html',
+        'donate.html',
+    ]
+}
+
+
+# -- Options for HTMLHelp output ------------------------------------------
+
+# Output file base name for HTML help builder.
+htmlhelp_basename = 'GenSVMdoc'
+
+
+# -- Options for LaTeX output ---------------------------------------------
+
+latex_elements = {
+    # The paper size ('letterpaper' or 'a4paper').
+    #
+    # 'papersize': 'letterpaper',
+
+    # The font size ('10pt', '11pt' or '12pt').
+    #
+    # 'pointsize': '10pt',
+
+    # Additional stuff for the LaTeX preamble.
+    #
+    # 'preamble': '',
+
+    # Latex figure (float) alignment
+    #
+    # 'figure_align': 'htbp',
+}
+
+# Grouping the document tree into LaTeX files. List of tuples
+# (source start file, target name, title,
+#  author, documentclass [howto, manual, or own class]).
+latex_documents = [
+    (master_doc, 'GenSVM.tex', 'GenSVM Documentation',
+     'Gertjan van den Burg', 'manual'),
+]
+
+
+# -- Options for manual page output ---------------------------------------
+
+# One entry per manual page. List of tuples
+# (source start file, name, description, authors, manual section).
+man_pages = [
+    (master_doc, 'gensvm', 'GenSVM Documentation',
+     [author], 1)
+]
+
+
+# -- Options for Texinfo output -------------------------------------------
+
+# Grouping the document tree into Texinfo files. List of tuples
+# (source start file, target name, title, author,
+#  dir menu entry, description, category)
+texinfo_documents = [
+    (master_doc, 'GenSVM', 'GenSVM Documentation',
+     author, 'GenSVM', 'Implementation of the GenSVM classifier in Python',
+     'Miscellaneous'),
+]
diff --git a/docs/index.rst b/docs/index.rst
new file mode 100644
index 0000000..d8f8425
--- /dev/null
+++ b/docs/index.rst
@@ -0,0 +1,25 @@
+.. GenSVM documentation master file, created by
+   sphinx-quickstart on Tue Sep 26 00:11:33 2017.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+
+.. include:: ../README.rst
+
+Classes
+=======
+
+The complete documentation of the available GenSVM classes is presented below.  
+
+GenSVM
+------
+
+.. autoclass:: gensvm.core.GenSVM
+
+GenSVMGridSearchCV
+------------------
+
+.. autoclass:: gensvm.gridsearch.GenSVMGridSearchCV
+
+
+.. include:: ../CHANGELOG.rst