GenSVM
======
This is the repository for the C implementation of *GenSVM*, a generalized
multiclass support vector machine proposed in:

> [GenSVM: A Generalized Multiclass Support Vector
> Machine](http://jmlr.org/papers/v17/14-526.html)\
> G.J.J. van den Burg and P.J.F. Groenen\
> *Journal of Machine Learning Research*, 2016.

GenSVM is available in these languages:

Language | URL
:-------:|:-------:
Python | [https://github.com/GjjvdBurg/PyGenSVM](https://github.com/GjjvdBurg/PyGenSVM)
R | [https://github.com/GjjvdBurg/RGenSVM](https://github.com/GjjvdBurg/RGenSVM)
C | [https://github.com/GjjvdBurg/GenSVM](https://github.com/GjjvdBurg/GenSVM)

Introduction
------------
GenSVM is a generalized multiclass support vector machine, which you can use
for classification problems with multiple classes. Because GenSVM supports warm
starts, it can be trained efficiently in cross-validation and grid search
setups. See the [paper](http://jmlr.org/papers/v17/14-526.html) for more
information, and the Usage section below for how to use GenSVM.
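
A toy problem can illustrate why warm starts help. The following hypothetical sketch is deliberately simple. It does not use GenSVM's solver (the paper uses iterative majorization) and instead runs plain gradient descent on a one-parameter objective whose minimizer, 1/(1 + λ), shifts only slightly between neighbouring values of λ. Starting each fit from the previous solution takes fewer iterations than starting from zero. GenSVM applies the same principle when it reuses solutions across a grid search. The names `toy_fit` and `grid_iterations` are illustrative only.

```c
#include <assert.h>

/* Toy objective (hypothetical, NOT the GenSVM loss):
 *   f(w) = 0.5 * (w - 1)^2 + 0.5 * lambda * w^2,
 * with minimizer w* = 1 / (1 + lambda) for lambda > -1.
 * Runs gradient descent from w0 until |f'(w)| < tol, stores the solution
 * in *w_out and returns the number of iterations used. */
static int toy_fit(double lambda, double w0, double tol, double *w_out)
{
    assert(lambda > -1.0); /* f is strictly convex only in this case */
    double w = w0;
    double step = 0.5 / (1.0 + lambda); /* half of 1/L, with L = 1 + lambda */
    int iter = 0;
    for (;;) {
        double grad = (1.0 + lambda) * w - 1.0;
        double abs_grad = grad < 0.0 ? -grad : grad;
        if (abs_grad < tol || iter >= 10000)
            break;
        w -= step * grad;
        iter++;
    }
    *w_out = w;
    return iter;
}

/* Fit the toy model for every lambda in the grid and return the total
 * number of iterations. With warm_start != 0 each fit starts from the
 * previous solution; otherwise every fit starts from w = 0. */
static int grid_iterations(const double *lambdas, int n, int warm_start)
{
    double w = 0.0;
    int total = 0;
    for (int i = 0; i < n; i++) {
        double start = warm_start ? w : 0.0;
        total += toy_fit(lambdas[i], start, 1e-8, &w);
    }
    return total;
}
```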

The library supports datasets in
[MSVMpack](https://members.loria.fr/FLauer/files/MSVMpack/MSVMpack.html) and
[LibSVM/SVMlight](https://www.csie.ntu.edu.tw/~cjlin/libsvm/) format, and can
take advantage of sparsity in the data. There is also preliminary support for
nonlinear GenSVM through kernels.
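
Sparse formats keep files small. The following sketch converts one dense feature vector into a line in the LibSVM/SVMlight format (`label index:value ...`, where feature indices are 1-based and increasing) and skips zero entries. The helper `write_libsvm_line` is hypothetical and is not part of the GenSVM library. For the exact input requirements of `gensvm`, consult the documentation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of GenSVM): write one instance with an
 * integer class label and n dense feature values into buf as a
 * LibSVM/SVMlight line "label index:value ...". Zero features are
 * omitted, which is what keeps sparse data files small. Returns the
 * length of the line, or -1 if it does not fit in buflen bytes (in that
 * case buf may hold a truncated line). */
static int write_libsvm_line(char *buf, size_t buflen, int label,
                             const double *x, size_t n)
{
    assert(buf != NULL && buflen > 0);
    int len = snprintf(buf, buflen, "%d", label);
    if (len < 0 || (size_t)len >= buflen)
        return -1;
    size_t used = (size_t)len;
    for (size_t j = 0; j < n; j++) {
        if (x[j] == 0.0)
            continue; /* sparse format: skip zero entries */
        /* LibSVM feature indices are 1-based and increasing */
        len = snprintf(buf + used, buflen - used, " %zu:%g", j + 1, x[j]);
        if (len < 0 || (size_t)len >= buflen - used)
            return -1;
        used += (size_t)len;
    }
    assert(strlen(buf) == used); /* bookkeeping matches the string */
    return (int)used;
}
```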

For documentation on how the library is implemented, see the [Doxygen
documentation](https://gjjvdburg.github.io/GenSVM/). The many unit tests are
another good way to understand how the library works, and the [test coverage
report](https://gjjvdburg.github.io/GenSVM/cover) for the latest version of the
library is available online.

This is the C library for GenSVM, which contains two executables for using the
method. A Python package for GenSVM is available
[here](https://github.com/GjjvdBurg/PyGenSVM), and an R package is available
[here](https://github.com/GjjvdBurg/RGenSVM).

Usage
-----
First, download and compile the library. The minimal requirements for
compilation are working BLAS and LAPACK installations, which you can likely
obtain from your package manager. However, it is recommended to use the ATLAS
versions of these libraries, since they give a significant increase in speed.
If you choose not to use ATLAS, remove ``-latlas`` from the ``LDFLAGS``
variable in the Makefile.

Then compile the library with:
```
$ make
```
If you would like to run the tests, use ``make test`` on the command line.

After successful compilation, you will have two executables, ``gensvm`` and
``gensvm_grid``. To get an overview of the command line options of ``gensvm``,
type:
```
$ ./gensvm
```
The same works for ``gensvm_grid``.

The ``gensvm`` executable trains a GenSVM model on a dataset with a single
hyperparameter configuration, whereas the ``gensvm_grid`` executable runs a
grid search on a dataset.

Here's an example of using the ``gensvm`` executable on a single dataset with
some custom parameters:
```
$ ./gensvm -l 1e-5 -k 1.0 -p 1.5 data/iris.train
```
This fits the model with regularization parameter ``1e-5`` (``-l``), Huber
hinge parameter ``1.0`` (``-k``), and lp norm parameter ``1.5`` (``-p``), using
the default settings otherwise. On my computer this yields a model with 18
support vectors in about 0.1 seconds. The ``gensvm`` executable can also be
used to get predictions for a test dataset, if the test dataset is supplied as
the final argument to the command. In this case, predictions are printed to
stdout, unless an output file is specified with the ``-o`` option.

The input to ``gensvm_grid`` is a file (called a grid file) that specifies the
parameter values to search over. See the ``training`` directory for examples
and the [documentation](https://gjjvdburg.github.io/GenSVM/) for more
information on the file format. Note that when the ``repeats`` field has a
positive value, a so-called "consistency check" is performed after the grid
search has finished. This is a robustness check on the best-performing
configurations, used to find the overall hyperparameter configuration with the
best performance and the smallest training time. Warm starts are not used in
this check, so that the repeated runs are independent measurements of training
time.
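
Conceptually, a grid search evaluates every combination of the listed parameter values. The following sketch enumerates such a Cartesian product by treating a configuration index as a mixed-radix number. The struct, the function names, and the three-parameter layout are all hypothetical. They do not reflect the grid file format or GenSVM's internal data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grid over three GenSVM hyperparameters. */
struct grid {
    const double *lambdas; size_t n_lambda;
    const double *kappas;  size_t n_kappa;
    const double *ps;      size_t n_p;
};

/* Total number of configurations: the product of the list lengths. */
static size_t grid_size(const struct grid *g)
{
    return g->n_lambda * g->n_kappa * g->n_p;
}

/* Decode configuration index idx (0 <= idx < grid_size) into parameter
 * values, treating idx as a mixed-radix number with p varying fastest. */
static void grid_config(const struct grid *g, size_t idx,
                        double *lambda, double *kappa, double *p)
{
    assert(idx < grid_size(g));
    *p = g->ps[idx % g->n_p];
    idx /= g->n_p;
    *kappa = g->kappas[idx % g->n_kappa];
    idx /= g->n_kappa;
    *lambda = g->lambdas[idx];
}
```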

Here's an example of running ``gensvm_grid`` without repeats on the iris
dataset:
```
$ ./gensvm_grid training/iris_norepeats.training
```
On my computer this runs in about 8 seconds with 342 hyperparameter
configurations. Alternatively, if consistency checks are desired we can run:
```
$ ./gensvm_grid training/iris.training
```
which runs the same grid search but also performs 5 consistency repeats for
each of the top 5% best-performing configurations. Note that performance is
measured by cross-validated accuracy. This example runs in about 13 seconds on
my computer.
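
The selection step of the consistency check can be pictured as follows. Rank the configurations by cross-validated accuracy, keep the top 5%, and after the repeats prefer the configuration with the highest mean accuracy, breaking ties by the lowest mean training time. This is a hypothetical sketch. The rounding of the 5% cut-off (here a ceiling) and the tie-breaking rule are assumptions, not documented behaviour of ``gensvm_grid``. Under these assumptions, the 342 iris configurations give 18 configurations to repeat.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result record for one hyperparameter configuration. */
struct result {
    int id;           /* configuration index in the grid */
    double accuracy;  /* cross-validated accuracy from the grid search */
    double mean_acc;  /* mean accuracy over the consistency repeats */
    double mean_time; /* mean training time over the repeats (seconds) */
};

/* qsort comparator: sort by grid-search accuracy, highest first. */
static int by_accuracy_desc(const void *a, const void *b)
{
    const struct result *ra = a, *rb = b;
    return (ra->accuracy < rb->accuracy) - (ra->accuracy > rb->accuracy);
}

/* Number of configurations to repeat: ceil(n * percent / 100), at least
 * one. Integer arithmetic avoids floating-point rounding surprises. */
static size_t n_top(size_t n, unsigned percent)
{
    assert(percent <= 100);
    size_t k = (n * percent + 99) / 100;
    return k < 1 ? 1 : k;
}

/* Among the first k (already sorted and repeated) results, return the
 * index of the one with the highest mean accuracy, breaking ties by the
 * lowest mean training time. */
static size_t best_of_top(const struct result *r, size_t k)
{
    size_t best = 0;
    for (size_t i = 1; i < k; i++) {
        if (r[i].mean_acc > r[best].mean_acc ||
            (r[i].mean_acc == r[best].mean_acc &&
             r[i].mean_time < r[best].mean_time))
            best = i;
    }
    return best;
}
```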

Reference
---------
If you use GenSVM in any of your projects, please cite the GenSVM paper
available at
[http://jmlr.org/papers/v17/14-526.html](http://jmlr.org/papers/v17/14-526.html).
You can use the following BibTeX code:
```bib
@article{JMLR:v17:14-526,
author = {Gerrit J.J. van den Burg and Patrick J.F. Groenen},
title = {{GenSVM}: A Generalized Multiclass Support Vector Machine},
journal = {Journal of Machine Learning Research},
year = {2016},
volume = {17},
number = {225},
pages = {1-42},
url = {http://jmlr.org/papers/v17/14-526.html}
}
```
License
-------
Copyright 2016, G.J.J. van den Burg.

GenSVM is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

GenSVM is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GenSVM. If not, see .

For more information please contact:

G.J.J. van den Burg\
email: gertjanvandenburg@gmail.com