| author | Gertjan van den Burg <burg@ese.eur.nl> | 2016-12-08 14:35:32 +0100 |
|---|---|---|
| committer | Gertjan van den Burg <burg@ese.eur.nl> | 2016-12-08 14:35:32 +0100 |
| commit | e3b00f4152068f5fd14f8fa4eac88969dfb221f2 (patch) | |
| tree | ef07621bf8b44353c646af76dac76f46c3333c8f | |
| parent | Update README (diff) | |
add specification of libsvm data format
| -rw-r--r-- | doc/specifications.dox | 50 |
| -rw-r--r-- | src/gensvm_io.c | 3 |
2 files changed, 44 insertions, 9 deletions
diff --git a/doc/specifications.dox b/doc/specifications.dox
index 4b305f4..e6d6b7b 100644
--- a/doc/specifications.dox
+++ b/doc/specifications.dox
@@ -58,7 +58,7 @@
  * @c weight: @n
  * The weight specifications for the algorithm to use. Two weight
  * specifications are implemented: the unit weights (index = 1) and the group
- * size correction weights (index = 2). See also msvmmaj_initialize_weights().
+ * size correction weights (index = 2). See also gensvm_initialize_weights().
  *
  * @c folds: @n
  * The number of cross validation folds to use.
@@ -73,28 +73,28 @@
  * @c gamma:* @n
  * Gamma parameters for the @c RBF, @c POLY, and @c SIGMOID kernels. This
  * parameter is only optional if the @c LINEAR kernel is specified. See
- * msvmmaj_compute_rbf(), msvmmaj_compute_poly(), and
- * msvmmaj_compute_sigmoid() for kernel specifications.
+ * gensvm_kernel_dot_rbf(), gensvm_kernel_dot_poly(), and
+ * gensvm_kernel_dot_sigmoid() for kernel specifications.
  *
  * @c coef:* @n
  * Coefficients for the @c POLY and @c SIGMOID kernels. This parameter is only
  * optional if the @c LINEAR or @c RBF kernels are used. See
- * msvmmaj_compute_poly() and msvmmaj_compute_sigmoid() for kernel
+ * gensvm_kernel_dot_poly(), and gensvm_kernel_dot_sigmoid() for kernel
  * specifications.
  *
  * @c degree:* @n
  * Degrees to search over in the grid search when the @c POLY kernel is
  * specified. With other kernel specifications this parameter is unnecessary.
- * See msvmmaj_compute_poly() for the polynomial kernel specification.
+ * See gensvm_kernel_dot_poly() for the polynomial kernel specification.
  *
  */


 /**
- * @page spec_data_file Data File Specification
+ * @page spec_data_file Default Data File Specification
  *
  * This page describes the input file format for a dataset. This specification
- * is used by msvmmaj_read_data() and msvmmaj_write_predictions(). The data
+ * is used by gensvm_read_data() and gensvm_write_predictions(). The data
  * file specification is the same as that used in <a
  * href="http://www.loria.fr/~lauer/MSVMpack/MSVMpack.html">MSVMpack</a>
  * (verified in v. 1.3).
@@ -126,10 +126,44 @@ x_n1 x_n2 ... x_nm y_n
  */

 /**
+ * @page spec_libsvm_data_file LibSVM/SVMlight Data File Specification
+ *
+ * Here we briefly describe the input file format for a dataset stored in
+ * LibSVM/SVMlight format. This is based on the LibSVM documentation. Files in
+ * this format can be read by the function gensvm_read_data_libsvm(), and can
+ * be used in the executables with the ``-x`` flag.
+ *
+ * The LibSVM/SVMlight file format is a sparse format and so only the nonzero
+ * values are expected to be stored. Each value is therefore accompanied by
+ * its index. In GenSVM, this index can be either 0-based or 1-based. The
+ * basic file format is as follows:
+ * @verbatim
+y_1 index1:value1 index2:value2 ...
+.
+.
+.
+@endverbatim
+ *
+ * For a training dataset, the class labels @c y_i are expected in the first
+ * column of each line. Class labels can be left out of the file for a test
+ * dataset (in which case the file only contains index/value pairs).
+ *
+ * As an example, below the first 5 lines of the iris dataset are shown.
+ *
+ * @verbatim
+1 1:5.10000 2:3.50000 3:1.40000 4:0.20000
+1 1:4.90000 2:3.00000 3:1.40000 4:0.20000
+1 1:4.70000 2:3.20000 3:1.30000 4:0.20000
+@endverbatim
+ *
+ */
+
+
+/**
  * @page spec_model_file Model File Specification
  *
  * This page describes the input file format for a GenModel. This
- * specification is used by msvmmaj_read_model() and msvmmaj_write_model().
+ * specification is used by gensvm_read_model() and gensvm_write_model().
  * The model file is designed to fully reproduce a GenModel.
  *
  * The model output file follows the format
diff --git a/src/gensvm_io.c b/src/gensvm_io.c
index d41b2ef..667bc5c 100644
--- a/src/gensvm_io.c
+++ b/src/gensvm_io.c
@@ -150,7 +150,8 @@ void exit_input_error(int line_num)
  *
  * @details
  * This function reads data from a file where the data is stored in
- * LibSVM/SVMlight format. This is a sparse data format, which can be
+ * LibSVM/SVMlight format. The file format is described in @ref
+ * spec_libsvm_data_file. This is a sparse data format, which can be
  * beneficial for certain applications. The advantage of having this function
  * here is twofold: 1) existing datasets where data is stored in
  * LibSVM/SVMlight format can be easily used in GenSVM, and 2) sparse datasets
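For readers unfamiliar with the layout documented in the new `spec_libsvm_data_file` page, the following is a minimal sketch of how a single line in that format could be parsed. It is not the code in `gensvm_read_data_libsvm()`; the names `sparse_entry` and `parse_libsvm_line` are hypothetical and exist only for this illustration. The sketch assumes that a first token without a `:` is the class label, and it stores indices exactly as read, leaving the 0-based versus 1-based interpretation to the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one index:value pair (not a GenSVM type). */
struct sparse_entry {
	long index;
	double value;
};

/*
 * Illustrative parser for one line of LibSVM/SVMlight text:
 *
 *     [label] index1:value1 index2:value2 ...
 *
 * The label is treated as optional (test files may omit it). Returns the
 * number of pairs stored in `out`, or -1 if the line is malformed or holds
 * more than `max_entries` pairs. Indices are stored as they appear in the
 * file; no 0-/1-based conversion is attempted.
 */
int parse_libsvm_line(const char *line, struct sparse_entry *out,
		int max_entries, double *label, int *has_label)
{
	assert(line != NULL && out != NULL && label != NULL &&
			has_label != NULL);

	const char *p = line;
	char *end = NULL;
	int n = 0;

	*has_label = 0;
	while (isspace((unsigned char) *p))
		p++;

	/* A first token without ':' is assumed to be the class label. */
	size_t len = strcspn(p, " \t\r\n");
	if (len > 0 && memchr(p, ':', len) == NULL) {
		*label = strtod(p, &end);
		if (end != p + len)
			return -1;
		*has_label = 1;
		p = end;
	}

	for (;;) {
		while (isspace((unsigned char) *p))
			p++;
		if (*p == '\0')
			break;

		/* Each remaining token must look like index:value. */
		long idx = strtol(p, &end, 10);
		if (end == p || *end != ':')
			return -1;
		p = end + 1;

		double val = strtod(p, &end);
		if (end == p || (*end != '\0' && !isspace((unsigned char) *end)))
			return -1;
		p = end;

		if (n >= max_entries)
			return -1;
		out[n].index = idx;
		out[n].value = val;
		n++;
	}
	return n;
}
```

Calling the function on the first example line from the documentation, `1 1:5.10000 2:3.50000 3:1.40000 4:0.20000`, yields a label of 1 and four index/value pairs. A line without a label, such as `0:2.5 3:1`, yields two pairs and leaves `has_label` at zero.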
