author Gertjan van den Burg <burg@ese.eur.nl> 2016-12-08 14:35:32 +0100
committer Gertjan van den Burg <burg@ese.eur.nl> 2016-12-08 14:35:32 +0100
commit e3b00f4152068f5fd14f8fa4eac88969dfb221f2
tree ef07621bf8b44353c646af76dac76f46c3333c8f
parent Update README
add specification of libsvm data format
-rw-r--r-- doc/specifications.dox 50
-rw-r--r-- src/gensvm_io.c 3
2 files changed, 44 insertions, 9 deletions
diff --git a/doc/specifications.dox b/doc/specifications.dox
index 4b305f4..e6d6b7b 100644
--- a/doc/specifications.dox
+++ b/doc/specifications.dox
@@ -58,7 +58,7 @@
* @c weight: @n
* The weight specifications for the algorithm to use. Two weight
* specifications are implemented: the unit weights (index = 1) and the group
- * size correction weights (index = 2). See also msvmmaj_initialize_weights().
+ * size correction weights (index = 2). See also gensvm_initialize_weights().
*
* @c folds: @n
* The number of cross validation folds to use.
@@ -73,28 +73,28 @@
* @c gamma:* @n
* Gamma parameters for the @c RBF, @c POLY, and @c SIGMOID kernels. This
* parameter is only optional if the @c LINEAR kernel is specified. See
- * msvmmaj_compute_rbf(), msvmmaj_compute_poly(), and
- * msvmmaj_compute_sigmoid() for kernel specifications.
+ * gensvm_kernel_dot_rbf(), gensvm_kernel_dot_poly(), and
+ * gensvm_kernel_dot_sigmoid() for kernel specifications.
*
* @c coef:* @n
* Coefficients for the @c POLY and @c SIGMOID kernels. This parameter is only
* optional if the @c LINEAR or @c RBF kernels are used. See
- * msvmmaj_compute_poly() and msvmmaj_compute_sigmoid() for kernel
+ * gensvm_kernel_dot_poly() and gensvm_kernel_dot_sigmoid() for kernel
* specifications.
*
* @c degree:* @n
* Degrees to search over in the grid search when the @c POLY kernel is
* specified. With other kernel specifications this parameter is unnecessary.
- * See msvmmaj_compute_poly() for the polynomial kernel specification.
+ * See gensvm_kernel_dot_poly() for the polynomial kernel specification.
*
*/
/**
- * @page spec_data_file Data File Specification
+ * @page spec_data_file Default Data File Specification
*
* This page describes the input file format for a dataset. This specification
- * is used by msvmmaj_read_data() and msvmmaj_write_predictions(). The data
+ * is used by gensvm_read_data() and gensvm_write_predictions(). The data
* file specification is the same as that used in <a
* href="http://www.loria.fr/~lauer/MSVMpack/MSVMpack.html">MSVMpack</a>
* (verified in v. 1.3).
@@ -126,10 +126,44 @@ x_n1 x_n2 ... x_nm y_n
*/
/**
+ * @page spec_libsvm_data_file LibSVM/SVMlight Data File Specification
+ *
+ * Here we briefly describe the input file format for a dataset stored in
+ * LibSVM/SVMlight format. This is based on the LibSVM documentation. Files in
+ * this format can be read by the function gensvm_read_data_libsvm(), and can
+ * be used in the executables with the ``-x`` flag.
+ *
+ * The LibSVM/SVMlight file format is a sparse format and so only the nonzero
+ * values are expected to be stored. Each value is therefore accompanied by
+ * its index. In GenSVM, this index can be either 0-based or 1-based. The
+ * basic file format is as follows:
+ * @verbatim
+y_1 index1:value1 index2:value2 ...
+.
+.
+.
+@endverbatim
+ *
+ * For a training dataset, the class labels @c y_i are expected in the first
+ * column of each line. Class labels can be left out of the file for a test
+ * dataset (in which case the file only contains index/value pairs).
+ *
+ * As an example, the first 3 lines of the iris dataset are shown below.
+ *
+ * @verbatim
+1 1:5.10000 2:3.50000 3:1.40000 4:0.20000
+1 1:4.90000 2:3.00000 3:1.40000 4:0.20000
+1 1:4.70000 2:3.20000 3:1.30000 4:0.20000
+@endverbatim
+ *
+ */
+
+
+/**
* @page spec_model_file Model File Specification
*
* This page describes the input file format for a GenModel. This
- * specification is used by msvmmaj_read_model() and msvmmaj_write_model().
+ * specification is used by gensvm_read_model() and gensvm_write_model().
* The model file is designed to fully reproduce a GenModel.
*
* The model output file follows the format
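
For readers who want to see how the format described in the new spec_libsvm_data_file page maps onto a dense feature row, here is a minimal C sketch. It is not the code of gensvm_read_data_libsvm(). The function name parse_libsvm_line and its signature are hypothetical, and the sketch assumes integer class labels that are always present, 1-based indices, and a known number of features m.

```c
#include <stdlib.h>

/* Hypothetical helper, NOT part of GenSVM: parse one line of the form
 *     label index:value index:value ...
 * into a dense row of m features. Assumptions: the label is an integer
 * and is always present, indices are 1-based, and features that do not
 * appear on the line are zero.
 * Returns 0 on success and -1 on malformed input. */
int parse_libsvm_line(const char *line, long m, double *row, long *label)
{
    char *end;
    const char *p;
    long j;

    /* The format is sparse, so start from an all-zero row. */
    for (j = 0; j < m; j++)
        row[j] = 0.0;

    /* First column: the class label. */
    *label = strtol(line, &end, 10);
    if (end == line)
        return -1;
    p = end;

    for (;;) {
        /* Skip the whitespace between index:value pairs. */
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0' || *p == '\n' || *p == '\r')
            break;

        /* Read the index, which must be followed by a colon. */
        long idx = strtol(p, &end, 10);
        if (end == p || *end != ':')
            return -1;
        if (idx < 1 || idx > m)
            return -1;
        p = end + 1;

        /* Read the value and store it at the 0-based position. */
        double val = strtod(p, &end);
        if (end == p)
            return -1;
        row[idx - 1] = val;
        p = end;
    }
    return 0;
}
```

Called on the first iris line shown in the specification with m = 4, this should set the label to 1 and fill the row with 5.1, 3.5, 1.4, and 0.2. A label-free test file or 0-based indices, both of which the specification allows, would need extra handling that this sketch omits.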
diff --git a/src/gensvm_io.c b/src/gensvm_io.c
index d41b2ef..667bc5c 100644
--- a/src/gensvm_io.c
+++ b/src/gensvm_io.c
@@ -150,7 +150,8 @@ void exit_input_error(int line_num)
*
* @details
* This function reads data from a file where the data is stored in
- * LibSVM/SVMlight format. This is a sparse data format, which can be
+ * LibSVM/SVMlight format. The file format is described in @ref
+ * spec_libsvm_data_file. This is a sparse data format, which can be
* beneficial for certain applications. The advantage of having this function
* here is twofold: 1) existing datasets where data is stored in
* LibSVM/SVMlight format can be easily used in GenSVM, and 2) sparse datasets