/**
 * @page spec_grid_file Grid Input File Specification
 *
 * This page specifies the training file that can be parsed by
 * read_training_from_file(). Below is an example training file.
 *
 * @verbatim
train: /path/to/training/dataset.txt
test: /path/to/test/dataset.txt
p: 1.0 1.5 2.0
kappa: -0.9 0.0 1.0
lambda: 64 16 4 1 0.25 0.0625 0.015625 0.00390625 0.0009765625 0.000244140625
epsilon: 1e-6
weight: 1 2
folds: 10
kernel: LINEAR
gamma: 1e-3 1e-1 1e1 1e3
coef: 1.0 2.0
degree: 2.0 3.0
@endverbatim
 *
 * Note that with a @c LINEAR kernel specification, the @c gamma, @c coef,
 * and @c degree parameters do not need to be specified. The example above
 * merely shows all available parameters that can be specified in the grid
 * search; a second example with a nonlinear kernel is given at the bottom of
 * this page. Each of the parameters is described in more detail below.
 * Parameters marked with an asterisk are optional.
 *
 * @c train: @n
 * The location of the training dataset file. See @ref spec_data_file for the
 * specification of a dataset file.
 *
 * @c test:* @n
 * The location of a test dataset file. See @ref spec_data_file for the
 * specification of a dataset file. This parameter is optional; if it is
 * specified, the given train/test split will be used for training.
 *
 * @c p: @n
 * The values of the @c p parameter of the algorithm to search over. The @c p
 * parameter is used in the @f$ \ell_p @f$ norm over the Huber-weighted scalar
 * misclassification errors. Note: @f$ 1 \leq p \leq 2 @f$.
 *
 * @c kappa: @n
 * The values of the @c kappa parameter of the algorithm to search over. The
 * @c kappa parameter is used in the Huber hinge error over the scalar
 * misclassification errors. Note: @f$ \kappa > -1 @f$.
 *
 * @c lambda: @n
 * The values of the @c lambda parameter of the algorithm to search over. The
 * @c lambda parameter is used in the regularization term of the loss
 * function. Note: @f$ \lambda > 0 @f$.
 *
 * @c epsilon: @n
 * The values of the @c epsilon parameter of the algorithm to search over. The
 * @c epsilon parameter is used as the stopping criterion of the majorization
 * algorithm. Note that it often suffices to use only one epsilon value; using
 * more than one value increases the size of the grid search considerably.
 *
 * @c weight: @n
 * The weight specifications for the algorithm to use. Two weight
 * specifications are implemented: the unit weights (index = 1) and the group
 * size correction weights (index = 2). See also gensvm_initialize_weights().
 *
 * @c folds: @n
 * The number of cross validation folds to use.
 *
 * @c kernel:* @n
 * The kernel to use in training. Only one kernel can be specified. See
 * KernelType for the available kernel functions. Note: if multiple kernel
 * types are specified on this line, only the last value will be used (see the
 * implementation of parse_kernel_str() for details). If no kernel is
 * specified, the @c LINEAR kernel will be used.
 *
 * @c gamma:* @n
 * Gamma parameters for the @c RBF, @c POLY, and @c SIGMOID kernels. This
 * parameter may only be omitted if the @c LINEAR kernel is specified. See
 * gensvm_kernel_dot_rbf(), gensvm_kernel_dot_poly(), and
 * gensvm_kernel_dot_sigmoid() for the kernel specifications.
 *
 * @c coef:* @n
 * Coefficients for the @c POLY and @c SIGMOID kernels. This parameter may
 * only be omitted if the @c LINEAR or @c RBF kernel is used. See
 * gensvm_kernel_dot_poly() and gensvm_kernel_dot_sigmoid() for the kernel
 * specifications.
 *
 * @c degree:* @n
 * Degrees to search over in the grid search when the @c POLY kernel is
 * specified. With other kernel specifications this parameter is unnecessary.
 * See gensvm_kernel_dot_poly() for the polynomial kernel specification.
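 *
 * As a second, hypothetical example, the following shows a minimal grid file
 * for a search with the @c RBF kernel; the dataset path is a placeholder and
 * the parameter values are chosen for illustration only. Because the kernel
 * is not @c LINEAR, the @c gamma values must now be supplied, while @c coef
 * and @c degree can still be omitted.
 *
 * @verbatim
train: /path/to/training/dataset.txt
p: 1.0 2.0
kappa: 0.0
lambda: 1.0 0.1 0.01
epsilon: 1e-6
weight: 1
folds: 10
kernel: RBF
gamma: 1e-3 1e-1 1e1
@endverbatim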
 *
 */

/**
 * @page spec_data_file Default Data File Specification
 *
 * This page describes the input file format for a dataset. This
 * specification is used by gensvm_read_data() and gensvm_write_predictions().
 * The data file specification is the same as that used in MSVMpack (verified
 * in v. 1.3).
 *
 * The file is expected to be formatted as follows:
 * @verbatim
n
m
x_11 x_12 ... x_1m y_1
x_21 x_22 ... x_2m y_2
...
x_n1 x_n2 ... x_nm y_n
@endverbatim
 *
 * Here, @c n denotes the number of instances and @c m denotes the number of
 * predictors. The class labels @c y_i are expected in the final column of
 * each line.
 *
 * As an example, the first 5 lines of the iris dataset file are shown below.
 *
 * @verbatim
150
4
5.10000 3.50000 1.40000 0.20000 1.00000
4.90000 3.00000 1.40000 0.20000 1.00000
4.70000 3.20000 1.30000 0.20000 1.00000
@endverbatim
 *
 */

/**
 * @page spec_libsvm_data_file LibSVM/SVMlight Data File Specification
 *
 * Here we briefly describe the input file format for a dataset stored in
 * LibSVM/SVMlight format. This description is based on the LibSVM
 * documentation. Files in this format can be read by the function
 * gensvm_read_data_libsvm(), and can be used in the executables with the
 * @c -x flag.
 *
 * The LibSVM/SVMlight file format is a sparse format, so only the nonzero
 * values are expected to be stored. Each value is therefore accompanied by
 * its index. In GenSVM, this index can be either 0-based or 1-based. The
 * basic file format is as follows:
 * @verbatim
y_1 index1:value1 index2:value2 ...
.
.
.
@endverbatim
 *
 * For a training dataset, the class labels @c y_i are expected in the first
 * column of each line. Class labels can be left out of the file for a test
 * dataset, in which case the file only contains index:value pairs.
 *
 * As an example, the first few lines of the iris dataset in this format are
 * shown below.
 *
 * @verbatim
1 1:5.10000 2:3.50000 3:1.40000 4:0.20000
1 1:4.90000 2:3.00000 3:1.40000 4:0.20000
1 1:4.70000 2:3.20000 3:1.30000 4:0.20000
@endverbatim
 *
 */

/**
 * @page spec_model_file Model File Specification
 *
 * This page describes the file format for a GenModel. This specification is
 * used by gensvm_read_model() and gensvm_write_model(). The model file is
 * designed to fully reproduce a GenModel.
 *
 * The model output file follows this format:
 * @verbatim
Output file for GenSVM (version 0.1)
Generated on: Tue Jan 14 12:00:00 2014 (UTC +01:00)

Model:
p = 2.00
lambda = 0.001
kappa = 1.0
epsilon = 1e-06
weight_idx = 1

Data:
filename = /path/to/data_file.txt
n = 150
m = 4
K = 3

Output:
-0.7693429935131153 -1.9335141926875414
+0.3425555992439160 +1.0939198172438194
+0.3100589593140404 +0.9872012663780092
+0.1319873613546321 +0.1207806485439152
+0.8052481376988456 +0.6507524553955120
@endverbatim
 *
 * The first two lines of the file mainly serve a logging purpose and are
 * ignored when the model file is read. The Model section fully describes the
 * model parameters. Next, the Data section describes the data file that was
 * used in training and the size of the dataset. Finally, the Output section
 * shows the augmented weight matrix GenModel::V, in row-major order (see the
 * sketch at the end of this page).
 *
 * @todo
 * Write kernel specification to model file as well and adjust the format
 * above.
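 *
 * As an illustration of the row-major layout of the Output section, the
 * following is a minimal, self-contained sketch (not part of the GenSVM
 * library) that prints the ten values from the example above as the
 * @f$ (m+1) \times (K-1) @f$ matrix @c V. In row-major order, the element on
 * row @f$ i @f$ and column @f$ j @f$ of @c V is value number
 * @f$ i \cdot (K-1) + j @f$ in the list, counting from zero.
 *
 * @code
#include <stdio.h>

int main(void)
{
	// dimensions taken from the Data section of the example above
	long m = 4, K = 3;

	// the values from the Output section, in the order they are listed
	double output[] = {
		-0.7693429935131153, -1.9335141926875414,
		+0.3425555992439160, +1.0939198172438194,
		+0.3100589593140404, +0.9872012663780092,
		+0.1319873613546321, +0.1207806485439152,
		+0.8052481376988456, +0.6507524553955120
	};

	// print the (m+1) x (K-1) matrix V row by row
	for (long i = 0; i < m + 1; i++) {
		for (long j = 0; j < K - 1; j++)
			printf("%+.16f ", output[i * (K - 1) + j]);
		printf("\n");
	}

	return 0;
}
@endcode
 */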