| author | Gertjan van den Burg <gertjanvandenburg@gmail.com> | 2018-03-27 12:31:28 +0100 |
|---|---|---|
| committer | Gertjan van den Burg <gertjanvandenburg@gmail.com> | 2018-03-27 12:31:28 +0100 |
| commit | 004941896bac692d354c41a3334d20ee1d4627f7 | |
| tree | 2b11e42d8524843409e2bf8deb4ceb74c8b69347 /man/gensvm.grid.Rd | |
| parent | updates to GenSVM C library | |
GenSVM R package
Diffstat (limited to 'man/gensvm.grid.Rd')
| -rw-r--r-- | man/gensvm.grid.Rd | 161 |
1 files changed, 161 insertions, 0 deletions
diff --git a/man/gensvm.grid.Rd b/man/gensvm.grid.Rd
new file mode 100644
index 0000000..6dbec22
--- /dev/null
+++ b/man/gensvm.grid.Rd
@@ -0,0 +1,161 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/gensvm.grid.R
+\name{gensvm.grid}
+\alias{gensvm.grid}
+\title{Cross-validated grid search for GenSVM}
+\usage{
+gensvm.grid(X, y, param.grid = "tiny", refit = TRUE, scoring = NULL,
+  cv = 3, verbose = 0, return.train.score = TRUE)
+}
+\arguments{
+\item{X}{training data matrix. We denote the size of this matrix by
+n_samples x n_features.}
+
+\item{y}{training vector of class labels of length n_samples. The number of
+unique labels in this vector is denoted by n_classes.}
+
+\item{param.grid}{String (\code{'tiny'}, \code{'small'}, or \code{'full'})
+or data frame with parameter configurations to evaluate. Typically this is
+the output of \code{expand.grid}. For more details, see "Using a Parameter
+Grid" below.}
+
+\item{refit}{boolean variable. If true, the best model from cross validation
+is fitted again on the entire dataset.}
+
+\item{scoring}{metric to use to evaluate the classifier performance during
+cross validation. The metric should be an R function that takes two
+arguments: y_true and y_pred and that returns a float such that higher
+values are better. If it is NULL, the accuracy score will be used.}
+
+\item{cv}{the number of cross-validation folds to use or a vector with the
+same length as \code{y} where each unique value denotes a test split.}
+
+\item{verbose}{integer to indicate the level of verbosity (higher is more
+verbose)}
+
+\item{return.train.score}{whether or not to return the scores on the
+training splits}
+}
+\value{
+A "gensvm.grid" S3 object with the following items:
+\item{call}{Call that produced this object}
+\item{param.grid}{Sorted version of the parameter grid used in training}
+\item{cv.results}{A data frame with the cross validation results}
+\item{best.estimator}{If refit=TRUE, this is the GenSVM model fitted with
+the best hyperparameter configuration, otherwise it is NULL}
+\item{best.score}{Mean cross-validated test score for the model with the
+best hyperparameter configuration}
+\item{best.params}{Parameter configuration that provided the highest mean
+cross-validated test score}
+\item{best.index}{Row index of the cv.results data frame that corresponds to
+the best hyperparameter configuration}
+\item{n.splits}{The number of cross-validation splits}
+\item{n.objects}{The number of instances in the data}
+\item{n.features}{The number of features of the data}
+\item{n.classes}{The number of classes in the data}
+\item{classes}{Array with the unique classes in the data}
+\item{total.time}{Training time for the grid search}
+\item{cv.idx}{Array with cross validation indices used to split the data}
+}
+\description{
+This function performs a cross-validated grid search of the
+model parameters to find the best hyperparameter configuration for a given
+dataset. This function takes advantage of GenSVM's ability to use warm
+starts to speed up computation. The function uses the GenSVM C library for
+speed.
+}
+\note{
+This function returns partial results when the computation is interrupted by
+the user.
+}
+\section{Using a Parameter Grid}{
+
+To evaluate certain parameter configurations, a data frame can be supplied
+to the \code{param.grid} argument of the function. Such a data frame can
+easily be generated using the R function \code{expand.grid}, or could be
+created through other ways to test specific parameter configurations.
+
+Three parameter grids are predefined:
+\describe{
+\item{\code{'tiny'}}{This parameter grid is generated by the function
+\code{\link{gensvm.load.tiny.grid}} and is the default parameter grid. It
+consists of parameter configurations that are likely to perform well on
+various datasets.}
+\item{\code{'small'}}{This grid is generated by
+\code{\link{gensvm.load.small.grid}} and generates a data frame with 90
+configurations. It is typically fast to train but contains some
+configurations that are unlikely to perform well. It is included for
+educational purposes.}
+\item{\code{'full'}}{This grid loads the parameter grid as used in the
+GenSVM paper. It consists of 342 configurations and is generated by the
+\code{\link{gensvm.load.full.grid}} function. Note that in the GenSVM paper
+cross validation was done with this parameter grid, but the final training
+step used \code{epsilon=1e-8}. The \code{\link{gensvm.refit}} function is
+useful in this scenario.}
+}
+
+When you provide your own parameter grid, beware that only certain column
+names are allowed in the data frame corresponding to parameters for the
+GenSVM model. These names are:
+
+\describe{
+\item{p}{Parameter for the lp norm. Must be in [1.0, 2.0].}
+\item{kappa}{Parameter for the Huber hinge function. Must be larger than
+-1.}
+\item{lambda}{Parameter for the regularization term. Must be larger than 0.}
+\item{weight}{Instance weight specification. Allowed values are "unit" for
+unit weights and "group" for group-size correction weights}
+\item{epsilon}{Stopping parameter for the algorithm. Must be larger than 0.}
+\item{max.iter}{Maximum number of iterations of the algorithm. Must be
+larger than 0.}
+\item{kernel}{The kernel to use, allowed values are "linear", "poly",
+"rbf", and "sigmoid". The default is "linear"}
+\item{coef}{Parameter for the "poly" and "sigmoid" kernels. See the section
+"Kernels in GenSVM" in the \code{\link{gensvm-package}} page for more info.}
+\item{degree}{Parameter for the "poly" kernel. See the section "Kernels in
+GenSVM" in the \code{\link{gensvm-package}} page for more info.}
+\item{gamma}{Parameter for the "poly", "rbf", and "sigmoid" kernels. See the
+section "Kernels in GenSVM" in the \code{\link{gensvm-package}} page for more
+info.}
+}
+
+For variables that are not present in the \code{param.grid} data frame the
+default parameter values in the \code{\link{gensvm}} function will be used.
+
+Note that this function reorders the parameter grid to make the warm starts
+as efficient as possible, which is why the param.grid in the result will not
+be the same as the param.grid in the input.
+}
+\examples{
+x <- iris[, -5]
+y <- iris[, 5]
+
+# use the default parameter grid
+grid <- gensvm.grid(x, y)
+
+# use a smaller parameter grid
+pg <- expand.grid(p=c(1.0, 1.5, 2.0), kappa=c(-0.9, 1.0), epsilon=c(1e-3))
+grid <- gensvm.grid(x, y, param.grid=pg)
+
+# print the result
+print(grid)
+
+# Using a custom scoring function (accuracy as percentage)
+acc.pct <- function(yt, yp) { return (100 * sum(yt == yp) / length(yt)) }
+grid <- gensvm.grid(x, y, scoring=acc.pct)
+
+}
+\author{
+Gerrit J.J. van den Burg, Patrick J.F. Groenen \cr
+Maintainer: Gerrit J.J. van den Burg <gertjanvandenburg@gmail.com>
+}
+\references{
+Van den Burg, G.J.J. and Groenen, P.J.F. (2016). \emph{GenSVM: A Generalized
+Multiclass Support Vector Machine}, Journal of Machine Learning Research,
+17(225):1--42. URL \url{http://jmlr.org/papers/v17/14-526.html}.
+}
+\seealso{
+\code{\link{predict.gensvm.grid}}, \code{\link{print.gensvm.grid}}, and
+\code{\link{gensvm}}.
+}
+
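The column names and ranges listed under "Using a Parameter Grid" can be checked before a long grid search is started. The following is a minimal sketch in base R and is not part of the package: `check.param.grid` is a hypothetical helper that only encodes the constraints stated in the man page above. The sketch also shows one possible way to build a fold vector for the `cv` argument; the seed and the choice of 5 folds are arbitrary assumptions.

```r
# Hypothetical helper (not part of the gensvm package): checks a custom
# parameter grid against the column names and ranges stated in the man page.
check.param.grid <- function(pg) {
  allowed <- c("p", "kappa", "lambda", "weight", "epsilon", "max.iter",
               "kernel", "coef", "degree", "gamma")
  unknown <- setdiff(names(pg), allowed)
  if (length(unknown) > 0) {
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  }
  # pg[["name"]] is NULL when the column is absent; such checks are skipped
  # because missing parameters fall back to the defaults of gensvm().
  if (!is.null(pg[["p"]]) && any(pg[["p"]] < 1.0 | pg[["p"]] > 2.0))
    stop("p must be in [1.0, 2.0]")
  if (!is.null(pg[["kappa"]]) && any(pg[["kappa"]] <= -1))
    stop("kappa must be larger than -1")
  for (col in c("lambda", "epsilon", "max.iter")) {
    if (!is.null(pg[[col]]) && any(pg[[col]] <= 0))
      stop(col, " must be larger than 0")
  }
  if (!is.null(pg[["weight"]]) &&
      !all(pg[["weight"]] %in% c("unit", "group")))
    stop("weight must be \"unit\" or \"group\"")
  if (!is.null(pg[["kernel"]]) &&
      !all(pg[["kernel"]] %in% c("linear", "poly", "rbf", "sigmoid")))
    stop("kernel must be one of linear, poly, rbf, sigmoid")
  invisible(TRUE)
}

# A custom grid built with expand.grid: 3 x 2 x 3 x 1 configurations.
pg <- expand.grid(p = c(1.0, 1.5, 2.0), kappa = c(-0.9, 1.0),
                  lambda = 2^seq(-6, 0, by = 3), weight = "unit",
                  stringsAsFactors = FALSE)
check.param.grid(pg)

# Fold vector for the cv argument: one label per row of the data, where each
# unique value marks the rows held out in one test split.
set.seed(123)
n <- nrow(iris)
folds <- sample(rep(1:5, length.out = n))

# With the package installed, the grid and folds would be passed as:
# grid <- gensvm.grid(iris[, -5], iris[, 5], param.grid = pg, cv = folds)
```

Validating the grid up front is a design choice rather than a requirement. Building the fold vector by hand keeps the splits reproducible across runs.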
