#' @title Cross-validated grid search for GenSVM
#'
#' @description This function performs a cross-validated grid search of the 
#' model parameters to find the best hyperparameter configuration for a given 
#' dataset. This function takes advantage of GenSVM's ability to use warm 
#' starts to speed up computation. The function uses the GenSVM C library for 
#' speed. 
#'
#' @param x training data matrix. We denote the size of this matrix by 
#' n_samples x n_features.
#' @param y training vector of class labels of length n_samples. The number of 
#' unique labels in this vector is denoted by n_classes.
#' @param param.grid String (\code{'tiny'}, \code{'small'}, or \code{'full'}) 
#' or data frame with parameter configurations to evaluate.  Typically this is 
#' the output of \code{expand.grid}. For more details, see "Using a Parameter 
#' Grid" below.
#' @param refit boolean variable. If true, the best model from cross validation 
#' is fitted again on the entire dataset.
#' @param scoring metric to use to evaluate the classifier performance during 
#' cross validation. The metric should be an R function that takes two 
#' arguments: y_true and y_pred and that returns a float such that higher 
#' values are better. If it is NULL, the accuracy score will be used.
#' @param cv the number of cross-validation folds to use or a vector with the 
#' same length as \code{y} where each unique value denotes a test split.
#' @param verbose integer to indicate the level of verbosity (higher is more 
#' verbose)
#' @param return.train.score whether or not to return the scores on the 
#' training splits
#'
#' @return A "gensvm.grid" S3 object with the following items:
#' \item{call}{Call that produced this object}
#' \item{param.grid}{Sorted version of the parameter grid used in training}
#' \item{cv.results}{A data frame with the cross validation results}
#' \item{best.estimator}{If refit=TRUE, this is the GenSVM model fitted with 
#' the best hyperparameter configuration, otherwise it is NULL}
#' \item{best.score}{Mean cross-validated test score for the model with the 
#' best hyperparameter configuration}
#' \item{best.params}{Parameter configuration that provided the highest mean 
#' cross-validated test score}
#' \item{best.index}{Row index of the cv.results data frame that corresponds to 
#' the best hyperparameter configuration}
#' \item{n.splits}{The number of cross-validation splits}
#' \item{n.objects}{The number of instances in the data}
#' \item{n.features}{The number of features of the data}
#' \item{n.classes}{The number of classes in the data}
#' \item{classes}{Array with the unique classes in the data}
#' \item{total.time}{Training time for the grid search}
#' \item{cv.idx}{Array with cross validation indices used to split the data}
#'
#' @section Using a Parameter Grid:
#' To evaluate certain parameter configurations, a data frame can be supplied 
#' to the \code{param.grid} argument of the function. Such a data frame can 
#' easily be generated using the R function \code{expand.grid}, or could be 
#' created through other ways to test specific parameter configurations.
#'
#' Three parameter grids are predefined:
#' \describe{
#' \item{\code{'tiny'}}{This parameter grid is generated by the function 
#' \code{\link{gensvm.load.tiny.grid}} and is the default parameter grid. It 
#' consists of parameter configurations that are likely to perform well on 
#' various datasets.}
#' \item{\code{'small'}}{This grid is generated by 
#' \code{\link{gensvm.load.small.grid}} and generates a data frame with 90 
#' configurations. It is typically fast to train but contains some 
#' configurations that are unlikely to perform well. It is included for 
#' educational purposes.}
#' \item{\code{'full'}}{This grid loads the parameter grid as used in the 
#' GenSVM paper. It consists of 342 configurations and is generated by the 
#' \code{\link{gensvm.load.full.grid}} function. Note that in the GenSVM paper 
#' cross validation was done with this parameter grid, but the final training 
#' step used \code{epsilon=1e-8}. The \code{\link{gensvm.refit}} function is 
#' useful in this scenario.}
#' }
#'
#' When you provide your own parameter grid, beware that only certain column 
#' names are allowed in the data frame corresponding to parameters for the 
#' GenSVM model. These names are:
#'
#' \describe{
#' \item{p}{Parameter for the lp norm. Must be in [1.0, 2.0].}
#' \item{kappa}{Parameter for the Huber hinge function. Must be larger than 
#' -1.}
#' \item{lambda}{Parameter for the regularization term. Must be larger than 0.}
#' \item{weights}{Instance weights specification. Allowed values are "unit" for 
#' unit weights and "group" for group-size correction weights}
#' \item{epsilon}{Stopping parameter for the algorithm. Must be larger than 0.}
#' \item{max.iter}{Maximum number of iterations of the algorithm. Must be 
#' larger than 0.}
#' \item{kernel}{The kernel to used, allowed values are "linear", "poly", 
#' "rbf", and "sigmoid". The default is "linear"}
#' \item{coef}{Parameter for the "poly" and "sigmoid" kernels. See the section 
#' "Kernels in GenSVM" in the code{ink{gensvm-package}} page for more info.}
#' \item{degree}{Parameter for the "poly" kernel. See the section "Kernels in 
#' GenSVM" in the code{ink{gensvm-package}} page for more info.}
#' \item{gamma}{Parameter for the "poly", "rbf", and "sigmoid" kernels. See the 
#' section "Kernels in GenSVM" in the code{ink{gensvm-package}} page for more 
#' info.}
#' }
#'
#' For variables that are not present in the \code{param.grid} data frame the 
#' default parameter values in the \code{\link{gensvm}} function will be used.
#'
#' Note that this function reorders the parameter grid to make the warm starts 
#' as efficient as possible, which is why the param.grid in the result will not 
#' be the same as the param.grid in the input.
#'
#' @note
#' 1. This function returns partial results when the computation is interrupted 
#' by the user.
#' 2. The score.time reported in the results only covers the time needed to 
#' compute the score from the predictions and true class labels. It does not 
#' include the time to compute the predictions themselves.
#'
#' @author
#' Gerrit J.J. van den Burg, Patrick J.F. Groenen \cr
#' Maintainer: Gerrit J.J. van den Burg <gertjanvandenburg@gmail.com>
#'
#' @references
#' Van den Burg, G.J.J. and Groenen, P.J.F. (2016). \emph{GenSVM: A Generalized 
#' Multiclass Support Vector Machine}, Journal of Machine Learning Research, 
#' 17(225):1--42. URL \url{https://jmlr.org/papers/v17/14-526.html}.
#'
#' @seealso
#' \code{\link{predict.gensvm.grid}}, \code{\link{print.gensvm.grid}}, 
#' \code{\link{plot.gensvm.grid}}, \code{\link{gensvm}}, 
#' \code{\link{gensvm-package}}
#'
#' @export
#'
#' @importFrom stats sd
#'
#' @examples
#' x <- iris[, -5]
#' y <- iris[, 5]
#'
#' \donttest{
#' # use the default parameter grid
#' grid <- gensvm.grid(x, y, verbose=TRUE)
#' }
#'
#' # use a smaller parameter grid
#' pg <- expand.grid(p=c(1.0, 1.5, 2.0), kappa=c(-0.9, 1.0), epsilon=c(1e-3))
#' grid <- gensvm.grid(x, y, param.grid=pg)
#'
#' # print the result
#' print(grid)
#'
#' \donttest{
#' # Using a custom scoring function (accuracy as percentage)
#' acc.pct <- function(yt, yp) { return (100 * sum(yt == yp) / length(yt)) }
#' grid <- gensvm.grid(x, y, scoring=acc.pct)
#'
#' # With RBF kernel and very verbose progress printing
#' pg <- expand.grid(kernel=c('rbf'), gamma=c(1e-2, 1e-1, 1, 1e1, 1e2),
#'                   lambda=c(1e-8, 1e-6), max.iter=c(5000))
#' grid <- gensvm.grid(x, y, param.grid=pg, verbose=2)
#' }
#'
gensvm.grid <- function(x, y, param.grid='tiny', refit=TRUE, scoring=NULL, cv=3, 
                        verbose=0, return.train.score=TRUE)
{
    call <- match.call()

    n.objects <- nrow(x)
    n.features <- ncol(x)
    n.classes <- length(unique(y))

    if (n.objects != length(y)) {
        cat("Error: x and y are not the same length.\n")
        return(invisible(NULL))
    }

    if (is.character(param.grid)) {
        if (param.grid == 'tiny') {
            param.grid <- gensvm.load.tiny.grid()
        } else if (param.grid == 'small') {
            param.grid <- gensvm.load.small.grid()
        } else if (param.grid == 'full') {
            param.grid <- gensvm.load.full.grid()
        }
    }

    # Validate the range of the values for the gridsearch
    if (!gensvm.validate.param.grid(param.grid))
        return(invisible(NULL))

    # Sort the parameter grid for efficient warm starts
    param.grid <- gensvm.sort.param.grid(param.grid)

    # Expand and convert the parameter grid for use in the C function
    C.param.grid <- gensvm.expand.param.grid(param.grid, n.features)

    # Convert labels to integers
    classes <- sort(unique(y))
    y.clean <- match(y, classes)

    if (is.vector(cv) && length(cv) == n.objects) {
        folds <- sort(unique(cv))
        cv.idx <- match(cv, folds) - 1
        n.splits <- length(folds)
    } else {
        cv.idx <- gensvm.generate.cv.idx(n.objects, cv[1])
        n.splits <- cv
    }

    results <- .Call("R_gensvm_grid",
                 data.matrix(x),
                 as.integer(y.clean),
                 as.matrix(C.param.grid),
                 as.integer(nrow(C.param.grid)),
                 as.integer(ncol(C.param.grid)),
                 as.integer(cv.idx),
                 as.integer(n.splits),
                 as.integer(verbose),
                 as.integer(n.objects),
                 as.integer(n.features),
                 as.integer(n.classes)
                 )

    cv.results <- gensvm.cv.results(results, param.grid, cv.idx, y.clean, 
                                    scoring, 
                                    return.train.score=return.train.score)

    best.index <- which.min(cv.results$rank.test.score)[1]
    if (!is.na(best.index)) { # can occur when user interrupts
        best.score <- cv.results$mean.test.score[best.index]
        best.params <- param.grid[best.index, , drop=F]
        # Remove duplicate attributes from best.params
        attr(best.params, "out.attrs") <- NULL
    } else {
        best.score <- NA
        best.params <- list()
    }

    if (refit && !is.na(best.index)) {
        gensvm.args <- as.list(best.params)
        # Stupid factors...
        if ("weights" %in% names(gensvm.args))
            gensvm.args$weights <- as.character(gensvm.args$weights)
        if ("kernel" %in% names(gensvm.args))
            gensvm.args$kernel <- as.character(gensvm.args$kernel)
        gensvm.args$x <- x
        gensvm.args$y <- y
        gensvm.args$verbose <- if(verbose>1) 1 else 0
        if (verbose > 1)
            cat("Refitting with best parameters ...\n")
        best.estimator <- do.call(gensvm, gensvm.args)
    } else {
        best.estimator <- NULL
    }

    object <- list(call = call, param.grid = param.grid, 
                   cv.results = cv.results, best.estimator = best.estimator, 
                   best.score = best.score, best.params = best.params, 
                   best.index = best.index, n.splits = n.splits, 
                   n.objects = n.objects, n.features = n.features, 
                   n.classes = n.classes, classes = classes, 
                   total.time = results$total.time, cv.idx = cv.idx)
    class(object) <- "gensvm.grid"
    return(object)
}

#' @title Load a tiny parameter grid for the GenSVM grid search
#'
#' @description This function returns a parameter grid to use in the GenSVM 
#' grid search. This grid was obtained by analyzing the experiments done for 
#' the GenSVM paper and selecting the configurations that achieve accuracy 
#' within the 95th percentile on over 90% of the datasets. It is a good start 
#' for a parameter search with a reasonably high chance of achieving good 
#' performance on most datasets.
#'
#' Note that this grid is only tested to work well in combination with the 
#' linear kernel.
#'
#' @author
#' Gerrit J.J. van den Burg, Patrick J.F. Groenen \cr
#' Maintainer: Gerrit J.J. van den Burg <gertjanvandenburg@gmail.com>
#'
#' @references
#' Van den Burg, G.J.J. and Groenen, P.J.F. (2016). \emph{GenSVM: A Generalized 
#' Multiclass Support Vector Machine}, Journal of Machine Learning Research, 
#' 17(225):1--42. URL \url{https://jmlr.org/papers/v17/14-526.html}.
#'
#' @export
#'
#' @seealso
#' \code{\link{gensvm.grid}}, \code{\link{gensvm.load.small.grid}}, 
#' \code{\link{gensvm.load.full.grid}}.
#'
gensvm.load.tiny.grid <- function()
{
    df <- data.frame(
                     p=c(2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.5, 2.0, 2.0),
                     kappa=c(5.0, 5.0, 0.5, 5.0, -0.9, 5.0, 0.5, -0.9, 0.5, 0.5),
                     lambda=c(2^-16, 2^-18, 2^-18, 2^-18, 2^-18, 2^-14, 2^-18,
                              2^-18, 2^-16, 2^-16),
                     weights=c('unit', 'unit', 'unit', 'group', 'unit',
                               'unit', 'group', 'unit', 'unit', 'group')
                     )
    return(df)
}

#' @title Load a large parameter grid for the GenSVM grid search
#'
#' @description This loads the parameter grid from the GenSVM paper. It 
#' consists of 342 configurations and is constructed from all possible 
#' combinations of the following parameter sets:
#'
#' \code{p = c(1.0, 1.5, 2.0)}
#' \code{lambda = 2^seq(-18, 18, 2)}
#' \code{kappa = c(-0.9, 0.5, 5.0)}
#' \code{weights = c('unit', 'group')}
#'
#' @author
#' Gerrit J.J. van den Burg, Patrick J.F. Groenen \cr
#' Maintainer: Gerrit J.J. van den Burg <gertjanvandenburg@gmail.com>
#'
#' @references
#' Van den Burg, G.J.J. and Groenen, P.J.F. (2016). \emph{GenSVM: A Generalized 
#' Multiclass Support Vector Machine}, Journal of Machine Learning Research, 
#' 17(225):1--42. URL \url{https://jmlr.org/papers/v17/14-526.html}.
#'
#' @export
#'
#' @seealso
#' \code{\link{gensvm.grid}}, \code{\link{gensvm.load.tiny.grid}}, 
#' \code{\link{gensvm.load.full.grid}}.
#'
gensvm.load.full.grid <- function()
{
    df <- expand.grid(p=c(1.0, 1.5, 2.0), lambda=2^seq(-18, 18, 2),
                      kappa=c(-0.9, 0.5, 5.0), weights=c('unit', 'group'),
                      epsilon=c(1e-6))
    return(df)
}


#' @title Load the small parameter grid for the GenSVM grid search
#'
#' @description This function loads a small parameter grid to use for the 
#' GenSVM gridsearch. It contains all possible combinations of the following 
#' parameter sets:
#'
#' \code{p = c(1.0, 1.5, 2.0)}
#' \code{lambda = c(1e-8, 1e-6, 1e-4, 1e-2, 1)}
#' \code{kappa = c(-0.9, 0.5, 5.0)}
#' \code{weights= c('unit', 'group')}
#'
#' @author
#' Gerrit J.J. van den Burg, Patrick J.F. Groenen \cr
#' Maintainer: Gerrit J.J. van den Burg <gertjanvandenburg@gmail.com>
#'
#' @references
#' Van den Burg, G.J.J. and Groenen, P.J.F. (2016). \emph{GenSVM: A Generalized 
#' Multiclass Support Vector Machine}, Journal of Machine Learning Research, 
#' 17(225):1--42. URL \url{https://jmlr.org/papers/v17/14-526.html}.
#'
#' @export
#'
#' @seealso
#' \code{\link{gensvm.grid}}, \code{\link{gensvm.load.tiny.grid}}, 
#' \code{\link{gensvm.load.small.grid}}.
#'
gensvm.load.small.grid <- function()
{
    df <- expand.grid(p=c(1.0, 1.5, 2.0), lambda=c(1e-8, 1e-6, 1e-4, 1e-2, 1),
                      kappa=c(-0.9, 0.5, 5.0), weights=c('unit', 'group'))
    return(df)
}


#' @title Generate a vector of cross-validation indices
#'
#' @description
#' This function generates a vector of length \code{n} with values from 0 to 
#' \code{folds-1} to mark train and test splits.
#'
#' @param n the number of instances
#' @param folds the number of cross validation folds
#'
#' @return an array of length \code{n} with values in the range [0, folds-1] 
#' indicating the test fold of each instance.
#'
#' @author
#' Gerrit J.J. van den Burg, Patrick J.F. Groenen \cr
#' Maintainer: Gerrit J.J. van den Burg <gertjanvandenburg@gmail.com>
#'
#' @references
#' Van den Burg, G.J.J. and Groenen, P.J.F. (2016). \emph{GenSVM: A Generalized 
#' Multiclass Support Vector Machine}, Journal of Machine Learning Research, 
#' 17(225):1--42. URL \url{https://jmlr.org/papers/v17/14-526.html}.
#'
#' @seealso
#' \code{\link{gensvm.grid}}
gensvm.generate.cv.idx <- function(n, folds)
{
    cv.idx <- matrix(0, n, 1)

    big.folds <- n %% folds
    small.fold.size <- n %/% folds

    j <- 0
    for (i in 0:(small.fold.size * folds)) {
        while (TRUE) {
            idx <- round(runif(1, 1, n))
            if (cv.idx[idx] == 0) {
                cv.idx[idx] <- j
                j <- j + 1
                j <- (j %% folds)
                break
            }
        }
    }

    j <- 1
    i <- 0
    while (i < big.folds) {
        if (cv.idx[j] == 0) {
            cv.idx[j] <- i
            i <- i + 1
        }
        j <- j + 1
    }

    return(cv.idx)
}

gensvm.cv.results <- function(results, param.grid, cv.idx, y.true, 
                                     scoring, return.train.score=TRUE)
{
    n.candidates <- nrow(param.grid)
    n.splits <- length(unique(cv.idx))

    score <- if(is.function(scoring)) scoring else gensvm.accuracy

    # Build names and initialize the data.frame
    names <- c("mean.fit.time", "mean.score.time", "mean.test.score")
    if (return.train.score)
        names <- c(names, "mean.train.score")
    for (param in names(param.grid)) {
        names <- c(names, sprintf("param.%s", param))
    }
    names <- c(names, "rank.test.score")
    for (idx in sort(unique(cv.idx))) {
        names <- c(names, sprintf("split%i.test.score", idx))
        if (return.train.score)
            names <- c(names, sprintf("split%i.train.score", idx))
    }
    names <- c(names, "std.fit.time", "std.score.time", "std.test.score")
    if (return.train.score)
        names <- c(names, "std.train.score")

    df <- data.frame(matrix(ncol=length(names), nrow=n.candidates))
    colnames(df) <- names

    for (pidx in 1:n.candidates) {
        param <- param.grid[pidx, , drop=F]
        durations <- results$durations[pidx, ]
        predictions <- results$predictions[pidx, ]

        fit.times <- durations
        score.times <- c()
        test.scores <- c()
        train.scores <- c()

        is.missing <- any(is.na(durations))

        for (test.idx in sort(unique(cv.idx))) {
            score.time <- 0

            if (return.train.score) {
                y.train.pred <- predictions[cv.idx != test.idx]
                y.train.true <- y.true[cv.idx != test.idx]

                start.time <- proc.time()
                train.score <- score(y.train.true, y.train.pred)
                stop.time <- proc.time()
                score.time <- score.time + (stop.time - start.time)[3]

                train.scores <- c(train.scores, train.score)
            }

            y.test.pred <- predictions[cv.idx == test.idx]
            y.test.true <- y.true[cv.idx == test.idx]

            start.time <- proc.time()
            test.score <- score(y.test.true, y.test.pred)
            stop.time <- proc.time()
            score.time <- score.time + (stop.time - start.time)[3]

            test.scores <- c(test.scores, test.score)

            score.times <- c(score.times, score.time)
        }

        df$mean.fit.time[pidx] <- mean(fit.times)
        df$mean.score.time[pidx] <- if(is.missing) NA else mean(score.times)
        df$mean.test.score[pidx] <- mean(test.scores)
        df$std.fit.time[pidx] <- sd(fit.times)
        df$std.score.time[pidx] <- if(is.missing) NA else sd(score.times)
        df$std.test.score[pidx] <- sd(test.scores)
        if (return.train.score) {
            df$mean.train.score[pidx] <- mean(train.scores)
            df$std.train.score[pidx] <- sd(train.scores)
        }

        for (parname in names(param.grid)) {
            header <- sprintf("param.%s", parname)
            val <- param[[parname]]
            if (is.factor(val))
                val <- levels(val)[val]
            df[[header]][pidx] <- val
        }

        j <- 1
        for (test.idx in sort(unique(cv.idx))) {
            lbl <- sprintf("split%i.test.score", test.idx)
            df[[lbl]][pidx] <- test.scores[j]
            if (return.train.score) {
                lbl <- sprintf("split%i.train.score", test.idx)
                df[[lbl]][pidx] <- train.scores[j]
            }
            j <- j + 1
        }
    }

    df$rank.test.score <- gensvm.rank.score(df$mean.test.score)

    return(df)
}

gensvm.sort.param.grid <- function(param.grid)
{
    all.cols <- c("kernel", "coef", "degree", "gamma", "weights", "kappa",
                  "lambda", "p", "epsilon", "max.iter")

    order.args <- NULL
    for (name in all.cols) {
        if (name %in% colnames(param.grid)) {
            if (name == "epsilon")
                order.args <- cbind(order.args, -param.grid[[name]])
            else
                order.args <- cbind(order.args, param.grid[[name]])
        }
    }
    sorted.pg <- param.grid[do.call(order, as.list(as.data.frame(order.args))), ]

    rownames(sorted.pg) <- NULL

    return(sorted.pg)
}

gensvm.expand.param.grid <- function(pg, n.features)
{
    if ("kernel" %in% colnames(pg)) {
        all.kernels <- c("linear", "poly", "rbf", "sigmoid")
        pg$kernel <- match(pg$kernel, all.kernels) - 1
    } else {
        pg$kernel <- 0
    }

    if ("weights" %in% colnames(pg)) {
        all.weights <- c("unit", "group")
        pg$weights <- match(pg$weights, all.weights)
    } else {
        pg$weights <- 1
    }

    if ("gamma" %in% colnames(pg)) {
        pg$gamma[pg$gamma == "auto"] <- 1.0/n.features
    } else {
        pg$gamma <- 1.0/n.features
    }

    if (!("degree" %in% colnames(pg)))
        pg$degree <- 2.0
    if (!("coef" %in% colnames(pg)))
        pg$coef <- 0.0
    if (!("p" %in% colnames(pg)))
        pg$p <- 1.0
    if (!("lambda" %in% colnames(pg)))
        pg$lambda <- 1e-8
    if (!("kappa" %in% colnames(pg)))
        pg$kappa <- 0.0
    if (!("epsilon" %in% colnames(pg)))
        pg$epsilon <- 1e-6
    if (!("max.iter" %in% colnames(pg)))
        pg$max.iter <- 1e8

    C.param.grid <- data.frame(kernel=pg$kernel, coef=pg$coef, 
                               degree=pg$degree, gamma=pg$gamma, 
                               weights=pg$weights, kappa=pg$kappa, 
                               lambda=pg$lambda, p=pg$p, epsilon=pg$epsilon, 
                               max.iter=pg$max.iter)
    return(C.param.grid)
}

#' @title Compute the ranks for the numbers in a given vector
#'
#' @description This function computes the ranks for the values in an array. 
#' The highest value gets the smallest rank. Ties are broken by assigning the 
#' smallest value. The smallest rank is 1.
#'
#' @param x array of numeric values
#'
#' @return array with the ranks of the values in the input array.
#'
gensvm.rank.score <- function(x)
{
    x <- as.array(x)
    l <- length(x)
    r <- 1
    ranks <- as.vector(matrix(0, l, 1))
    ranks[which(is.na(x))] <- NA
    while (!all(mapply(is.na, x))) {
        m <- max(x, na.rm=T)
        idx <- which(x == m)
        ranks[idx] <- r
        r <- r + length(idx)
        x[idx] <- NA
    }

    return(ranks)
}