author Gertjan van den Burg <gertjanvandenburg@gmail.com> 2021-01-09 22:14:01 +0000
committer Gertjan van den Burg <gertjanvandenburg@gmail.com> 2021-01-09 22:14:01 +0000
commit bd8e6991b350a69fd0e08720711ede17261b1025
tree c6d4b039690ecb1d82f590c0335e8ec2f5ff4da5
parent Documentation updates
Update readme with mini-tutorial
-rw-r--r-- .github/images/sparsestep_prostate_1.png | Bin 0 -> 20549 bytes
-rw-r--r-- .github/images/sparsestep_prostate_2.png | Bin 0 -> 23840 bytes
-rw-r--r-- README.md | 193
3 files changed, 143 insertions, 50 deletions
diff --git a/.github/images/sparsestep_prostate_1.png b/.github/images/sparsestep_prostate_1.png
new file mode 100644
index 0000000..8f53392
--- /dev/null
+++ b/.github/images/sparsestep_prostate_1.png
Binary files differ
diff --git a/.github/images/sparsestep_prostate_2.png b/.github/images/sparsestep_prostate_2.png
new file mode 100644
index 0000000..b76492f
--- /dev/null
+++ b/.github/images/sparsestep_prostate_2.png
Binary files differ
diff --git a/README.md b/README.md
index 0ab35c9..d77d513 100644
--- a/README.md
+++ b/README.md
@@ -1,36 +1,145 @@
-SparseStep R Package
-====================
+# SparseStep R Package
-Paper: [SparseStep: Approximating the Counting Norm for Sparse
+SparseStep is an R package for sparse regularized regression and provides an
+alternative to methods such as best subset selection, elastic net, lasso, and
+lars. The SparseStep method is introduced in the following paper:
+
+[SparseStep: Approximating the Counting Norm for Sparse
Regularization](https://arxiv.org/abs/1701.06967) by G.J.J. van den Burg,
P.J.F. Groenen, and A. Alfons (*arXiv preprint arXiv:1701.06967 [stat.ME]*,
2017).
-GitHub:
-[https://github.com/GjjvdBurg/SparseStep](https://github.com/GjjvdBurg/SparseStep).
-
-Introduction
-------------
-
-This R package implements the SparseStep method for solving the regression
-problem with a sparsity constraint on the parameters. The package is
-extensively documented through the builtin R documentation. See:
-
- ?'sparsestep-package'
- ?sparsestep
- ?path.sparsestep
-
-for more information.
-
-Installation
-------------
-
-This package can be installed through CRAN:
-
- install.packages('sparsestep')
-
-Reference
----------
+This R package can be easily installed by running
+``install.packages('sparsestep')`` in R. If you use the package in your work,
+please cite the above reference using, for instance, the following BibTeX
+entry:
+
+```bibtex
+@article{vandenburg2017sparsestep,
+ title = {{SparseStep}: Approximating the Counting Norm for Sparse Regularization},
+ author = {{Van den Burg}, G. J. J. and Groenen, P. J. F. and Alfons, A.},
+ journal = {arXiv preprint arXiv:1701.06967},
+ year = {2017}
+}
+```
+
+## Introduction
+
+The SparseStep method solves the regression problem regularized with the
+[`l_0` norm](https://en.wikipedia.org/wiki/Lp_space#When_p_=_0). Since the
+`l_0` term is highly non-convex and therefore difficult to optimize, this
+non-convexity is introduced gradually in SparseStep during optimization. As in
+other regularized regression methods such as ridge regression and lasso, a
+regularization parameter ``lambda`` can be specified to control the amount of
+regularization. The choice of regularization parameter determines how many
+variables have non-zero coefficients in the final model: larger values of ``lambda`` give sparser models.
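
For reference, the `l_0`-regularized problem can be written as below, where `X` is the data matrix, `y` the outcome and `beta` the coefficient vector. The smooth surrogate in the second formula is our reading of how SparseStep introduces the non-convexity gradually and is meant only as an illustration; the exact loss scaling and surrogate used by the package are assumptions here, so please refer to the paper for the precise formulation.

$$
\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_0,
\qquad \|\beta\|_0 = \#\{ j : \beta_j \neq 0 \}
$$

$$
\|\beta\|_0 \approx \sum_{j} \frac{\beta_j^2}{\beta_j^2 + \gamma^2},
\qquad \gamma \to 0
$$

Each term of the sum is 0 when $\beta_j = 0$ and tends to 1 for any fixed $\beta_j \neq 0$ as $\gamma \to 0$. For large $\gamma$ the term behaves like $\beta_j^2 / \gamma^2$, a ridge-type penalty, so decreasing $\gamma$ step by step moves from a ridge-like penalty towards the counting norm.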
+
+We will give a quick guide to SparseStep using the Prostate dataset from the
+book [Elements of Statistical
+Learning](https://web.stanford.edu/~hastie/ElemStatLearn/), which is also
+included in the [lasso2](https://cran.r-project.org/web/packages/lasso2/index.html)
+package. First we load the data and create a data matrix of regressors and an
+outcome vector. Here the outcome is the first column, ``lcavol``, and the
+``train`` column is used to select the training observations:
+
+```r
+> data.url <- "http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/prostate.data"
+> prostate <- read.table(data.url)
+> X <- prostate[prostate$train == TRUE, c(-1, -10)]  # drop lcavol (outcome) and train
+> X <- as.matrix(X)
+> y <- prostate[prostate$train == TRUE, 1]  # lcavol
+> y <- as.vector(y)
+```
+
+The easiest way to fit a SparseStep model is to use the ``path.sparsestep``
+function. This estimates the entire path of solutions for the SparseStep model
+for different values of the regularization parameter using a [golden section
+search](https://en.wikipedia.org/wiki/Golden-section_search) algorithm.
+
+```r
+> path <- path.sparsestep(X, y)
+Found maximum value of lambda: 2^( 7 )
+Found minimum value of lambda: 2^( -3 )
+Running search in interval [ -3 , 7 ] ...
+Running search in interval [ -3 , 2 ] ...
+Running search in interval [ -3 , -0.5 ] ...
+Running search in interval [ -3 , -1.75 ] ...
+Running search in interval [ -0.5 , 2 ] ...
+Running search in interval [ -0.5 , 0.75 ] ...
+Running search in interval [ 0.125 , 0.75 ] ...
+Running search in interval [ 2 , 7 ] ...
+
+> plot(path, col=1:nrow(path$beta)) # col specifies colors to matplot
+> legend('topleft', legend=rownames(path$beta), lty=1, col=1:nrow(path$beta))
+```
+
+In the resulting plot we can see the coefficients of the features that are
+included in the model at different values of ``lambda``:
+
+![SparseStep regression on Prostate dataset](./.github/images/sparsestep_prostate_1.png)
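
Judging from the log output of ``path.sparsestep`` above, the search first finds a largest and smallest value of ``lambda`` and then works on the log2(lambda) scale, splitting intervals at their midpoint. The sketch below illustrates that kind of interval search in base R. It is not the package's code: ``interval_search``, ``toy_n_selected`` and ``min_width`` are hypothetical names, and the toy step function stands in for actually fitting a SparseStep model at each value of ``lambda``.

```r
# Hypothetical sketch, not the sparsestep implementation: recursively halve
# intervals on the log2(lambda) scale until the models at the two ends of every
# interval differ in size by at most one variable.
#   n_selected: function(l) giving the number of non-zero coefficients of the
#               model fitted at lambda = 2^l (assumed non-increasing in l)
#   lo, hi:     log2 of the smallest and largest lambda to consider
#   min_width:  intervals narrower than this are not split further
interval_search <- function(n_selected, lo, hi, min_width = 0.1) {
  grid_points <- c(lo, hi)
  split_interval <- function(a, b) {
    if (abs(n_selected(a) - n_selected(b)) <= 1 || (b - a) < min_width) {
      return(invisible(NULL))
    }
    mid <- (a + b) / 2
    grid_points <<- c(grid_points, mid)
    split_interval(a, mid)
    split_interval(mid, b)
  }
  split_interval(lo, hi)
  sort(grid_points)
}

# Toy stand-in for fitting SparseStep: the model size drops by one each time
# l passes an integer, from 7 variables at l = -3 to 0 for l > 3.
toy_n_selected <- function(l) max(0, min(8, floor(4 - l)))

grid <- interval_search(toy_n_selected, lo = -3, hi = 7)
sapply(grid, toy_n_selected)
```

Each point of the resulting grid is a log2(lambda) value at which a model would be fitted. The ``min_width`` guard keeps the recursion finite even if the model size were to jump by several variables at a single value of ``lambda``.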
+
+The coefficients of the model can be obtained using ``coef(path)``, which
+returns a sparse matrix with one column per value of ``lambda`` on the path (zeros are printed as ``.``):
+
+```r
+> coef(path)
+9 x 9 sparse Matrix of class "dgCMatrix"
+                   s0           s1           s2          s3           s4         s5        s6       s7
+Intercept  1.31349155  1.313491553  1.313491553  1.31349155  1.313491553 1.31349155 1.3134916 1.313492
+lweight   -0.11336968 -0.113485291            .           .            .          .         .        .
+age        0.02010188  0.020182049  0.018605327  0.01491472  0.018704172 0.01623212         .        .
+lbph      -0.05698125 -0.059026246 -0.069116923           .            .          .         .        .
+svi        0.03511645            .            .           .            .          .         .        .
+lcp        0.41845469  0.423398063  0.420516410  0.43806447  0.433449263 0.38174743 0.3887863        .
+gleason    0.22438690  0.222333394  0.236944796  0.23503609            .          .         .        .
+pgg45     -0.00911273 -0.009084031 -0.008949463 -0.00853420 -0.004328518          .         .        .
+lpsa       0.57545508  0.580111724  0.561063637  0.53017309  0.528953966 0.51473225 0.5336907 0.754266
+                s8
+Intercept 1.313492
+lweight          .
+age              .
+lbph             .
+svi              .
+lcp              .
+gleason          .
+pgg45            .
+lpsa             .
+```
+
+Note that the final model included in ``coef(path)`` is an intercept-only
+model, which is generally not very useful. Predictions for out-of-sample data
+can be obtained with the ``predict`` function.
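
Because ``coef(path)`` returns a ``dgCMatrix`` from the Matrix package, with one row per variable and one column per model, it is easy to list which variables each model selects. In the sketch below a small matrix copied from columns ``s6`` to ``s8`` of the output above stands in for ``coef(path)``, so the example runs without refitting; the names ``B``, ``nz``, ``model_size`` and ``selected`` are only for illustration.

```r
library(Matrix)

# Stand-in for coef(path): columns s6-s8 of the output above, rebuilt as a
# dgCMatrix. With a fitted path you would use B <- coef(path) instead.
B <- Matrix(c(1.3134916, 1.313492, 1.313492,
              0.3887863, 0,        0,
              0.5336907, 0.754266, 0),
            nrow = 3, byrow = TRUE, sparse = TRUE,
            dimnames = list(c("Intercept", "lcp", "lpsa"),
                            c("s6", "s7", "s8")))

# Logical matrix of non-zero coefficients, dropping the intercept row
nz <- as.matrix(B[rownames(B) != "Intercept", , drop = FALSE]) != 0

# Number of selected variables in each model along the path
model_size <- colSums(nz)

# Names of the selected variables in each model
selected <- lapply(colnames(nz), function(s) rownames(nz)[nz[, s]])
names(selected) <- colnames(nz)
```

Converting to a dense logical matrix is convenient for a handful of variables; for very wide problems one would rather work on the sparse representation directly.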
+
+By default SparseStep centers the regressors ``X`` and the outcome ``y``, so
+that the intercept is not penalized, and normalizes the regressors so that the
+regularization is applied evenly among them. If you prefer to include a
+constant term in the regression and penalize it as well, you'll have to add it
+to the data matrix yourself and disable the intercept:
+
+```r
+> Z <- cbind(constant=1, X)
+> path <- path.sparsestep(Z, y, intercept=FALSE)
+...
+> plot(path, col=1:nrow(path$beta))
+> legend('bottomright', legend=rownames(path$beta), lty=1, col=1:nrow(path$beta))
+```
+
+Because the constant is now part of the data matrix, it is subject to
+regularization and can be removed from the model like any other variable:
+
+![SparseStep regression on Prostate dataset (with
+constant)](./.github/images/sparsestep_prostate_2.png)
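
The default preprocessing described above can be sketched in base R as follows. The function names ``preprocess`` and ``unscale`` are hypothetical, and scaling each regressor to unit standard deviation is an assumption (the package may normalize differently, for example to unit norm); the sketch only illustrates why centering keeps the intercept out of the penalty.

```r
# Hypothetical sketch of the default preprocessing: centre the outcome, centre
# the regressors and scale each column to unit standard deviation. Whether
# sparsestep scales to unit variance or to unit norm is an assumption here.
preprocess <- function(X, y) {
  x_mean <- colMeans(X)
  x_sd <- apply(X, 2, sd)
  Xs <- sweep(sweep(X, 2, x_mean, "-"), 2, x_sd, "/")
  list(X = Xs, y = y - mean(y), x_mean = x_mean, x_sd = x_sd, y_mean = mean(y))
}

# Map coefficients estimated on the standardized scale back to the original
# scale. The intercept is recovered from the means afterwards, so a penalty
# applied to the standardized coefficients never touches it.
unscale <- function(beta_std, pp) {
  beta <- beta_std / pp$x_sd
  c(Intercept = pp$y_mean - sum(pp$x_mean * beta), beta)
}

# Usage on simulated data: without any penalty, least squares on the centred
# data followed by unscale() matches an ordinary regression with intercept.
set.seed(1)
X <- matrix(rnorm(200), nrow = 50)
y <- drop(X %*% c(1, -2, 0, 0.5)) + 3 + rnorm(50)
pp <- preprocess(X, y)
unscale(coef(lm(pp$y ~ pp$X - 1)), pp)
coef(lm(y ~ X))
```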
+
+For more information and examples, please see the documentation included with
+the package. In particular, the following pages are good places to start:
+
+```r
+> ?'sparsestep-package'
+> ?sparsestep
+> ?path.sparsestep
+```
+
+## Reference
If you use SparseStep in any of your projects, please cite the paper using the
information available through the R command:
@@ -51,26 +160,10 @@ or use the following BibTeX code:
keywords = {Statistics - Methodology, 62J05, 62J07},
}
-License
--------
-
- Copyright 2016, G.J.J. van den Burg.
-
- SparseStep is free software: you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation, either version 3 of the License, or
- (at your option) any later version.
-
- SparseStep is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with SparseStep. If not, see <http://www.gnu.org/licenses/>.
-
- For more information please contact:
-
- G.J.J. van den Burg
- email: gertjanvandenburg@gmail.com
+## Notes
+This package is licensed under GPLv3. Please see the LICENSE file for more
+information. If you have any questions or comments about this package, please
+open an issue [on GitHub](https://github.com/GjjvdBurg/sparsestep) (don't
+hesitate, you're helping to make this project better for everyone!). If you
+prefer to use email, please write to ``gertjanvandenburg at gmail dot com``.