Update readme with mini-tutorial

author: Gertjan van den Burg <gertjanvandenburg@gmail.com> 2021-01-09 22:14:01 +0000
committer: Gertjan van den Burg <gertjanvandenburg@gmail.com> 2021-01-09 22:14:01 +0000
commit: bd8e6991b350a69fd0e08720711ede17261b1025
tree: c6d4b039690ecb1d82f590c0335e8ec2f5ff4da5
parent: Documentation updates
download: sparsestep-bd8e6991b350a69fd0e08720711ede17261b1025.tar.gz
sparsestep-bd8e6991b350a69fd0e08720711ede17261b1025.zip
3 files changed, 143 insertions, 50 deletions
diff --git a/.github/images/sparsestep_prostate_1.png b/.github/images/sparsestep_prostate_1.png
new file mode 100644
index 0000000..8f53392
--- /dev/null
+++ b/.github/images/sparsestep_prostate_1.png
diff --git a/.github/images/sparsestep_prostate_2.png b/.github/images/sparsestep_prostate_2.png
new file mode 100644
index 0000000..b76492f
--- /dev/null
+++ b/.github/images/sparsestep_prostate_2.png
diff --git a/README.md b/README.md
index 0ab35c9..d77d513 100644
--- a/README.md
+++ b/README.md
@@ -1,36 +1,145 @@
-SparseStep R Package
-====================
+# SparseStep R Package
 
-Paper: [SparseStep: Approximating the Counting Norm for Sparse 
+SparseStep is an R package for sparse regularized regression and provides an 
+alternative to methods such as best subset selection, elastic net, lasso, and 
+lars. The SparseStep method is introduced in the following paper:
+
+[SparseStep: Approximating the Counting Norm for Sparse 
 Regularization](https://arxiv.org/abs/1701.06967) by G.J.J. van den Burg, 
 P.J.F. Groenen, and A. Alfons (*Arxiv preprint arXiv:1701.06967 [stat.ME]*, 
 2017).
 
-GitHub: 
-[https://github.com/GjjvdBurg/SparseStep](https://github.com/GjjvdBurg/SparseStep).
-
-Introduction
-------------
-
-This R package implements the SparseStep method for solving the regression 
-problem with a sparsity constraint on the parameters. The package is 
-extensively documented through the builtin R documentation. See:
-
-    ?'sparsestep-package'
-    ?sparsestep
-    ?path.sparsestep
-
-for more information.
-
-Installation
-------------
-
-This package can be installed through CRAN:
-
-    install.packages('sparsestep')
-
-Reference
----------
+This R package can be easily installed by running 
+``install.packages('sparsestep')`` in R. If you use the package in your work, 
+please cite the above reference using, for instance, the following BibTeX 
+entry:
+
+```bibtex
+@article{vandenburg2017sparsestep,
+  title = {{SparseStep}: Approximating the Counting Norm for Sparse Regularization},
+  author = {{Van den Burg}, G. J. J. and Groenen, P. J. F. and Alfons, A.},
+  journal = {arXiv preprint arXiv:1701.06967},
+  year = {2017}
+}
+```
+
+## Introduction
+
+The SparseStep method solves the regression problem regularized with the 
+[`l_0` norm](https://en.wikipedia.org/wiki/Lp_space#When_p_=_0). Since the 
+`l_0` term is highly non-convex and therefore difficult to optimize, this 
+non-convexity is introduced gradually in SparseStep during optimization. As in 
+other regularized regression methods such as ridge regression and lasso, a 
+regularization parameter ``lambda`` can be specified to control the amount of 
+regularization. The choice of ``lambda`` affects how many coefficients remain 
+non-zero in the final model, with larger values generally giving sparser models.
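+
+In symbols, for a centered outcome ``y`` and data matrix ``X``, the 
+`l_0`-regularized regression problem is written out below. The counting norm 
+is not differentiable. One smooth surrogate of the kind described above is 
+`sum_j beta_j^2 / (beta_j^2 + gamma^2)`, which tends to the counting norm as 
+`gamma` goes to zero. Please see the paper for the exact formulation and the 
+schedule for `gamma` used by SparseStep.
+
+```math
+\hat{\beta} = \arg\min_{\beta} \; \| y - X\beta \|_2^2 + \lambda \| \beta \|_0,
+\qquad
+\| \beta \|_0 = \#\{ j : \beta_j \neq 0 \}
+\approx \sum_j \frac{\beta_j^2}{\beta_j^2 + \gamma^2} \quad (\gamma \to 0).
+```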
+
+We will give a quick guide to SparseStep using the Prostate dataset from the 
+book [Elements of Statistical 
+Learning](https://web.stanford.edu/~hastie/ElemStatLearn/), which is also 
+available in the [lasso2](https://cran.r-project.org/web/packages/lasso2/index.html) 
+package. First we load the data and create a data matrix ``X`` and an outcome 
+vector ``y``. Here the outcome is ``lcavol`` (the first column), and all other 
+columns except the ``train`` indicator are used as regressors:
+
+```r
+> prostate <- read.table("http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/prostate.data")
+> X <- prostate[prostate$train, c(-1, -10)]   # training rows; drop lcavol and train
+> X <- as.matrix(X)
+> y <- prostate[prostate$train, 1]            # lcavol is the outcome
+> y <- as.vector(y)
+```
+
+The easiest way to fit a SparseStep model is to use the ``path.sparsestep`` 
+function. This estimates the entire path of solutions for the SparseStep model 
+for different values of the regularization parameter using a [golden section 
+search](https://en.wikipedia.org/wiki/Golden-section_search) algorithm.
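+
+For readers unfamiliar with the technique, the following is a minimal, generic 
+golden section search for the minimum of a unimodal function ``f`` on an 
+interval ``[a, b]``. It is only an illustration of the method: the function 
+``golden_section`` is hypothetical and this is not the code used internally by 
+``path.sparsestep``.
+
+```r
+# Hypothetical helper: generic golden section search for the minimum of a
+# unimodal function f on [a, b]. Not the routine used by path.sparsestep().
+golden_section <- function(f, a, b, tol = 1e-6) {
+  phi <- (sqrt(5) - 1) / 2       # inverse golden ratio, about 0.618
+  x1 <- b - phi * (b - a)        # two interior points that split [a, b]
+  x2 <- a + phi * (b - a)        # in the golden ratio
+  f1 <- f(x1)
+  f2 <- f(x2)
+  while (b - a > tol) {
+    if (f1 < f2) {               # minimum lies in [a, x2]
+      b <- x2
+      x2 <- x1; f2 <- f1         # old x1 becomes the new x2 (no new f call)
+      x1 <- b - phi * (b - a)
+      f1 <- f(x1)
+    } else {                     # minimum lies in [x1, b]
+      a <- x1
+      x1 <- x2; f1 <- f2         # old x2 becomes the new x1 (no new f call)
+      x2 <- a + phi * (b - a)
+      f2 <- f(x2)
+    }
+  }
+  (a + b) / 2
+}
+
+golden_section(function(x) (x - 2)^2, 0, 5)   # close to 2
+```
+
+Each iteration shrinks the interval by a factor of about 0.618 and reuses one 
+of the two previous function evaluations, which is why the golden ratio is used 
+to place the interior points.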
+
+```r
+> path <- path.sparsestep(X, y)
+Found maximum value of lambda: 2^( 7 )
+Found minimum value of lambda: 2^( -3 )
+Running search in interval [ -3 , 7 ] ...
+Running search in interval [ -3 , 2 ] ...
+Running search in interval [ -3 , -0.5 ] ...
+Running search in interval [ -3 , -1.75 ] ...
+Running search in interval [ -0.5 , 2 ] ...
+Running search in interval [ -0.5 , 0.75 ] ...
+Running search in interval [ 0.125 , 0.75 ] ...
+Running search in interval [ 2 , 7 ] ...
+
+> plot(path, col=1:nrow(path$beta))     # col specifies colors to matplot
+> legend('topleft', legend=rownames(path$beta), lty=1, col=1:nrow(path$beta))
+```
+
+In the resulting plot we can see the coefficients of the features that are 
+included in the model at different values of ``lambda``:
+
+![SparseStep regression on Prostate dataset](./.github/images/sparsestep_prostate_1.png)
+
+The coefficients of the model can be obtained using ``coef(path)``, which 
+returns a sparse matrix:
+
+```r
+> coef(path)
+9 x 9 sparse Matrix of class "dgCMatrix"
+                   s0           s1           s2          s3           s4         s5        s6       s7
+Intercept  1.31349155  1.313491553  1.313491553  1.31349155  1.313491553 1.31349155 1.3134916 1.313492
+lweight   -0.11336968 -0.113485291  .            .           .           .          .         .
+age        0.02010188  0.020182049  0.018605327  0.01491472  0.018704172 0.01623212 .         .
+lbph      -0.05698125 -0.059026246 -0.069116923  .           .           .          .         .
+svi        0.03511645  .            .            .           .           .          .         .
+lcp        0.41845469  0.423398063  0.420516410  0.43806447  0.433449263 0.38174743 0.3887863 .
+gleason    0.22438690  0.222333394  0.236944796  0.23503609  .           .          .         .
+pgg45     -0.00911273 -0.009084031 -0.008949463 -0.00853420 -0.004328518 .          .         .
+lpsa       0.57545508  0.580111724  0.561063637  0.53017309  0.528953966 0.51473225 0.5336907 0.754266
+                s8
+Intercept 1.313492
+lweight   .
+age       .
+lbph      .
+svi       .
+lcp       .
+gleason   .
+pgg45     .
+lpsa      .
+```
+
+Note that the final model included in ``coef(path)`` is an intercept-only 
+model, which is generally not very useful. Predictions for out-of-sample data 
+can be obtained with the ``predict`` function.
+
+By default, SparseStep centers the regressors ``X`` and the outcome ``y``, and 
+normalizes the regressors so that the regularization is applied evenly across 
+them and the intercept is not penalized. If you prefer to include a constant 
+term in the regression and penalize it as well, you'll have to add it to the 
+data matrix yourself and disable the intercept:
+
+```r
+> Z <- cbind(constant=1, X)
+> path <- path.sparsestep(Z, y, intercept=FALSE)
+...
+> plot(path, col=1:nrow(path$beta))
+> legend('bottomright', legend=rownames(path$beta), lty=1, col=1:nrow(path$beta))
+```
+
+Since the constant is now part of the data matrix, it is subject to 
+regularization and can be set to zero like any other coefficient:
+
+![SparseStep regression on Prostate dataset (with 
+constant)](./.github/images/sparsestep_prostate_2.png)
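+
+To make the default preprocessing described above more concrete, here is a 
+minimal base-R sketch of centering and normalizing, and of recovering an 
+unpenalized intercept afterwards. The helpers ``standardize`` and 
+``unstandardize`` are hypothetical, and scaling to unit standard deviation is 
+an assumption; the package's internal normalization may differ.
+
+```r
+# Hypothetical sketch of the default preprocessing; not the package's code.
+# Assumes the columns of X are scaled to unit standard deviation.
+standardize <- function(X, y) {
+  x_mean <- colMeans(X)
+  x_sd <- apply(X, 2, sd)
+  Xs <- sweep(sweep(X, 2, x_mean, "-"), 2, x_sd, "/")
+  list(X = Xs, y = y - mean(y), x_mean = x_mean, x_sd = x_sd,
+       y_mean = mean(y))
+}
+
+# Map coefficients estimated on the standardized data back to the original
+# scale. The intercept is computed afterwards, so it is never penalized.
+unstandardize <- function(beta_s, s) {
+  beta <- beta_s / s$x_sd
+  c(Intercept = s$y_mean - sum(beta * s$x_mean), beta)
+}
+
+# Usage with ordinary least squares on simulated data: the result matches
+# coef(lm(y ~ X)) up to rounding.
+set.seed(1)
+X <- matrix(rnorm(150), 50, 3)
+y <- drop(X %*% c(1, -2, 0.5)) + 3 + rnorm(50)
+s <- standardize(X, y)
+unstandardize(qr.solve(s$X, s$y), s)
+```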
+
+For more information and examples, please see the documentation included with 
+the package. In particular, the following pages are good places to start:
+
+```r
+> ?'sparsestep-package'
+> ?sparsestep
+> ?path.sparsestep
+```
+
+## Reference
 
 If you use SparseStep in any of your projects, please cite the paper using the 
 information available through the R command:
@@ -51,26 +160,10 @@ or use the following BibTeX code:
       keywords = {Statistics - Methodology, 62J05, 62J07},
     }
 
-License
--------
-
-    Copyright 2016, G.J.J. van den Burg.
-
-    SparseStep is free software: you can redistribute it and/or modify
-    it under the terms of the GNU General Public License as published by
-    the Free Software Foundation, either version 3 of the License, or
-    (at your option) any later version.
-
-    SparseStep is distributed in the hope that it will be useful,
-    but WITHOUT ANY WARRANTY; without even the implied warranty of
-    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-    GNU General Public License for more details.
-
-    You should have received a copy of the GNU General Public License
-    along with SparseStep. If not, see <http://www.gnu.org/licenses/>.
-
-    For more information please contact:
-
-    G.J.J. van den Burg
-    email: gertjanvandenburg@gmail.com
+## Notes
 
+This package is licensed under GPLv3. Please see the LICENSE file for more 
+information. If you have any questions or comments about this package, please 
+open an issue [on GitHub](https://github.com/GjjvdBurg/sparsestep) (don't 
+hesitate, you're helping to make this project better for everyone!). If you 
+prefer to use email, please write to ``gertjanvandenburg at gmail dot com``.