Title: Holistic Generalized Linear Models
Description: Holistic generalized linear models (HGLMs) extend generalized linear models (GLMs) by enabling the possibility to add further constraints to the model. The 'holiglm' package simplifies estimating HGLMs using convex optimization. Additional information about the package can be found in the reference manual, the 'README' and the accompanying paper <doi:10.18637/jss.v108.i07>.
Authors: Benjamin Schwendinger [aut, cre], Florian Schwendinger [aut], Laura Vana [aut]
Maintainer: Benjamin Schwendinger <[email protected]>
License: GPL-3
Version: 1.0.1
Built: 2025-02-18 06:59:32 UTC
Source: https://github.com/cran/holiglm
The holistic generalized linear models package simplifies estimating generalized linear models under constraints. The constraints can be used to:
bound the domains of specific covariates,
impose linear constraints on the covariates,
induce sparsity via best subset selection,
impose sparsity on groups of variables,
restrict the pairwise correlation between the selected coefficients,
impose sign coherence constraints on selected covariates and
force all predictors within a group either to be selected or not.
These sophisticated constraints are internally implemented via conic optimization. However, the package is designed such that the user is not required to be familiar with conic optimization; only basic R knowledge is needed.
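For illustration, a minimal sketch combining several of these constraints; it uses the data generator rhglm() and the constraint constructors documented below and assumes the generator's default covariate names x1, ..., x5:

library(holiglm)
# simulate data with five covariates (x1, ..., x5) plus an intercept
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
# at most 3 active covariates, x1 forced into the model, x2 bounded below by 0
constraints <- c(k_max(3), include("x1"), lower(c(x2 = 0)))
fit <- hglm(y ~ ., constraints = constraints, data = dat)
coef(fit)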
Benjamin Schwendinger (Maintainer [email protected])
Florian Schwendinger
Laura Vana
Holistic regression
Schwendinger, B., Schwendinger, F., & Vana, L. (2024).
Holistic Generalized Linear Models.
doi:10.18637/jss.v108.i07.
Bertsimas, D., & King, A. (2016). OR Forum - An Algorithmic Approach to Linear Regression. Operations Research, 64(1): 2-16. doi:10.1287/opre.2015.1436
Bertsimas, D., & Li, M. L. (2020). Scalable Holistic Linear Regression. Operations Research Letters 48 (3): 203–8. doi:10.1016/j.orl.2020.02.008.
Constrained regression
McDonald, J. W., & Diamond, I. D. (1990).
On the Fitting of Generalized Linear Models with Nonnegativity Parameter Constraints.
Biometrics, 46 (1): 201–206.
doi:10.2307/2531643
Slawski, M., & Hein, M. (2013). Non-negative least squares for high-dimensional linear models: Consistency and sparse recovery without regularization. Electronic Journal of Statistics, 7: 3004-3056. doi:10.1214/13-EJS868
Carrizosa, E., Olivares-Nadal, A. V., & Ramírez-Cobo, P. (2020). Integer Constraints for Enhancing Interpretability in Linear Regression. SORT. Statistics and Operations Research Transactions, 44: 67-98. doi:10.2436/20.8080.02.95.
Lawson, C. L., & Hanson, R. J. (1995). Solving Least Squares Problems. Society for Industrial and Applied Mathematics. doi:10.1137/1.9781611971217
Generalized Linear Models
McCullagh, P., & Nelder, J. A. (2019).
Generalized Linear Models (2nd ed.)
Routledge.
doi:10.1201/9780203753736.
Conic Optimization
Boyd, S., & Vandenberghe, L. (2004).
Convex Optimization (1st ed.)
Cambridge University Press.
https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf.
doi:10.1017/cbo9780511804441
Theußl, S., Schwendinger, F., & Hornik, K. (2020). ROI: An Extensible R Optimization Infrastructure. Journal of Statistical Software 94 (15): 1–64. doi:10.18637/jss.v094.i15.
The function returns a logical vector which is TRUE for all active (i.e., non-zero) coefficients in the fitted model and FALSE otherwise.
active_coefficients(object, ...)

acoef(object, ...)
object: an object inheriting from class "hglm".
...: optional arguments currently ignored.
a logical vector giving the active coefficients.
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
fit <- hglm(y ~ ., constraints = k_max(3), data = dat)
active_coefficients(fit)
A simple function for aggregating binomial data from a form where y contains only 0 and 1 and X could contain duplicated rows, into a format where y is the matrix of counted successes and failures and X does not contain duplicates. If X contains factor variables, the model matrix corresponding to X will be returned.
agg_binomial(formula, data, as_list = TRUE)
formula: a formula object defining the aggregation.
data: a data.frame containing the data to be aggregated.
as_list: a logical indicating whether the return value should be a list (default) or a data.frame.
A list (or data.frame) containing the aggregated binomial data with counted successes and failures.
set.seed(12345)
data <- data.frame(y = rbinom(50, 1, 0.7),
                   a = factor(sample(c(1, 2), 50, TRUE)),
                   b = factor(sample(c(1, 2, 3), 50, TRUE)))
agg_binomial(y ~ ., data)
Convert an object of class "hglm_model" into a ROI optimization problem ("OP").
## S3 method for class 'hglm_model'
as.OP(x)
x: an object inheriting from class "hglm_model".
This function is mainly intended for internal use and for advanced users who want to alter the model object or the underlying optimization problem. It converts the model object created by hglm_model into a conic optimization problem solvable via ROI_solve.
A ROI object of class "OP".
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
# Use hglm with option dry_run
model <- hglm(y ~ ., data = dat, dry_run = TRUE)
op <- as.OP(model)
# Use hglm_model
x <- model.matrix(y ~ ., data = dat)
model <- hglm_model(x, dat[["y"]])
op <- as.OP(model)
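The returned OP can be solved directly with ROI; a minimal sketch, assuming a conic solver plugin such as ROI.plugin.ecos is installed (the solver name "ecos" is an assumption; any installed conic solver applies):

library(ROI)
# solve the conic program produced by as.OP() directly with ROI
sol <- ROI_solve(op, solver = "ecos")
solution(sol)  # primal solution of the optimization problem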
This data set contains the daily count of rented bikes from the Capital Bikeshare system in Washington D.C., USA, for the years 2011 and 2012. The dataset is already prepared (correct types and factor encodings) for model building.
A data.frame of dimension 731 x 12 containing daily data related to rented bikes.
a date vector giving the date of the rental.
a factor with levels 'spring', 'summer', 'fall' and 'winter'.
a factor with levels '2011' and '2012'.
a factor with levels 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov' and 'Dec'.
a boolean vector indicating if the day is a holiday.
a factor with levels 'good', 'neutral', 'bad' and 'very bad' giving the weather situation.
a numeric vector containing max-normalized temperature in Celsius with 41 as maximum.
a numeric vector containing max-normalized feeling temperature in Celsius with 50 as maximum.
a numeric vector containing max-normalized humidity with 100 as maximum.
a numeric vector containing max-normalized windspeed with 67 as maximum.
an integer vector containing counts of rented bikes.
Fanaee-T, Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository.
data("bike") hglm(formula = cnt ~ ., data=bike, family="poisson")
data("bike") hglm(formula = cnt ~ ., data=bike, family="poisson")
The function returns different types of coefficients for a model.
## S3 method for class 'hglm'
coef(object, type = c("unscaled", "scaled", "selected"), ...)
object: an object inheriting from class "hglm".
type: the type of coefficients to be returned; one of "unscaled" (default), "scaled" or "selected".
...: optional arguments currently ignored.
The types "scaled"
and "unscaled"
refer to the coefficients
of the scaled/unscaled optimization problem. Type "selected"
refers
to the active coefficients in the model active_coefficients
.
a vector containing the unscaled, scaled or selected coefficients.
dat <- rhglm(1000, 1:3)
fit <- hglm(y ~ ., data = dat)
coef(fit)
coef(fit, type = "scaled")
coef(fit, type = "selected")
Utility function for constructing covariance matrices based on a simple triplet format (simple_triplet_matrix).
cov_matrix(k, i, j, v)
k: an integer giving the number of rows and columns of the constructed covariance matrix.
i: an integer vector giving the row indices.
j: an integer vector giving the column indices.
v: a numeric vector giving the corresponding values.
A dense matrix of covariances.
Other simulation: rhglm()
cov_matrix(5, c(1, 2), c(2, 3), c(0.8, 0.9))
Forces all covariates in the specified group to have the same coefficient.
group_equal(vars)
vars: a vector specifying the indices or names of the covariates to which the constraint shall be applied.
A holistic generalized model constraint, an object inheriting from class "hglmc".
Other Constraint-Constructors: group_inout(), group_sparsity(), include(), k_max(), linear(), lower(), pairwise_sign_coherence(), rho_max(), sign_coherence(), upper()
dat <- rhglm(100, c(1, 2, 3, 4, 5, 6))
constraints <- group_equal(vars = c("x1", "x3"))
hglm(y ~ ., constraints = constraints, data = dat)
Forces coefficients of the covariates in the specified group to be either all zero or all nonzero.
group_inout(vars)
vars: a vector specifying the indices or names of the covariates to which the constraint shall be applied.
A holistic generalized model constraint, an object inheriting from class "hglmc".
Other Constraint-Constructors: group_equal(), group_sparsity(), include(), k_max(), linear(), lower(), pairwise_sign_coherence(), rho_max(), sign_coherence(), upper()
dat <- rhglm(100, c(1, 2, 3, 4, 5, 6))
constraints <- group_inout(c("x1", "x2", "x3"))
hglm(y ~ ., constraints = constraints, data = dat)
Constraint which restricts the number of covariates selected from a specific group.
group_sparsity(vars, k = 1L)
vars: a vector specifying the indices or names of the covariates to which the group constraint shall be applied.
k: an integer giving the maximum number of covariates to be included in the model from the specified group.
A holistic generalized model constraint, an object inheriting from class "hglmc".
Other Constraint-Constructors: group_equal(), group_inout(), include(), k_max(), linear(), lower(), pairwise_sign_coherence(), rho_max(), sign_coherence(), upper()
dat <- rhglm(100, c(1, 2, 0, 4, 5, 0))
constraints <- group_sparsity(c("x1", "x2", "x5"), 1L)
hglm(y ~ ., constraints = constraints, data = dat)
Fit a generalized linear model under holistic constraints.
hglm(formula, family = gaussian(), data, constraints = NULL, weights = NULL,
     scaler = c("auto", "center_standardization", "center_minmax",
                "standardization", "minmax", "off"),
     scale_response = NULL, big_m = 100, solver = "auto", control = list(),
     dry_run = FALSE, approx = FALSE, object_size = c("normal", "big"), ...)

holiglm(formula, family = gaussian(), data, constraints = NULL, weights = NULL,
        scaler = c("auto", "center_standardization", "center_minmax",
                   "standardization", "minmax", "off"),
        scale_response = NULL, big_m = 100, solver = "auto", control = list(),
        dry_run = FALSE, approx = FALSE, object_size = c("normal", "big"), ...)

hglm_seq(k_seq, formula, family = gaussian(), data, constraints = NULL,
         weights = NULL,
         scaler = c("auto", "center_standardization", "center_minmax",
                    "standardization", "minmax", "off"),
         big_m = 100, solver = "auto", control = list(),
         object_size = c("normal", "big"), parallel = FALSE)
formula: an object of class "formula" describing the model to be fitted.
family: a description of the error distribution and link function to be used in the model.
data: a data.frame containing the variables in the model.
constraints: a list of 'HGLM' constraints stored in a list of class "hglmc".
weights: an optional vector of 'prior weights' to be used for the estimation.
scaler: a character string giving the name of the scaling function (default is "auto").
scale_response: a boolean whether the response shall be standardized or not. Can only be used with family gaussian().
big_m: an upper bound for the coefficients, needed for the big-M constraint.
solver: a character string giving the name of the solver to be used for the estimation.
control: a list of control parameters passed to ROI_solve.
dry_run: a logical; if TRUE, the model is not fitted and the unfitted model object is returned.
approx: a logical; if TRUE, an approximation of the log-likelihood is used.
object_size: a character string giving the object size, allowed values are "normal" and "big".
...: for 'approx': further arguments passed to or from other methods.
k_seq: an integer vector giving the values of k (maximum number of covariates) for which the models in the sequence are fitted.
parallel: a logical; whether the estimation of the sequence shall be parallelized.
In the case of binding linear constraints, the standard errors are corrected. More information about the correction can be found in Schwendinger, Schwendinger and Vana (2024) doi:10.18637/jss.v108.i07.
An object of class "hglm"
inheriting from "glm"
.
Schwendinger, B., Schwendinger, F., & Vana, L. (2024). Holistic Generalized Linear Models. doi:10.18637/jss.v108.i07
Bertsimas, D., & King, A. (2016). OR Forum - An Algorithmic Approach to Linear Regression. Operations Research, 64(1): 2-16. doi:10.1287/opre.2015.1436
McCullagh, P., & Nelder, J. A. (2019). Generalized Linear Models (2nd ed.) Routledge. doi:10.1201/9780203753736.
Dobson, A. J., & Barnett, A. G. (2018). An Introduction to Generalized Linear Models (4th ed.) Chapman and Hall/CRC. doi:10.1201/9781315182780
Chares, Robert. (2009). “Cones and Interior-Point Algorithms for Structured Convex Optimization involving Powers and Exponentials.” PhD thesis, Université catholique de Louvain.
Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95 (3): 759–771. Oxford University Press. doi:10.1093/biomet/asn034
Zhu, J., Wen, C., Zhu, J., Zhang, H., & Wang, X. (2020). A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences, 117 (52): 33117–33123. doi:10.1073/pnas.2014241117
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
# estimation without constraints
hglm(y ~ ., constraints = NULL, data = dat)
# estimation with an upper bound on the number of coefficients to be selected
hglm(y ~ ., constraints = k_max(3), data = dat)
# estimation without intercept
hglm(y ~ . - 1, data = dat)
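Two further sketches: a binding linear equality constraint (for which, per the details above, the standard errors are corrected; summary() is assumed to dispatch via the inherited "glm" class) and a best subset sequence fitted with hglm_seq():

dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
# binding linear equality constraint; standard errors are corrected
fit <- hglm(y ~ ., constraints = linear(c(x1 = 1, x2 = 1), "==", 3), data = dat)
summary(fit)
# sequence of best subset models for k = 1, ..., 5
fit_seq <- hglm_seq(1:5, y ~ ., data = dat)
fit_seq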
Fit a generalized linear model under constraints.
hglm_fit(model, constraints = NULL, big_m, solver = "auto", control = list(),
         dry_run = FALSE, approx = FALSE, object_size = c("normal", "big"))
model: a 'HGLM' model (object of class "hglm_model").
constraints: a list of 'HGLM' constraints stored in a list of class "hglmc".
big_m: an upper bound for the coefficients, needed for the big-M constraint.
solver: a character string giving the name of the solver to be used for the estimation.
control: a list of control parameters passed to ROI_solve.
dry_run: a logical; if TRUE, the model is not fitted and the unfitted model object is returned.
approx: a logical; if TRUE, an approximation of the log-likelihood is used.
object_size: a character string giving the object size, allowed values are "normal" and "big".
an object of class "hglm.fit"
inheriting from "glm"
.
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6)) x <- model.matrix(y ~ ., data = dat) model <- hglm_model(x, y = dat[["y"]]) fit <- hglm_fit(model, constraints = k_max(3))
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6)) x <- model.matrix(y ~ ., data = dat) model <- hglm_model(x, y = dat[["y"]]) fit <- hglm_fit(model, constraints = k_max(3))
Create a HGLM model object.
hglm_model(x, y, family = gaussian(), weights = NULL, frame = NULL,
           solver = "auto", approx = FALSE)
x: a numeric matrix giving the design matrix.
y: a vector giving the response variable.
family: a description of the error distribution and link function to be used in the model.
weights: an optional vector of 'prior weights' to be used for the estimation.
frame: an optional model frame object.
solver: a character string giving the name of the solver to be used for the estimation.
approx: a logical; if TRUE, an approximation of the log-likelihood is used.
No standardization takes place prior to fitting the model. If a standardization of x or y is wanted, the user has to apply it beforehand.
An object of class "hglm_model"
.
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
x <- model.matrix(y ~ ., data = dat)
hglm_model(x, y = dat[["y"]])
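Since hglm_model() applies no scaling, a minimal sketch of standardizing the design matrix by hand before constructing the model object (the intercept column is left unscaled):

dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
x <- model.matrix(y ~ ., data = dat)
# standardize all columns except the intercept
x[, -1] <- scale(x[, -1])
model <- hglm_model(x, y = dat[["y"]])
fit <- hglm_fit(model, constraints = k_max(3))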
hglmc Objects
Generic functions for holistic 'GLM' constraints.
## S3 method for class 'hglmc'
c(...)

is.hglmc(x)
...: multiple objects inheriting from class "hglmc".
x: an R object.
The 'HGLM' constraints are all of class "hglmc" and can be combined with the typical combine function c(). To verify that an object is a 'HGLM' constraint, the function is.hglmc can be used.
The combine function c() returns an object of class "hglmc". The is.hglmc function returns TRUE if the object inherits from class "hglmc", otherwise FALSE.
constraints <- c(k_max(7), pairwise_sign_coherence())
is.hglmc(constraints)
Ensures that the coefficients of all covariates specified by vars are active (i.e., the corresponding covariates are included in the model).
include(vars)
vars: an integer vector specifying the indices for covariates which have to be in the model.
A holistic generalized model constraint, an object inheriting from class "hglmc".
Other Constraint-Constructors: group_equal(), group_inout(), group_sparsity(), k_max(), linear(), lower(), pairwise_sign_coherence(), rho_max(), sign_coherence(), upper()
dat <- rhglm(100, c(1, 2, 3, 4, 5, 6))
constraints <- include(vars = c("x1", "x3"))
hglm(y ~ ., constraints = constraints, data = dat)
Constraint on the maximum number of covariates to be used in the model.
k_max(k)
k: a positive integer giving the maximum number of covariates to be used in the model.
A holistic generalized model constraint, an object inheriting from class "hglmc".
If an intercept is used, the upper bound on k is given by the number of columns of the model matrix minus one. If no intercept is used, the upper bound on k is given by the number of columns of the model matrix.
Other Constraint-Constructors: group_equal(), group_inout(), group_sparsity(), include(), linear(), lower(), pairwise_sign_coherence(), rho_max(), sign_coherence(), upper()
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
hglm(y ~ ., constraints = k_max(3), data = dat)
Linear Constraint
linear(L, dir, rhs, on_big_m = FALSE)
L: a named vector or matrix defining the linear constraints on the coefficients of the covariates.
dir: a character vector giving the direction of the linear constraints.
rhs: a numeric vector giving the right hand side of the linear constraints.
on_big_m: a logical indicating if the constraint should be imposed on the big-M related binary variables.
A holistic generalized model constraint, an object inheriting from class "hglmc".
Lawson, C. L., & Hanson, R. J. (1995). Solving Least Squares Problems. Society for Industrial and Applied Mathematics. doi:10.1137/1.9781611971217
Other Constraint-Constructors: group_equal(), group_inout(), group_sparsity(), include(), k_max(), lower(), pairwise_sign_coherence(), rho_max(), sign_coherence(), upper()
# vector constraint
beta <- c(1, -2, 3)
dat <- rhglm(100, beta)
constraints <- c(linear(c(x1 = 2, x2 = 1), "==", 0), rho_max(1))
hglm(y ~ ., data = dat, constraints = constraints)

# matrix constraint
dat <- rhglm(100, c(1, -2, 3, 4, 5, 6, 7))
mat <- diag(2)
colnames(mat) <- c("x1", "x5")
constraints <- c(linear(mat, c("==", "=="), c(-1, 3)), rho_max(1))
hglm(y ~ ., data = dat, constraints = constraints)
Set a lower bound on the coefficients of specific covariates.
lower(kvars)
kvars: a named vector giving the lower bounds. The names should correspond to the names of the covariates in the model matrix.
A holistic generalized model constraint, an object inheriting from class "hglmc".
McDonald, J. W., & Diamond, I. D. (1990). On the Fitting of Generalized Linear Models with Nonnegativity Parameter Constraints. Biometrics, 46 (1): 201–206. doi:10.2307/2531643
Slawski, M., & Hein, M. (2013). Non-negative least squares for high-dimensional linear models: Consistency and sparse recovery without regularization. Electronic Journal of Statistics, 7: 3004-3056. doi:10.1214/13-EJS868
Other Constraint-Constructors: group_equal(), group_inout(), group_sparsity(), include(), k_max(), linear(), pairwise_sign_coherence(), rho_max(), sign_coherence(), upper()
set.seed(0)
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
constraints <- lower(c(x2 = 0, x5 = 1))
hglm(y ~ ., constraints = constraints, data = dat)

# non-negative least squares
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
constraints <- lower(setNames(double(5), paste0("x", 1:5)))
hglm(y ~ ., constraints = constraints, data = dat)
Ensures that coefficients of covariates which exhibit strong pairwise correlation have a coherent sign.
pairwise_sign_coherence(rho = 0.8, exclude = "(Intercept)", big_m = 100,
                        eps = 1e-06,
                        use = c("everything", "all.obs", "complete.obs",
                                "na.or.complete", "pairwise.complete.obs"),
                        method = c("pearson", "kendall", "spearman"))
rho: a value in the range [0, 1] specifying the maximum allowed collinearity between pairs of covariates.
exclude: a character vector giving the names of the covariates to be excluded from the constraint (default is "(Intercept)").
big_m: a double giving the big-M parameter.
eps: a double giving the epsilon for the equal-sign constraint. Since most numerical solvers can only enforce constraints up to some tolerance, a small eps is used when imposing the constraint.
use: an optional character string giving a method for computing covariances in the presence of missing values. The parameter is passed to cor().
method: a character string indicating which correlation coefficient is to be computed. The parameter is passed to cor().
A holistic generalized model constraint, an object inheriting from class "hglmc".
Carrizosa, E., Olivares-Nadal, A. V., & Ramírez-Cobo, P. (2020). Integer Constraints for Enhancing Interpretability in Linear Regression. SORT. Statistics and Operations Research Transactions, 44: 67-98. doi:10.2436/20.8080.02.95.
Other Constraint-Constructors: group_equal(), group_inout(), group_sparsity(), include(), k_max(), linear(), lower(), rho_max(), sign_coherence(), upper()
constraints <- c(k_max(7), pairwise_sign_coherence())
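A minimal sketch fitting a model under this constraint, using simulated data in which two covariates are highly correlated (the correlation structure is defined via cov_matrix()):

beta <- c(1, 2, 2, -3)
Sigma <- cov_matrix(k = length(beta) - 1L, 1, 2, 0.9)
dat <- rhglm(200, beta, sigma = Sigma)
fit <- hglm(y ~ ., constraints = pairwise_sign_coherence(rho = 0.8), data = dat)
coef(fit)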
Obtains predictions from a fitted holistic generalized linear model object.
## S3 method for class 'hglm'
predict(object, newdata = NULL, type = c("link", "response"), ...)
object: a fitted object of a class inheriting from "hglm".
newdata: an optional data frame containing new observations for which predictions are to be made. If omitted, the fitted linear predictors are used.
type: the type of predictions to be made; possible values are "link" (default) and "response".
...: optional arguments currently ignored.
A vector of predicted values. If type = "link", the predicted values are on the link scale; if type = "response", they are on the response scale.
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
fit <- hglm(y ~ ., constraints = k_max(3), data = dat)
pred <- predict(fit)
pred2 <- predict(fit, newdata = dat)
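A further short sketch for response-scale predictions from a binomial model (using the rhglm() generator with family = binomial()); the predicted values are probabilities:

dat <- rhglm(100, c(1, 2, -3), family = binomial())
fit <- hglm(y ~ ., data = dat, family = binomial())
head(predict(fit, type = "response"))  # predicted probabilities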
A simple data generator for testing and example purposes.
rhglm(n, beta, sigma = diag(length(beta) - 1L), family = gaussian(),
      truncate_mu = FALSE, as_list = FALSE, ...)
n: the number of observations to be created.
beta: a numeric vector giving the magnitude of the coefficients (the first element is assumed to be the intercept).
sigma: a positive-definite symmetric matrix giving the covariance structure of the covariates.
family: the family of the inverse link.
truncate_mu: a logical indicating whether mu should be truncated if necessary.
as_list: a logical (default is FALSE) indicating whether the result should be returned as a list.
...: additional optional parameters; the arguments are passed to the function generating the random response.
A data.frame (or list) containing the generated data.
Other simulation: cov_matrix()
rhglm(10, 1:5)
rhglm(10, 1:5, family = binomial())
Constraint which ensures that only one covariate out of a pair of covariates with a correlation of at least rho will be included in the final model.
rho_max(rho = 0.8, exclude = "(Intercept)",
        use = c("everything", "all.obs", "complete.obs", "na.or.complete",
                "pairwise.complete.obs"),
        method = c("pearson", "kendall", "spearman"))
rho: a value in the range [0, 1] specifying the maximum allowed collinearity between pairs of covariates.
exclude: variables to be excluded from the pairwise correlation constraints (default is "(Intercept)").
use: an optional character string giving a method for computing covariances in the presence of missing values. The parameter is passed to cor().
method: a character string indicating which correlation coefficient is to be computed. See cor().
A holistic generalized model constraint, an object inheriting from class "hglmc".
Other Constraint-Constructors: group_equal(), group_inout(), group_sparsity(), include(), k_max(), linear(), lower(), pairwise_sign_coherence(), sign_coherence(), upper()
beta <- 1:3
Sigma <- cov_matrix(k = length(beta) - 1L, 1, 2, 0.9)
dat <- rhglm(100, beta, sigma = Sigma)
hglm(y ~ ., constraints = rho_max(0.8), data = dat)
Auxiliary function to scale the linear constraint matrices to be consistent with the scaled model matrix.
scale_constraint_matrix(L, xs, ys = 1)
L: a matrix giving the linear constraints.
xs: a vector giving the scaling factors of the covariates (one per column of L).
ys: a double giving the scaling of the response.
Constraint which ensures that the coefficients of the specified covariates have a coherent sign.
sign_coherence(vars, big_m = 100, eps = 1e-06)
vars: a character vector giving the names of the covariates the constraint should be applied to.
big_m: a double giving the big-M parameter.
eps: a double giving the epsilon used to ensure that the constraint holds.
A holistic generalized model constraint, an object inheriting from class "hglmc".
Carrizosa, E., Olivares-Nadal, A. V., & Ramírez-Cobo, P. (2020). Integer Constraints for Enhancing Interpretability in Linear Regression. SORT. Statistics and Operations Research Transactions, 44: 67-98. doi:10.2436/20.8080.02.95.
Other Constraint-Constructors: group_equal(), group_inout(), group_sparsity(), include(), k_max(), linear(), lower(), pairwise_sign_coherence(), rho_max(), upper()
dat <- rhglm(100, c(1, -2, 3, 4, 5, 6))
constraints <- sign_coherence(c("x1", "x3"))
hglm(y ~ ., constraints = constraints, data = dat)
The solution of the underlying optimization problem can be accessed via the method 'solution'.
## S3 method for class 'hglm'
solution(x, type = c("primal", "dual", "aux", "psd", "msg", "objval",
                     "status", "status_code"),
         force = FALSE, ...)
x: an object of class "hglm".
type: a character giving the name of the solution to be extracted.
force: a logical to control the return value in the case that the status code is equal to 1 (i.e., something went wrong). By default, force is FALSE.
...: further arguments passed to or from other methods.
the extracted solution.
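A short sketch extracting the objective value and the solver status message from a fitted model:

dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
fit <- hglm(y ~ ., constraints = k_max(3), data = dat)
solution(fit, type = "objval")  # objective value of the underlying problem
solution(fit, type = "status")  # solver status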
Update the Model Object
update_objective(model, op)
model: an object inheriting from class "hglm_model".
op: a ROI optimization problem (an object of class "OP").
Set an upper bound on the coefficients of specific covariates.
upper(kvars)
kvars: a named vector giving the upper bounds. The names should correspond to the names of the covariates in the model matrix.
A holistic generalized model constraint, an object inheriting from class "hglmc".
McDonald, J. W., & Diamond, I. D. (1990). On the Fitting of Generalized Linear Models with Nonnegativity Parameter Constraints. Biometrics, 46 (1): 201–206. doi:10.2307/2531643
Slawski, M., & Hein, M. (2013). Non-negative least squares for high-dimensional linear models: Consistency and sparse recovery without regularization. Electronic Journal of Statistics, 7: 3004-3056. doi:10.1214/13-EJS868
Other Constraint-Constructors: group_equal(), group_inout(), group_sparsity(), include(), k_max(), linear(), lower(), pairwise_sign_coherence(), rho_max(), sign_coherence()
dat <- rhglm(100, c(1, 2, -3, 4, 5, -6))
constraints <- upper(c(x1 = 0, x4 = 1))
hglm(y ~ ., constraints = constraints, data = dat)