Title: | Plots for Model Sensitivity and Variable Importance |
---|---|
Description: | Draws tornado plots for model sensitivity to univariate changes. Implements methods for many modeling methods including linear models, generalized linear models, survival regression models, and arbitrary machine learning models in the caret package. Also draws variable importance plots. |
Authors: | Rob Carnell [aut, cre] |
Maintainer: | Rob Carnell |
License: | GPL-3 |
Version: | 0.2.0 |
Built: | 2024-11-01 05:10:56 UTC |
Source: | https://github.com/bertcarnell/tornado |
Generic Importance Plot
importance(model_final, ...)
model_final | a model object |
... | arguments passed to other methods |
an object of type importance_plot
type | the type of importance plot |
data | the importance data required for the plot |
importance.glm, importance.lm, importance.cv.glmnet, importance.survreg
Plot Variable Importance for a GLMNET model
## S3 method for class 'cv.glmnet'
importance(model_final, model_data, form, dict = NA, nperm = 500, ...)
model_final | a model object |
model_data | the data used to fit the model |
form | the model formula |
dict | a variable dictionary for plotting |
nperm | the number of permutations used to calculate the importance |
... | arguments passed to other methods |
an object of type importance_plot
type | the type of importance plot |
data | the importance data required for the plot |
if (requireNamespace("glmnet", quietly = TRUE)) {
  form <- formula(mpg ~ cyl*wt*hp)
  mf <- model.frame(form, data = mtcars)
  mm <- model.matrix(mf, mf)
  gtest <- glmnet::cv.glmnet(x = mm, y = mtcars$mpg, family = "gaussian")
  imp <- importance(gtest, mtcars, form, nperm = 50)
  plot(imp)
}
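The nperm argument above refers to permutation-based importance. As a rough illustration of the general idea only, the sketch below shuffles one predictor at a time and measures how much the mean squared error of a fitted model grows. It uses a plain lm fit and base R so it runs without glmnet; the error metric, the helper name mse, and the averaging over permutations are assumptions made for illustration, not a description of what importance.cv.glmnet computes internally.

```r
# Minimal permutation-importance sketch (illustrative, not the package's code)
set.seed(42)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
vars <- c("wt", "hp", "qsec")
nperm <- 50

# Hypothetical helper: mean squared prediction error on a data set
mse <- function(model, data) {
  mean((data$mpg - predict(model, newdata = data))^2)
}
baseline <- mse(fit, mtcars)

# For each variable, shuffle its column nperm times and average the MSE increase
perm_importance <- sapply(vars, function(v) {
  increases <- replicate(nperm, {
    shuffled <- mtcars
    shuffled[[v]] <- sample(shuffled[[v]])
    mse(fit, shuffled) - baseline
  })
  mean(increases)
})

print(sort(perm_importance, decreasing = TRUE))
```

A larger nperm reduces the noise in each average at the cost of more predictions, which is presumably why the examples lower it to 50 for speed.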
GLM variable importance plot
## S3 method for class 'glm'
importance(model_final, model_null, dict = NA, ...)
model_final | a model object |
model_null | a glm object for the null model |
dict | a dictionary to translate the model variables to plotting variables |
... | arguments passed to other methods |
an object of type importance_plot
type | the type of importance plot |
data | the importance data required for the plot |
gtest <- glm(mpg ~ cyl*wt*hp + gear + carb, data=mtcars, family=gaussian)
gtestreduced <- glm(mpg ~ 1, data=mtcars, family=gaussian)
imp <- importance(gtest, gtestreduced)
plot(imp)
gtest <- glm(mpg ~ cyl + wt + hp + gear + carb, data=mtcars, family=gaussian)
gtestreduced <- glm(mpg ~ 1, data=mtcars, family=gaussian)
imp <- importance(gtest, gtestreduced)
plot(imp)
gtest <- glm(vs ~ wt + disp + gear, data=mtcars, family=binomial(link="logit"))
gtestreduced <- glm(vs ~ 1, data=mtcars, family=binomial(link="logit"))
imp <- importance(gtest, gtestreduced)
plot(imp)
Linear Model variable importance plot
## S3 method for class 'lm'
importance(model_final, model_null, dict = NA, ...)
model_final | a model object |
model_null | a |
dict | a dictionary to translate the model variables to plotting variables |
... | arguments passed to other methods |
an object of type importance_plot
type | the type of importance plot |
data | the importance data required for the plot |
gtest <- lm(mpg ~ cyl*wt*hp + gear + carb, data=mtcars)
gtestreduced <- lm(mpg ~ 1, data=mtcars)
imp <- importance(gtest, gtestreduced)
plot(imp)
gtest <- lm(mpg ~ cyl + wt + hp + gear + carb, data=mtcars)
gtestreduced <- lm(mpg ~ 1, data=mtcars)
imp <- importance(gtest, gtestreduced)
plot(imp)
Create a variable importance plot for a survreg model
## S3 method for class 'survreg'
importance(model_final, model_data, dict = NA, nperm = 500, ...)
model_final | a model object |
model_data | the data used to fit the model |
dict | a plotting dictionary for model terms |
nperm | the number of permutations used to calculate the importance |
... | arguments passed to other methods |
an object of type importance_plot
type | the type of importance plot |
data | the importance data required for the plot |
model_final <- survival::survreg(survival::Surv(futime, fustat) ~ ecog.ps*rx + age,
                                 data = survival::ovarian, dist = "weibull")
imp <- importance(model_final, survival::ovarian, nperm = 50)
plot(imp)
Importance Plot for the caret::train objects
## S3 method for class 'train'
importance(model_final, ...)
model_final | a model object |
... | arguments passed to other methods |
an object of type importance_plot
type | the type of importance plot |
data | the importance data required for the plot |
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("randomForest", quietly = TRUE)) {
  model_final <- caret::train(x = subset(mtcars, select = -mpg), y = mtcars$mpg, method = "rf")
  imp <- importance(model_final)
  plot(imp)
}
Plot an Importance Plot object
## S3 method for class 'importance_plot'
plot(
  x,
  plot = TRUE,
  nvar = NA,
  col_imp_alone = "#69BE28",
  col_imp_cumulative = "#427730",
  geom_bar_control = list(fill = "#69BE28"),
  ...
)
x | a |
plot | boolean to determine if the plot is displayed, or just returned |
nvar | the number of variables to plot in order of importance |
col_imp_alone | the color used for the variance explained by each variable alone |
col_imp_cumulative | the color used for the cumulative variance explained |
geom_bar_control | list of arguments to control the plotting of |
... | future arguments |
the plot
gtest <- lm(mpg ~ cyl + wt + hp + gear + carb, data = mtcars)
gtestreduced <- lm(mpg ~ 1, data = mtcars)
imp <- importance(gtest, gtestreduced)
plot(imp)
gtest <- survival::survreg(survival::Surv(futime, fustat) ~ ecog.ps*rx + age,
                           data = survival::ovarian, dist = "weibull")
imp <- importance(gtest, survival::ovarian, nperm = 50)
plot(imp)
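The color arguments above distinguish variance explained by each variable alone from cumulative variance explained. One plausible reading of that distinction, sketched here with base R only, is R-squared from single-predictor fits versus R-squared from fits that add predictors one at a time. The helper r2, the choice of R-squared as the measure, and the ordering by single-variable R-squared are all assumptions for illustration; the package may compute these quantities differently.

```r
# Illustrative "alone" vs "cumulative" variance explained for a linear model
vars <- c("wt", "hp", "cyl")

# Hypothetical helper: R-squared of mpg regressed on the given predictors
r2 <- function(rhs) {
  summary(lm(reformulate(rhs, response = "mpg"), data = mtcars))$r.squared
}

# Variance explained by each variable on its own
alone <- sapply(vars, function(v) r2(v))

# Add variables in decreasing order of their stand-alone R-squared
ord <- names(sort(alone, decreasing = TRUE))
cumulative <- sapply(seq_along(ord), function(i) r2(ord[seq_len(i)]))

result <- data.frame(
  variable = ord,
  alone = unname(alone[ord]),
  cumulative = cumulative
)
print(result)
```

Because the cumulative models are nested, the cumulative column can never decrease, and its last entry equals the R-squared of the model with all three predictors.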
Plot a Tornado Plot object
## S3 method for class 'tornado_plot'
plot(
  x,
  plot = TRUE,
  nvar = NA,
  xlabel = "Model Response",
  sensitivity_colors = c("grey", "#69BE28"),
  geom_bar_control = list(width = NULL),
  geom_point_control = list(fill = "black", col = "black"),
  ...
)
x | a |
plot | boolean to determine if the plot is displayed, or just returned |
nvar | the number of variables to plot |
xlabel | a label for the x-axis |
sensitivity_colors | a two element character vector of the bar colors for a lower value and upper value |
geom_bar_control | a list of |
geom_point_control | a list of |
... | future arguments |
the plot
gtest <- lm(mpg ~ cyl*wt*hp, data = mtcars)
tp <- tornado(gtest, type = "PercentChange", alpha = 0.10, xlabel = "MPG")
plot(tp)
importance_plot
print data in an importance_plot
## S3 method for class 'importance_plot'
print(x, ...)
x | the object to be printed |
... | further arguments passed to |
gtest <- glm(vs ~ wt + disp + gear, data=mtcars, family=binomial(link="logit"))
gtestreduced <- glm(vs ~ 1, data=mtcars, family=binomial(link="logit"))
g <- importance(gtest, gtestreduced)
print(g)
tornado_plot
print data in a tornado_plot
## S3 method for class 'tornado_plot'
print(x, ...)
x | the object to be printed |
... | further arguments passed to |
gtest <- lm(mpg ~ cyl*wt*hp, data = mtcars)
tp <- tornado(gtest, type = "PercentChange", alpha = 0.10, xlabel = "MPG")
print(tp)
Quantile for Ordered Factors
## S3 method for class 'ordered'
quantile(x, probs = seq(0, 1, 0.25), ...)
x | an ordered factor |
probs | the desired quantiles |
... | arguments passed on |
ordered factor levels at the desired quantiles
quantile(ordered(rep(c("C","B","A"), each=30), levels=c("C","B","A")),
         probs = seq(0, 1, 0.25))
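To make the idea of a quantile on an ordered factor concrete, the sketch below works on the integer level codes with base R and maps the result back to level labels. It uses quantile type 1 (the inverse empirical distribution function), which always returns an observed value and so never falls between two levels. This is an illustration of the concept under those assumptions, not the package's implementation; the variable names are made up for the example.

```r
# Quantiles of an ordered factor via its integer codes (illustrative sketch)
x <- ordered(rep(c("C", "B", "A"), each = 30), levels = c("C", "B", "A"))

# Integer codes follow the level order: C = 1, B = 2, A = 3
codes <- as.integer(x)

# Type 1 quantiles pick an actual observation, so the result is a valid code
q_codes <- quantile(codes, probs = c(0, 0.5, 1), type = 1)

# Translate the codes back to level labels
q_levels <- levels(x)[unname(q_codes)]
print(q_levels)
```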
A tornado plot is a visualization of the range of outputs expected from a variety of inputs, or alternatively, the sensitivity of the output to the range of inputs. The center of the tornado is plotted at the response expected from the mean of each input variable. For a given variable, the width of the tornado is determined by the range of the variable, a multiplicative factor of the variable, or a quantile of the variable. Variables are ordered vertically with the widest bar at the top and narrowest at the bottom. Only one variable is moved from its mean value at a time. Factors or categorical variables have also been added to these plots by plotting dots at the resulting output as each factor is varied through all of its levels. The base factor level is chosen as the input variable for the center of the tornado.
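The one-variable-at-a-time calculation described above can be sketched in base R without the package. The example below assumes the multiplicative interpretation, where each variable is moved to its mean times (1 - alpha) and (1 + alpha) while the others stay at their means. The variable names and the data layout of the result are illustrative assumptions and do not reflect the structure of a tornado_plot object.

```r
# One-at-a-time sensitivity for a linear model, using base R only
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
vars <- c("wt", "hp", "disp")
alpha <- 0.10  # assumed meaning: a 10% change around each mean

# Center of the tornado: the prediction with every input at its mean
center_row <- as.data.frame(lapply(mtcars[vars], mean))
center <- unname(predict(fit, newdata = center_row))

# Move one variable at a time to mean * (1 - alpha) and mean * (1 + alpha)
sens <- do.call(rbind, lapply(vars, function(v) {
  low_row <- center_row
  high_row <- center_row
  low_row[[v]] <- center_row[[v]] * (1 - alpha)
  high_row[[v]] <- center_row[[v]] * (1 + alpha)
  data.frame(
    variable = v,
    low = unname(predict(fit, newdata = low_row)),
    high = unname(predict(fit, newdata = high_row))
  )
}))

# Order the bars with the widest first, as in a tornado plot
sens$width <- abs(sens$high - sens$low)
sens <- sens[order(sens$width, decreasing = TRUE), ]

print(center)
print(sens)
```

For a model that is linear in each input, the low and high responses are symmetric around the center; models with interactions or nonlinear links will generally produce asymmetric bars.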
tornado(model, type, alpha, dict, ...)
model | a model object |
type | |
alpha | the level of change, the percentile level, or the number of standard deviations |
dict | a dictionary to translate variables for the plot. The dictionary must be a list or data.frame with elements |
... | further arguments, not used |
a tornado_plot object
type | the type of tornado plot |
data | the data required for the plot |
family | the model family if available |
tornado.lm, tornado.glm, tornado.cv.glmnet, tornado.survreg, tornado.coxph, tornado.train
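The alpha argument is described above as a level of change, a percentile level, or a number of standard deviations, and the description also mentions the range of the variable. The sketch below shows one way each reading could translate into low and high endpoints for a single input. The row labels are hypothetical names chosen for this example and are not values of the type argument; the formulas are assumptions based on the wording above.

```r
# Candidate low/high endpoints for one input under different readings of alpha
x <- mtcars$wt
alpha <- 0.10

endpoints <- rbind(
  percent_change = mean(x) * c(1 - alpha, 1 + alpha),               # multiplicative factor
  percentiles = unname(quantile(x, probs = c(alpha, 1 - alpha))),   # percentile level
  range = range(x),                                                 # full observed range
  std_dev = mean(x) + c(-1, 1) * alpha * sd(x)                      # number of std. deviations
)
colnames(endpoints) <- c("low", "high")

print(endpoints)
```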
A tornado plot is a visualization of the range of outputs expected from a variety of inputs, or alternatively, the sensitivity of the output to the range of inputs. The center of the tornado is plotted at the response expected from the mean of each input variable. For a given variable, the width of the tornado is determined by the range of the variable, a multiplicative factor of the variable, or a quantile of the variable. Variables are ordered vertically with the widest bar at the top and narrowest at the bottom. Only one variable is moved from its mean value at a time. Factors or categorical variables have also been added to these plots by plotting dots at the resulting output as each factor is varied through all of its levels. The base factor level is chosen as the input variable for the center of the tornado.
## S3 method for class 'coxph'
tornado(model, type = "PercentChange", alpha = 0.1, dict = NA, modeldata, ...)
model | a model object |
type | |
alpha | the level of change, the percentile level, or the number of standard deviations |
dict | a dictionary to translate variables for the plot. The dictionary must be a list or data.frame with elements |
modeldata | the data used to fit the model |
... | further arguments, not used |
a tornado_plot object
type | the type of tornado plot |
data | the data required for the plot |
family | the model family if available |
gtest <- survival::coxph(survival::Surv(stop, event) ~ rx + size + number, survival::bladder)
torn <- tornado(gtest, modeldata = survival::bladder, type = "PercentChange", alpha = 0.10)
plot(torn, xlabel = "Risk")
A tornado plot is a visualization of the range of outputs expected from a variety of inputs, or alternatively, the sensitivity of the output to the range of inputs. The center of the tornado is plotted at the response expected from the mean of each input variable. For a given variable, the width of the tornado is determined by the range of the variable, a multiplicative factor of the variable, or a quantile of the variable. Variables are ordered vertically with the widest bar at the top and narrowest at the bottom. Only one variable is moved from its mean value at a time. Factors or categorical variables have also been added to these plots by plotting dots at the resulting output as each factor is varied through all of its levels. The base factor level is chosen as the input variable for the center of the tornado.
## S3 method for class 'cv.glmnet'
tornado(
  model,
  type = "PercentChange",
  alpha = 0.1,
  dict = NA,
  modeldata,
  form,
  s = "lambda.1se",
  ...
)
model | a model object |
type | |
alpha | the level of change, the percentile level, or the number of standard deviations |
dict | a dictionary to translate variables for the plot. The dictionary must be a list or data.frame with elements |
modeldata | the raw data used to fit the glmnet model |
form | the model formula |
s | Value(s) of the penalty parameter |
... | further arguments, not used |
a tornado_plot object
type | the type of tornado plot |
data | the data required for the plot |
family | the model family if available |
if (requireNamespace("glmnet", quietly = TRUE)) {
  form <- formula(mpg ~ cyl*wt*hp)
  mf <- model.frame(form, data = mtcars)
  mm <- model.matrix(form, data = mf)
  gtest <- glmnet::cv.glmnet(x = mm, y = mtcars$mpg, family = "gaussian")
  torn <- tornado(gtest, modeldata = mtcars, form = formula(mpg ~ cyl*wt*hp),
                  s = "lambda.1se", type = "PercentChange", alpha = 0.10)
  plot(torn, xlabel = "MPG")
}
A tornado plot is a visualization of the range of outputs expected from a variety of inputs, or alternatively, the sensitivity of the output to the range of inputs. The center of the tornado is plotted at the response expected from the mean of each input variable. For a given variable, the width of the tornado is determined by the range of the variable, a multiplicative factor of the variable, or a quantile of the variable. Variables are ordered vertically with the widest bar at the top and narrowest at the bottom. Only one variable is moved from its mean value at a time. Factors or categorical variables have also been added to these plots by plotting dots at the resulting output as each factor is varied through all of its levels. The base factor level is chosen as the input variable for the center of the tornado.
## S3 method for class 'glm'
tornado(model, type = "PercentChange", alpha = 0.1, dict = NA, ...)
model | a model object |
type | |
alpha | the level of change, the percentile level, or the number of standard deviations |
dict | a dictionary to translate variables for the plot. The dictionary must be a list or data.frame with elements |
... | further arguments, not used |
a tornado_plot object
type | the type of tornado plot |
data | the data required for the plot |
family | the model family if available |
gtest <- glm(mpg ~ cyl*wt*hp, data = mtcars, family = gaussian)
torn <- tornado(gtest, type = "PercentChange", alpha = 0.10)
plot(torn, xlabel = "MPG")
A tornado plot is a visualization of the range of outputs expected from a variety of inputs, or alternatively, the sensitivity of the output to the range of inputs. The center of the tornado is plotted at the response expected from the mean of each input variable. For a given variable, the width of the tornado is determined by the range of the variable, a multiplicative factor of the variable, or a quantile of the variable. Variables are ordered vertically with the widest bar at the top and narrowest at the bottom. Only one variable is moved from its mean value at a time. Factors or categorical variables have also been added to these plots by plotting dots at the resulting output as each factor is varied through all of its levels. The base factor level is chosen as the input variable for the center of the tornado.
## S3 method for class 'lm'
tornado(model, type = "PercentChange", alpha = 0.1, dict = NA, ...)
model | a model object |
type | |
alpha | the level of change, the percentile level, or the number of standard deviations |
dict | a dictionary to translate variables for the plot. The dictionary must be a list or data.frame with elements |
... | further arguments, not used |
a tornado_plot object
type | the type of tornado plot |
data | the data required for the plot |
family | the model family if available |
gtest <- lm(mpg ~ cyl*wt*hp, data = mtcars)
torn <- tornado(gtest, type = "PercentChange", alpha = 0.10)
plot(torn, xlabel = "MPG")
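The description for this method notes that factors are shown as dots, one per level, with the base level used for the center. A minimal base R sketch of that idea for a linear model follows. Treating cyl as a factor, holding wt at its mean, and the names base_row and level_grid are choices made for this illustration; the sketch does not reproduce how the package builds its plot data.

```r
# Response at each level of a factor while the numeric input stays at its mean
dat <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ wt + cyl, data = dat)

# Center: numeric input at its mean, factor at its base (first) level
base_row <- data.frame(
  wt = mean(dat$wt),
  cyl = factor(levels(dat$cyl)[1], levels = levels(dat$cyl))
)
center <- unname(predict(fit, newdata = base_row))

# Dots: vary the factor through all of its levels, one row per level
level_grid <- data.frame(
  wt = mean(dat$wt),
  cyl = factor(levels(dat$cyl), levels = levels(dat$cyl))
)
dots <- predict(fit, newdata = level_grid)
names(dots) <- levels(dat$cyl)

print(center)
print(dots)
```

The dot for the base level coincides with the center by construction, so only the remaining levels show a shift in the response.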
A tornado plot is a visualization of the range of outputs expected from a variety of inputs, or alternatively, the sensitivity of the output to the range of inputs. The center of the tornado is plotted at the response expected from the mean of each input variable. For a given variable, the width of the tornado is determined by the range of the variable, a multiplicative factor of the variable, or a quantile of the variable. Variables are ordered vertically with the widest bar at the top and narrowest at the bottom. Only one variable is moved from its mean value at a time. Factors or categorical variables have also been added to these plots by plotting dots at the resulting output as each factor is varied through all of its levels. The base factor level is chosen as the input variable for the center of the tornado.
## S3 method for class 'survreg'
tornado(model, type = "PercentChange", alpha = 0.1, dict = NA, modeldata, ...)
model | a model object |
type | |
alpha | the level of change, the percentile level, or the number of standard deviations |
dict | a dictionary to translate variables for the plot. The dictionary must be a list or data.frame with elements |
modeldata | the data used to fit the model |
... | further arguments, not used |
a tornado_plot object
type | the type of tornado plot |
data | the data required for the plot |
family | the model family if available |
gtest <- survival::survreg(survival::Surv(futime, fustat) ~ ecog.ps + rx,
                           survival::ovarian, dist='weibull', scale=1)
torn <- tornado(gtest, modeldata = survival::ovarian, type = "PercentChange",
                alpha = 0.10, xlabel = "futime")
plot(torn, xlabel = "Survival Time")
A tornado plot is a visualization of the range of outputs expected from a variety of inputs, or alternatively, the sensitivity of the output to the range of inputs. The center of the tornado is plotted at the response expected from the mean of each input variable. For a given variable, the width of the tornado is determined by the range of the variable, a multiplicative factor of the variable, or a quantile of the variable. Variables are ordered vertically with the widest bar at the top and narrowest at the bottom. Only one variable is moved from its mean value at a time. Factors or categorical variables have also been added to these plots by plotting dots at the resulting output as each factor is varied through all of its levels. The base factor level is chosen as the input variable for the center of the tornado.
## S3 method for class 'train'
tornado(
  model,
  type = "PercentChange",
  alpha = 0.1,
  dict = NA,
  class_number = NA,
  ...
)
model | a model object |
type | |
alpha | the level of change, the percentile level, or the number of standard deviations |
dict | a dictionary to translate variables for the plot. The dictionary must be a list or data.frame with elements |
class_number | for classification models, the number of the class to plot |
... | further arguments, not used |
a tornado_plot object
type | the type of tornado plot |
data | the data required for the plot |
family | the model family if available |
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("randomForest", quietly = TRUE)) {
  gtest <- caret::train(x = subset(mtcars, select = -mpg), y = mtcars$mpg, method = "rf")
  torn <- tornado(gtest, type = "PercentChange", alpha = 0.10)
  plot(torn, xlabel = "MPG")
}