Fits regularized logistic regression models of tree mortality for a single species, returning the model object, its mortality predictions, its coefficient of determination on the training set (and, optionally, on a user-provided test set), and the fitted model coefficients.
mortality_model( training, outcome_var, iterations = 1, rare_comps = "none", density_suffix = "none", test = NULL )
Argument | Description |
---|---|
training | A dataframe containing a separate row for each focal x neighbor tree pair and a column for each variable to include in the model, including the outcome variable. The outcome variable must be numeric, with value 1 indicating 'dead' and value 0 indicating 'alive'. Other variables can be of any type, including character. Any character variables will be converted to factors before fitting the model. |
outcome_var | Name of the column in training containing the outcome variable. |
iterations | Number of times the model should be fitted. |
rare_comps | Minimum number of interactions a competitor species must appear in to remain separate from the "RARE" category (see details). If not specified, no "RARE" category will be created and all competitor species will remain separate. |
density_suffix | Suffix of the columns containing species-specific densities (see details), e.g. if density columns are of the form … |
test | An optional dataframe of test data in exactly the same format as the training data, i.e. with all the same columns and column names. |
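
The training format above has to end up as a numeric predictor matrix and a 0/1 response before glmnet can fit it. The following is a minimal sketch of that preparation step, not the package's own code: the helper prepare_glmnet_input() and the column names tree_id, competitor and prox are hypothetical, and dummy-coding factors with model.matrix() is an assumption about how factors could be handled.

```r
# Hypothetical helper: turns a training dataframe in the documented format
# into the numeric matrix x and 0/1 response y that glmnet expects.
prepare_glmnet_input <- function(training, outcome_var) {
  y <- training[[outcome_var]]
  # The outcome must be numeric: 1 = dead, 0 = alive
  if (!is.numeric(y) || !all(y %in% c(0, 1))) {
    stop("outcome must be numeric with 1 = dead and 0 = alive")
  }
  # Explanatory variables: every column except tree_id and the outcome
  predictors <- training[setdiff(names(training), c("tree_id", outcome_var))]
  # Convert character columns to factors
  is_chr <- vapply(predictors, is.character, logical(1))
  predictors[is_chr] <- lapply(predictors[is_chr], factor)
  # Dummy-code factors; drop the intercept column because glmnet fits its own
  x <- model.matrix(~ ., data = predictors)[, -1, drop = FALSE]
  list(x = x, y = y)
}

# Made-up example: one row per focal x neighbor pair (column names hypothetical)
training <- data.frame(
  tree_id    = c("t1", "t1", "t2", "t3"),
  competitor = c("ACMA", "TSHE", "TSHE", "PSME"),
  prox       = c(2.1, 4.5, 1.3, 3.8),
  dead       = c(1, 1, 0, 0),
  stringsAsFactors = FALSE
)
input <- prepare_glmnet_input(training, "dead")
```

Here the competitor factor becomes the indicator columns competitorPSME and competitorTSHE (ACMA is the baseline level), alongside the numeric prox column.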
A list containing four or five elements:

- mod: a fitted glmnet model object. If iterations > 1, this is the model with the lowest cross-validated mean square error.
- obs_pred: a dataframe containing observed and predicted mortality. If iterations > 1, this corresponds to the best model.
- R_squared: the coefficient of determination. If iterations > 1, this corresponds to the best model.
- test_R_squared: the coefficient of determination of the best model on the test data (this element is omitted if no test data are provided).
- mod_coef: a data frame containing the fitted coefficients, cross-validated mean square error (mse), and coefficient of determination for each fitted model, with rows in ascending order of mse.
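
For reference, the coefficient of determination in its standard form is 1 - SS_res / SS_tot. The sketch below applies that formula to a made-up obs_pred-style data frame of 0/1 outcomes and predicted probabilities of death. It assumes, without confirmation from this page, that R_squared is computed this way; the package may use a different pseudo-R-squared, and r_squared() is a hypothetical helper.

```r
# Assumed definition: 1 - SS_res / SS_tot, with obs = observed 0/1 outcomes
# and pred = predicted probabilities of death.
r_squared <- function(obs, pred) {
  ss_res <- sum((obs - pred)^2)       # residual sum of squares
  ss_tot <- sum((obs - mean(obs))^2)  # total sum of squares
  1 - ss_res / ss_tot
}

# Toy obs_pred-style data frame (made-up values)
obs_pred <- data.frame(obs = c(1, 0, 0, 1), pred = c(0.8, 0.2, 0.1, 0.6))
r2 <- r_squared(obs_pred$obs, obs_pred$pred)  # approximately 0.75
```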
All variables in the user-provided training data other than tree_id and the
indicated outcome variable are included as explanatory variables in the
model. The regularized regression model is fitted using the glmnet package.
Predictions, R-squared, and model coefficients returned are all based on the "lambda.1se" model, but all models with different lambda values are included in the returned mod object; see the glmnet documentation for details.
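
The "lambda.1se" model comes from the one-standard-error rule that cv.glmnet applies to its cross-validation results: among all penalties whose mean cross-validated error is within one standard error of the minimum, pick the largest (the most strongly regularized, simplest model). The base-R sketch below re-implements that selection for illustration only; select_lambdas() is a hypothetical helper and the error values are made up.

```r
# Sketch of the one-standard-error rule used by cv.glmnet.
# lambda: penalty values; cvm: mean cross-validated error; cvsd: its standard error
select_lambdas <- function(lambda, cvm, cvsd) {
  # lambda.min: the (largest) lambda attaining the minimum CV error
  lambda_min <- max(lambda[cvm <= min(cvm)])
  i_min <- match(lambda_min, lambda)
  # lambda.1se: the largest lambda whose CV error is within one SE of that minimum
  lambda_1se <- max(lambda[cvm <= cvm[i_min] + cvsd[i_min]])
  c(lambda.min = lambda_min, lambda.1se = lambda_1se)
}

sel <- select_lambdas(lambda = c(1, 0.5, 0.25, 0.125),
                      cvm    = c(0.30, 0.22, 0.20, 0.21),
                      cvsd   = c(0.01, 0.02, 0.03, 0.03))
# lambda.min = 0.25, lambda.1se = 0.5
```

In this toy case the minimum error (0.20) occurs at lambda = 0.25, but lambda = 0.5 is still within one standard error of it (0.22 <= 0.23), so the rule prefers the stronger penalty.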
As the glmnet model-fitting process is stochastic (observations are assigned to cross-validation folds at random), the fitted model can differ with each run. The iterations argument allows the user to specify how many times the model should be fitted. If the model is fitted more than once, the fitted coefficients for all models are returned, but only the model object for the best model (lowest cross-validated mean square error) is returned.
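
The repeated-fitting behaviour described above can be sketched directly with glmnet. This is a minimal illustration on simulated data, not the package's implementation: it assumes glmnet is installed, that each fit uses cv.glmnet() with family = "binomial" and type.measure = "mse", and the predictor names are hypothetical. Each fit draws new random folds; the lambda.1se coefficients and cross-validated MSE of every fit are collected in a mod_coef-style table sorted by MSE, and only the best fit is kept for predictions.

```r
# Assumes the glmnet package is installed; data below are simulated, not real.
library(glmnet)

set.seed(42)
n <- 300
# Hypothetical explanatory variables for 300 focal x neighbor rows
x <- matrix(rnorm(n * 4), nrow = n,
            dimnames = list(NULL, c("dbh", "prox", "dens_A", "dens_B")))
# Simulated outcome: 1 = dead, 0 = alive
y <- rbinom(n, 1, plogis(-1 + 0.8 * x[, "dbh"]))

iterations <- 5
# Each call draws new random cross-validation folds, so the fits can differ
fits <- lapply(seq_len(iterations), function(i) {
  cv.glmnet(x, y, family = "binomial", type.measure = "mse")
})

# Cross-validated MSE of each fit at its lambda.1se
mse <- vapply(fits, function(f) f$cvm[f$lambda == f$lambda.1se], numeric(1))

# One row of lambda.1se coefficients per fit, plus its mse, sorted by mse
coefs <- t(vapply(fits, function(f) as.matrix(coef(f, s = "lambda.1se"))[, 1],
                  numeric(ncol(x) + 1)))
mod_coef <- data.frame(coefs, mse = mse, check.names = FALSE)
mod_coef <- mod_coef[order(mod_coef$mse), ]

# Keep only the best model object and its predicted probabilities of death
best_mod <- fits[[which.min(mse)]]
obs_pred <- data.frame(
  obs  = y,
  pred = as.vector(predict(best_mod, newx = x, s = "lambda.1se", type = "response"))
)
```

Storing every fit and discarding all but one is the simplest way to show the selection; a memory-conscious version could keep only the running best model inside the loop.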
Rare competitor species can be grouped together as "RARE" for modeling if desired. The optional argument rare_comps is a number indicating the minimum number of interactions a species must appear in to remain separate from the "RARE" category. If handling of rare competitor species is requested and species-specific densities are included in training, the densities of rare species can be summed together under "RARE_density" using the optional argument density_suffix.
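
The grouping of rare competitors can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: group_rare_competitors() is a hypothetical helper, competitor species are assumed to be stored as codes in a single column (comp_col), and density columns are assumed to be named species code + density_suffix, so that a suffix of "_density" yields the "RARE_density" column mentioned above. A species seen in fewer than rare_comps rows is relabelled "RARE", and its density column is folded into the summed rare-species density.

```r
# Hypothetical helper; assumes competitor codes live in comp_col and density
# columns are named <species><density_suffix>, e.g. "B_density".
group_rare_competitors <- function(training, comp_col, rare_comps,
                                   density_suffix = NULL) {
  comps <- as.character(training[[comp_col]])
  counts <- table(comps)
  # Species appearing in fewer than rare_comps interactions become "RARE"
  rare <- names(counts)[counts < rare_comps]
  comps[comps %in% rare] <- "RARE"
  training[[comp_col]] <- comps

  if (!is.null(density_suffix)) {
    dens_cols <- intersect(paste0(rare, density_suffix), names(training))
    if (length(dens_cols) > 0) {
      # Sum rare-species densities row by row, then drop the original columns
      training[[paste0("RARE", density_suffix)]] <-
        rowSums(training[, dens_cols, drop = FALSE])
      training[dens_cols] <- NULL
    }
  }
  training
}

# Made-up example: species B and D each appear in only one interaction
training <- data.frame(
  competitor = c("A", "A", "B", "C", "C", "D"),
  A_density  = c(3, 3, 1, 2, 2, 4),
  B_density  = c(1, 0, 2, 1, 0, 1),
  C_density  = c(2, 2, 0, 3, 1, 0),
  D_density  = c(0, 1, 1, 0, 2, 3),
  stringsAsFactors = FALSE
)
grouped <- group_rare_competitors(training, "competitor", rare_comps = 2,
                                  density_suffix = "_density")
```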
# See vignette "Modeling tree growth and mortality"