Fits a regularized logistic regression model of tree mortality for a single species and returns the model object, its mortality predictions, its coefficient of determination on the training set (and, optionally, on a user-provided test set), and the fitted model coefficients.

mortality_model(
  training,
  outcome_var,
  iterations = 1,
  rare_comps = "none",
  density_suffix = "none",
  test = NULL
)

Arguments

training

A dataframe containing a separate row for each focal x neighbor tree pair and a column for each variable to include in the model, including the outcome variable. The outcome variable must be numeric, with 1 indicating 'dead' and 0 indicating 'alive'. Other variables can be of any type, including character; character variables are converted to factors before the model is fitted.

outcome_var

Name of column in training containing the outcome variable, provided as a string.

iterations

Number of times the model should be fitted.

rare_comps

Minimum number of interactions a competitor species must appear in to remain separate from the "RARE" category (see Details). If left at its default ("none"), no "RARE" category is created and all competitor species remain separate.

density_suffix

Suffix of the columns containing species-specific densities, e.g., if density columns are of the form speciesA_dens, this argument should have the value "_dens". This optional argument is only used if rare_comps is specified.

test

An optional dataframe of test data in exactly the same format as the training data, i.e., with the same columns and the same column names.

Value

A list containing four or five elements:

  • mod is a fitted glmnet model object - if iterations > 1 this will be the model with the lowest cross-validated mean square error

  • obs_pred is a dataframe containing observed and predicted mortality - if iterations > 1 this will correspond to the best model

  • R_squared is the coefficient of determination - if iterations > 1 this will correspond to the best model

  • test_R_squared is the coefficient of determination of the best model on the test data (this element will not appear if no test data are provided)

  • mod_coef is a dataframe containing the fitted coefficients, cross-validated mean square error (mse), and coefficient of determination for each fitted model, with rows in ascending order of mse

Details

All variables in the user-provided training data other than tree_id and the indicated outcome variable are included as explanatory variables in the model. The regularized regression model is fitted using the glmnet package. Predictions, R-squared, and model coefficients returned are all based on the "lambda.1se" model, but all models with different lambda values are included in the returned mod object - see glmnet documentation for details.
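The fitting step described above can be approximated directly with glmnet. The following is a minimal sketch under stated assumptions, not the function's internal code: the data are simulated, the column names (mortality, prox, size_comp, sps_comp) are hypothetical, and building the design matrix with model.matrix is one plausible choice.

```r
library(glmnet)

set.seed(1)  # cv.glmnet assigns cross-validation folds at random

# Simulated stand-in for the training data (all column names hypothetical).
n <- 200
training <- data.frame(
  tree_id   = seq_len(n),
  prox      = runif(n, 0, 10),
  size_comp = rnorm(n, 30, 5),
  sps_comp  = sample(c("ABAM", "TSHE", "PSME"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
training$mortality <- rbinom(n, 1, plogis(-2 + 0.3 * training$prox))
outcome_var <- "mortality"

# Every column except tree_id and the outcome is used as a predictor;
# character columns are converted to factors first.
predictors <- setdiff(names(training), c("tree_id", outcome_var))
chr_cols <- predictors[vapply(training[predictors], is.character, logical(1))]
training[chr_cols] <- lapply(training[chr_cols], factor)

# Design matrix without the intercept column (glmnet adds its own intercept).
x <- model.matrix(~ ., data = training[predictors])[, -1]
y <- training[[outcome_var]]

# Cross-validated regularized logistic regression over a path of lambda values.
fit <- cv.glmnet(x, y, family = "binomial")

# Coefficients and predicted mortality probabilities at "lambda.1se".
coefs <- coef(fit, s = "lambda.1se")
pred  <- predict(fit, newx = x, s = "lambda.1se", type = "response")
```

Because the fitted object keeps the whole lambda path, passing s = "lambda.min" to coef() or predict() retrieves the alternative model without refitting.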

As the glmnet model fitting process is stochastic, the fitted model can differ with each run. The iterations argument allows the user to specify how many times the model should be fitted. If the model is fitted more than once, the fitted coefficients for all models will be returned but only the model object for the best model (lowest cross-validated mean square error) will be returned.
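The repeated-fitting idea can be sketched as follows. This is an illustration only: the data are simulated, and using type.measure = "mse" and comparing fits by their cross-validated error at lambda.1se are assumptions about how "best" might be defined, not a description of the function's internals.

```r
library(glmnet)

set.seed(42)

# Simulated predictors and binary outcome (names hypothetical).
n <- 200
x <- cbind(prox = runif(n, 0, 10), size_comp = rnorm(n, 30, 5))
y <- rbinom(n, 1, plogis(-2 + 0.3 * x[, "prox"]))

iterations <- 5

# Each call draws new random folds, so the fits can differ between runs.
fits <- lapply(seq_len(iterations), function(i) {
  cv.glmnet(x, y, family = "binomial", type.measure = "mse")
})

# Cross-validated mean square error of each fit at its own lambda.1se.
cv_mse <- vapply(fits, function(f) f$cvm[f$lambda == f$lambda.1se], numeric(1))

# Keep the fit with the lowest cross-validated error.
best <- fits[[which.min(cv_mse)]]
```

Sorting the per-fit coefficients by cv_mse would give a table ordered like the mod_coef element described under Value.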

Rare competitor species can be grouped together as "RARE" for modeling if desired. The optional argument rare_comps is a number indicating the minimum number of interactions a species must appear in to remain separate from the "RARE" category. If handling of rare competitor species is requested and species-specific densities are included in training, the densities of rare species can be summed together under "RARE_density" using the optional argument density_suffix.
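The grouping of rare competitors can be illustrated in base R. This is a sketch of the idea rather than the function's own code; the column names sps_comp, ABAM_dens, TSHE_dens, and PSME_dens are hypothetical, and dropping the rare species' density columns after summing them is an assumption.

```r
# Hypothetical interaction data: one row per focal x neighbor pair.
training <- data.frame(
  tree_id   = c(1, 1, 1, 2, 2, 3),
  sps_comp  = c("ABAM", "ABAM", "TSHE", "ABAM", "PSME", "TSHE"),
  ABAM_dens = c(0.5, 0.5, 0.5, 0.3, 0.3, 0.2),
  TSHE_dens = c(0.2, 0.2, 0.2, 0.0, 0.0, 0.4),
  PSME_dens = c(0.0, 0.0, 0.0, 0.1, 0.1, 0.0),
  stringsAsFactors = FALSE
)

rare_comps     <- 2        # minimum number of interactions to stay separate
density_suffix <- "_dens"  # suffix of the species-specific density columns

# Count interactions per competitor species and identify the rare ones.
counts   <- table(training$sps_comp)
rare_sps <- names(counts)[counts < rare_comps]

# Relabel rare competitors as "RARE".
training$sps_comp[training$sps_comp %in% rare_sps] <- "RARE"

# Sum the density columns of the rare species into RARE_density,
# then drop the original per-species columns.
rare_cols <- intersect(paste0(rare_sps, density_suffix), names(training))
training$RARE_density <- rowSums(training[, rare_cols, drop = FALSE])
training <- training[, setdiff(names(training), rare_cols), drop = FALSE]
```

In this toy data only PSME appears in fewer than two interactions, so it is the only species relabelled and its density column is the only one folded into RARE_density.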

Examples

# See vignette "Modeling tree growth and mortality"