This vignette demonstrates how the neighborhood description and growth calculation functions of forestexplorR can be used to set up neighborhood models of tree growth and tree mortality.
To build a neighborhood model of tree growth, we first need to describe tree neighborhoods. The forestexplorR functions neighborhoods() and neighborhood_summary() make this very simple.
# Construct neighborhoods for all trees in all stands
nbhds <- neighborhoods(mapping, radius = 10)
# Describe neighborhoods using angular species-specific densities
nbhd_summ <- neighborhood_summary(nbhds, id_column = "tree_id", radius = 10,
                                  densities = "angular")
# Combine neighborhoods with their summaries
nbhds <- nbhds %>%
  left_join(nbhd_summ %>% select(-c(shannon, simpson, evenness)),
            by = "tree_id")
We have now summarized the neighborhoods of all trees, but the summaries are inaccurate for trees whose neighborhood overlaps a stand boundary, i.e., trees for which we did not sample the entire neighborhood. Rather than estimating the missing portions of these neighborhoods, we will exclude these trees from the model.
# Define neighborhood radius
nb_rad <- 10
# Remove trees whose neighborhood overlaps a stand boundary
nbhds <- nbhds %>%
  filter(x_coord >= nb_rad & x_coord <= 100 - nb_rad &
           y_coord >= nb_rad & y_coord <= 100 - nb_rad)
Next we calculate annual growth for our trees. We also remove any trees that were measured only once (we cannot calculate a growth rate for them) or that had a nonsensical negative growth rate.
# Calculate annual growth for all trees
growth <- growth_summary(tree)
#> [1] "Warning: 139 trees exhibited negative annual growth"
# Remove trees that were only measured once and/or had negative growth
growth <- growth %>%
  filter(first_record != last_record &
           annual_growth >= 0)
Now we can add the growth rates to the neighborhoods data. For our model we will use size_corr_growth because it is the closest to being normally distributed. When combining, we use inner_join() because not all trees in nbhds have growth data, e.g. trees measured only once.
nbhds <- nbhds %>%
  inner_join(growth %>% select(tree_id, size_corr_growth),
             by = "tree_id")
We can also include additional covariates in our model, such as abiotic data. In fact, for models that include data from multiple mapped stands, we recommend including additional explanatory variables that capture the major axes of variation among the plots, to account for potential spatial autocorrelation. The code below adds abiotic data to nbhds from the built-in dataset stand_abiotic.
nbhds <- nbhds %>%
  left_join(stand_abiotic, by = "stand_id")
Finally, we can drop some of the columns from nbhds that we don't need for our model.
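As a minimal sketch of this step, the identifier, raw-measurement, and coordinate columns can be dropped with select(); the column set shown here mirrors the selection used in the mortality section of this vignette, and the exact columns to drop depend on your data.

```r
# Drop columns that will not be used as model covariates
# (column set mirrors the mortality example; adjust to your data)
nbhds <- nbhds %>%
  select(-c(stand_id, dbh, abh, x_coord, y_coord, id_comp, abh_comp))
```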
Now we have a dataset containing all the variables we will use for this modeling demonstration.
growth_model(): Create a neighborhood tree growth model

When modeling tree growth, it is common practice to create a separate model for each focal tree species because the drivers of growth are likely to differ among species. The code below isolates a single tree species and then removes the species variable because it is no longer useful.
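A sketch of that step is below; the species code "TSHE" and the focal-species column name species are illustrative assumptions, so substitute the species you wish to model and the column name used in your data.

```r
# Keep only one focal species, then drop the now-constant species column
# (species code "TSHE" and column name "species" are illustrative)
one_species <- nbhds %>%
  filter(species == "TSHE") %>%
  select(-species)
```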
The growth_model() function also offers test set validation, so next we will divide our dataset into a training and test set.
# Create vector of tree ids
all_tree_ids <- unique(one_species$tree_id)
# Assign 80% of trees to training
training_trees <- sample(all_tree_ids, floor(length(all_tree_ids) * 0.8))
# Define training and test
training <- one_species %>%
  filter(tree_id %in% training_trees)
test <- setdiff(one_species, training)
Now we can call the growth_model() function, which fits a regularized linear regression model of tree growth. In the most basic use case, this function requires only a training set and the name of the dependent variable, which should be a column in the training set. We set the seed because the model fitting process is stochastic.
set.seed(1000)
basic_growth_mod <- growth_model(training, outcome_var = "size_corr_growth")
The output of the above code is a list containing four elements. The first element, mod, is the model object. The second element, obs_pred, is a data frame of observed and model-predicted growth for each tree. The third element, R_squared, is the R^2 value of the model. The fourth element, mod_coef, contains the estimated coefficients.
# R^2 value
basic_growth_mod$R_squared
#> [1] 0.2102422
# Observations and model predictions
basic_growth_mod$obs_pred
#> # A tibble: 1,804 x 3
#> tree_id observations predictions
#> <chr> <dbl> <dbl>
#> 1 AB08000300012 0.165 0.138
#> 2 AB08000300013 0.170 0.128
#> 3 AB08000300014 0.129 0.125
#> 4 AB08000300021 0.160 0.138
#> # ... with 1,800 more rows
To evaluate the robustness of the stochastic model fitting process, there is an optional argument (iterations) that specifies how many times the model should be fit. When this argument is specified, the model object, R^2 value, and table of observations and predictions refer to the model with the lowest mean squared error. However, the mod_coef element will contain the fitted coefficients for all fitted models.
set.seed(1000)
iter_growth_mod <- growth_model(training, outcome_var = "size_corr_growth",
                                iterations = 10)
iter_growth_mod$R_squared
#> [1] 0.2116699
The growth_model() function can also handle rare competitor tree species. The optional argument rare_comps specifies the minimum number of interactions a competitor species must be part of to be considered a common competitor. Any species with fewer interactions will be grouped together under the new species identity "RARE". If species-specific densities are included as covariates in the training data, a "RARE_density" variable can also be created by providing the suffix attached to density variables via the argument density_suffix.
set.seed(1000)
rare_growth_mod <- growth_model(training, outcome_var = "size_corr_growth",
                                rare_comps = 100,
                                density_suffix = "_angle_sum")
# Check for RARE in coefficients
names(rare_growth_mod$mod_coef)
#> [1] "(Intercept)" "prox" "sps_compABAM" "sps_compCANO"
#> [5] "sps_compPSME" "sps_compRARE" "sps_compTABR" "sps_compTHPL"
#> [9] "sps_compTSHE" "sps_compTSME" "dbh_comp" "sps_richness"
#> [13] "ABAM_angle_sum" "ALRU_angle_sum" "CANO_angle_sum" "PISI_angle_sum"
#> [17] "PSME_angle_sum" "TABR_angle_sum" "THPL_angle_sum" "TSHE_angle_sum"
#> [21] "TSME_angle_sum" "all_angle_sum" "precip_mm" "temp_C"
#> [25] "elev_m" "aet_mm" "pet_mm" "RARE_density"
#> [29] "mse" "R_squared"
Finally, the growth_model() function can evaluate the ability of the fitted model to predict the growth of trees in a test set, i.e., trees not used in the model fitting process. To do this, a test set must be provided under the optional argument test. Predictive ability is quantified with an R^2 value that constitutes a new element in the output list.
set.seed(1000)
test_growth_mod <- growth_model(training, outcome_var = "size_corr_growth",
                                test = test)
# Ability to predict test data
test_growth_mod$test_R_squared
#> [1] 0.1507348
mortality_model(): Create a neighborhood tree mortality model

The data formatting for a neighborhood mortality model is very similar to that for a neighborhood growth model. First we calculate our neighborhood summaries and remove trees whose neighborhood overlaps the stand boundary:
# Construct neighborhoods for all trees in all stands
nbhds <- neighborhoods(mapping, radius = 10)
# Describe neighborhoods using angular species-specific densities
nbhd_summ <- neighborhood_summary(nbhds, id_column = "tree_id", radius = 10,
                                  densities = "angular")
# Combine neighborhoods with their summaries
nbhds <- nbhds %>%
  left_join(nbhd_summ %>% select(-c(shannon, simpson, evenness)),
            by = "tree_id")
# Define neighborhood radius
nb_rad <- 10
# Remove trees whose neighborhood overlaps a stand boundary
nbhds <- nbhds %>%
  filter(x_coord >= nb_rad & x_coord <= 100 - nb_rad &
           y_coord >= nb_rad & y_coord <= 100 - nb_rad)
Next we extract the mortality data for each tree from our tree measurement data frame and join it to our neighborhoods data:
mortality <- tree %>%
  group_by(tree_id) %>%
  summarize(mort = max(mort))
nbhds <- nbhds %>%
  inner_join(mortality, by = "tree_id")
Then we add the stand abiotic data and remove the columns we don’t want to include as model covariates:
nbhds <- nbhds %>%
  left_join(stand_abiotic, by = "stand_id")
nbhds <- nbhds %>%
  select(-c(stand_id, dbh, abh, x_coord, y_coord, id_comp, abh_comp))
Now we can subset to a single focal species:
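A sketch of this subsetting step, assuming the focal-species column is named species and using an illustrative species code:

```r
# Subset to a single focal species and drop the now-constant species column
# (species code "TSHE" and column name "species" are illustrative)
one_species <- nbhds %>%
  filter(species == "TSHE") %>%
  select(-species)
```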
For this example we will not use test set validation, so we can now fit the mortality model:
set.seed(1000)
basic_mort_mod <- mortality_model(one_species, outcome_var = "mort")
Here is the model output:
# R^2 value
basic_mort_mod$R_squared
#> [1] -0.1733919
# Fitted coefficients
t(basic_mort_mod$mod_coef)
#> 1
#> (Intercept) -1.23106822
#> prox 0.00000000
#> sps_compABAM 0.00000000
#> sps_compABGR 0.00000000
#> sps_compABLA 0.00000000
#> sps_compABPR 0.00000000
#> sps_compALSI 0.00000000
#> sps_compALVI 0.00000000
#> sps_compCANO 0.01702032
#> sps_compPICO 0.00000000
#> sps_compPIEN 0.00000000
#> sps_compPIMO 0.00000000
#> sps_compPOBA 0.00000000
#> sps_compPSME 0.00000000
#> sps_compTABR 0.00000000
#> sps_compTHPL 0.00000000
#> sps_compTSHE 0.00000000
#> sps_compTSME 0.00000000
#> dbh_comp 0.00000000
#> sps_richness 0.00000000
#> ABAM_angle_sum 0.29046254
#> ABGR_angle_sum -0.02033387
#> ABLA_angle_sum 0.01281999
#> ABPR_angle_sum 0.00000000
#> ALRU_angle_sum 0.00000000
#> ALSI_angle_sum 0.00000000
#> ALVI_angle_sum 0.00000000
#> CANO_angle_sum 0.09187474
#> PICO_angle_sum 0.00000000
#> PIEN_angle_sum 0.00000000
#> PIMO_angle_sum -0.01332930
#> PISI_angle_sum 0.00000000
#> POBA_angle_sum 0.00000000
#> PSME_angle_sum 0.00000000
#> TABR_angle_sum 0.00000000
#> THPL_angle_sum 0.00000000
#> TSHE_angle_sum 0.00000000
#> TSME_angle_sum -0.07790914
#> all_angle_sum 0.46876611
#> precip_mm 0.00000000
#> temp_C -0.00208531
#> elev_m -0.07239877
#> aet_mm -0.05362160
#> pet_mm 0.00000000
#> mse 1.01458525
#> R_squared -0.17339192