Checks a tree measurement dataset for missing data and formatting problems that could cause errors when applying other functions in this package. Returns a table of the trees with data issues and a table summarizing the number of trees in the mapping dataset that have data issues.

tree_check(tree_data, map_data)

Arguments

tree_data

Data frame containing tree measurement data where each row represents a single measurement of a single tree. Should contain the columns tree_id, stand_id, species, year, and dbh. If a mort column (i.e. mortality data used by the mortality_model function) is present, it will be checked for missing values, but no warning will be produced if this column is absent. Any additional columns will be ignored by this function.

map_data

Data frame containing tree mapping data. Should contain the columns tree_id, stand_id, species, x_coord, and y_coord. Any additional columns will be ignored by this function. Note that this function does not check the mapping data, which should first be checked with the function mapping_check.

Value

A list containing two elements:

  • problem_trees is a data frame containing the tree ids found to have data issues and a description of the issue

  • issue_summary is a data frame that shows the number and percentage of trees with at least one issue and with each of the specific issues
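
As a rough illustration of how the two elements relate, the sketch below builds a stand-in for the returned list and derives a count and percentage per issue from it. The column names tree_id and issue, the issue labels, and the total of five trees used as the denominator are all assumptions made for this example; they are not taken from the package, so inspect the real output with str() before relying on any of them.

```r
# Hypothetical stand-in for the list returned by tree_check().
# Column names and issue labels are assumptions for illustration only.
result <- list(
  problem_trees = data.frame(
    tree_id = c("A2", "A4", "A4"),
    issue   = c("missing dbh", "missing year", "no mapping data"),
    stringsAsFactors = FALSE
  )
)
n_trees <- 5  # assumed total number of distinct trees (the denominator)

problems <- result$problem_trees

# Number of distinct trees with at least one issue
n_any <- length(unique(problems$tree_id))

# Number of distinct trees affected by each specific issue
per_issue <- tapply(problems$tree_id, problems$issue,
                    function(ids) length(unique(ids)))

summary_tbl <- data.frame(
  issue = c("any issue", names(per_issue)),
  count = c(n_any, as.integer(per_issue)),
  stringsAsFactors = FALSE
)
summary_tbl$pct <- 100 * summary_tbl$count / n_trees
print(summary_tbl)
```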

Details

The data issues checked for are: missing required columns, single tree ids referring to multiple trees, trees with no associated mapping data, and missing dbh, stand id, species, or measurement year. The provided tree_data is also checked for a mort column containing mortality data; if this column is found, a check for missing mortality data is also performed. This function does not check for misspelled stand ids or species, which should be checked independently. The common issue of negative growth rates resulting from measurement error is not checked here, but is checked by growth_summary.
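
To make the checks listed above concrete, here is a minimal base R sketch that performs comparable checks on toy data. It is not the package's implementation: the data values are invented, and treating a tree id that appears with more than one stand or species as "referring to multiple trees" is an assumption about what that check means.

```r
# Toy data; all values are invented for illustration.
tree_data <- data.frame(
  tree_id  = c("A1", "A1", "A2", "A3", "A4"),
  stand_id = c("S1", "S1", "S1", "S2", "S2"),
  species  = c("ABAM", "TSHE", "ABAM", NA, "TSHE"),
  year     = c(2000, 2005, 2000, 2000, NA),
  dbh      = c(12.1, 13.0, NA, 20.4, 8.2),
  stringsAsFactors = FALSE
)
map_data <- data.frame(
  tree_id  = c("A1", "A2", "A3"),
  stand_id = c("S1", "S1", "S2"),
  species  = c("ABAM", "ABAM", "TSHE"),
  x_coord  = c(1.5, 4.0, 7.2),
  y_coord  = c(2.0, 3.3, 9.1),
  stringsAsFactors = FALSE
)

# 1. Required columns that are absent from tree_data
required <- c("tree_id", "stand_id", "species", "year", "dbh")
missing_cols <- setdiff(required, names(tree_data))

# 2. Tree ids that appear with more than one stand/species combination
#    (assumed interpretation of "one id referring to multiple trees")
combos <- unique(tree_data[, c("tree_id", "stand_id", "species")])
multi_ids <- unique(combos$tree_id[duplicated(combos$tree_id)])

# 3. Tree ids with no row in the mapping data
unmapped <- setdiff(unique(tree_data$tree_id), map_data$tree_id)

# 4. Tree ids with a missing value in each checked column
na_ids <- function(column) {
  unique(tree_data$tree_id[is.na(tree_data[[column]])])
}
missing_by_column <- lapply(
  setNames(nm = c("stand_id", "species", "year", "dbh")),
  na_ids
)

# 5. Optional mortality column: only checked when it exists
if ("mort" %in% names(tree_data)) {
  missing_by_column$mort <- na_ids("mort")
}
```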

Tree ids flagged by this function are not necessarily unusable. For instance, missing year data could be inferred from knowledge of when certain stands were measured. The tree ids with data issues should therefore be investigated further rather than being excluded from further analyses right away.
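
The year example above could look something like the following sketch, which fills missing measurement years from a stand-level lookup. The lookup vector stand_year is hypothetical, and the approach assumes each affected stand was measured in a single known year; it does not apply to stands measured repeatedly.

```r
# Toy measurements with missing years; values are invented.
tree_data <- data.frame(
  tree_id  = c("A3", "A4", "A5"),
  stand_id = c("S2", "S2", "S3"),
  year     = c(2000, NA, NA),
  dbh      = c(20.4, 8.2, 15.7),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: the single year in which each stand was measured
stand_year <- c(S2 = 2000, S3 = 2001)

# Fill only the rows where year is missing, leaving recorded years untouched
needs_fill <- is.na(tree_data$year)
tree_data$year[needs_fill] <- unname(stand_year[tree_data$stand_id[needs_fill]])
```

Rows whose stand is not in the lookup would remain NA, so they would still be flagged on a second pass through tree_check.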

Examples

tree_check_test <- tree_check(messy_tree, mapping)
#> [1] "Potential formatting problems detected: please review output and correct errors or remove problem trees if necessary before continuing analysis"