Appendix B Sampling
Robust and reliable estimation of carbon in forest systems based on sampling must consider the following principles:
Identifying population units
The population is the full set of items, or units, under consideration. Population units being sampled can range from plots to trees to points. Whatever type is chosen, the population units must be clearly identifiable, and any exclusions and their treatment noted. When sampling to calibrate an allometric model, for example, the logical unit is a tree, but care is needed in dealing with its different parts (e.g. for the roots, what is the practical minimum diameter to be considered?). Plots for measuring forest stand characteristics can vary in size, with examples ranging from 0.01 ha to over 1 ha, and can also include clusters of subplots (related to each other through their spatial placement) or designs where size-based subpopulations are only measured on parts of a plot. Plot shape can be related to remotely sensed data attributes (e.g. pixel size of optical sensors); plots are usually rectangular, square or circular. Optimum size and shape of plots will vary with forest conditions, with small plots more typical in relatively homogeneous populations, while larger plots are required in tropical forests where large trees result in high spatial variation in biomass. The combination of field and RS data may require larger plots to achieve correspondence between ground conditions and the minimum mapping unit.
Selecting which individuals in the population to sample
Individuals are selected under either of two general sampling approaches: probability-based or model-based.
Probability-based approaches rely on the ability to assign a probability of selection to each individual in the population. With such probability samples, sample-based estimates of parameters such as the mean or total can be inferred to represent the entire population. For example, simple random sampling, the most basic of these designs, assigns an equal probability to each individual. More efficient design-based approaches may be employed when some structure in the population can be reliably identified. For example, stratified sampling uses strata of relatively homogeneous subpopulations to improve the efficiency of a given sampling effort. Design-based (or probability-based) inference requires probability samples, whereas model-based inference can use, but does not require, probability samples.
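The efficiency gain from stratification described above can be illustrated with a minimal sketch. The plot values, the two strata and their equal area weights are hypothetical and chosen only for illustration; the estimators are the standard textbook forms for simple random and stratified random sampling, ignoring finite population corrections.

```python
import math
import statistics


def srs_estimate(values):
    """Mean and standard error under simple random sampling (no finite population correction)."""
    mean = statistics.fmean(values)
    se = math.sqrt(statistics.variance(values) / len(values))
    return mean, se


def stratified_estimate(strata):
    """Mean and standard error under stratified random sampling.

    strata: list of (weight, values) pairs, where weight is the stratum's
    share of the population area and the weights sum to 1.
    """
    mean = sum(w * statistics.fmean(v) for w, v in strata)
    var = sum(w ** 2 * statistics.variance(v) / len(v) for w, v in strata)
    return mean, math.sqrt(var)


# Hypothetical plot biomass values (t/ha); not real data.
dense_forest = [210.0, 190.0, 205.0, 198.0, 215.0]
open_forest = [60.0, 75.0, 55.0, 70.0, 65.0]

# Assumption: each stratum covers half of the forest area.
strata = [(0.5, dense_forest), (0.5, open_forest)]

pooled_mean, pooled_se = srs_estimate(dense_forest + open_forest)
strat_mean, strat_se = stratified_estimate(strata)

print(f"ignoring strata: mean={pooled_mean:.1f}, SE={pooled_se:.2f}")
print(f"stratified:      mean={strat_mean:.1f}, SE={strat_se:.2f}")
```

With these illustrative values the two means coincide, but the stratified standard error is much smaller because the between-stratum variation no longer contributes to it.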
Model-based sampling can be used to select individuals to help parameterize a model. For this purpose, individuals do not need to be selected using a probability-based design, but rather are often selected to cover the range over which the model will be applied. Individuals may be selected to cover critical locations in the model domain, e.g. at the extremes, inflection points or where straight line relationships are anticipated. The way the individuals for measurement are identified and located should be transparent. Once the model has been constructed, it can be used with model-based inference to infer estimates of population parameters.
These two approaches are not mutually exclusive, e.g. model-based approaches have been used within design-based approaches like stratified random sampling (Wood & Schreuder, 1986). Box 31 provides more detail on design-based and model-based sampling.
Box 31: Design-based and model-based sampling
Design-based sampling, also known as probability-based sampling, is a widely known sampling system. In this system, sample locations are selected by a predetermined random (probability-based) process. The most frequent examples are simple random sampling, systematic sampling with a randomly selected starting point, and stratified random sampling, but cluster, double and sequential sampling approaches are also common. Every possible location must have a probability greater than zero of selection into the sample, with the randomization process determining the particular sample locations. The probabilities are the sole basis for drawing conclusions or "inferences" (usually formulated as probability statements) from the sample about the population size (total, mean), the proportion of the population with given characteristics (such as disturbance or occurrence of a rare species), or variance. This means that, if a sample is selected correctly according to the chosen random design, any inference based on these probabilities is valid and calculations do not rely on any assumption about the spatial distribution or other pattern in the population. Apart from measurement and observation errors and the errors from using allometric models, sampling is the only source of stochasticity considered and the effects of this uncertainty can be readily calculated. NFIs are typical probability-based sampling systems with plots established on systematic grids (with or without stratification) where the probability of selection for each plot (within a stratum) is equal and known. Probability sampling designs do not preclude unequal probabilities of selection into the sample. Examples include sampling proportional to size (as in point sampling or variable radius sampling) or proportional to a prediction (estimated volume or height, as in 3P sampling, i.e. Probability Proportional to Prediction).
Model-based sampling systems hypothesise the existence of a model that relates predictor (X, or independent) variables to the response (Y, or dependent) variables of interest. A sample is drawn to allow inferences about this model, and the distribution of data around the model predictions. Two types of inference are therefore made under model-based sampling, concerning: (i) the values at locations unvisited during sampling; and (ii) parameters of the model, including the confidence intervals of the parameterised model. Estimates of the mean Y in a model-based system would be based on the inferences about the model at the value of the mean X. For example, a model-based system that uses LIDAR as a predictor variable might rely on an assumption that biomass is linearly related to the mean height above the ground of the returns per unit area. A purposive sample of field locations could be drawn to parameterise this model and the mean biomass of the forest could be estimated from this parameterised model and the mean LIDAR return over the entire forest. Accuracy of these estimates would depend on the legitimacy of the assumed model and the actual sample locations (within the model space). Inferences at specific locations could also be made, although these will be less precise than the population mean estimates. Model-based systems do not assume that the probabilities of any sample location (pair of X and Y variables) are determined by the design, but rather they are an outcome of the chosen random model: for any given X, the Y values are likely to be centred around the model mean. Where the variation in Y around the model prediction is less than the total variation in Y, model-based systems can provide increased precision of estimates.
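The LIDAR example in the box can be sketched in a few lines. This is a minimal illustration under stated assumptions: the plot heights, biomass values and forest-wide mean height are hypothetical, the model is an ordinary least squares straight line, and the function name fit_line is introduced here only for the example.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical purposive sample: mean LIDAR return height (m) and
# field-measured biomass (t/ha) on five plots spanning the height range.
plot_height = [5.0, 10.0, 15.0, 20.0, 25.0]
plot_biomass = [42.0, 88.0, 131.0, 168.0, 221.0]

intercept, slope = fit_line(plot_height, plot_biomass)

# Hypothetical mean LIDAR return height over the entire forest.
forest_mean_height = 17.5
forest_mean_biomass = intercept + slope * forest_mean_height

# Compare the variation around the model with the total variation in Y.
residuals = [y - (intercept + slope * x) for x, y in zip(plot_height, plot_biomass)]
residual_variance = sum(r ** 2 for r in residuals) / (len(residuals) - 2)
total_variance = statistics.variance(plot_biomass)

print(f"estimated mean biomass: {forest_mean_biomass:.1f} t/ha")
print(f"residual variance {residual_variance:.1f} vs total variance {total_variance:.1f}")
```

The estimate is only as good as the assumed linear relationship; the residual variance being far smaller than the total variance is what gives the model-based estimate its precision in this illustration.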
Selecting the number of individuals to sample
Sample size
To select a ground sample, the first step is to determine the sample size, n, which is usually predetermined. Predetermined sample size approaches include those where: (i) the sample size is fixed by the available budget or the need for historical consistency; (ii) a systematic approach is adopted to sample selection (e.g. by use of a spatial grid of predetermined resolution); (iii) a predetermined estimate has been made of the number required to produce usefully precise estimates. Predetermined sample sizes to produce usefully precise estimates for the targeted population (or subpopulation or stratum), or for parameter estimation in the case of model-based sampling, must be based on estimates of the variability of the (sub)populations, which may be available from existing data or reconnaissance surveys. Useful estimates are often defined in terms of the precision desired, which in many cases is taken by default to be 10% at the 95% confidence level. The estimated sample size required under simple random sampling of a population (or a stratum within a population) is:
$$n = \frac{t^{2}\sigma^{2}}{P^{2}}$$
where σ is the sample standard deviation expressed as a percentage of the mean when the sample is used alone to produce an estimate, or the standard deviation of the residual errors if the sample is used in combination with auxiliary data (e.g. remotely sensed data or existing maps) to produce an estimate; P is half the width of the confidence interval, also expressed as a percentage of the mean; and t is taken from the t distribution with degrees of freedom equal to n minus the number of parameters being estimated, at the desired significance level, commonly 0.05, corresponding to 95% confidence. Because t itself depends on n, the formula is solved iteratively. Sample sizes to detect rare occurrences (e.g. disturbance in forests such as deforestation) may need to be relatively large under simple random sampling designs. For example, a sample size of n > 300 is required if annual levels of forest disturbance were expected to be only about 1% of the population units, and sample units were selected via simple random sampling. Stratified sampling can increase efficiency significantly.
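The sample size calculation above, n = (t σ / P)², can be sketched as follows. Because t depends on n, the sketch iterates until n stops changing. Several things here are assumptions rather than part of the guidance: the function names, the use of a Cornish-Fisher expansion to approximate the t quantile with the standard library only (it is inaccurate for very small degrees of freedom), and the reading of the n > 300 figure as the sample size needed for a 95% chance of including at least one disturbed unit.

```python
import math
from statistics import NormalDist


def t_quantile(prob, df):
    """Approximate Student-t quantile (Cornish-Fisher expansion around the
    normal quantile). An approximation only; poor for very small df."""
    z = NormalDist().inv_cdf(prob)
    g1 = (z ** 3 + z) / 4
    g2 = (5 * z ** 5 + 16 * z ** 3 + 3 * z) / 96
    return z + g1 / df + g2 / df ** 2


def required_sample_size(sd_percent, half_width_percent, alpha=0.05,
                         n_params=1, max_iter=100):
    """Iterate n = (t * sd / P)^2 with df = n - n_params until n is stable.

    sd_percent and half_width_percent are both percentages of the mean.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Start from the normal-based value, then refine with t.
    n = math.ceil((z * sd_percent / half_width_percent) ** 2)
    for _ in range(max_iter):
        df = max(n - n_params, 1)
        t = t_quantile(1 - alpha / 2, df)
        n_new = math.ceil((t * sd_percent / half_width_percent) ** 2)
        if n_new == n:
            break
        n = n_new
    return n


def detection_sample_size(prevalence, confidence=0.95):
    """Smallest n for which a simple random sample contains at least one
    unit of the rare class with the given probability (assumed reading)."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - prevalence))


# Hypothetical inputs: standard deviation of 50% of the mean, target
# half-width of 10% of the mean, 95% confidence.
n_precision = required_sample_size(50, 10)
n_detect = detection_sample_size(0.01)
print(n_precision, n_detect)
```

Under the assumed reading, detection_sample_size(0.01) comes out just under 300, which is consistent with the figure quoted above.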
Supplementary sampling
Supplementary sampling may be used where an NFI or other extensive plot-based measurement system with a predefined sample size is already in place but does not adequately cover the whole population, or results in a precision that is too low to be reliable for the proposed forest monitoring system. Given the need for random selection (ability to determine the individuals to be selected) in probability sampling, the selection of additional sample units will be difficult in some circumstances. Where a systematic approach to sampling was originally used (e.g. sample locations at the intersections of a regularly spaced grid that was randomly overlaid on the population), additional sampling points can be assigned as an extension of that grid into areas originally excluded. Such an extension is particularly relevant when individuals in the original sample had been excluded due to tenure (e.g. by not including land managed by an Agricultural or Conservation Department even though it included forest by the national definition). The extended areas should maintain a separate identity if a stratified approach is used, but the systematic grid may be manipulated (e.g. by selecting only every 2nd intersection) to ensure the sample size within the new stratum is appropriate (the number of samples per ha does not need to be constant between strata). Alternatively, if the stratum boundaries have not altered since the original sample but it has been determined that the precision of the stratum parameter estimates is insufficient, additional sample units can be selected using the original sampling approach (e.g. truly random or, more commonly, re-laying the same systematic grid but randomly choosing additional intersection points).
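The grid extension described above can be sketched as follows. The coordinates, spacing and area bounds are hypothetical, the function name grid_points is introduced only for this example, and "every 2nd intersection" is interpreted here as every second grid line along each axis, which is one possible reading.

```python
import math
import random


def grid_points(origin, spacing, bounds, step=1):
    """Intersections of a regular grid anchored at `origin` that fall inside
    `bounds` = (x_min, y_min, x_max, y_max).

    step=2 keeps every 2nd grid line along each axis (assumed reading of
    "every 2nd intersection"). Anchoring on a fixed origin keeps any
    extension aligned with the original grid.
    """
    ox, oy = origin
    x_min, y_min, x_max, y_max = bounds
    i_lo = math.ceil((x_min - ox) / spacing)
    i_hi = math.floor((x_max - ox) / spacing)
    j_lo = math.ceil((y_min - oy) / spacing)
    j_hi = math.floor((y_max - oy) / spacing)
    return [
        (ox + i * spacing, oy + j * spacing)
        for i in range(i_lo, i_hi + 1) if i % step == 0
        for j in range(j_lo, j_hi + 1) if j % step == 0
    ]


rng = random.Random(42)   # fixed seed so the example is repeatable
spacing = 1000.0          # hypothetical grid spacing in metres

# The random start is drawn once; it defines the whole grid.
origin = (rng.uniform(0, spacing), rng.uniform(0, spacing))

# Original sampled area and a previously excluded area (hypothetical bounds).
original_area = (0.0, 0.0, 10000.0, 10000.0)
excluded_area = (10000.0, 0.0, 20000.0, 10000.0)

original_plots = grid_points(origin, spacing, original_area)
# Extend the same grid into the new stratum at a lower sampling intensity.
extension_plots = grid_points(origin, spacing, excluded_area, step=2)

print(len(original_plots), len(extension_plots))
```

Keeping the extension as a separate list mirrors the advice that the extended area should maintain a separate identity as its own stratum.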
Where the original sample was not systematic and the population or strata boundaries have changed, it is difficult to add sample units under a design-based approach. One possibility could be to draw an entirely new probability sample, calculate estimates from each sample separately, and then combine the estimates. Otherwise a model-based approach may be more appropriate. The original sample data may be used to parameterise the hypothesised model, with additional sample units chosen to improve the precision of the inferences about that model. For example, the original sample may be used to parameterise a model that relates LIDAR data or canopy characteristics to plot measurements of carbon. Additional plots should be established in strata not included in the original sample to ensure the hypothesised model is appropriate for the extended population. Under a model-based system, the additional sample units need not use the original method of sample selection, as inferences are not based on the selection design. Consequently, if the inferences about the model are insufficiently precise (e.g. confidence limits of the model around the strata mean are too wide), then additional, ad hoc, sample points can be added provided they use the same plot measurement protocols as the original sample. Under a model-based approach using a linear model, additional sample units that add the most information tend to be those measured at the extremes of the independent value range (e.g. tallest forests as determined by LIDAR), although sampling covering the full range of dependent variables, irrespective of how the underlying population is clumped along this range, is useful to ensure the model is appropriate.
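The point about extremes can be illustrated for a simple straight-line model, where the variance of the estimated slope is the residual variance divided by Sxx, the sum of squared deviations of X from its mean. The sketch below compares adding one plot near the centre of the X range with adding one at the extreme; the heights are hypothetical and the function name is introduced only for this example.

```python
def slope_variance_factor(x):
    """Return 1 / Sxx. For a straight-line model with constant residual
    variance s2, the slope variance is s2 * (1 / Sxx)."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return 1.0 / sxx


# Hypothetical LIDAR heights (m) of the plots in the original sample.
original_heights = [10.0, 12.0, 15.0, 18.0, 20.0]

# Candidate additional plot in the middle of the range vs at the extreme.
with_central_plot = original_heights + [15.0]
with_extreme_plot = original_heights + [30.0]

base = slope_variance_factor(original_heights)
central = slope_variance_factor(with_central_plot)
extreme = slope_variance_factor(with_extreme_plot)

print(f"original: {base:.5f}, +central: {central:.5f}, +extreme: {extreme:.5f}")
```

In this illustration the central plot leaves the slope variance factor unchanged while the extreme plot reduces it substantially. That gain holds only if the straight-line assumption is valid across the extended range, which is why coverage of the full range remains useful for checking the model.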
Using sample measurements to make inferences about the target population
The number of individuals selected for field measurement must be sufficient to make it likely that estimates of population means and sampling errors will be sufficiently accurate and precise to cover the variability within the population of interest.
Where population parameters are estimated from the sum of subsamples or separate models or relationships, double counting of pools must be avoided. All errors must, as far as possible, be identified and quantified. These include sampling errors, measurement errors, and model errors.
Effective application of sampling strategies and models often relies on stratification by climate (rainfall, temperature) or broad environmental conditions (altitude, topography, soil type), possibly integrated into biogeoclimatic zones. Such data may also be used directly to develop growth indices (e.g. net primary productivity) or as input into growth models or for prediction of carbon allocation ratios. Networks of weather stations and historical records can be enhanced through spatial modelling approaches to develop climate surfaces for use as input into models or for more effective stratification.
Permanent plots can be used to improve the accuracy of change estimation when repeatedly measured over time. However, if these plots are treated in a way that is different from the rest of the forest (e.g. not harvested or thinned in the same way), or if the original population changes due to the removal of specific types of land without a corresponding removal of plots, the permanent plot sample will no longer be representative of the current forest. Remotely sensed data, such as canopy cover or disturbance, may be used to determine whether the permanent plots have been treated in a non-representative fashion. If the permanent plots are no longer representative of the larger forest, then new plots may be required to represent the current condition more accurately. If a subset of the already established plots continues to be representative, these can continue to be used by regarding them as a stratum or strata.
Alternatively, permanent plots may be incorporated into an approach whereby models and remotely sensed auxiliary variables are used to increase precision. Sampling with partial replacement, where a proportion of plots is replaced each measurement period, has been used in the past as a compromise between estimating change and estimating current condition, but such systems have generally been found to be complex and difficult to maintain.
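The benefit of permanent plots for change estimation noted above can be illustrated by comparing the standard error of change from remeasured (paired) plots with that from two independent samples. The plot values are hypothetical, and the same numbers are reused for the independent case purely to show the effect of ignoring the pairing.

```python
import math
import statistics


def change_paired(first, second):
    """Mean change and its standard error from remeasured permanent plots."""
    diffs = [b - a for a, b in zip(first, second)]
    change = statistics.fmean(diffs)
    se = math.sqrt(statistics.variance(diffs) / len(diffs))
    return change, se


def change_independent(first, second):
    """Mean change and its standard error when the two occasions are
    sampled with independent (temporary) plots."""
    change = statistics.fmean(second) - statistics.fmean(first)
    se = math.sqrt(statistics.variance(first) / len(first)
                   + statistics.variance(second) / len(second))
    return change, se


# Hypothetical plot carbon stocks (t/ha) at two measurement occasions.
occasion_1 = [120.0, 80.0, 150.0, 95.0, 110.0]
occasion_2 = [126.0, 83.0, 158.0, 97.0, 116.0]

paired_change, paired_se = change_paired(occasion_1, occasion_2)
indep_change, indep_se = change_independent(occasion_1, occasion_2)

print(f"paired:      change={paired_change:.1f}, SE={paired_se:.2f}")
print(f"independent: change={indep_change:.1f}, SE={indep_se:.2f}")
```

Both approaches give the same estimated change here, but the paired standard error is far smaller because plot-to-plot variation cancels in the differences. That advantage is lost if the permanent plots cease to be representative of the forest.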