Appendix B   Sampling

Robust and reliable estimation of carbon in forest systems based on sampling must consider the following principles:
Identifying population units
The population is the full set of items, or units, under consideration. Population units being sampled can range from plots to trees to points. Whatever type is chosen, the population units must be clearly identifiable, and any exclusions and their treatment noted. When sampling to calibrate an allometric model, for example, the logical unit is a tree, but care is needed to deal with different parts – e.g. for the roots, what is the practical minimum diameter to be considered? Plots for measuring forest stand characteristics can vary in size, with examples ranging from 0.01 ha to over 1 ha, and can also include clusters of sub-plots (related to each other through their spatial placements) or designs where size-based sub-populations are only measured on parts of a plot. Plot shape can be related to remotely sensed data attributes (e.g. pixel size of optical sensors); plots are usually rectangular, square or circular. Optimum size and shape of plots will vary with forest conditions, with small plots more typical in relatively homogeneous populations, while larger plots are required in tropical forests where large trees result in high spatial variation in biomass. The combination of field and RS data may require larger plots, to achieve correspondence between ground conditions and the minimum mapping unit.
Selecting which individuals in the population to sample
Individuals are selected using either of two general sampling approaches: probability-based or model-based.
Probability-based approaches rely on the ability to assign a probability of selection to each individual in the population. With such probability samples, sample-based estimates of parameters such as the mean or total can be inferred to represent the entire population. For example, simple random sampling, the most basic of these designs, assigns an equal probability to each individual. More efficient design-based approaches may be employed when some structure in the population can be reliably identified. For example, stratified sampling uses strata of relatively homogeneous sub-populations to improve the efficiency of a given sampling effort. Design-based (or probability-based) inference requires probability samples, whereas model-based inference can use, but does not require, probability samples.
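As a rough illustration of why stratification can improve efficiency, the following minimal sketch compares the standard error of a mean estimated from a pooled sample with that of an area-weighted stratified estimate. The plot values, stratum weights and function names are hypothetical, and finite population corrections are ignored.

```python
import math
import statistics


def srs_mean(sample):
    """Mean and standard error under simple random sampling
    (no finite population correction)."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, se


def stratified_mean(strata):
    """Area-weighted mean and its standard error.

    strata: list of (area_weight, sample_values); weights must sum to 1.
    """
    mean = 0.0
    variance = 0.0
    for weight, values in strata:
        stratum_mean, stratum_se = srs_mean(values)
        mean += weight * stratum_mean
        variance += (weight * stratum_se) ** 2
    return mean, math.sqrt(variance)


# Hypothetical plot carbon densities (t C/ha) in two strata of equal area.
dense_forest = [148.0, 152.0, 150.0, 155.0, 145.0]
open_forest = [52.0, 48.0, 50.0, 45.0, 55.0]

# Treating all plots as one simple random sample ignores the strata.
pooled_mean, pooled_se = srs_mean(dense_forest + open_forest)

# The stratified estimate only carries the within-stratum variation.
strat_mean, strat_se = stratified_mean([(0.5, dense_forest), (0.5, open_forest)])

print(f"pooled:     {pooled_mean:.1f} +/- {pooled_se:.2f}")
print(f"stratified: {strat_mean:.1f} +/- {strat_se:.2f}")
```

In this made-up example both estimates of the mean coincide, but the stratified standard error is much smaller because the between-stratum difference no longer contributes to it.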
Model-based sampling can be used to select individuals to help parameterize a model. For this purpose, individuals do not need to be selected using a probability-based design, but rather are often selected to cover the range over which the model will be applied. Individuals may be selected to cover critical locations in the model domain, e.g. at the extremes, inflection points or where straight line relationships are anticipated. The way the individuals for measurement are identified and located should be transparent. Once the model has been constructed, it can be used with model-based inference to infer estimates of population parameters.
These two approaches are not mutually exclusive, e.g. model-based approaches have been used within design-based approaches like stratified random sampling (Wood & Schreuder, 1986). Box 31 provides more detail on design-based and model-based sampling.

Box 31: Design- and model-based sampling

Design-based sampling, also known as probability-based sampling, is a widely known sampling system. In this system, sample locations are selected by a pre-determined random (probability-based) process. The most frequent examples are simple random sampling, systematic sampling with a randomly selected starting point, and stratified random sampling, but cluster, double and sequential sampling approaches are also common. Every possible location must have a probability greater than zero of selection into the sample, with the randomization process determining the particular sample locations. The probabilities are the sole basis for drawing conclusions or "inferences" - usually formulated as probability statements - from the sample about the population size (total, mean), proportion of the population with given characteristics (such as disturbance or occurrence of a rare species), or variance. This means that, if a sample is selected correctly according to the chosen random design, any inference based on these probabilities is valid and calculations do not rely on any assumption about the spatial distribution or other pattern in the population. Apart from measurement and observation errors and the errors from using allometric models, sampling is the only source of stochasticity considered, and the effects of this uncertainty can be readily calculated. NFIs are typical probability-based sampling systems with plots established on systematic grids (with or without stratification) where the probability of selection for each plot (within a stratum) is equal and known. Probability sampling designs do not preclude unequal probabilities of selection into the sample. Examples include sampling proportional to size (as in point sampling or variable radius sampling) or proportional to a prediction (estimated volume or height, as in 3P sampling – Probability Proportional to Prediction).
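To make the role of selection probabilities concrete, the sketch below uses the Horvitz-Thompson estimator of a population total, a standard design-based estimator that is not named in this document and is introduced here only for illustration. Each sampled value is weighted by the inverse of its inclusion probability, so the same function covers equal and unequal probability designs. All values and names are hypothetical.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total as the sum of y_i / pi_i over the sample.

    values          : measured value for each sampled unit
    inclusion_probs : probability that each sampled unit entered the sample
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    total = 0.0
    for y, pi in zip(values, inclusion_probs):
        if not 0.0 < pi <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        total += y / pi
    return total


# Equal probabilities: 4 plots drawn from a hypothetical population of 100,
# so every plot has inclusion probability 4 / 100.
equal_total = horvitz_thompson_total([10.0, 12.0, 8.0, 10.0], [0.04] * 4)

# Unequal probabilities: larger units were more likely to be selected,
# so each contributes with a smaller weight.
unequal_total = horvitz_thompson_total([30.0, 5.0, 12.0], [0.30, 0.05, 0.12])

print(equal_total, unequal_total)
```

With equal probabilities the estimator reduces to the sample mean multiplied by the population size.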
Model-based sampling systems hypothesise the existence of a model that relates predictor (X or independent) variables to the response (Y, or dependent) variables of interest. A sample is drawn to allow inferences about this model, and the distribution of data around the model predictions. Two types of inference are therefore made under model-based sampling, concerning: (i) the values at locations unvisited during sampling; and (ii) parameters of the model, including the confidence intervals of the parameterised model. Estimates of the mean Y in a model-based system would be based on the inferences about the model at the value of the mean X. For example, a model-based system that uses LIDAR as a predictor variable might rely on an assumption that biomass is linearly related to the mean height above the ground of the returns per unit area. A purposive sample of field locations could be drawn to parameterise this model and the mean biomass of the forest could be estimated from this parameterised model and the mean LIDAR return over the entire forest. Accuracy of these estimates would depend on the legitimacy of the assumed model and the actual sample locations (within the model space). Inferences at specific locations could also be made although these will be less precise than the population mean estimates. Model-based systems do not assume that the probabilities of any sample location (pair of X and Y variables) are determined by the design, but rather they are an outcome of the chosen random model – for any given X, the Y values are likely to be centred around the model mean. Where the variation in Y around the model prediction is less than the total variation in Y, model-based systems can provide increased precision of estimates.
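The LIDAR example can be sketched as follows, assuming a straight-line model fitted by ordinary least squares. The sample values, the forest-wide mean return height and the function names are all hypothetical; the sketch only shows how the population mean follows from the fitted model evaluated at the mean of the predictor, and how the spread around the model compares with the total spread in the response.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical purposive sample chosen to span the range of the predictor:
# mean LIDAR return height (m) and field-measured biomass (t/ha).
height = [2.0, 5.0, 10.0, 15.0, 20.0, 25.0]
biomass = [22.0, 48.0, 105.0, 148.0, 205.0, 252.0]

intercept, slope = fit_line(height, biomass)

# Hypothetical mean return height over the entire forest.
forest_mean_height = 12.0
estimated_mean_biomass = intercept + slope * forest_mean_height

# Compare variation around the model with total variation in biomass.
n = len(biomass)
mean_biomass = sum(biomass) / n
total_sd = math.sqrt(sum((b - mean_biomass) ** 2 for b in biomass) / (n - 1))
residuals = [b - (intercept + slope * h) for h, b in zip(height, biomass)]
residual_sd = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

print(f"estimated mean biomass: {estimated_mean_biomass:.1f} t/ha")
print(f"total sd: {total_sd:.1f}, residual sd: {residual_sd:.1f}")
```

The estimate is only as good as the assumed linear relationship; if the model form is wrong, the result can be biased no matter how small the residual spread looks.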
Selecting the number of individuals to sample
Sample size
To select a ground sample, the first step is to determine the sample size, n, which is usually predetermined. Predetermined sample size approaches include those where: (i) the sample size is fixed by the available budget or the need for historical consistency; (ii) a systematic approach is adopted to sample selection (e.g. by use of a spatial grid of pre-determined resolution); (iii) a predetermined estimate has been made of the number required to produce usefully precise estimates. Predetermined sample sizes to produce usefully precise estimates for the targeted population (or sub-population or stratum), or for parameter estimation in the case of model-based sampling, must be based on estimates of the variability of the (sub-) populations, which may be available from existing data or reconnaissance surveys. Useful estimates are often defined in terms of the precision desired, which in many cases is taken by default to be 10% of the mean at the 95% confidence level. The estimated sample size required under simple random sampling of a population (or a stratum within a population) is:
\[ n = \left( \frac{t \, \sigma}{P} \right)^{2} \]
where σ is the sample standard deviation expressed as a percentage of the mean when the sample is used alone to produce an estimate, or the standard deviation of the residual errors if the sample is used in combination with auxiliary data (e.g. remotely sensed data or existing maps) to produce an estimate; P is half the width of the confidence interval, also expressed as a percentage of the mean; and t is taken from the t distribution with degrees of freedom equal to n minus the number of parameters being estimated, at the significance level desired, commonly 0.05, corresponding to 95% confidence. Because t itself depends on n, the equation is usually solved iteratively. Sample sizes to detect rare occurrences (e.g. disturbance in forests such as deforestation) may need to be relatively large under simple random sampling designs. For example, a sample size of n > 300 is required if annual levels of forest disturbance were expected to be only about 1% of the population units, and sample units were selected via simple random sampling. Stratified sampling can increase efficiency significantly.
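The sample-size relationship can be explored numerically. The sketch below assumes the usual form n = (t σ / P)², with σ and P both expressed as percentages of the mean, and searches for the smallest n that satisfies it, since t depends on n through the degrees of freedom. To stay within the standard library, the t quantile is approximated by a series expansion around the normal quantile; a statistics package would give exact values. The second function is one plausible reading of the rare-occurrence example, namely the sample size needed to include at least one disturbed unit with 95% probability; the source does not state how its figure was derived. All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate Student-t quantile via a series expansion in 1/df.

    Approximation only; it becomes less accurate for very small df.
    """
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def required_sample_size(sigma_pct, p_pct, alpha=0.05, n_params=1):
    """Smallest n with n >= (t * sigma / P)^2, t having n - n_params df.

    sigma_pct : standard deviation as a percentage of the mean
    p_pct     : half-width of the interval as a percentage of the mean
    """
    n = n_params + 2  # start with at least 2 degrees of freedom
    while True:
        t = t_quantile(1 - alpha / 2, n - n_params)
        if n >= (t * sigma_pct / p_pct) ** 2:
            return n
        n += 1


def detection_sample_size(proportion, confidence=0.95):
    """Smallest n for which a simple random sample from a large population
    contains at least one unit of the rare class with the given probability."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - proportion))


# Hypothetical inputs: 50% coefficient of variation, 10% half-width.
n_precision = required_sample_size(sigma_pct=50.0, p_pct=10.0)

# Rare class making up about 1% of population units.
n_detect = detection_sample_size(0.01)

print(n_precision, n_detect)
```

Searching upward from a small n avoids the oscillation that a naive fixed-point iteration can show, because the right-hand side decreases as n grows.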
Supplementary sampling
Supplementary sampling may be used where an NFI or other extensive plot-based measurement system with a predefined sample size is already in place but does not adequately cover the whole population, or results in a precision that is too low to be reliable for the proposed forest monitoring system. Given the need for random selection (ability to determine the individuals to be selected) in probability sampling, the selection of additional sample units will be difficult in some circumstances. Where a systematic approach to sampling was originally used (e.g. sample locations at the intersections of a regularly spaced grid that was randomly overlaid on the population), additional sampling points can be assigned as an extension of that grid into areas originally excluded. Such an extension is particularly relevant when individuals in the original sample had been excluded due to tenure (e.g. by not including land managed by an Agricultural or Conservation Department even though it included forest by the national definition). The extended areas should maintain a separate identity if a stratified approach is used, but the systematic grid may be manipulated (e.g. by selecting only every 2nd intersection) to ensure the sample size within the new stratum is appropriate (the number of samples per ha does not need to be constant between strata). Alternatively, if the stratum boundaries have not altered since the original sample but it has been determined that the precision of the stratum parameter estimates is insufficient, additional sample units can be selected using the original sampling approach (e.g. truly random or, more commonly, re-laying the same systematic grid but randomly choosing additional intersection points).
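Extending an existing systematic grid into a previously excluded area can be sketched as below. The sketch assumes a square grid defined by its original random start and spacing, and reads "every 2nd intersection" as keeping every 2nd grid line in both directions, which is only one possible interpretation. The coordinates, the membership test and all names are hypothetical.

```python
import math


def grid_points_in_area(origin, spacing, bounds, in_area, step=1):
    """Intersections of an existing systematic grid that fall in a new area.

    origin  : (x0, y0) random start of the original grid, reused so the new
              points are a true extension of that grid
    spacing : grid spacing, in the same units as the coordinates
    bounds  : (xmin, ymin, xmax, ymax) bounding box of the new area
    in_area : function (x, y) -> bool, True inside the new stratum
    step    : keep every step-th grid line in each direction (1 keeps all)
    """
    x0, y0 = origin
    xmin, ymin, xmax, ymax = bounds
    # Integer grid indices covering the bounding box.
    i_lo = math.ceil((xmin - x0) / spacing)
    i_hi = math.floor((xmax - x0) / spacing)
    j_lo = math.ceil((ymin - y0) / spacing)
    j_hi = math.floor((ymax - y0) / spacing)

    points = []
    for j in range(j_lo, j_hi + 1):
        if j % step:
            continue
        for i in range(i_lo, i_hi + 1):
            if i % step:
                continue
            x, y = x0 + i * spacing, y0 + j * spacing
            if in_area(x, y):
                points.append((x, y))
    return points


# Hypothetical original grid: 1 km spacing with a random start at (137, 412) m.
origin = (137.0, 412.0)
new_area_bounds = (0.0, 0.0, 5000.0, 5000.0)


def inside(x, y):
    """Placeholder membership test; a real one would query a stratum map."""
    return True


full_extension = grid_points_in_area(origin, 1000.0, new_area_bounds, inside)
thinned_extension = grid_points_in_area(origin, 1000.0, new_area_bounds, inside, step=2)

print(len(full_extension), len(thinned_extension))
```

Because thinning is tied to grid indices rather than to the order in which points are generated, the thinned points are always a subset of the full extension.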