Summer 2011 Papers

Since I last posted, THREE of the papers I’ve mentioned here previously have become available online! Here are their details; abstracts are below. Email me if you can’t get hold of them yourself.

Millington, J.D.A., Walters, M.B., Matonis, M.S., Laurent, E.J., Hall, K.R. and Liu, J. (2011) Combined long-term effects of variable tree regeneration and timber management on forest songbirds and timber production Forest Ecology and Management 262 718-729 doi: 10.1016/j.foreco.2011.05.002

Millington, J.D.A. and Perry, G.L.W. (2011) Multi-model inference in biogeography Geography Compass 5(7) 448-530 doi: 10.1111/j.1749-8198.2011.00433.x

Millington, J.D.A., Demeritt, D. and Romero-Calcerrada, R. (2011) Participatory evaluation of agent-based land use models Journal of Land Use Science 6(2-3) 195-210 doi: 10.1080/1747423X.2011.558595

Millington, J.D.A. et al. (2011) Combined long-term effects of variable tree regeneration and timber management on forest songbirds and timber production Forest Ecology and Management 262 718-729
The structure of forest stands is an important determinant of habitat use by songbirds, including species of conservation concern. In this paper, we investigate the combined long-term impacts of variable tree regeneration and timber management on stand structure, songbird occupancy probabilities, and timber production in northern hardwood forests. We develop species-specific relationships between bird species occupancy and forest stand structure for canopy-dependent black-throated green warbler (Dendroica virens), eastern wood-pewee (Contopus virens), least flycatcher (Empidonax minimus) and rose-breasted grosbeak (Pheucticus ludovicianus) from field data collected in northern hardwood forests of Michigan’s Upper Peninsula. We integrate these bird-forest structure relationships with a forest simulation model that couples a forest-gap tree regeneration submodel developed from our field data with the US Forest Service Forest Vegetation Simulator (Ontario variant). Our bird occupancy models are better than null models for all species, and indicate species-specific responses to management-related forest structure variables. When simulated over a century, higher overall tree regeneration densities and greater proportions of commercially high value, deer browse-preferred, canopy tree Acer saccharum (sugar maple) than low-value, browse-avoided subcanopy tree Ostrya virginiana (ironwood) ensure conditions allowing larger harvests of merchantable timber and have greater impacts on bird occupancy probability change. Compared to full regeneration, no regeneration over 100 years reduces merchantable timber volumes by up to 25% and drives differences in bird occupancy probability change of up to 30%. We also find that harvest prescriptions can be tailored to affect both timber removal volumes and bird occupancy probability simultaneously, but only when regeneration is adequate.
When regeneration is poor (e.g., 25% or less of trees succeed in regenerating), timber harvest prescriptions have a greater relative influence on bird species occupancy probabilities than on the volume of merchantable timber harvested. However, regeneration density and composition, particularly the density of Acer saccharum regenerating, have the greatest long-term effects on canopy bird occupancy probability. Our results imply that forest and wildlife managers need to work together to ensure tree regeneration density and composition are adequate for both timber production and the maintenance of habitat for avian species over the long-term. Where tree regeneration is currently poor (e.g., due to deer herbivory), forest and wildlife managers should pay particularly close attention to the long-term impacts of timber harvest prescriptions on bird species.

Millington, J.D.A. and Perry, G.L.W. (2011) Multi-model inference in biogeography Geography Compass 5(7) 448-530
Multi-model inference (MMI) aims to contribute to the production of scientific knowledge by simultaneously comparing the evidence data provide for multiple hypotheses, each represented as a model. With roots in the method of ‘multiple working hypotheses’, MMI techniques have been advocated as an alternative to null-hypothesis significance testing. In this paper, we review two complementary MMI techniques – model selection and model averaging – and highlight examples of their use by biogeographers. Model selection provides a means to simultaneously compare multiple models to evaluate how well each is supported by data, and potentially to identify the best supported model(s). When model selection indicates no clear ‘best’ model, model averaging is useful to account for parameter uncertainty. Both techniques can be implemented in information-theoretic and Bayesian frameworks and we outline the debate about interpretations of the different approaches. We summarise recommendations for avoiding philosophical and methodological pitfalls, and suggest when each technique is best used. We advocate a pragmatic approach to MMI, one that emphasises the ‘thoughtful, science-based, a priori’ modelling that others have argued is vital to ensure valid scientific inference.

Millington et al. (2011) Participatory evaluation of agent-based land use models Journal of Land Use Science 6(2-3) 195-210
A key issue facing contemporary agent-based land-use models (ABLUMs) is model evaluation. In this article, we outline some of the epistemological problems facing the evaluation of ABLUMs, including the definition of boundaries for modelling open systems. In light of these issues and given the characteristics of ABLUMs, participatory model evaluation by local stakeholders may be a preferable avenue to pursue. We present a case study of participatory model evaluation for an agent-based model designed to examine the impacts of land-use/cover change on wildfire regimes for a region of Spain. Although model output was endorsed by interviewees as credible, several alterations to model structure were suggested. Of broader interest, we found that some interviewees conflated model structure with scenario boundary conditions. If an interactive participatory modelling approach is not possible, an emphasis on ensuring that stakeholders understand the distinction between model structure and scenario boundary conditions will be particularly important.

Multi-Model Inference in Biogeography

Earlier this month I, along with George Perry, finished a review on multi-model inference for the journal Geography Compass. Geography Compass publishes state-of-the-art reviews aimed at students, researchers and non-specialist scholars. The manuscript is currently under review so we’ll have to see what the reviewers think before the paper is published. Once it’s available I’ll re-post here. In the meantime, here’s the abstract to whet your appetite:

Multi-model inference (MMI) aims to produce scientific knowledge by simultaneously comparing the evidence data provide for multiple hypotheses, each represented as a model. Stemming from the method of ‘multiple working hypotheses’, MMI techniques have been advocated as an alternative to null hypothesis significance testing. These techniques will likely be particularly useful in research fields such as biogeography where formal experimentation is difficult and data are often influenced by uncontrolled factors. In this paper we review two complementary MMI techniques – model selection and model averaging – and highlight examples of their use in the biogeography literature. Both techniques can be implemented in a Bayesian framework and we outline the debate about different interpretations of probability. We summarise recommendations for avoiding philosophical and methodological pitfalls, and suggest circumstances in which each technique will likely be most useful. We finish by advocating a pragmatic approach to MMI, one that emphasises the ‘thoughtful, science-based, a priori’ modelling others have argued is vital to ensure valid scientific inference.

Bayesian Modelling in Biogeography

Recently I was asked to write a review of the current state-of-the-art of model selection and Bayesian approaches to modelling in biogeography for the Geography Compass journal. The intended audience for the paper will be interested but non-expert, and the paper will “…summarize important research developments in a scholarly way but for a non-specialist audience”. With this in mind, the structure I expect I will aim for will look something like this:

i) Introduction to the general issue of model inference (i.e., “What is the best model to use?”). This section will likely discuss the modelling philosophy espoused by Burnham and Anderson and also highlight some of the criticisms of null-hypothesis testing using p-values. Then I might lead into possible alternatives (to standard p-value testing) such as:

ii) AIC approaches (to find the ‘best approximating model’)

iii) Bayesian approaches (including Bayesian Model Averaging, as I’ve discussed on this blog previously)

iv) Some applied examples (including my deer density modelling for example)

v) A brief summary

I also expect I will try to offer some practical hints and tips, possibly using boxes with example R code (maybe for the examples in iv). Other published resources I’ll draw on will likely include the excellent books by Ben Bolker and Michael McCarthy. As things progress I may post more, and I’ll be sure to post again when the paper is available to read in full.

Bird Occupancy Modelling

Birds have been given short shrift in my blog posts about the Michigan UP ecological-economic modelling project. It’s not that we have forgotten about them, it’s just that before we got to incorporating them into our modelling there were other things to deal with first. Now that we’ve made progress on modelling deer distribution it’s time to turn our attention to how we can represent the potential impacts of forest management on bird habitat so that we might better understand the tradeoffs that will need to be negotiated to achieve both economic and ecological sustainability.

Ovenbird (Seiurus aurocapillus)

One of the things we want to do is link our bird-vegetation modelling with Laila Racevskis’ assessment of the economic value of bird species from her PhD research. Laila assessed local residents’ willingness-to-pay for ensuring the conservation of several bird species of concern in our study area. If we can use our model to examine the effects of different timber management plans (each yielding different timber volumes) on the number of bird species present in an area, we can use Laila’s data to examine the economic tradeoffs between different management approaches. The first thing we need to do to achieve this is be able to estimate how many bird species would be present in a given forest stand.

Right now the plan is to estimate the presence of songbird species of concern in forest stands by using the data Ed Laurent collected during his PhD research at MSU. To this end I’ve been doing some reading on the latest occupancy modelling approaches and reviewing the literature on their application to birds in managed forests. Probably the most popular current approach was developed recently by Darryl MacKenzie and colleagues – it allows the estimation of whether a site is occupied by a given species or not when we know that our detection is imperfect (i.e. when we know we have false negative observations in our bird presence data). The publication of some nice overviews of this approach (e.g. MacKenzie 2006) plus the development of software to perform the analyses are likely to be at the root of this popularity.

The basic idea of the approach is that if we are able to make multiple observations at a site (and if we assume that bird populations and habitat do not change between these observations) we can use the probability of each bird observation history at a site across all the sites to form a model likelihood. This likelihood can then be used to estimate the parameters using any likelihood-based estimation procedure. Covariates can be used to model both the probability of occupancy and of detection (i.e. we can account for factors that may have hindered bird observation such as wind strength or the time of day). I won’t go into further detail here because there’s an excellent online book that will lead you through the modelling process, and you can download the software and try it yourself.
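The heart of this approach is simple enough to sketch in a few lines of code. Below is a toy version of the single-season occupancy likelihood described above (not the published software); the parameter names (psi for occupancy, p for detection) and the three example detection histories are my own illustrative choices.

```python
from math import log

def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history (a list of 0/1 repeat visits).

    psi: probability the site is occupied
    p:   probability of detecting the species on a visit, given occupancy
    """
    detected = sum(history)
    visits = len(history)
    # If the site is occupied, each visit is a Bernoulli trial with probability p
    lik = psi * (p ** detected) * ((1 - p) ** (visits - detected))
    # An all-zero history can also arise because the site is simply unoccupied
    if detected == 0:
        lik += (1 - psi)
    return lik

def neg_log_likelihood(histories, psi, p):
    # The model likelihood is the product over sites; minimise this to estimate psi and p
    return -sum(log(site_likelihood(h, psi, p)) for h in histories)

histories = [[1, 0, 1], [0, 0, 0], [1, 1, 0]]  # three sites, three visits each
nll = neg_log_likelihood(histories, psi=0.6, p=0.5)
```

Covariates enter by replacing psi and p with (logit-linear) functions of site and visit variables, and the minimisation can then be handed to any optimiser.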

Two recent papers have used this approach to investigate bird species presence given different forest conditions. DeWan et al. 2009 used MacKenzie’s occupancy modelling approach to examine impacts of urbanization on forest birds in New York State (they do a good job of explaining how they apply MacKenzie’s approach to their data and study area). DeWan et al. considered landscape variables such as perimeter-area ratios of habitat patches and proximity to urban area to create occupancy models for 9 bird species at ~100 sites. They found that accounting for imperfect bird detection was important and that habitat patch “perimeter-area ratio had the most consistent influence on both detection probability and occupancy” (p989).

In a slightly different approach Smith et al. 2008 estimated site occupancy of the black-throated blue warbler (Dendroica caerulescens) and ovenbird (Seiurus aurocapillus) in 20 northern hardwood-conifer forest stands in Vermont. At each bird observation site they had also collected stand structure variables including basal area, understory density and tree diameters (in contrast to DeWan et al who only considered landscape-level variables). Smith et al. write their results “demonstrate that stand-level forest structure can be used to predict the occurrence of forest songbirds in northern hardwood-conifer forests” (p43) and “suggest that the role of stand-level vegetation may have been underestimated in the past” (p36).

Our approach will take the best aspects of both these studies: the large sample size of DeWan et al. combined with the consideration of stand-level variables as in Smith et al. More on this again soon I expect.

Synthetic Trees

When testing and using simulation models we often need to use synthetic data. This might be because we want to examine the effects of different initial conditions on our model output, or simply because we have insufficient data to examine a system at the scale we would like to. The ecological-economic modelling project I’m currently working on is in both these situations, and over the last week or two I’ve been working on generating synthetic tree-level data so that we can initialize our model of forest stand change for testing and scenario development. Here’s a brief overview of how I’ve approached the task of producing a ‘treelist generator’ from the empirical data we have for over 60,000 trees in Northern Hardwood stands across Upper Michigan.

One of the key measures we can use to characterise forest stands is basal area (BA). We can assume that for each stand we generate a treelist for, there is some ‘target BA’ that we are aiming to produce. As well as hitting a target BA, we also need to make sure that the tree diameter-at-breast-height (DBH) size-class distribution and species composition are representative of the stands in our empirical data. Therefore, our first step is to look at the diameter size-class distribution of the stands we want to emulate. We can do this by plotting histograms of the frequency of trees of different diameter for each stand. In the empirical data we see two characteristic distributions (Fig 1).

Fig 1. Example stand tree count histograms

The distribution on the left has very many more trees in the smaller size classes as a result of stands self-thinning (as larger trees compete for finite resources). The second distribution, in which the smallest size classes are under-represented and larger size classes have relatively more trees, does not fit so well with the theoretical, self-thinning DBH size-class distribution. Stands with a distribution like this have probably been influenced by other factors (for example deer browse on the smaller trees). However, it turns out that both these DBH size-class distributions can be pretty well described by the gamma probability distribution (Fig 2).

Fig 2. Example stand gamma probability distributions for data in Fig 1

The gamma distribution has two parameters, a shape parameter we will call alpha and a scale parameter we will call beta. Interestingly, in the stands I examined (dominated by Sugar Maple and Ironwood) there are two different linear relationships between the parameters. The relationship between alpha and beta for 80% of stands represents the ‘self-thinning’ distribution, and the other 20% represent distributions in which small DBH classes are under-represented. We use these relationships – along with the fact that the range of values of alpha for all stands has a log-normal distribution – to generate characteristic DBH size-class distributions:

  1. sample a value of alpha from a normal distribution on the log scale (subsequently back-transforming using 10^alpha),
  2. for the two different relationships use Bayesian linear regression to find mean and 95% credible intervals for the slope and intercept of a regression line between alpha and beta,
  3. use the value of alpha with the regression parameters to produce a value of beta.
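The three steps above can be sketched as follows. The numbers (the mean and standard deviation of alpha on the log scale, and the regression slope and intercept) are placeholders, not the values actually fitted to the Michigan stand data, and for simplicity the regression parameters are fixed at their means rather than sampled from their credible intervals.

```python
import random

random.seed(42)

def sample_dbh_distribution(n_trees):
    # 1. sample alpha from a normal distribution on the log scale,
    #    then back-transform with 10^alpha
    alpha = 10 ** random.gauss(0.2, 0.1)      # hypothetical mean and sd

    # 2./3. beta from the linear relationship with alpha (slope and
    #       intercept would come from the Bayesian regression)
    slope, intercept = 3.5, 1.0               # hypothetical values
    beta = slope * alpha + intercept

    # draw tree diameters from the resulting gamma distribution
    return [random.gammavariate(alpha, beta) for _ in range(n_trees)]

dbhs = sample_dbh_distribution(500)
```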

So now for each stand we have a target basal area, and parameters for the DBH size class distribution. The next step is to add trees to the stand with diameters specified by the probability distribution. Each time we add a tree, basal area is added to the stand. The basal area for a tree is calculated by:

TreeBA = TreeDensity * (0.005454 * diameter^2)

[Tree density can be calculated for each tree because we know the sampling strategy used to collect empirical data on our timber cruise, whether on a fixed area plot, n-tree or with a prism].

Once we get within 1% of our target BA we stop adding trees to the stand [we’ll satisfy ourselves with a 1% accuracy because the size of tree that we allocate each time is sampled from a probability distribution and so it is unlikely we will be able to hit our target exactly]. The trees in our (synthetic) stand should now (theoretically) have the appropriate DBH size-class distribution and basal area.
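Put together, the add-a-tree loop might look like the sketch below. The gamma parameters and the fixed per-tree density are illustrative assumptions, and rejecting trees that would overshoot the 1% tolerance is my own simplification of the stopping rule.

```python
import random

random.seed(1)

def generate_stand(target_ba, alpha, beta, tree_density=1.0):
    trees, stand_ba = [], 0.0
    # keep adding trees until stand BA is within 1% of the target
    while abs(stand_ba - target_ba) / target_ba > 0.01:
        dbh = random.gammavariate(alpha, beta)          # diameter (inches)
        tree_ba = tree_density * (0.005454 * dbh ** 2)  # basal area contribution
        if stand_ba + tree_ba > target_ba * 1.01:
            continue  # this tree would overshoot the tolerance; sample another
        trees.append(dbh)
        stand_ba += tree_ba
    return trees, stand_ba

trees, ba = generate_stand(target_ba=100.0, alpha=1.6, beta=4.0)
```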

With a number of trees now in our synthetic stand, each with a DBH value, the next step is to assign each tree to a species so that the stand has a representative species composition. For now, the two species we are primarily interested in are Sugar Maple and Ironwood. However, we will also allow trees in our stands to be Red Maple, White Ash, Black Cherry or ‘other’ (these are the next most common species in stands dominated by Sugar Maple and Ironwood). First we estimate the proportion of trees of each species. In stands with Sugar Maple and Ironwood, deer selectively browse Sugar Maple, allowing Ironwood a competitive advantage. Correspondingly, in the empirical data we observe a strong inverse linear relationship between the abundance of Sugar Maple and Ironwood (Fig 3).

Fig 3. Relationship between stand Sugar Maple and Ironwood abundance

To assign species proportions we first estimate the proportion of Sugar Maple from the empirical data. Next, using the strong inverse relationship above, we estimate the corresponding proportion of Ironwood (sampled using a normal distribution with mean and standard deviation from the Bayesian linear regression). The remaining species proportions are assigned according to the frequency of their presence in the empirical data.

Now we use these proportions to assign a species to individual trees. Because physiology varies between species, the probability that a tree is of a given size also varies between species. For example, Ironwood very seldom reaches DBH greater than 25 cm and the vast majority (almost 99% in our data) are smaller than 7.6 cm (3 inches) in diameter. Consequently, we first assign the appropriate number of trees to Ironwood according to its empirical size-class distribution, before then assigning all other trees to the remaining species (using a uniform distribution).
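A hypothetical sketch of this species-assignment step is below. The sampled Sugar Maple proportion, the inverse relationship for Ironwood and the weights for the remaining species are all stand-ins for the fitted values, and giving Ironwood the smallest trees outright is a simplification of matching its empirical size-class distribution.

```python
import random

random.seed(7)

def assign_species(dbhs):
    p_maple = random.uniform(0.4, 0.8)   # stand-in for the empirical estimate
    p_ironwood = 0.9 - p_maple           # stand-in for the inverse relationship
    others = ["Red Maple", "White Ash", "Black Cherry", "other"]

    n = len(dbhs)
    n_ironwood = round(p_ironwood * n)
    order = sorted(range(n), key=lambda i: dbhs[i])  # indices, smallest tree first
    species = [None] * n
    for i in order[:n_ironwood]:         # Ironwood gets the smallest trees
        species[i] = "Ironwood"
    for i in order[n_ironwood:]:         # everything else assigned by proportion
        species[i] = random.choices(["Sugar Maple"] + others,
                                    weights=[p_maple, 0.04, 0.03, 0.02, 0.01])[0]
    return species

dbhs = [random.gammavariate(1.6, 4.0) for _ in range(200)]
species = assign_species(dbhs)
```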

The final step in generating our treelist is to assign each tree a height and a canopy ratio. We do this using empirical relationships between diameter and height for each species that are available in the literature (e.g. Pacala et al. 1994). And we’re done!

In the model I’m developing, these stands can be assigned a spatial location either using a pre-existing empirical map or using a synthetic land cover map with known characteristics (generated for example using the modified random clusters method, as the SIMMAP 2.0 software does). In either case we can now run the model multiple times to investigate the dynamics and consequences of different initial conditions. More on that in the future.

Ensemble Modelling

Uncertainty is an inherent part of modelling. Models by definition are simplified representations of reality – as such they are all wrong. Being wrong doesn’t necessarily make them useless, but it helps to have some idea about how wrong they are for them to be most useful. That’s why we should always try to provide some means to assess the uncertainty in our model output. Producing multiple realisations of a model and its results – a model ensemble – is one way to do this.

Depending on our model we can use ensemble methods to examine four different sources of modelling uncertainty:

a) Model Structure: how appropriate are our model variables, relationships and rules?

b) Model Parameters: what numerical values appropriately represent the strengths of relationships between variables?

c) Initial Conditions: how does our uncertainty in the initial state of the system we are modelling propagate through the model to the output?

d) Boundary Conditions: how do alternative (uncertain) scenarios of events that perturb our model influence output?

In their recent paper, Araujo and New show how ensemble modelling might be used to assess the impacts of these different kinds of uncertainty on model output. They advocate the use of multiple models within an ensemble forecasting framework and argue that more robust forecasts can be achieved via the appropriate analysis of ensemble forecasts. This is important because projections between different models can be so variable as to compromise their usefulness for guiding policy decisions. For example, upon examining nine bioclimatic models for four South African plant species, Pearson et al. found that for different scenarios of future climate, predicted changes in species distribution varied from 92% loss to 322% gain between the different models! It’s uncertainty like this that stifles debate about the presence and impacts of anthropogenic climate change. Araujo and New go on to discuss the uses and limitations of ensemble modelling for supporting policy decisions in biodiversity conservation.

In a previous post I discussed how Bayesian methods can be used to examine uncertainty in model structure. I’ve been using Bayesian Model Averaging to help me identify which are the most appropriate predictors of local winter white-tailed deer density for our UP Forest project. Using the relationships developed via that modelling process I’ve produced spatial estimates of deer density in northern hardwood stands for a section of our study area (example below).

Hopefully forest managers will find this sort of modelling useful for their planning (I’ll ask them sometime). However, I think this sort of product will be even more useful if I can provide the managers with a spatial estimate of uncertainty in the deer density estimates. This is important not only to emphasise that there is uncertainty in the model results generally, but also to highlight where (in the landscape) the model is more or less likely to be correct. Here’s the uncertainty map corresponding with the deer density estimate map above.

In this map the lighter colours (yellows and greens) indicate less certainty in the deer density estimate at that point. If managers were to take action in this landscape to reduce deer densities they could use a combination of the maps to find locations where deer densities are estimated to be high with low uncertainty.

To be more specific, the uncertainty map above is the standard deviation of 1,000 deer density estimate maps (correspondingly the deer density map is the mean of these 1,000 models). For each of the 1,000 deer density estimates I used slightly different model parameter values, each chosen with a certain probability. These 1,000 realisations are my model ensemble. The probability a value would be chosen for use as a parameter in any of the 1,000 models was specified by a (normal) probability distribution which came from the mean and standard deviation provided by the original Bayesian regression model. To produce the 1,000 models and sample their parameter values from a probability distribution I wrote my own program which made use of the standalone R math libraries built by Chris Jennings.
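In miniature, that procedure looks like the sketch below: each ensemble member draws its regression parameters from normal distributions (the means and standard deviations here are invented stand-ins for those from the Bayesian regression), predictions are made for each map cell, and the per-cell mean and standard deviation give the estimate and uncertainty maps. A handful of cells stands in for the full raster.

```python
import random
import statistics

random.seed(3)

# hypothetical posterior summaries (mean, sd) for intercept and one predictor
post = {"intercept": (2.0, 0.5), "slope": (0.8, 0.2)}
cells = [0.5, 1.0, 2.5, 4.0]   # predictor value in each 'map cell'

ensemble = []
for _ in range(1000):
    b0 = random.gauss(*post["intercept"])   # one parameter draw per model run
    b1 = random.gauss(*post["slope"])
    ensemble.append([b0 + b1 * x for x in cells])

# per-cell mean -> the density estimate map; per-cell sd -> the uncertainty map
mean_map = [statistics.mean(run[i] for run in ensemble) for i in range(len(cells))]
sd_map = [statistics.stdev(run[i] for run in ensemble) for i in range(len(cells))]
```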

Appropriately representing and communicating uncertainty in model output is vital if models and modelling are to be useful for non-modellers (e.g., policy-makers, natural resource managers, etc.). Spatial ensemble modelling helps us to do this by identifying locations where we are more or less confident about our model output.

Bayesian Multimodel Inference

The abstract we submitted to the ESA Meeting was accepted a while back. Since we submitted it, Megan and I have been back in the field for some additional data collection and I’ve been doing some new analyses. Some of these new analyses are the result of my attendance at the Bayesian statistics workshop at US-IALE in Snowbird. Since then I’ve been learning more by picking the brain of a former advisor, George Perry, and doing a fair bit of reading (reading list with links at bottom). And of course, using my own data has helped a lot.

One of the main questions I’m facing, as many ecologists often do, is “which variables should be in my regression model?” This question lies at the core of model inference and assumes that it is appropriate to infer ecological process from data by searching for the single model that represents reality most accurately. However, as Link and Barker put it:

“It would be nice if there were no uncertainty about models. In such an ideal world, a single model would be available; the data analyst would be in the enviable position of having only to choose the best method for fitting model parameters based on the available data. The choice would be completely determined by the statistician’s theory, a theory which regards the model as exact depiction of the process that generated the data.

“It is clearly wrong to use the data to choose a model and then to conduct subsequent inference as though the selected model were chosen a priori: to do so is to fail to acknowledge the uncertainties present in the model selection process, and to incestuously use the data for two purposes.”

Thus, it is usually more appropriate to undertake a process of multi-model inference and search for the ‘best’ possible model (given current data) rather than a single ‘true’ model. I’ve been looking into the use of Bayesian Model Averaging to address this issue. Bayesian approaches take prior knowledge (i.e., a probability distribution) and data about a system and combine them with a model to produce posterior knowledge (i.e., another probability distribution). This approach differs from the frequentist approach to statistics which calculates probabilities based on the idea of a (hypothetical) long-run of outcomes from a sequence of repeated experiments.

For example, estimating the parameters of a linear regression model using a Bayesian approach differs from a frequentist ordinary least squares (OLS) approach in two ways:

i) a Bayesian approach considers the parameter to be a random variable that might take a range of values, each with a given probability, rather than being a fixed but unknown quantity,

ii) a Bayesian approach conditions the parameter estimate probability on the sample data at hand and not as the result of a set of multiple hypothetical independent samples (as the OLS approach does).

If there is little prior information available about the phenomena being modelled, ‘uninformative priors’ (e.g., a normal distribution with a relatively large variance about a mean of zero) can be used. In this case, the parameter estimates produced by the Bayesian linear regression will be very similar to those produced by regular OLS regression. The difference is in the error estimates and what they represent; a 95% credible interval produced by a Bayesian analysis specifies that there is a 95% chance that the true value is within that interval given the data analyzed, whereas a 95% confidence interval from a frequentist (OLS) approach implies that if (hypothetical) data were sampled a large number of times, the parameter estimate for those samples would lie within that interval 95% of those times.
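A tiny conjugate example makes the ‘vague prior gives essentially the frequentist estimate’ point concrete. This estimates a normal mean with known variance rather than a full regression, and all the numbers are invented for illustration.

```python
import statistics

data = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]
n = len(data)
sigma2 = 0.5                          # assume the observation variance is known
prior_mean, prior_var = 0.0, 1000.0   # 'uninformative': huge variance about zero

# conjugate normal-normal update for the mean
post_var = 1.0 / (1.0 / prior_var + n / sigma2)
post_mean = post_var * (prior_mean / prior_var + sum(data) / sigma2)

# 95% credible interval: a direct probability statement about the parameter
ci = (post_mean - 1.96 * post_var ** 0.5, post_mean + 1.96 * post_var ** 0.5)

# with so vague a prior, the posterior mean is essentially the sample mean
assert abs(post_mean - statistics.mean(data)) < 0.01
```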

There has been debate recently in ecological circles about the merits of Bayesian versus frequentist approaches. Whilst some have strongly advocated the use of Bayesian approaches (e.g., McCarthy 2007), others have suggested a more pluralistic approach (e.g., Stephens et al. 2005). One of the main concerns with the approach of frequentist statistics is related to a broader criticism of the abuse and misuse of the P-value. For example, in linear regression models P-values are often used to examine the hypothesis that the slope of a regression line is not equal to zero (by rejecting the null hypothesis that it is equal to zero). Because the slope of a regression line on a two-dimensional plot indicates the rate of change of one measure with respect to the other, a non-zero slope indicates that as one measure changes, so does the other. Consequently it is often inferred that a process represented by one measure had an effect on, or caused, the change in the other. However, as Ben Bolker points out in his excellent book:

“…common sense tells us that the null hypothesis must be false, because [the slope] can’t be exactly zero [due to the inherent variation and error in our data] — which makes the p value into a statement about whether we have enough data to detect a non-zero slope, rather than about whether the slope is actually different from zero.”

This is not to say there isn’t a place for null hypothesis testing using P-values in the frequentist approach. As Stephens et al. argue, “[by] marginalizing the use of null-hypothesis testing, ecologists risk rejecting a powerful, informative and well-established analytical tool.” To the pragmatist, using whatever (statistical) tools are available seems eminently more sensible than placing all one’s eggs in one basket. The important point is to try to make sure that the hypotheses one tests with P-values are ecologically meaningful.

Back to Bayesian Model Averaging (BMA). BMA provides a method to account for uncertainty in model structure by calculating (approximate) posterior probabilities for each possible model (i.e., combination of variables) that could be constructed from a set of independent variables (see Adrian Raftery’s webpage for details and examples of BMA implementation). The ‘model set’ is all possible combinations of variables (equal to 2^n models, where n is the number of variables in the set). The important thing to remember with these probabilities is that each is the probability that the model is the best one from the model set considered – the probability of models with variables not measured or not included in the model set obviously can’t be calculated.

The advantage over other model selection procedures like stepwise regression is that the output provides a measure of the performance of many models, rather than simply providing the single ‘best’ model. For example, here’s a figure I derived from the output BMA provides:

The figure shows BMA results for the five models with highest posterior probability of being the best candidate model from a hypothetical model set. The probability that each model is the best in the model set is shown at top for each model – Model 1 has an almost 23% chance that it is the best model given the data available. Dark blocks indicate the corresponding variable (row) is included in a given model – so Model 1 contains variables A and B, whereas Model 2 contains Variable A only. Posterior probabilities of variables being included in the best model (in the model set) are shown to the right of the blocks – as we might expect given that Variable A is present in the five most probable models, it has the highest chance of being included in the best model.

BMA also provides a posterior probability for each variable being included in the best candidate model. One of the cool things about the variable posterior probability is that it can be used to produce a weighted mean value across all the models for each variable parameter estimate, each with its own Bayesian confidence interval. The weight for each parameter estimate is the probability that the variable is present in the 'best' model. Thus, the 'average' model accounts for uncertainty in variable selection in the best candidate model in the individual parameter estimates.

I’ve been using these approaches to investigate the potential factors influencing local winter white-tailed deer density in managed forests of Michigan’s Upper Peninsula. One of the most frequently used, and freely available, software packages for Bayesian statistics is WinBUGS. However, because I like to use R I’ve been exploring the packages available in that statistical language environment. Specifically, the BRugs package makes use of many OpenBUGS components (you actually provide R with a script in WinBUGS format to run) and the BMA package provides functionality for model averaging. We’re in the final stages of writing a manuscript incorporating these analyses – once it’s a bit more polished (and submitted) I’ll provide an abstract.

Reading List
Model Inference: Burnham and Anderson 2002, Stephens et al. 2005, Link and Barker 2006, Stephens et al. 2007

Introduction to Bayesian Statistics: Bolker 2007 [webpage with chapter pre-prints and exercises here], McCarthy 2007

Discussion of BMA methods: Hoeting et al. 1999, Adrian Raftery’s Webpage

Examples of BMA application: Wintle et al. 2003, Thomson et al. 2007

Criticisms of Stepwise Regression: James and McCulloch 1990, Whittingham et al. 2006

Regional partitioning for wildfire regime characterization

Fighting wildfires is a strategic operation. In fire-prone areas of the world, such as California and the Mediterranean Basin, it is important that managers allocate and position fire trucks, water bombers and human fire-fighters in locations that minimize the response time to reach new fires. Not only is this important for reducing risk to human lives and livelihoods, but the financial demands of fighting a prolonged campaign against multiple fires also require that resources be used as economically as possible.

Characterizing the wildfire regime of an area (the frequency, timing and magnitude of all fires) can be very useful for this sort of planning. If an area burns more frequently, or with greater intensity, on average, fire-fighting resources might be better placed in or near these areas. The relationship between the frequency of fires and the area they burn is one of the characteristics that is particularly interesting from this perspective.

As I’ve written about previously with colleagues, although it is well accepted that the frequency-area distribution of wildfires is ‘heavy-tailed’ (i.e. there are many, many more small fires than large fires), the exact nature of this distribution is still debated. One of the distributions that is frequently used is the power-law distribution. Along with my former advisors Bruce Malamud and George Perry, I examined how this characteristic of the wildfire regime, the power-law frequency-area distribution, varied for different regions across the continental USA (see Malamud et al. 2005). Starting with previously defined ‘ecoregions’ (areas characterized by similar vegetation, climate and topography) we mapped how the frequency-area relationship varied in space, finding a systematic change from east to west across the country.

More recently, Paolo Fiorucci and colleagues (Fiorucci et al. 2008) have taken a slightly different approach. Rather than starting with pre-defined spatial regions and calculating the frequency-area distribution of all the fires in each region, they have devised a method that splits a large area into smaller regions based on the wildfire data for the entire area. Thus, they use the data to define the spatial differentiation of regions with similar wildfire regime characteristics a posteriori rather than imposing the spatial differentiation a priori.

Fiorucci and his colleagues apply their method to a dataset of 6,201 fires (each with an area greater than 0.01 sq km) that burned between 1987 and 2004 in the Liguria region of Italy (5400 sq km). They show that estimates of a measure of the wildfire frequency-area relationship (in this case the power-law distribution) for a given area vary significantly depending on how regions within that area are partitioned spatially. Furthermore, they found differences in spatial patterns of the frequency-area relationship between climatic seasons.
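For concreteness, the standard maximum-likelihood estimator for the exponent of a continuous power-law tail looks like the sketch below. This is the textbook estimator, not necessarily the exact fitting procedure used in either paper, and the cut-off value and synthetic sample are invented for illustration.

```python
import numpy as np

def power_law_alpha(areas, a_min):
    """MLE of the exponent alpha for a continuous power law
    p(a) ~ a^(-alpha), fitted to fires with area >= a_min."""
    a = np.asarray(areas, dtype=float)
    a = a[a >= a_min]
    n = a.size
    alpha = 1.0 + n / np.log(a / a_min).sum()
    se = (alpha - 1.0) / np.sqrt(n)   # approximate standard error
    return alpha, se, n

# Synthetic check: draw burned areas from a power law with
# alpha = 1.5 via inverse-transform sampling, then recover it.
rng = np.random.default_rng(0)
u = rng.uniform(size=50_000)
samples = 0.01 * (1.0 - u) ** (-1.0 / (1.5 - 1.0))  # a_min = 0.01 sq km
alpha_hat, se, n = power_law_alpha(samples, 0.01)
```

Because the estimate depends on which fires fall inside the fitted region, repartitioning the study area changes the sample entering this calculation, which is exactly why the spatial delineation matters so much to the result.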

Using both a priori (the approach of Malamud et al. 2005) and a posteriori (the approach of Fiorucci et al. 2008) spatial delineation of wildfire regime areas, whilst simultaneously considering patterns in the processes believed to be driving wildfire regimes (such as climate, vegetation and topography), will lead to better understanding of wildfire regimes. That is, future research in this area will be well advised to look at the problem of wildfire regime characterization from both perspectives – data-driven and process-driven. The approach developed by Fiorucci et al. also provides much promise for a more rigorous, data-driven approach to making decisions about the allocation and positioning of fire-fighting resources.

Citation and Abstract
Fiorucci, P., F. Gaetani, and R. Minciardi (2008) Regional partitioning for wildfire regime characterization, Journal of Geophysical Research, 113, F02013

Wildfire regime characterization is an important issue for wildfire managers especially in densely populated areas where fires threaten communities and property. The ability to partition a region by articulating differences in timing, frequency, and intensity of the phenomena among different zones allows wildfire managers to allocate and position resources in order to minimize wildfire risk. Here we investigate “wildfire regimes” in areas where the ecoregions are difficult to identify because of their variability and human impact. Several studies have asserted that wildfire frequency-area relationships follow a power law distribution. However, this power law distribution, or any heavy-tailed distribution, may represent a set of wildfires over a certain region only because of the data aggregation process. We present an aggregation procedure for the selection of homogeneous zones for wildfire characterization and test the procedure using a case study in Liguria on the northwest coast of Italy. The results show that the estimation of the power law parameters provides significantly different results depending on the way the area is partitioned into its various components. These findings also show that it is possible to discriminate between different wildfire regimes characterizing different zones. The proposed procedure has significant implications for the identification of ecoregion variability, putting it on a more mathematical basis.

The Tyranny of Power?

The past week or two I’ve been wrestling with the data we have on white-tailed deer density and vegetation in Michigan’s Upper Peninsula in an attempt to find some solid statistical relationships that we might use in our ecological-economic simulation model. However, I seem to be encountering similar issues to previous researchers, notably (as Weisberg and Bugmann put it) “the weak signal-to-noise ratio that is characteristic of ungulate-vegetation systems”, that “multiple factors need to be considered, if we are to develop a useful, predictive understanding of ungulate-vegetation relationships”, and that “ungulate-vegetation interactions need to be better understood over multiple scales”.

Hobbs suggests that one of the problems slowing species distribution research is a preoccupation with statistical power that he calls “the tyranny of power”. This tyranny arises, he suggests, because traditional statistical methods that are powerful at smaller scales become less useful at larger extents. There are at least three reasons for this:

  1. small things are more amenable to study by traditional methods than large things
  2. variability increases with scale (extent)
  3. potential for bias increases with scale (extent)

“The implication of the tyranny of power is that many of the traditionally sanctioned techniques for ecological investigation are simply not appropriate at large-scales… This means that inferences at large-scales are likely to require research designs that bear little resemblance to the approaches many of us learned in graduate school.” Hobbs p.230

However, this tyranny may simply be because, as Fortin and Dale point out, “most study areas contain more than one ecological process that can act at different spatial and temporal scales”. That is, the processes are non-stationary in time and space. Leaving time aside for now, spatial non-stationarity has already been found to be present in our study area with regard to the processes we’re considering. For example, Shi and colleagues found that Geographically Weighted Regression (GWR) models are better at predicting white-tailed deer densities than an ordinary least-squares regression model for the entirety of our study area.
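The intuition behind GWR is straightforward to sketch: fit a separate weighted regression at each location, down-weighting distant observations with a distance kernel, so the estimated relationship is allowed to vary across space. The toy example below is a minimal illustration of that idea only, not Shi et al.'s model; the coordinates, Gaussian kernel and bandwidth are all invented assumptions.

```python
import numpy as np

def gwr_slopes(coords, x, y, bandwidth):
    """At each observation, fit weighted least squares with a
    Gaussian distance kernel and return the local slope."""
    n = coords.shape[0]
    X = np.column_stack([np.ones(n), x])
    slopes = np.empty(n)
    for i in range(n):
        d2 = ((coords - coords[i]) ** 2).sum(axis=1)
        w = np.sqrt(np.exp(-0.5 * d2 / bandwidth ** 2))  # kernel weights
        beta, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
        slopes[i] = beta[1]
    return slopes

# Synthetic landscape: the x-y relationship strengthens from
# west (easting = 0) to east (easting = 1), a non-stationarity
# that a single global OLS slope would average away.
rng = np.random.default_rng(7)
coords = rng.uniform(size=(300, 2))              # (easting, northing)
x = rng.normal(size=300)
y = 2.0 * coords[:, 0] * x + rng.normal(0.0, 0.1, 300)
slopes = gwr_slopes(coords, x, y, bandwidth=0.15)
west = slopes[coords[:, 0] < 0.2].mean()
east = slopes[coords[:, 0] > 0.8].mean()
```

The recovered local slopes increase from west to east, which is the kind of spatially varying relationship a global model misses and a GWR-style fit picks up.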

Hobbs’ argument suggests that it’s often useful to analyse ecological data from large regions by partitioning them into smaller, more spatially homogeneous areas. The idea is that these smaller patches are more likely to be governed by the same ecological process. But how should these smaller regions be selected? A commonly used geographical division is the ecoregion. Ecoregions divide land into areas of similar characteristics such as climate, soils, vegetation and topography. For our study area we’ve found that relationships between deer densities and predictor variables do indeed vary by Albert’s ecoregions. But we think that there might be more useful ways to divide our study area that take into account variables that are commonly believed to strongly influence spatial deer distributions. In Michigan’s UP the prime example is the large snowfall received each winter, which hinders deer movement and foraging.

We’re beginning to examine how GWR and spatial boundary analysis might be used to delineate these areas (at different scales) in the hope of refining our understanding about the interaction of deer and vegetation across our large (400,000 ha) landscape. In turn we should be able to better quantify some of these relationships for use in our model.

Hierarchical Partitioning for Understanding LUCC

This post is my fourth contribution to JustScience week.

Multiple regression is an empirical, data-driven approach for modelling the response of a single (dependent) variable from a suite of predictor (independent) variables. Mac Nally (2002) suggests that multiple regression is generally used for two purposes by ecologists and biologists: 1) to assess the amount of variance exhibited by the dependent variable that can be attributed to each predictor variable, and 2) to find the ‘best’ predictive model (the model that explains most total variance). Yesterday I discussed the use of logistic regression (a form of multiple regression) models for predictive purposes in Land Use/Cover Change (LUCC) studies. Today I’ll present some work on an explanatory use of these methods.

Finding a multivariate model that uses the ‘best’ set of predictors does not imply that those predictors will remain the ‘best’ when used independently of one another. Multi-collinearity between predictor variables means that the use of the ‘best’ subset of variables (i.e. model) to infer causality between independent and dependent variables provides little valid ‘explanatory power’ (Mac Nally, 2002). The individual coefficients of a multiple regression model can only be interpreted for direct effects on the response variable when the other predictor variables are held constant (James & McCulloch, 1990). The use of a model to explain versus its use to predict must therefore be considered (Mac Nally, 2000).

Hierarchical partitioning (HP) is a statistical method that provides explanatory, rather than predictive, power. It allows the contribution of each predictor to the total explained variance of a model, both independently and in conjunction with the other predictors, to be calculated across all possible candidate models. The use of the HP method developed by Chevan and Sutherland (1991) by ecologists and biologists in their multivariate analyses was first suggested by Mac Nally (1996). More recently, the method has been extended to allow a statistical choice of which variables to retain once they have been ranked for their predictive use (Mac Nally, 2002). Details of how HP works can be found here.
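In outline, HP computes the goodness of fit for every subset of predictors and averages each variable's improvement over the whole hierarchy of models. The sketch below is a simplified rendering of the Chevan and Sutherland idea with invented test data, not a substitute for a dedicated implementation such as R's hier.part package.

```python
import itertools
import numpy as np

def r_squared(X_cols, y):
    """R^2 of an OLS fit of y on the given predictor columns."""
    n = y.size
    X = np.column_stack([np.ones(n)] + list(X_cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def hierarchical_partition(X, y):
    """Independent contribution of each predictor: the gain in R^2
    when it is added, averaged within and then across all levels
    of the model hierarchy."""
    k = X.shape[1]
    r2 = {}   # R^2 for every subset of predictors (empty model = 0)
    for r in range(k + 1):
        for s in itertools.combinations(range(k), r):
            r2[s] = r_squared([X[:, j] for j in s], y) if s else 0.0
    indep = np.zeros(k)
    for j in range(k):
        others = [i for i in range(k) if i != j]
        level_means = []
        for r in range(k):
            gains = [r2[tuple(sorted(s + (j,)))] - r2[s]
                     for s in itertools.combinations(others, r)]
            level_means.append(np.mean(gains))
        indep[j] = np.mean(level_means)
    return indep

# Invented data: A drives y, B is collinear with A, C is noise.
rng = np.random.default_rng(3)
A = rng.normal(size=150)
B = 0.7 * A + rng.normal(size=150)
C = rng.normal(size=150)
y = 2.0 * A + rng.normal(size=150)
X = np.column_stack([A, B, C])
indep = hierarchical_partition(X, y)
full_r2 = r_squared([X[:, j] for j in range(3)], y)
```

A useful property of this partition is that the independent contributions sum exactly to the full model's R^2, and a collinear proxy like B receives a modest independent contribution rather than being credited with A's effect, which is precisely what makes HP attractive for explanation rather than prediction.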

With colleagues, I examined the use of hierarchical partitioning for understanding LUCC in my PhD study area, leading to a recent publication in Ecosystems. We examined the difference in using two different land-cover (LC) classifications for the same landscape, one classification with 10 LC classes, another with four. Using HP we found that coarser LC classifications (i.e. fewer LC classes) cause the joint effects of variables to suppress total variance explained in LUCC. That is, the combined effect of explanatory variables increases the total explained variance (in LUCC) in regression models using the 10-class LC classification, but reduces total explained variance in the dependent variable for four-class models.

We suggested that (in our case at least) this was because the aggregated nature of the four-class models means broad observed changes (for example from agricultural land to forested land) mask specific changes within the classes (for example from pasture to pine forest or from arable land to oak forest). These specific transitions may have explanatory variables (causes) that oppose one another, decreasing the explanatory power of models that use both variables to explain a single broader shift. By considering more specific transitions, the utility of HP for elucidating important causal factors will increase.

We concluded that a systematic examination of specific LUCC transitions is important for elucidating drivers of change, and is one that has been under-used in the literature. Specifically, we suggested hierarchical partitioning should be useful for assessing the importance of causal mechanisms in LUCC studies in many regions around the world.
