Bird Occupancy Modelling

Birds have been given short shrift in my blog posts about the Michigan UP ecological-economic modelling project. It’s not that we have forgotten about them; it’s just that before incorporating them into our modelling there were other things to deal with first. Now that we’ve made progress on modelling deer distribution, it’s time to turn our attention to how we can represent the potential impacts of forest management on bird habitat, so that we might better understand the tradeoffs that will need to be negotiated to achieve both economic and ecological sustainability.

Ovenbird (Seiurus aurocapillus)

One of the things we want to do is link our bird-vegetation modelling with Laila Racevskis’ assessment of the economic value of bird species from her PhD research. Laila assessed local residents’ willingness-to-pay for ensuring the conservation of several bird species of concern in our study area. If we can use our model to examine the effects of different timber management plans (each yielding different timber volumes) on the number of bird species present in an area, we can use Laila’s data to examine the economic tradeoffs between different management approaches. The first thing we need to do to achieve this is to be able to estimate how many bird species would be present in a given forest stand.

Right now the plan is to estimate the presence of songbird species of concern in forest stands by using the data Ed Laurent collected during his PhD research at MSU. To this end I’ve been doing some reading on the latest occupancy modelling approaches and reviewing the literature on their application to birds in managed forests. Probably the most popular current approach was developed recently by Darryl Mackenzie and colleagues – it allows the estimation of whether or not a site is occupied by a given species when we know that our detection is imperfect (i.e. when we know we have false negative observations in our bird presence data). The publication of some nice overviews of this approach (e.g. Mackenzie 2006) plus the development of software to perform the analyses are likely to be at the root of this popularity.

The basic idea of the approach is that if we are able to make multiple observations at a site (and if we assume that bird populations and habitat do not change between these observations) we can use the probability of each bird observation history at a site, across all the sites, to form a model likelihood. This likelihood can then be used to estimate the parameters using any likelihood-based estimation procedure. Covariates can be used to model both the probability of occupancy and the probability of detection (i.e. we can account for factors that may have hindered bird observation, such as wind strength or the time of day). I won’t go into further detail here because there’s an excellent online book that will lead you through the modelling process, and you can download the software and try it yourself.
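To make the idea concrete, here is a minimal sketch (in Python, with made-up numbers) of how a detection history enters the likelihood. Here psi is the probability a site is occupied and p the per-visit detection probability: a site with at least one detection must be occupied, while an all-zero history could mean either an unoccupied site or an occupied site that was missed on every visit.

```python
import math

def history_prob(history, psi, p):
    """Probability of one site's detection history (list of 0/1 visits)
    under a single-season occupancy model with constant psi and p."""
    if any(history):
        # At least one detection: the site must be occupied.
        prob = psi
        for d in history:
            prob *= p if d else (1 - p)
        return prob
    # All zeros: occupied-but-never-detected, or truly unoccupied.
    return psi * (1 - p) ** len(history) + (1 - psi)

def log_likelihood(histories, psi, p):
    """Model log-likelihood: sum of log history probabilities over sites."""
    return sum(math.log(history_prob(h, psi, p)) for h in histories)
```

Maximising this log-likelihood over psi and p (with covariates entering through logit links) is exactly what the dedicated software does for you.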

Two recent papers have used this approach to investigate bird species presence given different forest conditions. DeWan et al. 2009 used Mackenzie’s occupancy modelling approach to examine impacts of urbanization on forest birds in New York State (they do a good job of explaining how they apply Mackenzie’s approach to their data and study area). DeWan et al. considered landscape variables such as perimeter-area ratios of habitat patches and proximity to urban areas to create occupancy models for 9 bird species at ~100 sites. They found that accounting for imperfect bird detection was important and that habitat patch “perimeter-area ratio had the most consistent influence on both detection probability and occupancy” (p989).

In a slightly different approach, Smith et al. 2008 estimated site occupancy of the black-throated blue warbler (Dendroica caerulescens) and ovenbird (Seiurus aurocapillus) in 20 northern hardwood-conifer forest stands in Vermont. At each bird observation site they had also collected stand structure variables including basal area, understory density and tree diameters (in contrast to DeWan et al., who only considered landscape-level variables). Smith et al. write that their results “demonstrate that stand-level forest structure can be used to predict the occurrence of forest songbirds in northern hardwood-conifer forests” (p43) and “suggest that the role of stand-level vegetation may have been underestimated in the past” (p36).

Our approach will take the best aspects of both these studies: the large sample size of DeWan et al. and the consideration of stand-level variables as in Smith et al. More on this again soon, I expect.

Holiday Publications!

Update January 2010: This paper is now online with doi 10.1016/j.foreco.2009.12.020.

I received some good news this morning as I prepared to head back to the UK for the holidays. The paper I started writing back in January examining the white-tailed deer distribution in our managed forest landscape (the analysis for which inspired posts on Bayesian and ensemble modelling) has been accepted for publication and is ‘In Press’! I’ve copied the abstract below.

Another piece of publications news I received a while back is that the paper I co-authored with Raul Romero-Calcerrada and others modelling socioeconomic data to understand patterns of human-caused wildfire ignition risk has now officially been published in Ecological Modelling.

Happy Holidays everyone!

Effects of local and regional landscape characteristics on wildlife distribution across managed forests (In Press). Millington, Walters, Matonis, and Liu. Forest Ecology and Management.

Understanding impacts of local and regional landscape characteristics on spatial distributions of wildlife species is vital for achieving ecological and economic sustainability of forested landscapes. This understanding is important because wildlife species such as white-tailed deer (Odocoileus virginianus) have the potential to affect forest dynamics differently across space. Here, we quantify the effects of local and regional landscape characteristics on the spatial distribution of white-tailed deer, produce maps of estimated deer density using these quantified relationships, provide measures of uncertainty for these maps to aid interpretation, and show how this information can be used to guide co-management of deer and forests. Specifically, we use ordinary least squares and Bayesian regression methods to model the spatial distribution of white-tailed deer in northern hardwood stands during the winter in the managed hardwood-conifer forests of the central Upper Peninsula of Michigan, USA. Our results show that deer density is higher nearer lowland conifer stands and in areas where northern hardwood trees have small mean diameter-at-breast-height. Other factors related with deer density include mean northern hardwood basal area (negative relationship), proportion of lowland conifer forest cover (positive relationship), and mean daily snow depth (negative relationship). The modeling methods we present provide a means to identify locations in forest landscapes where wildlife and forest managers may most effectively co-ordinate their actions.

Keywords: wildlife distribution; landscape characteristics; managed forest; ungulate herbivory; northern hardwood; lowland conifer; white-tailed deer

Initial Michigan Forest Simulation Output

It’s taken a while but finally the model that I came to Michigan State to develop is producing what seems to be sensible output. Just recently we’ve brought all the analyses on the data that were collected in the field into a coherent whole. We’ll use this integrated model to investigate best approaches for forest and wildlife management to ensure ecological and economic sustainability. This post is a quick overview of what we’ve got at the moment and where we might take it. The image below provides a simplified view of the relationship of the primary components the model considers (a more detailed diagram is here).

The main model components I’ve been working on are the deer distribution, forest gap regeneration and tree growth and harvest sub-models. Right now we’re still in the model testing and verification stage but soon we hope to be able start putting it to use. Here’s a flow chart representing the current sequence of model execution (click for larger image):

As I’ve posted several times about the deer distribution modelling (here, here, and here for example) and because the integration of FVS with our analyses is more a technical than scientific issue, I’ll focus on the forest gap regeneration sub-model.

Most of the forest gap regeneration analyses used the data Megan Matonis collected during her two summers in the field (i.e., forest). During her fieldwork Megan measured gap and tree regeneration attributes such as gap size, soil and moisture regime, time since harvest, deer density, and sapling heights, density and species composition. Megan is writing up her thesis right now but we’ve also managed to find time to do some extra analyses on her data for the gap regeneration sub-model. Here’s the flow chart representing the model sequence to estimate initial regeneration in gaps created by a selection harvest in a forest stand (click for larger image):

In our gap regeneration sub-model we take a probabilistic approach to estimate the number and species of the first trees to reach 7m (this is the height at which we pass the trees to FVS to grow). The interesting equations for this are Eqs. 6 – 9 as they are responsible for estimating regeneration stocking (i.e. number of trees that regenerate) and the species composition of the regenerating trees. Through time the effects of the results of these equations will drive future forest composition and structure and the amount of standing timber available for harvest.

The probability that any trees regenerate in a gap is modelled using a generalized linear mixed model with a stand-level random intercept drawn from a normal distribution. The probability is a function of canopy gap area and deer browse category (high or low; calculated as a function of deer density in the stand).

If there are some regenerating trees in the gap, we use a logistic regression to calculate the probability that the gap contains as many (or more) trees as could fit in the gap when all the trees are 7m (and is therefore ‘fully stocked’). The probability is a function of canopy openness (calculated as a function of canopy gap area), soil moisture and nutrient conditions, and deer density. If the gap is not fully stocked we sample the number of trees from a uniform distribution.

Finally, we assign each tree to a species by estimating the relative species composition of the gap. We do this by assuming there are four possible species mixes (derived from our empirical data) and we use a logistic regression to calculate the probability that the gap has each of these four mixes. The probability of each mix is a function of soil moisture and nutrient conditions, canopy gap area, and stand-level basal area of Sugar Maple and Ironwood. Currently we have parameterised the model to represent five species (Sugar Maple, Red Maple, White Ash, Black Cherry and Ironwood).
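The three stochastic steps above can be sketched in code. To be clear, every coefficient and threshold below is a made-up placeholder for illustration, not one of our fitted values (those will be in the publications):

```python
import math
import random

def inv_logit(x):
    """Convert a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def simulate_gap(gap_area, deer_density, stand_intercept, rng=random):
    """One pass through the three-equation gap regeneration sequence.
    All numeric coefficients are hypothetical placeholders."""
    # Step 1: does anything regenerate? (GLMM-style linear predictor with
    # a stand-level random intercept and a high/low deer browse effect.)
    browse_high = deer_density > 10.0               # hypothetical threshold
    p_any = inv_logit(stand_intercept + 0.002 * gap_area
                      - (1.5 if browse_high else 0.0))
    if rng.random() >= p_any:
        return 0, None
    # Step 2: is the gap fully stocked when the trees reach 7 m?
    max_trees = max(1, int(gap_area / 10.0))        # ~10 m^2 per 7 m tree
    p_full = inv_logit(-1.0 + 0.001 * gap_area - 0.05 * deer_density)
    if rng.random() < p_full:
        n_trees = max_trees
    else:
        n_trees = rng.randint(1, max_trees)         # uniform if under-stocked
    # Step 3: pick one of four empirical species mixes.
    mixes = ["maple-dominated", "mixed", "ironwood-heavy", "ash-cherry"]
    return n_trees, rng.choice(mixes)
```

In the real sub-model the mix probabilities also come from a logistic regression on gap and stand conditions rather than an even draw; the sketch just shows the sequence of decisions.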

As the flow chart suggests, there is a little more to it than these three equations alone but hopefully this gives you a general idea about how we’ve approached this and what the important variables are (look out for publications in the future with all the gory details). For example, at subsequent time-steps in the simulation model we grow the regenerating trees until they reach 7m and also represent the coalescence of the canopy gaps. I haven’t integrated the economic sub-model into the program yet but that’s the next step.

So what can we use the model for? One question we might use the model to address is, ‘how does change in the deer population influence northern hardwood regeneration, timber revenue and deer hunting value?’ For example, in one set of initial model runs I varied the deer population to test how it affects regeneration success (defined as the number of trees that regenerate as a percentage of the maximum possible). Here’s a plot that shows how regeneration success decreases with increasing deer population (as we would expect given the model structure):

Because we are linking the ecological sub-models with economic analyses we can look at how these differences will play out through time to examine potential tradeoffs between ecological and economic values. For example, because we know (from our analyses) how the spatial arrangement of forest characteristics influences deer distribution we can estimate how different forest management approaches in different locations influences regeneration through time. The idea is that if we can reduce deer numbers in a given area immediately after timber harvest we can give trees a chance to survive and grow above the reach of deer – moving deer spatially does not necessarily mean reducing the total population (which would reduce hunting opportunities, an important part of the local economy). The outcomes may look something like this:

We plan to use our model to examine scenarios like this quantitatively. But first, I need to finish testing the model…

Synthetic Trees

When testing and using simulation models we often need to use synthetic data. This might be because we want to examine the effects of different initial conditions on our model output, or simply because we have insufficient data to examine a system at the scale we would like to. The ecological-economic modelling project I’m currently working on is in both these situations, and over the last week or two I’ve been working on generating synthetic tree-level data so that we can initialize our model of forest stand change for testing and scenario development. Here’s a brief overview of how I’ve approached the task of producing a ‘treelist generator’ from the empirical data we have for over 60,000 trees in Northern Hardwood stands across Upper Michigan.

One of the key measures we can use to characterise forest stands is basal area (BA). We can assume that for each stand we generate a treelist for, there is some ‘target BA’ that we are aiming to produce. As well as hitting a target BA, we also need to make sure that the tree diameter-at-breast-height (DBH) size-class distribution and species composition are representative of the stands in our empirical data. Therefore, our first step is to look at the diameter size-class distribution of the stands we want to emulate. We can do this by plotting histograms of the frequency of trees of different diameter for each stand. In the empirical data we see two characteristic distributions (Fig 1).

Fig 1. Example stand tree count histograms

The distribution on the left has very many more trees in the smaller size classes as a result of stands self-thinning (as larger trees compete for finite resources). The second distribution, in which the smallest size classes are under-represented and larger size classes have relatively more trees, does not fit so well with the theoretical, self-thinning DBH size-class distribution. Stands with a distribution like this have probably been influenced by other factors (for example deer browse on the smaller trees). However, it turns out that both these DBH size-class distributions can be pretty well described by the gamma probability distribution (Fig 2).

Fig 2. Example stand gamma probability distributions for data in Fig 1

The gamma distribution has two parameters, a shape parameter we will call alpha and a scale parameter we will call beta. Interestingly, in the stands I examined (dominated by Sugar Maple and Ironwood) there are two different linear relationships between the parameters. The relationship between alpha and beta for 80% of stands represents the ‘self-thinning’ distribution, and the other 20% represent distributions in which small DBH classes are under-represented. We use these relationships – along with the fact that the range of values of alpha for all stands has a log-normal distribution – to generate characteristic DBH size-class distributions:

  1. sample a value of alpha from a normal distribution (subsequently reconverted using 10^alpha),
  2. for the two different relationships use Bayesian linear regression to find mean and 95% credible intervals for the slope and intercept of a regression line between alpha and beta,
  3. use the value of alpha with the regression parameters to produce a value of beta.
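The three steps above can be sketched as follows. The means, standard deviations and regression coefficients here are illustrative stand-ins, not the values fitted from our 60,000-tree dataset:

```python
import random

def generate_dbh_distribution(n_trees, rng=random):
    """Sketch of steps 1-3: sample log10(alpha) from a normal
    distribution, derive beta from a linear alpha-beta relationship,
    then draw tree diameters from the resulting gamma distribution.
    All numeric constants are placeholders, not fitted values."""
    log_alpha = rng.gauss(0.2, 0.15)            # step 1 (hypothetical mean/sd)
    alpha = 10 ** log_alpha                     # reconvert with 10^alpha
    slope, intercept = -2.0, 8.0                # step 2 (placeholder regression)
    beta = max(0.5, intercept + slope * alpha)  # step 3 (keep scale positive)
    return [rng.gammavariate(alpha, beta) for _ in range(n_trees)]
```

In the full generator the slope and intercept themselves are sampled from the Bayesian regression posteriors (step 2), so uncertainty in the alpha-beta relationship propagates into the synthetic stands.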

So now for each stand we have a target basal area, and parameters for the DBH size class distribution. The next step is to add trees to the stand with diameters specified by the probability distribution. Each time we add a tree, basal area is added to the stand. The basal area for a tree is calculated by:

TreeBA = TreeDensity * (0.005454 * diameter^2)

[Tree density can be calculated for each tree because we know the sampling strategy used to collect the empirical data on our timber cruise, whether on a fixed-area plot, by n-tree sampling, or with a prism.]

Once we get within 1% of our target BA we stop adding trees to the stand [we’ll satisfy ourselves with 1% accuracy because the size of tree that we allocate each time is sampled from a probability distribution, and so it is unlikely we will be able to hit our target exactly]. The trees in our (synthetic) stand should now (theoretically) have the appropriate DBH size-class distribution and basal area.
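Putting the basal area equation and the stopping rule together gives a simple fill loop. The per-tree expansion factor (TreeDensity) of 5 trees/acre is a placeholder here; in practice it comes from the cruise sampling design as noted above:

```python
def tree_ba(diameter, tree_density):
    """Basal area contribution of one sampled tree:
    expansion factor (trees/acre) * 0.005454 * DBH^2,
    with DBH in inches giving BA in square feet per acre."""
    return tree_density * 0.005454 * diameter ** 2

def fill_stand(target_ba, draw_diameter, tree_density=5.0):
    """Add trees until stand BA reaches within 1% of the target.
    draw_diameter is any callable returning a DBH sample (e.g. a draw
    from the fitted gamma distribution); tree_density is a placeholder."""
    trees, ba = [], 0.0
    while ba < 0.99 * target_ba:
        d = draw_diameter()
        trees.append(d)
        ba += tree_ba(d, tree_density)
    return trees, ba
```

For example, `fill_stand(100.0, lambda: 10.0)` keeps adding 10-inch trees until the running BA first reaches 99% of the 100 ft²/acre target.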

With a number of trees now in our synthetic stand, each with a DBH value, the next step is to assign each tree to a species so that the stand has a representative species composition. For now, the two species we are primarily interested in are Sugar Maple and Ironwood. However, we will also allow trees in our stands to be Red Maple, White Ash, Black Cherry or ‘other’ (these are the next most common species in stands dominated by Sugar Maple and Ironwood). First we estimate the proportion of the trees in each species. In stands with Sugar Maple and Ironwood, deer selectively browse Sugar Maple, allowing Ironwood a competitive advantage. Correspondingly, in the empirical data we observe a strong linear and inverse relationship between the abundance of Sugar Maple and Ironwood (Fig 3).

Fig 3. Relationship between stand Sugar Maple and Ironwood abundance

To assign species proportions we first estimate the proportion of Sugar Maple from the empirical data. Next, using the strong inverse relationship above, we estimate the corresponding proportion of Ironwood (sampled using a normal distribution with mean and standard deviation from the Bayesian linear regression). The remaining species proportions are assigned according to the frequency of their presence in the empirical data.

Now we use these proportions to assign a species to individual trees. Because physiology varies between species, the probability that a tree is of a given size also varies between species. For example, Ironwood very seldom reach DBH greater than 25 cm and the vast majority (almost 99% in our data) are smaller than 7.6 cm (3 inches) in diameter. Consequently, we first assign the appropriate number of trees to Ironwood according to its empirical size-class distribution, before then assigning all other trees to the remaining species (using a uniform distribution).
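A sketch of this assignment step might look like the following. The proportions, the inverse Sugar Maple-Ironwood relationship, and the 'other species' frequencies are all placeholder numbers, not our fitted empirical values:

```python
import random

def assign_species(diameters, rng=random):
    """Assign a species to each tree in a synthetic stand.
    All proportions and coefficients are illustrative placeholders."""
    n = len(diameters)
    p_sm = rng.uniform(0.4, 0.8)                  # Sugar Maple share (placeholder)
    # Strong inverse relationship: more Sugar Maple means less Ironwood.
    p_iw = min(1.0 - p_sm, max(0.0, rng.gauss(0.9 - p_sm, 0.05)))
    rest = 1.0 - p_sm - p_iw
    others = {"Red Maple": 0.4, "White Ash": 0.3,
              "Black Cherry": 0.2, "other": 0.1}  # placeholder frequencies
    # Ironwood rarely exceeds ~25 cm DBH, so give it the smallest trees.
    order = sorted(range(n), key=lambda i: diameters[i])
    n_iw = round(p_iw * n)
    species = [None] * n
    for i in order[:n_iw]:
        species[i] = "Ironwood"
    pool = ["Sugar Maple"] + list(others)
    weights = [p_sm] + [rest * w for w in others.values()]
    for i in order[n_iw:]:
        species[i] = rng.choices(pool, weights=weights)[0]
    return species
```

Note the real model uses Ironwood's empirical size-class distribution rather than simply taking the smallest trees; sorting by diameter is a crude stand-in for that.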

The final step in generating our treelist is to assign each tree a height and a canopy ratio. We do this using empirical relationships between diameter and height for each species that are available in the literature (e.g. Pacala et al. 1994). And we’re done!

In the model I’m developing, these stands can be assigned a spatial location either using a pre-existing empirical map or using a synthetic land cover map with known characteristics (generated for example using the modified random clusters method, as the SIMMAP 2.0 software does). In either case we can now run the model multiple times to investigate the dynamics and consequences of different initial conditions. More on that in the future.

ESA 2009 Agenda

I’ve just arrived in Albuquerque, New Mexico, for the Ecological Society of America meeting. Before heading out to explore town I’ve been putting the final touches to my presentation (Monday, 4.40pm, Sendero Ballroom III) and working out what I’m going to do this week. Here’s what I think I’ll be doing:

i) Importantly, on Monday at 2.30 I’ll be going to support Megan Matonis as she talks about the work she’s been doing on our UP project: ‘Gap-, stand-, and landscape-scale factors affecting tree regeneration in harvest gaps’.

ii) Monday morning I think I’ll attend the special session ‘What is Sustainability Science and Can It Make Us Sustainable?’ [“What is sustainability science and can it make us sustainable? If sustainability science requires interdisciplinarity, how do these diverse disciplines integrate the insights that each brings? How do we reconcile differing basic assumptions to solve an urgent and global problem? How do we ensure that research outputs of ecology and other disciplines lead toward sustainability?”]

iii) Tuesday, amongst other things, I’ll check out the symposium entitled; ‘Global Sustainability in the Face of Uncertainty: How to More Effectively Translate Ecological Knowledge to Policy Makers, Managers, and the Public’. [“The basic nature of science, as well as life, is that there will always be uncertainty. We define uncertainty as a situation in which a decision-maker (scientist, manager, or policy maker) has neither certainty nor reasonable probability estimates available to make a decision. In ecological science we have the added burden of dealing with the inherent complexity of ecological systems. In addition, ecological systems are greatly affected by chance events, further muddying our ability to make predictions based on empirical data. Therefore, one of the most difficult aspects of translating ecological and environmental science into policy is the uncertainty that bounds the interpretation of scientific results.”]

iv) Wednesday I plan on attending the symposium ‘What Should Ecology Education Look Like in the Year 2020?’ [“How should ecology education be structured to meet the needs of the next generation, and to ensure that Americans prioritize sustainability and sound ecological stewardship in their actions? What balance between virtual and hands-on ecology should be taught in a cutting-edge ecological curriculum? How can we tackle the creation versus evolution controversy that is gaining momentum?”]

v) Being a geographer (amongst other things) on Thursday I’d like to participate in the discussion regarding place; ‘The Ecology of Place: Charting a Course for Understanding the Planet’ [“The diversity, complexity, and contingency of ecological systems both bless and challenge ecologists. They bless us with beauty and endless fascination; our subject is never boring. But they also challenge us with a difficult task: to develop general and useful understanding even though the outcomes of our studies typically depend on a host of factors unique to the focal system as well as the particular location and time of the study. Ecologists address this central methodological dilemma in various ways. … Given the pressing environmental challenges facing the planet, it is critical that ecologists develop an arsenal of effective strategies for generating knowledge useful for solving real-world problems. This symposium inaugurates discussion of one such strategy – The Ecology of Place.”]

vi) Also on Thursday I think I’ll see what’s going on in the session; ‘Transcending Tradition to Understand and Model Complex Interactions in Ecology’. [“Ecology intersects with the study of complex systems, and our toolboxes must grow to meet interdisciplinary needs.”]

vii) Not sure about Friday yet…

Ensemble Modelling

Uncertainty is an inherent part of modelling. Models by definition are simplified representations of reality – as such they are all wrong. Being wrong doesn’t necessarily make them useless, but it helps to have some idea about how wrong they are for them to be most useful. That’s why we should always try to provide some means to assess the uncertainty in our model output. Producing multiple realisations of a model and its results – a model ensemble – is one way to do this.

Depending on our model we can use ensemble methods to examine four different sources of modelling uncertainty:

a) Model Structure: how appropriate are our model variables, relationships and rules?

b) Model Parameters: what numerical values appropriately represent the strengths of relationships between variables?

c) Initial Conditions: how does our uncertainty in the initial state of the system we are modelling propagate through the model to the output?

d) Boundary Conditions: how do alternative (uncertain) scenarios of events that perturb our model influence output?

In their recent paper, Araujo and New show how ensemble modelling might be used to assess the impacts of these different kinds of uncertainty on model output. They advocate the use of multiple models within an ensemble forecasting framework and argue that more robust forecasts can be achieved via the appropriate analysis of ensemble forecasts. This is important because projections between different models can be so variable as to compromise their usefulness for guiding policy decisions. For example, upon examining nine bioclimatic models for four South African plant species, Pearson et al. found that for different scenarios of future climate, predicted changes in species distribution varied from 92% loss to 322% gain between the different models! It’s uncertainty like this that stifles debate about the presence and impacts of anthropogenic climate change. Araujo and New go on to discuss the uses and limitations of ensemble modelling for supporting policy decisions in biodiversity conservation.

In a previous post I discussed how Bayesian methods can be used to examine uncertainty in model structure. I’ve been using Bayesian Model Averaging to help me identify which are the most appropriate predictors of local winter white-tailed deer density for our UP Forest project. Using the relationships developed via that modelling process I’ve produced spatial estimates of deer density in northern hardwood stands for a section of our study area (example below).

Hopefully forest managers will find this sort of modelling useful for their planning (I’ll ask them sometime). However, I think this sort of product will be even more useful if I can provide the managers with a spatial estimate of uncertainty in the deer density estimates. This is important not only to emphasise that there is uncertainty in the model results generally, but also to highlight where (in the landscape) the model is more or less likely to be correct. Here’s the uncertainty map corresponding with the deer density estimate map above.

In this map the lighter colours (yellows and greens) indicate less certainty in the deer density estimate at that point. If managers were to take action in this landscape to reduce deer densities they could use a combination of the maps to find locations where deer densities are estimated to be high with low uncertainty.

To be more specific, the uncertainty map above is the standard deviation of 1,000 deer density estimate maps (correspondingly the deer density map is the mean of these 1,000 models). For each of the 1,000 deer density estimates I used slightly different model parameter values, each chosen with a certain probability. These 1,000 realisations are my model ensemble. The probability a value would be chosen for use as a parameter in any of the 1,000 models was specified by a (normal) probability distribution which came from the mean and standard deviation provided by the original Bayesian regression model. To produce the 1,000 models and sample their parameter values from a probability distribution I wrote my own program which made use of the standalone R math libraries built by Chris Jennings.
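The mean and standard deviation maps can be produced with a loop like the one below. For simplicity this sketch draws a single regression coefficient per realisation and applies it to a one-predictor 'map' held as a flat list of cell values; the real model draws all parameters from their respective posterior distributions:

```python
import random
import statistics

def ensemble_maps(cells, coef_mean, coef_sd, n_runs=1000, rng=random):
    """Build an ensemble of deer density maps by repeatedly sampling a
    regression coefficient from its (normal) posterior, then summarise
    with a mean map (the estimate) and an SD map (the uncertainty).
    'cells' holds one predictor value per map cell."""
    runs = []
    for _ in range(n_runs):
        b = rng.gauss(coef_mean, coef_sd)        # one parameter draw
        runs.append([b * x for x in cells])      # one map realisation
    mean_map = [statistics.mean(r[i] for r in runs) for i in range(len(cells))]
    sd_map = [statistics.stdev(r[i] for r in runs) for i in range(len(cells))]
    return mean_map, sd_map
```

Because the realisations share parameter draws across cells, the SD map naturally shows larger uncertainty where predictor values (and hence the effect of parameter uncertainty) are larger.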

Appropriately representing and communicating uncertainty in model output is vital if models and modelling are to be useful for non-modellers (e.g., policy-makers, natural resource managers, etc.). Spatial ensemble modelling helps us to do this by identifying locations where we are more or less confident about our model output.

Bayesian Multimodel Inference

The abstract we submitted to the ESA Meeting was accepted a while back. Since we submitted it, Megan and I have been back in the field for some additional data collection and I’ve been doing some new analyses. Some of these new analyses are the result of my attendance at the Bayesian statistics workshop at US-IALE in Snowbird. Since then I’ve been learning more by picking the brain of a former advisor, George Perry, and doing a fair bit of reading (reading list with links at bottom). And of course, using my own data has helped a lot.

One of the main questions I’m facing, as many ecologists often do, is “which variables should be in my regression model?” This question lies at the core of model inference and assumes that it is appropriate to infer ecological process from data by searching for the single model that represents reality most accurately. However, as Link and Barker put it:

“It would be nice if there were no uncertainty about models. In such an ideal world, a single model would be available; the data analyst would be in the enviable position of having only to choose the best method for fitting model parameters based on the available data. The choice would be completely determined by the statistician’s theory, a theory which regards the model as exact depiction of the process that generated the data.

“It is clearly wrong to use the data to choose a model and then to conduct subsequent inference as though the selected model were chosen a priori: to do so is to fail to acknowledge the uncertainties present in the model selection process, and to incestuously use the data for two purposes.”

Thus, it is usually more appropriate to undertake a process of multi-model inference and search for the ‘best’ possible model (given current data) rather than a single ‘true’ model. I’ve been looking into the use of Bayesian Model Averaging to address this issue. Bayesian approaches take prior knowledge (i.e., a probability distribution) and data about a system and combine them with a model to produce posterior knowledge (i.e., another probability distribution). This approach differs from the frequentist approach to statistics, which calculates probabilities based on the idea of a (hypothetical) long run of outcomes from a sequence of repeated experiments.

For example, estimating the parameters of a linear regression model using a Bayesian approach differs from a frequentist ordinary least squares (OLS) approach in two ways:

i) a Bayesian approach considers the parameter to be a random variable that might take a range of values, each with a given probability, rather than being fixed but unknown,

ii) a Bayesian approach conditions the parameter estimate probability on the sample data at hand and not as the result of a set of multiple hypothetical independent samples (as the OLS approach does).

If there is little prior information available about the phenomena being modelled, ‘uninformative priors’ (e.g., a normal distribution with a relatively large variance about a mean of zero) can be used. In this case, the parameter estimates produced by the Bayesian linear regression will be very similar to those produced by regular OLS regression. The difference is in the error estimates and what they represent; a 95% credible interval produced by a Bayesian analysis specifies that there is a 95% chance that the true value is within that interval given the data analyzed, whereas a 95% confidence interval from a frequentist (OLS) approach implies that if (hypothetical) data were sampled a large number of times, the parameter estimate for those samples would lie within that interval 95% of those times.
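The 'vague prior reproduces the frequentist estimate' point can be seen in a few lines with the conjugate normal-normal model (estimating a mean with known observation error). With a huge prior variance the posterior mean collapses to the sample mean and the posterior standard deviation to the familiar sigma/sqrt(n), but the result is a probability distribution for the parameter itself:

```python
import math
import statistics

def posterior_mean_sd(data, sigma, prior_mean=0.0, prior_sd=1000.0):
    """Conjugate normal-normal update for a mean with known
    observation sd (sigma). A very large prior_sd plays the role of
    an uninformative prior."""
    n = len(data)
    xbar = statistics.mean(data)
    # Precisions (inverse variances) add under conjugate updating.
    prec = 1.0 / prior_sd ** 2 + n / sigma ** 2
    post_mean = (prior_mean / prior_sd ** 2 + n * xbar / sigma ** 2) / prec
    post_sd = math.sqrt(1.0 / prec)
    return post_mean, post_sd
```

With `data = [1.0, 2.0, 3.0]` and `sigma = 1.0`, the posterior mean is essentially the sample mean 2.0 and the posterior sd essentially 1/sqrt(3), matching the OLS numbers while carrying the Bayesian interpretation.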

There has been debate recently in ecological circles about the merits of Bayesian versus frequentist approaches. Whilst some have strongly advocated the use of Bayesian approaches (e.g., McCarthy 2007), others have suggested a more pluralistic approach (e.g., Stephens et al. 2005). One of the main concerns with frequentist statistics relates to a broader criticism of the abuse and misuse of the P-value. For example, in linear regression models P-values are often used to examine the hypothesis that the slope of a regression line is not equal to zero (by rejecting the null hypothesis that it is equal to zero). Because the slope of a regression line on a two-dimensional plot indicates the rate of change of one measure with respect to the other, a non-zero slope indicates that as one measure changes, so does the other. Consequently, it is often inferred that the process represented by one measure had an effect on, or caused, the change in the other. However, as Ben Bolker points out in his excellent book:

“…common sense tells us that the null hypothesis must be false, because [the slope] can’t be exactly zero [due to the inherent variation and error in our data] — which makes the p value into a statement about whether we have enough data to detect a non-zero slope, rather than about whether the slope is actually different from zero.”

This is not to say there isn’t a place for null hypothesis testing using P-values in the frequentist approach. As Stephens et al. argue, by “marginalizing the use of null-hypothesis testing, ecologists risk rejecting a powerful, informative and well-established analytical tool.” To the pragmatist, using whatever (statistical) tools are available seems eminently more sensible than placing all one’s eggs in one basket. The important point is to make sure that the hypotheses one tests with P-values are ecologically meaningful.

Back to Bayesian Model Averaging (BMA). BMA provides a method to account for uncertainty in model structure by calculating (approximate) posterior probabilities for each possible model (i.e., combination of variables) that could be constructed from a set of independent variables (see Adrian Raftery’s webpage for details and examples of BMA implementation). The ‘model set’ is all possible combinations of variables (equal to 2^n models, where n is the number of variables in the set). The important thing to remember about these probabilities is that each is the probability that its model is the best one from the model set considered – probabilities for models containing variables not measured, or not included in the model set, obviously can’t be calculated.
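The mechanics can be sketched quite simply. Here’s a toy Python version (my own illustration, not the BMA package itself, with invented variable names and data) that enumerates all 2^n candidate regressions and converts BIC scores into approximate posterior model probabilities, in the spirit of Raftery’s BIC approximation:

```python
from itertools import chain, combinations

import numpy as np

def bic_model_probs(X, y, names):
    """Approximate posterior probabilities for all 2^n variable subsets
    using the BIC approximation (toy illustration of Raftery-style BMA)."""
    n = len(y)
    results = []
    idx = range(X.shape[1])
    # All subsets of the variables, from the empty (intercept-only) model up
    for subset in chain.from_iterable(combinations(idx, k) for k in range(len(names) + 1)):
        Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ beta) ** 2))
        bic = n * np.log(rss / n) + Xs.shape[1] * np.log(n)
        results.append((tuple(names[j] for j in subset), bic))
    bics = np.array([b for _, b in results])
    weights = np.exp(-0.5 * (bics - bics.min()))   # relative model evidence
    probs = weights / weights.sum()                # posterior model probabilities
    return sorted(zip([m for m, _ in results], probs), key=lambda t: -t[1])

# Invented data in which only 'A' truly affects the response
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(0.0, 1.0, 200)
ranked = bic_model_probs(X, y, ['A', 'B', 'C'])
print(ranked[0])  # highest-probability model; should contain 'A'
```

The probabilities sum to one over the model set, which is exactly why they must be read as ‘best within the set considered’ rather than ‘true’.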

The advantage over other model selection procedures like stepwise regression is that the output provides a measure of the performance of many models, rather than simply providing the single ‘best’ model. For example, here’s a figure I derived from the output BMA provides:

The figure shows BMA results for the five models with the highest posterior probability of being the best candidate model from a hypothetical model set. The probability that each model is the best in the model set is shown at the top of each column – Model 1 has an almost 23% chance of being the best model given the data available. Dark blocks indicate that the corresponding variable (row) is included in a given model – so Model 1 contains Variables A and B, whereas Model 2 contains Variable A only. Posterior probabilities of variables being included in the best model (in the model set) are shown to the right of the blocks – as we might expect, given that Variable A is present in the five most probable models it has the highest chance of being included in the best model.

BMA also provides a posterior probability for each variable being included in the best candidate model. One of the cool things about the variable posterior probability is that it can be used to produce a weighted mean value across all the models for each parameter estimate, each with its own Bayesian credible interval. The weight for each parameter estimate is the probability that the variable is present in the ‘best’ model. Thus, the ‘average’ model accounts for uncertainty about which variables belong in the best candidate model within the individual parameter estimates.
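As a worked illustration of that weighting (all numbers invented, not from our analysis): given posterior probabilities for a handful of models and each model’s estimate of the coefficient for a hypothetical ‘Variable A’ (zero where A is excluded), the inclusion probability and model-averaged estimate are just sums and a weighted mean.

```python
import numpy as np

# Invented posterior probabilities for four candidate models
# (assume the remaining models in the set have negligible probability)
model_probs = np.array([0.23, 0.18, 0.12, 0.47])

# Each model's estimate of the coefficient for 'Variable A';
# zero means A is not included in that model
beta_A = np.array([1.10, 1.30, 0.00, 0.95])

# Posterior inclusion probability: total probability of models containing A
p_include_A = model_probs[beta_A != 0].sum()

# Model-averaged estimate: probability-weighted mean over all models
beta_A_avg = (model_probs * beta_A).sum()

# Estimate conditional on A actually being in the model
beta_A_cond = beta_A_avg / p_include_A

print(p_include_A, round(beta_A_avg, 4), round(beta_A_cond, 4))
```

Note the two flavours of ‘average’: the first shrinks the estimate towards zero in proportion to the probability that A is absent, while the conditional version averages only over models that contain A.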

I’ve been using these approaches to investigate the potential factors influencing local winter white-tailed deer density in managed forests of Michigan’s Upper Peninsula. One of the most frequently used, and freely available, software packages for Bayesian statistics is WinBUGS. However, because I like to use R I’ve been exploring the packages available in that statistical language environment. Specifically, the BRugs package makes use of many OpenBUGS components (you actually provide R with a script in WinBUGS format to run) and the BMA package provides functionality for model averaging. We’re in the final stages of writing a manuscript incorporating these analyses – once it’s a bit more polished (and submitted) I’ll provide an abstract.

Reading List
Model Inference: Burnham and Anderson 2002, Stephens et al. 2005, Link and Barker 2006, Stephens et al. 2007

Introduction to Bayesian Statistics: Bolker 2007 [webpage with chapter pre-prints and exercises here], McCarthy 2007

Discussion of BMA methods: Hoeting et al. 1999, Adrian Raftery’s Webpage

Examples of BMA application: Wintle et al. 2003, Thomson et al. 2007

Criticisms of Stepwise Regression: James and McCulloch 1990, Whittingham et al. 2006

Cedar Swamps and Deer

Right now I should be back in East Lansing after a week of fieldwork in our Michigan Upper Peninsula (the UP) study area. We’ve been in the UP this last week to finish up on our mesic conifer planting and white-tailed deer density fieldwork that I’ve written about previously. However, an incident with a deer has delayed us (see the bottom of this post) so I’m doing some data entry and writing in Marquette while our Jeep is repaired.

In previous posts about the fieldwork we’ve done in the UP, I have included photos from forest stands containing deciduous hardwood species such as Sugar Maple or American Beech. Generally, it’s understood that white-tailed deer browse juvenile trees in hardwood stands during the daytime in winter, but shelter overnight in nearby lowland conifer stands. One aspect of our project is to identify quantitative relationships for this behaviour, and so we’ve often had to take measurements in the cedar swamps adjacent to northern hardwood stands.

As you can see from the picture above, the density of cedar swamps can make tree measurements a bit tricky. A standard measure of forest stand density (or stocking) is ‘stand basal area’ – a measure of the area occupied by tree stems (i.e. trunks) in a given area. The northern hardwood stands in our study area can have a stand basal area of anywhere between 60 and 100 square feet per acre. Cedar swamps are much more densely populated, with stand basal area values of 280 to 350 square feet per acre. An example of the transition between these stand types is shown in the picture below (click for a larger image).
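For readers unfamiliar with the measure, basal area is straightforward to compute: each tree contributes the cross-sectional area of its stem at breast height, and the per-tree areas are summed and scaled to a per-acre basis. A quick sketch (my own, with DBH in inches and output in square feet per acre):

```python
import math

def stand_basal_area(dbh_inches, plot_acres):
    """Stand basal area (sq ft per acre) from tree diameters at breast height.
    One tree's basal area (sq ft) = pi * (DBH / 2)**2 / 144 for DBH in inches."""
    per_tree = (math.pi * (d / 2) ** 2 / 144 for d in dbh_inches)
    return sum(per_tree) / plot_acres

# A single 12-inch DBH tree on one acre contributes pi * 36 / 144 sq ft
print(round(stand_basal_area([12], 1.0), 3))  # 0.785
```

The dense, small-diameter stems of a cedar swamp add up quickly under this formula, which is how those stands reach basal areas several times those of the hardwoods.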

The high density of the cedar swamps, combined with the continuous cover provided by the evergreen canopy, (generally) makes winter snow depths lower and winter air temperatures higher compared with the deciduous hardwood stands. The soggy conditions underfoot make surveying cedar swamps even trickier – one has to hop from tree-root island to tree-root island over puddles whilst trying not to impale oneself on the lower branches. Even with care, given enough time you’re guaranteed scratches and wet boots.

We’ve completed our fieldwork for now and are just waiting for our Jeep to be fixed after we hit a deer on our last day of work. With so many deer in the area and the high number of miles we drive around our study area, it was only a matter of time before we hit one. We were on a major highway and the deer came out of nowhere. We’ve often spooked deer driving on tracks through the forest – it seems to me that when they’re startled they just bolt in whatever direction they happen to be facing at the time. Even if that means running across the road in front of your vehicle. As you can see below, it left quite a dent in the radiator. But Megan did a good job of keeping us on the road and thankfully the only casualty was the deer.

ESA 2009 Abstract

February 2009 seems to be the month of abstracts. Here’s another we just submitted to the 94th Ecological Society of America Annual Meeting, the theme of which is Ecological Knowledge and a Global Sustainable Society.

Local winter white-tailed deer density: Effects of forest cover pattern, stand structure, and snow in a managed forest landscape
James D. A. Millington, Michael B. Walters, Megan S. Matonis and Jianguo Liu
Michigan State University

White-tailed deer (Odocoileus virginianus) are a ‘keystone herbivore’ with the potential to cause tree regeneration failure and greatly affect vegetation dynamics, stand structure and ecological function of forests across eastern North America. In northern mixed conifer-hardwood forests, local winter-time deer populations are dependent on habitat characterized by patterns of forest cover that provide shelter from snow and cold temperatures (lowland conifer stands) in close proximity to winter food (deciduous hardwood stands). Stand structure may also influence winter spatial deer distribution. Consequently, modification of forest cover patterns and stand structure by timber harvesting may affect local spatial deer distributions, with potential ecological and economic consequences. Here, we ask if forest cover pattern and stand structure, and their interactions with snow depth, can explain winter deer density in the managed forests of the central Upper Peninsula of Michigan, USA. For each local winter deer density estimate (from fecal pellet counts) we calculate stand-level characteristics for surrounding ‘landscapes of influence’ of radius 200 m and 380 m. For these data, and modeled snow depth estimates, we use multivariate techniques to produce predictive models and to identify the most important factors driving local deer densities across our 400,000 ha study area.

Distance to the nearest conifer stand consistently explains the most variance in univariate regression models. Deer densities are highest near lowland conifer stands in areas where the proportion of hardwood forest-cover is high but the mean tree diameter-at-breast-height is low. Multiple regression models including these factors explain up to 38% of variance in deer density and have up to a 68% chance of correctly ranking a site’s deer density (relative to other sites within our study area). We are unable to conclusively show that snow depth has a significant impact on winter deer density, but our data suggest that more detailed investigation into the combined effect of distance to lowland conifer and snow depth may prove fruitful. Our results quantify clear effects of stand structure and forest cover composition on the winter spatial distribution of white-tailed deer. We briefly discuss how these results can be used in an ecological-economic simulation model of a managed forest for tree regeneration risk assessment. Use of these results, and the simulation model, will help identify management practices that can decrease deer impacts and ensure the ecological and economic sustainability of forests in which deer browse is proving problematic for tree regeneration.