Summer 2011 Papers

Since I last posted, THREE of the papers I’ve mentioned here previously have become available online! Here are their details, abstracts are below. Email me if you can’t get hold of them yourself.

Millington, J.D.A., Walters, M.B., Matonis, M.S., Laurent, E.J., Hall, K.R. and Liu, J. (2011) Combined long-term effects of variable tree regeneration and timber management on forest songbirds and timber production Forest Ecology and Management 262 718-729 doi: 10.1016/j.foreco.2011.05.002

Millington, J.D.A. and Perry, G.L.W. (2011) Multi-model inference in biogeography Geography Compass 5(7) 448-530 doi: 10.1111/j.1749-8198.2011.00433.x

Millington, J.D.A., Demeritt, D. and Romero-Calcerrada, R. (2011) Participatory evaluation of agent-based land use models Journal of Land Use Science 6(2-3) 195-210 doi:10.1080/1747423X.2011.558595

Millington, J.D.A. et al. (2011) Combined long-term effects of variable tree regeneration and timber management on forest songbirds and timber production Forest Ecology and Management 262 718-729
The structure of forest stands is an important determinant of habitat use by songbirds, including species of conservation concern. In this paper, we investigate the combined long-term impacts of variable tree regeneration and timber management on stand structure, songbird occupancy probabilities, and timber production in northern hardwood forests. We develop species-specific relationships between bird species occupancy and forest stand structure for canopy-dependent black-throated green warbler (Dendroica virens), eastern wood-pewee (Contopus virens), least flycatcher (Empidonax minimus) and rose-breasted grosbeak (Pheucticus ludovicianus) from field data collected in northern hardwood forests of Michigan’s Upper Peninsula. We integrate these bird-forest structure relationships with a forest simulation model that couples a forest-gap tree regeneration submodel developed from our field data with the US Forest Service Forest Vegetation Simulator (Ontario variant). Our bird occupancy models are better than null models for all species, and indicate species-specific responses to management-related forest structure variables. When simulated over a century, higher overall tree regeneration densities and greater proportions of commercially high value, deer browse-preferred, canopy tree Acer saccharum (sugar maple) than low-value, browse-avoided subcanopy tree Ostrya virginiana (ironwood) ensure conditions allowing larger harvests of merchantable timber and had greater impacts on bird occupancy probability change. Compared to full regeneration, no regeneration over 100 years reduces merchantable timber volumes by up to 25% and drives differences in bird occupancy probability change of up to 30%. We also find that harvest prescriptions can be tailored to affect both timber removal volumes and bird occupancy probability simultaneously, but only when regeneration is adequate. When regeneration is poor (e.g., 25% or less of trees succeed in regenerating), timber harvest prescriptions have a greater relative influence on bird species occupancy probabilities than on the volume of merchantable timber harvested. However, regeneration density and composition, particularly the density of Acer saccharum regenerating, have the greatest long-term effects on canopy bird occupancy probability. Our results imply that forest and wildlife managers need to work together to ensure tree regeneration density and composition are adequate for both timber production and the maintenance of habitat for avian species over the long-term. Where tree regeneration is currently poor (e.g., due to deer herbivory), forest and wildlife managers should pay particularly close attention to the long-term impacts of timber harvest prescriptions on bird species.

Millington, J.D.A. and Perry, G.L.W. (2011) Multi-model inference in biogeography Geography Compass 5(7) 448-530
Multi-model inference (MMI) aims to contribute to the production of scientific knowledge by simultaneously comparing the evidence data provide for multiple hypotheses, each represented as a model. With roots in the method of ‘multiple working hypotheses’, MMI techniques have been advocated as an alternative to null-hypothesis significance testing. In this paper, we review two complementary MMI techniques – model selection and model averaging – and highlight examples of their use by biogeographers. Model selection provides a means to simultaneously compare multiple models to evaluate how well each is supported by data, and potentially to identify the best supported model(s). When model selection indicates no clear ‘best’ model, model averaging is useful to account for parameter uncertainty. Both techniques can be implemented in information-theoretic and Bayesian frameworks and we outline the debate about interpretations of the different approaches. We summarise recommendations for avoiding philosophical and methodological pitfalls, and suggest when each technique is best used. We advocate a pragmatic approach to MMI, one that emphasises the ‘thoughtful, science-based, a priori’ modelling that others have argued is vital to ensure valid scientific inference.

Millington et al. (2011) Participatory evaluation of agent-based land use models Journal of Land Use Science 6(2-3) 195-210
A key issue facing contemporary agent-based land-use models (ABLUMs) is model evaluation. In this article, we outline some of the epistemological problems facing the evaluation of ABLUMs, including the definition of boundaries for modelling open systems. In light of these issues and given the characteristics of ABLUMs, participatory model evaluation by local stakeholders may be a preferable avenue to pursue. We present a case study of participatory model evaluation for an agent-based model designed to examine the impacts of land-use/cover change on wildfire regimes for a region of Spain. Although model output was endorsed by interviewees as credible, several alterations to model structure were suggested. Of broader interest, we found that some interviewees conflated model structure with scenario boundary conditions. If an interactive participatory modelling approach is not possible, an emphasis on ensuring that stakeholders understand the distinction between model structure and scenario boundary conditions will be particularly important.

Multi-Model Inference in Biogeography

Earlier this month I, along with George Perry, finished a review on multi-model inference for the journal Geography Compass. Geography Compass publishes state-of-the-art reviews aimed at students, researchers and non-specialist scholars. The manuscript is currently under review so we’ll have to see what the reviewers think before the paper is published. Once it’s available I’ll re-post here. In the meantime, here’s the abstract to whet your appetite;

Multi-model inference (MMI) aims to produce scientific knowledge by simultaneously comparing the evidence data provide for multiple hypotheses, each represented as a model. Stemming from the method of ‘multiple working hypotheses’, MMI techniques have been advocated as an alternative to null hypothesis significance testing. These techniques will likely be particularly useful in research fields such as biogeography where formal experimentation is difficult and data are often influenced by uncontrolled factors. In this paper we review two complementary MMI techniques – model selection and model averaging – and highlight examples of their use in the biogeography literature. Both techniques can be implemented in a Bayesian framework and we outline the debate about different interpretations of probability. We summarise recommendations for avoiding philosophical and methodological pitfalls, and suggest circumstances in which each technique will likely be most useful in. We finish by advocating a pragmatic approach to MMI, one that emphasises the ‘thoughtful, science-based, a priori’ modelling others have argued is vital to ensure valid scientific inference.

Bayesian Modelling in Biogeography

Recently I was asked to write a review of the current state-of-the-art of model selection and Bayesian approaches to modelling in biogeography for the Geography Compass journal. The intended audience for the paper will be interested but non-expert, and the paper will “…summarize important research developments in a scholarly way but for a non-specialist audience”. With this in mind, the structure I expect I will aim for will look something like this:

i) Introduction to the general issue of model inference (i.e., “What is the best model to use?”). This section will likely discuss the modelling philosophy espoused by Burnham and Anderson and also highlight some of the criticisms of null-hypothesis testing using p-values. Then I might lead into possible alternatives (to standard p-value testing) such as:

ii) AIC approaches (to find the ‘best approximating model’)

iii) Bayesian approaches (including Bayesian Model Averaging, as I’ve discussed on this blog previously)

iv) Some applied examples (including my deer density modelling for example)

vi) A brief summary

I also expect I will try to offer some practical hint and tips, possibly using boxes with example R code (maybe for the examples in iv). Other published resources I’ll draw on will likely include the excellent books by Ben Bolker and Michael McCarthy. As things progress I may post more, and I’ll be sure to post again when the paper is available to read in full.

Bird Occupancy Modelling

Birds have been given short shrift in my posts blog posts about the Michigan UP ecological-economic modelling project. It’s not that we have forgotten about them, it’s just that before we got to incoporating them into our modelling there were other things to deal with first. Now that we’ve made progress on modelling deer distribution it’s time to turn our attention to how we can represent the potential impacts of forest management on bird habitat so that we might better understand the tradeoffs that will need to be negotiated to achieve both economic and ecological sustainability.

Ovenbird (Seiurus aurocapillus)
Ovenbird (Seiurus aurocapillus)

One of the things we want to do is link our bird-vegetation modelling with Laila Racevskis‘ assessment of the economic value of bird species she did during her PhD research. Laila assessed local residents’ willingess-to-pay for ensuring the conservation of several bird species of concern in our study area. If we can use our model to examine the effects of different timber management plans (each yielding different timber volumes) on the number of bird species present in an area we can use Laila’s data to examine the economic tradeoffs between different management approaches. The first thing we need to do to achieve this is be able to estimate how many bird species would be present in a given forest stand.

Right now the plan is to estimate the presence of songbird species of concern in forest stands by using the data Ed Laurent collected during his PhD research at MSU. To this end I’ve been doing some reading on the latest occupancy modelling approaches and reviewing the literature on its application to birds in managed forests. Probably the most popular current approach was developed recently by Darryl Mackenzie and colleagues – it allows the the estimation of whether a site is occupied by a given species or not when we know that our detection is imperfect (i.e. when we know we have false negative observations in our bird presence data). The publication of some nice overviews of this approach (e.g. Mackenzie 2006) plus the development of software to perform the analyses are likely to be at the root of this popularity.

The basic idea of the approach is that if we are able to make multiple observations at a site (and if we assume that bird populations and habitat do not change between these observations) we can use the probability of each bird observation history at a site across all the sites to form a model likelihood. This likelihood can then be used to estimate the parameters using any likelihood-based estimation procedure. Covariates can be used to model both the probability of observation and detection (i.e. we can account for factors that may have hindered bird observation such a wind strength or the time of day). I won’t go into further detail here because there’s an excellent online book that will lead you through the modelling process, and you can download the software and try it yourself.

Two recent papers have used this approach to investigate bird species presence given different forest conditions. DeWan et al. 2009 used Mackenzie’s occupancy modelling approach to examine impacts of urbanization on forest birds in New York State (they do a good job of explaining how they apply Mackenzie’s approach to their data and study area). DeWan considered landscape variables such as perimeter-area ratios of habitat patches and proximity to urban area to create occupancy models for 9 birds species at ~100 sites. They found that accounting for imperfect bird detection was important and that habitat patch “perimeter-area ratio had the most consistent influence on both detection probability and occupancy” (p989).

In a slightly different approach Smith et al. 2008 estimated site occupancy of the black-throated blue warbler (Dendroica caerulescens) and ovenbird (Seiurus aurocapillus) in 20 northern hardwood-conifer forest stands in Vermont. At each bird observation site they had also collected stand structure variables including basal area, understory density and tree diameters (in contrast to DeWan et al who only considered landscape-level variables). Smith et al. write their results “demonstrate that stand-level forest structure can be used to predict the occurrence of forest songbirds in northern hardwood-conifer forests” (p43) and “suggest that the role of stand-level vegetation may have been underestimated in the past” (p36).

Our approach will take the best aspects from both these studies; the large sample size of DeWan et al. with the consideration of stand-level variables like Smith et al. More on this again soon I expect.

Synthetic Trees

When testing and using simulation models we often need to use synthetic data. This might be because we want to examine the effects of different initial conditions on our model output, or simply because we have insufficient data to examine a system at the scale we would like to. The ecological-economic modelling project I’m currently working on is in both these situations, and over the last week or two I’ve been working on generating synthetic tree-level data so that we can initialize our model of forest stand change for testing and scenario development. Here’s a brief overview of how I’ve approached the task of producing a ‘treelist generator’ from the empirical data we have for over 60,000 trees in Northern Hardwood stands across Upper Michigan.

One of the key measures we can use to characterise forest stands is basal area (BA). We can assume that for each stand we generate a treelist for there is some ‘target BA’ that we are aiming to produce. As well as hitting a target BA, we also need to make sure that the tree diameter-at-breast-height (DBH) size-class distribution and species composition are representative of the stands in our empirical data. Therefore, our the first step is to look at the diameter size-class distribution of the stands we want to emulate. We can do this by plotting histograms of the frequency of trees of different diameter for each stand. In the empirical data we see two characteristic distributions (Fig 1).

Fig 1. Example stand tree count histograms

The distribution on the left has very many more trees in the smaller size classes as a result of stands self-thinning (as larger trees compete for finite resources). The second distribution, in which the smallest size classes are under-represented and larger size classes have relatively more trees, does not fit so well with the theoretical, self-thinning DBH size-class distribution. Stands with a distribution like this have probably been influenced by other factors (for example deer browse on the smaller trees). However, it turns out that both these DBH size-class distributions can be pretty well described by the gamma probability distribution (Fig 2).

Fig 2. Example stand gamma probability distributions for data in Fig 1

The gamma distribution has two parameters, a shape parameter we will call alpha and a scale parameter we will call beta. Interestingly, in the stands I examined (dominated by Sugar Maple and Ironwood) there are two different linear relationships between the parameters. The relationship between alpha and beta for 80% of stands represents the ‘self-thinning’ distribution, and the other 20% represent distributions in which small DBH classes are under-represented. We use these relationships – along with the fact that the range of values of alpha for all stands has a log-normal distribution – to generate characteristic DBH size-class distributions;

  1. sample a value of alpha from a normal distribution (subsequently reconvert using 10alpha),
  2. for the two different relationships use Bayesian linear regression to find mean and 95% credible intervals for the slope and intercept of a regression line between alpha and beta,
  3. use the value of alpha with the regression parameters to produce a value of beta.

So now for each stand we have a target basal area, and parameters for the DBH size class distribution. The next step is to add trees to the stand with diameters specified by the probability distribution. Each time we add a tree, basal area is added to the stand. The basal area for a tree is calculated by:

TreeBA = TreeDensity * (0.005454* diameter2)

[Tree density can be calculated for each tree because we know the sampling strategy used to collect empirical data on our timber cruise, whether on a fixed area plot, n-tree or with a prism].

Once we get within 1% of our target BA we stop adding trees to the stand [we’ll satisfy ourselves with a 1% accuracy because the size of tree that we allocate each time is sampled from a probability distribution and so we it is unlikely we will be able to hit our target exactly]. The trees in our (synthetic) stand should now (theoretically) have the appropriate DBH size-class distribution and basal area.

With a number of trees in now in our synthetic stand, each with a DBH value, the next step is to assign each tree to a species so that the stand has a representative species composition. For now, the two species we are primarily interested in are Sugar Maple and Ironwood. However, we will also allow trees in our stands to be Red Maple, White Ash, Black Cherry or ‘other’ (these are the next most common species in stands dominated by Sugar Maple and Ironwood). First we estimate the proportion of the trees in each species. In stands with Sugar Maple and Ironwood deer selectively browse Sugar Maple, allowing Ironwood a competitive advantage. Correspondingly, in the empirical data we observe a strong linear and inverse relationship between the abundance of Sugar Maple and Ironwood (Fig 3).

Fig 3. Relationship between stand Sugar Maple and Ironwood abundance

To assign species proportions we first estimate the proportion of Sugar Maple from the empirical data. Next, using the strong inverse relationship above we estimate the corresponding proportion of Ironwood (sampled using normal distribution with mean and standard deviation from from Bayesian linear regression). The remaining species proportions are assigned according to the frequency of their presence in the empirical data.

Now we use these proportions to assign a species to individual trees. Because physiology varies between species, the probability that a tree is of a given size also varies between species. For example, Ironwood very seldom reach DBH greater than 25 cm and the vast majority (almost 99% in our data) are smaller than 7.6 cm (3 inches) in diameter. Consequently, first we assign the appropriate number Ironwood to trees according to their empirical size-class distribution, before then assigning all other trees to the remaining species (using a uniform distribution).

The final step in generating our treelist is to assign each tree a height and a canopy ratio. We do this using empirical relationships between diameter and height for each species that are available in the literature (e.g. Pacala et al. 1994). And we’re done!

In the model I’m developing, these stands can be assigned a spatial location either using a pre-existing empirical map or using a synthetic land cover map with known characteristics (generated for example using the modified random clusters method, as the SIMMAP 2.0 software does). In either case we can now run the model multiple times to investigate the dynamics and consequences of different initial conditions. More on that in the future.

Ensemble Modelling

Uncertainty is an inherent part of modelling. Models by definition are simplified representations of reality – as such they are all wrong. Being wrong doesn’t necessarily make them useless, but it helps to have some idea about how wrong they are for them to be most useful. That’s why we should always try to provide some means to assess the uncertainty in our models output. Producing multiple realisations of a model and its results – a model ensemble – is one way to do this.

Depending on our model we can use ensemble methods to examine four different sources of modelling uncertainty:

a) Model Structure: how appropriate are our model variables, relationships and rules?

b) Model Parameters: what numerical values appropriately represent the strengths of relationships between variables?

c) Initial Conditions: how does our uncertainty in the initial state of the system we are modelling propagate through the model to the output?

d) Boundary Conditions: how do alternative (uncertain) scenarios of events that perturb our model influence output?

In their recent paper, Arujo and New show how ensemble modelling might be used to assess the impacts of these different kinds of uncertainty on model output. They advocate the use of multiple models within an ensemble forecasting framework and argue that more robust forecasts can be achieved via the appropriate analysis of ensemble forecasts. This is important because projections between different models can be so variable as to compromise their usefulness for guiding policy decisions. For example, upon examining nine bioclimatic models for four South African plant species, Pearson et al. found that for different scenarios of future climate predicted changes in species distribution varied from 92% loss to 322% gain between the different models! It’s uncertainty like this that stifles debate about the presence and impacts of anthropogenic climate change. Araujo and New go on to discuss the uses and limitations of ensemble modelling for supporting policy decisions in biodiversity conservation.

In a previous post I discussed how Bayesian methods can be used to examine uncertainty in model structure. I’ve been using Bayesian Model Averaging to help me identify which are the most appropriate predictors of local winter white-tailed deer density for our UP Forest project. Using the relationships developed via that modelling process I’ve produced spatial estimates of deer density in northern hardwood stands for a section of our study area (example below).

Hopefully forest managers will find this sort of modelling useful for their planning (I’ll ask them sometime). However, I think this sort of product will be even more useful if I can provide the managers with a spatial estimate of uncertainty in the deer density estimates. This is important not only to emphasise that there is uncertainty in the model results generally, but also to highlight where (in the landscape) the model is more or less likely to be correct. Here’s the uncertainty map corresponding with the deer density estimate map above.

In this map the lighter colours (yellows and greens) indicate less certainty in the deer density estimate at that point. If managers were to take action in this landscape to reduce deer densities they could use a combination of the maps to find locations where deer densities are estimated to be high with low uncertainty.

To be more specific, the uncertainty map above is the standard deviation of 1,000 deer density estimate maps (correspondingly the deer density map is the mean of these 1,000 models). For each of the 1,000 deer density estimates I used slightly different model parameter values, each chosen with a certain probability. These 1,000 realisations are my model ensemble. The probability a value would be chosen for use as a parameter in any of the 1,000 models was specified by a (normal) probability distribution which came from the mean and standard deviation provided by the original Bayesian regression model. To produce the 1,000 models and sample their parameter values from a probability distribution I wrote my own program which made use of the standalone R math libraries built by Chris Jennings.

Appropriately representing and communicating uncertainty in model output is vital if models and modelling is to be useful for non-modellers (e.g., policy-makers, natural resource managers, etc.). Spatial ensemble modelling helps us to do this by identifying locations where we are more or less confident about our model output.

Bayesian Multimodel Inference

The abstract we submitted to the ESA Meeting was accepted a while back. Since we submitted it, Megan and I have been back in the field for some additional data collection and I’ve been doing some new analyses. Some of these new analyses are the result of my attendance at the Bayesian statistics workshop at US-IALE in Snowbird. Since then I’ve been learning more by picking the brain of a former advisor, George Perry, and doing a fair bit of reading (reading list with links at bottom). And of course, using my own data has helped a lot.

One of the main questions I’m facing, as many ecologists often do, is “which variables should be in my regression model?” This question lies at the core of model inference and assumes that it is appropriate to infer ecological process from data by searching for the single model that represents reality most accurately. However, as Link and Barker put it:

“It would be nice if there were no uncertainty about models. In such an ideal world, a single model would be available; the data analyst would be in the enviable position of having only to choose the best method for fitting model parameters based on the available data. The choice would be completely determined by the statistician’s theory, a theory which regards the model as exact depiction of the process that generated the data.

“It is clearly wrong to use the data to choose a model and then to conduct subsequent inference as though the selected model were chosen a priori: to do so is to fail to acknowledge the uncertainties present in the model selection process, and to incestuously use the data for two purposes.”

Thus, it usually more appropriate to undertake a process of multi-model inference and search for the ‘best’ possible model (given current data) rather than a single ‘true’ model. I’ve been looking into the use of Bayesian Model Averaging to address this issue. Bayesian approaches take prior knowledge (i.e., a probability distribution) and data about a system and combine them with a model to produce posterior knowledge (i.e., another probability distribution). This approach differs from the frequentist approach to statistics which calculates probabilities based on the idea of a (hypothetical) long-run of outcomes from a sequence of repeated experiments.

For example, estimating the parameters of a linear regression model using a Bayesian approach differs from a frequentist ordinary least squares (OLS) approach in two ways:

i) a Bayesian approach considers the parameter to be a random variable that might take a range of values each with a given probability, rather than being fixed with unknown probability,

ii) a Bayesian approach conditions the parameter estimate probability on the sample data at hand and not as the result of a set of multiple hypothetical independent samples (as the OLS approach does).

If there is little prior information available about the phenomena being modelled, ‘uninformative priors’ (e.g., a normal distribution with a relatively large variance about a mean of zero) can be used. In this case, the parameter estimates produced by the Bayesian linear regression will be very similar to those produced by regular OLS regression. The difference is in the error estimates and what they represent; a 95% confidence interval produced by a Bayesian analysis specifies that there is a 95% chance that the true value is within that interval given the data analyzed, whereas a 95% confidence interval from a frequentist (OLS) approach implies that if (hypothetical) data were sampled a large number of times, the parameter estimate for those samples would lie within that interval 95% of those times.

There has been debate recently in ecological circles about the merits of Bayesian versus frequentist approaches. Whilst some have strongly advocated the use of Bayesian approaches (e.g., McCarthy 2007), others have suggested a more pluralistic approach (e.g., Stephens et al. 2005). One of the main concerns with the approach of frequentist statistics is related to a broader criticism of the abuse and misuse of the P-value. For example, in linear regression models P-values are often used to examine the hypothesis that the slope of a regression line is not equal to zero (by rejecting the null hypothesis that is equal to zero). Because the slope of a regression line on a two-dimensional plot indicates the rate of change of one measure with respect to the other, a non-zero slope indicates that as one measure changes, so does the other. Consequently it is often inferred that a processes represented by one measure had an effect, or caused, the change in the other). However, as Ben Bolker points out in his excellent book:

“…common sense tells us that the null hypothesis must be false, because [the slope] can’t be exactly zero [due to the inherent variation and error in our data] — which makes the p value into a statement about whether we have enough data to detect a non-zero slope, rather than about whether the slope is actually different from zero.”

This is not to say there’s isn’t a place for null hypothesis testing using P-values in the frequentist approach. As Stephens et al. argue, “marginalizing the use of null-hypothesis testing, ecologists risk rejecting a powerful, informative and well-established analytical tool.” To the pragmatist, using whatever (statistical) tool available seems eminently more sensible than placing all one’s eggs in one basket. The important point is to try to make sure that the hypotheses one tests with P-values are ecologically meaningful.

Back to Bayesian Model Averaging (BMA). BMA provides a method to account for uncertainty in model structure by calculating (approximate) posterior probabilities for each possible model (i.e., combination of variables) that could be constructed from a set of independent variables (see Adrian Raftery’s webpage for details and examples of BMA implementation). The ‘model set’ is all possible combinations of variables (equal to 2n models, where n is the number of variables in the set). The important thing to remember with these probabilities is that it is the probability that the model is the best one from the model set considered – the probability of other models with variables not measured or included in the model set obviously can’t be calculated.

The advantage over other model selection procedures like stepwise regression is that the output provides a measure of the performance of many models, rather than simply providing the single ‘best’ model. For example, here’s a figure I derived from the output BMA provides:

The figure shows BMA results for the five models with highest posterior probability of being the best candidate model from a hypothetical model set. The probability that each model is the best in the model set is shown at top for each model – Model 1 has almost 23% chance that it is the best model given the data available. Dark blocks indicate the corresponding variable (row) is included in a given model – so Model 1 contains variables A and B, whereas Model 2 contains Variable A only. Posterior probabilities of variables being included in the best model (in the model set) are shown to the right of the blocks – as we might expect given that Variable A is present in the five most probable models it has the highest chance of being included in the best model. Click for a larger image.

BMA also provides a posterior probability for each variable being included in the best candidate model. One of the cool things about the variable posterior probability is that it can be used to produce a weighted mean value from all the models for each variable parameter estimate, each with their own Bayesian confidence interval. The weight for each parameter estimate is the probability that variable is present in the ‘best’ model. Thus, the ‘average’ model accounts for uncertainty in variable selection in the best candidate model in the individual parameter estimates.

I’ve been using these approaches to investigate the potential factors influencing local winter white-tailed deer density in in managed forests of Michigan’s Upper Peninsula. One of the most frequently used, and freely available, software packages for Bayesian statistics is WinBUGS. However, because I like to use R I’ve been exploring the packages available in that statistical language environment. Specifically, the BRugs package makes use of many OpenBUGS components (you actually provide R with a script in WinBUGS format to run) and the BMA package provides functionality for model averaging. We’re in the final stages of writing a manuscript incorporating these analyses – once it’s a bit more polished (and submitted) I’ll provide an abstract.

Reading List
Model Inference: Burnham and Anderson 2002, Stephens et al. 2005, Link and Barker 2006, Stephens et al. 2007

Introduction to Bayesian Statistics: Bolker 2007 [webpage with chapter pre-prints and exercises here], McCarthy 2007

Discussion of BMA methods: Hoeting et al. 1999, Adrian Raftery’s Webpage

Examples of BMA application: Wintle et al. 2003, Thomson et al. 2007

Criticisms of Stepwise Regression: James and McCulloch 1990, Whittingham et al. 2006