Saturday, 10 October 2015

Rachel Fewster - Modern statistical methods for Capture-Recapture

The reading group will meet this Friday (16th October) at 1pm. We're going to read another paper on capture-recapture using modern technology, this time with a more statistical slant. The paper is by Rachel Fewster and colleagues at the University of Auckland. Rachel focuses on how new technologies, such as DNA barcoding and acoustic profiles, affect the assumptions of the statistical models used in capture-recapture, and on developing methods that make better use of data collected in this way.

We will be reading a paper submitted to Statistical Science titled Trace-Contrast Models for Capture-Recapture without Capture Histories.


  1. I thought this paper was cool but some others struggled with it - I think that is because it assumes knowledge of inhomogeneous Poisson processes. So if you want to understand the maths of it (e.g. where the likelihoods come from) it might help to start with a gentle intro to those (early stuff in Diggle's point process text, or a recent review in MEE maybe...).

    So there are three parameters to estimate - mu (number of families i.e. apple trees or rats), nu (average number of siblings i.e. apples or video captures) and sigma (how spread out siblings are i.e. size of apple tree or how long the rat typically hangs out near the camera trap). What we really care about is mu. Cool how we can estimate all these things from distances between points. I've been told to tweet a photo of which bits of Fig 1 right inform about which parameters. That idea was Tanaka's; this paper extends it to a capture-recapture-type setting with marks (e.g. tags on rats and whether or not they took the bait).

  2. Looking at page 15 now and seeing that the probability model for the marks (whether each of a pair of sightings "takes the bait", and whether they share a tag) is not just a function of a bunch of new parameters to do with prob of being tagged etc, it is also a function of b(r) (probability they are siblings, "bro") where b(r) is a function of mu and nu. So in short, the marks are giving us some information about mu/nu. (Maybe also about sigma via lambda0?) This bit looked like it took a bit of thinking to derive though, and is quite model-specific: it would change when you go to a different problem, or even the same problem with different marks.

    We started thinking about how this relates to the wombat problem. So one way to go would be to treat the genetic data (similarities?) as r, i.e. genetics takes the role of spatial information, and the burrow you sample at becomes auxiliary, and you would need a model for how this auxiliary information informs about your parameters. Using the apple tree analogy, we ended up thinking of this as a bit like taking a box of apples from your local grocer, doing genetics on the apples, and trying to infer from this the number of apple trees in the orchard!?

  3. Now we are talking about metabarcoding and whether this stuff applies there. In metabarcoding a cluster analysis is done and a decision is made about which observations are sufficiently different to be classified as different taxa (operational taxonomic units, OTUs). Or maybe even to infer different individuals (?). So could these Palm process ideas be used in that context to identify individuals or even species maybe, with mu becoming the estimator of mean species richness?

    Another, unrelated point - you can use this method in principle to get good estimates of population size in your field (number of apple trees in a field, number of rats passing the trap in the time period), if the data have the information to give you good estimates. When you go somewhere new (new field, rats in new time period) the estimate may not be so good, because the population size there is itself a Poisson variable, so we have Poisson variation to worry about on top of the estimation error.

    We also discussed assumptions. A potential limitation to application of the technique is that it makes some restrictive ones: we assume apple trees are uniformly distributed in the field, number of apples is Poisson, and distribution of apples around the tree is spherical and normal. But estimates from the model are probably robust to violations of some of these assumptions (the normality bit might be the important one?).
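
The assumptions listed above amount to a simple generative model, so here is a minimal simulation sketch of it in Python (NumPy), in case it helps anyone get a feel for the roles of mu, nu and sigma. This is not code from the paper: the function name `simulate_thomas`, the rectangular field, and the reading of mu as the expected number of trees per unit area are all assumptions made for illustration, and there is no edge handling.

```python
import numpy as np

def simulate_thomas(mu, nu, sigma, rng, width=1.0, height=1.0):
    """Simulate trees and apples on a width x height field (illustrative only).

    mu    : expected number of trees per unit area (assumed interpretation)
    nu    : expected number of apples per tree
    sigma : standard deviation of the apple scatter around each tree
    Returns (trees, apples, tree_index), where tree_index[i] is the row of
    `trees` that apple i belongs to.
    """
    # Number of trees is Poisson; locations are uniform over the field.
    n_trees = rng.poisson(mu * width * height)
    trees = rng.uniform([0.0, 0.0], [width, height], size=(n_trees, 2))

    # Each tree gets a Poisson(nu) number of apples.
    n_apples = rng.poisson(nu, size=n_trees)
    tree_index = np.repeat(np.arange(n_trees), n_apples)

    # Apples are scattered around their tree with a circular normal
    # distribution; apples may land outside the field (no edge correction).
    apples = trees[tree_index] + rng.normal(0.0, sigma, size=(tree_index.size, 2))
    return trees, apples, tree_index

# Same parameters, several simulated "fields": the number of trees varies
# from field to field because it is a Poisson draw.
rng = np.random.default_rng(1)
tree_counts = [len(simulate_thomas(50, 4, 0.02, rng)[0]) for _ in range(5)]
print(tree_counts)
```

In the real problem only the apple locations would be observed, not `trees` or `tree_index`; the point of the method is to recover the parameters from the distances between apples. The loop at the end is meant to illustrate the earlier point about Poisson variation when moving to a new field.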