The reading group will meet this Friday (30th October) at 1pm. We're going to read another paper on analyzing data along stream networks, this time from an ecological perspective, by Bruce Chessman. Bruce is interested in conservation of freshwater species and human impacts on fresh waters.
We will be reading a paper from Journal of Applied Ecology titled "Do protected areas benefit freshwater species? A broad-scale assessment for fish in Australia’s Murray–Darling Basin".
The first question was "what is a Poisson test"! This is a term the average statistician doesn't use, but from memory (going back to a third year ecology assignment) I think it is where you have a bunch of counts and you test for a difference in means across counts using a Pearson stat (sum (O-E)^2/E), which I think is called Poisson because this can be understood as saying the variance of a count is its mean, E. This depends critically on the assumption of independence of each unit being counted (in this case, I think species presence/absence in protected/unprotected areas).
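To make the Pearson statistic above concrete, here is a minimal sketch in Python. The counts are made up for illustration, and taking the expected count to be the common mean is an assumption about how such a test would be set up, not something taken from the paper.

```python
def pearson_statistic(observed, expected):
    """Return sum((O - E)^2 / E) over paired observed and expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, one per group being compared (not data from the paper).
observed = [12, 8, 15, 5]

# Under "all groups share one mean", every expected count is the overall mean.
mean_count = sum(observed) / len(observed)
expected = [mean_count] * len(observed)

stat = pearson_statistic(observed, expected)
df = len(observed) - 1  # degrees of freedom for the chi-squared reference

print(stat, df)
```

Under independence and a Poisson assumption the statistic would be compared against a chi-squared distribution with `df` degrees of freedom; if the counted units are not independent, that reference distribution no longer applies.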
We also discussed the propensity scoring thing - it can more-or-less be described as trying to impose some balance in a highly unbalanced setting by kicking out observations to get towards something more balanced. By "balanced" we mean trying to get protected and unprotected sites covering a similar range of environments, by matching pairs of sites with similar environments and kicking out any site without a matching pair. But most sites didn't have a match, so this lost a lot of information. This seemed like a key idea of the paper, and a fairly original one (? in an applied ecology context) - how to compare protected and unprotected areas while controlling for the substantial differences in environments.
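The matching idea can be sketched roughly as below. This is a greedy one-to-one nearest-neighbour match with a caliper, which is one common way to do it and not necessarily what the paper did. The scores, the caliper value and the function name `match_pairs` are all hypothetical; in practice the scores would come from a model of protection status as a function of the environmental variables.

```python
def match_pairs(protected, unprotected, caliper=0.05):
    """Greedily pair each protected site with the closest unused unprotected site.

    Inputs are lists of propensity scores. A pair is kept only if the two
    scores differ by at most `caliper`; unmatched sites are dropped.
    Returns a list of (protected_index, unprotected_index) tuples.
    """
    available = set(range(len(unprotected)))
    pairs = []
    for i, score in enumerate(protected):
        if not available:
            break
        # Closest remaining unprotected site; ties broken by lowest index.
        j = min(available, key=lambda k: (abs(unprotected[k] - score), k))
        if abs(unprotected[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical propensity scores (estimated probability of being protected).
protected_scores = [0.80, 0.55, 0.30]
unprotected_scores = [0.10, 0.12, 0.28, 0.52, 0.20, 0.05]

pairs = match_pairs(protected_scores, unprotected_scores)
n_dropped = (len(protected_scores) + len(unprotected_scores)) - 2 * len(pairs)

print(pairs, n_dropped)
```

In this toy example only two pairs survive and five of the nine sites are thrown away, which is the same loss of information noted above.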
One question was why not include the environmental variables as covariates in the analysis - probably because when the environments differ so much, results will be sensitive to misspecification of the model for the response (abundance, richness) as a function of covariates. But after matching you could still include these covariates, since matching should have largely resolved this.
Is protection a binary variable or are there different levels of protection that could be included in the model? Should time be in the model, if different samples were taken at quite different times of year?
Abundances are often highly overdispersed - that might have been one of the problems here: with such a variable response, is a big sample size needed? Is there any water quality data to feed into the model?
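A quick way to see what "overdispersed" means is to compare the sample variance of the counts to their mean, since a Poisson count would have the two roughly equal. The sketch below uses made-up abundances, not data from the paper, and the helper name `dispersion_ratio` is just for illustration.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return sample variance divided by mean; about 1 for Poisson-like counts."""
    return variance(counts) / mean(counts)


# Hypothetical abundances at five sites: mostly small, with one large catch.
abundances = [0, 0, 1, 3, 16]

ratio = dispersion_ratio(abundances)
print(ratio)
```

A ratio well above 1, as here, suggests much more variation than a Poisson model allows. This ratio is the Pearson stat against the common mean divided by n-1, so the two ideas are closely linked.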
How could this connect with the stream network spatial stats paper last week - would it help to get a better answer if we account for (stream-based) spatial correlation?