Monday, November 15, 2010

Statistical Models of Bird Identification

A recent post on BIRDWG01 particularly piqued my interest because it ran counter to my views on the statistical nature of bird identification. I think likelihood is both implicitly used and formally overlooked. In this case, an Aythya (diving) duck of unusual appearance was seen in the Eastern USA and proposed to be a Pochard X Scaup, as opposed to Canvasback X Greater Scaup, which is what the final ID appeared to settle on. (Scaup in Great Britain is the Greater Scaup in the U.S.A.)

All duck hybrids are rare to very rare, but many of them certainly occur (an even more recent WG01 thread shows a nice selection of Goldeneye hybrids). Well, I guess that Mallard hybrids with American Black Duck, Mottled Duck and Mexican Duck are more frequent than "very rare".

For any bird ID the likelihood that it's correct is basically the ratio of the probability that the species displays those characteristics to the probability that some or all other species display them, weighted by the chance that you'd see the species in that spot. Much of bird ID stops before the comma in that last sentence, but that's naive.

So for this duck let's arbitrarily say that the chance of seeing a hybrid is 1/10,000. If you see a Canvasback then there's a 1/10,000 chance of it really being a hybrid, ignoring for a moment the evidence for hybrid traits (i.e. an "unusual Canvasback" based on appearance). Let's say this 1/10,000 holds true for the similar Aythyas on the UK side of the Atlantic. The fundamental issue is that the chances of seeing a Pochard or Tufted Duck or Scaup on the East Coast of the USA are very low. Pochard is essentially unknown here, although some stray to the West Coast from Asia; Tufted Duck is very rare; and Greater Scaup is impossible to distinguish in the field from nominate Scaup from Europe. It's safe to say that very, very few diving ducks make that transatlantic voyage. But let's say that there's a 1/20,000 chance of the Aythya you come across in some pond in NJ being one of Pochard/Tufted/"European" Scaup. This means that the chance that you find a European Pochard X Scaup hybrid is (1/20,000) × (1/10,000), or 1/200,000,000.

One in 200 million. The odds of winning the Powerball lottery grand prize from one ticket are pretty much the same. I do not recommend this as a retirement strategy.

We can haggle over any of the numbers - actually I think that the real odds are more remote than these. Simply plug your own estimates in and follow the bouncing ball.
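For anyone who wants to plug in their own estimates, here is a minimal Python sketch of that multiplication. The function and variable names are my own inventions for illustration, and the sketch assumes that being a transatlantic vagrant and being a hybrid are independent, which is the same simplification the arithmetic above makes.

```python
from fractions import Fraction


def joint_probability(p_vagrant: Fraction, p_hybrid: Fraction) -> Fraction:
    """Chance that a bird is both a vagrant and a hybrid.

    Assumes the two events are independent, so the chances simply multiply.
    """
    return p_vagrant * p_hybrid


# Arbitrary estimates from the post; swap in your own.
p_hybrid = Fraction(1, 10_000)   # chance that a given Aythya is a hybrid
p_vagrant = Fraction(1, 20_000)  # chance that an East Coast Aythya is of European origin

p_european_hybrid = joint_probability(p_vagrant, p_hybrid)
print(p_european_hybrid)  # 1/200000000
```

Using exact fractions rather than floats keeps the "one in N" form readable when you change the inputs.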

Now, the chance that you'd stumble into a hybrid Aythya from the USA side of the Atlantic is only 1/10,000, so the relative chance that any given hybrid is European is shorter odds, but still a mere 1/20,000. This means that for any bird that you're sure is one of those rare Aythya hybrids, there's a 99.995% chance it's going to be of USA origin.
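As a rough check on that percentage, the sketch below compares the two hybrid chances directly. The helper name is hypothetical, and it assumes that USA-origin and European-origin hybrids are the only two possibilities for a bird already known to be a hybrid.

```python
from fractions import Fraction


def share_of_local_origin(p_local: Fraction, p_vagrant: Fraction) -> Fraction:
    """Fraction of hybrids expected to be of local origin.

    Assumes local and vagrant hybrids are the only two candidates.
    """
    return p_local / (p_local + p_vagrant)


p_usa_hybrid = Fraction(1, 10_000)
p_european_hybrid = Fraction(1, 200_000_000)

share = share_of_local_origin(p_usa_hybrid, p_european_hybrid)
print(f"{float(share):.3%}")  # 99.995%
```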

For most of us, ID is done with far less certainty - we see an American Robin fly by and call it that because its shape best fits American Robin and the color we can see best fits American Robin, so the overwhelming probability is that it's an American Robin and not, say, an atypical-looking Varied Thrush or a Redwing (Turdus iliacus). When you're working with common vs rare species, or even rare vs very rare species, that occurrence ratio of 1/20,000 is quite a strong selector between different IDs. You would have to be very, very certain indeed that the fly-by bird is NOT an American Robin before Varied Thrush even becomes a reasonable probability.

So it is with the hybrid Aythya. For any possible hybrid, there's a range of appearance that is "typical" of the hybrids. Hybrids are notoriously variable so the range of appearance is often broader than for a pure species (think Golden-winged Warbler and Blue-winged Warbler vs the common hybrids of Lawrence's and Brewster's Warbler, both of which show quite a range of variability). Even if a hybrid were judged to be 80% likely to be Pochard X Scaup and 20% Canvasback X Greater Scaup by appearance, when you add the population weighting in, the 4 to 1 appearance odds are swamped by the 20,000 to 1 population odds, and it becomes about 99.98% Canvasback X Greater Scaup by likelihood. Or put another way: you would have to be absolutely certain it could only be Pochard X Scaup based on the bird's appearance before it became anything close to even odds on the overall population-weighted chances you'd got that ID right.
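The population weighting described above can be sketched in a few lines of Python. This is only an illustration under the post's arbitrary numbers: the `posterior` function is a hypothetical helper of mine, the appearance scores are treated as likelihoods, and the population weights are the relative 1 to 20,000 odds of a European vs a USA-origin hybrid.

```python
from fractions import Fraction


def posterior(likelihoods: dict, priors: dict) -> dict:
    """Weight each candidate's appearance score by its population prior,
    then normalize so the results sum to 1."""
    weighted = {name: likelihoods[name] * priors[name] for name in likelihoods}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


# How well the bird's appearance fits each candidate (the post's 80/20 split).
appearance = {
    "Pochard X Scaup": Fraction(8, 10),
    "Canvasback X Greater Scaup": Fraction(2, 10),
}

# Relative population weights: one European hybrid per 20,000 USA-origin hybrids.
population = {
    "Pochard X Scaup": Fraction(1, 20_000),
    "Canvasback X Greater Scaup": Fraction(1),
}

result = posterior(appearance, population)
for name, p in result.items():
    print(f"{name}: {float(p):.2%}")
```

Try pushing the appearance score for Pochard X Scaup towards 100% to see how lopsided it has to get before the two candidates approach even odds.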

That is, you should use population to weight your idea of what the bird is. Failure to do that is a failure to take proper account of birds of a common species showing an atypical appearance, which might be a far more likely explanation than a much rarer bird.

This subject is not new to WG01 - a more formal discussion of this using Bayes' theorem was bounced around in October 2007 regarding the identity of a Catharus thrush, although the post I linked to uses the dangerous territory of race as its example rather than thrush ID points.

However, I think this aspect of bird ID is still under-appreciated, and it is quite an important one when attempting to put labels on strange-looking birds.
