I know, I know, I hear you now: “go find your own favorite statistician!” The thing is, Nate Silver is not only brilliant with numbers, but he also has thoughtful, well-developed ideas and he articulates them well.1 Silver knows his stuff, and he makes it fun to learn about. If you haven’t read his blog or his book2, you are in for a treat. He gets me all riled up about the fact that we have so much information yet so little knowledge. He’ll get you excited about figuring out how to make sense of heaps of data. He’ll show you how awesome numbers are.
It’s like math orgasm.
Obviously Nate gets lots of bonus points for being not only a numbers guy but ALSO a words guy. In the intro to his book, he talks you through the etymology of “prediction” (Latin roots, with a superstition-kinda-fortune-telling connotation) and “forecast” (Germanic roots, implying prudent planning despite uncertain conditions). Forecasting is definite progress, when that’s actually what we’re doing, since it means using what we already have to make more educated guesses. And yet we’re still addicted to prediction, even though we’re embarrassingly inept at it.
I wonder if part of what will make us better forecasters is recognizing how biased our predictions are by our own confidence. As Silver puts it, “Human beings have an extraordinary capacity to ignore risks that threaten their livelihood, as though this will make them go away.” We all do it, and to a large extent I think this can be a positive thing. I mean, let’s face it, if we really stopped to think about every last obstacle that could possibly derail our plans, it would be much more difficult to get up with a smile each day. It’s nice to feel able and self-sufficient, and there is good evidence that those who approach tasks, intellectual or athletic ones, with confidence are those who often end up more successful. No one likes arrogance, though, and it’s a fine line that divides the confident from the arrogant.
Nate comments at one point that “most of our strengths and weaknesses as a nation – our ingenuity and our industriousness, our arrogance and our impatience – stem from our unshakable belief in the idea that we choose our own course.” And we should be better at controlling our fate in this data-rich era, right? Maybe not.
“We think we want information when we really want knowledge.”
With modern technology, we have access to more, but unfortunately “if the quantity of information is increasing by 2.5 quintillion bytes per day, the amount of useful information almost certainly isn’t. Most of it is just noise, and the noise is increasing faster than the signal. There are so many hypotheses to test, so many data sets to mine – but a relatively constant amount of objective truth.” Last week, we talked a bit in our summer journal club about the possibility of completely “null fields”: the idea that there are whole branches3 of science in which we are looking for relationships that simply don’t exist. It’s a big question, to be sure, and the distinction between statistical significance and genuine relevance is of obvious importance. We’re a bunch of twenty-somethings talking through this stuff, and we’re not quite presumptuous enough to come to any conclusions, but it’s interesting to think about.
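To make the “null field” idea concrete, here’s a toy simulation of my own (not from Silver’s book or our journal club): a field where every true effect is zero, yet the usual p < 0.05 bar still hands out “discoveries” at a steady clip.

```python
# Hypothetical setup: 1,000 studies in a field with nothing to find.
import numpy as np

rng = np.random.default_rng(42)
n_studies, n_per_group = 1000, 50

false_positives = 0
for _ in range(n_studies):
    # Both groups are drawn from the SAME distribution: the true effect is zero.
    a = rng.normal(0, 1, n_per_group)
    b = rng.normal(0, 1, n_per_group)
    # Welch-style t statistic computed by hand, so there is no SciPy dependency.
    t = (a.mean() - b.mean()) / np.sqrt(
        a.var(ddof=1) / n_per_group + b.var(ddof=1) / n_per_group
    )
    if abs(t) > 1.98:  # roughly the two-sided 5% cutoff at ~98 degrees of freedom
        false_positives += 1

print(f"'Significant' findings in a field with nothing to find: {false_positives}/1000")
# Expect ~50: statistically significant, genuinely irrelevant.
```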
Another major issue we’re up against is the necessity of building assumptions into our models. A question as simple as “are these events independent or not?” can change your whole analysis of risk. Mathematically, it becomes obvious right away what a dramatic difference the extent of the dependence can make4. In the real world, we don’t always know if the events are independent or not; further, even when we know they’re not independent, it’s pretty difficult to quantify how related they are. “Risk” is the stuff that’s quantifiable: “I have a 1/20 chance of selecting the blue.” Uncertainty, though, is the risk that is hard to measure. When we build models and forget to admit just how much uncertainty there really is, we end up forecasting in a range that might be too narrow. It’s tricky to keep track of the difference between what we know and what we think we know, seeing as we don’t know which is which. This partially explains why we have a nasty habit of making predictions that are absurdly precise but not at all accurate.
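Here’s a quick sketch of that “too narrow” problem. The setup is entirely my own invention, not anything from the book: twenty Gaussian risks tied together through one shared factor (correlation 0.5 between any pair). An analyst who assumes independence draws a confident 95% band, and reality jumps the fence more than half the time.

```python
# Toy example: forecast bands under a (wrong) independence assumption.
import numpy as np

rng = np.random.default_rng(0)
n_events, rho, n_sims = 20, 0.5, 100_000

# Correlated shocks via one shared factor: x_i = sqrt(rho)*common + sqrt(1-rho)*own.
common = rng.normal(size=(n_sims, 1))
own = rng.normal(size=(n_sims, n_events))
losses = (np.sqrt(rho) * common + np.sqrt(1 - rho) * own).sum(axis=1)

naive_sd = np.sqrt(n_events)  # sd of the sum IF the 20 events were truly independent
true_sd = losses.std()        # sd of the sum as it actually behaves

print(f"sd assuming independence: {naive_sd:.2f}")
print(f"actual sd with rho={rho}:  {true_sd:.2f}")

# How often does reality escape the naive 95% band?
outside = np.mean(np.abs(losses) > 1.96 * naive_sd)
print(f"share of outcomes outside the 'confident' 95% band: {outside:.1%}")
```

The analyst’s interval is absurdly precise, and wrong most of the time, which is exactly the precise-but-not-accurate trap.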
Too often, we convince ourselves that certain bad outcomes are simply not possible. Let’s remember what Douglas Adams said in The Hitchhiker’s Guide to the Galaxy: “The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.”
So…that’s that for today. I’m pretty excited about math, I think staying humble makes us better forecasters, and I’m curious to hear what you think about statistics in the Real World!
1 That last part, that part about expressing them...it’s not trivial. No one really wants to talk to a mathematician or a scientist who speaks their own language and can’t use normal words.
2 In 2012, Silver published his first book, The Signal and the Noise: Why So Many Predictions Fail – But Some Don’t, and he currently writes the blog FiveThirtyEight as part of The New York Times, available at http://fivethirtyeight.blogs.nytimes.com; unless otherwise noted, quotes in this post are from his book.
3 or sub-disciplines of sub-branches or whatever…don’t want to offend anyone
4 If you have independent events, you simply multiply the probability of each event occurring to find the probability that they all occur simultaneously, i.e. P(A and B and C) = P(A)*P(B)*P(C). However, if the occurrence of some of the events depends on the others, you have to know how likely the second event is to occur given that the first already has, i.e. P(A and B) = P(A)*P(B|A). There are lots of great examples we could talk about using anything from economics to medical diagnosis, but you already get the idea.
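For a feel of the magnitude, here is that arithmetic with made-up numbers (the 5% base rates and 50% conditionals are purely illustrative): three individually rare events that encourage one another end up a hundred times more likely to all happen than the independence assumption would suggest.

```python
# Footnote 4's arithmetic with invented numbers.
p_a, p_b, p_c = 0.05, 0.05, 0.05           # each event alone is unlikely
p_all_independent = p_a * p_b * p_c         # P(A and B and C) = P(A)*P(B)*P(C)
print(f"P(all three), independent:     {p_all_independent:.6f}")  # 0.000125

# Now suppose the events feed on one another: once A happens, B becomes far
# more likely, and likewise for C. These conditional values are invented.
p_b_given_a = 0.5                           # P(B|A)
p_c_given_ab = 0.5                          # P(C|A and B)
p_all_dependent = p_a * p_b_given_a * p_c_given_ab  # chain rule
print(f"P(all three), strongly linked: {p_all_dependent:.6f}")    # 0.0125

print(f"dependence makes it {p_all_dependent / p_all_independent:.0f}x more likely")
# 100x: the same 'rare' events, a dramatically different joint risk.
```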