Rant: statistics is actually for something
For most of my working life I was an experimental scientist or an engineer of some sort. I got used to dealing with statistical methods and, over time, developed a feel for what you could do with statistics. I always assumed that other people had the same view of statistics that I did: statistical analysis was a tool we could use to find out interesting and important things. Over the years, I've come to realize I was wrong. A huge number of people either (a) have no idea what statistics is about at all, or (b) assume that it's a set of mechanical procedures that you have to apply to show you've done your job properly. Neither position is productive.
I've always had a few articles on my website about statistics and experimental design, and sometimes people read these articles and come to believe -- usually wrongly -- that I can help them with their statistical problems. The reason I can't help readers with their statistics is that they haven't grasped the idea that statistics is for something, that it has a purpose. I blame the editors of scientific journals for this: they seem to think that nothing is publishable without a ream of incomprehensible Kolmogorov-Smirnov test results, or whatever. Whether I'm right or wrong, the idea seems to have grown that statistics is something you do if you have to, but otherwise is to be avoided. This thinking is allied with that other trite platitude:
You can make statistics say anything you like
To which my reply is usually:
Not to me, you can't!
In this short article I want to explain the single most important thing about statistics. There's no maths, Greek symbols, or jargon, just -- I think -- common sense. It doesn't matter whether you know or care about statistics. In fact, I've known people who've spent years working out t-tests and confidence intervals, and still don't grasp the core philosophy of the subject.
Statistics isn't just important to scientists: it's important to everyone. Even if you'll never use statistics in your working life; even if you have no interest at all in the subject; even if you can barely count to ten on your fingers; even then you still need to understand what follows. If I can get this message across, then even if I achieve nothing else in my life -- and, let's face it, that's looking increasingly likely -- I will still have left the world a slightly better place than I found it.
Statistics is important because we can use it to figure out whether something we observe can be applied to new and different situations.
Knowing this allows us to plan for the future, and to make decisions about how to allocate our scarce resources of money, energy and, ultimately, life. In statistics we use the term 'generalisable': an observation is generalisable exactly when it can be used to predict what will happen in new and different situations. If it isn't generalisable, it can't. So what's statistics for? It's for determining whether an observation is generalisable or not. It's really as simple as that.
OK, so it doesn't sound all that earth-shattering, does it? Let me try to illustrate it with an example. Here's a conversation that you might hear in a bar, about smoking cigarettes, and its effect on your health. One person -- let's call him Bob -- might say:
Smoking can't be as bad as they say. I mean, look at my family: my dad had four brothers, and they all used to smoke four packs a day. And the youngest of them is eighty. My friend John, poor bastard -- he's just had a lung removed, because it was full of cancer. And he's never smoked in his life. It just goes to show, doesn't it?
What's wrong with Bob's statement? If you've never been exposed to statistics or experimental science, maybe you're thinking there's something in what Bob says -- perhaps you have an uncle/grandmother/sister who smokes like a chimney, and is fit as a fiddle at ninety.
On the other hand, if you are, or have been, a statistics user, perhaps you're wondering whether Bob's defined his null hypothesis, and how you're going to work out a p-value. Both of these standpoints, while understandable, miss the point entirely. What matters is whether Bob's remarks generalise.
Well, do they? The fact is that we don't know. We can't know, because we don't have enough information.
Even if Bob's claims are completely accurate -- even if he really does have a bunch of chain-smoking, perfectly healthy ageing relatives -- there just aren't enough of them to allow us to draw safe conclusions. Bob's family all have certain things in common: they have the same or similar ethnicity, similar age, similar socio-economic position, similar schooling, similar diet, and so on. That's how families work. We don't all share those characteristics.
Maybe one of them -- age, ethnicity, whatever -- accounts for why Bob's family tolerate smoking so well. Or maybe it's just, well, good luck. There are more than 60 million people in the UK -- there's an excellent chance we'll find four or five people in the same family who enjoy robust health despite smoking, just by luck.
It should be fairly obvious that Bob's family doesn't amount to a representative sample of the population. We simply can't use them as a guide to whether we should put public funds into smoking prevention campaigns, or even whether I, myself, can smoke safely. Even if the effect Bob reports isn't just chance, we still don't know what it is about Bob's family that's relevant.
A properly-designed scientific study, with sensible statistical treatment, would allow us to tease out the different factors -- age, gender, ethnicity, and so on -- and determine which are relevant and which are not. It would tell us how likely it is that chance alone could produce what Bob observed.
With such a study, we would get a more realistic view of the effect of smoking on the population as a whole. Then, and only then, we'd be getting towards a result that generalises. Then we'd be able to judge, with some confidence, whether we ought to take the risk of smoking or not. We could make plans. Without statistics, you may as well plan for the future using a ouija board or a weather vane.
Statistics is about looking at the world, and the things that happen in it, finding generalisable results, and figuring out what best to do next. Ultimately, statistics isn't about numbers, or t-tests, or sampling theory; it's about living our lives effectively. And that, I believe, is important.
It's this lack of understanding that irks me a little when people ask me to do their statistics for them. They seem to have lost sight of the fact, or never known, that statistics actually has a purpose; it's not just a thing you have to do to get your paper published.
Published 2026-02-24, updated 2026-02-24
Categories
education rant gemlog
Converted from my Gemini capsule.