What is Dark Data?
Consider the example of the 1986 Challenger shuttle disaster. Should the shuttle launch, despite fears that low temperatures might weaken the “o-ring” seals? A graph of seven previous occasions when the o-rings had been stressed showed no relationship between temperature and the degree of damage.
Alas, what was missing was the data from all the launches where there had been no damage at all: in each case, air temperatures were higher. On viewing all the data, a clear relationship was visible.
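To make the mechanism concrete, here is a minimal Python sketch of how dropping the "no damage" cases can hide a relationship. The numbers are invented purely for illustration and are not the actual launch records; the names `launches` and `pearson` are likewise hypothetical.

```python
from math import sqrt

# Invented (temperature, damage) pairs for illustration only.
# These are NOT the real Challenger launch records.
launches = [
    (12, 2), (14, 1), (16, 1), (18, 2),      # launches with some damage
    (19, 0), (20, 0), (21, 0), (22, 0),      # warmer launches, no damage
    (23, 0), (24, 0),
]


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# The "graph everyone looked at": only launches where damage occurred.
damaged_only = [(t, d) for t, d in launches if d > 0]

r_damaged_only = pearson(damaged_only)
r_all = pearson(launches)

print(f"damaged launches only: r = {r_damaged_only:.2f}")
print(f"all launches:          r = {r_all:.2f}")
```

In this toy data set the damaged-only subset shows no correlation between temperature and damage, while the full set shows a strong negative one. The only difference is whether the zero-damage launches, the dark data, are included.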
“Dark data are data you do not have,” writes Hand. “They might be data you thought you had, or hoped to have, or wished you had. But they are data you don’t have. You might be aware that you do not have them, or you might be unaware.”
This book, then, has some commonalities with Caroline Criado Perez’s Invisible Women, a book about systematic failure to collect data about women or issues that might be of particular relevance to women. Hand’s book is a shade more technical, but that is not the only difference: as the Challenger example makes clear, there are many different kinds of missing data, and many reasons why we might fail to have them. (The most mundane of all: time series data when some of the data points lie in the future…)
I enjoyed this book a lot; it is well-written and accessible, although mostly aimed at practitioners.
My NEW book The Next Fifty Things That Made the Modern Economy is NOW OUT. Details, and to order on Hive, Blackwells, Amazon or Waterstones. Bill Bryson comments, “Endlessly insightful and full of surprises — exactly what you would expect from Tim Harford.”