by Corinne Keet
Recently the NYT ran an article, “What Waze Users Can Tell Us About Thanksgiving Travel Patterns.” The title immediately made me suspicious that one of the pitfalls of “big data,” lack of generalizability, might be at play. According to the article, the most traveled day is not the Wednesday before Thanksgiving but Thanksgiving Day itself, at least as measured by miles traveled by Wazers in the NYC area.
Is this true?
Maybe not. There are lots of reasons that Wazers might differ from the total population of drivers, which limits the conclusions that can be drawn. Wazers might be more likely to have professional jobs and therefore work through the Wednesday before Thanksgiving. They might be more concerned about avoiding traffic on a known busy travel day. They might also differ in ways that are less obvious. Any of these factors could bias estimates of travel patterns.
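To make the mechanism concrete, here is a toy calculation, a minimal sketch and not a model of real traffic. Every number in it is invented for illustration, and the two groups (drivers who work Wednesday and travel Thursday, and drivers who are free to travel Wednesday) are a hypothetical simplification. Among all drivers, Wednesday is the busiest day. But if people who work Wednesday are more likely to use the app, the app users' busiest day flips to Thursday.

```python
# Toy illustration of selection bias. All numbers are HYPOTHETICAL,
# chosen only to show the mechanism; they are not real traffic data.

# Share of all drivers in each (hypothetical) group.
POPULATION_SHARE = {"works_wednesday": 0.3, "free_wednesday": 0.7}

# Day each group travels.
TRAVEL_DAY = {"works_wednesday": "Thursday", "free_wednesday": "Wednesday"}

# Hypothetical probability that a driver in each group uses the app.
APP_USE_RATE = {"works_wednesday": 0.6, "free_wednesday": 0.1}


def trips_by_day(weights):
    """Sum group weights by the day each group travels."""
    totals = {}
    for group, weight in weights.items():
        day = TRAVEL_DAY[group]
        totals[day] = totals.get(day, 0.0) + weight
    return totals


def busiest_day(weights):
    """Return the day with the largest total weight of trips."""
    totals = trips_by_day(weights)
    return max(totals, key=totals.get)


# What we would see if we could observe every driver.
population = dict(POPULATION_SHARE)

# What we see through the app: each group is weighted by how often it uses the app.
app_users = {g: POPULATION_SHARE[g] * APP_USE_RATE[g] for g in POPULATION_SHARE}

print("All drivers:", busiest_day(population))  # Wednesday
print("App users:", busiest_day(app_users))     # Thursday
```

Collecting more app data would not fix this: a larger sample drawn through the same app reproduces the same skew, only more precisely.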
The take-home message here is that big data – even enormous data – can be systematically wrong. Having this data does not obviate the need for sound scientific thinking – principles of biostatistics, epidemiology, and the scientific method may be needed now more than ever in the era of big data.