Are Google Alerts The Canary In The Big Data Mineshaft?

I remember the first time I actually got a Google Alert in my inbox. It was around 2008, and I had started my blog a while before. I signed up to get Google Alerts on my name mainly so I could see how fast Google indexed information. At the time, I noticed I got an alert within 3 hours of posting. It was a heady, strange type of thrill to know that Google had found me. A whoo hoo moment.

Then starting in 2009, as I started to write for Ad Age and HuffPo, I noticed that I got an alert within maybe five hours of publication. Often, it was the Alert that told me something had been published, since I otherwise didn’t know. The whoo hoo moments came fast and furious as I started publishing a lot.

By 2010, as I continued to publish, I started to notice a real lag between the time I published and the arrival of my Google Alerts. I’d say the lag had grown to about 12 hours before I got that coveted Alert in my inbox… a 4x time delay from the 3 hours of 2008. In 2011, the lag times widened even more so that I was getting Alerts a full day later.

Probably by 2012 or so, the delay had stretched out over several days, so I often didn’t even notice whether I got the Google Alert or not. This brings us up to date. I note rather sadly that too often I don’t even get a Google Alert anymore, even if I publish in a widely distributed publication like HuffPo or Social Media Today.

It’s a sad milestone. When I unknowingly embarked on my simple experiment in 2008, who’d have thought it could turn out to be a barometer of the health of the BIG Data mineshaft: the proverbial canary in the mineshaft?

There’s no question the industry is reeling from processing more and more data, so many people are relying on predictive technologies to crunch the numbers and spit out what you want even before you want it. The problem is they require GOOD big data to get it roughly right. Yet the volume of big data has grown so fast that lots of bad big data has easily permeated the system, contaminating the entire data mineshaft.

Here’s another simple, personal example. Recently a friend was on my Facebook page, and he accidentally clicked on an organic farm banner ad. Of course, within hours (ironic – no?), I started getting all sorts of gardening banner ads. I live in a tiny NYC apartment; gardening is not a top interest TBH. Yet in some data store somewhere, I have been tagged as a garden lover. It’s bound to happen over and over to lots and lots of people, generating lots of bad big data that is hard to detect or scrub.

How does the predictive model predict the quirky, spontaneous nature of people who inconsistently click on things that are “unexpected”? Like the miners of yesteryear who used canaries as their early warning air safety system, could Google Alerts serve the same function in the data mineshaft? If my hunch is right, the continued deterioration in Google Alert deliveries is like our little canary, and it is gasping for air. Analyze that.