"Yellow grease" (the waste left over from deep-frying prawns, French fries and the like) is a yucky substance. In a large city with thousands of restaurants, literally tons of it are produced every day. And restaurants need to get rid of it somehow.
An easy way to dispose of yellow grease is to tip it down a manhole. Needless to say, this is illegal. In the sewage system the stuff creates a host of problems: it blocks drains, smells foul, and can even cause fires, as explained in a recent Financial Times article, "Why grease is the word in New York", by Gillian Tett.
Restaurants do have legal options. In New York City, that means hiring a licensed disposal company. But many restaurants choose the manhole approach instead: it costs less, and the risk of getting caught is small. The city has had few ways to catch the rule-breakers: too many restaurants, too many manholes, not enough inspectors.
Enter Big Data. When the city analyzed data it already held in its numerous databases (to quote the article):
"..the results were striking: suddenly they could spot which restaurants were likely to be dumping grease, and the inspectors’ success in catching offenders soared."
The article credits the new profession of "data scientist": the people who dove deep into the city's 60 or so data sources and mined them. The article went on…
"Combining these databases was nightmarishly hard; in the matter of yellow grease, for example, information on pollution, manholes, restaurants’ licences and waste companies were in different (incompatible) files."
…however, is this really just about "big data" or "data science"? I'd argue it is more an example of a traditional Master Data Management (MDM) topic. Yes, a lot of big data is involved, and combing through massive data sets requires skills in predictive analytics and the like. But the work starts with a solid baseline: reliable master data about manholes, restaurants, disposal licenses, and so on.
Indeed, the article recognizes this, explaining that the "geeks must be empowered to break down departmental silos". And, as Ms. Tett rightly recognizes, this is difficult. It is about the human process of crossing silos and arriving at common definitions on which to build the next steps.
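To make the "common definitions" point a little more concrete, here is a minimal sketch of what a master-data step might look like. It is purely illustrative and is not the method New York City used: the record layouts (`LicenseRecord`, `WasteContract`), the `master_key` normalization and the abbreviation table are all hypothetical assumptions. The idea is that a restaurant license file and a waste hauler's contract file describe the same restaurant in different ways, and only after both are reduced to a shared key can you ask which licensed restaurants have no disposal contract on file.

```python
import re
from dataclasses import dataclass

# Hypothetical record layouts for two "incompatible" sources.
@dataclass(frozen=True)
class LicenseRecord:
    license_id: str
    name: str
    address: str

@dataclass(frozen=True)
class WasteContract:
    hauler: str
    customer_name: str
    customer_address: str

# Hypothetical, deliberately tiny abbreviation table agreed on by both "departments".
ABBREVIATIONS = {"st": "street", "ave": "avenue"}

def _normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, expand known abbreviations."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())

def master_key(name: str, address: str) -> tuple[str, str]:
    """Crude shared key for a restaurant: normalized (name, address)."""
    return _normalize(name), _normalize(address)

def restaurants_without_contract(licenses, contracts) -> list[str]:
    """Return license IDs whose master key matches no waste-disposal contract."""
    contracted = {master_key(c.customer_name, c.customer_address) for c in contracts}
    return [lic.license_id for lic in licenses
            if master_key(lic.name, lic.address) not in contracted]

# Illustrative sample data (made up for this sketch).
licenses = [
    LicenseRecord("L-001", "Joe's Fry House", "12 Main St."),
    LicenseRecord("L-002", "Golden Prawn", "40 Park Ave"),
]
contracts = [
    WasteContract("Acme Grease Co", "JOE'S FRY HOUSE", "12 Main Street"),
]

if __name__ == "__main__":
    print(restaurants_without_contract(licenses, contracts))  # ['L-002']
```

Note that the interesting part is not the matching code, which is trivial, but the agreement behind it: what counts as the same restaurant and the same address. Real MDM matching involves far more (fuzzy matching, survivorship rules, data stewardship), and a missing contract is at most a lead for an inspector, not evidence of dumping.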
- - -
The nexus between big data and MDM is getting more attention. In an excellent book by Sunil Soares, "Big Data Governance: An Emerging Imperative" (published January 2013), Master Data Management gets its due attention. The connection between the two is even mirrored in the choice of Aaron Zornes, Chief Research Officer of the MDM Institute, to write the book's foreword.
I see this as a healthy development. It emphasizes that "big data" is not a separate island of expertise or a separate set of technologies, but is built on top of many Data Management disciplines.
The challenge for practitioners is that, amid all the marketing hype around "big data", it is not always easy to remember that realizing a business case often starts with simple things.
Like knowing where the manholes are.