Can We Just Stop Calling It BIG DATA …
- Oct 4, 2018 4:30 pm GMT
Call me a Luddite if you will, but I think the expiration date on Big Data's shelf life has come and gone, and we should remove the term from our vocabulary. On an almost daily basis I receive an email announcing a Big Data conference, a new article on Big Data or an upcoming webinar on Big Data. OK, I admit I've written at least one article on Big Data and have probably thrown the term around a bit in my consulting work. But when I look at the conference agendas, read the articles or listen to the webinars, it seems that the concept of Big Data has become an amorphous blob far removed from its generally accepted original definition.
At this point I’m not even sure if Big Data is a noun (‘we have a lot of Big Data’ – from the Department of Redundancy), an adjective (‘Big Data architecture’) or a verb (‘We are going to Big Data that problem’ – just kidding on this one).
The concept of big data, or at least the recognition that a lot of data is floating around, goes back at least to the 1940s, when published articles discussed the amount of information held in libraries such as the Library of Congress and its continued growth. With the advent of computers came a heightened awareness of the increasing volume of data, and the term 'Information Overload' entered the vocabulary with a somewhat negative connotation. Yet storage and processing technology evolved, and we continued to survive the growth of data.
The term Big Data was probably first used in a 1997 article addressing the growth of unstructured data. Then in 2001, Doug Laney, at the time an analyst at Meta Group, published an article identifying what became known as the three V's (Volume, Velocity, Variety) as characteristics, or dimensions, of Big Data. Interestingly, Laney did not use the term Big Data in his article; his focus was on what he saw as a data management challenge. While mentioning the need for newer technologies to address that challenge, he also offered several options for dealing with the data, including tiered storage, evaluating the data for actual usefulness, and keeping a statistically valid sample rather than retaining all of it.
Apparently three V's were not enough. Soon came the five V's, then the ten V's, and with a little research you can even find an article on the thirty V's. The additional V's, however, have not really helped differentiate Big Data from Little Data or any other size of data. For example, V for Value applies to data of any size; either data has value or it does not. Similarly, V for Veracity applies to all data in the same way.
The idea that Big Data required new technologies or could not be processed with conventional methods beyond what was used for Little Data or Normal Size Data came after the initial V’s. My suspicion is that, as is usual in the technology world, there was as much marketing hype surrounding that need as there was actual need.
Recently I listened to an informative webinar which opened by stating that Big Data tools and techniques were used. I'm not exactly sure what those are or how they are specific to Big Data. R and Python were mentioned, although both can be used for data of any size, and Python is a general purpose programming language anyway. So Big Data tools? Not really. The primary tool that was used was a statistical package that has been around as long as I can remember. The technique was primarily predictive analytics. Again, not something that applies to Big Data alone. The data involved also would not be considered big by any of the definitions of Big Data.
Many webinars and presentations focus on the use of data from AMI (Advanced Metering Infrastructure), which provides significant value in several areas. But does AMI data really qualify as Big Data? Most of the focus is on the Volume. Admittedly there are many more rows of data than were obtained from monthly meter readings, and there is additional data such as voltage and power off and on messages. But does that really count as Variety? These are all straightforward, limited-attribute (what I call long, skinny rows of data), structured messages. Going along with the Volume is the Velocity, which is again much greater, by a factor of 720 to 3,000 or more relative to monthly meter readings. During a large outage the Volume and Velocity of power-off messages can also be significant. But has either of those overwhelmed current technology for operational or analytic purposes? Most utilities handle this data just fine using traditional database technology such as Oracle, along with integration technologies designed to handle the Volume and Velocity.
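That 720-to-3,000 factor is easy to sanity-check with a little arithmetic. A minimal sketch, where the 30-day month and the interval choices are my own illustrative assumptions rather than figures from any particular utility:

```python
# Rough arithmetic sketch: interval readings one meter generates per month
# at common AMI intervals, versus a single manual monthly read.
# Assumptions (mine, for illustration): a 30-day month, hourly and
# 15-minute interval data.

DAYS_PER_MONTH = 30  # assumed average billing month

def readings_per_month(interval_minutes: int, days: int = DAYS_PER_MONTH) -> int:
    """Interval readings a single meter generates in one month."""
    return (24 * 60 // interval_minutes) * days

hourly = readings_per_month(60)        # hourly interval data
quarter_hour = readings_per_month(15)  # 15-minute interval data

# Relative to one manual read per month, the multipliers are:
print(hourly)        # 720  -> the low end of the 720-3,000 range
print(quarter_hour)  # 2880 -> near the high end
```

So hourly reads alone multiply the row count by 720, and 15-minute reads by nearly 3,000, which matches the range quoted above; a lot more rows, but still predictable, regular, structured data.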
Similarly, there are utility analytics and smart grid conferences that tout a Big Data agenda. (OK, another truth in advertising: I have moderated such a conference.) Yet looking at the topics, such as data architecture, data governance, data strategy, use cases, building competencies, etc., they again apply to data of any size. The analytic techniques covered in presentations, including predictive analytics, machine learning, artificial intelligence (don't get me started on this apparently brand new concept), etc., are likewise not specific to Big Data; in fact, most of them predate the concept.
So, what has happened here? As noted above it seems that the term Big Data has become an amorphous blob swallowing up all things analytics. Or maybe, to use another analogy, it is like peanut butter that has been spread on everything analytics. In any case, I think that the term Big Data should be folded, spindled and mutilated and take its place in the bit bucket of outdated technology terms.
OK, I feel better now.