I was recently asked, "How do you define big data?" My response was, "I say big data defines you."
With that in mind, let me expand on my definition. To do this, I would like to digress and say that big data, by definition, is exactly that: a big amount of data. (Current industry discussions add Velocity, meaning how fast the data needs to be analyzed to provide value, and Variety, and sometimes Value.) Data is, by definition, information in some format or structure, so "big" must refer to its amount. Yet my definition of big data goes a bit further than this. The big data industries that have formed over the last decade focus on many different aspects of it: some deal with the infrastructure to house and manage it, some with the software, some with analytics and business intelligence, some are consultants, and numerous companies focus on marketing, education and conferences on the subject. Definitions may vary a little within each segment of the industry, but a few key attributes remain the same.
One, there are vast amounts of data, which present unique challenges. Two, above and beyond the tasks of storing, managing and analyzing it, working with this data is a primary goal (otherwise, why store it?). Three, we look for insights we can extract from this information so that we can make decisions that define how we act upon it. For instance, a police chief in Pennsylvania knew that by analyzing patterns in criminal behavior, he would be able to reduce crime in a geographic area by increasing police presence at specific times. The information he extracted from the data defined his response. Whether someone is researching a large number of publications to extract relevant topics from specific articles, or gathering sales information to find yearly trends through economic booms and downturns, the process is still the same: we store, we search, we analyze, we define.
As a programmer, I have dealt with many aspects of the infrastructure surrounding data. It is the service- and data-oriented interfaces that provide the vehicle for delivering information in a meaningful way. I have always focused on ensuring that end users can get to the data they are looking for; otherwise, the data is almost useless. Data exists solely for the value of the information it holds and what can be extracted from it. One of the most exciting aspects of this industry for me has been reading about the different use cases for our analytic database and the unique approaches different organizations have taken to solve complex problems in simple, elegant steps. Browse to http://www.infobright.com to read how the Canadian Space Agency is using Infobright to store and read its machine-generated data, or read any of the JDSU or other telecom-related whitepapers.
At the end of the day, we define our actions by what we learn.