Machine-Generated Data by Example
Last November, Infobright CEO Don Deloach discussed the topic of Machine-Generated Data (MGD). As a cornerstone database for MGD, Infobright gives community members and customers the opportunity to understand trends in their voluminous data while maintaining a very low total cost of ownership. Right now, there is a debate over what counts as MGD. Regardless of your school of thought (the Curt Monash School or the Daniel Abadi School), machine-generated data includes all data that is generated automatically, without direct human intervention. While both schools are "close enough" in their definitions, I'd like to take it one step further. As a person who learns best by example, let me illustrate the real difference between Machine-Generated Data and Human-Generated Data. Much like identifying an illness through its symptoms, the following may help indicate the 'presence' of machine-generated data.
- Growth
  - Machine-Generated Data: Very fast growth, as the machine is built to record many pieces of information for each interval of time.
  - Human-Generated Data: Slow growth, as a human is required to enter and submit the data.
- Updates
  - Machine-Generated Data: Rarely (if ever) is the dataset updated. For example, would you really change the high temperature recorded for 1-May-2010? It is what it is.
  - Human-Generated Data: Constant updating or morphing. For example, when you moved, how many subscription databases had to be updated so you could keep receiving your newspapers and magazines?
- Retrieval
  - Machine-Generated Data: Typically used to understand trends in the dataset. For example, which machines may be wearing out? What's the average energy consumption of houses on the smart grid?
  - Human-Generated Data: Typically used to pull up specific information. For example, when you call your tech support center at Dell or HP, they pull up everything specific about you (name, address, gender, pant size, dog's name, etc.).
  - Note: Of course, you can retrieve either kind of data set with an analytic query or an all-column extraction. However, most customers use the two kinds of data in different ways.
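To make the update and retrieval contrast concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, columns, and sample values (`meter_readings`, `customers`, and so on) are invented for illustration and say nothing about how Infobright stores data. The machine-generated table is only appended to and then aggregated, while the human-generated table is updated in place and read one record at a time.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meter_readings (house_id INTEGER, hour INTEGER, kwh REAL);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
""")

# Machine-generated: many rows per house, appended and never edited.
readings = [(house, hour, 1.0 + 0.5 * house)
            for house in (1, 2) for hour in range(24)]
conn.executemany("INSERT INTO meter_readings VALUES (?, ?, ?)", readings)

# Human-generated: one row per customer, changed whenever the person moves.
conn.execute("INSERT INTO customers VALUES (1, 'Pat Example', '1 Old Street')")
conn.execute("UPDATE customers SET address = '2 New Street' WHERE customer_id = 1")

# Trend-style retrieval: aggregate over all readings, touching a single column.
avg_kwh = conn.execute(
    "SELECT house_id, AVG(kwh) FROM meter_readings "
    "GROUP BY house_id ORDER BY house_id"
).fetchall()

# Specific retrieval: pull up one customer's record by key.
customer = conn.execute(
    "SELECT name, address FROM customers WHERE customer_id = 1"
).fetchone()

print(avg_kwh)   # average consumption per house
print(customer)  # the one record a support agent would look up
```

The aggregate query reads every row but only the `kwh` and `house_id` columns, which is the access pattern that trend analysis over machine-generated data tends to produce.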
- Generation
  - Machine-Generated Data: Somewhere, some person created a script, process, or application that causes a computer, machine, or sensor to log information about an event. For example, SIEM logs or call detail records.
  - Human-Generated Data: A person entered data into some fields and submitted it. For example, updating inventory or changing addresses.
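As a rough sketch of the generation side, the snippet below shows the kind of small script a person might write once, after which the machine produces records on its own. The function name `record_event`, the field names, and the sensor ID are hypothetical and are not taken from any particular logging product.

```python
import json
import time


def record_event(log, sensor_id, value, clock=time.time):
    """Append one timestamped reading to the log; nobody types anything in."""
    entry = {"ts": clock(), "sensor_id": sensor_id, "value": value}
    log.append(json.dumps(entry))
    return entry


# The list stands in for a log file or an ingest queue.
log = []
for reading in (20.5, 20.7, 21.0):
    record_event(log, "thermo-1", reading)

print(len(log), "events logged without human data entry")
```

Every call adds a new record and none rewrites an old one, which is why this kind of data grows quickly and is rarely updated.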
To help illustrate the path and growth of Machine-Generated vs. Human-Generated Data, I've included our view of the data world. With Infobright's powerful Brighthouse engine and knowledge grid, many community members and customers look to Infobright to help them understand trends in their machine-generated data.
