Wednesday 12 November 2014


Big Data is the term used for the huge quantities of data now being collected and for the methods used to analyse and make sense of it all.  The scale of the problem became apparent when the Sloan Digital Sky Survey began its redshift study in 2001 and started mapping a large swathe of the sky around us.  Within a few weeks it had acquired more information on the stars and galaxies it observed than had been collected in the entire history of astronomy, and the team running it was initially at a loss as to how to deal with it all. The Large Hadron Collider at CERN in Switzerland has around 150 million sensors delivering data 40 million times every second in its attempts to find new and exotic sub-atomic particles and to understand how matter is made up.  Both of these examples illustrate how the world of data collection and analysis has had to adapt to accommodate increasingly large amounts of information, and the challenges that this represents.

With sizeable amounts of data now commonplace in science, medicine, defence and manufacturing, industries have to change how they collect and analyse the information they gather. This is a factor receiving increasing attention from the Oil & Gas industry, where exploration, whether on land or undersea, may require the collection and sifting of huge amounts of data in order to pinpoint potential new resources.  Big Data can be characterised by a number of specific features:

Volume – This relates to the overall size of the collected information.  It is estimated that US retail giant Walmart handles over a million customer transactions every hour and runs stock and customer databases of around two and a half petabytes, or 2,560 terabytes, which is a considerable volume to keep on top of (the short calculation after this list shows the conversion and what such figures imply).
Variety – This categorises the type of information received, which may range from database entries and chemical data to statistical information or physical measurements such as size, position and movement. The greater the variety, the more analysis needs to be carried out to reconcile the information and understand it.
Velocity – This describes the speed at which the data is amassed.  Some data sets may be large but gathered over many years, while others arrive as a huge mass of data in a short time and then require analysis.  The Large Synoptic Survey Telescope, for example, is expected to amass 140 terabytes of information every five days, all of which will need to be sifted and categorised.
Veracity – This refers to the quality of the obtained information.  A geological survey may capture many terabytes of data on rock formations and gas analysis, but if it is of insufficient quality then there is little point in analysing it. Data needs to be of the best possible quality, which unfortunately tends to increase its size.
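
To put those numbers into perspective, here is a minimal Python sketch of the back-of-the-envelope arithmetic behind the Volume and Velocity figures quoted above. It assumes the binary convention of 1 petabyte = 1,024 terabytes (which is how 2.5 petabytes becomes 2,560 terabytes) and treats the article's figures as rough estimates rather than exact measurements.

# Rough Big Data arithmetic using the figures quoted in this article.
TB_PER_PB = 1024                     # binary convention: 1 petabyte = 1,024 terabytes

# Volume: Walmart's databases, estimated at roughly 2.5 petabytes
walmart_pb = 2.5
walmart_tb = walmart_pb * TB_PER_PB
print(f"{walmart_pb} PB is about {walmart_tb:.0f} TB")          # -> 2560 TB

# Velocity: the LSST's expected ~140 terabytes every five days
lsst_tb, period_days = 140, 5
tb_per_day = lsst_tb / period_days
mb_per_second = tb_per_day * 1024 * 1024 / (24 * 60 * 60)
print(f"about {tb_per_day:.0f} TB per day, or ~{mb_per_second:.0f} MB every second")

Even these simple conversions show why a steady stream of survey data quickly becomes a storage and processing challenge in its own right.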

Added to these factors is another, complexity, which reflects how intensive data management and analysis can become, especially when large volumes of data arrive from multiple sources at high velocity. Big Data analysis is now being used in an increasing number of fields as experts try to make sense of the data coming in from many different sources.  Knowing something is one thing, but understanding what the data is actually telling you is the key to getting the most from it, and to helping us advance in many areas.

Written exclusively for Enigma Consulting Group

Check out more of what we can do for you at www.enigma-cg.com
