Big Data is the term used for huge quantities of collected data and for the methods used to analyse and make sense of it all. The problem really arose when the Sloan Digital Sky Survey began its redshift study in 2001 and started to collect data from a large area of the sky. Within a few weeks, it had acquired more information on the objects it observed than had ever been collected in the history of astronomy, and the team running it were at a loss as to how to deal with the information. The Large Hadron Collider at CERN in Switzerland has 150 million sensors that collect data 40 million times every second in its attempts to find new and exotic sub-atomic particles and to understand how matter is made up. Both examples illustrate how the world of data collection and analysis has to adapt to accommodate increasingly large amounts of information, and the challenges that this presents.
With sizable volumes of information now commonplace in science, medicine, defence and manufacturing, industries have to change how they collect and analyse data. This is receiving increasing attention in the oil and gas industry, where exploration, whether on land or undersea, may require collecting and sifting huge amounts of data in order to pinpoint potential new resources.
Big Data can be characterised by a number of specific features:
Volume – This is the quantity of data produced, that is, the overall size of the collected information. It is estimated that US retail giant Walmart handles over a million customer transactions every hour and runs stock and customer information databases in the order of two and a half petabytes (2,560 terabytes) in size, which is a pretty big volume to keep on top of.
Variety – This describes the different kinds of information received, which may include database entries, chemical data, statistical information or physical measurements such as size, position and movement. The greater the variety, the more analysis is needed to reconcile the information and understand it.
Velocity – This is the speed at which the data is amassed. Some data sets are large but have been gathered over many years, while others arrive as a huge mass of data in a short time and then require analysis. The Large Synoptic Survey Telescope is expected to amass 140 terabytes of information every five days, all of which will need to be sifted and categorised.
Veracity – This refers to the quality of the obtained information. A geological survey may capture many terabytes of information on the make-up of rock formations and gas analysis data, but if it is of insufficient quality then there is little point in analysing it. Data needs to be of the best quality, which unfortunately tends to increase its size.
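To put the volume and velocity figures quoted above into everyday terms, the short Python sketch below works through the arithmetic. It is only a back-of-envelope illustration: the function names are invented for this example, the rates are simple averages, and it assumes binary units (1 petabyte = 1,024 terabytes), which is the convention under which two and a half petabytes comes to 2,560 terabytes.

```python
# Back-of-envelope arithmetic for the volume and velocity figures in the text.
# Assumption: binary prefixes (1 PB = 1024 TB, 1 TB = 1024 * 1024 MB).
# All function names are illustrative, not taken from any library.

TB_PER_PB = 1024
MB_PER_TB = 1024 * 1024
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def petabytes_to_terabytes(petabytes: float) -> float:
    """Convert petabytes to terabytes using binary prefixes."""
    return petabytes * TB_PER_PB


def events_per_second(events_per_hour: float) -> float:
    """Average number of events per second for a given hourly count."""
    return events_per_hour / SECONDS_PER_HOUR


def megabytes_per_second(terabytes: float, days: float) -> float:
    """Average data rate if `terabytes` arrive steadily over `days`."""
    return terabytes * MB_PER_TB / (days * SECONDS_PER_DAY)


# Volume: the Walmart database size and transaction count.
print(f"2.5 PB = {petabytes_to_terabytes(2.5):.0f} TB")
print(f"1,000,000 transactions/hour = about {events_per_second(1_000_000):.0f} per second")

# Velocity: 140 TB every five days, versus the same 140 TB spread over ten years.
print(f"140 TB in 5 days = {140 / 5:.0f} TB per day")
print(f"140 TB in 5 days = about {megabytes_per_second(140, 5):.0f} MB per second")
print(f"140 TB in 10 years = about {megabytes_per_second(140, 3650):.2f} MB per second")
```

Run as written, this gives 2,560 TB, roughly 278 transactions per second, 28 TB per day and roughly 340 MB per second. The last line shows why velocity is treated separately from volume: the same 140 terabytes gathered over ten years arrives at under half a megabyte per second.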
Added to these factors is another, “Complexity”, which reflects how intensive data management and analysis can become, especially when large volumes of data arrive from multiple sources and at high velocity. Big Data analysis is now being used in an increasing number of fields as experts try to understand the data coming in from many different sources. Having the data is one thing, but actually understanding what it is telling you is key to getting the most from it and to helping us advance in many areas.
Check out more of what we can do for you at www.enigma-cg.com