Big data is a concept that came to prominence in the 1990s in response to the massive increase in the size of datasets at the time, attributed to the growth of the internet and the rapidly declining price of data storage. While you might think of big data as terabytes of data, the term generally means more than simply "large in size." Big data differs from traditional data in that it’s almost always a combination of both structured and unstructured information, which requires new methods of processing and analysis to generate actionable insights that can inform strategic decision making.
Big data can be drawn from structured, unstructured or semi-structured datasets, but the real value is realized when these various data types are pulled together. In fact, its value is contingent upon both the amount and the variety of the data. Big data can come from just about anywhere, from a business’s sales and production records to public databases to social media feeds. Finding innovative ways to uncover patterns and correlations among these various data sources is the most essential function of a data scientist or big data analyst.
Big data analytics is a complex discipline that requires a substantial level of skill and training to master, along with comprehensive data management platforms. Tools such as Apache Hadoop, Storm and Spark are invaluable for processing massive amounts of data, but finding personnel skilled in using these tools can be difficult (and costly) in a market that is hungry for the insights that big data can provide. And while many of these tools are democratizing big data efforts, they still have a long way to go before becoming wholly accessible. A key advancement for organizations with copious amounts of data is MapReduce, a programming model that splits a job into map and reduce steps that run in parallel across a cluster, helping organizations extract value from their information far sooner than processing on a single machine would allow.
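As a rough illustration of the MapReduce model mentioned above, the following Python sketch counts words across a handful of documents. It runs in a single process and only mimics the three stages (map, shuffle, reduce). The function names and the sample documents are hypothetical and chosen for this example; they are not the API of Hadoop or any other framework, which would distribute these stages across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Map: each document is processed independently (parallelizable in a real cluster).
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    # Shuffle, then reduce: each key is reduced independently of the others.
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input for illustration only.
docs = ["big data is big", "data is everywhere"]
counts = word_count(docs)
print(counts)
```

Because each document is mapped on its own and each key is reduced on its own, both stages can in principle be spread across many workers, which is what makes the model suitable for very large datasets.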
In this article, we’ll look at the characteristics of big data, some of the most common use cases for big data, the tools essential for managing it, and best practices for starting a big data program in the enterprise.