Deep Web Tech Blog

Let’s take a look at some really BIG “big data”

A couple of months ago I came across the claim that we are generating 2.5 billion GB of new data every day and thought that I should write a fun little blog article about this claim. Here it is.

This claim, repeated by many, is attributed to IBM’s 2013 Annual Report. In that report, IBM states that 2.5 billion GB of data was generated every day in 2012, and that 80% of this data is unstructured, with audio, video, sensor data and social media among the newer contributors to this deluge of data. IBM also claims in the report that by 2015, 1 trillion connected objects and devices would be generating data across our planet.

So how big is a billion GB? A billion gigabytes is an exabyte: 10^18 bytes, or a 1 followed by 18 zeros. That’s 1,000 petabytes, or 1,000,000 terabytes.
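It’s easy to lose track of these prefixes, so here is a minimal Python sketch of my own (using the standard decimal units, where each step up is a factor of 1,000) that expresses IBM’s 2.5 billion GB per day in terabytes, petabytes and exabytes:

```python
# Decimal (SI) storage units: each step up is a factor of 1,000.
GB = 10**9    # bytes in a gigabyte
TB = 10**12   # bytes in a terabyte
PB = 10**15   # bytes in a petabyte
EB = 10**18   # bytes in an exabyte

daily_bytes = 2.5e9 * GB  # IBM's figure: 2.5 billion GB generated per day

print(f"{daily_bytes / TB:,.0f} TB per day")  # 2,500,000 TB
print(f"{daily_bytes / PB:,.0f} PB per day")  # 2,500 PB
print(f"{daily_bytes / EB:,.1f} EB per day")  # 2.5 EB
```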

My research took me to this article in Scientific American, What Is the Memory Capacity of the Human Brain?, which I read to try to put all the huge numbers I’ve been throwing around into context. Professor of Psychology Paul Reber estimates that:

The human brain consists of about one billion neurons. Each neuron forms about 1,000 connections to other neurons, amounting to more than a trillion connections. If each neuron could only help store a single memory, running out of space would be a problem. You might have only a few gigabytes of storage space, similar to the space in an iPod or a USB flash drive. Yet neurons combine so that each one helps with many memories at a time, exponentially increasing the brain’s memory storage capacity to something closer to around 2.5 petabytes (or a million gigabytes).

So if I’m doing my math right, the 2.5 billion GB of information generated daily could be stored in about 1,000 human brains, and the collective memory capacity of all human brains is still way higher than our electronic storage capacity.
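For anyone who wants to check that arithmetic, here is the back-of-the-envelope version as a short Python sketch, assuming Reber’s figure of roughly 2.5 petabytes per brain:

```python
PB = 10**15  # bytes in a petabyte
EB = 10**18  # bytes in an exabyte

daily_data = 2.5 * EB       # 2.5 billion GB generated per day (IBM's figure)
brain_capacity = 2.5 * PB   # Reber's estimate for a single human brain

print(f"{daily_data / brain_capacity:,.0f} human brains")  # 1,000 human brains
```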

My research then took me to an interesting 2015 blog article, Surprising Facts and Stats About The Big Data Industry. Some of the facts and stats from its infographic that I found most interesting include:

  1. Google is the largest ‘big data’ company in the world, processing 3.5 billion requests per day and storing 10 Exabytes of data.
  2. Amazon hosts the most servers of any company, estimated at 1,400,000 servers, with Google and Microsoft close behind.
  3. Amazon Web Services (AWS) is used by 60,000 companies and fields more than 650,000 requests every second. It is estimated that 1/3 of all Internet users visit a website hosted on AWS daily, and that 1% of all Internet traffic goes through Amazon.
  4. Facebook collects 500 terabytes of data daily, including 2.5 billion pieces of content, 2.7 billion likes and 300 million photos (I convert a couple of these daily figures to per-minute rates in the sketch after this list).
  5. 90% of all the data in the world was produced in the last 2 years.
  6. It is estimated that 40 Zettabytes (40,000 Exabytes) of data will be created by 2020.
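To bridge from these daily totals to the per-minute view in the infographic mentioned below, here is a quick Python sketch that converts two of Facebook’s daily figures into per-minute rates; the per-minute numbers are my own arithmetic, not figures from either infographic:

```python
TB = 10**12                # bytes in a terabyte
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day

# Daily Facebook figures from the list above.
data_per_day = 500 * TB        # ~500 TB of data collected per day
photos_per_day = 300_000_000   # ~300 million photos per day

print(f"{data_per_day / MINUTES_PER_DAY / 10**9:,.0f} GB of data per minute")  # ~347 GB
print(f"{photos_per_day / MINUTES_PER_DAY:,.0f} photos per minute")            # ~208,333
```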

Another interesting infographic, Data Never Sleeps 2.0, shows how much data was generated every minute in 2014 by some of our favorite web applications.

I would be remiss if I didn’t tie my blog article to what we do at Deep Web Technologies. So please take a look at our marketing piece, Take on Big Data & Web Sources with Deep Web Technologies. We’d love to hear from you and explore how we can feed content and data from a myriad of disparate sources to your big data analytics engine on the back end, and how we can enhance the insights derived by your big data solutions by providing real-time access to content that complements those insights.