How Big Is Big Data? – from Archiving to Analysing

Sep 12, 2015 3 min

Amazon has unveiled a cloud server that offers 2 terabytes (i.e. 2,000 gigabytes) of memory. If a typical laptop has 4 GB of RAM, the new server has memory equivalent to 500 of them. Perhaps you’re thinking ‘what could you do with all that power?’ Well, let me tell you what I’m thinking. It’s ‘Yay! Finally, something we can do things with in the public cloud market!’ Sure, you can get thousands of terabytes of hard drive space from the cloud, but it’s just too slow. Amazon’s new server offers power and speed!

One of today’s buzzwords is big data. For me, big data isn’t only about the amounts of information involved but also about the ability to process and analyze it from multiple angles. It’s a new era; we’ve moved from electronic data processing to the information technology age. It means all that data hidden in archives and basements could be made available for business to process. All of it.

My question is: how big is big data? We can split the question into content and storage technology, which leads us to technical measures. I’ll consider them together as they are related.

To put this in context, I consider big data to encompass collecting and including all the available data from a single domain. In theory, the content depth has no limit. In practice, some data is easily accessible, but collecting more gradually gets harder and more expensive.

Once you have collected the data it helps to store it in an efficient way, for instance within a columnar database. How you store your data determines its technical size and metrics such as how much memory and how many processor cores you need. More memory and more processors mean more money.
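To make the storage point concrete, here is a minimal sketch of the difference between a row layout and a column layout, using only the Python standard library. The record fields (`store_id`, `units`, `price`) and the helper names are hypothetical and chosen purely for illustration; a real columnar database adds compression, indexing and much more.

```python
from array import array

# Hypothetical sales records in a row layout: one dict per record.
rows = [
    {"store_id": i % 50, "units": i % 7, "price": 1.5 + (i % 3)}
    for i in range(10_000)
]

def to_columns(records):
    """Pivot row dicts into one compact, typed array per field."""
    return {
        "store_id": array("i", (r["store_id"] for r in records)),
        "units": array("i", (r["units"] for r in records)),
        "price": array("d", (r["price"] for r in records)),
    }

def total_units_rows(records):
    """Row layout: every record is visited even though one field is needed."""
    return sum(r["units"] for r in records)

def total_units_columns(cols):
    """Column layout: only the 'units' array is scanned."""
    return sum(cols["units"])

def column_bytes(cols):
    """Raw payload size of the typed arrays, excluding object overhead."""
    return sum(a.itemsize * len(a) for a in cols.values())

columns = to_columns(rows)
print(total_units_rows(rows) == total_units_columns(columns))
print(column_bytes(columns), "bytes of column payload")
```

Both layouts give the same answer. The difference is that the column layout keeps each field in one contiguous, typed block, which is why an analytical query that touches a single field needs to read far less memory.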

During the last quarter, we attached an additional 14 terabytes (14,000 GB) of memory and 1,400 processor cores to RELEX’s in-memory cloud. The most powerful single server that we tested during that time contained 12 terabytes of memory and 480 cores. Memory and processor cores are for analysis purposes, and there is a large quantity of slower hard-drive space available in the background.
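One rough way to compare configurations like these is memory per processor core. The sketch below is back-of-the-envelope arithmetic on the figures quoted above, assuming the decimal convention used in this post (1 terabyte = 1,000 gigabytes); the function name `gb_per_core` is made up for this example.

```python
GB_PER_TB = 1000  # decimal convention, as used in this post

def gb_per_core(memory_tb, cores):
    """Return gigabytes of memory available per processor core."""
    return memory_tb * GB_PER_TB / cores

# Capacity added to the in-memory cloud during the quarter.
added = gb_per_core(14, 1400)

# The most powerful single server tested.
largest_server = gb_per_core(12, 480)

print(f"Added capacity: {added:.0f} GB per core")
print(f"Largest server: {largest_server:.0f} GB per core")
```

On these figures the added capacity works out to 10 GB per core, while the largest single server has 25 GB per core.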

My answer to the question ‘how big is big data?’ is this: Big data is as much content as you can collect and as much as you are able to implement in terms of technology, given your budget. The more data you have to crunch and the fewer technical obstacles there are to your analysing it, the more valuable information your business can squeeze out of big data.

But memory is forever getting cheaper and processors have more cores. Hardware cost is no longer what limits big data technology, so the reason why organisations might not make full use of this explosion of calculation capacity isn’t to be found in the hardware. If there is a reason for using inefficient hardware, it might be that the pricing of some software licenses reflects technology capacity, whereby costs might, for instance, skyrocket when users add thousands of gigabytes of memory and hundreds of processor cores. The other possibility is that the software used is not designed or configured to make the best use of parallel calculation capacity and to handle large data sets in memory.

If your business is struggling with software that’s running frustratingly slowly, you should take time to identify the real problem. Are the operations really so complex that the slow speed cannot be avoided? Or is there something in your hardware and software stack that is preventing you from making the most effective use of all that power available? The first layer in the stack is powerful hardware, and powerful hardware just isn’t that expensive anymore.

Written by

Tapio Pitkäranta

Former CTO