Technically Speaking

The Official Bigstep Blog

Is Your Data Lake an Episode of Hoarders: Buried Alive?

Limited offer! Discover the first Full Metal Data Lake as a Service in the world. Get 1TB free for life - limited to 100 applicants. Start here.

Have you ever seen the American television show ‘Hoarders: Buried Alive’? If not, it’s an interesting sight to behold. The show chronicles homeowners who keep everything they can get their hands on, regardless of its worth or significance. Some people hoard a particular item, such as milk cartons or bottle tops. Others hoard anything and everything: items bought from shopping channels, collectibles, old clothing, and even garbage.

According to some experts, businesses have started data hoarding, and they’re eventually going to find themselves ‘buried alive’. Data hoarding is the act of collecting and storing as much data as possible from as many sources as possible, even when there is no clear or established purpose for it.

As is normally the case, there are two schools of thought among IT professionals: those who say, ‘Toss it! It’s worthless!’ and those who claim, ‘All data is useful data when you find the right use for it.’ So, should you start tossing or keep saving?

The Argument Against Data Hoarding


It’s only hoarding when you don’t have an established purpose for the data. The usual defense is that, while you might not know its potential uses now, it’s always possible you’ll find a good use for it later.

The school of thought that calls for businesses to get data-lean rejects that defense and claims that historical data is essentially useless. In this view, what data is good for is real-time analysis. Historical data has nothing to offer, so you should purge your databases and get a clean, fresh start.



The Argument for Data Hoarding

‘Ah!’ cries the opposing camp. ‘All data is valuable.’ These people have some compelling evidence, too. Historical data can lend valuable insight that could never be gleaned from real-time data. For example, one grocery store chain analyzed a full 15 years’ worth of data on its shoppers. That period included two significant economic recessions. From the data, the grocer was able to establish how consumers’ buying habits change as the economy falls into recession and as it emerges into more prosperous times. This helps the chain keep its valued customers in hard times by offering the right products, brands, coupons, and other things people turn to when money is scarce.

Additionally, why not keep your data? After all, the cost of memory and cloud storage is at an all-time low. Just as the cravings for big data grow severe, the market responds with affordable storage aplenty.

Tools for Leveraging More Data

So, you’re convinced. You want to keep all the data you can, confident that your data analysts will eventually find a use for it. There are lots of tools to help you keep your data under control, so that data sprawl doesn’t consume your organization’s IT resources.

• Cloudera Navigator - One of the tools that have been developed to help organizations stay on top of data governance issues, Navigator includes metadata management as well as security auditing.
• Apache Atlas - A product of relative newcomer Hortonworks, Atlas is not yet proven, but many of its testers say it shows promise. It has been accepted into the Apache Incubator, which bodes well for its potential.
• Data Lake - Unlike most of the tools used to manage enormous sets of data, data lakes are neither unknown nor unproven. Data lakes are now available ‘as a service’, meaning you can store an abundance of both structured and unstructured data in its original (unformatted) form. It’s inexpensive and takes the pressure off your team to figure out what to do with the data pronto, before costs force you to toss it out.
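
To make the ‘original (unformatted) form’ idea concrete, here is a minimal sketch in Python of landing a file in a data lake without touching its contents, while recording just enough metadata to tell later whether anyone ever found a use for it. Everything named here is an assumption made for illustration: the functions land_raw and without_purpose, the raw/<source>/<date>/ directory layout, and the catalog.jsonl file are hypothetical, and none of them is the API of Bigstep’s service, Cloudera Navigator, or Apache Atlas.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from datetime import date
from pathlib import Path


def land_raw(lake_root: Path, source: str, name: str, payload: bytes,
             purpose: str | None = None, day: date | None = None) -> Path:
    """Store the payload byte-for-byte and append a catalog entry for it."""
    day = day or date.today()
    target_dir = lake_root / "raw" / source / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_bytes(payload)  # no parsing, no reformatting

    entry = {
        "path": target.relative_to(lake_root).as_posix(),
        "source": source,
        "bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "purpose": purpose,  # None means "no established use yet"
    }
    # Hypothetical catalog: one JSON object per line, appended on every write.
    with (lake_root / "catalog.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return target


def without_purpose(lake_root: Path) -> list[str]:
    """Return catalogued paths that still have no stated purpose."""
    catalog = lake_root / "catalog.jsonl"
    if not catalog.exists():
        return []
    with catalog.open(encoding="utf-8") as fh:
        entries = [json.loads(line) for line in fh if line.strip()]
    return [e["path"] for e in entries if e["purpose"] is None]


if __name__ == "__main__":
    # Throwaway directory standing in for the lake; sample data is made up.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        land_raw(root, "pos", "sales.csv", b"id,total\n1,9.99\n",
                 purpose="seasonal buying habits")
        land_raw(root, "web", "clicks.log", b"\x00\x01 raw bytes")
        print(without_purpose(root))
```

The design choice worth noting is that the purpose field is allowed to be empty. Listing the files that still lack one is a cheap way to see how much of the lake is, by the definition used earlier in this post, hoarded rather than stored.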

Bigstep offers the first Data Lake as a Service (DLaaS), your solution to the data hoarding dilemma.
