Technically Speaking

The Official Bigstep Blog

Debunking 3 Myths About the Data Lake

Whether your organization is just dipping its toes in the pond of big data or diving all the way in for a regular swim, someone at some point has likely brought up the topic of building a data lake. They were likely met with a range of reactions, from the “has he lost his mind” look delivered over the rim of a coffee cup in mid-sip to outright guffaws over the insanity of the idea. Don’t laugh. These are real reactions by real adults when presented with the possibility of a data lake.

Don’t blame them. Most have read more than a couple of cautionary articles on data swamps: data lakes done wrong, where all good data goes to die a slow, lingering, and perhaps even agonizing death. In actuality, the data lake is perhaps the most powerful platform for getting the most out of your big data strategy, both now and in the future. A data lake doesn’t hamstring you in terms of what you can do with your data later. You might find a use for a seemingly useless stream of data years from now, and since it’s still in its native format, tagged neatly and saved in your data lake, you can drag it out, dust it off, and give it an analytical whirl whenever the inspiration strikes.

Here are the most common myths, the arguments people use to get out of building a lake. Maybe they’re afraid of the water, or perhaps they just don’t want to leave the beach they’ve arranged exactly the way they like it. Either way, here’s how to debunk these myths and dunk them into the data lake using truth and practical sense. Don’t worry, the water’s just fine.

1. The Concept of the Data Lake is Too Young & Immature

Compared to data warehouses, data lakes are mere toddlers. But have you seen a toddler let loose on one of today’s digital devices? The maturation process is faster in the age of technology.

This myth is usually spread by those who are deeply invested in the existing data warehouse. In fact, most of the myths come from them, but that’s okay. Nobody is talking about tossing the existing data warehouse out the back window along with the fax machines and old BlackBerry flip phones just to build a data lake instead. For now, it’s not only a good idea but highly advisable to maintain both.

While Hadoop and the rest of the big data ecosystem are less mature in terms of data governance, data lakes are more focused on discovery. You can keep your data warehouse for data governance and for supporting users and apps until your data lake matures and is ready to take on those things. In the meantime, you won’t fall behind on the learning curve while your competitors learn to leverage the data lake and sail right past you in the coming months and years.

2. A Data Lake is Just Another Data Silo We’ll Have to Deal With

In short, no. The data lake is the first (and absolutely essential) step toward eliminating those silos. For all the talk about data integration, offloading data from proprietary or legacy systems, and leveraging it for analytics, the reality is that these processes are still maturing. The ideal solution is to establish a data lake where your current structured data (the stuff from your existing data warehouse) can start learning to play well with your oddball, unstructured data.

The data lake allows you to delve into the IoT, start analyzing data from disparate business systems (like your ERP plus CRM or your Excel spreadsheets with your marketing automation tool), and begin deriving valuable insight from those data sources. It’s kind of like the bridge you have to cross from data silos to a truly data-driven organization. If you build your data lake wisely, it will serve your needs nicely once you get where you’re going.
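To make the idea of analyzing disparate systems side by side a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the CRM export, the ERP export, the field names, and the revenue_by_segment helper are all hypothetical. It only shows the general pattern of leaving each source in its native format (CSV and JSON lines here) and imposing structure when the data is read.

```python
import csv
import io
import json

# Hypothetical CRM export, kept in the lake as raw CSV text.
crm_csv = """customer_id,name,segment
c1,Acme Ltd,enterprise
c2,Bolt Inc,smb
"""

# Hypothetical ERP export, kept in the lake as raw JSON lines.
erp_jsonl = """{"customer_id": "c1", "invoice_total": 1200.0}
{"customer_id": "c1", "invoice_total": 300.0}
{"customer_id": "c2", "invoice_total": 80.0}
"""


def revenue_by_segment(crm_text, erp_text):
    """Parse both raw sources at read time and total invoices per CRM segment."""
    # Build a lookup from customer id to segment using the CRM file.
    segment_of = {
        row["customer_id"]: row["segment"]
        for row in csv.DictReader(io.StringIO(crm_text))
    }
    totals = {}
    for line in erp_text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the raw export
        invoice = json.loads(line)
        # Customers missing from the CRM file are grouped as "unknown".
        segment = segment_of.get(invoice["customer_id"], "unknown")
        totals[segment] = totals.get(segment, 0.0) + invoice["invoice_total"]
    return totals


totals = revenue_by_segment(crm_csv, erp_jsonl)
print(totals)
```

Neither source had to be reshaped before it landed in the lake. The join key and the grouping were chosen only when the question came up, which is the flexibility the lake is meant to offer.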

3. Data Lakes Challenge the Data Management Structures We Already Have in Place

The data lake seems to threaten the concept of the traditional data warehouse. But put another way, data lakes actually protect the integrity of the data warehouse during the various stages of becoming a data-driven organization.

This is a common argument from those who fear that the data lake threatens their data warehouse. In reality, data living in the data lake isn’t curated, structured, or proven. Much of it may have no value to the organization. However, you don’t know that yet. One way to answer these naysayers is to point out how the data lake can preserve the purity and integrity of the data warehouse by allowing the organization to prove the worthiness of data in the data lake before introducing it to the data warehouse. Data that would cause a data warehouse manager to keel over in need of immediate medical attention can be poured into the data lake, stirred up, sifted through, and even tossed, all without threatening the health of the powers that be or the integrity of the data warehouse.
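One way to picture “proving the worthiness” of lake data before it reaches the warehouse is a simple promotion gate. The sketch below is only an illustration under stated assumptions: the records, the REQUIRED_FIELDS schema, and the is_warehouse_ready and split_for_promotion helpers are all hypothetical, and a real pipeline would apply whatever rules your warehouse actually enforces.

```python
# Hypothetical raw records as they might sit in the lake: uncurated, some broken.
raw_records = [
    {"customer_id": "c1", "amount": "19.99", "ts": "2016-03-01"},
    {"customer_id": "", "amount": "5.00", "ts": "2016-03-02"},
    {"customer_id": "c3", "amount": "not-a-number", "ts": "2016-03-02"},
    {"customer_id": "c4", "amount": "42.50", "ts": "2016-03-03"},
]

# Assumed warehouse schema: every one of these fields must be present and non-empty.
REQUIRED_FIELDS = ("customer_id", "amount", "ts")


def is_warehouse_ready(record):
    """Return True if the record has every required field and a numeric amount."""
    if any(not record.get(field) for field in REQUIRED_FIELDS):
        return False
    try:
        float(record["amount"])
    except (TypeError, ValueError):
        return False
    return True


def split_for_promotion(records):
    """Partition records into (promote, keep_in_lake) without discarding anything."""
    promote = [r for r in records if is_warehouse_ready(r)]
    keep_in_lake = [r for r in records if not is_warehouse_ready(r)]
    return promote, keep_in_lake


promote, keep_in_lake = split_for_promotion(raw_records)
print(len(promote), "records ready for the warehouse")
print(len(keep_in_lake), "records stay in the lake for now")
```

The warehouse only ever sees records that passed the gate, while the rejected ones stay in the lake where they can be re-examined, repaired, or tossed later.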

Once you’ve debunked these myths in your organization, head over to Bigstep and take advantage of the first Full Metal Data Lake as a Service in the world. Get 1TB free for life - limited to 100 applicants. Start here.
