Technically Speaking

The Official Bigstep Blog

 

Does Taking on Hadoop Mean the End of Your Data Warehouse?

There seems to be some confusion among executives: in polls, many say that Hadoop will eventually replace their data warehouses entirely. But that isn’t necessarily the case. According to most experts familiar with Hadoop, the open source framework for storing and analyzing large data sets is probably not the end of the need for data warehouses. The two handle data differently, so some organizations will need only one of them, while many will need both. Here are the points to consider before abandoning your tried and true data warehouse for Hadoop clusters.

Consider the Quantity of Your Data

 

Hadoop is best used on enormous collections of data that include a variety of types of data from various sources.

 

Not every organization has (or needs) the vast amounts of data it takes to make Hadoop worthwhile. Hadoop is designed to handle many petabytes of data, not just a few terabytes. Unless you’re positive that you have enough data to justify taking on Hadoop operations, it’s probably best to stick with your old, trusted data warehouse.

Consider the Complexity of Your Data

Hadoop isn’t just designed to handle lots of data; it’s specifically crafted to manage a complex variety of data. Data analytics is most powerful when it can draw on a number of disparate data sources. Hadoop also does a superior job with unstructured data, meaning data that doesn’t fit well in a typical SQL database. For example, Hadoop can take on text documents, emails, videos, images, and presentations, while data warehouses still handle structured data (stuff that fits neatly in the rows and columns of a spreadsheet) just fine.
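To make the contrast concrete, here is a minimal sketch in plain Python of the map-and-reduce style of processing that Hadoop popularized, applied to free text that has no rows or columns. It runs on a single machine and does not use Hadoop itself; the function names `map_phase` and `reduce_phase` and the sample documents are assumptions made up for illustration.

```python
from collections import Counter
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one free-text document."""
    return [(word.lower().strip(".,!?"), 1) for word in document.split()]


def reduce_phase(pairs):
    """Sum the counts emitted for each distinct word."""
    counts = Counter()
    for word, n in pairs:
        if word:  # skip tokens that were only punctuation
            counts[word] += n
    return counts


# Hypothetical unstructured input: two short customer emails.
documents = [
    "Customer emailed about a late delivery.",
    "Delivery arrived late, customer asked for a refund.",
]

# On a real cluster the map step would run in parallel across many machines;
# here the results are simply chained together in one process.
word_counts = reduce_phase(chain.from_iterable(map_phase(d) for d in documents))
print(word_counts.most_common(3))
```

The point of the sketch is that nothing about the input had to be forced into a table first, which is roughly why this style of processing suits documents and emails better than a SQL schema does.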

Consider the Comfort Zone of Your IT Staff

 

The leap from data warehousing to Hadoop is greater than you might think. Make sure your workers are prepared for the challenge, and that the initiative is worth it.

 

Hadoop comes with a significant learning curve. While most skilled programmers and database administrators can learn Hadoop with time, it’s not intuitive and it isn’t easy. Numerous products exist to make offloading data to Hadoop easier and to help you structure and analyze that data on your new Hadoop clusters. Even so, you will need to either hire someone who is proficient in Hadoop or give your current IT staff the time to learn it.

Also, shop Hadoop vendors carefully. Hadoop is open source, so it doesn’t come with the vendor support you’re likely used to when adopting new, complex software systems. Though Hadoop is “free,” vendors make their money primarily from providing such support. What you’re used to getting from vendors at no extra charge with your purchase will come with a price tag when using Hadoop.

Consider the Security Skills of Your Staff

Data warehouses are generally quite secure. Hadoop, by contrast, doesn’t come with the enterprise-grade security your team is likely used to. While several new security features have recently been added to Hadoop, the security settings and options that come standard won’t be enough if you’re managing consumer data, health care information, or any other data that falls under regulations. It also means that sensitive business data, such as business intelligence and intellectual property, isn’t completely safe by default. You’ll need to make sure your IT team is up to the task of providing excellent security on top of what Hadoop offers.

Think of Hadoop as an Add-On, Not a Replacement for Your Current System

In most organizations, the data warehouse still has its place. Hadoop is more of an add-on solution than a replacement for any current IT infrastructure. Data warehouses are more reliable, stable, secure, and resilient than Hadoop, at least as Hadoop stands currently. Hadoop, however, can provide deep insight and powerful analytics that data warehouses aren’t capable of. Most organizations that see long-term success with Hadoop start with a single small project and build on minor successes before taking on anything major. Use it to supplement your current data warehouse.

When taking on big data, a powerful cloud service can be extraordinarily helpful. It lessens the cost of housing and processing large data sets by eliminating the need to invest in more hardware for your Hadoop clusters. Visit Bigstep to see what the Full Metal Cloud can do for your big data initiatives today.