Technically Speaking

The Official Bigstep Blog

 

5 Best Practices to Assure Your Data Lake is Swimmingly Successful

As big data becomes a mainstay in business, many organizations are abandoning the data warehouse for the data lake. With a data lake, you don’t have to worry up front about the relationships among the data or what the data is good for. You just pour all the data in and let it swim around until you’re ready to use it. When you’re ready to start building and filling your data lake, here are some best practices to keep in mind.

1. Don’t Worry About What You’re Going to Do With All That Data

A data lake is a lot like the baseball field in Field of Dreams: if you build it, uses for the data will come.

If you’ve spent your career working with relational databases and SQL, it’s going to be really hard for you to build a data store without nailing down exactly what the data will be used for and how. Take a deep breath and do it anyway. The beauty of a data lake is that it can hold all kinds of data that would normally go to waste, even if you don’t figure out a use for it for some time. Think of the data lake as the junk drawer in your home: somewhere to stick all that miscellaneous stuff until it’s time to pull it out and use it. One day, data that looked pretty worthless might yield tremendously valuable insights for BI, into your customers, or even into ways to get ahead of the competition.
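
To make the “pour it in now, decide later” idea concrete, here is a minimal Python sketch of schema-on-read under a few stated assumptions: the lake is simply a local directory (a real one would sit on distributed storage such as HDFS), records arrive as JSON lines, and the helpers land_raw and read_with_schema are hypothetical names chosen for illustration. Records are written exactly as received into Hive-style dt= date partitions, and a schema (here just a list of field names) is applied only when someone reads the data.

```python
import json
import tempfile
from datetime import date
from pathlib import Path
from typing import List


def land_raw(lake_root: Path, source: str, records: List[str], day: date) -> Path:
    """Write records exactly as received; no schema is enforced on write."""
    # Hive-style partition directories (source, then dt=YYYY-MM-DD) keep raw data findable later.
    partition = lake_root / "raw" / source / f"dt={day.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    path = partition / "part-0000.jsonl"
    with path.open("a", encoding="utf-8") as f:
        for record in records:
            f.write(record + "\n")
    return path


def read_with_schema(lake_root: Path, source: str, fields: List[str]) -> List[dict]:
    """Apply a schema only at read time: keep the requested fields, skip what doesn't parse."""
    rows = []
    for path in sorted((lake_root / "raw" / source).glob("dt=*/*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed data stays in the lake; this particular reader ignores it
            if isinstance(record, dict):
                rows.append({name: record.get(name) for name in fields})
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        # Land everything today, including a line that isn't even valid JSON.
        land_raw(lake, "web_clicks",
                 ['{"user": "a1", "page": "/pricing", "ms": 420}',
                  '{"user": "b2", "page": "/blog"}',
                  "not even json"],
                 date(2024, 1, 15))
        # Months later, someone decides they care about users and pages.
        print(read_with_schema(lake, "web_clicks", ["user", "page"]))
```

Because nothing is rejected at write time, even the malformed line stays in the lake; each reader decides what it needs. Here the reader gets back only the user and page fields from the two records that parse.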

2. Find a Good Data Scientist (Hint: They Don’t Exist)

Data scientists are like elves and leprechauns. You hear a lot about them, but darn if you ever meet one in person. The good news is that a good data scientist can be developed instead of hired, and many companies find that a team is more successful than a single data scientist anyway. Look for strong mathematical skills, particularly in statistical analysis. Add some savvy programming talent, and round out the mix with someone who has a good grasp of the business side of things. Sprinkle with salt and pepper, allow to marinate for a few months (preferably while studying Hadoop and data analytics), and Bam! A data scientist will emerge to help you develop and manage your data lake.

3. Decide on a Platform for Your Data Lake (Hint: It’s Hadoop)

Hadoop isn’t fast, it isn’t easy, and it is not necessarily cheap. But it is highly effective for managing and analyzing enormous sets of unstructured data, such as you’ll be dealing with in your new data lake. You can search and search, but you won’t find a better option for data crunching and munching than Hadoop on full metal. Just remember, it will take time to get a handle on, especially if you’re home-baking a data scientist or two.
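
Much of Hadoop’s “crunching and munching” follows the MapReduce model: a map step emits key/value pairs, the framework sorts and groups them by key (the shuffle), and a reduce step aggregates each group. The sketch below illustrates that model with the classic word count, written in the style of Hadoop Streaming, where the mapper and reducer exchange tab-separated text lines. It is plain Python that simulates the shuffle locally with sorted(); it does not call any Hadoop API, and the function names are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map step: emit a tab-separated "word<TAB>1" line for every word seen."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_pairs: Iterable[str]) -> Iterator[str]:
    """Reduce step: sum the counts for each word.

    Expects input sorted by key, which is what the shuffle phase provides.
    """
    parsed = (pair.split("\t", 1) for pair in sorted_pairs)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> shuffle (sort) -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["error disk full", "warn disk slow", "error network down"]
    for result in run_locally(sample):
        print(result)
```

On a real cluster the mapper and reducer would run as separate processes on many nodes at once, which is where Hadoop earns its keep on enormous data sets; the local simulation only shows the shape of the computation.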

4. Find New Sources to Feed Your Data Lake

One of the most powerful advantages of a data lake is the ability to stream in lots of different data from disparate sources without having to worry about how it will be used until much later. Stream away! Begin offloading data from various systems, and your data scientists will soon find uses for most, if not all, of it.
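
Feeding the lake from many systems is easier when every incoming record gets the same thin wrapper, whatever its format. The following sketch assumes a hypothetical wrap_for_lake helper that records where the data came from and when it arrived while leaving the payload itself untouched; binary data is base64-encoded so it survives as JSON text. The envelope field names are illustrative, not a standard.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Optional, Union


def wrap_for_lake(source: str, payload: Union[str, bytes],
                  received_at: Optional[datetime] = None) -> str:
    """Wrap an arbitrary payload in a small metadata envelope (one JSON line).

    The payload is not parsed or validated: text is stored as-is and
    binary data is base64-encoded so it round-trips unchanged.
    """
    received_at = received_at or datetime.now(timezone.utc)
    if isinstance(payload, bytes):
        body, encoding = base64.b64encode(payload).decode("ascii"), "base64"
    else:
        body, encoding = payload, "text"
    return json.dumps({
        "source": source,                         # which system sent it
        "ingested_at": received_at.isoformat(),   # when it arrived
        "encoding": encoding,                     # how to recover the original bytes/text
        "payload": body,                          # the untouched data itself
    })


if __name__ == "__main__":
    # Hypothetical arrivals from three very different systems.
    arrivals = [
        ("crm_export", "id,name,region\n42,Acme,EMEA"),       # CSV text
        ("web_clicks", '{"user": "a1", "page": "/pricing"}'),  # JSON text
        ("plant_sensor", bytes([0x01, 0xFF, 0x10])),           # raw binary
    ]
    for source, payload in arrivals:
        print(wrap_for_lake(source, payload))
```

Keeping the envelope minimal means a new source can start streaming in immediately; interpreting the payload is deferred until someone actually needs it.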

5. Stay on Top of Capacity Planning

Good capacity planning ensures that your data lake doesn’t eventually overflow its banks.

The thing about data lakes is that, since you’re essentially pouring in everything but the kitchen sink (and perhaps even that), the volume of stored data tends to grow much larger, and much faster, than in a typical data warehouse. That’s why it’s important to stay on top of capacity planning. Partner with a solid DBaaS provider that can offer the scalability you need to maintain and manage your data lake.
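
Capacity planning often comes down to simple compound-growth arithmetic: if stored data grows by a fraction g each month from U0 terabytes, usage after n months is U0 * (1 + g) ** n, and the lake is full at the first n where that reaches capacity C. The sketch below uses a hypothetical months_until_full helper and made-up figures, and it steps through the months rather than using the closed form ceil(ln(C / U0) / ln(1 + g)), which sidesteps the zero-growth case and floating-point edge cases at exact boundaries. Real growth is rarely this smooth, so treat the result as a planning estimate.

```python
from typing import Optional


def months_until_full(current_tb: float, monthly_growth: float,
                      capacity_tb: float, max_months: int = 600) -> Optional[int]:
    """Return the first month in which projected usage reaches capacity.

    Assumes steady compound growth: usage(n) = current_tb * (1 + monthly_growth) ** n.
    Returns 0 if the lake is already full, or None if it doesn't fill within max_months.
    """
    if current_tb >= capacity_tb:
        return 0
    used = current_tb
    for month in range(1, max_months + 1):
        used *= 1 + monthly_growth  # apply one month of compound growth
        if used >= capacity_tb:
            return month
    return None  # zero or negative growth, or growth too slow to matter within the horizon


if __name__ == "__main__":
    # Hypothetical figures: 100 TB today, 8% growth per month, 500 TB provisioned.
    print(months_until_full(100, 0.08, 500))  # prints 21
```

With these figures, usage is about 466 TB after 20 months and about 503 TB after 21. If new capacity takes, say, three months to provision, the order needs to go in well before month 18.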

A data lake will serve as your repository for the data you need heading into the future.
