Technically Speaking

The Official Bigstep Blog

 

Getting the most out of Impala

We have teamed up with Cloudera to analyse ways of working with Impala that optimise for both performance and budget. We will be sharing our experiences with you at the next London Enterprise Technology Meetup on Monday, May 12th, starting at 7pm in the Wolfson Theatre of the New Academic Building, 54 Lincoln’s Inn Fields.

Getting more performance out of any application is easy when the budget can grow. But what happens when we face the opposite challenge? How can we do more with less? One usual suspect, probably the most notorious source of performance bottlenecks, is I/O. So that’s where we started.

The setup

We used a setup of Cloudera Impala on 10 instances, each with 20 physical CPU cores, 192 GB of RAM, and 4 x 10 Gbps ports (our FMCI 20.192), and we tested using the TPC-DS decision-support benchmark.

To track I/O bottlenecks, we looked at two alternative ways of deploying Impala:

A. Using local storage – 8 x 1 TB enterprise drives at 7.2K RPM per instance, for a total of 80 drives
B. Using Bigstep’s Full Metal Solid Storage – an all-SSD distributed storage system. The instance cluster was connected to the storage array with one 10 Gbps link per machine.

In both scenarios, the instances in the cluster were interconnected in a single LAN, each with one 10 Gbps link.
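
To put the two deployments side by side, here is a minimal sketch that adds up the cluster-wide resources from the figures above. The class and field names are our own illustration, not part of any Bigstep or Cloudera tooling, and the storage bandwidth is only the theoretical line rate of the 10 Gbps links (10 Gbps / 8 = 1.25 GB/s per machine), not a measured result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical helper; the defaults are the figures quoted in this post."""
    instances: int = 10
    cores_per_instance: int = 20
    ram_gb_per_instance: int = 192
    drives_per_instance: int = 8      # scenario A: local drives
    drive_capacity_tb: int = 1
    storage_link_gbps: int = 10       # scenario B: one link per machine

    def totals(self) -> dict:
        """Return cluster-wide totals derived from the per-instance figures."""
        return {
            "cpu_cores": self.instances * self.cores_per_instance,
            "ram_gb": self.instances * self.ram_gb_per_instance,
            "local_drives": self.instances * self.drives_per_instance,
            "local_raw_tb": self.instances * self.drives_per_instance
                            * self.drive_capacity_tb,
            # Theoretical line rate only: gigabits per second / 8 = gigabytes per second.
            "storage_link_gbytes_per_s": self.instances * self.storage_link_gbps / 8,
        }


if __name__ == "__main__":
    for name, value in ClusterSpec().totals().items():
        print(f"{name}: {value}")
```

With the defaults, this gives 200 cores, 1,920 GB of RAM, 80 local drives with 80 TB of raw capacity, and an upper bound of 12.5 GB/s between the cluster and the SSD storage array.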

For those new to Impala, it is a massively parallel processing (MPP) SQL query engine from Cloudera that runs natively in Apache Hadoop and enables users to directly query data stored in HDFS and Apache HBase, without requiring data movement or transformation.

The speakers

Join us on May 12th at the London Enterprise Technology Meetup to see the results and to learn how you can optimise your big data infrastructure to provide more insight in less time.

Who will be speaking:

• Cloudera: Graham Gear - EMEA Director of Systems Engineering
• Bigstep: Alex Bordei - Product Manager

We look forward to seeing you there.
