Technically Speaking

The Official Bigstep Blog


Interview: Matt Pfeil of DataStax on Apache Cassandra and Big Data

Everybody knows that big data is big business, but few know why - let alone how best to take advantage of it.

DataStax is built around Apache Cassandra, the open-source NoSQL database software that lets people analyze data in real time.

Matt Pfeil took a moment to tell us the origin story of DataStax, educate us about Cassandra and some of its applications, and give us some predictions on where this exciting area of research is heading.

How did DataStax get started? Where are you based? What need did you see that had to be filled?

DataStax was started in the spring of 2010. Jonathan Ellis, my co-founder, and I were both working at Rackspace. We were working on the open source project Cassandra while there, and one day Jonathan informed me he would be leaving Rackspace to create a company around it. I took him out to lunch to convince him not to leave; he talked me into leaving with him.

Cassandra is an open source, NoSQL database. It’s a transactional database, so it powers a lot of the companies we work with.

Because it’s open source, Cassandra has had adoption with a lot of companies. For those companies to continue their usage, they needed a commercial entity to be behind it. DataStax bridged that gap.

Who are some of your main clientele, and how is DataStax uniquely set up to fill their needs?

Netflix, Comcast, Adobe and eBay are a few of our customers today.

DataStax combines strong in-house Cassandra talent with products that enterprises want to use to fully embrace their Cassandra experience. On the services front, we offer things like training, architecture and design consultation, and support. But our primary offering is DataStax Enterprise, which offers advanced functionality for Cassandra with things like quality control, search, analytics, and even management software.

On the introduction video at your website, DataStax talks about “the new wave of enterprises that are powered by data-driven DNA that lets them engage with their customers like they never have before.” Can you give us a few brief real-world examples of this?

I’ll give you two. When you walk into my condo in Austin, the two most boring pieces of equipment should be my scale and my thermostat. But they’re both connected to the Internet, and they’re constantly capturing data - like my weight, or the inside temperature - to make the user experience better as I continue to use them.

As Marc Andreessen said, “Software is Eating the World” and the database is the key component of every software stack.

In that same video, DataStax talks about “not just keeping up, but leaping ahead.” Can you talk about some instances where Big Data is being used for predictive technologies? What are some ways that companies can use this data to predict their customers’ needs?

One of our large use cases involves recommendation engines. For e-commerce platforms, it used to be that if a customer bought item A, recommend B. If someone buys a TV, recommend a DVD player (I’m showing my age).

But if the person is my mom, she and I have very little in common. Recommend the DVD player for her, but recommend a Sonos for me. In other words, for every product in your inventory, have a unique potential recommendation for every potential client of your store.

That’s a lot of data. And it takes a new type of database to do that.
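To make the scale of that shift concrete, here is a minimal Python sketch. It is not DataStax's or Cassandra's implementation, and every name and number in it is a hypothetical chosen for illustration. It contrasts a single rule per product with a rule per customer-product pair, and counts how many entries a full table of each kind would hold.

```python
# Illustrative sketch only: all names and data below are hypothetical.

# Old approach: one recommendation per product, shared by every customer.
item_rules = {"tv": "dvd_player"}

# Personalized approach: one recommendation per (customer, product) pair.
personal_rules = {
    ("mom", "tv"): "dvd_player",
    ("matt", "tv"): "sonos",
}


def recommend(customer, product):
    """Prefer a personalized recommendation, else fall back to the item rule."""
    personalized = personal_rules.get((customer, product))
    if personalized is not None:
        return personalized
    return item_rules.get(product)


def table_sizes(num_products, num_customers):
    """Return (item-rule entries, personalized entries) for fully filled tables."""
    return num_products, num_products * num_customers
```

Under an assumed catalog of 10,000 products and 1,000,000 customers, `table_sizes(10_000, 1_000_000)` gives 10,000 item rules against 10,000,000,000 personalized entries, which is the jump in data volume the answer above is pointing at.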

In the video, DataStax talks about how “home appliances become lifestyle changers” and shows a graphic of a refrigerator and an automated grocery order. What are some examples of this that are already getting popular? Secondly, in your opinion or experience, what effect will the Internet Of Things have on Big Data, particularly as it applies to businesses?

Cars are a great example. Aeris provides an IoT platform for both automakers and drivers for a better experience, based on Cassandra.

In a nutshell, they let the automaker keep track of quality of the car for better reliability, and also engage the customer so he or she can do something like unlock the front door or adjust the windows remotely.

DataStax delivers information from online giants like Google and Amazon to a client, no matter what their size. What are some of the implications of small, independent businesses having access to the same resources as the giants? What are some ways that smaller companies are even at an advantage to larger corporations?

In this day and age, it takes fewer resources than ever to start a company. Thanks to both open source and the rise of the cloud, companies can get started with thousands of dollars instead of millions.

At DataStax, we offer a startup program for small, emerging companies. If you’re a startup, you get our enterprise grade software 100% free. It’s that simple.

Startups have the advantage of being small and nimble. There aren’t a lot of bureaucratic processes to follow, so they can move swiftly. And because they’re new, they’re free to explore new ideas without considering existing business.

For a company selling a product or service, what are a few metrics you would recommend keeping an eye on?

Always watch your bank account. Cash on hand is a big one, and burning through cash without knowing when it ends is a recipe for disaster.

But on a more positive note, look at your top 1 or 2 initiatives and figure out the 1 or 2 metrics for those that are numeric and can be measured. And watch those daily. If they’re not going in the direction you want, admit to yourself something isn’t working and investigate why. Don’t be afraid to fail.

For example, if you’re a freemium company, watch your free user growth. And watch your number of paid users and determine the conversion ratio. As you grow, if one of those slips, figure out what changed as quickly as possible.
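As a rough illustration of tracking that freemium metric, the Python sketch below computes a conversion ratio and flags a drop between two readings. The interview does not define the ratio precisely, so this assumes it means paid users divided by all users (free plus paid). The function names and the `tolerance` parameter are hypothetical.

```python
def conversion_ratio(free_users, paid_users):
    """Share of all users who pay. Assumes ratio = paid / (free + paid)."""
    total = free_users + paid_users
    if total == 0:
        return 0.0  # no users yet, so there is nothing to convert
    return paid_users / total


def slipped(previous, current, tolerance=0.0):
    """True if the metric fell by more than `tolerance` since the last reading."""
    return current < previous - tolerance
```

For instance, 900 free users and 100 paid users give a ratio of 0.1. If the next reading comes in at 0.08, `slipped(0.10, 0.08)` returns `True`, which is the cue to figure out what changed.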

For people who are just getting started with Big Data, how steep is the learning curve? How hard is it to begin? What are a few baby steps a company can take to begin integrating Big Data and its possibilities into their current operations?

In 2010, it was fast. We used to tell people they could get up to speed in about a month or two of work.

It’s 2015, and it’s even faster. We see companies start their first project and go into production as little as a quarter later. If you’re new to it, pick a small, manageable project and start playing with it. You’ll have a short-term success and set the stage for your next round of learning. Big Data is a big deal and it’s not going away, so expect to iterate for quite a long time.

Can you tell us a little bit about what Apache Cassandra is and what it does? Is Cassandra available to the public - and if so, how hard is it to understand?

Cassandra is absolutely available to the public. Cassandra is an enterprise grade NoSQL database. NoSQL is a transactional - online - database built for the Big Data age. It’s the database that companies use to power their core business or function, and it offers advantages like increased uptime and flexible data models so businesses can harness Big Data as easily and as powerfully as possible.

DataStax talks about how becoming an online enterprise is imperative, that it creates dynamic interaction, and it makes every customer like your only customer. First of all, can you describe a bit of what you mean by dynamic interaction and how that can be leveraged for greater sales? Lastly, what are some reasons why any business, great or small, can’t afford to overlook the potentials of data-driven technologies?

As I mentioned in the recommendation example earlier, you can utilize data for better customer and user experiences. But if you look at the history of databases, they can be divided into one of two camps - transactional or analytical.

Transactional is the one that powers the application; analytical is the one that’s used for business intelligence.

There’s always been a feedback loop between them. In the past, that loop could be days if not weeks before results from the analysis were fed back into the online system. In this day and age, they’re merging. DataStax offers analytics as part of its enterprise grade edition of Cassandra so our customers can perform analysis during the transaction to make the experience that much greater - in real time.

For more updates from DataStax, like them on Facebook, follow them on Twitter, and connect with them on LinkedIn.
