Hadoop Overview: What is It? What is It Used for? Do You Need It?
The concept of big data isn’t new at all. The conception point of modern big data can likely be traced back (according to Forbes) to the identification of the “information explosion” in 1941. What has changed recently isn’t just the volume, variety, and velocity of data, but also the tools that we have to store and analyze it. Volume, variety, and velocity are certainly growing faster today as well, since the bulk of big data is now produced by machines, not humans.
Where Did Hadoop Come From?
While there are numerous databases, data storage solutions, data processing systems, and other tools and products available to work with big data today, the one that has become practically synonymous with big data is Hadoop. Hadoop was first developed in 2005 as an open source platform for distributed computing. One hallmark of Hadoop is its scalability. Even the largest collections of data you can imagine (like all of the Facebook posts, Twitter feeds, and Google searches combined) can be stored and processed in a cost-effective manner. The other hallmark of Hadoop is its cost savings. Without it (or a really good substitute for it), few organizations could afford to collect, store, and process such quantities of data.
What Does Hadoop Do?
Hadoop and its related products (most of them open source, and many of them Apache projects) are collectively called the Hadoop ecosystem. We could discuss that ecosystem, the internal workings of Hadoop, and the best companion products forever, but it is more useful to understand how and why people have turned to Hadoop en masse for their big data projects.
Why is Hadoop Different?
Aside from being highly scalable and affordable, Hadoop is efficient. The more recent versions of Hadoop have also added a tremendous amount of reliability. Hadoop users of old (well, as old as you can be with a product dating to 2005) remember that almost as many Hadoop jobs crashed as returned answers. That’s no longer an issue. Hadoop uses multiple nodes in parallel to crunch data and perform analytics. It performs much of the computation on the storage nodes themselves, which cuts the delays caused by moving data between storage and compute nodes. Since most of the data doesn’t have to move among servers, the network is far less likely to be overloaded or to add latency.
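To make the idea of computing where the data lives a little more concrete, here is a minimal sketch in plain Python. It is a toy simulation, not Hadoop’s actual API: the node names, the sample text, and the function names are all made up for illustration. Each “node” counts words in its own block of data, and only the small per-node summaries are combined at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster: each "node" holds its own block of the data set.
# Node names and contents are invented for this illustration.
NODE_BLOCKS = {
    "node-1": ["big data is big", "hadoop stores data"],
    "node-2": ["hadoop processes data in parallel"],
    "node-3": ["data stays where it is stored"],
}


def map_on_node(lines):
    """Count words in the block stored on one node (the local 'map' step)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Merge the small per-node summaries into one result (the 'reduce' step)."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(node_blocks):
    """Run the map step for every node in parallel, then reduce the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_on_node, node_blocks.values()))
    return reduce_counts(partials)


if __name__ == "__main__":
    result = word_count(NODE_BLOCKS)
    print(result.most_common(3))
```

In this sketch the raw lines never leave their node; only the compact word counts are merged. A real Hadoop job still moves some data between nodes, but the same principle is what keeps network traffic down.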
Hadoop can be scaled dynamically. Every machine that is added to the Hadoop environment increases the amount of storage and compute power available. In other words, you can theoretically just keep adding machines, creating a Hadoop environment as large as you need it to be. Hadoop is also flexible. It is capable of processing both structured and unstructured data.
However, Hadoop isn’t suitable for solving just any problem. When working with smaller sets of data, traditional analytics would be much more practical. Using Hadoop for data sets like a single warehouse’s inventory history or the transactional data from a single chain of retail stores would be like taking a space shuttle to the local mall. Hadoop only becomes practical with enormous sets of data, especially data that is varied, such as data from numerous streams in different formats.
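As a rough illustration of both points, scaling by adding machines and the poor fit for small data sets, here is a back-of-the-envelope sizing sketch in Python. The function names and the disk sizes are assumptions made up for this example. The replication factor of 3 reflects the HDFS default of keeping three copies of each block; real capacity planning involves many more factors.

```python
import math


def usable_capacity_tb(nodes, disk_per_node_tb, replication=3):
    """Approximate usable storage: raw disk divided by the number of copies kept."""
    return nodes * disk_per_node_tb / replication


def nodes_needed(dataset_tb, disk_per_node_tb, replication=3):
    """Approximate number of nodes needed to hold a data set with replication."""
    return max(1, math.ceil(dataset_tb * replication / disk_per_node_tb))


if __name__ == "__main__":
    # Hypothetical nodes with 48 TB of disk each.
    print(usable_capacity_tb(10, 48))  # capacity of a 10-node cluster
    print(usable_capacity_tb(11, 48))  # one more node adds capacity
    print(nodes_needed(0.5, 48))       # a small data set fits on one machine
    print(nodes_needed(2000, 48))      # a very large one needs a real cluster
```

Under these assumed numbers, a half-terabyte data set fits on a single machine, which is where traditional tools are the more practical choice, while a 2,000 TB data set calls for well over a hundred nodes.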
What Do People Use Hadoop For?
Some of the most common use cases for Hadoop include:
• Threat analysis: From network monitoring to detecting bank or insurance fraud, Hadoop is making waves in the realm of identifying threats as they happen.
• Risk modeling: Using historical data and current trend data analyzed in Hadoop, investors can better judge which risks are worth taking and which investments are better off skipped.
• Improving marketing efforts: Hadoop can run analytics on consumer and customer behavior and help marketers make accurate predictions about which tactics are most likely to trigger a consumer to buy a product.
• Recommending products to customers: Have you ever noticed how on target the recommendations of your favorite e-commerce store are when you’re ready to check out with a product? Many of those are generated using Hadoop.
Does Hadoop sound like the right solution for you and your business? If so, you can get started right away. Visit Bigstep to see how other customers have leveraged our products for Hadoop success, or get started for yourself now.