Hocus Pocus: Is it Data Science, or Is It Magic?
Back in the mid-1990s, IBM built a machine called Deep Blue. In 1996, Deep Blue stunned the world by winning a game against reigning world chess champion Garry Kasparov under standard tournament time controls, and in 1997 it went on to win a full match against him. The world was (and continues to be) awed by this feat. To those outside the world of computers and AI, it seems like, well, magic.
But what if Deep Blue were to play backgammon? Or checkers? Or poker? In reality, Deep Blue likely wouldn't do well at all, because the machine was built specifically to excel at chess. Unlike a person, who can be good at many things at once, computers aren't that flexible. They are designed to do one thing, and they do that thing incredibly well. They just can't carry that skill over to other areas of life, the way a person could easily take strategy skills learned in chess and apply them to Risk, Yahtzee, or Scrabble.
Many of the things that big data, data science, and AI or machine learning can do seem like a black art at the least and utter witchcraft at most. Big data is helping researchers fight cancer, helping people find others like themselves to hang out with, helping stores stock up on the most popular items before a big storm, and even helping women find better-fitting bras. Yeah, some of that definitely sounds like magic.
Though the data scientist is a highly sought-after and rarely found creature, what data scientists do isn't magic at all. In fact, the daily life of the average data scientist is rather mundane. Here's what it takes to make all those mountains of data conjure up magically useful insights.
Data Science is a Lot of Meetings & Research
According to data scientists in the trenches, the process usually begins with a meeting to determine exactly which questions the data needs to answer or which problems it needs to solve. This isn't always cut and dried. After meeting with the customer (either internal customers within the company or external clients), the data scientist usually has to do further research to fully understand the issues.
But understanding the issues is just the first step. The data scientist then has to figure out which data holds the answer and, in many cases, how exactly to get that data. Sometimes the data needed to answer the questions isn't already sitting in the company's Hadoop infrastructure. A source for the data has to be found, along with a way to acquire it. All of this happens before the data scientist can even begin to query the data.
Data Science is a Lot of Number Crunching
Once the data is in place, the data scientist has to develop the right algorithm(s) to get the answers needed. Speaking to InfoWorld magazine, one data scientist estimated that about half of her time is spent meeting with people to learn what they need from data analytics. Another 20 percent goes to the actual computations needed to produce the findings.
Data Science Involves a Lot of Time Interpreting the Findings
As much as 30 percent of a data scientist's time can be spent just interpreting the findings. Raw results rarely come out in a usable form. It takes hours of analysis to put them into a format that helps people understand what the analytics has discovered, and much of that work is what data scientists call data visualization.
Before data scientists can even begin assembling data or running analytics, they have to build a flexible, scalable, practical, and secure data infrastructure. That's where Bigstep can help. For a limited time, you can discover the first Bare-Metal Data Lake as a Service in the world. Get 1TB free for life - limited to 100 applicants.