Expert Interview with Daniel D. Gutierrez on Big Data Headlines
For IT teams to ensure they stay ahead of the curve on security and new technology related to big data, they need to start putting together their data science teams now, says Daniel D. Gutierrez, Managing Editor of insideBIGDATA.com.
“Unfortunately, many companies try to hold out for a single ‘unicorn,’ a person with all the skills of a data scientist and big data engineer,” Daniel adds. “But this approach is unrealistic. A data science effort needs a broad range of skill sets not found in a single person.”
We recently checked in with Daniel to get more of his insight on the most interesting big data news, trends and headlines he’s following today. Here’s what he had to say:
Tell us about insideBIGDATA. Who should be following your site? What will they find on it?
insideBIGDATA is a leading source of news for the big data industry including data science, machine learning, predictive analytics, data visualization, use case examples and much more. We have a number of compelling channels that readers in this space will find useful, including “Ask a Data Scientist,” “Data Science 101,” our Industry Perspectives section, and our numerous free technology guides sponsored by top tier vendors.
In your bio, you say that you were involved with big data long before it came into vogue. How long has the big data concept been around? What did it look like at its inception?
In addition to being a big data technology journalist, I am also a practicing data scientist. I’ve seen data science grow up from the days of “data mining” and “KDD,” and I’m pleased to see the field’s current incarnation centered around big data. Big data is relatively new due to the increased volume, velocity and variety of data assets, but data science is founded on very old disciplines such as computer science, AI, mathematical statistics and probability theory. What’s happened lately is that the hardware capabilities have caught up with the demand for analyzing extremely large data sets.
How has it changed since you first became interested in it?
The primary change is that more and more enterprises are realizing they can increase the value of their data assets by utilizing data science to make predictions and discover knowledge in the data. Actionable insights are a reality today! It is a very exciting time for someone like me who has seen the field grow exponentially in the past few years.
What headlines are followers of your site talking about the most these days?
The biggest headlines are found around the Hadoop arena, but now Apache Spark is seriously challenging Hadoop as the distributed processing architecture of choice. The next year or so will see a lot of change based on rapidly evolving technologies.
What big data news or trends do you think are overrated, overreported or overblown?
As a journalist, I see a lot of vendors trying to establish very loose affiliations with “big data.” Some stretch the relationship to a great degree. But I guess everyone wants to climb aboard the big data bandwagon.
From your perspective, which companies seem to be leveraging the power of big data with the most interesting or exciting results? What can we learn from them?
I think the companies in the Spark arena are the most exciting to me in the way they’re leveraging the power of big data. Companies like Databricks are way ahead of the pack. I am watching such companies closely. Databricks had its genesis in academia, so I think it offers an important lesson in how to productize research.
Where do you think companies fail when it comes to strategically using the data they collect?
I see enterprises fail with their data when they don’t fully respect what data science can deliver to their bottom line. This happens because C-level people aren’t fully primed on the technology. As a journalist and consultant, I’m trying to change that.
What are the biggest dangers or risks associated with collecting and storing massive amounts of data? What should companies be wary about?
Of course, there are the privacy and security issues that have received much press these days. That’s why I feel big data security is not getting the amount of play it should. This will change in the next year.