Big Data In Science
A retail store needs big data to track its merchandise. A manufacturer leverages big data to streamline its production process; a marketing team mines it for better campaign strategies. At the bottom of all this lies one imperative: profit. So how is science using big data?
Well, science needs big data for a lot of things, but the underlying thread is the search for truth. Getting there, though, can be a tricky business, because precision matters, and precision is the outcome of many, many, MANY observations. Take CERN, for instance.
The Large Hadron Collider at CERN generates around 500 exabytes daily. That’s 500 billion gigabytes of data every single day. But 99.99% of this behemoth of information is discarded as noise, and the fraction that remains is not consolidated on large traditional storage arrays, but divided across many parallel processing nodes using technologies like Hadoop. This arrangement means that teams have to share resources. A contemporary enterprise rarely sees inter-department cooperation when it comes to crunching data. At CERN, the standard procedure is for everyone to globally “pitch in”.
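The discard-then-distribute idea can be sketched in a few lines. This is only a toy illustration, not CERN’s actual trigger pipeline: the event structure, the signal threshold, and the four simulated nodes are all invented for the example.

```python
from functools import reduce

# Toy events: (event_id, signal_strength). Real detector events are far
# richer; this flat structure is invented purely for illustration.
events = [(i, (i * 37) % 100) for i in range(10_000)]

SIGNAL_THRESHOLD = 99  # hypothetical cut: keep only the strongest events

def map_phase(chunk):
    """Each simulated 'node' filters its own slice, discarding noise."""
    return [e for e in chunk if e[1] >= SIGNAL_THRESHOLD]

# Split the event stream across 4 simulated processing nodes, Hadoop-style.
chunks = [events[i::4] for i in range(4)]
kept_per_node = [map_phase(c) for c in chunks]

# Reduce phase: merge each node's survivors into one result set.
survivors = reduce(lambda a, b: a + b, kept_per_node, [])

print(f"kept {len(survivors)} of {len(events)} events "
      f"({100 * (1 - len(survivors) / len(events)):.2f}% discarded)")
```

Even in this miniature version, 99% of the synthetic events are thrown away before any analysis happens, which is the essential shape of the real arrangement: filter aggressively first, then fan the remainder out across shared nodes.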
Making use of computing power and sharing the findings are two different things. CERN may be an exception in doing both, but across science, communal use of the data collected in a field is still a long way from becoming reality. In molecular biology and chemistry, for instance, it simply does not happen. Competition breeds secrecy. And at the heart of competition lies, again, the search for profit. Yet a sense of community among scientists would do the rest of the world a lot of good. In Timo Hannay’s words,
“If institutions and funders were to give more credit to open sharing of research data, scientific progress would accelerate and we would all benefit.”
Another problem is the addressable market. While business enterprises enjoy polished, dedicated software, scientists represent a tiny niche. A big software company sees little point in investing serious effort to build tools for such a small and specialized market. Many scientists still build their own software.
“There are around 7 million researchers in the world, making them about 0.1% of the human population.”
Add to that the fact that science, in the age of big data, requires constant scrutiny from peers to evaluate the quality of the research. Christie Aschwanden tells an interesting story about how difficult it is to keep in check the deliberate frauds that give scientists a bad name. It’s easy to manipulate your data even without meaning to, so it’s no wonder that some less scrupulous people distort their results, or even invent them as they go, to serve their own agendas. Double-checking every paper before publication requires an insane amount of work.
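How easy is it to “manipulate your data even without meaning to”? A small back-of-the-envelope calculation makes the point: an honest researcher who tests enough hypotheses will almost certainly stumble on a spurious result. The numbers below (20 tests, the conventional 0.05 significance level) are illustrative choices, not figures from the article.

```python
# If a researcher tests 20 independent hypotheses that are all actually
# false, each test still has an alpha = 0.05 chance of looking
# "significant" by pure luck. The chance of at least one spurious
# discovery across all 20 tests is 1 minus the chance of zero.
alpha = 0.05
tests = 20

p_at_least_one_false_positive = 1 - (1 - alpha) ** tests
print(f"{p_at_least_one_false_positive:.0%}")  # → 64%
```

A roughly two-in-three chance of a false “discovery”, with no fraud anywhere in sight. This is exactly why results need the peer scrutiny the paragraph describes, and why providing it at scale is such hard work.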
“If we’re going to rely on science as a means for reaching the truth — and it’s still the best tool we have — it’s important that we understand and respect just how difficult it is to get a rigorous result.”