OrientDB Interview On Multi-Model DBs Management For Big Data
Speed, efficiency, and ease of visualizing information are some of the greatest opportunities Big Data offers. That speed and efficiency go straight out the window when developers constantly have to flip between tabs and open new software - not to mention when they can't understand what they're looking at.
Having a complete, comprehensive database management system with built-in graphs is an obvious solution. OrientDB gives developers everything they need in one place to make the most of the data sets at their fingertips.
OrientDB’s CEO and Founder Luca Garulli took a moment to tell us about OrientDB - what it is and what it does - and why it’s not technically just a NoSQL DBMS but rather a multi-model database management system.
Can you explain the premise behind the name OrientDB?
I created an ODBMS with index-free adjacency (GraphDB technology) in 1998. It was written in C++ and named “Express.” After discovering that a DBMS with the same name already existed, I changed it to Orient ODBMS (taken from the famous Orient Express passenger train). In 2009, I took into account all the lessons I’d learned over the last 11 years and built a completely new database from the ground up with new schema-less capabilities. It was written in Java to encourage open-source contributions, and pioneered the concept of multi-model databases that’s now becoming mainstream. After merging my “Orient ODBMS” and “Orient KV” DBMS, the final product combined the flexibility of documents with the connectedness of graphs and the simplicity of key value: OrientDB.
As far as your origins, you talk about developers growing tired of sacrificing speed and flexibility or supporting several Database Management Systems to satisfy their needs. What are some of these compromises developers are having to make? What are some of the risks and nuisances of having to run several DBMSs?
Every technology makes tradeoffs, and new up-and-coming databases in particular focus on some specific features and deliberately ignore others. Some products sacrifice functionality for performance or relax the durability of data in favor of something else. Most of the NoSQL solutions out there need to be used together with others to fill in the missing features. For large enterprises, adopting too many technologies is a risk and a cost in itself. Developers face similar challenges, as they are forced to use multiple products for the same job, which reduces their productivity.
For people who are new to the concept, can you describe a bit what NoSQL is? What are some of the opportunities that may stem from open access NoSQL Databases?
I’m not really a huge fan of the term, so let me just quote Wikipedia on that. “A NoSQL (often interpreted as Not only SQL) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, horizontal scaling, and finer control over availability.” Truth be told, we prefer to say that we’re a multi-model database or distributed graph and document solution. We also provide a SQL interface. In our particular case, the NoSQL term might appear confusing or too generic.
From the get go, OrientDB was built for speed, which you talked about being especially critical for Big Data. First of all, how is OrientDB so fast? Secondly, what kind of ramifications could this have for Big Data? Do you have any estimations of projected savings?
OrientDB is a pure multi-model database. It actually has document, graph and key/value models all built into the core engine. So by design, OrientDB will be faster than other DBMSs with graph or document layers on top of the core engine. OrientDB can process more than 200,000 documents/second on a single machine. But the best part of OrientDB is not insertion, but crossing the relationships. RDBMSs, and other NoSQL products that have JOINs, have O(log N) performance on crossing relationships, because each JOIN is typically resolved through an index lookup. This means that the bigger the database, the slower the traversal will be. These joins are calculated every time a query is executed. I think everyone with RDBMS experience can confirm this. OrientDB, instead, uses no JOINs between records, but rather physical pointers. This is O(1), meaning that the time to traverse a relationship is constant no matter the database size. These physical pointers are created one time. This is huge in the Big Data age.
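To make the difference concrete, here is a minimal sketch in plain Python. It does not use OrientDB or reflect its internals; the `Record` class, the `find_by_index` helper, and the toy data are all assumptions made up for illustration. It contrasts resolving a relationship through a sorted key index (a binary search, O(log N)) with following a reference stored directly on the record (O(1)).

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    key: int
    name: str
    # Direct references to related records (the "physical pointer" idea).
    links: List["Record"] = field(default_factory=list)


def find_by_index(sorted_keys: List[int], records: List[Record], key: int) -> Optional[Record]:
    """JOIN-style lookup: binary search over a sorted key index, O(log N)."""
    pos = bisect.bisect_left(sorted_keys, key)
    if pos < len(sorted_keys) and sorted_keys[pos] == key:
        return records[pos]
    return None


# Toy data set (hypothetical): record i links to record i + 1.
records = [Record(key=i, name=f"r{i}") for i in range(1000)]
for current, following in zip(records, records[1:]):
    current.links.append(following)

# The index is a sorted list of keys, aligned with the records list.
sorted_keys = [r.key for r in records]

# Index-based hop: the key has to be resolved through the index on every query.
via_index = find_by_index(sorted_keys, records, 1)

# Pointer-based hop: follow the stored reference, with no lookup at all.
via_pointer = records[0].links[0]
```

Both hops land on the same record. The difference is that the index lookup does more work as the key list grows, while following the stored reference costs the same at any size.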
You’ve worked with a wide array of clients, from Warner Music Group to Lufthansa. What are some innovative ways that you’ve seen your software used? What are some examples of different kinds of data a specific industry might look at? Where do they go about finding it?
Perhaps the most interesting thing for Enterprises when adopting a Multi-Model DBMS like OrientDB is the simplicity of the entire architecture. Very often, companies need more than one DBMS because most of them are very good only at one thing. This means multiple technologies to learn and integrate with each other. Furthermore, the architecture could become very complex, and it’s hard to predict the final result. Instead, with a Multi-Model like OrientDB, you can satisfy most use-cases with only one DBMS. If you need to scale up, just clone the OrientDB server to other machines to create a cluster of them. Simple, fast and less expensive.
The healthcare industry comes to mind when you mention “most innovative ways.” Ingestible sensors are technology you swallow that are powered by your body. The patch, which is body-worn and disposable, captures and relays your body’s physiologic responses and behaviors. It receives information from the ingestible sensor; detects heart rate, activity, and rest; and sends information to your mobile device. Using a Bluetooth-enabled device, you can access secure applications that display your data in context and support care in a variety of different ways. The data is stored in OrientDB because it’s fast, easily embeddable, developer-oriented, and open-source; and it easily manages relationships and complex data.
On your website, you discuss the similarities and differences between OrientDB and some of the other open source NoSQL DBMSs like MongoDB. Can you briefly recap some of those similarities and differences? What are some circumstances that OrientDB is best suited for?
Many times, to quickly explain OrientDB to people in 5 seconds, I say, “Think about MongoDB + Neo4j and MySQL - on steroids.” MongoDB users can use OrientDB like MongoDB; but with OrientDB, they can decide whether an object is embedded or linked (with a relationship). Coming from the Neo4j world, you find in OrientDB an evolved graph database with Multi-Master replication, sharding, and a query language most people know: SQL.
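The embedded-versus-linked choice can be sketched with plain Python dictionaries. This is an illustration under assumed names and data, not OrientDB's API. An embedded sub-document is a private copy owned by its parent, while a linked one is a single shared record that several documents point to.

```python
import copy

# A hypothetical address record used by two users.
address = {"city": "Rome", "country": "Italy"}

# Embedded: each document owns its own copy of the sub-document.
alice_embedded = {"name": "Alice", "address": copy.deepcopy(address)}
bob_embedded = {"name": "Bob", "address": copy.deepcopy(address)}

# Linked: both documents hold a reference to the same shared record.
alice_linked = {"name": "Alice", "address": address}
bob_linked = {"name": "Bob", "address": address}

# Updating the shared record is visible through every link,
# while the embedded copies keep their original values.
address["city"] = "Milan"
```

After the update, both linked documents see the new city, while the embedded copies still hold the old one. Embedding suits data that belongs to a single parent, and linking suits data that is shared or needs to be traversed.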
You hosted a conference on Big Data & Graphs last year in Rome, where you discussed how graph databases can be disruptive in the Big Data age. Can you describe what you mean by a graph database? Why are they disruptive?
Data is precious, but it’s the relationships that give meaning to data. Think about a social network like Facebook: having user data is nice, but having the data and knowing how each piece relates to the others gives the data much more value. A graph database emphasizes relationships, providing a dedicated language to manipulate them and great speed when traversing thousands of relationships.
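As a rough illustration of what traversing relationships means, the sketch below walks a tiny social graph held in a Python dictionary. The people, the `friends` mapping, and the `within_hops` function are all hypothetical, and a real graph database would express this in its own query language. It finds everyone within two hops of a user with a breadth-first traversal.

```python
from collections import deque

# Hypothetical friendship graph: each person maps to their direct friends.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
}


def within_hops(graph, start, max_hops):
    """Return {person: hop count} for everyone reachable in max_hops or fewer."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_hops:
            continue  # Do not expand beyond the hop limit.
        for neighbour in graph[person]:
            if neighbour not in distances:
                distances[neighbour] = distances[person] + 1
                queue.append(neighbour)
    del distances[start]  # The starting person is not their own friend.
    return distances


friends_of_friends = within_hops(friends, "ann", 2)
```

Here the value lies in the edges: the user records alone would not say that "dan" is a friend of a friend of "ann", but following the relationships does.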
If you had to make a prediction, what are some industries that you anticipate either getting into Big Data in the next 5 years or getting more heavily invested?
I honestly cannot think of one industry that doesn’t already use big data. Not even one. Companies in healthcare, financial services, government, retail, media, entertainment, and telecom all use big data. If they don’t, soon they will most likely be run over by their competitors who will make better business decisions because they have access to big data.
What are some of the applications and implications of Big Data that you are particularly excited and passionate about?
I’ve seen many big data projects where the challenge is determining how the information was related to an event or to another piece of data. Connecting information is like connecting dots to discover the bigger picture. Everything is linked in one way or another, and knowledge is power because it allows you to make better decisions. This is my point of view on Big Data.