Offloading Mainframe Data Into Hadoop: 4 Things You Need to Know
For those who have spent the last decade steeped in cloud computing, virtualized environments, and Hadoop ecosystems, it may come as a shock that some 70 to 80 percent of the world’s business transactions are still handled by mainframes. About 71 percent of Fortune 500 companies are customers of IBM’s tremendously successful System z, the flagship of mainframe computing. The mainframe isn’t dead, and isn’t likely to be anytime soon: mainframes are incredibly stable, unbelievably secure, and deliver an impressive level of performance.
Still, there’s ALL THAT DATA. Even seasoned mainframers are excited about offloading mainframe data into Hadoop to unlock the goodies inside it: better business intelligence, operational intelligence, and customer insight. But getting the data from Point A (hi, mainframe) to Point B (Hadoop) is not a trivial matter. Here are four options to discuss with your mainframe and big data teams as you decide on the best route for offloading your mainframe data.
Option #1: Database Log Replication
This option requires installing software on the mainframe (as well as a receiver on the Hadoop side), so expect to field some questions and concerns (potentially even some wailing and gnashing of teeth) from your mainframe team. Log replication works like this: whenever the database (such as DB2) writes to a table, it also records the change in its transaction log (often called a redo log). The log-replication software reads those log entries, translates them, and sends each change as a message to the receiver, which is responsible for writing it to Hadoop.
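The receiver’s job is easier to see in code. Below is a minimal Python sketch of its apply step. It assumes a hypothetical log reader that emits each change as a `ChangeRecord` carrying five things: a log sequence number, an operation, a table name, a primary key, and (for inserts and updates) the full new row. Real replication products define their own message formats. An in-memory dictionary stands in for the Hadoop-side table.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ChangeRecord:
    """One decoded change, in a hypothetical format a log reader might emit."""
    lsn: int                     # log sequence number: position in the log
    op: str                      # "INSERT", "UPDATE" or "DELETE"
    table: str
    key: Tuple
    row: Optional[dict] = None   # full new row (assumed); None for deletes


class Receiver:
    """Hadoop-side receiver that applies changes in log order.

    An in-memory dict stands in for the real target (HDFS files, Hive, etc.).
    """

    def __init__(self) -> None:
        self.tables: Dict[str, Dict[Tuple, dict]] = {}
        self.last_lsn = -1  # highest log position applied so far

    def apply(self, changes: Iterable[ChangeRecord]) -> None:
        for change in sorted(changes, key=lambda c: c.lsn):
            if change.lsn <= self.last_lsn:
                continue  # already applied, e.g. redelivered after a restart
            table = self.tables.setdefault(change.table, {})
            if change.op in ("INSERT", "UPDATE"):
                table[change.key] = dict(change.row or {})  # upsert the new row
            elif change.op == "DELETE":
                table.pop(change.key, None)
            else:
                raise ValueError(f"unknown operation {change.op!r}")
            self.last_lsn = change.lsn


if __name__ == "__main__":
    receiver = Receiver()
    receiver.apply([
        ChangeRecord(1, "INSERT", "ORDERS", (1,), {"qty": 3}),
        ChangeRecord(2, "INSERT", "ORDERS", (2,), {"qty": 5}),
        ChangeRecord(3, "UPDATE", "ORDERS", (1,), {"qty": 4}),
        ChangeRecord(4, "DELETE", "ORDERS", (2,)),
    ])
    print(receiver.tables)
```

The receiver tracks the last applied sequence number, so it can safely ignore changes that are redelivered after a restart. A real pipeline would persist that position alongside the data it writes.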
Option #2: Flat-File Dumps
This approach dumps tables to flat files on the mainframe and then transfers those files to their destination, typically over FTP. Once a file has arrived, it is renamed, so that it’s obvious the transfer is complete and the file is not still in flight. The transfer can be done either as a push (from the mainframe) or as a pull (from the Hadoop side). On the Hadoop end of things, Spark, Pig, or Hive is used to parse the files and load them into tables. This process can usually be scheduled overnight, or whenever demand on your mainframe resources is lowest.
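The rename-on-completion convention is easy to get wrong, so here is a minimal local-filesystem sketch in Python. The `.part` and `.dat` suffixes, the pipe delimiter, and the function names are all hypothetical. Over FTP, the final step would be a rename on the server (Python’s `ftplib.FTP.rename` issues one). The parsing would happen in Spark, Pig, or Hive rather than in plain Python.

```python
import csv
import os
import tempfile
from pathlib import Path
from typing import List

PART_SUFFIX = ".part"  # hypothetical convention: transfer still in progress
DONE_SUFFIX = ".dat"   # hypothetical convention: transfer complete, safe to load


def deliver(landing_dir: Path, name: str, payload: bytes) -> Path:
    """Sender side: write under a temporary name, then rename once finished."""
    part = landing_dir / (name + PART_SUFFIX)
    done = landing_dir / (name + DONE_SUFFIX)
    with open(part, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before renaming
    os.replace(part, done)    # a same-filesystem rename is atomic on POSIX
    return done


def ready_files(landing_dir: Path) -> List[Path]:
    """Loader side: only pick up files that have been marked complete."""
    return sorted(landing_dir.glob("*" + DONE_SUFFIX))


def load_rows(path: Path, delimiter: str = "|") -> List[List[str]]:
    """Parse one delimited dump file (a stand-in for a Spark/Pig/Hive load)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter=delimiter))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        landing = Path(tmp)
        # A file that is still being transferred: the loader must ignore it.
        (landing / ("orders_0002" + PART_SUFFIX)).write_bytes(b"3|gizmo|1\n")
        deliver(landing, "orders_0001", b"1|widget|3\n2|gadget|5\n")
        for path in ready_files(landing):
            print(path.name, load_rows(path))
```

The design choice that matters is that the loader never looks at a `.part` file. A half-transferred dump can therefore never be loaded, whether the transfer was a push or a pull.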
Option #3: VSAM Copybook Files
Similar to flat-file dumps, you can copy VSAM files off the mainframe and convert them for Hadoop. VSAM records are usually laid out according to a COBOL copybook, which describes each field’s position, length, and type. Conversion tools use the copybook to turn EBCDIC text and packed-decimal fields into something Hadoop can read. Several tools can do this. One is Syncsort, which has been in the biz for some time, has deep expertise, and reportedly offers excellent customer service. Another is LegStar, which has a reputation for being a bit more tedious and, being open source, doesn’t come with much in the way of tech support.
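What does “using the copybook” mean in practice? Here is a minimal Python sketch that decodes one fixed-length record. The copybook, field names, and layout are hypothetical and hand-translated. Tools like Syncsort and LegStar generate this mapping from the real copybook and handle far more cases (REDEFINES, OCCURS, variable-length records). The sketch relies on Python’s built-in `cp037` codec (EBCDIC, US/Canada code page; your shop may use a different one) and decodes COMP-3 packed decimals by hand.

```python
from decimal import Decimal
from typing import Dict, Union

# Hypothetical copybook, hand-translated into the LAYOUT table below:
#   01 CUSTOMER-REC.
#      05 CUST-ID    PIC X(6).
#      05 CUST-NAME  PIC X(10).
#      05 BALANCE    PIC S9(5)V99 COMP-3.
# (field name, offset, length in bytes, kind, implied decimal places)
LAYOUT = [
    ("cust_id", 0, 6, "char", 0),
    ("cust_name", 6, 10, "char", 0),
    ("balance", 16, 4, "comp3", 2),  # 7 digits + 1 sign nibble = 4 bytes
]
RECORD_LENGTH = 20


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode an IBM packed-decimal (COMP-3) field: two digits per byte,
    with the final nibble holding the sign."""
    nibbles = []
    for byte in raw:
        nibbles.extend((byte >> 4, byte & 0x0F))
    sign = nibbles.pop()
    if any(n > 9 for n in nibbles):
        raise ValueError(f"invalid packed-decimal digits in {raw.hex()}")
    value = Decimal("".join(str(n) for n in nibbles)).scaleb(-scale)
    if sign in (0x0B, 0x0D):            # negative sign nibbles
        return -value
    if sign in (0x0A, 0x0C, 0x0E, 0x0F):  # positive/unsigned sign nibbles
        return value
    raise ValueError(f"invalid packed-decimal sign nibble {sign:#x}")


def decode_record(record: bytes) -> Dict[str, Union[str, Decimal]]:
    """Turn one fixed-length EBCDIC record into a Python dict."""
    if len(record) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(record)}")
    out: Dict[str, Union[str, Decimal]] = {}
    for name, offset, length, kind, scale in LAYOUT:
        raw = record[offset:offset + length]
        if kind == "char":
            out[name] = raw.decode("cp037").rstrip()  # EBCDIC text, trailing blanks trimmed
        else:
            out[name] = unpack_comp3(raw, scale)
    return out


if __name__ == "__main__":
    sample = ("ABC123".encode("cp037")
              + "JANE DOE  ".encode("cp037")
              + bytes([0x12, 0x34, 0x56, 0x7D]))  # packed form of -12345.67
    print(decode_record(sample))
```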
Option #4: ODBC/JDBC
This option is mentioned last because, aside from asking the mainframe team to (gasp!) install software on their precious system, this is the one likely to meet the most resistance. Still, it is an option. In this approach, you connect directly to the mainframe database (probably DB2) over ODBC or JDBC and query it from the Hadoop side. The drawback is concurrency: you probably won’t get multiversion concurrency control, and locking may happen at the page level rather than the row level. Heavy extraction queries can therefore end up contending with production work. This might be a good option to toss to your mainframe team first, because it’s almost guaranteed to receive an overwhelming and passionate “No.” Then you can proceed to offer the other options, which will sound much better by comparison. Marketers call this the “door in the face” technique. After slamming the door in your face over this one, your mainframe team will feel guilty if they don’t at least hear out the suggestions that follow.
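If you do get the green light, one common way to limit lock contention is keyset pagination. You pull data in small, ordered chunks and end each read transaction between chunks. The Python sketch below assumes a DB-API 2.0 connection that uses `?` placeholders. Against DB2, that connection might come from IBM’s `ibm_db_dbi` module or from `pyodbc` with a DB2 ODBC driver. Here `sqlite3` stands in so the example runs anywhere. The table and column names are hypothetical and are interpolated into the SQL, so they must come from trusted configuration, never user input.

```python
import sqlite3
from typing import Any, Iterator, List, Sequence, Tuple


def extract_in_chunks(conn: Any, table: str, key: str, columns: Sequence[str],
                      chunk_size: int = 1000,
                      limit_clause: str = "LIMIT {n}") -> Iterator[List[Tuple]]:
    """Yield rows in key order, one short query per chunk (keyset pagination).

    `conn` is any DB-API 2.0 connection using '?' placeholders (assumed).
    `table`, `key` and `columns` are pasted into the SQL: trusted names only.
    """
    key_index = list(columns).index(key)  # the key must be one of the columns
    select = f"SELECT {', '.join(columns)} FROM {table}"
    last_key = None
    while True:
        if last_key is None:
            sql, params = f"{select} ORDER BY {key} ", ()
        else:
            sql, params = f"{select} WHERE {key} > ? ORDER BY {key} ", (last_key,)
        sql += limit_clause.format(n=chunk_size)
        cur = conn.cursor()
        try:
            if params:
                cur.execute(sql, params)
            else:
                cur.execute(sql)
            rows = cur.fetchall()
        finally:
            cur.close()
        conn.commit()  # end the unit of work so no locks are held between chunks
        if not rows:
            return
        yield rows
        last_key = rows[-1][key_index]  # resume after the last key we saw


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [(i, f"customer {i}") for i in range(1, 8)])
    conn.commit()
    for chunk in extract_in_chunks(conn, "accounts", "id", ["id", "name"], chunk_size=3):
        print(len(chunk), chunk[0])
```

For DB2 you would likely pass `limit_clause="FETCH FIRST {n} ROWS ONLY"` instead of SQLite’s `LIMIT`. You might also ask your mainframe team whether a read isolation level such as `WITH UR` (uncommitted read) is acceptable for the extract.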
Want to see how other businesses have overcome mainframe challenges to succeed with Hadoop and big data? Read our customer stories, and then become a success story of your own with Bigstep.