Building Data Lakes in the Cloud

Paper Abstract

This paper walks through the steps required to build a data lake in the cloud and connect it to on-premises environments. It covers best practices for architecting cloud data lakes and key aspects such as performance, security, benefits, and software solutions, and presents technologies ranging from basic HDFS storage to real-time processing with Spark Streaming.

What's Inside

  1. Data lakes in the cloud
    Find out how cloud-based data lakes can be connected to on-premises environments.
  2. Security solutions
    Learn about the authentication protocols and data encryption, down to a per-file basis, that safeguard data lakes in the cloud.
  3. Software and performance solutions
    Discover how to increase performance by deploying directly on a bare metal cloud and have a flexible data lake architecture ready within minutes.

