Bigstep Data Lake

The World’s First Data Lake-as-a-Service

Enterprise-Grade Data Lake-as-a-Service

At its core, the data lake is an exabyte-scale storage repository and processing engine that supports big data projects by paving the way to new, actionable insights.

All Data Tells a Story

The Bigstep Data Lake integrates with your existing applications and systems, ensures enterprise-grade security, and delivers unparalleled throughput, massive processing power, and the ability to handle a virtually limitless number of simultaneous tasks or jobs. Working with big data has never been easier.

Data sources: the Bigstep Data Lake holds any type of structured or unstructured data, including database exports, video & audio data, machine data, scanned documents, clickstream, social media, sensor data, and CRM & ERP data.
[Diagram: data sources feed the Data Lake, which supports data filtering, data processing, visualisation, machine learning, and ETL.]

Key Features

Native HDFS Integration

Hadoop-native applications (Spark, Kafka, Drill, Flink, NoSQL DBs) can access the data lake service through the binary HDFS protocol.

File-Level Replication

Replication is configured on a per-file basis, so you can decide the extent to which your most sensitive data is safeguarded against loss.
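In HDFS terms, the per-file knob is the replication factor, which standard tooling changes with `hdfs dfs -setrep -w <factor> <path>`. As an illustrative sketch (the paths, file sizes, and factors below are hypothetical, not Bigstep specifics), a higher replication factor trades raw storage for durability:

```python
# Hypothetical illustration of per-file replication cost.
# The actual change would be made with the HDFS CLI, e.g.:
#   hdfs dfs -setrep -w 5 /data/critical/customers.parquet
#   hdfs dfs -setrep -w 2 /data/scratch/tmp.csv

def raw_storage_bytes(file_bytes: int, replication: int) -> int:
    """Cluster-wide bytes consumed by one file at a given replication factor."""
    return file_bytes * replication

GIB = 2**30
# The same 10 GiB file, safeguarded vs. scratch:
print(raw_storage_bytes(10 * GIB, 5) // GIB)  # 50 (GiB of raw storage)
print(raw_storage_bytes(10 * GIB, 2) // GIB)  # 20 (GiB of raw storage)
```

The per-file granularity is what matters here: sensitive data can carry a high factor while bulk scratch data stays cheap.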

Enterprise-Grade Security

The Bigstep Data Lake encrypts data both in transit and at rest and uses Kerberos-based authentication to enforce data security.

Supports Files of Any Size

There are no restrictions on how much data you can store in the data lake or on the size of individual files.

High Throughput

Thanks to 40 Gbps throughput on a single link, data moves freely, and quickly reaches mission-critical applications.

Multiple Availability Regions

Access data from any Bigstep Region (Chicago USA, Reading UK or Frankfurt Germany) and replicate it to ensure maximum availability.

If There's Data, There's a Way

Forget overly expensive on-premises storage solutions that are difficult to scale and manage. No matter how large the dataset or what type of data needs to be collected, processed and analyzed, the Bigstep Data Lake is the go-to service for all big data use cases.

Scale Out of An Existing Enterprise Data Warehouse

The Bigstep Data Lake can ingest existing data and cross-reference it with information from new sources and systems. It is compatible with Oracle, Teradata, IBM, HP, and Microsoft solutions, as well as with most business intelligence tools, such as QlikView, MicroStrategy, Jaspersoft, and Tableau.

Take In And Analyze Data Streams of Any Size

Most on-premises architectures are not ready to collect high-velocity data streams from web users, social media, sensors, and other apps or devices. The Bigstep Data Lake is, and it can be deployed in minutes. It can take in external data streams, make the files instantly available to processing engines, and feed the results back into existing systems.

Run Data Science and Machine Learning on Integrated Data Sets

The Bigstep Data Lake is designed to handle massive amounts of data and enable data science teams to discover correlations that would otherwise remain invisible. RStudio, MATLAB, IPython Notebook, Spark Notebook, and PyCharm can connect directly to the Bigstep Data Lake. Data scientists can now do their best work in a scalable, lightning-fast environment.

Build Distributed Microservices Architectures for Web-Ready Apps

How do you build web or mobile apps fast enough to keep users coming back for more, without burning a hole in your budget? Containers are an easy way to distribute the right resources to the right workload. The Bigstep Data Lake was built to deliver the flexibility that microservices architectures require, and works out-of-the-box with Mesos, Kubernetes, and Docker.

Data Processing and ETL From Any Source, In Any Application

SQL, Java, Python, Scala, RDBMS, Hadoop, Spark, NoSQL – everything works out-of-the-box, so teams can start analyzing data and gain insights in mere minutes. Deliver the right data to the right tool at the right time, and you won’t be locked into a single technology ever again.

Active Cold Data Storage and Backup for Massive Data Sets

Whether you’re looking for a storage solution to host an active replica of your data sets, or a system that keeps your data available via HTTP across multiple regions and systems, the Bigstep Data Lake is ready to meet your requirements.


Data Ownership Control

Files have individual access permissions and ownership controls that mirror the user groups and hierarchy of the Bigstep Metal Cloud's delegation system.

Data Encryption

To protect against unauthorized access, the Data Lake service uses Kerberos for secret-key cryptography and encrypts data both when transmitted across networks and while at rest.

Identity Services Integration

Easily integrate your corporate Active Directory (LDAP) or a third-party authentication method using the Full Metal Identity Services.
(Coming soon)

The World's First Data Lake as a Service

The Bigstep Data Lake is the first service of its kind. It has been designed for big files, in the order of terabytes and above, and supports both structured and unstructured data, regardless of source. Every file is composed of multiple blocks, each of which can be downloaded concurrently from different source machines. Add up to 40 Gbps of throughput per node and it is easy to see how this turns into a multi-terabit traffic architecture.
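The arithmetic behind that multi-terabit claim can be sketched as follows. The node count here is hypothetical; only the 40 Gbps per-node figure comes from the text above:

```python
def aggregate_gbps(per_node_gbps: int, parallel_nodes: int) -> int:
    """Upper bound on combined read throughput when a client pulls
    different blocks of the same file from many nodes in parallel."""
    return per_node_gbps * parallel_nodes

# E.g. 30 data nodes, each serving distinct blocks of one large file at 40 Gbps:
print(aggregate_gbps(40, 30))  # 1200, i.e. a 1.2 Tbps ceiling
```

This is a theoretical ceiling; real transfers are also bounded by the client's own link and by disk speeds on each node.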

[Diagram: data streams pass through processing & filtering into the Bigstep Data Lake (HDFS), which feeds in-memory engines, IPython/R/MATLAB, NoSQL, and SQL systems.]

The Bigstep Data Lake matches Hadoop's distributed replication schema. File blocks are distributed evenly across data nodes, while making sure replicas never land on the same machines or disks. Thanks to this replication system, individual disk failures do not affect stored data. Compared to traditional RAID solutions, this has the benefit of increased throughput and performance: users can simultaneously download different parts of a big file from different data nodes.
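A toy model of that placement constraint may help. The node names are hypothetical and the round-robin policy is a simplification; real HDFS uses rack-aware placement, but the invariant is the same: no two replicas of a block share a node.

```python
from itertools import cycle

def place_replicas(num_blocks: int, replication: int, nodes: list) -> list:
    """Assign each block's replicas to distinct data nodes (round-robin sketch)."""
    if replication > len(nodes):
        raise ValueError("cannot place replicas on distinct nodes")
    placement = []
    node_iter = cycle(range(len(nodes)))
    for _ in range(num_blocks):
        chosen = set()
        while len(chosen) < replication:
            chosen.add(next(node_iter))
        placement.append(sorted(nodes[i] for i in chosen))
    return placement

nodes = ["dn1", "dn2", "dn3", "dn4"]
for block_id, replicas in enumerate(place_replicas(3, 2, nodes)):
    print(f"block {block_id}: {replicas}")
```

Because each block's replicas sit on different nodes, a reader can fetch block 0 from one node while fetching block 1 from another, which is where the throughput advantage over RAID comes from.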

Simple Infrastructure Integration

The Bigstep Metal Cloud offers complete infrastructure integration with the Data Lake service through multiple protocols:

HDFS – a binary protocol for Hadoop-native applications such as Spark, Kafka, Drill, Flink or NoSQL DBs.
WebHDFS – an HTTP-based protocol that can be used by many web-enabled applications.
FUSE – a locally mounted file system which can be used by any application.
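As a minimal sketch of the WebHDFS side: requests follow the standard WebHDFS REST layout, `http://<host>:<port>/webhdfs/v1/<path>?op=...`. The host, port, path, and user below are hypothetical placeholders, not actual Bigstep endpoints:

```python
from urllib.parse import urlencode

def webhdfs_url(host: str, port: int, path: str, op: str, **params) -> str:
    """Build a WebHDFS v1 REST URL for a given operation (OPEN, LISTSTATUS, ...)."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

# Hypothetical endpoint; the service would supply the real host and credentials.
url = webhdfs_url("datalake.example.com", 50070, "/ingest/events.json",
                  "OPEN", **{"user.name": "analyst"})
print(url)
# http://datalake.example.com:50070/webhdfs/v1/ingest/events.json?op=OPEN&user.name=analyst
```

Hadoop-native tools would instead address the same file with an `hdfs://` URI over the binary protocol, and a FUSE mount would expose it as an ordinary local path readable with plain file I/O.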

Quick Data Migration

High throughput in the Bigstep Metal Cloud, Bigstep’s high-performance infrastructure layer, gets all of your data where you need it. Large volumes of data that seem forever stuck in cloud storage solutions such as Amazon S3 can be easily migrated through HTTP to the Bigstep Data Lake to benefit from unlimited storage and the added bare metal performance.

Set up your Data Lake in minutes

Enter the Control Center