Data Exploration as a Service
Bigstep DataLab is a highly integrated data-processing service that enables domain experts, business analysts, and data-science groups to collaborate in helping organizations make data-driven decisions. Bigstep DataLab provides state-of-the-art architectures that stream high-velocity data and enable real-time automated reactions.
It is time for smart business decisions
Seamless Access to Data
We use Bigstep DataLake as universal storage for structured and unstructured data. It is a single source of truth that the Spark processing engine reads from and that data scientists and business analysts then access and manipulate with their own tools. Jupyter and Zoomdata are our preferred tools, but the DataLab can also integrate with Qlik, Tableau, Zeppelin, and almost anything that speaks SQL or HDFS.
We also offer Kafka as a service and support building Kafka-based consumers and producers that link the various components of your real-time architecture together.
Highly Secure Environment
Bigstep DataLab exceeds industry security standards and offers highly granular permission control.
Self-Service Data Science Frontend
Bigstep DataLab provides a development environment designed for collaborative work and point-and-click tools for non-technical analysts.
Large-Scale Data Processing Framework
Bigstep DataLab provides essentially limitless scalability, performance, and resilience.
Real-Time Data Streaming
Bigstep DataLab streams diverse data from multiple sources and supports ad-hoc data analysis.
Highly Compatible with Existing Apps
Bigstep DataLab provides deep integration with existing tools and services via an SQL-compatible interface.
Schema-on-Read Capabilities
Working with structured, semi-structured and unstructured data side by side has never been easier.
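To make the idea concrete, here is a minimal sketch of schema-on-read in plain Python. It is not Bigstep DataLab code: the sample records, the `SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical, chosen only to show raw data being stored as-is and a schema being applied when the data is read.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (no schema enforced on write).
RAW_LINES = [
    '{"user": "ana", "amount": "19.90", "country": "RO"}',
    '{"user": "bob", "amount": 5}',
    'not json at all',
]

# Schema chosen by the reader at query time: field name -> (type, default value).
SCHEMA = {
    "user": (str, ""),
    "amount": (float, 0.0),
    "country": (str, "unknown"),
}


def read_with_schema(lines, schema):
    """Parse raw lines and project each record onto the given schema."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip JSON values that are not objects
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field, default)
            try:
                row[field] = cast(value)
            except (TypeError, ValueError):
                row[field] = default  # fall back when the value cannot be cast
        rows.append(row)
    return rows


rows = read_with_schema(RAW_LINES, SCHEMA)
for row in rows:
    print(row)
```

Because the schema lives with the reader rather than with the stored data, a different analysis could read the same raw lines with a different `SCHEMA` without rewriting anything in storage.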
How We Do It
Bigstep DataLab can easily handle large quantities of data, perform complex machine-learning tasks, and be quickly stopped or repurposed. It is built from three main components that help data scientists conduct research at scale without involving IT, yet within an environment that IT can fully control:
Bigstep DataLake
An infinite repository system where structured, semi-structured, and unstructured data can be stored side by side. Research results stored in the DataLake can be queried directly by Zoomdata, Tableau, or Qlik via the ODBC connector of the Spark cluster, with no need to export the data.
Bigstep Real-Time Spark Service
A managed, fully scalable big-data compute service capable of machine learning, graph processing, and statistics. Bigstep DataLab can accommodate any stream-processing application or framework that can use Apache Spark as a processing engine, such as Apache Beam.
Bigstep Real-Time Container Service
A managed service that uses containers to multiplex single-tenant bare-metal machines and runs hundreds of independent workloads and applications on the same physical hosts. High-performance environments for data science, analytics, and even SQL querying can be scaled at the push of a button.
Frequently Asked Questions
How do I connect securely to Bigstep Metal Cloud?
We offer a VPN service that connects to your current infrastructure (such as AWS or on-premises data centers). Bigstep Metal Cloud supports standard IPsec protocols and is fully compatible with most enterprise security standards.
What is the latency between Bigstep and my AWS region?
Typically, we are around 5 to 10 ms away.
What’s the secret sauce behind Bigstep's 2-5x performance improvement?
Latency matters a lot in analytics. In virtual clouds it creeps in through the hypervisor, the many layers of the virtual network, storage caching techniques, and so on. We have removed every element that could introduce latency by using only bare-metal nodes, a 4x10 Gbps cut-through network optimized for east-west traffic, PCIe-attached NVMe storage, and distributed SSD-based block storage. We squeeze every bit of performance out of the hardware without sacrificing flexibility.
Will latency affect my app’s performance?
Not all workloads require a low-latency setup. It all depends on the latency tolerance of your application.
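A rough way to judge latency tolerance is to estimate how much of a job's wall-clock time would be spent waiting on network round trips. The sketch below is a back-of-the-envelope model, not a benchmark: it assumes round trips happen strictly one after another, and the workload figures (10,000 versus 20 round trips, 60 seconds of compute) are hypothetical. Only the 10 ms round-trip time is taken from the upper end of the typical range quoted above.

```python
def added_latency_seconds(round_trips, rtt_ms):
    """Total time spent waiting if every round trip is sequential."""
    return round_trips * rtt_ms / 1000.0


def latency_share(round_trips, rtt_ms, compute_seconds):
    """Fraction of total wall-clock time that is network waiting."""
    wait = added_latency_seconds(round_trips, rtt_ms)
    return wait / (wait + compute_seconds)


# Hypothetical "chatty" workload: many small sequential requests.
chatty = latency_share(round_trips=10_000, rtt_ms=10, compute_seconds=60)

# Hypothetical "bulk" workload: a few large transfers, same compute time.
bulk = latency_share(round_trips=20, rtt_ms=10, compute_seconds=60)

print(f"chatty workload: {chatty:.1%} of time spent waiting on the network")
print(f"bulk workload:   {bulk:.1%} of time spent waiting on the network")
```

Under these assumptions the chatty workload spends most of its time waiting, while the bulk workload barely notices the same round-trip time. Real applications often overlap requests, so treat the result as an upper bound on the waiting time.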