Bigstep DataLab is an open data-exploration service for data science, analytics, and technology experimentation. It is built on our Spark-as-a-Service and Data Lake offerings and runs on our flexible, high-performance bare-metal cloud infrastructure.
Bigstep offers a single point of contact for support and billing, along with a one-of-a-kind self-service portal for managing the Blueprint that defines your technology stack.
The Bigstep DataLab provides deep integration with existing tools and services via an SQL-compatible interface.
Bigstep DataLab provides a development environment well-suited for collaborative work and point-and-click tools for non-technical analysts.
Management capabilities cover the full project lifecycle, from inception to actionable business insights.
Bigstep DataLab can easily handle large quantities of data, perform complex machine-learning tasks, and be quickly stopped or repurposed. It is built from three main components that help data scientists conduct research at scale without involving IT, yet within an environment that IT can fully control:
An infinite repository system where structured, semi-structured, and unstructured data can be stored side by side.
Research results stored in the DataLake can be queried directly by ZoomData, Tableau or Qlik through the ODBC connector of the SparkArray, with no need to export the data.
A managed, fully scalable big data compute service capable of machine learning, graph processing and statistics.
The DataLab is able to accommodate any stream-processing application or framework that can use Apache Spark as a processing engine, such as Apache Beam.
A managed service that uses containers to multiplex single-tenant bare-metal machines and runs hundreds of independent workloads and applications on the same physical hosts. High-performance environments for data science, analytics, and even SQL querying can be scaled at the push of a button.
Typically, decision-makers are forced to rely on intuition to answer ad-hoc questions. However, intuition should guide where to look for data and how to interpret it, rather than substitute for data altogether. Moreover, the ability to freely query the available data opens the door to innovation.
Answering ad-hoc queries is fundamentally different from creating static data-analysis and visualization dashboards. Whereas a dashboard and its associated queries and processes are fixed in time and repeatable, ad-hoc queries require a data laboratory: an environment purpose-built for exploration and experimentation.