Bigstep Real-Time Spark Service

In-Memory Data Analytics Engine

Bigstep Real-Time Spark Service is a pay-per-use, fully managed, auto-scaled container-based Spark cluster. It is interconnected with Jupyter Notebook to provide enhanced data processing, analytics and visualization capabilities that can be used by data science teams alongside business intelligence experts and non-technical stakeholders.

Flip Your Jupyter Notebook Open

Plug-and-Play In-Memory Analytics

Bigstep Real-Time Spark Service supports multiple kernels that are connected to Spark. It is a flexible solution that supports several programming languages, such as Python, Scala, and R, all widely used in the data science world. With Jupyter Notebook, data scientists can combine code, graphs, dashboards, and descriptive text within the same document and run operations interactively.

Tightly Integrated with Apache Spark

You can execute Spark, Spark SQL or SparkR code to enhance the data processing, machine learning and analytics capabilities of your standard notebooks.
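
As an illustration of mixing the DataFrame API with Spark SQL in a notebook cell, here is a minimal PySpark sketch. It assumes the kernel already exposes a SparkSession (commonly named `spark`); the function name, the `orders` view, and the column names are hypothetical and chosen only for the example.

```python
def revenue_by_country(spark, rows):
    """Register rows as a temporary view and aggregate them with Spark SQL.

    `spark` is an existing SparkSession (assumed to be provided by the
    notebook kernel). `rows` is a list of (country, amount) tuples.
    """
    # Build a DataFrame from local data; column names are illustrative.
    orders = spark.createDataFrame(rows, ["country", "amount"])

    # Expose the DataFrame to Spark SQL under a hypothetical view name.
    orders.createOrReplaceTempView("orders")

    # Run the aggregation as plain SQL and return the resulting DataFrame.
    return spark.sql(
        "SELECT country, SUM(amount) AS revenue "
        "FROM orders GROUP BY country ORDER BY revenue DESC"
    )

# Example notebook usage (requires a live SparkSession named `spark`):
#   revenue_by_country(spark, [("RO", 10.0), ("UK", 5.0), ("RO", 2.5)]).show()
```

The result is an ordinary DataFrame, so it can be displayed with `show()` or passed on to further processing in a later cell.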

High Performance on Each Task

The Spark cluster is fine-tuned to offer high performance when executing tasks. All configurations adapt dynamically to the container’s specified hardware requirements.
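
To make the idea of hardware-driven configuration concrete, the sketch below derives a few Spark settings from a container's CPU and memory allocation. The property names are standard Spark configuration keys, but the function name and the sizing heuristic (reserving about a quarter of memory as overhead, two shuffle partitions per core) are assumptions made for illustration; they are not Bigstep's actual tuning rules.

```python
def spark_conf_for_container(cpu_cores: int, memory_gb: int) -> dict:
    """Derive illustrative Spark settings from container resources.

    The heuristic here is hypothetical, not the service's real logic.
    """
    if cpu_cores < 1 or memory_gb < 2:
        raise ValueError("need at least 1 core and 2 GB of memory")

    # Reserve roughly 25% of memory (at least 1 GB) for non-heap overhead.
    overhead_gb = max(1, memory_gb // 4)
    executor_memory_gb = memory_gb - overhead_gb

    return {
        "spark.executor.cores": str(cpu_cores),
        "spark.executor.memory": f"{executor_memory_gb}g",
        # Assumed rule of thumb: two shuffle partitions per core.
        "spark.sql.shuffle.partitions": str(cpu_cores * 2),
    }


conf = spark_conf_for_container(cpu_cores=4, memory_gb=16)
print(conf)
```

With a managed service these values are set for you; the sketch only shows the kind of mapping from container size to Spark properties that such tuning involves.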

Interactive Data Visualizations

Bigstep Real-Time Spark Service offers interactive visualizations through the pre-installed libraries or others of your choice, in your preferred programming language.

Resilient and Fault-Tolerant

Built on top of Bigstep Managed Containers, Real-Time Spark Service benefits from all the performance and scalability provided by the underlying infrastructure.

Automatically Scalable

The underlying infrastructure scales automatically to accommodate the deployed containers, based on the specified hardware requirements.

Security at the Firewall Level

The Bigstep Real-Time Spark Service infrastructure is protected at the firewall level. Customers can configure the firewall in the Control Center and have full control over the rules they add.

Spark up Your Business on Bare Metal

Create Your Container-Based Environment with Standalone Applications

Customers can use Bigstep Real-Time Spark Service in the context of the DataLab solution, alongside Zoomdata and Bigstep DataLake, or can use it with applications deployed on-premises or on Bigstep Managed Containers.

Tightly Integrated within the Bigstep Environment

Bigstep Real-Time Spark Service can work with Bigstep DataLake out of the box, because it is pre-authenticated via Kerberos to this shared HDFS system.
Furthermore, Real-Time Spark Service can be easily connected with other applications running on Bigstep Metal Cloud.
All these features translate into a significant reduction in the time spent installing, configuring and fine-tuning the platform. In turn, this lets big data science teams focus more on innovation.
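
Because access to the DataLake is pre-authenticated via Kerberos, reading a file from HDFS in a notebook should not need any credentials in the code itself. The following is a minimal sketch under that assumption; the function name is hypothetical, `spark` is assumed to be the SparkSession provided by the kernel, and the HDFS path must be replaced with a real location in your DataLake.

```python
def read_datalake_csv(spark, hdfs_path):
    """Load a CSV file with a header row from HDFS into a Spark DataFrame.

    No credentials are passed: the service is described as already
    authenticated to the DataLake via Kerberos.
    """
    return (
        spark.read
        .option("header", "true")       # first line holds column names
        .option("inferSchema", "true")  # let Spark guess column types
        .csv(hdfs_path)
    )

# Example notebook usage (path is a placeholder, not a real location):
#   df = read_datalake_csv(spark, "hdfs://<namenode>/<path>/events.csv")
#   df.printSchema()
```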

Who Can Use Bigstep Real-Time Spark Service?

As organizations become more data-driven, it is essential to enable conversations around data, backed by access to algorithms, queries, formulas, models, visualizations, and machine learning. As a result, notebooks are starting to be used in business intelligence departments as well.
The notebook is an emerging technology present in the data scientist's toolbox, since it offers a unified interface for both exploratory work and production use. With the integrated Jupyter Notebook as a working interface, Bigstep’s Spark Array can be utilized by:

  1. Data science teams, in their day-to-day activities.
  2. Business professionals and business intelligence specialists, when gathering results from the data science teams and making decisions based on what they’ve learned.
  3. Research teams, when multiple technologies are being tested and a flexible platform can accelerate the project and its outcomes.

For advanced analytics and visualization, Bigstep offers users the freedom to import packages that are not included in the standard offering.
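
One way to bring in an extra package from a notebook cell is to check whether it is importable and install it with pip only if it is missing. This is a generic Python sketch, not a documented Bigstep procedure: the helper names are hypothetical, and it assumes the kernel's environment permits pip installs and outbound access to a package index.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this kernel."""
    return importlib.util.find_spec(module_name) is not None


def ensure_package(module_name: str, pip_name: str = "") -> bool:
    """Install a package with pip if its module is missing.

    Returns True if an installation was run, False if the module was
    already available. `pip_name` covers cases where the distribution
    name differs from the import name.
    """
    if is_installed(module_name):
        return False
    # Use the kernel's own interpreter so the package lands in its environment.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or module_name]
    )
    importlib.invalidate_caches()
    return True


# The standard library module `json` is always present, so nothing is installed.
print(ensure_package("json"))
```

Calling pip through `sys.executable` avoids installing into a different Python environment than the one the notebook kernel is running.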