SOURCE: Databricks

March 18, 2015 09:30 ET

Databricks Announces 'Jobs' Feature for Databricks Cloud at Inaugural Spark Summit East

Summit to Feature Databricks CEO Keynote and Sessions Led by Apache Spark Users in a Variety of Industry Verticals

NEW YORK, NY--(Marketwired - Mar 18, 2015) - Databricks, the company founded by the creators of the popular open-source big data processing engine Apache Spark and maker of the flagship product Databricks Cloud, today introduced "Jobs," a new feature for Databricks Cloud, at the inaugural Spark Summit East. Hosted by Databricks, the Summit will include more than thirty high-quality sessions in which top talent from the Spark community and leading production users such as Salesforce, Intel, DataStax, MyFitnessPal, and Box showcase Spark's momentum and use cases.

As the latest update to Databricks Cloud, Jobs enables data scientists and engineers to easily schedule and manage production pipelines that run Spark workloads without any human intervention. Built to integrate seamlessly with Databricks Cloud, the new feature can automatically perform periodic data ingestion, transformation, and processing within Databricks Cloud.

Jobs supports the creation of production pipelines using Databricks Cloud notebooks as well as standalone Spark applications, enabling Databricks Cloud users to seamlessly transition from exploration to production workloads. As a result of the Jobs feature, time spent on developing, scheduling, and managing complex Spark workloads will be dramatically reduced.

Jobs also runs on clusters using both Amazon Web Services (AWS) on-demand and spot instances. Additional capabilities include:

  • The ability to set up new Spark clusters or reuse existing clusters for the execution of jobs.
  • A flexible job scheduler that guarantees timely execution of Spark applications. 
  • A notification service that emails job owners about important events, such as failures.

"Jobs holistically automates and eliminates the repetitive, manual, human processing element typically required to properly schedule, sequence and execute these production pipelines -- generating significant time and cost savings through improved productivity and better use of strategic resources," said Ion Stoica, CEO of Databricks. "Databricks Cloud makes it easy for users to get started on analyzing their business-critical data within minutes. Spark Summit East is the perfect avenue for this announcement since it's exciting to hear such a dynamic lineup of speakers discuss their unique way of innovating and simplifying Big Data with Spark."

To learn more about Jobs, read the Databricks blog post here: https://databricks.com/blog/2015/03/18/databricks-launches-jobs-feature-for-production-workloads.html

The First Annual Spark Summit East
As evidenced by the expansion and launch of the inaugural Spark Summit on the East Coast, Spark continues to drive tremendous enterprise adoption going into 2015, with additional major initiatives planned throughout the year that will empower large-scale data science, integrate rich data sources, and continue to simplify deployment with Databricks Cloud.
Spark Summit East is currently taking place at The Sheraton Times Square in New York City. See the agenda for a full list of tracks and sessions.

Industry-Wide Apache Spark Momentum Fueling Customer Adoption and Growing Partner Ecosystem
Apache Spark has become the most active open source project in the Big Data ecosystem, with over 500 contributors, adoption by a number of platform vendors (including all of the major Hadoop distributors), and over 200 enterprises now deploying Spark in production. As the company founded by the creators of Spark, Databricks has contributed over 75 percent of the code added to the project.

The timing of Spark Summit East 2015 could not be better, with the general availability of Spark 1.3 announced last week. Spark 1.3 introduces the widely anticipated DataFrame API, an evolution of Spark's RDD abstraction designed to make crunching large datasets simple and fast. Spark 1.3 also boasts a large number of improvements across the stack, from Streaming, to ML, to SQL.

To learn more about Spark 1.3, read the Databricks blog post here: https://databricks.com/blog/2015/03/13/announcing-spark-1-3.html

"With its Spark framework, Databricks Cloud meets all our critical needs by speeding up our interactive analysis for consistent and reliable data exploration, and enables us to explore novel techniques for analyzing and visualizing real-time automotive data," said Rob Ferguson, director of engineering at Automatic Labs. "Our engineers and data scientists are now instantly more productive out of the box, and can explore our massive data sets generated by users and their vehicles instantly and interactively."

"The Zoomdata BI engine runs natively on Spark, and is one of the first BI tools to be fully integrated with Databricks Cloud. Where other BI technologies simply offer SparkSQL connectors that copy query results out of Spark, Zoomdata uses the full power of Spark to perform all calculations, joins, pivots, paging, filtering, and analytical operations," said Justin Langseth, CEO/Founder of Zoomdata. "And with our SparkIT functionality, Zoomdata can move raw or aggregated data from files or slow legacy databases and data warehouses into Spark Dataframes, allowing Zoomdata's patented data sharpening engine to visualize the data in seconds."

About Databricks:
Databricks was founded by the team that created and continues to drive Apache Spark, the most active open source project in the Big Data ecosystem. Databricks' vision is to dramatically simplify big data processing and free users to focus on turning data into value. Databricks Cloud, a cloud platform built around Apache Spark, delivers on this vision by combining the power of Spark with a zero-management hosted platform and an initial set of applications built around common workflows. Databricks is venture-backed by Andreessen Horowitz and NEA. For more information, visit http://www.databricks.com.
