SOURCE: Databricks

July 27, 2016 06:00 ET

Databricks Becomes the First Vendor to Provide Support for Apache® Spark™ 2.0 on Its Just-in-Time Data Platform

Apache Spark 2.0 General Availability Brings Speed, Simplicity, and Structured Streaming to Users

SAN FRANCISCO, CA--(Marketwired - Jul 27, 2016) - Databricks, the company founded by the team that created Apache® Spark™, today announced that Apache Spark 2.0 is generally available on its just-in-time data platform, making it the first vendor to offer Apache Spark 2.0 support. With major contributions from Databricks and the Spark community, this is the first major release of open source Spark since Spark 1.0 in 2014. Databricks customers can now immediately benefit from Spark 2.0's three core attributes -- easier, faster, and smarter.

"Since the release of Spark 1.0, we've spent countless hours listening to members of the Spark community and Databricks users to learn from a mix of praises and complaints. Spark 2.0 builds on what the community has learned, doubling down on what users love and improving on what users lament," said Reynold Xin, Chief Architect and co-founder of Databricks.

Among other major improvements as outlined in the Databricks blog post, the most notable features of Apache Spark 2.0 are:

  • Speed: Running some Spark operators 5 to 10 times faster than in Spark 1.6, thanks to Tungsten Phase 2 whole-stage code generation and Catalyst's code optimization;
  • Simplicity: Unifying developer APIs across Spark's libraries such as DataFrames and Datasets;
  • Structured Streaming: Laying the foundation for continuous applications by providing high-level declarative streaming APIs for real-time data, based on DataFrames and Datasets and built atop the Spark SQL engine;
  • Machine Learning Model Persistence: Saving and loading pipelines and models across all programming languages supported by Spark;
  • DataFrame-based Machine Learning APIs: Emerging as the primary MLlib package with its "pipeline" APIs, with future development focused on the DataFrame-based API;
  • Standard SQL Support: Expanding Spark's SQL capabilities with SQL:2003 features, introducing a new ANSI SQL parser, and supporting scalar and predicate subqueries.

"One of the things that's really exciting for me as a developer of Apache Spark is seeing how quickly users start to use new features and APIs we introduce, and in turn, offer almost instantaneous feedback, so that we can continue to improve them," said Matei Zaharia, CTO and co-founder of Databricks and creator of Apache Spark.

Databricks users can start using Apache Spark 2.0 immediately: creating a new cluster is as simple as selecting the release from the menu, which takes only a few clicks. Spark 2.0 is highly compatible with Spark 1.6, so migrating code should require minimal effort.

By making Spark 2.0 instantly accessible within a fully managed data platform, Databricks affords its users a full suite of tools to harness the open source 2.0 release advancements and ensure end-to-end security, giving data scientists and data engineers the easiest way to analyze data, perform advanced analytics, and deploy Spark applications.

"Spark is becoming a staple for enterprise big data strategies with its speed and simplicity. Deploying the Apache Spark 2.0 release through Databricks' platform enables businesses to translate Spark's innovations into a competitive edge faster, while getting support from the people who are core to the Apache project," said Tony Baer, Principal Analyst at Ovum.

The Apache Spark 2.0 features are available and supported today for all Databricks customers. To sign up for Databricks, visit or contact

To learn more, read the Databricks blog post here:
Try Apache Spark 2.0 on Databricks:

About Databricks:
Databricks' vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team that created Apache® Spark™, a powerful open source data processing engine built for sophisticated analytics, ease of use, and speed. Databricks is the largest contributor to the open source Apache Spark project, providing 10x more code than any other company. The company has also trained over 20,000 users on Apache Spark and has the largest number of customers deploying Spark to date. Databricks provides a just-in-time data platform to simplify data integration, real-time experimentation, and robust deployment of production applications. Databricks is venture-backed by Andreessen Horowitz and NEA. For more information, contact

© Databricks 2016. All rights reserved. Apache, Apache Spark and Spark are trademarks of the Apache Software Foundation.
