SAN FRANCISCO, CA--(Marketwired - Jun 15, 2015) - SPARK SUMMIT -- Databricks, the company behind Apache Spark, today announced the general availability of its cloud-hosted data platform (formerly known as Databricks Cloud). The Databricks platform makes it easy to turn data into value, from ingest to production, without the hassle of managing complex infrastructure, systems and tools. The announcement was made at Spark Summit in San Francisco, Calif., the premier event bringing together the growing Apache Spark and Databricks community.
Today's data scientists, data engineers and developers must cobble together various complex infrastructure, tools and systems to meet their day-to-day data needs, which severely inhibits their ability to generate business value quickly. By combining the power of Spark with a zero-management hosted platform, Databricks removes this critical bottleneck, enabling these data professionals to focus on finding answers from their data instantly and building value-creating data products.
"We're beyond thrilled to bring Databricks to the masses. Apache Spark has come a long way since its inception at UC Berkeley and we're enthused to make it available to thousands of organizations. We've been working closely with our customers to help them get the most out of their deployments and are eager to bring Spark's power, ease-of-use, speed and flexibility to organizations that have big plans for their equally big data," said Ion Stoica, CEO, Databricks.
Following the general availability announcement, Databricks will unveil at Spark Summit in San Francisco a range of new features planned for the second half of this year, including:
- R-language notebooks: Analyze large-scale data sets using R in the Databricks environment.
- Access control and private notebooks: Manage permissions to view and execute code at an individual level.
- Version control: Track changes to source code in the Databricks platform.
- Spark Streaming support: Enable fault-tolerant real-time processing.
Databricks is available as a hosted platform on Amazon Web Services with a monthly subscription.
"We use Databricks to speed up prototyping our machine learning pipeline development. We're looking at hard problems like progressively improving matches between voter records and other representations of people through machine-learning, and before Databricks, it took roughly three times as long," said Andy Barkett, CEO, GetExp. "We appreciate that we can quickly pull data from a variety of sources, including relational databases, flat files, and JSON-stores."
"Powering great dining experiences requires us to analyze many different kinds of data across many machines. Databricks enables us to rapidly iterate from ideas to data-driven insights and new features. In particular, we have leveraged Databricks with Spark's MLlib to build out machine learning models that provide personalized restaurant recommendations and help diners discover the perfect restaurant for their occasion. Spark's combination of speed, flexibility, and access to machine learning out of the box enables us to innovate faster," said Jeremy Schiff, Senior Data Science Manager, OpenTable.
"With the announcement of general availability, Databricks should be on the radar of data scientists or engineers who are tackling complex data problems," said Tony Baer, Principal Analyst at Ovum. "Built around the Spark engine, Databricks allows data professionals to easily deploy critical Spark workloads such as machine learning, streaming and stream processing, without the hassle of managing complex infrastructure and disparate systems and tools."
Apache Spark Momentum and 1.4 Release
The general availability of Spark 1.4 was also announced last week. Spark 1.4 is the largest Spark release to date, with more than 220 contributors and 1,200 commits. It introduces a new R language API (SparkR) and adds new features to Spark's core engine and all standard libraries, including:
- Expansion of Spark's DataFrame APIs: window functions, statistical and mathematical functions, support for missing data.
- Machine learning pipelines API graduates from alpha (adds feature parity in Python and a stable API for developers).
- Added UI visualizations for debugging and monitoring programs (interactive event timeline for jobs, DAG visualization, visual monitoring for Spark Streaming).
Spark is the most active open source project in the big data ecosystem, with over 500 contributors, and Databricks remains committed to Apache Spark and continues to lead its community of contributors. Spark has been adopted by a number of platform vendors, including all of the major Hadoop distributors. As both the creators of Spark and the company leading its evolution to be enterprise-ready, Databricks has contributed over 75 percent of the code added to Spark the end of 2014 alone. To learn more about Spark 1.4, read the Databricks blog post here: http://databricks.com/blog/2015/06/11/announcing-apache-spark-1-4.html
Contact firstname.lastname@example.org for a demo or sign up for a free trial at www.databricks.com/registration.
Databricks' vision is to dramatically simplify big data processing. It was founded by the team that created and continues to drive Apache Spark, a powerful open source data processing engine built for sophisticated analytics, ease of use, and speed. Databricks offers a cloud platform that makes it easy to turn data into value, from ingest to production, without the hassle of managing complex infrastructure, systems and tools. Databricks is venture-backed by Andreessen Horowitz and NEA. For more information, visit http://www.databricks.com.