SOURCE: Cloudera


September 22, 2010 09:00 ET

Cloudera and EMC Greenplum Team Up to Expand the Way Companies Collect, Process and Store Data

Integration of Cloudera's Distribution for Hadoop With Greenplum Provides New Opportunities for Analysis of Structured and Complex Data

PALO ALTO, CA--(Marketwire - September 22, 2010) - Cloudera, a leading provider of Hadoop-based data management software and services, and the EMC Data Computing Division announced an alliance that will enable the integration of Cloudera's Distribution for Hadoop (CDH) and Greenplum technology. Integrating CDH, which collects, consolidates and analyzes data, with EMC Greenplum's massively parallel processing database and enterprise data cloud platform will provide a robust architecture for collaborative analysis of large amounts of structured (e.g. online databases) and unstructured (e.g. log files, sensor data, documents) data.

As part of the alliance, Cloudera will build a connector between Cloudera's Distribution for Hadoop and Greenplum technologies. The connector will enable high-speed bi-directional data transfer between the systems and will be jointly supported by Cloudera and Greenplum. Additionally, the Greenplum sales team will be trained on Cloudera's suite of Apache Hadoop-based products and services.

The alliance between EMC Greenplum and Cloudera will change the way customers collect, process and store data. Today, customers use a combination of database and archive storage products to collect, process and store complex and structured data. They are required to shuttle the data between systems, transforming and structuring it before they can analyze it. As data volumes and types grow, there is no single place to store and process all of this data.

Hadoop is becoming an increasingly popular solution to this problem. Customers can easily stage their data in a single Hadoop-based repository, leveraging its ability to inexpensively store both complex and structured data. They can then iterate over the data using MapReduce to process and analyze it, create metadata layers, and transform it for loading into a Greenplum database. Additionally, customers can combine long-term historical data with new data, enabling deeper insight and the detection of patterns not visible over short time periods.
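
As a rough illustration of the stage, process and transform workflow described above, the following is a minimal sketch in plain Python of a map step, a reduce step, and a final transformation into delimited rows of the kind a database bulk loader might accept. It is not the announced connector and does not use any Hadoop or Greenplum API; the log format, the pipe-delimited output, and all function names are assumptions made for illustration, and a real job would run on a Hadoop cluster rather than in a single process.

```python
from collections import defaultdict

# Hypothetical raw log lines: "<date> <user> <bytes>" (format assumed for illustration).
RAW_LOGS = [
    "2010-09-20 alice 120",
    "2010-09-20 bob 300",
    "2010-09-20 alice 30",
    "2010-09-21 alice 80",
    "malformed line",
]


def map_line(line):
    """Map step: emit ((date, user), bytes) pairs, skipping lines that do not parse."""
    parts = line.split()
    if len(parts) != 3 or not parts[2].isdigit():
        return []
    date, user, size = parts
    return [((date, user), int(size))]


def reduce_pairs(pairs):
    """Reduce step: sum the byte counts for each (date, user) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def to_delimited_rows(totals):
    """Transform step: render sorted, pipe-delimited rows for a bulk loader."""
    return [f"{date}|{user}|{total}" for (date, user), total in sorted(totals.items())]


def run(lines):
    """Chain the map, reduce and transform steps over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return to_delimited_rows(reduce_pairs(pairs))


if __name__ == "__main__":
    for row in run(RAW_LOGS):
        print(row)
```

The sketch keeps unstructured input (raw lines, including one that fails to parse) separate from the structured output, mirroring the idea of staging complex data first and loading only the transformed result into the database.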

"Together EMC and Cloudera have a real opportunity to help companies change the way they collect, process and store data," said Michael Olson, CEO of Cloudera. "Organizations can use CDH to inexpensively capture complex and structured data, while Greenplum Chorus utilizes its cloud-based platform to discover data from a variety of sources and enables collaborative analysis for end users."

"EMC is building the data system of the future, a system that brings together all of your data, all of your tools, and all of your people," said Bill Cook, President and General Manager of EMC's Data Computing Division. "EMC and Cloudera represent a powerful combination of what we can deliver to customers. By bringing together our solutions, our customers have a powerful tool for collaborative data analysis and can more quickly and effectively analyze data from a variety of sources."

CDH is the most comprehensive and broadly adopted Hadoop-based platform on the market, lowering the barrier to Hadoop adoption by making it simple to install and easy to integrate into the data center. It consists of core Apache Hadoop and eight additional open source projects, all tested and integrated into a single platform, making it the most complete Hadoop-based distribution. For more information about CDH, visit

EMC will be exhibiting and presenting on its relationship with Cloudera at the annual Hadoop World conference taking place in New York City on October 12. Attend Hadoop World 2010 for additional examples of Hadoop in the enterprise.

About Cloudera

Cloudera is a leading provider of Hadoop-based software and services and works with customers in financial services, web, telecommunications, government and other industries. The company's products, Cloudera Enterprise and Cloudera's Distribution for Hadoop, help organizations profit from all of their information. Cloudera's Distribution for Hadoop is the most comprehensive Apache Hadoop-based platform in the industry. Cloudera Enterprise is the most cost-effective way to perform large-scale data storage and analysis and includes the tools, platform and support necessary to use Hadoop in a production environment. Cloudera provides professional services, technical support and training to help any business use the software created by Google, Facebook and Yahoo! Founded by pioneers in large-scale data and home of the original Apache Hadoop creator, Cloudera is a private company backed by venture investors Accel Partners and Greylock Partners with headquarters in Palo Alto, California.