SOURCE: Qubole


April 26, 2016 09:00 ET

Qubole Extends Big Data-as-a-Service Platform With StreamX

Reliable and Efficient Persistence of Kafka Logs for Big Data Analysis in the Cloud

SAN FRANCISCO, CA--(Marketwired - Apr 26, 2016) - Kafka Summit -- Qubole, the big data-as-a-service company, today announced it has open sourced StreamX, an ingestion service to help data teams efficiently and reliably capture large scale, real-time data. Qubole will be adding support for StreamX as a managed service on the Qubole Data Service (QDS) platform to simplify and automate the ingestion of data for big data analysis in the cloud.

Enterprises are grappling with increasing volumes of data and the need for real-time analysis from multiple data sources to drive business growth. To address this issue, Qubole has created StreamX, an open-source service that ingests data logs from Kafka and persists them to cloud object stores such as Amazon S3. Without an ingestion service such as StreamX, maintaining reliability and data integrity on Kafka is challenging, particularly when it comes to guaranteeing delivery without duplicates, which could be harmful to critical systems. StreamX is built on the Kafka Connect framework and is designed for reliable, exactly-once delivery.
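To illustrate why duplicate-free delivery matters, here is a minimal sketch of one common way to make writes to an object store idempotent: each record is stored under a key derived from its topic, partition and offset, so a redelivered record overwrites itself instead of creating a duplicate. This is not StreamX's actual implementation. The `Record` and `IdempotentSink` names are hypothetical, and an in-memory dictionary stands in for a store such as Amazon S3.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a Kafka record."""
    topic: str
    partition: int
    offset: int
    value: str


class IdempotentSink:
    """Toy sink: a dict stands in for an object store (assumption, not a real S3 client)."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    @staticmethod
    def object_key(record: Record) -> str:
        # Deterministic key: the same record always maps to the same object.
        return f"{record.topic}/{record.partition}/{record.offset:020d}"

    def put(self, records: List[Record]) -> None:
        for record in records:
            # Overwriting an existing key leaves the store unchanged,
            # so a retry cannot introduce duplicates.
            self.store[self.object_key(record)] = record.value


def demo() -> List[str]:
    sink = IdempotentSink()
    batch = [Record("clicks", 0, i, f"event-{i}") for i in range(3)]
    sink.put(batch)
    sink.put(batch[1:])  # simulated retry that redelivers part of the batch
    return [sink.store[key] for key in sorted(sink.store)]
```

Calling `demo()` writes three records, replays two of them, and still ends with exactly three stored objects. A production connector would also need to handle batching, offset commits and partial failures, which this sketch leaves out.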

QDS is a self-service platform for big data analytics that runs on the three major public clouds: Amazon Web Services, Google Compute Engine and Microsoft Azure. QDS supports the latest open source technologies, such as Apache Hadoop, Hive, Presto, Pig, Oozie, Sqoop and Spark, to provide the only comprehensive cloud-based data analytics platform, complete with enterprise security features, an easy-to-use UI and built-in data governance. Now with support for StreamX, Qubole customers will be able to use Kafka to capture high-velocity data generated across thousands of data sources.

"Real-time analytics is being used for everything from mobile applications, financial trading, gaming and even social networks. As the number of data producers increase and become more disparate, it is increasingly valuable to have a central platform to manage the ingestion of this data," said Joydeep Sen Sarma, co-founder and CTO of Qubole. "Adding StreamX was a natural extension for the Qubole platform, which was purpose-built to process ever-growing data and data sources, and we look forward to providing a fully managed service that can do this reliably with just a few clicks."

This comes just weeks after Qubole open sourced Quark, its SQL optimization project to help simplify and optimize access to data for data analysts. Qubole is committed to contributing its projects that address the most critical demands of today's data teams to the open source community. Streaming analytics is increasingly becoming a necessity for enterprises across industries, and as such, Qubole will continue to create tools that enable fast, scalable and reliable real-time data analytics. 

If you would like to learn more, visit Qubole at the Kafka Summit at Booth #1 or contact Look for updates to the StreamX project at

About Qubole
Qubole is a big data-as-a-service company that provides a fast, easy and reliable path to turn big data into valuable business insights. Qubole's cloud-based platform addresses the challenges of processing huge volumes of structured and unstructured data. It uses clouds such as Amazon Web Services, Google Compute Engine and Microsoft Azure to help enterprises extract value out of their big data while enabling their operations teams to be nimble and adaptive to their users' needs. Qubole achieves this through features such as auto-scaled big data clusters and integrated toolsets for data analysts, developers and business users. With more than 250 PB of data processed every month across its customer base, Qubole's platform makes enterprises agile with big data.