SOURCE: DataTorrent


July 30, 2015 09:00 ET

DataTorrent Simplifies Data Ingestion and Extraction for Hadoop With DataTorrent dtIngest

DataTorrent Releases Project Apex Source Code on GitHub

SANTA CLARA, CA--(Marketwired - Jul 30, 2015) - DataTorrent, the leader in real-time big data analytics and creator of DataTorrent RTS, the world's first enterprise-grade unified platform for both stream and batch processing on Hadoop, today announced the availability of the first ingestion application for Hadoop, DataTorrent dtIngest. DataTorrent dtIngest simplifies the collection, aggregation and movement of large amounts of data to and from Hadoop for a more efficient data processing pipeline and is available to organizations for unlimited use at no cost.

"Getting data in and out of Hadoop is a challenge for most enterprises, and yet still largely neglected by current solutions. No existing tool handles all the requirements demanded for Hadoop ingestion. Without proper ingestion and data management, Hadoop data analysis becomes much more troublesome," said Jason Stamper, analyst, 451 Research. "DataTorrent dtIngest delivers an enterprise-grade user experience and performance."

DataTorrent also announced today that the GitHub repository for Project Apex is now available. Project Apex is the Apache 2.0 open-source, unified batch and stream processing engine that forms the core foundation of DataTorrent RTS 3. DataTorrent RTS 3 Community Edition is the DataTorrent-certified version of Project Apex. DataTorrent RTS 3 Enterprise Edition offers additional capabilities for operational management, easy development and data visualization on top of the Community Edition. Both editions are now generally available and downloadable at

"Hadoop ingestion is difficult and often prevents enterprises from gaining value from Hadoop, creating inefficiencies in the analysis process and stalling data initiatives altogether," said Phu Hoang, CEO and co-founder, DataTorrent. "With the release of DataTorrent dtIngest, we now provide a free application to overcome this challenge. DataTorrent dtIngest, built on the enterprise-grade Project Apex, delivers secure, high performance and fault tolerant data ingestion for any Hadoop-based project."

DataTorrent dtIngest makes configuring and running Hadoop data ingestion and data extraction a point-and-click process and includes enterprise-grade features not available in the market today:

  • Built on Apache 2.0 open-source Project Apex - dtIngest is a native YARN application built on Project Apex. Unlike other tools such as DistCp, it is completely fault tolerant and can resume file ingestion after a failure. It is horizontally scalable and supports high-throughput, low-latency data ingestion.
  • Simple to use and manage - A point-and-click application user interface makes it easy to configure, save and launch multiple data ingestion and distribution pipelines. Centralized management provides visibility, monitoring and summary logs.
  • Batch as well as stream data - dtIngest supports moving data between NFS, (S)FTP, HDFS, AWS S3n, Kafka and JMS so you can use one platform to exchange data across multiple endpoints.
  • HDFS small-file ingest using 'compaction' - Configurable, automatic compaction of small files into large files during ingest into HDFS helps keep the HDFS NameNode namespace from being exhausted.
  • Secure and efficient data movement - dtIngest supports compression and encryption during ingestion and is certified for use with Kerberos-enabled secure Hadoop clusters.
  • Runs in any Hadoop 2.0 cluster - Certified to run across all major Hadoop distributions in physical, virtual or in-the-cloud deployments.
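
For readers unfamiliar with small-file compaction, the general idea can be sketched in a few lines of Python. This is an illustration of the technique only, not dtIngest's implementation: the `CompactedFile` class, the `compact` function and the `max_bytes` parameter are hypothetical names chosen for this example, and the sketch works on in-memory byte strings rather than on HDFS.

```python
from dataclasses import dataclass, field


@dataclass
class CompactedFile:
    """One large output file plus an index of the small files packed into it."""
    data: bytearray = field(default_factory=bytearray)
    # (original name, offset into data, length) for each packed small file
    index: list = field(default_factory=list)


def compact(small_files, max_bytes):
    """Pack (name, payload) pairs into larger outputs, in arrival order.

    A new output is started whenever adding the next payload would push the
    current output past max_bytes. A payload larger than max_bytes ends up
    in an output of its own.
    """
    outputs = []
    current = CompactedFile()
    for name, payload in small_files:
        if current.data and len(current.data) + len(payload) > max_bytes:
            outputs.append(current)
            current = CompactedFile()
        # Record where this small file starts so it can be recovered later.
        current.index.append((name, len(current.data), len(payload)))
        current.data.extend(payload)
    if current.index:
        outputs.append(current)
    return outputs


if __name__ == "__main__":
    files = [("a.log", b"x" * 40), ("b.log", b"y" * 40), ("c.log", b"z" * 40)]
    for number, out in enumerate(compact(files, max_bytes=100)):
        print(number, len(out.data), out.index)
```

Because each output carries an index of names, offsets and lengths, the original small files remain recoverable, while the file system only has to track the larger outputs.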

Project Apex meetups announced
Following the announcement of Project Apex, several regional meetups have been established. To sign up for an existing meetup or request one in your area, please go to

About DataTorrent
DataTorrent is the leader in real-time big data analytics. DataTorrent RTS is the industry's only solution with a high-performing, fault-tolerant, unified architecture for both data in motion and data at rest. DataTorrent RTS is proven in production environments to reduce time to market, development costs and operational expenditures for Fortune 100 and leading Internet companies. Based in Santa Clara, California, DataTorrent is backed by leading investors including August Capital, GE Ventures, Singtel Innov8, Morado Ventures, and Yahoo co-founder Jerry Yang. For more information, visit our website or follow us on Twitter.

Contact Information

  • Media Contact:
    Nolan Necoechea
    LEWIS PR for DataTorrent