SOURCE: RainStor


May 24, 2011 09:01 ET

RainStor Delivers Big Data Retention on Cloudera's Distribution Including Apache Hadoop

Extreme Compression and Shared Infrastructure Lower TCO for Retaining Massive Data Sets to Meet Changing Business Requirements

SAN FRANCISCO, CA--(Marketwire - May 24, 2011) - RainStor, an infrastructure software company specializing in Online Data Retention (OLDR), today announced that RainStor 4.5 can be deployed using Cloudera's Distribution including Apache Hadoop. The result is a pragmatic and scalable approach to Big Data that performs fast analytics while retaining data at a lower overall total cost of ownership (TCO).

RainStor can be used to retain and access massive data sets on the Hadoop Distributed File System (HDFS) with a physical footprint at least 97 percent smaller than that of the raw data. The result combines Hadoop's Big Data processing, management and analytics with RainStor for compliant data retention on existing, low-cost servers and storage.

"Hadoop gives organizations the ability to scale for Big Data analytics but the data actually grows as it's replicated across nodes. Reducing the size of data slated for retention makes enormous sense," said Merv Adrian, VP Research, Gartner. "The combination changes the class of hardware and storage required, making the economics even more attractive."

RainStor on HDFS Designed for Petabytes (or more) of Enterprise Data
As enterprises collect and generate more data than ever, RainStor on HDFS, using locally attached commodity storage, offers the lowest initial capital investment and ongoing total cost of ownership for retaining petabytes of data. RainStor's specialized repository compresses the data using a patented value and pattern de-duplication technique and stores it in immutable form on HDFS. RainStor has built-in security, audit trails and granular retention and expiry policies for managing the lifecycle of stored data. Data within RainStor can be accessed through standard structured query language (SQL), specialized RDBMS native SQL and standard BI tools via ODBC/JDBC.

Making the Big Data Problem Smaller
Depending on the Hadoop replication factor, the size of stored data can be a significant multiple of the raw data loaded. To counteract this, most Hadoop deployments rely on binary compression (such as LZO), which on average yields 5 to 1 compression and comes with a re-inflation performance penalty upon access. In contrast, RainStor achieves compression rates of 40 to 1 or greater and allows data access without re-inflation.

Example: With 2 petabytes (PB) of raw data to be stored for a 6-month period, the difference in disk savings could look like this:

  • Data in HDFS: 2 PB x 3 (for replication) = 6 PB + results of analysis.
  • Data in HDFS with RainStor: 0.05 PB (original source data compressed 40 to 1) x 3 (for replication) = 0.15 PB + results of analysis. A physical storage savings of 5.85 PB.
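The arithmetic in this example can be reproduced with a short Python sketch. The function name `stored_petabytes` is hypothetical and used only for illustration; the replication factor of 3 and the 40 to 1 ratio come from the example above, and the 5 to 1 line simply applies the average LZO figure quoted earlier to the same 2 petabytes, which is an extrapolation rather than a number stated in this release. Results of analysis are excluded in every case.

```python
def stored_petabytes(raw_pb, compression_ratio=1.0, replication=3):
    """Physical HDFS footprint in petabytes for a given amount of raw data.

    Illustrative helper only: divides the raw size by the compression
    ratio, then multiplies by the HDFS replication factor.
    """
    return raw_pb / compression_ratio * replication


raw_pb = 2.0  # raw data from the example, in petabytes

plain = stored_petabytes(raw_pb)                            # no compression
lzo = stored_petabytes(raw_pb, compression_ratio=5)         # assumed 5 to 1 average
rainstor = stored_petabytes(raw_pb, compression_ratio=40)   # 40 to 1 from the example

savings = plain - rainstor
# Footprint reduction implied by a 40 to 1 ratio, independent of replication
reduction = 1 - 1 / 40

print(f"HDFS, uncompressed:  {plain:.2f} petabytes")
print(f"HDFS, 5 to 1:        {lzo:.2f} petabytes")
print(f"HDFS with 40 to 1:   {rainstor:.2f} petabytes")
print(f"Savings vs. uncompressed: {savings:.2f} petabytes")
print(f"Footprint reduction at 40 to 1: {reduction:.1%}")
```

Because replication multiplies both sides equally, the relative reduction depends only on the compression ratio: 40 to 1 corresponds to a footprint 97.5 percent smaller, which is consistent with the "at least 97 percent" figure quoted earlier.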

Even with low-cost commodity disk, the initial capital expenditure can be significant as data volumes reach multiple petabytes and beyond. More importantly, the operating cost of a large number of storage drives remains a significant expense that can reach millions of dollars over multiple years. RainStor's compression, lifecycle management and compliant retention features, combined with HDFS' low-cost commodity disk and scale-out benefits, provide significant value and cost savings for Big Data analysis and retention.

"Cloudera's Distribution including Apache Hadoop is fast becoming the gold standard in enterprise Hadoop deployments," said Ramon Chen, vice president product management, RainStor. "Our partners and their customers face exploding data volumes and extended compliance retention requirements. Organizations that deploy RainStor on HDFS benefit from a scalable online data retention solution at the lowest TCO, while leveraging Hadoop for Big Data analytics."

Product Availability
RainStor 4.5 for Hadoop is available immediately.

About RainStor
RainStor is a technology pioneer in Online Data Retention (OLDR). The company's specialized data repository significantly reduces the total cost of retaining data through extreme data reduction, simplified data management and near-perfect scalability on commodity hardware. RainStor solutions are deployed by technology partners across multiple industries, including AdaptiveMobile, HP, Informatica, On Point Technology and Teradata, to reduce the cost and complexity of preserving information in the enterprise or the cloud. RainStor has been deployed at over 100 blue chip companies to address key data retention requirements.

RainStor is a privately held company with offices in San Francisco, USA and Gloucester, UK.

Contact Information

  • Contact:
    Julie Tangen
    Kulesa Faul for RainStor