Redis Accelerates Spark by Over 100 Times


MOUNTAIN VIEW, CA--(Marketwired - Feb 2, 2016) - Redis Labs, the home of Redis, today announced its integration with Spark SQL and the release of the Spark-Redis connector package. Redis Labs' benchmarks on time-series data show that Spark using Redis as its data store processes data 135 times faster than Spark using HDFS, and 45 times faster than Spark either using Tachyon as an off-heap data store or storing the data on-heap.

The Spark-Redis connector package is open source and provides a library for writing to and reading from a Redis cluster, with access from Spark, as RDDs, to all of Redis' data structures (strings, hashes, lists, sets, sorted sets, bitmaps and HyperLogLogs). The package also ensures close alignment between Spark and Redis clusters, reducing network overhead and improving processing times.

The main advantages of using Redis with Spark include:

  • Acceleration of Spark performance by over 100 times in use cases such as time-series processing
  • Individual, rapid access to elements of data through Redis data structures, minimizing serialization/deserialization overhead and avoiding the transfer of large chunks of data

"Big data is coming of age and customers are demanding that big data insights are extracted in real-time," said Yiftach Shoolman, co-founder and CTO of Redis Labs. "This is where Redis Labs fills the gap by delivering both the right performance and optimized distributed memory infrastructure to accelerate Spark. Our goal is to make Redis the de facto data store for any Spark deployment."

The Spark-Redis solution enables:

  • Redis data structures exposed through the Spark RDD and Dataset APIs
  • Spark SQL support (through the DataFrame and Data Sources APIs) as a standard query interface
  • Use of Redis Cluster as a distributed memory infrastructure for Spark

Planned enhancements include extending the combination of Spark and Redis to other popular use cases, such as graph computation and machine learning.

"The Spark-Redis connector package was developed by Redis Labs in close collaboration with Databricks," said Patrick Wendell, vice president of engineering at Databricks. "Spark and Redis are a powerful combination enabling sophisticated analytics with a great deal of simplicity and speed."

"Spark with Redis is a combination that we have been waiting for," said Yuval Levav, VP, R&D of CoolaData. "The new Spark-Redis solution will allow us to deliver analytics in real-time and bring instant insights to our customers."

"Apache Spark is becoming a default in-memory engine for high-performance data integration and analytics," said Matt Aslett, research director, data platforms and analytics at 451 Research. "The combination of Redis and Spark should enable high-performance, real-time analytics with extremely large and variable datasets."

To start using the Spark-Redis connector, visit: http://spark-packages.org/package/RedisLabs/spark-redis.

To learn more about the benchmark, visit: https://redislabs.com/lp-redis-accelerates-apache-spark.

About Redis Labs
Redis Labs is the open source home and commercial provider of Redis, a database benchmarked as the world's fastest. Gartner has named the company a Leader in its 2015 ODBMS Magic Quadrant. Redis Labs' software and service solutions power cutting-edge applications with blazing-fast, enterprise-class Redis and are trusted by thousands of customers for high performance, seamless scalability, true high availability and best-in-class expertise. These solutions enhance popular Redis use cases such as real-time analytics, fast high-volume transactions, in-app social functionality, application job management, queuing and caching.

Redis is ranked the #1 NoSQL database (and #2 database overall) in User Satisfaction and Market Presence by G2 Crowd, the top database technology on Docker by Datadog, the most popular NoSQL database in containers by DevOps.com and ClusterHQ, the #1 NoSQL database among the Top 10 Data Stores by StackShare, and both the fastest growing database since January 2013 and one of the top three NoSQL databases by DB-Engines.

Contact Information:

Media Contacts:
Leena Joshi
Redis Labs

(408) 391-5616

Melissa Roxas
Inner Circle Labs for Redis Labs

(415) 684-9401