SOURCE: Pepperdata


September 22, 2016 07:30 ET

Pepperdata Optimizes Amazon EMR Clusters to Increase Job Performance by up to 4x

New Self-Service Offering Provides Over 300 Real-Time and Historical Metrics to Troubleshoot EMR Workloads, Dramatically Increasing Performance and Helping Customers Control Costs

CUPERTINO, CA--(Marketwired - Sep 22, 2016) - Pepperdata, the world experts in the performance of distributed systems at scale, today announced a new offering that enables customers of Amazon Elastic MapReduce (EMR) to run jobs up to four times faster while cutting costs. With a one-click install, joint customers gain instant, granular visibility into their clusters' runtime performance, which is not possible today through Amazon alone. Even after an Amazon EMR cluster has completed its work and terminated, customers can access fine-grained monitoring data to view and analyze a run and to compare it with historical data to improve future performance. Pepperdata customers can take advantage of this new service free of charge until December 31, 2016.

Because Amazon EMR clusters are short-lived, the cluster terminates once a run is complete, taking all performance data with it. As a result, visibility into job performance is essentially non-existent, making it very difficult to pinpoint improvements that could decrease run times and costs for customers. Pepperdata's granular analysis of runs -- based on over 300 metrics, including CPU, memory, unused capacity, and job duration -- helps DevOps teams optimize workloads and shorten run times that are inflated by code inefficiencies. This instant visibility into cluster utilization also makes it easy for customers to determine the right amount of compute needed to complete jobs on time and at the lowest cost.

"Amazon EMR is designed to help companies process huge amounts of data easily and cost-effectively without having to commit unnecessary resources," said Sean Suchter, CTO, Pepperdata. "As customers embrace Hadoop in the cloud, they need to be able to manage cost and performance without any big surprises. Pepperdata eliminates those blind spots with very granular insight into the performance of current and historical EMR runs."

Even Small Reductions in Run Time Can Yield Significant Savings
Managing cost is the top priority for customers using Amazon EMR. Because billing in Amazon EMR is hourly, any reduction in run time can have a demonstrable impact on overall cost.

One of Pepperdata's customers, a leading online real estate destination, wanted to shorten a specific run in Amazon EMR that consistently required 17 hours to complete. By analyzing the metrics that Pepperdata collected and stored after termination, the customer identified areas of improvement and used the workload analysis to cut the same run to four hours. Pepperdata's unique insights into cluster utilization quickly and accurately identified areas of inefficiency, leading to hundreds of thousands of dollars in annual cost savings for a single job.
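
To make the arithmetic behind hourly billing concrete, the following is a minimal sketch, not Pepperdata's or Amazon's actual cost model. It assumes every node is billed for each started hour; the node count (100) and the rate ($0.50 per node-hour) are hypothetical figures chosen only for illustration, and the function name emr_run_cost is likewise invented for this example. Only the 17-hour and four-hour run times come from the customer example above.

```python
import math


def emr_run_cost(run_hours, node_count, hourly_rate_usd):
    """Estimate the cost of one run when each node is billed per started hour.

    Assumption: a partial hour is billed as a full hour on every node.
    """
    billed_hours = math.ceil(run_hours)
    return billed_hours * node_count * hourly_rate_usd


# Hypothetical cluster size and rate (illustrative only, not from the release).
NODES = 100
RATE_USD = 0.50

cost_before = emr_run_cost(17, NODES, RATE_USD)  # the 17-hour run
cost_after = emr_run_cost(4, NODES, RATE_USD)    # the same run at four hours
savings_per_run = cost_before - cost_after
speedup = 17 / 4

print(f"before: ${cost_before:.2f}, after: ${cost_after:.2f}")
print(f"savings per run: ${savings_per_run:.2f}, speedup: {speedup:.2f}x")
```

Under these assumed figures the saving scales linearly with cluster size, rate, and how often the run is repeated, which is why a recurring job is where shorter run times matter most.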

Automated, Adaptive Scaling for Amazon EMR
In addition to the self-service option for Amazon EMR, Pepperdata today is also announcing an upcoming beta of Adaptive Scaling for Amazon EMR. With Adaptive Scaling, customers can specify a time or cost budget for job completion, and Pepperdata will automatically purchase Amazon EMR instances, elastically growing or shrinking the cluster as needed to meet these criteria. Adaptive Scaling for Amazon EMR will be available in beta in Q1 2017, and Pepperdata is accepting early sign-ups.
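
The idea of sizing a cluster against a time or cost budget can be sketched as follows. This is an illustrative calculation only and does not describe how Adaptive Scaling works internally. It assumes ideal linear scaling (doubling the nodes halves the run time) and per-started-hour billing; the function names, the 400 node-hour workload, and the $0.50 rate are all hypothetical.

```python
import math


def nodes_for_deadline(work_node_hours, deadline_hours):
    """Smallest node count that meets the deadline, assuming ideal linear scaling."""
    return math.ceil(work_node_hours / deadline_hours)


def fits_cost_budget(work_node_hours, nodes, hourly_rate_usd, budget_usd):
    """Check a cluster size against a cost budget, billing each started hour per node."""
    run_hours = work_node_hours / nodes
    cost = math.ceil(run_hours) * nodes * hourly_rate_usd
    return cost <= budget_usd


# Hypothetical workload: 400 node-hours of work at $0.50 per node-hour.
WORK = 400
RATE_USD = 0.50

nodes_4h = nodes_for_deadline(WORK, 4)  # nodes needed to finish within 4 hours
nodes_3h = nodes_for_deadline(WORK, 3)  # a tighter deadline needs more nodes

print(nodes_4h, fits_cost_budget(WORK, nodes_4h, RATE_USD, budget_usd=250))
print(nodes_3h, fits_cost_budget(WORK, nodes_3h, RATE_USD, budget_usd=200))
```

Real workloads rarely scale linearly, so a sketch like this gives at best a rough starting point for a sizing decision.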

To get started today using Pepperdata with Amazon EMR, visit: Pepperdata.com/EMR

About Pepperdata
Pepperdata develops software that governs and guarantees consistent, peak performance of Hadoop clusters from hundreds to thousands of nodes. Enterprises, from Fortune 500 companies to SMBs, trust Pepperdata to deliver transparency and control over distributed systems, and eliminate blind spots in Hadoop environments. Pepperdata provides the only solution that can anticipate and avert cluster performance issues at both the user and job level to create order out of the chaos inherent in distributed computing. Its Adaptive Performance Core™ has predictive learning capabilities that look 30 seconds into the future to anticipate changing cluster conditions. Pepperdata then uses this information to reshape application usage of CPU, RAM, network and disk without user intervention, so that jobs can complete on time. Pepperdata software dynamically prevents bottlenecks in multi-tenant, multi-workload clusters so that numerous users and jobs can run reliably on a single cluster at maximum utilization, increasing throughput by 30 to 50 percent. Job performance is enforced on the fly based on priority and current cluster conditions, eliminating fatal contention for hardware resources and the need for workload isolation. The software also precisely pinpoints where problems are occurring so that IT teams can quickly identify and fix troublesome jobs. By capturing global knowledge of each cluster and controlling processes second by second to deliver Quality of Service, the software reclaims control over unpredictable cluster environments so that enterprises can realize untapped value from existing distributed infrastructures. The distributed systems supervisor installs in under an hour, runs on existing clusters, and is compatible with all major Hadoop distributions. With Pepperdata, organizations can put their big data to use in production to meet business objectives today and satisfy future use cases.

Contact Information

  • Media Contact:
    Bhava Communications
    Brianna Galloway
    925.922.0708