SOURCE: dataguise


March 18, 2013 16:17 ET

Dataguise Survey Reveals 80 Percent of Enterprises Placing Importance on Identifying Sensitive Data in Hadoop Environments

Social Security Numbers, Credit Card Information and Other Personally Identifiable Information Important to Secure in Hadoop Deployments

FREMONT, CA--(Marketwire - Mar 18, 2013) - Dataguise, a leading innovator of data security intelligence and protection solutions, today announced the results of a recent survey on the importance of securing sensitive data in Apache Hadoop environments. The study was conducted at the O'Reilly Strata Conference and the RSA Conference, where survey participants highlighted the need to secure sensitive information in Hadoop deployments to reduce the risk of non-compliance.

According to analysts, IT organizations with Apache Hadoop deployments should be aware of the potential security problems. In particular, the use of Hadoop to combine and store data from several sources can result in a number of problems related to identifying and securing sensitive data. Hadoop deployments can include a variety of data classifications with disparate security requirements. The key to ensuring compliance is to select the appropriate security solution for the Hadoop distribution.

In the qualitative survey of enterprise Hadoop users conducted by Dataguise, data was collected from 62 enterprise respondents during the recently held O'Reilly Strata and RSA Conferences. Key findings of the survey included the following:

  • 80% of the enterprises surveyed feel it is important to know whether sensitive data is stored in their Hadoop environment.
  • 77% feel it is important to protect access to the sensitive data stored in their Hadoop environment.
  • 33% store sensitive data in Hadoop, including social security numbers, credit card numbers and addresses.
  • 43% of survey participants are currently testing Hadoop and 31% have active production environments.
  • Data in Hadoop environments consists primarily of log files (55%), followed by structured DBMS data (36%) and mixed data types (24%).
  • Company divisions using Hadoop include marketing (28%), sales (23%) and customer support (23%), with other divisions making up the balance.
  • Major challenges faced during Hadoop implementations include lack of skills (35%), Hadoop usability (23%) and security management (21%).

As petabytes of new data accumulate and propagate across businesses, much of this data comes from external sources and from customer interaction channels, such as web sites, call centers, Facebook, and Twitter. Other data originates from traditional data repositories such as RDBMS and file servers. To mine these large volumes and varieties of data in a cost-efficient way, companies are adopting new technologies such as Apache Hadoop. Line-of-business managers are benefiting from Hadoop and its ability to enable the analysis of previously inaccessible data patterns, but security officers are concerned about the nature of the information and its uncontrolled accessibility. They are well aware of the potentially catastrophic financial losses and the brand damage that compliance breaches can cause to their business.

To address the challenges of Hadoop data privacy, organizations require proactive detection and protection. The ability to locate and identify sensitive data across all Hadoop clusters provides compliance experts with the intelligence and assurance they need to evaluate a company's exposure and risk. Depending on the types of data, a solution that enforces the appropriate remediation policies, such as data masking or data quarantine, should remain a priority. Additionally, the ability to centrally manage and schedule detection and protection actions will make compliance enforcement transparent and automatic across all instances of Hadoop, on premises and in the cloud.

"Organizations require a straightforward and economical way to determine where sensitive data is and how to effectively secure their Hadoop environments," said Manmeet Singh, CEO, Dataguise. "The data here shows that data privacy protection is important to Hadoop users and they are actively engaging security personnel to find ways to detect and protect sensitive data to meet compliance requirements. Using solutions such as DG for Hadoop™ by Dataguise allows for proactive actions to be taken while alleviating the complexity and cost of data privacy protections."

Tweet this: @Dataguise Survey Shows 80% of Enterprises Place Importance on Identifying Sensitive Data In Hadoop #Hadoop #BigData

About Dataguise
Dataguise helps organizations safely leverage their enterprise data with a comprehensive risk-based data protection solution. By automatically locating sensitive data, transparently protecting it with high-performance Masking on-Demand™ or encryption, and providing enterprise security intelligence to managers, Dataguise improves data risk management and operational efficiency and reduces regulatory compliance costs. For more information, call 510-824-1036.

Contact Information

  • Agency Contact:
    Joe Austin
    The Ventana Group
    (818) 332-6166