NoSQL: Dataguise Presents 10 Best Practices for Securing Sensitive Data in Hadoop



1. Start early! Determine the data privacy protection strategy during the planning phase of a deployment, preferably before moving any data into Hadoop. This prevents damaging compliance exposure for the company and avoids unpredictability in the rollout schedule.

2. Identify which data elements are defined as sensitive within your organization. Consider company privacy policies, pertinent industry regulations and governmental regulations.

3. Discover whether sensitive data is embedded in the environment, and whether it is already assembled or will be assembled in Hadoop.

4. Determine the compliance exposure risk based on the information collected.

5. Determine whether business analytic needs require access to real data or whether desensitized data can be used. Then choose the right remediation technique (masking or encryption). If in doubt, remember that masking provides the most secure remediation while encryption provides the most flexibility, should future needs evolve.

6. Ensure the data protection solutions under consideration support both masking and encryption, especially if the goal is to keep both masked and unmasked versions of sensitive data in separate Hadoop directories.

7. Ensure the data protection technology used implements consistent masking across all data files (Joe becomes Dave in all files) to preserve the accuracy of data analysis across every data aggregation dimension.

8. Determine whether tailored protection for specific data sets is required, and consider dividing Hadoop directories into smaller groups where security can be managed as a unit.

9. Ensure the selected encryption solution interoperates with the company's access control technology, and that both allow users with different credentials to have appropriate, selective access to data in the Hadoop cluster.

10. Ensure that when encryption is required, the proper technology (Java, Pig, etc.) is deployed to allow for seamless decryption and expedited access to data.
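The consistent-masking recommendation above (Joe becomes Dave in all files) can be illustrated with a minimal Python sketch. It is not Dataguise's method; it assumes a keyed hash (HMAC-SHA256) is an acceptable way to derive the masked value, and the names `mask_value`, `mask_records`, the `user_` prefix and the demo key are all hypothetical. Because the same input and key always yield the same token, joins and aggregations across files still line up after masking.

```python
import hashlib
import hmac


def mask_value(value: str, secret_key: bytes) -> str:
    """Map a sensitive value to a stable pseudonymous token.

    Assumption: trimming and lowercasing is the right normalization,
    so " joe " and "Joe" are treated as the same person.
    """
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    # Keep a short prefix of the digest; "user_" is a hypothetical label.
    return "user_" + digest[:12]


def mask_records(records, field, secret_key):
    """Return copies of the records with one field masked; inputs are not modified."""
    return [{**record, field: mask_value(record[field], secret_key)} for record in records]


if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # hypothetical key, for illustration only

    # Two separate "files" that mention the same customer.
    orders = [{"customer": "Joe", "amount": 40}, {"customer": "Ann", "amount": 15}]
    tickets = [{"customer": "Joe", "issue": "login"}]

    masked_orders = mask_records(orders, "customer", key)
    masked_tickets = mask_records(tickets, "customer", key)

    # Joe receives the same token in both data sets.
    print(masked_orders[0]["customer"] == masked_tickets[0]["customer"])  # True
```

A keyed hash is one-way, which matches the masking case; if the original values must be recoverable later, that is the encryption case described in the practices above, and a different mechanism would be needed.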

tags:hadoop,security,dataguise,mapreduce,bigdata

via NoSQL databases