August 25, 2013

Cloudera Sentry and other security subjects

I chatted with Charles Zedlewski of Cloudera on Thursday about security — especially Cloudera’s new offering Sentry — and other Hadoop subjects.

Sentry is:

Apparently, Hadoop security options pre-Sentry boil down to:

Sentry adds role-based permissions for SQL access to Hadoop:

for a variety of actions — selections, transformations, schema changes, etc. Sentry does this by examining a query plan and checking whether each step in the plan is permissible. 
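To make the plan-checking idea concrete, here is a minimal sketch of role-based authorization applied step by step to a query plan. It is not Sentry's code or API: the `PlanStep` type, the `ROLE_GRANTS` table, the role names and the `is_plan_permitted` function are all hypothetical, invented for illustration. The only assumption taken from the description above is that a query is allowed only if every step in its plan is allowed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStep:
    """One step of a (hypothetical) query plan."""
    action: str  # e.g. "select", "transform", "alter"
    obj: str     # e.g. "sales.orders"


# Hypothetical grants: role name -> set of (action, object) pairs.
ROLE_GRANTS = {
    "analyst": {("select", "sales.orders")},
    "admin": {("select", "sales.orders"), ("alter", "sales.orders")},
}


def is_plan_permitted(roles, plan):
    """Return True only if every step is covered by some role's grants."""
    allowed = set()
    for role in roles:
        # Unknown roles contribute no grants.
        allowed |= ROLE_GRANTS.get(role, set())
    return all((step.action, step.obj) in allowed for step in plan)


# A plan that reads a table and then changes its schema.
plan = [
    PlanStep("select", "sales.orders"),
    PlanStep("alter", "sales.orders"),
]
```

In this sketch a user holding only the hypothetical `analyst` role is refused the whole plan, because the schema-change step is not granted, even though the selection step on its own would pass.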

What Sentry doesn’t have is cell-based security, for which Charles perceives relatively little demand. I agree, but also note that traditional RDBMS implementations of cell-based security — notably Oracle Label Security — can have unpleasant performance consequences. From there, I segued the discussion to Accumulo. Unlike Hortonworks, Cloudera sees Accumulo demand strictly in the Federal government, where Accumulo is baked into some major reference architectures.

Charles also walked me through the use cases for some security requests he does frequently hear:

Comments

5 Responses to “Cloudera Sentry and other security subjects”

  1. Cloudera Hadoop usage notes | DBMS 2 : DataBase Management System Services on August 25th, 2013 11:40 am

    [...] we scheduled a call to talk about Sentry, Cloudera’s Charles Zedlewski and I found time to discuss other stuff as well. One [...]

  2. Greg Khairallah on August 26th, 2013 2:30 pm

    Interesting attempt by Cloudera to match what Intel is already shipping with a security framework under project Rhino https://github.com/intel-hadoop/project-rhino/. Today Intel has encryption of data files with access from Map Reduce, Pig and Hive that keeps the data fully secure while processing and at rest. Future releases will include cell level authorization and encryption of HBase. I encourage you to see what Intel has for security by reviewing the Intel distribution Security Guide located here: http://hadoop.intel.com/pdfs/IDH-SecurityGuide_R2-4-1_EN.pdf

  3. Charles Zedlewski on August 26th, 2013 3:45 pm

    @Greg – I encourage you to re-read Curt’s post and then your product documentation. Sentry is focused on fine-grained authorization by securing views of data such as a selection of columns or a range of rows in the Hive metastore as well as securing select SQL operations as they’re applied to those views. This is totally unrelated to the encryption you’re referring to.

    Apache Sentry (incubating) is open source and I encourage your colleagues to consider incorporating it into a future release of your own products and perhaps even contributing patches. Much as Cloudera will review & incorporate the HDFS, MapReduce and HBase features that Intel proposes to add under the umbrella of “Rhino” (once they are committed upstream of course).

  4. Curt Monash on August 28th, 2013 12:21 pm

    A talk description for the Intel Developer Forum next month. Emphasis mine. However, I have no idea how granular the bolded part is.

    EDCS004 – Protect Your Big Data with Intel® Xeon® Processors and Intel® Distribution for Apache Hadoop* Software

    Big data and data analytics are helping businesses to become smarter, more productive, and better at making predictions. These tools also present numerous security, compliance, and performance challenges. The Apache* open source projects do not provide adequate mechanisms for data protection or access controls, which are necessary for typical enterprise production use. The Intel® Distribution for Apache Hadoop* software provides significant enhancements to deal with these gaps.

    In this session, we will discuss:

    • Flexible framework for hardware assisted encryption with Intel® Data Protection Technology using Intel® AES New Instructions and Secure Key added to Hadoop, leveraging open source based OpenSSL* encryption algorithms
    • Role based access control recently open sourced for Apache Hadoop
    • New advanced encryption optimization techniques e.g. AES multi-buffer

  5. Sanjay Subramanian on September 3rd, 2013 4:31 pm

    Hi guys

    I implemented a 25-node Hive/Hadoop cluster in production last month. I have been struggling to implement Cloudera Sentry for the past 3 weeks. I read the documentation, but it's still not working. Basically, I can do the LDAP authentication from a Beeline client, but I am not clear whether I need to create ROLES through the Hive CLI or just through the Sentry config files. Either way, the authorization is not working. Many questions were answered well by the CDH users group, but I still can't get it to work. Where can I find help?

    https://groups.google.com/a/cloudera.org/forum/#!mydiscussions/cdh-user/wEEcDWWqUBI

    https://groups.google.com/a/cloudera.org/forum/#!mydiscussions/cdh-user/y6nwB2-gpoo

    thanks
    sanjay
