- Developed by Cloudera.
- An Apache incubator project.
- Slated to be rolled into CDH — Cloudera’s Hadoop distribution — over the next couple of weeks.
- Only useful with Hive in Version 1, but slated to work in the future with other Hadoop data access systems such as Pig and search.
- Lacking in administrative scalability in Version 1, something that is also slated to be fixed in future releases.
Apparently, Hadoop security options pre-Sentry boil down to:
- Kerberos, which only works down to directory or file levels of granularity.
- Third-party products.
Sentry adds role-based permissions for SQL access to Hadoop:
- By server.
- By database.
- By table.
- By view.
These permissions cover a variety of actions: selections, transformations, schema changes, etc. Sentry enforces them by examining a query plan and checking whether each step in the plan is permissible.
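As a rough illustration of that plan-checking idea, here is a minimal Python sketch. It is not Sentry's code or API: the names `Privilege`, `PlanStep` and `is_permitted`, along with the sample roles, server, database and table names, are all hypothetical, and a real query plan carries far more detail than the flat list of steps assumed here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PlanStep:
    """One step of a (greatly simplified, hypothetical) query plan."""
    action: str    # e.g. "select", "insert", "alter"
    server: str
    database: str
    obj: str       # table or view name


@dataclass(frozen=True)
class Privilege:
    """A grant scoped by server, optionally narrowed to a database and object.

    A value of None means "everything below this level", so a privilege with
    only a server set applies to every database and table on that server.
    """
    action: str                     # "all" is treated as matching any action
    server: str
    database: Optional[str] = None
    obj: Optional[str] = None

    def covers(self, step: PlanStep) -> bool:
        return (
            self.action in (step.action, "all")
            and self.server == step.server
            and self.database in (None, step.database)
            and self.obj in (None, step.obj)
        )


def is_permitted(roles: List[str],
                 role_privileges: Dict[str, List[Privilege]],
                 plan: List[PlanStep]) -> bool:
    """Allow the query only if every plan step is covered by some privilege."""
    granted = [p for r in roles for p in role_privileges.get(r, [])]
    return all(any(p.covers(step) for p in granted) for step in plan)


# Hypothetical role definitions: analysts may read one view only,
# while the ETL role may do anything within the "sales" database.
ROLE_PRIVILEGES = {
    "analyst": [Privilege("select", "server1", "sales", "orders_summary_view")],
    "etl": [Privilege("all", "server1", "sales")],
}

view_query = [PlanStep("select", "server1", "sales", "orders_summary_view")]
copy_query = [
    PlanStep("select", "server1", "sales", "orders"),
    PlanStep("insert", "server1", "sales", "orders_copy"),
]

analyst_view_ok = is_permitted(["analyst"], ROLE_PRIVILEGES, view_query)
analyst_copy_ok = is_permitted(["analyst"], ROLE_PRIVILEGES, copy_query)
etl_copy_ok = is_permitted(["etl"], ROLE_PRIVILEGES, copy_query)
```

In this toy setup the analyst role can query the view but not the underlying table, which is the view-based restriction in miniature. A single impermissible step is enough to reject the whole query.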
What Sentry doesn’t have is cell-based security, for which Charles perceives relatively little demand. I agree, but also note that traditional RDBMS implementations of cell-based security — notably Oracle Label Security — can have unpleasant performance consequences. From there, I segued the discussion to Accumulo. Unlike Hortonworks, Cloudera sees Accumulo demand strictly in the Federal government, where Accumulo is baked into some major reference architectures.
Charles also walked me through the use cases for some security requests he does frequently hear:
- Encryption at rest is important for compliance, for example for credit card numbers.
- Masking is also of particular interest for credit card numbers.
- Audit arises frequently for Sarbanes-Oxley compliance, and also in financial services (not necessarily for compliance).
- View-based security, a big point of Sentry, is usually meant to satisfy internal (i.e. non-regulatory) policies.
Related link:
- Other issues in regulatory compliance (July, 2012)