June 19, 2012

Notes on HBase 0.92

This is part of a four-post series, covering:

Annoying Hadoop marketing themes that should be ignored.
Hadoop versions and distributions, and their readiness or lack thereof for production.
In general, how “enterprise-ready” is Hadoop?
HBase 0.92 (this post)

As part of my recent round of Hadoop research, I talked with Cloudera’s Todd Lipcon. Naturally, one of the subjects was HBase, and specifically HBase 0.92. I gather that the major themes of HBase 0.92 are:

Performance, scalability, and so on.
“Coprocessors”, which are like triggers or stored procedures.
Security, as the first major application of coprocessors.

HBase coprocessors are Java code that links straight into HBase. As with other DBMS extensions of the “links straight into the DBMS code” kind,* HBase coprocessors seem best suited for very sophisticated users and third parties.** Evidently, coprocessors have already been used to make HBase security more granular — role-based, per-column-family/per-table, etc. Further, Todd thinks coprocessors could serve as a good basis for future HBase enhancements in areas such as aggregation or secondary indexing.

*Examples include unfenced C++ extensions to analytic RDBMS or (which mattered more in the 1990s than now) “blade”/“cartridge”/datatype extensions to extensible RDBMS such as Illustra, Informix, Oracle, or DB2.

**Admittedly, in the current HBase community, a considerable fraction of user organizations fit the “very sophisticated”/co-developer template.
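To make the "like triggers" idea concrete, here is a toy sketch in Python of a hook that runs before every write and can veto it, used to enforce per-column-family permissions. It is only an illustration of the pattern: real coprocessors are Java classes loaded into HBase, and every name below (ToyRegion, make_family_acl, AccessDenied, the users and families) is hypothetical rather than part of any HBase API.

```python
# Toy model only: none of these names are HBase APIs.

class AccessDenied(Exception):
    """Raised by a hook to veto a write."""


class ToyRegion:
    """In-memory stand-in for a table region that runs hooks before each write."""

    def __init__(self):
        self.rows = {}           # (row, family, qualifier) -> value
        self.pre_put_hooks = []  # called in registration order before every put

    def register(self, hook):
        self.pre_put_hooks.append(hook)

    def put(self, user, row, family, qualifier, value):
        # Every hook sees the write first; raising an exception blocks it.
        for hook in self.pre_put_hooks:
            hook(user, row, family, qualifier, value)
        self.rows[(row, family, qualifier)] = value


def make_family_acl(grants):
    """Build a hook from a mapping of user -> set of writable column families."""
    def check(user, row, family, qualifier, value):
        if family not in grants.get(user, set()):
            raise AccessDenied(f"{user} may not write to family {family!r}")
    return check


region = ToyRegion()
region.register(make_family_acl({"alice": {"profile", "billing"},
                                 "bob": {"profile"}}))

region.put("alice", "row1", "billing", "card", "xxxx")  # allowed
region.put("bob", "row1", "profile", "name", "Bob")     # allowed

try:
    region.put("bob", "row1", "billing", "card", "yyyy")  # bob lacks "billing"
    denied = False
except AccessDenied:
    denied = True
```

The point of the design is that the access check lives inside the write path itself rather than in client code, which is also why a buggy hook can hurt every write to the table.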

As for scalability and performance, it seems the advances there match clichés such as “low-hanging fruit” or Bottleneck Whack-a-Mole.

HBase b-trees used to be restricted to two levels; now they aren’t.
Replication among data centers has been strengthened (I eventually hear that about most NoSQL projects that aren’t Cassandra 🙂 ).
HBase inherits some performance improvements in HDFS itself.

Overall, Todd says several tests have indicated HBase performance improvements of 60% or better, with some particular cases of course going much higher (up to 2.5X).
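As a rough illustration of why a cap on index depth can turn into a bottleneck, the Python sketch below does generic tree arithmetic. It is not a description of HBase's actual file format; the fanout of 100 entries per index block and the one million data blocks are made-up numbers, and both function names are hypothetical.

```python
def index_levels(num_data_blocks, fanout):
    """Levels needed so that no index block holds more than `fanout` entries."""
    levels = 1
    entries = num_data_blocks
    while entries > fanout:
        entries = -(-entries // fanout)  # ceiling division
        levels += 1
    return levels


def root_entries_with_cap(num_data_blocks, fanout, max_levels):
    """Entries the root block must hold if the index may have at most max_levels."""
    entries = num_data_blocks
    for _ in range(max_levels - 1):
        entries = -(-entries // fanout)  # each extra level divides the root's load
    return entries


# Assumed numbers, for illustration only.
blocks = 1_000_000
fanout = 100

needed = index_levels(blocks, fanout)
capped_root = root_entries_with_cap(blocks, fanout, max_levels=2)
uncapped_root = root_entries_with_cap(blocks, fanout, max_levels=needed)
print(needed, capped_root, uncapped_root)
```

Under these assumptions a depth cap of two forces the root to hold 10,000 entries instead of 100, so the top of the index grows with the data instead of staying small.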

My whole HBase discussion with Todd was pretty short, actually; just one of several subjects in a one-hour call. But we did squeeze in one topic that wasn’t 0.92-specific — namely, what does HBase storage tend to be like? Notes on that included:

HBase working sets are commonly in RAM, or else have cache hit ratios in at least the 60-80% range.
Solid-state memory isn’t generally used for HBase persistence. Small fast disks are beginning to appear.
When you do short-request and MapReduce processing against the same HBase database, the MapReduce part is usually still done using cheaper disks.
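To see what a 60-80% cache hit ratio can mean for read latency, the Python sketch below computes a simple weighted average of cache and disk access times. The 0.1 ms cache and 10 ms disk figures are assumptions picked for illustration, not numbers from Todd, and the function name is hypothetical.

```python
def expected_read_latency_ms(hit_ratio, cache_ms, disk_ms):
    """Average latency when a fraction `hit_ratio` of reads is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


# Assumed access times, for illustration only.
CACHE_MS = 0.1
DISK_MS = 10.0

latencies = {h: expected_read_latency_ms(h, CACHE_MS, DISK_MS)
             for h in (0.6, 0.8, 1.0)}
for hit_ratio, ms in latencies.items():
    print(f"hit ratio {hit_ratio:.0%}: about {ms:.2f} ms per read")
```

With these assumed figures, the misses dominate: moving from a 60% to an 80% hit ratio roughly halves the average, which is one way to read the interest in faster disks for the portion that does not fit in RAM.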

Categories: Benchmarks and POCs, Cloudera, Hadoop, HBase, MapReduce, NoSQL, Open source, Storage, Theory and architecture
