Oracle is announcing today what it’s calling “Oracle Big Data SQL”. As usual, I haven’t been briefed, but highlights seem to include:
- Oracle Big Data SQL is basically data federation using the External Tables capability of the Oracle DBMS.
- Unlike independent products — e.g. Cirro — Oracle Big Data SQL federates SQL queries only across Oracle offerings, such as the Oracle DBMS, the Oracle NoSQL offering, or Oracle’s Cloudera-based Hadoop appliance.
- Also unlike independent products, Oracle Big Data SQL is claimed to be compatible with Oracle’s usual security model and SQL dialect.
- At least when it talks to Hadoop, Oracle Big Data SQL exploits predicate pushdown to reduce network traffic.
And by the way – Oracle Big Data SQL is NOT “SQL-on-Hadoop” as that term is commonly construed, unless the complete Oracle DBMS is running on every node of a Hadoop cluster.
Predicate pushdown is actually a simple concept:
- If you issue a query in one place to run against a lot of data that’s in another place, you could spawn a lot of network traffic, which could be slow and costly. However …
- … if you can “push down” parts of the query to where the data is stored, and thus filter out most of the data, then you can greatly reduce network traffic.
“Predicate pushdown” gets its name from the fact that portions of SQL statements, specifically ones that filter data, are properly referred to as predicates. They earn that name because predicates in mathematical logic and clauses in SQL are the same kind of thing — statements that, upon evaluation, can be TRUE or FALSE for different values of variables or data.
The most famous example of predicate pushdown is Oracle Exadata, with the story there being:
- Oracle’s shared-everything architecture created a huge I/O bottleneck when querying large amounts of data, making Oracle inappropriate for very large data warehouses.
- Oracle Exadata added a second tier of servers each tied to a subset of the overall storage; certain predicates are pushed down to that tier.
- The I/O between Exadata’s two sets of servers is now tolerable, and so Oracle is now often competitive in the high-end data warehousing market,
Oracle evidently calls this “SmartScan”, and says Oracle Big Data SQL does something similar with predicate pushdown into Hadoop.
Oracle also hints at using predicate pushdown to do non-tabular operations on the non-relational systems, rather than shoehorning operations on multi-structured data into the Oracle DBMS, but my details on that are sparse.
- Chris Kanaracus’ coverage of the announcement quotes me at length.