July 15, 2014

The point of predicate pushdown

Oracle is announcing today what it’s calling “Oracle Big Data SQL”. As usual, I haven’t been briefed, but highlights seem to include:

And by the way – Oracle Big Data SQL is NOT “SQL-on-Hadoop” as that term is commonly construed, unless the complete Oracle DBMS is running on every node of a Hadoop cluster.

Predicate pushdown is actually a simple concept:

“Predicate pushdown” gets its name from the fact that portions of SQL statements, specifically ones that filter data, are properly referred to as predicates. They earn that name because predicates in mathematical logic and clauses in SQL are the same kind of thing — statements that, upon evaluation, can be TRUE or FALSE for different values of variables or data.

The most famous example of predicate pushdown is Oracle Exadata, with the story there being:

Oracle evidently calls this “SmartScan”, and says Oracle Big Data SQL does something similar with predicate pushdown into Hadoop.

Oracle also hints at using predicate pushdown to do non-tabular operations on the non-relational systems, rather than shoehorning operations on multi-structured data into the Oracle DBMS, but my details on that are sparse.

Related link


9 Responses to “The point of predicate pushdown”

  1. Mark Callaghan on July 15th, 2014 11:04 am

    Your quotes in the Kanaracus article were excellent.

    What does it mean to push a predicate to Hadoop? Does it create map/reduce tasks to filter HDFS data? Does this assume HBase is there to evaluate index filters?

  2. Curt Monash on July 15th, 2014 11:28 am


    Thanks for the compliment, and I don’t know the answers to your questions.

    In particular, you’ve alerted me to the fact that I don’t really know whether predicates are being pushed down to the Hadoop cluster as a whole, or whether Oracle has written some kind of an engine to filter data directly on every node (presumably with a call to the Hive metadata store, nee’ HCatalog, coming first to help it figure out what it wants to do).

    Certainly enough other vendors are putting engines on each Hadoop data node for the latter option to be at least theoretically possible.

  3. Arun C. Murthy on July 15th, 2014 3:13 pm

    Posting my tweet-response to Curt here on request…

    Several optimized columnar file-formats are available for data stored in HDFS e.g. ORC in Hive, Parquet etc.

    I’d guess that they use columnar projection (scan only required columns) and predicate pushdown (skip rows which don’t match the filter i.e. the query predicate) available in the columnar formats to reduce IO during the scan from HDFS itself.

    Details differ, but, for e.g. ORC has light-weight indices inline with the data which allows to filter out ‘stripes’ which won’t match the predicate – “SELECT name FROM customer WHERE age < 35" can skip stripes where we'd know min(age) is 45. Therefore, they wouldn't need an active processing component.

    Hope that helps.

  4. shrikanth shankar on July 15th, 2014 3:16 pm

    The blog mentions ” Smart Scan for Hadoop introduces a new service into to the Hadoop ecosystem, which is co-resident with HDFS DataNodes and YARN NodeManagers. Queries from the new external tables are sent to these services to ensure that reads are direct path and data-local. Reading close to the data speeds up I/O, but minimizing data movement requires that Smart Scan do some things that are, well, smart.”. Also the example on the blog is JSON. Seems to point to an active component

  5. Greg Rahn on July 16th, 2014 12:11 am

    I interpret that quote from the blog[1] to mean there is an active processing component as well — a daemon that plays the same role on an HDFS DataNode that exists today on an Exadata Storage Server. In essence, this is a port of the Exadata cellsrv daemon that likely has common HDFS file format scanners (Parquet, ORCfile, RCfile, Text, etc.) and is able to apply predicate pushdown in the form of 1) column and data skipping for formats like Parquet or ORCfile that support such or 2) post-scan filtration, including the ability to apply bloom filters[2] (like Exadata), and column pruning for other formats. In either case, data is converted into Oracle internal tuple representation on the DataNodes[3] and flows up to the RAC DB nodes the same way it does from Exadata (which also means that Smart Scanning HDFS nodes don’t talk to each other to do JOINs — that is the responsibility of the RAC DB nodes).

    [1] https://blogs.oracle.com/datawarehousing/entry/oracle_big_data_sql_one
    [2] https://twitter.com/dominic_giles/status/489095115718537216
    [3] https://twitter.com/dan_mcclary/status/489081072471130112

  6. Ofir Manor on July 16th, 2014 3:33 am

    Greg is right…
    I was in their session at Hadoop Summit, they explained that they pre-install a local agent on every node of their Hadoop appliance… And Exadata can push down projections, filters and also occasionally bloom filters (for joins) to that agent. It is similar to the interaction between Exadata compute nodes (RAC) and storage nodes (Exadata cells).
    Of course, everything is locked down, can be used only with Oracle appliances (Exadata + BIg Data Appliance combo).

  7. Paul Johnson on July 16th, 2014 7:01 am

    Ah! The joys of Oracle predicate push-down…

    Following on from his success at Netezza, I joined Foster Hinshaw at Dataupia from 2006-09. The proposition was ‘complimentary MPP’.

    The idea was that Oracle (and latterly SQL Server and DB2) queries would be re-routed via Oracle’s remote table capability and be executed elsewhere, specifically on our Postgres MPP back-end, with the result set shipped seamlessly back to the Oracle host.

    The goal was that the Oracle user would still perceive the system to be an Oracle system, but it would benefit from a scale-out, shared-nothing architecture that Oracle simply couldn’t deliver on its own.

    To achieve this we had to deliver a level of Oracle trsansparency so that SQL query re-writes (by the user not the optimiser) were rare. As an aside, this is the core of the work carried out by Enterprise DB – making Postgres compatible with Oracle.

    One of the CDR tables we supported soon grew to over 100 billion rows. Predicate pushdown is clearly critical in a system with a single CPU Oracle host connected to a multi-cabinet MPP back-end. The demand on the Oracle server was so light we really did get away with a weedy 2U server.

    Predicate pushdown is where the ‘fun’ started. Connecting Oracle to another Oracle system seems to work well. The SQL query is passed from one to the other as expected.

    However, connecting Oracle to a non-Oracle system via heterogeneous services (HS) is another ball game altogether.

    Oracle seemed to randomly cut the WHERE and GROUP BY clauses out of the submitted SQL, and pass queries to the back-end that were not as submitted by the user.

    This came as somewhat of a surprise, to say the least, and brought our whole value proposition into question. Passing billions of rows back to the host to filter and aggregate (a la Exadata) is not pretty.

    No doubt Oracle will have a happier time getting Oracle predicate pushdown to work!

  8. Greg Rahn on July 16th, 2014 9:59 am

    Hadoop Summit 2014 slides and video on the topic for posterity:


  9. Mustafa Kirac on November 18th, 2015 5:55 am

    Apparently more info is now available at: https://blogs.oracle.com/datawarehousing/entry/big_data_sql_2_0

    It seems that the “big data sql agent” is responsible for transforming HDFS data into Oracle format and send it back to Exadata as Cell service does the same. The agent makes use of Hive metadata and SerDes if available. Predicate pushdown is basically reading the required columns (projection pruning) and filtering out rows that do not match the predicates. Data skipping or storage indexes only takes place at the Exadata side during query planning. The dictionary entry for the Hadoop external table is attached a storage index data structure that is filled during the first full scan on Hadoop, again similar to Cell service. The index is invalidated when the agent on hadoop lets Exadata know that the underlying content changed. HDFS table-to-block mapping is apparently let known by the Exadata dictionary via the Hadoop agent.

Leave a Reply

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:


Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.