June 8, 2009

The future of data marts

Greenplum is announcing today a long-term vision, under the name Enterprise Data Cloud (EDC). Key observations around the concept — mixing mine and Greenplum’s together — include:

In essence, Greenplum is pitching the story:

When put that starkly, it’s overstated, not least because

Specialized Analytic DBMS != Data Warehouse Appliance

But basically it makes sense, for two main reasons:

Of course, the EDC vision isn’t quite as new or differentiated as Greenplum ideally would wish one to believe.

One particular source of potential confusion is Greenplum’s emphasis on the buzzphrase self-service (data mart). This seems to be a conflation of two related concepts:

One thing that’s needed for this technology to come to full fruition is sophisticated data movement and synchronization. Ideally, some tables in a data mart could be virtual — views against a central database. But others would be physically recopied from the center, with all the ETL/ELT/ETLT/replication issues that entails. Meanwhile, it’s not obvious that the ideal architecture is a simpleminded hub-spoke — perhaps one should be able to spin data marts out of other marts, perhaps at least somewhat reducing the proliferation of tables and the recopying of data. And it should be easy for administrators to change deployment strategies, e.g. by starting a table out as a view and changing over to making it a physical copy as usage profiles change.
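
To make the view-versus-copy point concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for an analytic DBMS. It is not Greenplum's mechanism. The table names (sales, east_sales) and the helper functions are hypothetical, invented for illustration. It starts a mart table out as a view against the central table, then switches it to a physical copy under the same name.

```python
import sqlite3

# Hypothetical helpers for illustration only. Identifiers are interpolated
# directly into SQL, so this is not safe for untrusted input.

def create_virtual_mart(conn, name, source, where):
    """Define the mart table as a view against the central table."""
    conn.execute(f"CREATE VIEW {name} AS SELECT * FROM {source} WHERE {where}")

def materialize_mart(conn, name, source, where):
    """Replace the view with a physical copy of the same rows."""
    conn.execute(f"DROP VIEW {name}")
    conn.execute(f"CREATE TABLE {name} AS SELECT * FROM {source} WHERE {where}")
    conn.commit()

def mart_kind(conn, name):
    """Return 'view' or 'table' for the named object, or None if absent."""
    row = conn.execute(
        "SELECT type FROM sqlite_master WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

def mart_total(conn, name):
    """Sum the amount column of the mart, whatever its deployment."""
    return conn.execute(f"SELECT COALESCE(SUM(amount), 0) FROM {name}").fetchone()[0]

# Central database with one fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 20.0), ("east", 5.0)],
)
conn.commit()

# Start the mart table out as a view: it always tracks the center.
create_virtual_mart(conn, "east_sales", "sales", "region = 'east'")
conn.execute("INSERT INTO sales VALUES ('east', 7.0)")
print(mart_kind(conn, "east_sales"), mart_total(conn, "east_sales"))

# Change deployment strategy: same name, now a physical copy.
materialize_mart(conn, "east_sales", "sales", "region = 'east'")
conn.execute("INSERT INTO sales VALUES ('east', 1.0)")
print(mart_kind(conn, "east_sales"), mart_total(conn, "east_sales"))
```

Queries against east_sales don't change when the deployment does, which is the administrative ease being asked for. The tradeoff is also visible: once the table is a physical copy, the last insert into the center no longer shows up in the mart until the data is recopied, which is where the ETL and replication issues come back in.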

Oliver Ratzesberger of eBay also argues that workload management — not a current Greenplum strength — can be crucial. For example, if the CEO wants the CFO to get her an answer TODAY, the fastest approach may be to create an entirely virtual data mart, with very favorable SLAs (Service Level Agreements). More generally, if you’re setting up dozens of marts that contain views of the central database, sophisticated SLA management can be essential. There’s a big virtualization opportunity here — but virtualization requires a lot of system management infrastructure.

Comments

27 Responses to “The future of data marts”

  1. Jerome Pineau on June 8th, 2009 10:51 am

    So is the only difference between Vertica and GP in the cloud that GP has a dedicated infrastructure while Vertica uses EC2?

  2. Curt Monash on June 8th, 2009 11:24 am

    Not exactly. Greenplum doesn’t offer cloud-based services of any kind at this time. It encourages its customers to build “private clouds”.

    Your comment would be closer to accurate if you were contrasting Aster and Kognitio.

  3. Jerome Pineau on June 8th, 2009 2:37 pm

    Oh so hang on a second – their big announcement today was about on-premise “private GP clouds” and not some dedicated hosted service they provide??

  4. Amr Awadallah on June 8th, 2009 3:56 pm

    Very confused, isn’t that what Greenplum already offered?

  5. Jerome Pineau on June 8th, 2009 4:04 pm

    Well, it looks like they’re basically saying: look, go cobble something together using metal/virtual/cloud and we will fit on top of that. But it’s still your IT ops handling all the provisioning (for their private cloud). In essence, what it seems like to me at this point is a set of best practices, really, relatively in line with their MAD EDW philosophy. Unless I’m missing something, which is a distinct possibility – I mean geezus, I initially assumed they were providing the cloud infrastructure :)

  6. Ben Werther on June 8th, 2009 7:29 pm

    I’d recommend a read of the Greenplum EDC whitepaper at http://www.greenplum.com/resources/complete-library/ (No registration required).

    The EDC initiative is about 3 things:
    – Platform technology that allows business analysts to self-serve provision warehouses/sandboxes via a web console and access/replicate data into their warehouse from anywhere in the EDC (i.e. a ‘private cloud’ approach applied to scale-out data warehousing). This is not just about spinning up a database in virtual machines. We’re building a new layer of services that really allow business and IT to each focus on what they do best and reduce the areas of friction that exist today, e.g. self-serve cluster provisioning from server pools, local or geographically remote data replication, data lineage and cross-warehouse metadata, and more.
    – A new data warehousing methodology that challenges the formal ‘everything in one database and one data model’ that has been prevalent over the past 25 years. This isn’t something that Greenplum has cooked up — it is simply a reflection of what our customers are putting into practice today.
    – An ecosystem of customers and partners that believe in the vision and are working with us to shape and deliver on it.

    Note that most enterprises that we work with aren’t looking to the public cloud for data warehousing – largely because the data is being generated in-house and they don’t want to push TBs over the Internet daily. But they do want to achieve many of the touted ‘cloud’ benefits in-house. i.e. They want to empower business analysts to serve themselves without lots of process or IT delays in the way. And they want IT to consolidate infrastructure, get their arms around data mart proliferation, and improve service levels but without some heavy-handed approach that requires unifying all the data models.

  7. Sean Kain on June 8th, 2009 10:42 pm

    Isn’t this another “buzzword bingo” name for something that is pretty common in mature DW environments? Pretty much anyone who has a data mining team using the DW must do this.

    Curt has referred to it before at eBay with Oliver Ratzesberger’s Analytics as a Service blog at http://www.xlmpp.com/articles/16-articles/39-analytics-as-a-service. See also a presentation from Oliver on this topic at http://www.teradata.com/t/WorkArea/DownloadAsset.aspx?id=5761

  8. Jerome Pineau on June 9th, 2009 1:01 am

    So basically, what companies besides Aster, Kognitio and Vertica currently have production cloud implementations today?

  9. Per-terabyte pricing | DBMS2 -- DataBase Management System Services on June 9th, 2009 4:31 am

    [...] only charges you for it once.* But if you spin out data marts and recopy data into it — as Greenplum rightly encourages you to do — Greenplum wants to be paid for each copy.  Similarly, Vertica charges only for deployment, [...]

  10. Greenplum spins ‘Enterprise Data Cloud’ vision | Cervaza.com BLOG News From The Net ! on June 9th, 2009 5:27 pm

    [...] notion of self-service data marts has merit, but with certain caveats, Monash said in a blog posting Monday. “Suppose users could order up the data mart they want, perhaps test it at a very low [...]

  11. Steve Wooledge on June 10th, 2009 12:29 am

    I found this post from Daniel Abadi to be a pretty balanced assessment of this news:
    http://dbmsmusings.blogspot.com/2009/06/quick-thoughts-on-greenplum-edc.html

    For example:
    “7. It appears that the only part of the EDC initiative that Greenplum’s new version (3.3) has implemented is online data warehouse expansion (you can add a new node and the data warehouse/data mart can incorporate it into the parallel storage/processing without having to go down). All this means is that Greenplum has finally caught up to Aster Data along this dimension. I’d argue that since Aster Data also has a public cloud version and has customers using it there, they’re actually farther along the EDC initiative than Greenplum is …”

    Aster has multiple customer deployments on public clouds — both Amazon and AppNexus. ShareThis is the largest DW-in-a-real-cloud deployment at Amazon (currently at 10 TB) and will be discussing their deployment at TDWI in San Diego in August at the Executive Summit:
    http://www.eiseverywhere.com/ehome/index.php?eventid=4983&tabid=929

  12. Ben Werther on June 10th, 2009 8:16 pm

    We’ve been running Greenplum internally on EC2 for almost 2 years now, and use both EC2 and internal VMware pools for a range of QA and scale testing work.

    Making Greenplum run on EC2 is almost zero work — we just haven’t seen material demand from large enterprises wanting to put their production, mission critical data warehouses in the public cloud yet. There’s no doubt it’ll come over time, and we’re supportive of the direction, but it just isn’t here yet.

    Matt Aslett from The 451 Group wrote a nice analysis on this topic (unfortunately only available through paid subscription), where he reinforced this point:

    “Enabling cloud-computing deployments is about more than simply offering a version of your product running on Amazon . . . Adoption of data warehousing on public clouds has so far been limited to proofs-of-concept evaluations and trials rather than production deployments, we believe, and Greenplum’s focus on datacenter platforms could serve it well as enterprises look to private cloud architecture as a method of improving datacenter efficiencies before identifying workloads that could be migrated to public clouds.”

    We’re encouraged by folks like Aster, Vertica and others that find interest in public cloud offerings to serve the current market of Web 2.0 companies which is definitely a good use case. If anyone is seeing that large enterprises are ready today for meaningful adoption of public cloud services for data warehousing, we’re ready to serve ;)

  13. Curt Monash on June 11th, 2009 2:23 am

    Ben,

    I didn’t think it was possible to stretch the definition of Web 2.0 to the breaking point, but you may have just accomplished it. ;)

    Best,

    CAM

  14. The future of data marts by DBMS2 | Data mart, Business Intelligence, Data warehousing and Reporting on June 16th, 2009 3:49 pm

    [...] Read more Author: admin Categories: Data Governance, Data Mart Examples, Data mart Tags: Add new tag Comments (0) Trackbacks (0) Leave a comment Trackback [...]

  15. Netezza on concurrency and workload management | DBMS2 -- DataBase Management System Services on July 18th, 2009 12:52 am

    [...] meeting. But while I was there I asked where Netezza stood on concurrency, workload management, and rapid data mart spin-out. Netezza’s claims in those regards turned out to be surprisingly [...]

  16. Data marts in the world of text | Text Technologies on September 20th, 2009 5:09 am

    [...] The future of data marts Categories: Enterprise search, Ontologies, Search engines, Specialized search, Structured search  Subscribe to our complete feed! [...]

  17. BI-Quotient » Blog Archive » Data warehousing for free! Terabyte sized data warehouse and business intelligence without license costs on October 26th, 2009 12:38 pm

    [...] (1) Greenplum themselves promote this offering as part of their Enterprise Data Cloud. They have a vision of self service data marts. Based on this, data analysts can go to the Enterprise Data Warehouse and via interfaces create their own data marts for in depth analysis outside the EDW. Have a look at Curt Monash’s excellent article on the future of data marts. [...]

  19. ahmad on January 10th, 2010 5:58 pm

    Dr:
    Hi, how are you? I’m a university student, and I’d like an example of a data mart application, with an explanation of the example.

  20. Interesting trends in database and analytic technology | DBMS2 -- DataBase Management System Services on January 31st, 2010 10:12 pm

    [...] area I flat-out forgot to mention is easy data mart spin-out. Categories: Analytic technologies, Business intelligence, Data models and architecture, Data [...]

  21. Three kinds of software innovation, and whether patents could possibly work for them | DBMS2 -- DataBase Management System Services on March 23rd, 2010 4:19 am

    [...] or functionality not just for end users, but also for administrators. In particular, SaaS, cloud, private cloud and/or appliance benefits are commonly concentrated in this [...]

  22. Greenplum Chorus and Greenplum 4.0 | DBMS2 -- DataBase Management System Services on April 12th, 2010 7:54 am

    [...] Greenplum is making two product announcements this morning. Greenplum 4.0 is a revision of the core Greenplum database technology. In addition, Greenplum is announcing Greenplum Chorus, which is the first product release instantiating last year’s EDC (Enterprise Data Cloud) vision statement and marketing campaign. [...]

  23. eBay followup — Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more | DBMS 2 : DataBase Management System Services on October 10th, 2010 11:55 am

    [...] took the opportunity to ask what kinds of data marts (virtual or otherwise) were spun out in [...]

  24. Naym on February 28th, 2011 4:33 pm

    Curt, the link to Scott Yara’s own words does not work

  25. Curt Monash on February 28th, 2011 7:33 pm

    Thanks, Naym. That’s totally my fault, and I don’t now know what I had in mind. I’ll just delete the reference.

  26. Aden on July 4th, 2013 9:28 pm

    Big Data Analytics is not only for retail Business Intelligence, even though that is where some of the greatest advancements are currently occurring. Big Data Analytics is also the future of Infrastructure Asset Management. Each industry has core infrastructure that must be Asset Managed over its life-cycle through predictive modeling. Big Data Analytics will evolve too rapidly (with increasing volume, variety, velocity and complexity of available classes of data) for any industry organization to standardize and maintain THE method of doing Infrastructure Asset Management through Big Data Analytics. Those wishing to take a leadership role in the Big Data Analytics required for successful Integrated Asset Management of infrastructure need to establish the standards for the backbone of Big Data in their industry sector; that is, industry associations should establish the standards for data governance, management, control, and compliance through a Central Data Warehouse. Then let consultants, utilities, software companies, and academics knock themselves out, do the analytics they want any way they want to do it, and provide competitive differentiation to the businesses. If you want to work with our data scientists, great; if you have your own data scientists or a third party that helps you, fabulous. But there is only one place that you come to get the data and that’s [the industry association's Central Data Warehouse.] From the CDW, infrastructure asset design and performance can be independently Validated, Verified (iV&V), and benchmarked against peers (as is done in the software sector). As a civil engineer, I believe this is the future of the engineering standard of care in all sectors and will change engineering practice as we know it.

  27. plr article on September 23rd, 2014 5:00 am

    hello!,I like your writing very a lot! percentage we communicate more about your
    article on AOL? I require an expert on this house to resolve my problem.
    May be that is you! Having a look ahead to peer you.
