Discussion of Pervasive Software and its products in database management (PSQL), data integration, and high-speed analytics (DataRush).
There have been many recent announcements about how data integration/ETL (Extract/Transform/Load) vendors are going to work with MapReduce. Most of what they say boils down to one or more of a few things:
- Hadoop generally stores data in HDFS (Hadoop Distributed File System). ETL vendors want to be able to extract data from or load it into HDFS.
- ETL vendors have development environments that let you specify/script/whatever ETL jobs. ETL vendors want their development tools to develop ETL processes executed via MapReduce/Hadoop.
- In particular, this allows ETL vendors to exploit the parallel-processing capabilities of MapReduce.
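To make the "ETL executed via MapReduce" idea concrete, here is a minimal sketch in plain Python. It is not any vendor's generated code and it does not call Hadoop; the sample records, field layout, and function names (`map_record`, `shuffle`, `reduce_group`, `run_job`) are all assumptions for illustration. The map step does the per-record cleanup an ETL tool would specify, and the reduce step does the aggregation.

```python
from collections import defaultdict

# Hypothetical raw records, standing in for lines read from HDFS.
RAW_LINES = [
    "2011-03-01,acme, 120.50",
    "2011-03-01,ACME,79.50",
    "2011-03-02,globex,10.00",
    "bad line",
]

def map_record(line):
    """Transform step: parse and clean one line, emitting (key, value) pairs."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3:
        return []  # drop malformed records
    _date, customer, amount = parts
    return [(customer.lower(), float(amount))]

def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_group(key, values):
    """Aggregate step: total the amounts for one customer."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle, and reduce in-process (no cluster involved)."""
    mapped = [pair for line in lines for pair in map_record(line)]
    grouped = shuffle(mapped)
    return dict(reduce_group(k, v) for k, v in grouped.items())

totals = run_job(RAW_LINES)
print(totals)
```

Because `map_record` looks at one line at a time and `reduce_group` at one key at a time, a real framework is free to spread both steps across many nodes, which is the parallelism the vendors are after.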
Some additional twists include:
- Pentaho announced business intelligence and ETL for Hadoop last year.
- Syncsort thinks different sort algorithms should be usable with Hadoop. Consequently, it plans to contribute technology to the community to make sort pluggable into Hadoop. (However, Syncsort is keeping its own sort technology proprietary.)
- Syncsort is considering replicating some Hive functionality, starting with joins, hopefully running much faster. (However, Syncsort’s basic Hadoop support is a quarter or three away, so any more advanced functionality would probably come out in 2012 or beyond.)
- SnapLogic fondly thinks that its generation of MapReduce jobs is particularly intelligent.
Finally, my former clients at Pervasive, who haven’t briefed me for a while, seem to have told Doug Henschen that they have pointed DataRush at MapReduce.* However, I couldn’t find evidence of same on the Pervasive DataRush website beyond some help in using all the cores on any one Hadoop node.
*Also see that article because it names a bunch of ETL vendors doing Hadoop-related things.
|Categories: Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Hadoop, MapReduce, Parallelization, Pentaho, Pervasive Software, SnapLogic, Syncsort||1 Comment|
In my first post-fire briefing, I had a long-scheduled dinner with the Pervasive DataRush folks. Much of DataRush’s positioning, feature evolution, and so on remains To Be Determined. Most existing customers and applications remain To Be Disclosed. What’s more, DataRush is a technology to accelerate applications that
- Need to be parallelized
- Should run on SMP rather than shared-nothing hardware
and Pervasive hasn’t done a great job of explaining where the second condition (SMP rather than shared-nothing) applies.
That said, there’s at least one use case for which DataRush should clearly be considered today. Suppose you have a messy ETL/data transformation task that requires custom code. Then I see three main choices:
- Write the code within the confines of an off-the-shelf ETL tool.
- Write the code to run on an analytic DBMS platform, ideally an MPP/shared-nothing one.
- Use something like DataRush (and I’m not familiar with any good alternatives to DataRush).
In some cases, DataRush may be the best possibility.
|Categories: Analytic technologies, Data integration and middleware, Data warehousing, EAI, EII, ETL, ELT, ETLT, Parallelization, Pervasive Software||Leave a Comment|
I’ve made a few references to Pervasive DataRush in the past, like this one, but I’ve never gotten around to seriously writing it up. I’ll now try to make partial amends. The key points about Pervasive DataRush are:
- DataRush grew out of Pervasive Software’s ETL business, as the underpinnings for a new data transformation tool they were building.
- DataRush is a Java framework for doing parallel programming automagically.
- Unlike most modern parallelization technologies, DataRush is focused on single SMP (Symmetric MultiProcessing) boxes rather than loosely-coupled grids.
- DataRush is based on dataflow programming.
- Pervasive says that DataRush is really fast.
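For readers who haven't met dataflow programming, here is a minimal sketch of the general idea in Python. It is emphatically not the DataRush API (DataRush is a Java framework, and none of its classes appear here); the names `source`, `stage`, `sink`, and `run_pipeline` are hypothetical. Each operator runs in its own thread on one machine and passes records downstream through a bounded queue, so the stages overlap in time without the programmer writing any explicit locking.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream

def source(records, out_q):
    """Feed input records into the first queue, then signal completion."""
    for record in records:
        out_q.put(record)
    out_q.put(_DONE)

def stage(fn, in_q, out_q):
    """One dataflow operator: apply fn to each record and pass it downstream."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            return
        out_q.put(fn(item))

def sink(in_q, results):
    """Collect the final records in arrival order."""
    while True:
        item = in_q.get()
        if item is _DONE:
            return
        results.append(item)

def run_pipeline(records, functions):
    """Wire source -> stages -> sink with bounded queues and run them concurrently."""
    queues = [queue.Queue(maxsize=16) for _ in range(len(functions) + 1)]
    results = []
    threads = [threading.Thread(target=source, args=(records, queues[0]))]
    for i, fn in enumerate(functions):
        threads.append(
            threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        )
    threads.append(threading.Thread(target=sink, args=(queues[-1], results)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

cleaned = run_pipeline([" a ", "b", " c"], [str.strip, str.upper])
print(cleaned)
```

In CPython the global interpreter lock keeps these threads from using several cores for CPU-bound work, so treat this as an illustration of the program structure only. A framework aimed at SMP boxes would schedule the same kind of operator graph across real cores.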
Both Pervasive Software and Cast Iron Systems told me recently of fairly pure cloud offerings. In this, they’re joining Informatica, which started offering Salesforce.com integration-as-a-service back in 2006. So far as I can tell, the three vendors are doing somewhat different things. Read more
|Categories: Cast Iron Systems, Cloud computing, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Informatica, Pervasive Software, Software as a Service (SaaS)||8 Comments|
Many MPP data warehousing vendors have told me their products are used for ELT (Extract/Load/Transform) instead of ETL (Extract/Transform/Load). I.e., needed data transformations are done on the MPP system, rather than on the — probably SMP — system the data comes from.* If the data transformation is being applied on a record-by-record basis, then it’s automatically fully parallelized. Even if the transforms are more complex, considerable parallel processing may still be going on.
*Or it’s some of each, at which point it’s called ETLT — I bet you can work out what that stands for.
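The "automatically fully parallelized" point can be sketched as follows. This is a toy in Python, not any vendor's implementation: the row layout, the round-robin `partition` function, and the use of a thread pool as a stand-in for MPP nodes are all assumptions. Because the transform depends only on the row it is given, each node can process its own partition without talking to the others, and the combined result matches a serial run.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(record):
    """A record-by-record transform: it depends only on its own input row."""
    name, amount_cents = record
    return {"name": name.strip().title(), "amount": amount_cents / 100}

def partition(rows, n_nodes):
    """Round-robin rows across n_nodes, standing in for MPP data distribution."""
    return [rows[i::n_nodes] for i in range(n_nodes)]

def transform_partition(rows):
    """The work one node does: transform only its local rows."""
    return [transform(r) for r in rows]

def parallel_transform(rows, n_nodes=4):
    """Transform every partition independently; workers share nothing."""
    parts = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        return list(pool.map(transform_partition, parts))

ROWS = [(" alice ", 1250), ("BOB", 300), ("carol", 99), ("dave", 5000), ("erin", 1)]

per_node = parallel_transform(ROWS, n_nodes=2)
flat = [row for part in per_node for row in part]
serial = [transform(r) for r in ROWS]

# Same rows either way; only the order depends on how the data was distributed.
by_name = lambda row: row["name"]
print(sorted(flat, key=by_name) == sorted(serial, key=by_name))
```

Transforms that need to see more than one row (joins, deduplication, running totals) are where the "more complex" case begins, since rows may have to move between nodes first.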
|Categories: Aster Data, Data integration and middleware, Data warehousing, EAI, EII, ETL, ELT, ETLT, MapReduce, Parallelization, Pervasive Software||8 Comments|
Call me slow on the uptake if you like, but it’s finally dawned on me that outsourced data marts are a nontrivial segment of the analytics business. For example:
- I was just briefed by Vertica, and got the impression that data mart outsourcers may be Vertica’s #3 vertical market, after financial services and telecom. Certainly it seems like they are Vertica’s #3 market if you bundle together data mart outsourcers and more conventional OEMs.
- When Netezza started out, a bunch of its early customers were credit data-based analytics outsourcers like Acxiom.
- After nagging DATAllegro for a production reference, I finally got a good one: TEOCO. TEOCO specializes in figuring out whether inter-carrier telecom bills are correct. While there’s certainly a transactional invoice-processing aspect to this, the business seems to hinge mainly around doing calculations to figure out correct charges.
- I was talking with Pervasive about Pervasive DataRush, a beta product that lets you do super-fast analytics on data even if you never load it into a DBMS in the first place. I challenged them for use cases. One user turns out to be an insurance claims rule-checking outsourcer.
- One of Infobright’s references is a French CRM analytics outsourcer, 1024 Degres.
- 1010data has built up a client base of 50-60, including a number of financial and retail blue-chippers, with a soup-to-nuts BI/analysis/columnar database stack.
- I haven’t heard much about Verix in a while, but their niche was combining internal sales figures with external point-of-sale/prescription data to assess retail (especially pharma) microtrends.
To a first approximation, here’s what I think is going on. Read more
I blogged recently about Cast Iron Systems, a simplicity-oriented data integration appliance vendor that is increasingly focusing on the SaaS market. Well, Pervasive Software is doing something similar.
Via Data Integrator, Pervasive is a leader in the low-cost integration market, with revenue split about 50/25/25 between direct sales, ISVs, and SaaS. Pervasive fondly believes that its products cost half as much as Cast Iron’s, and wind up taking no more installation effort when you factor in Pervasive’s broader capabilities in areas such as workflow. However, there’s some doubt as to whether this is apples-to-apples. Cast Iron does include hardware, after all, and as Pervasive itself points out, Cast Iron will bundle some professional services into a sale if you ask nicely.
Two things are new. Read more
|Categories: Cloud computing, EAI, EII, ETL, ELT, ETLT, Pervasive Software, Software as a Service (SaaS)||5 Comments|
For very high-end applications, the list of viable database management systems is short. Scalability can be a problem. (The rankings of most scalable alternatives differ in the OLTP and data warehouse realms.) Extreme levels of security can be had from only a few DBMS. (Oracle would have you believe there’s only one choice.) And if you truly need 99.99% uptime, there are only a few DBMS you should even consider.
But for most applications at any enterprise – and for all applications at most enterprises – super high-end DBMS aren’t required. There are relatively few applications that wouldn’t run perfectly well on PostgreSQL or EnterpriseDB today. Ingres and Progress OpenEdge aren’t far behind (they’re a little lacking in datatype support). Ditto Intersystems Cache’, although the nonrelational architecture will be off-putting to many. And to varying degrees, you can also do fine with MySQL, Pervasive PSQL, MaxDB, or a variety of other products – or for that matter with the cheap or free crippled versions of Oracle, SQL Server, DB2, and Informix.
What’s more, these mid-range database management systems can have significant advantages over their high-end brethren. Read more
|Categories: Actian and Ingres, EnterpriseDB and Postgres Plus, IBM and DB2, Intersystems and Cache', Microsoft and SQL*Server, Mid-range, MySQL, Open source, Oracle, Pervasive Software, PostgreSQL, Progress, Apama, and DataDirect, SAP AG||16 Comments|
Pervasive Software has a long history – 25 years, in fact, as they’re emphasizing in some current marketing. Ownership and company name have changed a few times, as the company went from being an independent startup to being owned by Novell to being independent again. The original product, and still the cash cow, was a linked-list DBMS called Btrieve, eventually renamed Pervasive PSQL as it gained more and more relational functionality.
Pervasive Summit PSQL v10 has just been rolled out, and I wrote a nice little white paper to commemorate the event, describing some of the main advances over v9, primarily for the benefit of current Pervasive PSQL developers. In one major advance, Pervasive made the SQL functionality much stronger. In particular, you now can have a regular SQL data dictionary, so that the database can be used for other purposes – BI, additional apps, whatever. Apparently, that wasn’t possible before, although it had been possible in yet earlier releases. Pervasive also added view-based security permissions, which is obviously a Very Good Thing.
There also are some big performance boosts. Read more
|Categories: Cache, Data models and architecture, Database compression, Emulation, transparency, portability, Memory-centric data management, Microsoft and SQL*Server, Mid-range, OLTP, Pervasive Software||Leave a Comment|
I chatted again recently with Simon Peel of Cast Iron Systems, and this time I got a better understanding of Cast Iron’s simplicity claim. It refers largely to a drag-and-drop interface that furthermore provides default mappings between pairs of application suites. Simon bristled a bit when I referred to this as mapping “like to like,” because he’s proud that it’s a little smarter than that. Still, “like to like” seems to be what it typically amounts to — customers go to customers, customer addresses go to customer addresses, and so on. Read more