Introduction to Syncsort and DMExpress
Let’s start with some Syncsort basics.
- Syncsort was founded in 1968.
- As you might guess from its name and age, Syncsort started out selling software for IBM mainframes, used for sorting data. However, for the past 30 or so years, Syncsort’s products have gone beyond sort to also do join, aggregation, and merge. This was the basis for Syncsort’s expansion into the more general ETL (Extract/Transform/Load) business.
- As you might further guess, along the way there was a port to UNIX, development of a GUI (Graphical User Interface), and a change of ownership as Syncsort’s founder more or less cashed out.
- At this point, Syncsort sees itself primarily as a data integration/ETL company, whose main claim to fame is performance, with further claims of linear scaling and no manual tuning.
One of Syncsort’s favorite value propositions is to contrast the cost of doing ETL in Syncsort, on commodity hardware, to the cost of doing ELT (Extract/Load/Transform) on high-end Teradata gear.
So can logistic regression be parallelized or not?
A core point in SAS’ pitch for its new MPI (Message-Passing Interface) in-memory technology seems to be that logistic regression is really important, and that shared-nothing MPP doesn’t let you parallelize it. The Mahout/Hadoop folks also seem to despair of parallelizing logistic regression.
On the other hand, Aster Data said it had parallelized logistic regression a year ago. (Slides 6-7 from a mid-2010 Aster deck may be clearer.) I’m guessing Fuzzy Logix might make a similar claim, although I’m not really sure.
What gives?
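For what it's worth, the basic data-parallel recipe is easy to sketch: each node computes the gradient of the log-likelihood over its own partition of the rows, a reduce step sums the per-partition gradients, and a single global update is applied, repeating until convergence. Below is a minimal single-process Python sketch of that pattern. The partition/reduce structure merely stands in for real MPP nodes; this is not SAS's, Aster's, or Mahout's actual implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def partial_gradient(partition, w):
    """Per-partition log-likelihood gradient. In a shared-nothing system,
    each node would compute this locally over its own rows."""
    g = [0.0] * len(w)
    for x, y in partition:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for j, xj in enumerate(x):
            g[j] += (y - p) * xj
    return g

def fit(partitions, n_features, lr=0.5, iters=200):
    """Batch gradient ascent: scatter the gradient work, gather the sums."""
    w = [0.0] * n_features
    for _ in range(iters):
        # "Reduce" step: sum the local gradients, then take one global step.
        grads = [partial_gradient(p, w) for p in partitions]
        total = [sum(col) for col in zip(*grads)]
        w = [wi + lr * gi for wi, gi in zip(w, total)]
    return w
```

The point of the sketch is that each iteration is an embarrassingly parallel scan plus a tiny aggregation, which is exactly the kind of work MPP systems are built for; whatever is hard about parallelizing logistic regression, it isn't the arithmetic.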
Whither MarkLogic?
My clients at MarkLogic have a new CEO, Ken Bado, even though former CEO Dave Kellogg was quite successful. If you cut through all the happy talk and side issues, the reason for the change is surely that the board wants to see MarkLogic grow faster, and specifically to move beyond its traditional niches of publishing (especially technical publishing) and national intelligence.
So what other markets could MarkLogic pursue? Before Ken even started work, I sent over some thoughts. They included (but were not limited to): Read more
Comments on EMC Greenplum
I am annoyed with my former friends at Greenplum, who took umbrage at a brief sentence I wrote in October, namely “eBay has thrown out Greenplum“. Their reaction included:
- EMC Greenplum no longer uses my services.
- EMC Greenplum no longer briefs me.
- EMC Greenplum reneged on a commitment to fund an effort in the area of privacy.
The last one really hurt, because in trusting them, I put in quite a bit of effort, and discussed their promise with quite a few other people.
Some thoughts on Oracle Express Edition
I was asked by a press person about Oracle 11g Express Edition. So I might as well also share my thoughts here.
1. Oracle 11g Express Edition is seriously crippled. E.g., it’s limited to 1 GB of RAM and 11 GB of data. However …
2. … I recall when I excitedly uncovered the first 1 GB relational databases, the way I’ve uncovered petabyte-scale databases in recent years. It was less than 20 years ago. This illustrates that …
3. … the Oracle 11g Express Edition crippleware is better than what top relational database users had 20 years ago. That in turn suggests …
4. … there are plenty of businesses small enough to use Oracle 11g Express Edition for real work today.
5. Sensible reasons for having an Oracle Express Edition start with test, development, and evaluation. But there’s also market seeding — if somebody uses it for whatever reason, then either the person, the organization, or both could at some point go on to be a real Oracle customer.
By the way, the allowable database size of 11 GB is up from 4 GB a few years ago. That’s like treading water. 🙂
The MongoDB story
Along with CouchDB/Couchbase, MongoDB was one of the top examples I had in mind when I wrote about document-oriented NoSQL. Invented by 10gen, MongoDB is an open source, no-schema DBMS, so it is suitable for very quick development cycles. Accordingly, there are a lot of MongoDB users who build small things quickly. But MongoDB has heftier uses as well, and naturally I’m focused more on those.
MongoDB’s data model is based on BSON, which seems to be JSON-on-steroids. In particular:
- You just bang things into single BSON objects managed by MongoDB; there is nothing like a foreign key to relate objects. However …
- … there are fields, datatypes, and so on within MongoDB BSON objects. Fields can be indexed.
- There’s a multi-value/nested-data-structure flavor to MongoDB; for example, a BSON object might store multiple addresses in an array.
- You can’t do joins in MongoDB. Instead, you are encouraged to put what might be related records in a relational database into a single MongoDB object. If that doesn’t suffice, then use client-side logic to do the equivalent of joins. If that doesn’t suffice either, you’re not looking at a good MongoDB use case.
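The embedding-instead-of-joins point is easy to illustrate with plain Python dicts standing in for BSON documents. The field names below are made up purely for the example:

```python
# A customer and its addresses as two "relational" tables...
customers = [{"id": 1, "name": "Acme"}]
addresses = [
    {"customer_id": 1, "city": "Boston"},
    {"customer_id": 1, "city": "Chicago"},
]

# ...versus the MongoDB-style layout: one document, with the addresses
# embedded as an array, so no join is needed to fetch a customer whole.
customer_doc = {
    "_id": 1,
    "name": "Acme",
    "addresses": [{"city": "Boston"}, {"city": "Chicago"}],
}

# Client-side "join" for the relational layout -- the work that falls to
# the application when related data isn't embedded in one document.
def client_side_join(customers, addresses):
    by_id = {c["id"]: {"name": c["name"], "addresses": []} for c in customers}
    for a in addresses:
        by_id[a["customer_id"]]["addresses"].append({"city": a["city"]})
    return list(by_id.values())
```

If your access patterns always want the customer and its addresses together, the embedded layout wins; if you often need the addresses independently of their customers, you are back to client-side joining, which is the "not a good MongoDB use case" warning sign.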
10gen company basics
10gen, as you probably knew, is the company behind MongoDB. I’ve talked with Dwight Merriman of 10gen a few times, which is why his thoughts were featured in at least three different posts on NoSQL/document-oriented-database generalities. I also talked a month ago with new 10gen President Max Schireson (who at one point was the #2 guy at MarkLogic) and CTO Eliot Horowitz. After some delay, 10gen gave me permission to post some January, 2011 slides that summarize company status, target markets, customer brags — excuse me, customer success — and so on. Maybe it’s time I actually blogged about 10gen and MongoDB. 🙂
By the way, our vendor client disclosures are a bit out of date in the area of short-request processing data management. The list is actually 10gen/MongoDB, Cloudera/HBase, CodeFutures/dbShards, Couchbase, DataStax/Cassandra, MarkLogic (yes, I count them, even though I didn’t count them as “NoSQL”), salesforce.com/database.com, and Schooner Information Technology.
10gen company highlights include:
- 10gen boasts a couple hundred paying customers total, but that includes small consulting/training deals.
- 10gen has dozens of MongoDB support subscription customers. That figure was around 25 in August.
- Like similar vendors, 10gen plans to add some proprietary code to its MongoDB support subscription services.
- 10gen is bicoastal, having started in New York City but ramping up in Redwood Shores. For one thing, a lot of 10gen’s customers are in the SF Bay area.
- 10gen headcount was 16 or so in August, 25-30 in January.
For a bit more detail, please see the slides linked above. I shall now do a separate post that is actually about MongoDB.
ANTs Software updates
I drafted the partial post quoted below some months ago, but never finished it, as my general posting hiatus hit. Anyhow, ANTs just came back to mind, due to a LinkedIn request from an exec. Subsequent news includes that the product had to be temporarily pulled from the market (what a shock), there was $200,000 of IBM revenue through the end of 2010 (by ANTs’ standards, that’s a lot), and at some point three Sybase-to-IBM product sales actually got closed.
ANTs Software recently came (back) to my attention when, ego-surfing, I saw they had made up some falsehoods about me and posted same in their blog. So I posted about ANTs Software. Now that the ANTs Software blog is on my radar, I see there’s another post from CEO Joe Kozak stating his case that ANTs Software is a good investment. I also notice that there’s an active S-1 to sell ANTs Software stock, dated two weeks before the blog post. Frankly, it surprises me that it’s legal to recommend your own stock that emphatically while you’re in registration — but hey, I tend to be on the side of favoring more communication over less.
According to the ANTs Software 10-Q for the quarter ended June 30, ANTs Software has >$2 million in negative working capital — which this offering apparently won’t change (it’s for a shareholder to sell stock, not for ANTs to raise more money for itself).
Actually, ANTs did manage to get its working capital positive again. The key paragraph from the 10-K linked above, emphasis mine, is
The consolidated financial statements contemplate continuation of the Company as a going concern. However, the Company has had minimal revenues since inception, suffered recurring losses from operations, has generated negative cash flows from operations and has an accumulated deficit of $156.97 million as of December 31, 2010 that raise substantial doubt about the Company’s ability to continue as a going concern. The Company also had significant near-term liquidity needs as of December 31, 2010, including $0.25 million currently due on a line of credit and $2.00 million in notes payable due January 31, 2011. Subsequent to December 31, 2010, the Company received proceeds from a $3.00 million subscription receivable (less $0.39 million in fees, including $0.24 million in dispute) for the sale of 5.18 million shares of common stock pursuant to the BRG Agreement, $0.06 million in proceeds from the exercise of warrants covering 0.13 million shares of common stock and gross proceeds of $0.75 million from the Note and Warrant Purchase Agreements. The outstanding balance on the line of credit was subsequently repaid and the notes payable were subsequently deferred until January 31, 2013. The Company’s ability to continue as a going concern is dependent upon management’s ability to generate profitable operations in the future and obtain the necessary financing to meet obligations and repay liabilities arising from normal business operations when they come due. The Company anticipates generating profitable operations from marketing and sales of ACS and the growth of our Professional Services offerings for ACS implementations. If the Company does not generate profitable operations or obtain the necessary financing, the Company may not have enough operating funds to continue to operate as a going concern. 
Securing additional sources of financing to enable the Company to continue the development and commercialization of proprietary technologies will be difficult and there is no assurance of our ability to secure such financing. A failure to generate profitable operations or obtain additional financing could prevent the Company from making expenditures that are needed to pay current obligations, allow the hiring of additional development personnel and continue development of its software and technologies. The Company continues actively seeking additional capital through private placements of equity and debt.
Bottom line: $157 million in losses have produced 3 sales (with more presumably coming) of a product that isn’t that important in the first place (it just helps you move from a perfectly decent DBMS to one you might like better while saving on migration costs). That makes almost any other failure in software industry history look like a rousing success by comparison.
The client that was confused about security
The competition for April Fool’s Day humor is brisk, as I documented in 2010 with two lists of excellent pranks. So I went against the grain that year, offering a collection of strange-but-true stories — such as how I came to have heartthrob James Marsters autograph a shirtless picture of himself, why I regretted that graduating athletic powerhouse Ohio State University at age 16 cost me my NCAA eligibility, and the sore butt I got from spending an afternoon with Bill Gates’ girlfriend, herself a well-known industry figure.
That post seemed to go over well, even if I’m a little disappointed at how few people joined in with stories of their own. So I’m opting for strange-but-true this year also, just more aligned with the usual subjects of my blogging. And thus, without further ado, here’s the story of
The client that was confused about security
Short-request and analytic processing
A few years ago, I suggested that database workloads could be divided into two kinds — transactional and analytic. The advent of non-transactional NoSQL has suggested that we need a replacement term for “transactional” or “OLTP”, but finding one has been a bit difficult. Numerous tries, including high-volume simple processing, online request processing, internet request processing, network request processing, short request processing, and rapid request processing have turned out to be imperfect, as per discussion at each of those links. But then, no category name is ever perfect anyway. I’ve finally settled on short request processing, largely because I think it does a good job of preserving the analytic-vs-bang-bang-not-analytic workload distinction.
The easy part of the distinction goes roughly like this:
- Anything transactional or “OLTP” is short-request.
- Anything “OLAP” is analytic.
- Updates of small amounts of data are probably short-request, be they transactional or not.
- Retrievals of one or a few records in the ordinary course of update-intensive processing are probably short-request.
- Queries that return or aggregate large amounts of data — even in intermediate result sets — are probably analytic.
- Queries that would take a long time to run on a badly chosen or badly configured DBMS are probably analytic (even if they run nice and fast on your actual system).
- Analytic processes that go beyond querying or simple arithmetic are — you guessed it! — analytic.
- Anything expressed in MDX is probably analytic.
- Driving a dashboard is usually analytic.
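Those heuristics are mechanical enough to express as code. Here's a toy Python classifier over a hypothetical operation descriptor; the key names and the row-count thresholds are my invention, purely to make the rules above concrete:

```python
def classify_workload(op):
    """Toy classifier for the heuristics above. `op` is a dict with
    illustrative keys (none of these come from any real product):
      kind:      'update' | 'retrieval' | 'query' | 'computation'
      rows:      approximate rows touched or returned
      language:  e.g. 'SQL' or 'MDX'
      dashboard: bool
    Returns 'short-request' or 'analytic'."""
    if op.get("language") == "MDX" or op.get("dashboard"):
        return "analytic"                       # MDX / dashboards: analytic
    if op.get("kind") == "computation":
        return "analytic"                       # beyond querying or arithmetic
    if op.get("kind") in ("update", "retrieval") and op.get("rows", 0) <= 10:
        return "short-request"                  # small touches, transactional or not
    if op.get("rows", 0) > 10_000:
        return "analytic"                       # big (even intermediate) result sets
    return "analytic" if op.get("kind") == "query" else "short-request"
```

Of course no single-number threshold survives contact with real workloads, which is rather the point of the "more difficult" cases below.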
Where the terminology gets more difficult is in a few areas of what one might call real-time or near-real-time analytics. My first takes are: Read more