Investment research and trading
Discussion of how data management and analytic technologies are used in trading and investment research. (As opposed to a discussion of the services we ourselves provide to investors.) Related subjects include:
- CEP (Complex Event Processing)
- (in Text Technologies) The use of text analytics in trading and investment research
Aster Data nCluster 4.5
Like Vertica, Netezza, and Teradata, Aster is using this week to pre-announce a forthcoming product release, Aster Data nCluster 4.5. Aster is really hanging its identity on “Big Data Analytics” or some variant of that concept, and so the two major named parts of Aster nCluster 4.5 are:
- Aster Data Analytic Foundation, a set of analytic packages prebuilt in Aster’s SQL-MapReduce
- Aster Data Developer Express, an Eclipse-based IDE (Integrated Development Environment) for developing and testing applications built on Aster nCluster, Aster SQL-MapReduce, and Aster Data Analytic Foundation
And in other Aster news:
- Along with the development GUI in Aster nCluster 4.5, there is also a new administrative GUI.
- Aster has certified that nCluster works with Fusion I/O boards, because at least one retail industry prospect cares. However, that in no way means that arm’s-length Fusion I/O certification is Aster’s ultimate solid-state memory strategy.
- I had the wrong impression about how far Aster/SAS integration has gotten. So far, it’s just at the connector level.
Aster Data Developer Express evidently does some cool stuff, like providing some sort of parallelism testing right on your desktop. It also generates lots of stub code, saving humans from the tedium of doing that. Useful, obviously.
But mainly, I want to write about the analytic packages. Read more
| Categories: Analytic technologies, Aster Data, Data warehousing, Investment research and trading, RDF and graphs, SAS Institute, Teradata | 1 Comment |
Quick thoughts on the StreamBase Component Exchange
Streambase is announcing something called the StreamBase Component Exchange, for developers to exchange components to be used with the StreamBase engine, presumably on an open source basis. I simultaneously think:
- This is a good idea, and many software vendors should do it if they aren’t already.
- It’s no big deal.
For reasons why, let me quote an email I just sent to an inquiring reporter:
- StreamBase sells mainly to the financial services and intelligence community markets. Neither group will share much in the way of core algorithms.
- But both groups are pretty interested in open source software even so. (I think for both the price and customizability benefits.)
- Open source software commonly gets community contributions for connectors, adapters, and (national) language translations.
- But useful contributions in other areas are much rarer.
- Linden Labs is one of StreamBase’s few significant customers outside its two core markets.
- All of the above are consistent with the press release (which quotes only one StreamBase customer — guess who?).
| Categories: Complex event processing (CEP), Games and virtual worlds, Investment research and trading, Open source, StreamBase | 5 Comments |
The Sybase Aleri RAP
Well, I got a quick Sybase/Aleri briefing, along with multiple apologies for not being prebriefed. (Main excuse: News was getting out, which accelerated the announcement.) Nothing badly contradicted my prior post on the Sybase/Aleri deal.
To understand Sybase’s plans for Aleri and CEP, it helps to understand Sybase’s current CEP-oriented offering, Sybase RAP. So far as I can tell, Sybase RAP has to date only been sold in the form of Sybase RAP: The Trading Edition. In that guise, Sybase RAP has been sold to >40 outfits since its May, 2008 launch, mainly big names in the investment banking and stock exchange sectors. If I understood correctly, the next target market for Sybase RAP is telcos, for real-time network tuning and management.
In addition to any domain-specific applications, Sybase RAP has three layers:
- CEP (Complex Event Processing). Sybase RAP CEP is based on a version of the Coral8 engine Sybase licensed and has been subsequently developing.
- In-memory DBMS. Sybase’s IMDB is part of (but I guess separable from) and has the same API as Sybase’s OLTP DBMS Adaptive Server Enterprise (ASE, aka Sybase Classic).
- Sybase IQ. Actually, Sybase used the phrase “based on Sybase IQ,” but I’m guessing it’s just Sybase IQ.
Quick thoughts on Sybase/Aleri
Sybase announced an asset purchase that amounts to a takeover of CEP (Complex Event Processing) Aleri. Perhaps not coincidentally, Sybase already had technology under the hood from Aleri predecessor/acquiree Coral8, for financial services uses (notwithstanding that between Aleri Classic and Coral8, Aleri Classic was the one of the two more focused on financial services). Quick reactions include:
- The folks at Sybase still haven’t figured out when to prebrief me. (Edit: I’ve been briefed subsequently.)
- Sybase/Aleri is a potentially powerful combination, if they can effectively address the point I just made about integrating disparate latencies. That said, I’m not expecting a lot, because the CEP industry always disappoints me.
- Microsoft, IBM, and (somewhat less clearly) Oracle are all trying to do CEP inhouse. Sybase is making a good choice in having serious CEP inhouse itself
- Surely the main focus and financial justification for the Sybase/Aleri acquisition is the financial services market.
- Specifically, I expect the focus of technical integration between Aleri and Sybase’s DBMS products to start with Sybase IQ.
- Coral8 had some interesting ideas about how to integrate CEP with OLTP/operational BI, but I’m not aware that they got much traction.
- I bet there are use cases where Sybase tries and fails to sell Adaptive Server SQL Anywhere that CEP would be a better technical fit, but I don’t immediately see much practical business significance to that observation.
- While this deal could easily strengthen the Vertica/StreamBase partnership, I don’t see any reason why it would lead those two companies to actually merge.
Related link
| Categories: Aleri and Coral8, Analytic technologies, Complex event processing (CEP), Investment research and trading, Sybase | 7 Comments |
Three broad categories of data
People often try to draw a distinction between:
- Traditional data of the sort that’s stored in relational databases, aka “structured.”
- Everything else, aka “unstructured” or “semi-structured” or “complex.”
There are plenty of problems with these formulations, not the least of which is that the supposedly “unstructured” data is the kind that actually tends to have interesting internal structures. But of the many reasons why these distinctions don’t tend to work very well, I think the most important one is that:
Databases shouldn’t be divided into just two categories. Even as a rough-cut approximation, they should be divided into three, namely:
- Human/Tabular data –i.e., human-generated data that fits well into relational tables or arrays
- Human/Nontabular data — i.e., all other data generated by humans
- Machine-Generated data
Even that trichotomy is grossly oversimplified, for reasons such as:
- These categories overlap.
- There are kinds of data that get into fuzzy border zones.
- Not all data in each category has all the same properties.
But at least as a starting point, I think this basic categorization has some value. Read more
| Categories: Database diversity, Investment research and trading, Log analysis, Telecommunications, Web analytics | 9 Comments |
A framework for thinking about data warehouse growth
There are only three ways that the amount of data stored in data warehouses can grow:
- The same kinds of data are stored as before, with more being added over time.
- The same kinds of data are stored as before, but in more detail.
- New kinds of data are stored.
| Categories: Analytic technologies, Application areas, Data warehousing, Investment research and trading, Log analysis, Solid-state memory, Storage, Telecommunications, Text, Web analytics | 7 Comments |
Boston Big Data Summit keynote outline
Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I’m posting them below.
General introduction to Splunk
I dropped by log analysis software vendor Splunk a few weeks ago for a chat with Marketing VP Steve Sommer (who some you may know from Cognos and/or Informix), Product Management VP Christina Noren, and above all co-founder/CTO Erik Swan. Splunk turns out to be a pretty interesting company, from both business and technical standpoints. For one thing, Splunk seems highly regarded by most people I mention it to.
Splunk’s technical stories include:
- Text search over log files.
- Business intelligence over text search. (That part sounds a lot like Attivio.)
- MapReduce with schema flexibility and smart multi-stage execution plans. (That part sounds a lot like Aster Data.)
More on those in a separate post.
Less technical Splunk highlights include: Read more
| Categories: Analytic technologies, Fox and MySpace, Investment research and trading, Log analysis, Splunk, Telecommunications, Text, Web analytics | 1 Comment |
Infobright notes
I had lunch w/ Bob Zurek and Susan Davis of Infobright today. This wasn’t primarily a briefing, but a few takeaways are:
- Infobright now has >100 paying customers.
- Typical database size is from the low 100s of gigabytes to the low single-digit number of terabytes.
- Agile development is at or approaching two-week release cycles.
- Like Kickfire, Infobright has a multi-year deal with MySQL that insulates it against many potential Oracle/MySQL shenanigans.
- From an industry perspective, Infobright’s customer base sounds a lot like other vendors’:
- Data mart outsourcing/online analytics
- Log files for websites
- Telecommunications
- Financial services
- OEM, especially in the markets cited above
- “Hey, we’re beginning to see the occasional energy deal”
- A few random others
- Infobright is seeing some household-name customers, who surely have big-name analytic DBMS products, but who also have a policy that open source is the default choice, and if open source can get the job done then the favorite closed-source choices aren’t used.
- Infobright has the usual open-source community story — lots of involvement and engagement in the forums, but contributions are limited mainly to connectivity, utility scripts, etc. (Maybe some national language translation too; I’m not sure.)
How 30+ enterprises are using Hadoop
MapReduce is definitely gaining traction, especially but by no means only in the form of Hadoop. In the aftermath of Hadoop World, Jeff Hammerbacher of Cloudera walked me quickly through 25 customers he pulled from Cloudera’s files. Facts and metrics ranged widely, of course:
- Some are in heavy production with Hadoop, and closely engaged with Cloudera. Others are active Hadoop users but are very secretive. Yet others signed up for initial Hadoop training last week.
- Some have Hadoop clusters in the thousands of nodes. Many have Hadoop clusters in the 50-100 node range. Others are just prototyping Hadoop use. And one seems to be “OEMing” a small Hadoop cluster in each piece of equipment sold.
- Many export data from Hadoop to a relational DBMS; many others just leave it in HDFS (Hadoop Distributed File System), e.g. with Hive as the query language, or in exactly one case Jaql.
- Some are household names, in web businesses or otherwise. Others seem to be pretty obscure.
- Industries include financial services, telecom (Asia only, and quite new), bioinformatics (and other research), intelligence, and lots of web and/or advertising/media.
- Application areas mentioned — and these overlap in some cases — include:
- Log and/or clickstream analysis of various kinds
- Marketing analytics
- Machine learning and/or sophisticated data mining
- Image processing
- Processing of XML messages
- Web crawling and/or text processing
- General archiving, including of relational/tabular data, e.g. for compliance
