Investment research and trading
Discussion of how data management and analytic technologies are used in trading and investment research. (As opposed to a discussion of the services we ourselves provide to investors.) Related subjects include:
- CEP (Complex Event Processing)
- (in Text Technologies) The use of text analytics in trading and investment research
Agile predictive analytics – the heart of the matter
I’ve already suggested that several apparent issues in predictive analytic agility can be dismissed by straightforwardly applying best-of-breed technology, for example in analytic data management. At first blush, the same could be said about the actual analysis, which comprises:
- Data preparation, which is tedious unless you do a good job of automating it.
- Running the actual algorithms.
Numerous statistical software vendors (or open source projects) help you with the second part; some make strong claims in the first area as well (e.g., my clients at KXEN). Even so, large enterprises typically have statistical silos, commonly featuring expensive annual SAS licenses and seemingly slow-moving SAS programmers.
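To make the two-part split concrete, here is a minimal sketch in Python. It is not taken from any vendor's product: `prepare` is a hypothetical reusable function standing in for automated data preparation, `fit_line` is a plain least-squares fit standing in for "the actual algorithms," and the field names and sample records are invented for illustration.

```python
def prepare(raw_rows):
    """Automated data preparation: drop incomplete rows, convert text to numbers."""
    cleaned = []
    for row in raw_rows:
        spend, revenue = row.get("spend"), row.get("revenue")
        if spend in (None, "") or revenue in (None, ""):
            continue  # skip records with missing values
        cleaned.append((float(spend), float(revenue)))
    return cleaned


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical raw input, as it might arrive from an export (all strings).
raw = [
    {"spend": "1", "revenue": "3"},
    {"spend": "2", "revenue": "5"},
    {"spend": "", "revenue": "4"},  # incomplete, dropped by prepare()
    {"spend": "3", "revenue": "7"},
]

slope, intercept = fit_line(prepare(raw))
print(slope, intercept)
```

Once the preparation step is a function rather than a manual chore, rerunning the analysis on new data is a single call, which is the kind of automation the first bullet has in mind.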
As I see it, the predictive analytics workflow goes something like this: Read more
| Categories: Investment research and trading, Predictive modeling and advanced analytics, SAS Institute, Telecommunications, Web analytics | 19 Comments |
Agile predictive analytics — the “easy” parts
I’m hearing a lot these days about agile predictive analytics, albeit rarely in those exact terms. The general idea is unassailable, in that it boils down to using data as quickly as reasonably possible. But discussing particulars is hard, for several reasons:
- Pundits tend to sketch castles in the air.
- Vendors tend to confuse part of the story (generally the part they happen to offer) with the whole.
- Different use cases give rise to different kinds of issues.
At least three of the generic arguments for agility apply to predictive analytics:
- Doing the correct thing soon is usually better than doing the same correct thing later.
- If something doesn’t take much time to do, hopefully it doesn’t cost that much (in labor and so on) either.
- It’s hard to get new stuff completely right on the first try. Often, the best strategy is to come close fast, then fix what’s still not ideal.
But the reasons to want agile predictive analytics don’t stop there.
| Categories: EAI, EII, ETL, ELT, ETLT, Investment research and trading, Predictive modeling and advanced analytics | 12 Comments |
Terminology: Data mustering
I find myself in need of a word or phrase that means “bringing data together from various sources so that it’s ready to be used,” where the use can be analysis or operations. The first words I thought of were “aggregation” and “collection,” but they both have other meanings in IT. Even “data marshalling” has a specific meaning different from what I want. So instead, I’ll go with data mustering.
I mean for the term “data mustering” to encompass at least three scenarios:
- Integrated (relational) data warehouse.
- Big bit bucket.
- Big bit stream.
Let me explain what I mean by each. Read more
| Categories: Complex event processing (CEP), Data warehousing, Investment research and trading, Sybase, Teradata | 10 Comments |
Some big-vendor execution questions, and why they matter
When I drafted a list of key analytics-sector issues in honor of look-ahead season, the first item was “execution of various big vendors’ ambitious initiatives”. By “execute” I mean mainly:
- “Deliver products that really meet customers’ desires and needs.”
- “Successfully convince them that you’re doing so …”
- “… at an attractive overall cost.”
Vendors mentioned here are Oracle, SAP, HP, and IBM. Anybody smaller got left out due to the length of this post. Among the bigger omissions were:
- salesforce.com (multiple subjects).
- SAS HPA.
- The evolution of Hadoop.
StreamBase catchup
While I was cryptic in my general CEP/streaming catchup, I’ll say a bit more regarding StreamBase in particular. At the highest level, non-technically:
- StreamBase once planned to conquer the world.
- However, StreamBase really only sold effectively in the financial trading and intelligence markets.
- StreamBase retrenched, focusing almost exclusively on the financial trading market.
- With StreamBase LiveView, StreamBase is expanding from embedded operational analytics into (also operational) business intelligence.
- StreamBase is hopeful that, perhaps starting with Version 2 or so, LiveView will be successful outside the financial trading market.
| Categories: Complex event processing (CEP), Investment research and trading, Parallelization, StreamBase | 2 Comments |
IBM is buying parallelization expert Platform Computing
IBM is acquiring Platform Computing, a company with which I had one briefing, last August. Quick background includes: Read more
| Categories: Hadoop, IBM and DB2, Investment research and trading, MapReduce, Parallelization, Scientific research | 5 Comments |
Hadoop notes
I visited California recently, and chatted with numerous companies involved in Hadoop — Cloudera, Hortonworks, MapR, DataStax, Datameer, and more. I’ll defer further Hadoop technical discussions for now — my target to restart them is later this month — but that still leaves some other issues to discuss, namely adoption and partnering.
The total number of enterprises in the world paying subscription and license fees that they would regard as being for “Hadoop or something Hadoop-related” probably is not much over 100 right now, but I’d expect to see pretty rapid growth. Beyond that, let’s divide customers into three groups:
- Internet businesses.
- Traditional enterprises’ internet operations.
- Traditional enterprises’ other operations.
Hadoop vendors, in different mixes, claim to be doing well in all three segments. Even so, almost all use cases involve some kind of machine-generated data, with one exception being a credit card vendor crunching a large database of transaction details. Multiple kinds of machine-generated data come into play — web/network/mobile device logs, financial trade data, scientific/experimental data, and more. In particular, pharmaceutical research got some mentions, which makes sense, in that it’s one area of scientific research that actually enjoys fat for-profit research budgets.
| Categories: Cloudera, Hadoop, Health care, Hortonworks, Investment research and trading, Log analysis, MapR, MapReduce, Market share and customer counts, Scientific research, Web analytics | 5 Comments |
Petabyte-scale Hadoop clusters (dozens of them)
I recently learned that there are 7 Vertica clusters with a petabyte (or more) each of user data. So I asked around about other petabyte-scale clusters. It turns out that there are several dozen such clusters (at least) running Hadoop.
Cloudera can identify 22 CDH (Cloudera Distribution [of] Hadoop) clusters holding one petabyte or more of user data each, at 16 different organizations. This does not count Facebook or Yahoo, who are huge Hadoop users but not, I gather, running CDH. Meanwhile, Eric Baldeschwieler of Hortonworks tells me that Yahoo’s latest stated figures are:
- 42,000 Hadoop nodes …
- … holding 180-200 petabytes of data.
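As a rough sanity check on those two figures, the implied average per node can be worked out directly. This sketch assumes decimal units (1 PB = 1,000 TB), and the source does not say whether the 180-200 petabytes counts replicated copies, so treat the result as a ballpark only.

```python
NODES = 42_000                            # Yahoo's stated Hadoop node count
PETABYTES_LOW, PETABYTES_HIGH = 180, 200  # stated data range
TB_PER_PB = 1000                          # assumption: decimal units


def tb_per_node(petabytes, nodes=NODES):
    """Average terabytes held per node for a given total in petabytes."""
    return petabytes * TB_PER_PB / nodes


low = tb_per_node(PETABYTES_LOW)
high = tb_per_node(PETABYTES_HIGH)
print(f"{low:.1f} to {high:.1f} TB per node")  # prints "4.3 to 4.8 TB per node"
```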
Eight kinds of analytic database (Part 2)
In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I’ll cover four more kinds of analytic database — even newer, for the most part, with a use case/product short list match that is even less clear. Read more
Eight kinds of analytic database (Part 1)
Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.
Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need — and in most cases you’ll need several — is a great early step in your analytic technology planning. Read more
