July 2, 2012

Introduction to Yarcdata

Cray’s strategy is to:

Move forward with the classic supercomputer business.
Diversify into related areas.

At the moment, the main diversifications are:

Boxes that are like supercomputers, but at a lower price point.
Storage.
“(Big) data”.

The last of the three is what Cray subsidiary Yarcdata is all about.

“Yarc” = “Cray” spelled backwards.

To a first approximation, Yarcdata is a bunch of Cray guys, with an overlay of people from Informatica/Siperian and other database-oriented software companies. Yarcdata’s first effort is to manage graph data, via an appliance product called uRika. More precisely, uRika manages RDF triples, with SPARQL as the query language. More precisely yet, uRika manages quadruples, with the fourth field holding a “subgraph ID”. Having multiple subgraphs sounds like it’s somewhere between having:

Multiple tables in one database.
Multiple databases managed by one DBMS.

A natural way to wind up with multiple subgraphs is to import data from different sources.
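To make the quadruple idea concrete, here is a minimal Python sketch of a toy quad store. It is not uRika’s data store; the class QuadStore, its methods, and the source names are all hypothetical. Each triple imported from a given source is tagged with that source’s subgraph ID, and a query can either ignore the fourth field or restrict itself to one subgraph. In standard SPARQL the analogous notion is the named graph, queried with the GRAPH keyword; that uRika’s subgraph ID plays the same role is an assumption here.

```python
from collections import namedtuple

# A quad is an RDF triple plus a fourth field naming the subgraph it belongs to.
Quad = namedtuple("Quad", ["subject", "predicate", "object", "graph"])


class QuadStore:
    """Hypothetical toy quad store held in a Python set; not uRika's design."""

    def __init__(self):
        self.quads = set()

    def load(self, triples, graph):
        # Tag every triple imported from one source with that source's subgraph ID.
        for s, p, o in triples:
            self.quads.add(Quad(s, p, o, graph))

    def match(self, s=None, p=None, o=None, graph=None):
        # None acts as a wildcard, loosely like an unbound variable in a triple pattern.
        # Leaving graph=None searches every subgraph at once.
        return sorted(
            q for q in self.quads
            if (s is None or q.subject == s)
            and (p is None or q.predicate == p)
            and (o is None or q.object == o)
            and (graph is None or q.graph == graph)
        )

    def graphs(self):
        # Distinct subgraph IDs currently in the store.
        return sorted({q.graph for q in self.quads})


# Hypothetical sources, each imported as its own subgraph.
store = QuadStore()
store.load([("alice", "knows", "bob")], graph="source:phone_records")
store.load([("alice", "knows", "carol"), ("bob", "worksFor", "acme")], graph="source:hr_db")

print(store.graphs())                                                 # ['source:hr_db', 'source:phone_records']
print(len(store.match(s="alice", p="knows")))                         # 2
print(len(store.match(s="alice", p="knows", graph="source:hr_db")))   # 1
```

Leaving `graph` unset behaves like querying across several tables in one database; fixing it is closer to pointing at one of several databases. Which analogy fits uRika better is not something a sketch like this can settle.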

Yarcdata is still trying to figure out exactly which relationship analytics application areas it is pursuing. Yarcdata’s big multi-year design partner was a large intelligence agency, for an unspecified application that obviously has a lot to do with terrorism and national security. Also mentioned, as is appropriate for a Cray subsidiary, are application areas that feel more scientific or technical (life sciences, financial services). Not mentioned much so far — except perhaps by me — are telecom/influencer-detection and anti-fraud.

The last time Yarcdata gave me a customer count, it was 5, but that was some months ago.

As best I understand, uRika has two tiers of servers. One tier features commodity hardware, and runs a stack of data access software from, or at least based on, the Apache Jena project. The other tier has classic Cray hardware, running a proprietary data store. This data store is in-memory, except that, like most in-memory analytic stores, it can be initialized from disk. Notes on the data store part include:

It’s shared-everything, with one global address space for RAM. There’s no explicit data partitioning.
Cray talks a lot about half a petabyte of RAM, to the point that I’m guessing that’s what the classified first customer actually has. But of course you can get uRika in various sizes.
A key point is that Cray lets you have lots of threads going. Figures on that included 128 threads/processor and 8000 processors, for about 1 million threads (128 × 8000 = 1,024,000).
Why so many threads? To help “tolerate” memory latency. If one thread is delayed, just switch to the next.
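The latency-tolerance argument can be put in rough numbers with a textbook idealization of fine-grained multithreading: each thread computes for C cycles, then stalls for L cycles waiting on memory, and the processor switches to another ready thread at no cost. Utilization with N threads is then min(1, N*C/(C+L)), so about (C+L)/C threads keep a processor fully busy. The 128 threads/processor and 8000 processors are the figures quoted above; the 1-cycle compute and 100-cycle latency values are hypothetical, and the model is a simplification, not Cray’s actual hardware behavior.

```python
import math

# Figures quoted above.
THREADS_PER_PROCESSOR = 128
PROCESSORS = 8000
TOTAL_THREADS = THREADS_PER_PROCESSOR * PROCESSORS  # 1,024,000, i.e. "1 million"


def utilization(threads, compute_cycles, latency_cycles):
    """Fraction of cycles doing useful work under the idealized model:
    each thread computes, then stalls on memory, and switching threads is free."""
    period = compute_cycles + latency_cycles
    return min(1.0, threads * compute_cycles / period)


def threads_to_hide_latency(compute_cycles, latency_cycles):
    """Smallest thread count at which the model reaches full utilization."""
    return math.ceil((compute_cycles + latency_cycles) / compute_cycles)


# Hypothetical workload: 1 cycle of work per 100-cycle memory stall.
C, L = 1, 100
print(TOTAL_THREADS)                              # 1024000
print(threads_to_hide_latency(C, L))              # 101
print(round(utilization(1, C, L), 4))             # 0.0099
print(round(utilization(64, C, L), 3))            # 0.634
print(utilization(THREADS_PER_PROCESSOR, C, L))   # 1.0
```

Under these assumptions a single thread leaves the processor idle about 99% of the time, while 128 threads are more than enough to cover the stall. The threads don’t make memory any faster; they just give the processor something else to do while it waits.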

On the graph analytic functionality, there seems to be less in the way of uRika secret sauce at this time. SPARQL 1.0 and Jena get mentioned, but innovative extensions are discussed not so much in the present tense, but rather in future or hypothetical terms. Anyhow, I haven’t spent a lot of time looking at what SPARQL can or can’t do, but I gather that if you want to do a straightforward graph query, SPARQL can handle it. But for graph analytics such as centrality measures or whatever, you need tools or extensions.
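Here is a minimal Python sketch of that distinction, on made-up data. The two-hop lookup is the sort of query a SPARQL 1.0 basic graph pattern expresses directly. Degree centrality, about the simplest centrality measure, requires counting; SPARQL 1.0 has no aggregates (COUNT and GROUP BY arrived with SPARQL 1.1), so it ends up computed outside the query language. The function names and sample triples are hypothetical, and nothing here reflects how uRika itself handles either case.

```python
from collections import Counter

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
]


def friends_of_friends(triples, person, predicate="knows"):
    """Two-hop pattern match, roughly the SPARQL 1.0 basic graph pattern
    { ?person :knows ?x . ?x :knows ?y } with ?person bound."""
    return sorted({
        y
        for (s1, p1, x) in triples if s1 == person and p1 == predicate
        for (s2, p2, y) in triples if s2 == x and p2 == predicate
    })


def degree_centrality(triples, predicate="knows"):
    """Normalized degree centrality, ignoring edge direction:
    edges touching a node divided by (number of nodes - 1)."""
    degree = Counter()
    for s, p, o in triples:
        if p == predicate:
            degree[s] += 1
            degree[o] += 1
    n = len(degree)
    if n < 2:
        # Centrality is undefined for fewer than two nodes; report zero.
        return {node: 0.0 for node in degree}
    return {node: degree[node] / (n - 1) for node in sorted(degree)}


print(friends_of_friends(TRIPLES, "alice"))  # ['carol', 'dave']
centrality = degree_centrality(TRIPLES)
print(max(centrality, key=centrality.get))   # carol
```

Richer measures such as betweenness or PageRank need iteration or shortest-path computation on top of this, which is further still from what a pattern-matching query language provides.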

Categories: Data models and architecture, Health care, In-memory DBMS, Investment research and trading, Market share and customer counts, Parallelization, Petabyte-scale data management, RDF and graphs, Yarcdata and Cray


Comments

One Response to “Introduction to Yarcdata”

Cray | DBMS 2 : DataBase Management System Services on July 2nd, 2012 4:57 am

[…] I’m now consulting to Cray largely because of Bill Blake, specifically to Cray subsidiary Yarcdata. Along the way, I’ve picked up enough about Cray in general — largely from Bill and […]

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.
