June 5, 2010

Extended set theory, aka “What is a tuple anyway?”

The Algebraix folks are trying to repopularize David Childs’ idea of “Extended set theory.” In a nutshell, the extended set theory idea is:

A tuple is a set of (field-name, field-value) pairs.

I’ve been fairly negative about the extended set theory concept – but in fairness, that may be because I misunderstood how other people thought of tuples. Any time I’ve had to formalize what I thought of a tuple as being, I came up with something very much like the above, except that if one wants to be relational one needs a requirement like:

In any one tuple, each field-name must be unique.

In line with that definition, I’d say a table is something like: 

  1. A set of tuples, such that any two tuples in the set can be put into one-to-one correspondence with each other in such a way that the field-names of corresponding pairs are the same, plus
  2. (Optionally!) A convention for consistently arranging the members of a tuple in a certain order according to field-name.

If you buy that, then the next step is to define a join along the lines of

Definition: f is a field-name for table T if for any tuple t in T, t contains a pair (f,v) for some v.

Definition: Let T and U be tables. Define pseudocart(T,U) to be the image of the Cartesian product T x U under the amazingly simple mapping that sends (t,u) to the union of t and u.

Definition: Let T and U be tables with field-names f and g respectively. The join of T and U on fields f and g is the subset J of pseudocart(T,U) such that, for all s in pseudocart(T,U), s is in J if and only if, for (f,v) and (g,w) in s, v=w.

Note: s is of course itself a set. (f,v) and (g,w) must each exist in s and be unique, by the definitions. That’s why I didn’t say “for all”.

Note: I came up with all of the above by myself. No doubt the whole thing has been done better many times before — or at least with different, possibly more scalable, terminology. Links to same in the comment thread would be appreciated.

The net result is that joins are commutative, associative, and generally nicely behaved, which extended set theory proponents say they aren’t in a standard relational algebra.

In fairness to the extended set theory folks, Wikipedia shows us some common definitions of tuple are rather different from what I gave above. But so what? Use any other definition. Then there’s a one-to-one mapping between the tables and tuples you have and the ones I outlined above. Use that mapping, plus the definition of join I outlined above, to induce a join operation on the tables you defined. It will have the same nice algebraic properties as mine.

A second claim made for extended set theory is that it doesn’t just cover the relational case. I.e, if you relax the rule “In any one tuple, each field-name must be unique,” then all of a sudden the same set-theoretic model can be used to describe XML as well. Coolness. But I’ll admit to not yet knowing of any reasons why that description would actually be useful for anything, unless somebody (e.g., Algebraix) really comes out with a general, extended-set-theory-inspired software engine that does a great job of managing multiple kinds of data.

Comments

23 Responses to “Extended set theory, aka “What is a tuple anyway?””

  1. David Childs | Software Memories on June 5th, 2010 5:02 am

    […] I’m far from convinced that “Extended set theory” has much to offer versus the sta…. But that debate quite aside — Childs’ original achievement doesn’t get the credit it deserves. Categories: Business intelligence, Database management systems  Subscribe to our complete feed! […]

  2. Does donating a pint of blood affect a weight loss program/diet? | Weight Loss Guide on June 5th, 2010 8:17 pm

    […] Extended set theory, aka “What is a tuple anyway?” | DBMS2 … […]

  3. D L Childs on June 6th, 2010 11:48 am

    Your readers might be interested in the following two articles that explain why classical definitions of n-tuples are not suitable for consideration as operands:

    SKOLEM: TWO REMARKS ON SET THEORY
    http://www.mscand.dk/article.php?id=1481

    AXIOMS FOR AN EXTENDED SET THEORY
    http://xsp.xegesis.org/X_axioms.pdf

  4. Curt Monash on June 6th, 2010 6:59 pm

    Hi David. Thanks for posting!

    While Section 2 of the 1957 Skolem paper raises some theoretical issues in set theory, I didn’t see anything that spoke directly to the point of “not suitable for consideration as operands.”

    More to the point — what do you see wrong with the definitions I proposed in my post? 😉

    Best,

    CAM

  5. D L Childs on June 7th, 2010 1:29 am

    Curt:

    The second reference expands on the foundational issues raised by Skolem adding that any proposed set definition for n-tuples as sets does not support meaningful behavior in conjunction with set operations. For example, using the Kuratowski definition: [David said something here that is interpreted as HTML by WordPress. I’ll ask him to repost it without using angle brackets]. These are not particularly useful results when trying to formally model set-store data architectures at the I/O interface.These are not particularly useful results when trying to formally model set-store data architectures at the I/O interface.These are not particularly useful results when trying to formally model set-store data architectures at the I/O interface.

    John McCarthy experienced the difficulty of modeling arbitrary M-expressions as n-tuples when he tried to formally define a phi operator that plucked the i-th component out of an n-tuple. In LISP he finally settled on S-expressions using CDR, CAR, CONS supported by ordered pairs to construct simulated n-tuples.

    My work as always be focused on set-theoretic foundations with application to modeling system behavior. Codd focused on supporting data independence between user applications and system processing. My interest is formally modeling the I/O interface supporting data independence between system processing and system storage. Your definition for tuples seems perfectly adequate for high-level modeling, but it does not seem to address the formal set-membership requirements necessary to support well-behaved behavior under set operations.

    The following paper may be of some help in clarifying the distinction between using tuples at a high-level user interface like the RDM versus using tuples as operands for modeling system behavior at the low-level I/O interface.

    SET-STORE DATA ACCESS ARCHITECTURES at http://xsp.xegesis.org/SSDAA.pdf

    Thanks for the opportunity to discuss extended set theory.

  6. Curt Monash on June 7th, 2010 1:46 am

    David,

    I’m tentatively inclined to believe that the most natural way to describe a tuple is as a bag of name-value pairs. (Naturally, you’ll want an implementation that doesn’t actually recite the name part for each datum, but there are several widely adopted ways of ensuring that.) That strikes me as making both logical and physical sense.

    But that can be done within the framework of the set theory everybody learns in grade school. More generally — from where I sit, set theory doesn’t have a lot of problems until sets get infinite, and infinite sets are not a concern for computer science.

    What, in your opinion, am I missing here?

    CAM

  7. D L Childs on June 7th, 2010 2:31 am

    Curt:

    The key is the underlying membership condition or mathematical identity of a symbolic expression. You seem to be phrasing your question in the framework of an RDM. In CST or XST there is no formal construct called a “bag” or “name-value-pairs”.

    For me all definitions need to be derivable from a collection of axioms. ZFC does not work for me, but XZFCK (referenced earlier) does.

  8. Curt Monash on June 7th, 2010 9:45 am

    David,

    If you look back at my original post, “bag” = “set” and “name-value pair” = “ordered pair”.

    CAM

  9. D L Childs on June 7th, 2010 11:24 am

    Curt:

    In order to support data independence at the I/O interface I need to define operations that can recognize and manipulate physical data representations in storage. I am not able to define these operations under classical set theory axioms.

  10. D L Childs on June 7th, 2010 2:28 pm

    RE: Posting Obfuscation

    In a previous posting the computer decided to obscure a sensitive point regarding n-tuples. The sentence is reworded below disguising angle-brackets as round-brackets.

    “For example, using the Kuratowski definition: (a,b).IN.(a,c)=(a,a) and (a,b,c).IN.(x,b,y)=NULL.”

    Earlier (15 years or so) criticism of this behavioral anomaly was that no one would ever need to use set operations to manipulate n-tuples. However, resolving this anomaly allows set operations to be very useful for manipulating physical data representations in storage.

  11. Curt Monash on June 7th, 2010 5:58 pm

    David,

    It sounds like you’re saying “I want to build my query execution plan out of primitives in such a way that each of those primitives can be agnostic as to physical representation.” If so, that raises two questions:

    1. Which primitives?
    2. How agnostic? (There are performance optimizations for everything, after all …)

  12. Bryan Pendleton on June 7th, 2010 6:49 pm

    If I’m understanding you correctly, Model 204 implemented the extended set theory data model in the late 60’s. In the 70’s and 80’s Model 204 was very successful with certain data management problems. Read more about it here: http://sirius-software.com/m204.html

  13. Curt Monash on June 7th, 2010 8:11 pm

    Good link. Thanks!

    I’ve always thought of Model 204 as an inverted list system. But then, there’s not necessarily all that much difference in the approaches taken by, say, Attivio, Cache’, MarkLogic, or Adabas, even though most people would put those into rather different categories.

  14. D L Childs on June 8th, 2010 2:48 am

    Curt:

    I’m not sure I understand your questions which seem to imply that an application has the ability to dictate performance directives to the underlying system data provider. This only makes sense if the system architecture subverts data independence between user environment and processing environment. As is the case when allowing index structures to be defined at the application level.

    In a truly data independent architecture an application can only express desired results and a minimum data covering that supports derivation of these resells – a set specification of the desired result. The capability is latent in SQL and can be easily achieved if execution plans are eliminated at the application level.

    The real value of set processing, as I see it, is at the architectural level, not at the programming level. Using set theory to program seems akin to using Dedekind cuts to balance a checkbook.

    If by “agnostic as to physical primitives” you mean a belief in a Maxwell Demon that will only let informational dense data pass through the I/O interface, then, yes, any precise expression of a desired result (like an SQL statement or a SQUARE expression) can be executed as optimally as the underlying set processing engine and hardware performance potential allow.

    This “answer” may produce more questions then it answers. A current paper in progress (SET-PROCESSING I/O ARCHITECTURES For Application Independent Data Access) expands on the architectural requirements for using set processing to tap the I/O performance potential of any given hardware configuration. Hopefully it will do a better job of reducing questions than in creating them.

  15. Curt Monash on June 8th, 2010 3:44 am

    David,

    In an ideal world, computer systems would be designed to give great performance for any or all of the applications they are logically capable of running, with no user optimization needed, whether at purchase time, installation time, or later.

    Reality, as you know, is rather far from that ideal.

    CAM

  16. D L Childs on June 8th, 2010 4:15 am

    Curt:

    I don’t understand your last posting, but on rereading you questions and applying them to the middle level component of a three tiered architecture (APP-PRO-STO each component separated by a data independence that allows communication only by set-membership exchange), then the answers to your questions are:

    1. Which primitives?
    No predetermine universal collection, but any collection with XST definitions will work. The only architectural concern is what allowable operations will support the best data representations and access performance given a chosen hardware configuration.

    2. How agnostic?
    Since all data is stored and accessed as a set, the only belief required is in the consistency of XST axioms.

  17. Curt Monash on June 8th, 2010 7:43 am

    David,

    And now back to my original post.

    If we define a tuple as a set of ordered pairs, a relational tuple as a set of ordered pairs in which each of the first elements is unique, and a relational table as set of relational tuples that each have the same set of first-elements — doesn’t that get the job done?

    I did keep things simple by only defining an equi-join, but in another line or two I could have covered the general case of joins just as well.

    The tap dance to go from the true Cartesian product (ordered pair of sets) to the pseudo-Cartesian product (union of the two sets in the ordered pair) wasn’t bad at all.

    ———————————–

    Hmm. I’m seeing one little thing I didn’t say, which is that two different tables have to have disjoint sets of field-names. But that can be assumed true as a tautology.

  18. D L Childs on June 8th, 2010 12:32 pm

    Curt:

    If your definitions work for your purposes, then that is all that matters. In order to answer if they will work for my purposes, I will need answers to the following questions:

    1) What base collection of axioms are you using: ZFC, Quine, NBG, other?

    2) What is the set-membership of your “set of ordered pairs”: Kuratowski, Wiener, Quine, other?

    3) How do your “pairs” and “tuples” behave under intersection?

    4) Using (a,b) and (x,y,z) for pairs and tuples what is the cardinality of the set { (a,b,c), ((a,b),c), (a,(b,c)), ((a,1), (b,2), (c,3)} }? In ZFC the answer is 3. In XST the answer is 4.

    5) How do you define the McCarthy rho operator that plucks the i-th element from a n-tuple? Ex. rho(4) of (a,b,c,d,e) equals d.

    6) How do you define “tables” as sets? Tables (Codd uses “arrays”) are not a formal constituent of the RDM (see http://xsp.xegesis.org/Relations_70.pdf ). Tables are “expository” representations to conveniently represent “relationships”, which in turn are equivalence classes of relations. The RDM uses set theory in spirit but not in fact. Tables are not sets!

    The foundations of the RDM suffer all the failings imbued by the Kuratowski definition of ordered pair. Thus any definition of tuple that shares the same formal support as the RDM will not work for my purposes.

    A more comprehensive analysis of tuples, tables, records and sets is presented in the following:
    Tuples: http://xsp.xegesis.org/xst_etc5a.pdf
    Klasses: http://xsp.xegesis.org/N_isntk.pdf
    Records(p. 6): http://xsp.xegesis.org/Spio.pdf
    Tables(p. 5): http://xsp.xegesis.org/SSDAA.pdf

  19. Curt Monash on June 8th, 2010 5:08 pm

    1) What base collection of axioms are you using: ZFC, Quine, NBG, other?

    Anything compatible with elementary-school textbooks.

    2) What is the set-membership of your “set of ordered pairs”: Kuratowski, Wiener, Quine, other?

    Ditto

    3) How do your “pairs” and “tuples” behave under intersection?

    4) Using (a,b) and (x,y,z) for pairs and tuples what is the cardinality of the set { (a,b,c), ((a,b),c), (a,(b,c)), ((a,1), (b,2), (c,3)} }? In ZFC the answer is 3. In XST the answer is 4.

    That’s four different sets.

    5) How do you define the McCarthy rho operator that plucks the i-th element from a n-tuple? Ex. rho(4) of (a,b,c,d,e) equals d.

    As per the above, I don’t define tuples to be ordered. There’s might be an “AddressLineOne” element, but there isn’t a “third” element.

    By the way, my definitions may not work as well unless we say all the sets discussed are finite. I was taking that as implicit.

    6) How do you define “tables” as sets? Tables (Codd uses “arrays”) are not a formal constituent of the RDM (see http://xsp.xegesis.org/Relations_70.pdf ). Tables are “expository” representations to conveniently represent “relationships”, which in turn are equivalence classes of relations. The RDM uses set theory in spirit but not in fact. Tables are not sets!

    Again as per the above, a table is a set of tuples, subject to a constraint about the set of first-values of the ordered pairs that make up each tuple.

    Perhaps I should define that set explicitly as the “implied-table-schema” of the tuple of something.

    The foundations of the RDM suffer all the failings imbued by the Kuratowski definition of ordered pair. Thus any definition of tuple that shares the same formal support as the RDM will not work for my purposes.

    I don’t see why, so long as you don’t go down the path that, for example, an ordered triple is an ordered pair of a singleton and an ordered pair.

  20. D L Childs on June 9th, 2010 11:31 am

    Curt:

    We seem to have different requirements for properties of n-tuples.

    Tuples were introduced in set theory to support the concept of a function, f(a)=b. This required a notation (with a proper underlying membership condition) that distinguished (a,b) from (b,a) when a was not equal to b. This was expanded to a general definition of n-tuple where (a(1),..,a(n)) = (b(1),..,b(n)) iff a(i)=b(i) for all i. A requisite property for n-tuples is “order”. Your definition does not seem to support this property.

    For your database interests your definition may be more than adequate. For my set theoretic interest in support of functional behavior, it does not work.

    For a better understanding of my set theoretic needs for tuples please read the first and last pages of Axioms, and as much of others as you have time for.

    Axioms: http://xsp.xegesis.org/X_axioms.pdf
    Tuples: http://xsp.xegesis.org/xst_etc5a.pdf
    Klasses: http://xsp.xegesis.org/N_isntk.pdf
    Records(p. 6): http://xsp.xegesis.org/Spio.pdf
    Tables(p. 5): http://xsp.xegesis.org/SSDAA.pdf

  21. Curt Monash on June 9th, 2010 5:20 pm

    You’re right, David. I’m dropping the notion of order because it’s inessential for database work.

    Interestingly, columnar database architectures laugh at the notion of order. And just about every row-based analytic DBMS vendor has a columnar option coming. Greenplum’s is here; Oracle’s is known to me and not under NDA; others are surely coming too.

  22. Ken North on June 11th, 2010 2:16 pm

    > I’m dropping the notion of order because it’s inessential for database work.

    Preserving order is essential when a database is used to store XML documents.

  23. Curt Monash on June 11th, 2010 7:50 pm

    @Ken,

    What examples do you have in mind? I see your point as being more obvious for full-text (enhanced by XML) than for pure XML use cases.

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.