March 13, 2011

So how many columns can a single table have anyway?

I have a client who is hitting a 1000 column-per-table limit in Oracle Standard Edition. As you might imagine, I’m encouraging them to consider columnar alternatives. Be that as it may, just what ARE the table width limits in various analytic or general-purpose DBMS products?

By the way — the answer SHOULD be “effectively unlimited.” Like it or not,* there are a bunch of multi-thousand-column marketing-prospect-data tables out there.

*Relational purists may dislike the idea for one reason, privacy-concerned folks for quite another.

Comments

36 Responses to “So how many columns can a single table have anyway?”

  1. Vilho Raatikka on March 13th, 2011 4:34 am

    Silly question, what makes multi-thousand column relation superior if the alternative is even somehow normalized schema?

  2. David Aldridge on March 13th, 2011 8:36 am

    The 1,000 column limit is not just Oracle Standard Edition, it’s an Enterprise Edition limit also.

    http://download.oracle.com/docs/cd/E11882_01/server.112/e17110/limits003.htm

  3. Curt Monash on March 13th, 2011 8:09 pm

    Thanks, David. I’m not surprised, but I was being cautious in my phrasing.

    Matt Pfeil wrote in to say that Cassandra’s limit was 2 billion, which for me equates to “effectively unlimited.” Haven’t heard about any other relational DBMS yet.

  4. Curt Monash on March 13th, 2011 8:12 pm

    Vilho — other than that it takes less human effort and less hardware to get the job done, nothing.

  5. Hubi on March 13th, 2011 9:53 pm

    Microsoft SQL Server is 30K for tables with sparse columns.

    http://msdn.microsoft.com/en-us/library/ms143432.aspx

  6. Curt Monash on March 13th, 2011 10:44 pm

    Hubi,

    Thanks!

    Is the “sparse” just a practical matter — as in one couldn’t fill 30,000 non-sparse columns without blowing through max table size — or is there some more specific issue involved?

  7. Tony Rogerson on March 14th, 2011 2:42 am

    With sparse columns the max row size is still a limiting factor, spare columns simply save you space if you have a table with lots of columns that are predominately NULL.

    So, in reality if you have many columns that will not be null then you’ll hit the same wall.

    Tony.

  8. Vilho Raatikka on March 14th, 2011 3:39 am

    Hi Curt.
    I didn’t really understand from your answer how would using thousands of columns in one table make query execution faster or less consuming. Assuming that the reason is not because it was faster to create a schema with one table.

    By the way, solidDB supports 1000 columns. I was just curious to know why would it pay off to change that upper bound value to something else.

  9. Michael Hummel on March 14th, 2011 4:50 am

    I would go with “effectively unlimited.” for ParStream – only limited by the number of files you can have on a distributed server cluster / cloud … or maybe limited by disc space if at all.

    I agree with Curt that there are applications that use and need that much columns – e.g. in employee satisfaction surveying where people are asked many different questions. Because of the real-life complexity there are far more questions defined than asked a single employee. Depending on their answers whole question-tracks are left blank.
    Every possible question is assigned a column – believe me, more than a 1000.
    By storing it in a single table extremly fast pattern recognition can be performed, i.e. how many employees are between 20 and 30, work in a production department, are satisfied or very satisfied with their superiour … and many more.

    By using a bitmap index for every column advanced analytics can be made very fast. I would argue that performing the analysis on a single table with (highly compressed row-synchronized) bitmap-indices leads to much shorter response times (or more correlations checked per time) than by joining 1000 tables together.
    Any thoughts on that?

  10. Rob Verschoor on March 14th, 2011 5:03 am

    Sybase IQ supports 45,000 columns/table.

  11. Daniel Lemire on March 14th, 2011 7:16 am

    Oracle 10g has a limit of 1000 columns per table and multicolumn indexes are limited to 32 columns, IBM DB2 (version 8) has
    a limit of 750 columns per table whereas MySQL 5.0 has a limit of 4096 columns per table.

  12. Thorsten Grahl on March 14th, 2011 7:24 am

    Seems with DB2 9.7 the limit has increased to 1012 Maximum number of columns in a table (32K page size limit)

    http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0001029.html

  13. Curt Monash on March 14th, 2011 8:03 am

    Vilho,

    You seem to be proposing an alternative of doing lots of joins. Yet you also seem to be saying you don’t see how that’s slower than not doing lots of joins.

    Presumably, I’m misunderstanding at least half of that.

  14. Dawit on March 14th, 2011 8:58 am

    Teradata has a limit of 2048 columns per table. At least in Teradata, this limit is a function of the 128KB table header limit in its file system.

  15. Curt Monash on March 14th, 2011 10:27 am

    Dawit:

    Are you saying 64 bytes per column are needed for the headers in Teradata, or something like that?

  16. ssc on March 14th, 2011 11:06 am

    fyi, postgres is limited to 250 – 1600 columns per table, depending on the column type

    http://www.postgresql.org/about/

  17. RDL on March 14th, 2011 11:44 am

    PostgreSQL has a limit of 250-1600 columns per table (depending on column types):

    http://www.postgresql.org/about/

  18. Dawit on March 14th, 2011 12:06 pm

    In addition to column descriptions, the table header also contains other info like partition and compression info. The 128KB space is for all these info requiring the imit on number of columns per table.

  19. Vilho Raatikka on March 14th, 2011 12:08 pm

    Curt, I’m not proposing anything, I don’t know enough about analytic processing so that I could guess what kind of queries would be executed against that multi-thousand column table. Without knowing that, I don’t even try to claim how should the processing be done. It is just interesting to hear you arguments.

  20. Dawit on March 14th, 2011 12:38 pm

    In many applications, the limit on the number of columns per table is much more constraining as it applies to the intermediary or final results produced during a query processing than as it applies to base tables. At least in some DBMSs, the same limit applies for both. It is much more common to have complex queries that produce an intermediary or final result table that exceeds the limit. For example, computing aggregations over multiple categorical variables in SQL that makes only one-pass over the table results in a single row with values converted to columns.

  21. Bill on March 14th, 2011 1:10 pm

    The current limit for Ingres and VectorWise is 1024 cols per table, but for VectorWise we should consider raising it. I’ll second Michael’s comment above. It is not at all crazy for a DW to join all of the dimensions into the fact table, even where it results in a very wide table. Column-wise compression (e.g. dictionary compression) means this can be done with little or not storage overhead. Column selection means queries using a small subset of columns don’t suffer. And then there is no join left to do for queries, except for the special case of putting columns back into rows, for which column stores have very efficient mechanisms.

  22. Jeff Kibler on March 14th, 2011 1:47 pm

    Hi Curt,

    At Infobright, it all depends on the types.

    In MySQL and in us, there’s a limit of just under 65536 bytes per row (65532?) so that effectively limits the number of columns. Any column can be up to 64k, but the total width of all columns cannot be more than 64k. Thus, just under 16384 integer columns.

    Thanks,

    Jeff

  23. John M. Wildenthal on March 14th, 2011 4:29 pm

    Just ran a check on SAS. I was able to successfully construct a dataset with 2,072,576 eight byte columns on my laptop. On a 64bit box I was able to successfully create a dataset with 3,133,440 eight byte columns before I decided to kill it.

    I think SAS has a total observation length limit rather than a number of columns limit, as the others do. But I think it may have more to do with available memory rather than any absolute limit. So the 64bit implementation might have a truly impressive limit on the number of columns.

  24. Anders Skollermo on March 14th, 2011 5:08 pm

    Half a million columns i SAS!

    I am working on a paper in Risk Theory at Stockholm University. Since I do not have any C or C+ compiler installed on my home PC, I have used the SAS Data-steg.
    It turns out (partly to my surprise) that:
    * Processing of 10000-20000 columns is Very fast.
    (Please note that I only have one single row,
    so this is a Very unusual situation).
    * The code is very clear, slightly better in
    readability than Matlab.
    * Anyway to have Your results in 10000 variables,
    means that you have to restructure them, so
    you can see the result.
    * I think that SAS internally produces a C-like
    program, which stores the data in the one
    single observation, in real memory.
    * I have (for fun) tried with 500000 columns and
    performed a simple addition. Also fast – 17
    seconds.

  25. Paul Kent on March 14th, 2011 6:37 pm

    Thanks John and Anders (sas users) for getting here first! Its been recently (v8 or v9) that SAS increased our columns per table limit up from 32K (signed short) to essentially unlimited (unsigned long or roughly 2 billion). The “catch” is you need enough 64 bit address space to store all the column descriptors and at least one row of data in-memory.

    Genomics models were the primary driver. these folks typically have short but wide datasets of the order of 300,000 columns.

  26. Leon Katsnelson on March 14th, 2011 11:35 pm

    For IBM DB2, the maximum number of columns in a table is 1012 for all page sizes (8K, 16K, 32K) except 4K pages. If you use 4K pages then maximum number of columns in a table is 500.

  27. Vilho Raatikka on March 15th, 2011 6:21 am

    Curt, Michael, Dewit, and others, thanks for clafying comments. I had look years back to get into correct context. SolidDB is more OLTP and from that perspective things are looked from quite different angle.

    Since everybody is voting for ‘collapsed’ schema, one big fact table, is there any role for snowflake schema anymore? Are there any opposite arguments that you guys could, if not agree, but accept as the other way of solving the problem?

  28. Curt Monash on March 15th, 2011 6:56 am

    Vilho,

    The quick answer is that normalization is good (especially performance-wise) for some kinds of updates and a few kinds of queries, while denormalization is good for other kinds of updates and many kinds of queries. So it depends on your workload.

    It also depends on what software you use, as different software responds differently to different workloads.

    Basically, if updating with data integrity is the most demanding part of the task, normalization may make a lot of sense. If most retrievals are of single short rows, why not normalize? If a whole lot of different applications will use the same database, normalization may well turn out to be best. Absent those three scenarios, however, normalization’s benefits are often overblown.

  29. Graham Toppin on March 15th, 2011 7:53 am

    Hi Curt,

    Re: Infobright –

    One quick note to follow up on Jeff Kibler’s observation about the limit on the number of columns – it is a function of how MySQL (the database front end for Infobright) handles the column definitions. There isn’t any such limitation inside of the Infobright engine proper.

    In terms of the table structures, the maximum number of columns would be 2^32; however the current file abstraction would only allow 100,000 columns. This latter limitation can be easily removed.

    .g.

  30. Sandy Steier on March 15th, 2011 10:17 am

    1010data has no hard limit on the number of columns in a single table but as a practical matter system resource considerations probably puts it in the hundreds of thousands. We certainly have handled tens of thousands in a test table but in truth no “real” table has exceeded several thousand. Except possibly in cases of efficiency-motivated denormalization (to avoid joins), which in 1010data is usually unnecessary, or time series, which would be better organized differently, it’s hard to see why tables in almost any business sector would have more than a few thousand columns.

  31. Joe Ramsey on March 15th, 2011 11:46 am

    It’s also good to note that for Oracle that the 1000 column limit also includes “hidden columns”. Each column that has a function-based index applied to it, for each function that column gets materialized as a hidden column so effectively you have 1000 columns minus the number of function+column pairs for your function-based indexes.

  32. Randolph on March 15th, 2011 9:41 pm

    DeepCloud.co also has a 1024 column maximum. However if Ingres raise the limit for Vectorwise, we will increase the capacity of our MPP engine accordingly.

    What would be a reasonable maximum – 4K ?

  33. Curt Monash on March 15th, 2011 10:09 pm

    Today I’m sure 4K would be plenty. However, I don’t have a good feel for the rate of growth.

  34. Mike Pilcher on March 16th, 2011 1:22 pm

    To re-use one of my favourite quotes, “In theory, there is no difference between theory and practice. But, in practice, there is.” Jan L.A. van de Snepscheut.

    The theoretical limit for SAND is 4,294,967,296. Practically it is 8,192. We haven’t had a real world request for anything like that number. As you like to say as a grey beard in Chuck Taylors, we have seen a lot of requests over time.

  35. Balaji on September 10th, 2013 2:12 am

    How to count the no of columns in a table in DB2

  36. Some Vertica 6 features | DBMS 2 : DataBase Management System Services on January 31st, 2014 9:15 am

    […] opposed to Oracle’s annoying 1000 column limit, Vertica tables can have an effectively unlimited number of columns (whatever the machine’s […]

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.