Document Actions
Jose Felipe Ortega's blog at LibreSoft
Up one levelDivide and Conquer: Table Partitioning in MySQL
Last FOSDEM was simply terrific. Too many interesting talks to attend, too many cool topics covered. You have to choose, and then, you're always running the risk of missing that special talk you will remember during the next months. Fortunately, I made the right choice.
The talk was given in the MySQL devroom, by Guiseppe Maxia. The topic: "Boost performance with MySQL 5.1 Partitions". The live explanation of the difference between logical and physical partitioning was just great. Partitioning is something I have been waiting for a long time in MySQL.
Let's say that you have a very large table, storing information from FLOSS development repositories, from log archives, or any other chronological data you may think of. Obviously, querying for specific subsets of this data can be a time consuming task. But you can save a lot of time with table partitioning. You can divide the main table in logical partitions, stored in separate files in the disk. Each chunk of the original table contains a subset of data. You choose how to divide it: by years, months, a subset of the whole list of row identifiers. The only restrictions is that you must choose a numeric field (or any other type convertible to numeric format, e.g. DATE, or DATETIME). But the engine still sees only one table. The chunks are transparent, that's the magic of this technique.
Whenever you query for a subset of data, instead of searching over one huge file, the engine only searches in the small file corresponding to that subset (if you selected the range of chunks correctly). Or at least, in a smaller number of files, rather than over the entire table. Look at the presentation, and you will see that you can do many tricks with ARCHIVE tables, that you can avoid spending your time in endless queries to build indexes, and boost your performance.
However, ain't no silver bullet in DBs. GROUP BY clauses over the entire table will use all chunks, and performance may be even worse. That's why, unfortunately, I can't profit from this new feature in WikiXRay. At least, not yet.
I wonder... would it make sense to use parallel filesystems? Would MySQL be capable of reading several chunks at the same time? I'm afraid that the answer is no, but I feel it worths being investigated.

