Thursday, February 3, 2011

Modifying columns of very large mysql tables with little or no downtime

I periodically need to make changes to tables in mysql 5.1, mostly adding columns. Very simple with the alter table command. But my tables have up to 40 million rows now and they are growing fast... So those alter table commands take several hours. In a couple months they'll take days I'm guessing.

Since I'm using amazon RDS, I can't have slave servers to play with and then promote to master. So my question is if there's a way to do this with minimal downtime? I don't mind an operation taking hours or even days if users can still use the db of course... Can they at least read while columns are being added? What happens if my app tries to write? Insert or update? If it fails immediately that's actually not so bad, if it just hangs and causes problems for the db server that's a big problem..

This must be a fairly common scaling issue, everyone needs to add columns.. What's typically done to a production db? Slave -> master migration?

Update - I forgot to mention I'm using the innodb storage engine

  • From the manual: http://dev.mysql.com/doc/refman/5.1/en/alter-table.html

    In most cases, ALTER TABLE makes a temporary copy of the original table. MySQL incorporates the alteration into the copy, then deletes the original table and renames the new one. While ALTER TABLE is executing, the original table is readable by other sessions. Updates and writes to the table are stalled until the new table is ready, and then are automatically redirected to the new table without any failed updates.

    So, reading will work fine. Writes will be stalled, but executed afterwards. If you want to prevent this, you'll have to modify your software.

    at01 : So I've done this and disabled the parts of my site that write to the table that I'm modifying right now. So far I have received several "Lock wait timeout exceeded; try restarting transaction" exceptions, that's not too bad. However, they were on PURELY read operations...
    From Erik
  • I periodically need to make changes to tables in mysql 5.1, mostly adding columns.

    Don't. No really. Just don't. It should be a very rare occasion when this is ever necessary.

    Assuming your data really is normalized to start with, the right way to solve the problem is to add a new table with a 1:1 relationship to the base table (non-obligatory on the new table).

    Having to add columns regularly is usually an indicator of a database which is not normalized - if your schema is not normalized then that's the problem you need to fix.

    Finally, if your schema really, really is normalized and you really, really must keep adding columns then:

    1 Ensure you've got a timestamp column on the database or that it is generating replication logs

    2 Create a copy (B) of the table (A)

    3 add the new columns to B (this will still block with myisam)

    4 disable transactions

    5 rename the original table (A) as something else (backup)

    6 rename the new table (B) with the name of the original table (A)

    7 replay the transactions from the start of the operation from the replication log or from the backup table

    8 enable transactions.

    at01 : Thank you for your step by step approach. Is it really uncommon to modify tables? I understand that I can instead add another table with the new column (in the case of needing to add a column) and have it reference the original large table in a 1:1 relationship. But it doesn't seem right to have 15 very large 1:1 tables when they should all be in 1 table... The querying performance of course then suffers as well, not to mention the indexing issues. I'm not an expert, but my database is fairly well normalized and it seems natural that I need to periodically modify..
    symcbean : "Is it really uncommon to modify tables?" - Yes.
    From symcbean
  • symcbean provides some solid recommendations.

    To answer your question, the easiest and best way to mitigate impact is by having multiple databases replicating. Dual master with an appropriate failover procedure stopping replication on the active, which allows an alteration on the inactive without impacting active.

    You could potentially do this on a single live database and minimize impact by using a procedure similar to the one I detailed in this answer. Admittedly, this is similar to what symcbean described but includes technical details. You could use an auto_increment field as well and not just timestamp.

    Ultimately, if your data set is growing so large, you need to also consider archival between OLTP and OLAP databases. Your transaction dataset should not need to be so large, if you design appropriately.

    at01 : Thank you Warner, this is helpful.
    From Warner

0 comments:

Post a Comment