Friday, January 28, 2011

Generating error correction data for existing files (backups)

My server has a big dataset stored on disk. I am worried about the possibility of the data being silently corrupted over the years without anybody noticing. My idea was to generate "recovery data" for these files so that I can recover from small corruptions, the way .rar files can (WinRAR can add recovery data or recovery volumes). Is there any tool available to generate recovery data without modifying the files themselves?

  • par2 seems to be the most commonly used tool for this. A lot of people use it when writing DVDs or CDs, where the data will eventually degrade but it isn't likely that the entire disc will be rendered unusable all at once. Leaving aside the mathematics behind it, it works by virtually splitting the files into "blocks" and then creating par2 recovery files based on those blocks. To recover corrupt data, you need at least as many unique recovery blocks as there are bad blocks of data; with fewer, nothing can be recovered at all (i.e., if you have 9 blocks of par files and 10 blocks of bad data, nothing can be done).

    For CDs and DVDs, people produce recovery sets with high redundancy and burn the recovery blocks to multiple discs, on the expectation that a given block is unlikely to become corrupt on every single disc. With 100% redundancy, the original file can be recreated from the par files alone, but the par files will take as much disk space as the original data (plus overhead), so the total footprint roughly doubles.

    In your case, I'd make sure the par files are stored separately from the dataset so that a single event can't ruin both at once. Also, generating the par files in the first place is CPU-intensive: at 100%, it took an otherwise idle 2GHz server 18.8 seconds to create par files for a single 3.7MB file.

    Scott McClenning: +1. Used PAR2 on my 7z backups that I burnt to DVD, because not all DVDs are 100% reliable.
    From DerfK
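To make the "blocks" idea concrete, here is a toy erasure code in Python. It is not par2's actual algorithm or file format: par2 uses Reed-Solomon coding over GF(2^16), while this sketch swaps in polynomial interpolation over the prime field GF(257) to keep it short. The function names (make_recovery, repair) are hypothetical. It does show the property described in the answer: any set of bad blocks can be rebuilt as long as there are at least that many recovery blocks. It also assumes you already know which blocks are bad; par2 stores per-block checksums for exactly that purpose.

```python
"""Toy erasure code illustrating the par2 idea (NOT the real par2 format).

Assumption: Reed-Solomon-style polynomial evaluation over the prime field
GF(257) stands in for par2's GF(2^16) arithmetic, to keep the sketch short.
"""
P = 257  # prime modulus; every byte value 0..255 is a valid field element


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial through points [(xi, yi), ...] mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def make_recovery(blocks, m):
    """Return m recovery blocks for k equal-length data blocks.

    Byte column c of data block i is treated as the value at x = i of one
    polynomial of degree < k; recovery block j holds that polynomial at x = k + j.
    Recovery symbols can equal 256, so they are kept as lists of ints.
    """
    k, size = len(blocks), len(blocks[0])
    return [
        [lagrange_eval([(i, blocks[i][c]) for i in range(k)], k + j) for c in range(size)]
        for j in range(m)
    ]


def repair(blocks, recovery, bad):
    """Rebuild the data blocks whose indices are in `bad` (their contents are ignored)."""
    k, size = len(blocks), len(blocks[0])
    if len(bad) > len(recovery):
        raise ValueError(f"{len(bad)} bad blocks but only {len(recovery)} recovery blocks")
    # Known evaluations: the good data blocks first, then the recovery blocks.
    known = [(i, list(blocks[i])) for i in range(k) if i not in bad]
    known += [(k + j, r) for j, r in enumerate(recovery)]
    known = known[:k]  # any k points determine a polynomial of degree < k
    fixed = [bytes(b) for b in blocks]
    for i in bad:
        fixed[i] = bytes(lagrange_eval([(x, d[c]) for x, d in known], i) for c in range(size))
    return fixed


if __name__ == "__main__":
    data = [b"hello wo", b"rld, thi", b"s is par", b"2-ish!!!"]
    rec = make_recovery(data, 2)  # 2 recovery blocks for 4 data blocks
    damaged = list(data)
    damaged[1], damaged[3] = b"XXXXXXXX", b"????????"
    print(repair(damaged, rec, {1, 3}) == data)  # True: 2 bad blocks, 2 recovery blocks
```

With three bad blocks and only two recovery blocks, repair raises instead of returning anything, mirroring the "9 par blocks vs. 10 bad blocks" case. With as many recovery blocks as data blocks (100% redundancy), every data block can be rebuilt from the recovery blocks alone, and the recovery data is as large as the original.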
