Friday, March 4, 2011

How do you unzip very large files in Python?

Using Python 2.4 and the built-in zipfile module, I cannot read very large zip files (larger than 1 or 2 GB) because it stores the entire uncompressed contents of a file in memory. Is there another way to do this (with a third-party library or some other hack), or must I "shell out" and unzip it that way, which obviously isn't as cross-platform?

From stackoverflow
  • Have a look at http://stackoverflow.com/questions/297345/create-a-zip-file-from-a-generator-in-python which discusses a similar problem.

    Marc Novakowski : Thanks but unfortunately they just discuss zipping a file, not unzipping. If you look at the source code in the zipfile.py library, it uses zlib to decompress a file into a string, which is what's using all the memory.
  • Here's an outline of decompression of large files.

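    import struct
    import zipfile
    import zlib
    
    doc = "archive.zip"  # placeholder path; point this at your archive
    src = open( doc, "rb" )
    zf = zipfile.ZipFile( src )
    for m in zf.infolist():
    
        # This outline handles deflate-compressed members only
        if m.compress_type != zipfile.ZIP_DEFLATED: continue
    
        # Examine the header
        print( "%s %d %d %r %r" % (m.filename, m.header_offset, m.compress_size, m.extra, m.comment) )
        src.seek( m.header_offset )
        header = src.read( 30 ) # Fixed-size part of the local file header
        # Take the name and extra-field lengths from the local header itself:
        # the extra field can differ from the central directory's copy, and
        # the per-file comment is stored only in the central directory.
        name_len, extra_len = struct.unpack( "<HH", header[26:30] )
        src.seek( name_len + extra_len, 1 ) # Skip to the compressed data
    
        # Build a decompression object (-15 means raw deflate, no zlib header)
        decomp = zlib.decompressobj(-15)
    
        # Read and decompress one block at a time to keep memory use small
        out = open( m.filename, "wb" )
        remaining = m.compress_size
        while remaining > 0:
            block = src.read( min( remaining, 65536 ) )
            if not block: break # Truncated archive
            remaining -= len(block)
            out.write( decomp.decompress( block ) )
        out.write( decomp.flush() )
        out.close()
    
    zf.close()
    src.close()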
    
    Marc Novakowski : This is exactly what I was looking for - thanks!
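
The outline's remark that the read "can be done with a loop reading blocks" can be sketched as a small helper. This is only one possible shape, not part of the original answer: the name `inflate_stream`, its parameters, and the 64 KB default block size are assumptions made for illustration. It also passes zlib's `max_length` argument so that a highly compressible block cannot expand into one huge string, and it assumes the source file is already positioned at the start of a deflate-compressed member.

```python
import zlib

def inflate_stream(src, dst, compressed_size, block_size=64 * 1024):
    """Decompress compressed_size bytes of raw deflate data from src into dst.

    Hypothetical helper: src must already be positioned at the first byte
    of the compressed data (just past the member's local file header).
    """
    decomp = zlib.decompressobj(-15)  # -15: raw deflate, no zlib header
    remaining = compressed_size
    while remaining > 0:
        block = src.read(min(remaining, block_size))
        if not block:
            raise IOError("archive ended before the member did")
        remaining -= len(block)
        # max_length caps the output of each decompress call; input that
        # was not consumed is kept in unconsumed_tail and fed back in.
        while block:
            dst.write(decomp.decompress(block, block_size))
            block = decomp.unconsumed_tail
    # Emit whatever output is still buffered inside the decompressor.
    dst.write(decomp.flush())
```

Inside the loop over `zf.infolist()`, this would be called after seeking past the local header, for example `inflate_stream(src, out, m.compress_size)`. A smaller block size lowers peak memory at the cost of more read and write calls.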
