Code Answer: How do you unzip very large files in python?

Using python 2.4 and the built-in ZipFile library, I cannot read very large zip files (greater than 1 or 2 GB) because it wants to store the entire contents of the uncompressed file in memory. Is there another way to do this (either with a third-party library or some other hack), or must I "shell out" and unzip it that way (which isn't as cross-platform, obviously).

From stackoverflow

Have a look at http://stackoverflow.com/questions/297345/create-a-zip-file-from-a-generator-in-python which discusses a similar probem.

Marc Novakowski : Thanks but unfortunately they just discuss zipping a file, not unzipping. If you look at the source code in the zipfile.py library, it uses zlib to decompress a file into a string, which is what's using all the memory.

Here's an outline of decompression of large files.

import zipfile
import zlib
import os

src = open( doc, "rb" )
zf = zipfile.ZipFile( src )
for m in  zf.infolist():

    # Examine the header
    print m.filename, m.header_offset, m.compress_size, repr(m.extra), repr(m.comment)
    src.seek( m.header_offset )
    src.read( 30 ) # Good to use struct to unpack this.
    nm= src.read( len(m.filename) )
    if len(m.extra) > 0: ex= src.read( len(m.extra) )
    if len(m.comment) > 0: cm= src.read( len(m.comment) ) 

    # Build a decompression object
    decomp= zlib.decompressobj(-15)

    # This can be done with a loop reading blocks
    out= open( m.filename, "wb" )
    result= decomp.decompress( src.read( m.compress_size ) )
    out.write( result )
    result = decomp.flush()
    out.write( result )
    # end of the loop
    out.close()

zf.close()
src.close()

Marc Novakowski : This is exactly what I was looking for - thanks!

Code Answer

Friday, March 4, 2011

How do you unzip very large files in python?

0 comments:

Post a Comment

Blog Archive