Using Python 2.4 and the built-in zipfile module, I cannot read very large zip files (greater than 1 or 2 GB) because it wants to hold the entire uncompressed contents of a file in memory. Is there another way to do this (either with a third-party library or some other hack), or must I "shell out" and unzip it that way (which obviously isn't as cross-platform)?
From stackoverflow
-
Have a look at http://stackoverflow.com/questions/297345/create-a-zip-file-from-a-generator-in-python which discusses a similar problem.
Marc Novakowski : Thanks, but unfortunately that question only covers zipping a file, not unzipping. If you look at the source code of the zipfile.py library, it uses zlib to decompress a file into a string, which is what uses all the memory.
-
Here's an outline of decompression of large files.
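    import struct
    import zipfile
    import zlib

    CHUNK = 64 * 1024  # compressed bytes read per iteration

    src = open(doc, "rb")  # doc is the path to the zip archive
    zf = zipfile.ZipFile(src)
    for m in zf.infolist():

        # Examine the central-directory entry
        print("%s %d %d %r %r" % (m.filename, m.header_offset,
                                  m.compress_size, m.extra, m.comment))

        # The local file header is 30 fixed bytes followed by the file name
        # and the extra field; their lengths are the last two 16-bit fields.
        # (The comment is stored only in the central directory.)
        src.seek(m.header_offset)
        header = struct.unpack("<4s5H3L2H", src.read(30))
        name_len, extra_len = header[-2], header[-1]
        src.seek(name_len + extra_len, 1)  # skip to the compressed data

        # Build a decompression object; -15 means a raw deflate stream.
        # This assumes the member is deflated (zipfile.ZIP_DEFLATED).
        decomp = zlib.decompressobj(-15)

        # Decompress block by block so only one chunk is held in memory
        out = open(m.filename, "wb")
        remaining = m.compress_size
        while remaining > 0:
            block = src.read(min(CHUNK, remaining))
            if not block:
                break  # truncated archive
            remaining -= len(block)
            out.write(decomp.decompress(block))
        out.write(decomp.flush())
        out.close()

    zf.close()
    src.close()

Marc Novakowski : This is exactly what I was looking for - thanks!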
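For anyone able to move past Python 2.4: `ZipFile.open()` (added in Python 2.6) returns a file-like object that decompresses as it is read, so the copy can be done in fixed-size blocks without parsing headers by hand. Here is a minimal sketch, assuming Python 3. The function `extract_streaming`, its parameters, and the 64 KB block size are illustrative choices, not something taken from the answers above.

```python
import shutil
import zipfile


def extract_streaming(zip_path, member, dest_path, chunk_size=64 * 1024):
    """Copy one archive member to dest_path in chunk_size blocks.

    extract_streaming and its arguments are hypothetical names used
    only for this sketch.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # zf.open() gives a read-only file-like object that decompresses
        # incrementally instead of returning the whole member at once.
        with zf.open(member) as src, open(dest_path, "wb") as out:
            # copyfileobj reads and writes chunk_size bytes at a time
            shutil.copyfileobj(src, out, chunk_size)
```

Calling `extract_streaming("big.zip", "data.bin", "data.bin")` (with placeholder file names) should keep memory use near the block size rather than the size of the uncompressed member.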