Why Do Certain Zip Files Have Unknown File Content
Solution 1:
ZIP stores metadata inside the archive in a few different places: the "local file header", the "central directory" and sometimes a "data descriptor". Only the "local file header" precedes the file's content; the "central directory" is at the very end of the archive. Only the "central directory" holds the full truth, and it is perfectly valid for the "local file header" to not specify any size.
See sections 4.4.8 and 4.4.9 of https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT, which describe the size fields:
If bit 3 of the general purpose bit flag is set, these fields are set to zero in the local header and the correct values are put in the data descriptor and in the central directory.
The "data descriptor" immediately follows the compressed content of the entry - and thus is not available before reading the actual content of the entry when reading from a non-seekable stream.
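To make the header layout concrete, here is a minimal sketch that writes a one-entry archive in memory and then inspects the first "local file header" by hand. It assumes the JDK's java.util.zip.ZipOutputStream, which on the JDKs I am aware of sets bit 3 for deflated entries whose size was not supplied up front. The class and method names (LocalHeaderPeek, usesDataDescriptor and so on) are made up for illustration, and the byte offsets follow the local file header layout in APPNOTE.TXT.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of any library.
final class LocalHeaderPeek {
    // The bytes "PK\3\4" read as a little-endian 32-bit integer.
    static final int LOCAL_HEADER_SIGNATURE = 0x04034b50;
    // Fixed-size part of a local file header, before name and extra field.
    static final int LOCAL_HEADER_LENGTH = 30;

    /** Builds a one-entry archive in memory without telling the writer the size. */
    static byte[] buildArchive() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("hello, zip!\n".getBytes(StandardCharsets.US_ASCII));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Wraps the archive so that multi-byte fields are read little-endian. */
    private static ByteBuffer firstLocalHeader(byte[] archive) {
        ByteBuffer header = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        if (archive.length < LOCAL_HEADER_LENGTH
                || header.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("archive does not start with a local file header");
        }
        return header;
    }

    /** True if bit 3 of the general purpose bit flag (offset 6) is set. */
    static boolean usesDataDescriptor(byte[] archive) {
        int flags = firstLocalHeader(archive).getShort(6) & 0xFFFF;
        return (flags & 0x0008) != 0;
    }

    /** Compressed size as recorded in the local header (offset 18). */
    static long localCompressedSize(byte[] archive) {
        return firstLocalHeader(archive).getInt(18) & 0xFFFFFFFFL;
    }

    /** Uncompressed size as recorded in the local header (offset 22). */
    static long localUncompressedSize(byte[] archive) {
        return firstLocalHeader(archive).getInt(22) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] archive = buildArchive();
        System.out.println("bit 3 set:         " + usesDataDescriptor(archive));
        System.out.println("compressed size:   " + localCompressedSize(archive));
        System.out.println("uncompressed size: " + localUncompressedSize(archive));
    }
}
```

If the writer behaves as assumed, bit 3 is reported as set and both size fields in the local header read as zero, even though the entry is not empty. The real sizes only appear later, in the "data descriptor" and the "central directory".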
When using ZipArchiveInputStream you obtain the ZipEntry as soon as the "local file header" has been read (because the underlying stream may not be seekable), so the size information may be missing. ZipFile uses RandomAccessFile under the covers and can read the "central directory" - as do unzip and friends - so they know more than ZipArchiveInputStream.
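The difference between the two reading styles can be sketched with the JDK's own classes. This is an illustration under assumptions, not the Commons Compress code path: java.util.zip.ZipInputStream stands in for ZipArchiveInputStream and java.util.zip.ZipFile for the Commons Compress ZipFile, and the class and method names (ZipSizeDemo, sizeFromStream, sizeFromCentralDirectory) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of any library.
final class ZipSizeDemo {

    /** Writes a one-entry archive in memory; the size is not known to the writer up front. */
    static byte[] buildArchive(byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write(payload);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Size as seen by a streaming reader, right after the local file header. */
    static long sizeFromStream(byte[] archive) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry = in.getNextEntry();
            if (entry == null) {
                throw new IllegalArgumentException("archive has no entries");
            }
            // The entry's content has not been read yet, so only the
            // local file header has been consulted at this point.
            return entry.getSize();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Size as seen by a random-access reader that consults the central directory. */
    static long sizeFromCentralDirectory(byte[] archive) {
        try {
            Path tmp = Files.createTempFile("zip-size-demo", ".zip");
            try {
                Files.write(tmp, archive);
                try (ZipFile zip = new ZipFile(tmp.toFile())) {
                    return zip.getEntry("hello.txt").getSize();
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "hello, zip!\n".getBytes(StandardCharsets.US_ASCII);
        byte[] archive = buildArchive(payload);
        System.out.println("payload length:         " + payload.length);
        System.out.println("size via stream:        " + sizeFromStream(archive));
        System.out.println("size via central dir.:  " + sizeFromCentralDirectory(archive));
    }
}
```

With an archive written this way, the streaming reader should report the size as unknown (-1 in java.util.zip), while the random-access reader reports the real payload length. If you need the size while streaming, one option is to read the entry to its end first, or to buffer the archive to a file and open it with ZipFile.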