
Why Do Certain Zip Files Have Unknown File Content

Background: I stumbled across this problem here. Analysis: according to the Java docs for ZipEntry, requesting the size of a zip file entry sometimes simply returns -1. However, running

Solution 1:

ZIP stores metadata in a few different places inside the archive: the "local file header", the "central directory" and sometimes a "data descriptor". Only the "local file header" comes before the file's content; the "central directory" sits at the very end of the archive. Only the "central directory" holds the full truth, and it is perfectly valid for the "local file header" not to specify any size.

See sections 4.4.8 and 4.4.9 of https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT, which describe the size fields:

If bit 3 of the general purpose bit flag is set, these fields are set to zero in the local header and the correct values are put in the data descriptor and in the central directory.
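To see this flag in practice, here is a minimal sketch that builds a small archive in memory with java.util.zip.ZipOutputStream and then inspects the raw bytes of the first local file header. The class and method names (LocalHeaderPeek, buildStreamedZip and so on) are made up for illustration. The field offsets (flag at byte 6, uncompressed size at byte 22) follow the local file header layout in APPNOTE.TXT. It assumes the JDK writer sets bit 3 when the entry size is not announced up front, which is how OpenJDK behaves for deflated entries.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of any library.
final class LocalHeaderPeek {
    static final int LOCAL_HEADER_SIGNATURE = 0x04034b50;

    /** Builds a one-entry archive in memory without announcing the entry size up front. */
    static byte[] buildStreamedZip(byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("hello.txt")); // deflated, size not set
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** True if bit 3 of the general purpose bit flag is set in the first local header. */
    static boolean usesDataDescriptor(byte[] zip) {
        return (header(zip).getShort(6) & 0x08) != 0;
    }

    /** Uncompressed size as recorded in the first local header (4 bytes at offset 22). */
    static long localUncompressedSize(byte[] zip) {
        return header(zip).getInt(22) & 0xFFFFFFFFL;
    }

    /** Wraps the archive bytes little-endian and checks the local header signature. */
    private static ByteBuffer header(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("archive does not start with a local file header");
        }
        return buf;
    }

    public static void main(String[] args) {
        byte[] zip = buildStreamedZip("hello zip".getBytes(StandardCharsets.UTF_8));
        System.out.println("bit 3 set: " + usesDataDescriptor(zip));
        System.out.println("size in local header: " + localUncompressedSize(zip));
    }
}
```

For an archive written this way the local header should report the flag as set and a size of zero, even though the entry is not empty.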

The "data descriptor" immediately follows the compressed content of the entry, so when reading from a non-seekable stream it is not available until the actual content of the entry has been read.
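The effect is easy to observe with the JDK's own java.util.zip.ZipInputStream, used here as a stand-in for any stream-based reader. The sketch below is only an illustration with made-up names (DataDescriptorDemo, sizeBeforeAndAfter). It relies on the OpenJDK behaviour of filling in the ZipEntry from the data descriptor once the entry's content has been read to the end.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of any library.
final class DataDescriptorDemo {

    /** Builds a one-entry archive whose sizes end up in a data descriptor. */
    static byte[] buildStreamedZip(byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("hello.txt")); // deflated, size not set
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Returns { size before reading the content, size after reading it }. */
    static long[] sizeBeforeAndAfter(byte[] zip) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry = in.getNextEntry();
            long before = entry.getSize(); // only the local file header has been seen

            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // drain the entry so the reader reaches the data descriptor
            }
            long after = entry.getSize(); // same ZipEntry object, now updated
            return new long[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] zip = buildStreamedZip("hello zip".getBytes(StandardCharsets.UTF_8));
        long[] sizes = sizeBeforeAndAfter(zip);
        System.out.println("before reading: " + sizes[0]);
        System.out.println("after reading:  " + sizes[1]);
    }
}
```

With the nine-byte payload used in main, the first value should be -1 and the second the real length.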

When using ZipArchiveInputStream you obtain the ZipEntry as soon as the "local file header" has been read (because the underlying stream may not be seekable), so the size information may be missing. ZipFile uses RandomAccessFile under the covers and can read the "central directory", as do unzip and friends, so they know more than ZipArchiveInputStream does.
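The difference between the two access styles can be reproduced with the JDK alone. The sketch below uses java.util.zip.ZipInputStream in place of Commons Compress's ZipArchiveInputStream, on the assumption that both hand out the entry after reading only the local file header, and compares it with java.util.zip.ZipFile. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of any library.
final class StreamVersusZipFile {

    /** Writes a one-entry archive to a temp file without announcing the entry size. */
    static Path writeStreamedZip(byte[] content) {
        try {
            Path file = Files.createTempFile("streamed", ".zip");
            try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(file))) {
                zip.putNextEntry(new ZipEntry("hello.txt")); // deflated, size not set
                zip.write(content);
                zip.closeEntry();
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Size as seen by a sequential reader that has only read the local file header. */
    static long sizeViaStream(Path file) {
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(file))) {
            return in.getNextEntry().getSize();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Size as seen by a random-access reader that consults the central directory. */
    static long sizeViaZipFile(Path file) {
        try (ZipFile zip = new ZipFile(file.toFile())) {
            return zip.getEntry("hello.txt").getSize();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = writeStreamedZip("hello zip".getBytes(StandardCharsets.UTF_8));
        System.out.println("ZipInputStream: " + sizeViaStream(file));
        System.out.println("ZipFile:        " + sizeViaZipFile(file));
        Files.deleteIfExists(file);
    }
}
```

So if the size is needed before reading the content and the archive is available as a file, opening it with ZipFile is the simpler route; a stream-based reader can only report the size once the entry has been consumed.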
