How to crack PKZIP archive passwords

One of the fastest zip crackers is AZPR (apart from its broken dictionary search). I get about 5 million passwords per second doing a brute-force crack.
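To put that rate in perspective, here is a rough back-of-the-envelope sketch of the worst-case brute-force time at 5 million passwords per second. The alphabets and password lengths are my own illustrative choices, not something AZPR reports.

```python
RATE = 5_000_000  # passwords per second, the figure quoted above


def keyspace(alphabet_size: int, max_len: int) -> int:
    """Number of passwords of length 1..max_len over the given alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))


def worst_case_seconds(alphabet_size: int, max_len: int, rate: float = RATE) -> float:
    """Time needed to try every password once at the given rate."""
    return keyspace(alphabet_size, max_len) / rate


if __name__ == "__main__":
    # Illustrative alphabets only; pick whatever fits the archive's likely password.
    for label, size in [("lowercase", 26), ("alphanumeric", 62), ("printable ASCII", 95)]:
        for length in (6, 8):
            hours = worst_case_seconds(size, length) / 3600
            print(f"{label:16s} up to {length} chars: {hours:,.1f} hours")
```

The time grows by a factor of the alphabet size with every extra character, which is why brute force stops being practical for long passwords and the plaintext attack becomes interesting.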

But the real trick in cracking zips is to find enough plaintext to do a known-plaintext attack. This attack is described in the paper "A Known Plaintext Attack on the PKZIP Stream Cipher" by Eli Biham and Paul Kocher.
I have an unfinished piece of source code implementing this attack. Below I present some examples of how I found enough plaintext in a few cases.
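For reference, this is a minimal sketch of the traditional PKZIP stream cipher that the attack targets, written from the description in PKWARE's APPNOTE. It is not the attack itself and not my unfinished source; the class and function names are made up for this example. It shows why known plaintext helps: each keystream byte is simply plaintext XOR ciphertext.

```python
def _make_crc_table():
    """Standard CRC-32 table (reflected polynomial 0xEDB88320)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = _make_crc_table()


class ZipCrypto:
    """Sketch of the traditional PKZIP cipher: three 32-bit keys."""

    def __init__(self, password: bytes):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for b in password:
            self._update(b)

    def _update(self, plain_byte: int) -> None:
        # The keys are always updated with the PLAINTEXT byte.
        self.k0 = (self.k0 >> 8) ^ CRC_TABLE[(self.k0 ^ plain_byte) & 0xFF]
        self.k1 = ((self.k1 + (self.k0 & 0xFF)) * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = (self.k2 >> 8) ^ CRC_TABLE[(self.k2 ^ (self.k1 >> 24)) & 0xFF]

    def _stream_byte(self) -> int:
        t = (self.k2 | 2) & 0xFFFF
        return ((t * (t ^ 1)) >> 8) & 0xFF

    def encrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for p in data:
            out.append(p ^ self._stream_byte())
            self._update(p)
        return bytes(out)

    def decrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for c in data:
            p = c ^ self._stream_byte()
            self._update(p)
            out.append(p)
        return bytes(out)


def recover_keystream(plain: bytes, cipher: bytes) -> bytes:
    """Known plaintext XOR ciphertext gives the keystream bytes directly."""
    return bytes(p ^ c for p, c in zip(plain, cipher))
```

In a real archive each encrypted file starts with a 12-byte encryption header, and the data that follows is the compressed stream, so the "plaintext" here means compressed bytes. The attack works backwards from recovered keystream bytes to the three internal keys.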

The more plaintext you find, the faster the attack will be: only several minutes if several kilobytes of data are available, about 1 hour if 100 bytes are known. The minimum needed is 13 bytes, but the attack will take several days to complete in that case.

First I have a detailed look at the contents of the archive with 'unzip -Z -v archive.zip', which tells me the exact zip version and compression method. oldversion.com is a good place to look for old versions of WinZip/WinRAR/PKZIP.

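One of the prerequisites is that you first obtain a zip archiver which uses exactly the same compression method as the one used to create the archive you are attacking. The encryption is applied to the compressed data, so the plaintext you need is the compressed form of the file, byte for byte. Also pay attention to the compression sub-type.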
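If you prefer a script over reading the 'unzip -Z -v' output, the same header fields can be pulled out with Python's standard zipfile module. This is only a sketch; the function name `describe` and the dictionary keys are my own. For deflated entries, bits 1 and 2 of the general purpose flags hold the compression sub-type (0 normal, 1 maximum, 2 fast, 3 super fast).

```python
import zipfile

# Only the common method numbers; anything else is shown as its number.
METHODS = {0: "stored", 8: "deflated", 12: "bzip2", 14: "lzma"}


def describe(path):
    """Return one dict per archive entry with the fields that matter here."""
    rows = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            rows.append({
                "name": info.filename,
                "encrypted": bool(info.flag_bits & 0x1),      # flag bit 0
                "method": METHODS.get(info.compress_type, str(info.compress_type)),
                "deflate_option": (info.flag_bits >> 1) & 0x3,  # flag bits 1-2
                "made_by_version": info.create_version,
                "made_by_system": info.create_system,
                "size": info.file_size,
                "packed": info.compress_size,
                "crc": f"{info.CRC:08x}",
            })
    return rows


if __name__ == "__main__":
    import sys
    for row in describe(sys.argv[1]):
        print(row)
```

Reading the central directory this way needs no password, since only the file data is encrypted.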

The most useful information in the zip is the CRC of the unencrypted data, which is stored with every encrypted file. I use this to identify plaintext files. The chance of a different file having the same size and CRC is small (though not zero; see this source for how to 'fake' CRCs).

First I would look for similar files, either from the same source or by just typing the filename and file size into Google. (A nice feature recently added by Google is that it finds numbers written with thousands separators as well as the plain number you typed.) Then I zip the files found, to see whether the CRC matches the CRC of the encrypted file.
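The size and CRC comparison can also be automated. The sketch below walks a directory of candidate files and reports those whose size and CRC-32 match an entry in the archive; `find_plaintexts` and `file_crc32` are names invented for this example. It only identifies the plaintext; you still have to compress a matching file with the right archiver afterwards.

```python
import os
import zipfile
import zlib


def file_crc32(path, chunk=1 << 20):
    """CRC-32 of a file, read in chunks so large files are fine."""
    crc = 0
    with open(path, "rb") as fh:
        while True:
            block = fh.read(chunk)
            if not block:
                break
            crc = zlib.crc32(block, crc)
    return crc & 0xFFFFFFFF


def find_plaintexts(archive, candidate_dir):
    """Return (archive entry name, candidate path) pairs with equal size and CRC."""
    with zipfile.ZipFile(archive) as zf:
        wanted = {(i.file_size, i.CRC): i.filename
                  for i in zf.infolist() if not i.is_dir()}
    sizes = {size for size, _crc in wanted}
    matches = []
    for root, _dirs, files in os.walk(candidate_dir):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size not in sizes:
                continue  # cheap size check first, CRC only when needed
            key = (size, file_crc32(path))
            if key in wanted:
                matches.append((wanted[key], path))
    return matches
```

Filenames are deliberately ignored, because the same file often turns up under a different name elsewhere.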

On one occasion I found the plaintext by assuming that one of the files contained a list of files. This file was rather small, and the size came out just right if it contained 11 lines of filenames, all but the last one terminated with CRLF. Only the order I did not know. I wrote a small search program that iterated through all permutations, with several upper/lower case alterations, and it found the right order in 5 minutes.
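A search like that can be sketched in a few lines. This is not the program I used, just an illustration of the idea: `find_order` is a made-up name, and the case alterations are simplified to "as given", "all lower" and "all upper". The size does not depend on the order, so it is checked once up front; only the CRC needs testing per permutation.

```python
import itertools
import zlib


def find_order(names, target_size, target_crc, eol=b"\r\n"):
    """Search for an ordering of `names` (bytes) whose joined text has the target CRC.

    Lines are joined with `eol`; the last line has no line ending.
    Returns the matching file contents, or None if nothing matches.
    """
    expected = sum(len(n) for n in names) + len(eol) * (len(names) - 1)
    if expected != target_size:
        return None  # this guess at the layout cannot be right
    # Simplified case alterations, applied to the whole list at once.
    for transform in (bytes, bytes.lower, bytes.upper):
        cased = [transform(n) for n in names]
        for order in itertools.permutations(cased):
            data = eol.join(order)
            if zlib.crc32(data) == target_crc:
                return data
    return None
```

With 11 names there are about 40 million orderings per case variant, so pure Python will be considerably slower than a compiled program; the structure of the search is the same.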

Another technique is to truncate a compressed file to the part that you can assume to be identical, and do the same to your plaintext.
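One way to estimate how much can be assumed identical is to compress two slightly different versions of the file and see how far their compressed streams agree. The sketch below uses zlib's raw deflate as a stand-in compressor, which is an assumption: its output will not necessarily match the archiver that made the archive, so in practice you would compress with that archiver and compare its output instead. The function names are made up for this example.

```python
import zlib


def raw_deflate(data: bytes, level: int = 6) -> bytes:
    """Raw deflate stream (no zlib header), the form stored inside a zip."""
    co = zlib.compressobj(level, zlib.DEFLATED, -15)
    return co.compress(data) + co.flush()


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes on which the two byte strings agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def shared_compressed_prefix(version_a: bytes, version_b: bytes, level: int = 6) -> bytes:
    """Compressed bytes that are identical for both versions of a file."""
    ca = raw_deflate(version_a, level)
    cb = raw_deflate(version_b, level)
    return ca[:common_prefix_len(ca, cb)]
```

Note that files with an identical beginning do not always share a long compressed prefix: with dynamic Huffman coding the tables at the start of a block depend on everything in that block, so the shared part can be much shorter than the shared plaintext.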