You have probably noticed that download links for files and torrents on many sites are accompanied by a hash value. Ever wondered how this file hash is useful? If so, read on. Today, we are going to explain why a checksum is important and how to use it.
Hash value and Checksum
In an earlier article, we talked about how a user is authenticated by generating corresponding hash values. Similarly, using a suitable hashing algorithm, fixed-length hash values can be generated for individual files. As with password hashing, the process is deterministic: a particular file will always generate the same hash value (the file hash) under the same algorithm.
A checksum is a hash value used for performing data integrity checks on files. It is used for error checking during file downloads, because even a download that completes successfully can contain errors. Checksums are used to verify data from mirror downloads, torrents and other download types. Generally, a strong hashing algorithm should be used for generating checksums. The process basically occurs in two steps.
Generation of Checksum: When a file is uploaded to a server (or before it begins to seed, in the case of torrents), a hashing algorithm (for example, MD5 or SHA) is used to generate the corresponding hash value for the file. This value is stored for later verification.
Integrity Check: After the file is downloaded from the server/torrent, a hash is generated again for the downloaded file using the same hashing algorithm. If this hash matches the checksum that was stored earlier, the downloaded data is identical to the data on the server/peer. Sometimes, the hash values don’t match. In that case, even though the download completed, some data was lost or altered during the download/transmission. This does not necessarily mean that the download is unusable, but the damaged data may cause errors. For example, a video downloaded via torrent that suffered data loss might skip or fail to play some portions, because the missing data causes errors in reading the video file.
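The two steps above can be sketched in Python with the standard hashlib module. This is only a minimal illustration, assuming SHA-256 as the algorithm; the function names file_hash and verify_download and the file name in the usage comment are made up for this example.

```python
import hashlib

def file_hash(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`.

    The file is read in chunks so large downloads need not fit in memory.
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def verify_download(path, published_checksum, algorithm="sha256"):
    """Return True if the file's hash matches the published checksum."""
    computed = file_hash(path, algorithm)
    # Published checksums may differ in letter case or carry stray whitespace.
    return computed == published_checksum.strip().lower()

# Hypothetical usage (file name and checksum are placeholders):
# ok = verify_download("download.iso", "<checksum copied from the download page>")
```

The same file_hash function covers both steps: the host runs it once before publishing the file, and the downloader runs it again afterwards and compares the two values.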
Note: A strong algorithm should be used when generating checksums. If, for example, a simple algorithm divides a particular value by 11 (a prime number) and takes the remainder as the final checksum, the only possible values are the integers from 0 to 10. There would then be roughly a 9% chance (1 in 11) that a corrupted file (as a result of data loss) generates the same value as the original checksum. Thus, it is recommended that the generated checksum be large enough.
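To make the weakness concrete, here is a toy version of such a checksum. The note does not say which value gets divided by 11, so this sketch assumes it is the sum of all byte values; the function name mod11_checksum is likewise just for illustration.

```python
import hashlib

def mod11_checksum(data: bytes) -> int:
    """Toy checksum: sum of all byte values, remainder after dividing by 11."""
    return sum(data) % 11

original = b"hello"
corrupted = b"sello"  # first byte changed from 'h' (104) to 's' (115)

# Both byte sums (532 and 543) leave remainder 4, so the corruption goes unnoticed.
print(mod11_checksum(original), mod11_checksum(corrupted))  # 4 4

# A large hash such as SHA-256 tells the two apart.
print(hashlib.sha256(original).hexdigest() == hashlib.sha256(corrupted).hexdigest())  # False
```

With only 11 possible outputs, collisions like this are easy to hit by accident, whereas a 256-bit hash has so many possible values that an accidental match is practically impossible.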
Data integrity checks are performed to validate data in many areas. Checksums can reveal whether particular data has been manipulated. They are used to track the source of a particular file, or to verify data on a compact disc after burning it. They can also be used to find duplicate copies of files. And the list of applications goes on.
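As an example of the duplicate-finding use case, the following sketch groups the files in a folder by their SHA-256 hash. It is one possible approach rather than a standard tool, and the function name find_duplicates is an assumption of this example. For simplicity it reads each file fully into memory, which is fine for small files only.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(directory):
    """Return lists of paths whose contents hash to the same SHA-256 value."""
    groups = defaultdict(list)
    for path in Path(directory).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Only hashes shared by two or more files indicate duplicates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Files with identical contents end up in the same group regardless of their names or locations.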
There are free programs that calculate hash checksums and let you verify file downloads. HashGenerator is a file hash generator that computes hashes with several algorithms simultaneously (including SHA1, SHA256 and MD5). Download it from here and extract the program.
When you run the application, you will be prompted to enter the location of the file for which you wish to generate a hash. The application generates a list of hash values, one for each hashing algorithm. The hash algorithm is usually mentioned on the download page, but if it isn’t, and you don’t know which algorithm the host uses, you can try searching for some of the generated values in Google. If the search returns the same file you downloaded, that hash is the correct one.
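If you would rather not install a separate tool, a similar multi-algorithm listing can be approximated in Python. This is a rough sketch and not how HashGenerator itself is implemented; the function names all_hashes and matching_algorithms are invented here. When the download page shows a hash but not the algorithm, comparing it against each computed value is a quick way to find which one was used.

```python
import hashlib

def all_hashes(path, algorithms=("md5", "sha1", "sha256")):
    """Compute several hashes of one file in a single pass over its contents."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}

def matching_algorithms(path, published_hash, algorithms=("md5", "sha1", "sha256")):
    """Return the algorithms whose digest of the file equals the published hash."""
    wanted = published_hash.strip().lower()
    return [name for name, digest in all_hashes(path, algorithms).items()
            if digest == wanted]
```

An empty result from matching_algorithms means either the file is damaged or the host used an algorithm that is not in the list.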