An Empirical Study of the Usage of Checksums for Web Downloads

Details

Resource 1 Download: Bernard2023WWW.pdf (2355.70 KB)
State: Public
Version: Final published version
License: CC BY 4.0
Serval ID
serval:BIB_DB6F0918594E
Type
Inproceedings: an article in a conference proceedings.
Collection
Publications
Institution
Title
An Empirical Study of the Usage of Checksums for Web Downloads
Title of the conference
Proceedings of the WebConference (WWW)
Author(s)
Bernard Gaël, Coudert Rémi, Chapuis Bertil, Huguenin Kévin
Publication state
Published
Issued date
04/2023
Peer-reviewed
Yes
Pages
2155-2165
Language
English
Abstract
Checksums, typically provided on webpages and generated from cryptographic hash functions (e.g., MD5, SHA256) or signature schemes (e.g., PGP), are commonly used on websites to enable users to verify that the files they download have not been tampered with when stored on possibly untrusted servers. In this paper, we shed light on the current practices regarding the usage of checksums for web downloads (hash functions used, visibility and validity of checksums, type of websites and files, presence of instructions, etc.), as this has been mostly overlooked so far. Using a snowball-sampling strategy for the 200,000 most popular domains of the Web, we first crawled a dataset of 8.5M webpages, from which we built, through an active-learning approach, a unique dataset of 277 diverse webpages that contain checksums. Our analysis of these webpages reveals interesting findings about the usage of checksums. For instance, it shows that checksums are used mostly to verify program files, that weak hash functions are frequently used and that a non-negligible proportion of the checksums provided on webpages do not match that of their associated files.
We make freely available our dataset and the code for collecting and analyzing it.
Finally, we complement our analysis with a survey of the webmasters of the considered webpages (26 complete responses), shedding light on the reasons behind the checksum-related choices they make.
Research datasets
Open Access
Yes
APC
700 USD
Funding(s)
Hasler Foundation / 19024
Create date
25/01/2023 23:37
Last modification date
10/10/2023 7:00