PhD defense Victor Le Pochat

I am happy to invite you to the public defense of my PhD thesis “Sound data sets and methods for web security research”.

Schedule

My defense will take place on Wednesday 31 May 2023.

The defense will be preceded by three seminars on “Strengthening Data-Driven Cybersecurity”, starting at 15:00.

The entire day’s program is:

Registration

Please register your attendance for any part of the day’s proceedings using this form.

Online livestream

To follow my defense online, join this Microsoft Teams meeting.

Getting there

Location

Route description

Thesis abstract

Sound research practices are the foundation of valid, reliable, and trustworthy research results. Web security research relies on measurements that yield empirical, real-world data for analyzing and improving security on the web, which makes it essential to use sound data sets and methods that allow these measurements to be accurate, comprehensive, representative, and transparent. In this dissertation, we discuss three case studies in web security that have an affinity with the discipline of meta-research, which critically evaluates research practices and proposes new methods to improve and refine the way in which research is conducted. The three case studies critically analyze current web security data sets or systems that are commonly held to be reliable; we also propose improved methods and discuss data set considerations.

In the first part of the dissertation, we present our work analyzing and improving rankings of the most popular websites or domain names on the Internet. These rankings form an important data source for many web security, privacy, and Internet measurement studies. We show how previously commonly used rankings have potentially undesirable properties that endanger the soundness, validity, and reproducibility of research. We also propose the novel Tranco ranking that improves upon these properties. Tranco combines existing rankings transparently, aggregates them across a 30-day period by default to improve long-term stability, and makes the resulting ranking available in a reproducible manner. We confirm with a long-term evaluation that Tranco better matches the properties desirable for research usage.
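To make the idea of combining rankings over a time window more concrete, here is a minimal sketch in Python. It is an illustration only, not the actual Tranco implementation: the abstract does not state the scoring rule, so the sketch assumes a Dowdall-style rule in which a domain earns 1/rank for each list it appears in. The function name `aggregate_rankings` and the example domains are hypothetical.

```python
from collections import defaultdict


def aggregate_rankings(ranked_lists):
    """Combine several ranked lists into one ranking.

    Assumption for illustration: a Dowdall-style rule, where a domain
    earns 1/rank for every list it appears in. Each list could be one
    provider's ranking on one day of the aggregation window.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for position, domain in enumerate(ranked, start=1):
            scores[domain] += 1.0 / position
    # Highest score first; ties are broken alphabetically so that the
    # result is deterministic and therefore reproducible.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Three hypothetical input lists (e.g. different providers or days).
lists = [
    ["a.example", "b.example", "c.example"],
    ["b.example", "a.example", "d.example"],
    ["a.example", "c.example", "b.example"],
]
combined = aggregate_rankings(lists)
print(combined)
```

Because a domain's score accumulates over every list in the window, a domain that appears near the top of only one list ends up below domains that rank consistently well, which is the intuition behind aggregating for long-term stability.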

In the second part of the dissertation, we present two case studies on large-scale automated decision-making systems. These are seen as essential tools for processing security-related decisions at scale, and are commonly deployed to handle critical security tasks. However, even error rates that are low on a relative scale can translate into a high error count in absolute terms, which can cause significant harm. In our first case study, we develop a hybrid approach to resolve collisions between benign and malicious domains generated and used by the Avalanche botnet. Erroneous law enforcement decisions would result either in unjustified takedowns of benign websites or in the risk of the botnet reemerging, so we involve a human investigator for those domains where the automated model is least certain. This approach reduces the errors that result from blind trust in the automated decision-making system. In our second case study, we audit Facebook’s enforcement of its self-developed policy on political ads. We find that even simple rules for detecting violating ads are not implemented, while many benign ads are falsely taken down, suggesting Facebook’s enforcement is imprecise. Our audit reveals the limitations of large-scale automated decision-making systems and questions their appropriateness for systems with important societal impact.
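The hybrid approach from the first case study can be sketched as a simple triage step. This is a minimal illustration under stated assumptions, not the method used in the thesis: it assumes the model outputs a maliciousness score between 0 and 1, treats scores closest to 0.5 as the least certain, and uses a fixed manual review budget. The names `triage` and `review_budget`, the 0.5 threshold, and the example scores are all hypothetical.

```python
def triage(scored_domains, review_budget):
    """Split domains into a manual review queue and automated decisions.

    Assumptions for illustration: each item is (domain, score) with a
    score in [0, 1], and a score near 0.5 means the model is uncertain.
    """
    # Least certain first: smallest distance from the 0.5 threshold.
    by_uncertainty = sorted(scored_domains, key=lambda item: abs(item[1] - 0.5))
    # The human investigator handles the least certain domains.
    manual = [domain for domain, _ in by_uncertainty[:review_budget]]
    # The remaining domains are decided automatically (True = malicious).
    automated = {
        domain: score >= 0.5 for domain, score in by_uncertainty[review_budget:]
    }
    return manual, automated


# Hypothetical model outputs.
scores = [
    ("x.example", 0.97),
    ("y.example", 0.52),
    ("z.example", 0.03),
    ("w.example", 0.41),
]
manual, automated = triage(scores, review_budget=2)
print(manual)
print(automated)
```

The review budget expresses the trade-off in the text: a larger budget sends more borderline cases to a human and lowers the number of automated errors, at the cost of more manual effort.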

We conclude the dissertation with closing remarks on enablers and challenges for web security research, focusing on the importance of data sets for analyzing security issues and ecosystems and for developing improved security solutions. We also provide an outlook on future research topics that explore remaining gaps in the current state of the art, both in domain rankings and large-scale web measurements in general and in automated decision-making systems. Further work in this domain enables more complete and thorough insights into malicious online practices, allowing us to develop better solutions that make the web a more secure place for all.

Thesis text

Official text (PDF) - Preface