PhD defense Victor Le Pochat
I am happy to invite you to the public defense of my PhD thesis “Sound data sets and methods for web security research”.
Schedule
My defense will take place on Wednesday 31 May 2023.
- At 17:00 sharp, the formal defense (presentation and Q&A) starts, taking around 1.5 hours, after which there is the official proclamation and an informal reception.
You’re welcome to join only for the proclamation and/or reception.
The defense will be preceded by three seminars on “Strengthening Data-Driven Cybersecurity”, starting at 15:00.
The entire day’s program is:
- 14:40: Welcome & Registration
- 15:00: Keynote: “Scans, Sticks and Carrots: Collecting Data to Improve Cybersecurity” (Prof. Michel van Eeten, TU Delft)
- 15:40: “From Data to Detection: Challenges in Data Collection for ML-based Network Intrusion Detection Systems” (Gints Engelen, KU Leuven)
- 16:00: “Machine learning in cybersecurity: a case study in virus and malware detection” (Davy Preuveneers, KU Leuven)
- 16:20: Break & Refreshments
- 17:00: Public PhD Defense: Sound Data Sets and Methods for Web Security Research
- 17:45: Q&A and discussion, and deliberation by the jury
- 18:30: Networking reception
Registration
Please register your attendance for any part of the day’s proceedings using this form.
Online livestream
To follow my defense online, join this Microsoft Teams meeting.
Getting there
Location
- Auditorium Erik Duval (00.225), Department of Computer Science
- Celestijnenlaan 200A, 3001 Heverlee (OpenStreetMap, Google Maps)
Route description
- By public transport:
- take the train to Leuven station; connect onto bus lines 2 (direction Heverlee Campus, take it all the way to the final stop) or 616 (direction Zaventem Luchthaven, get off at the Celestijnenlaan stop), or rent a Blue-Bike at the station.
- alternatively, take the train to Heverlee station; connect once again onto buses 2 or 616, or enjoy a 20-minute walk passing by the Arenberg castle.
- By bike/on foot: use your favorite route planner to reach the address mentioned above.
- By car: Take exit 15 (Leuven) on the E314 highway. Continue onto the N264 and turn right at the 3rd traffic light. Then, turn right at the next traffic light to park next to the building, or continue towards the De Molen parking, which you can enter with the code 1889# (don’t forget the # and to use the code to exit as well).
- An alternative route description is available on the site of the department.
Thesis abstract
Sound research practices are the foundation of valid, reliable, and trustworthy research results. In the context of web security research, the importance of measurements that yield the empirical real-world data for analyzing and improving security on the web emphasizes the need to use sound data sets and methods that allow these measurements to be accurate, comprehensive, representative, and transparent. In this dissertation, we discuss three case studies in web security that have affinity with the discipline of meta-research, which critically evaluates these research practices and proposes new methods to improve and refine the way in which research is conducted. The three presented case studies contribute to critically analyzing current web security data sets or systems that are commonly held to be reliable, while we also propose improved methods as well as discuss data set considerations.
In the first part of the dissertation, we present our work analyzing and improving rankings of the most popular websites or domain names on the Internet. These rankings form an important data source for many web security, privacy, and Internet measurement studies. We show how previously commonly used rankings hold potentially undesirable properties that endanger the soundness, validity, and reproducibility of research. We also propose the novel Tranco ranking that improves upon these properties. Tranco combines existing rankings transparently, aggregates across a 30-day period by default to improve long-term stability, and makes the resulting ranking available in a reproducible manner. We confirm with a long-term evaluation that Tranco better matches the properties desirable for research usage.
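To make the idea of combining rankings concrete, here is a minimal Python sketch of one possible aggregation rule. It is an illustration only: the scoring rule (each domain earns 1/rank points per list, a Dowdall-style rule), the function name `aggregate_rankings`, and the example domains are assumptions made here and are not taken from the abstract. In practice there would be one input list per provider per day across the aggregation window.

```python
from collections import defaultdict


def aggregate_rankings(rankings):
    """Combine several ranked lists (best entry first) into one ranking.

    Assumed scoring rule, for illustration: a domain earns 1/rank points
    in every list it appears in, and 0 points in lists that omit it.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for position, domain in enumerate(ranking, start=1):
            scores[domain] += 1.0 / position
    # Highest score first; ties are broken alphabetically so that the
    # output is deterministic for the same input.
    return sorted(scores, key=lambda domain: (-scores[domain], domain))


# Hypothetical input: three short lists standing in for daily rankings.
lists = [
    ["a.example", "b.example", "c.example"],
    ["b.example", "a.example", "d.example"],
    ["a.example", "c.example", "b.example"],
]
combined = aggregate_rankings(lists)
print(combined)
```

Summing scores over many days dampens day-to-day fluctuations, which is the intuition behind aggregating over a longer period for stability.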
In the second part of the dissertation, we present two case studies on large-scale automated decision-making systems. These are seen as essential tools for processing security-related decisions at scale, and are commonly deployed to handle critical security tasks. However, even error rates that are low on a relative scale can translate into a high error count in absolute terms, which can cause significant harm. In our first case study, we develop a hybrid approach to resolve collisions between benign and malicious domains generated and used by the Avalanche botnet. As erroneous law enforcement decisions would result either in unjustified website takedowns or in the risk of the botnet reemerging, we involve a human investigator for those domains where an automated model is least certain. This approach reduces the errors that result from blind trust in the automated decision-making system. In our second case study, we audit Facebook’s enforcement of its self-developed policy on political ads. We find that even simple rules for detecting violating ads are not implemented, while many benign ads are falsely taken down, suggesting Facebook’s enforcement is imprecise. Our audit reveals the limitations of large-scale automated decision-making systems and questions their appropriateness for systems with important societal impact.
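The hybrid approach from the first case study, where a human investigator handles the domains on which the automated model is least certain, can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the function name `triage`, the 0.5 decision threshold, the fixed review budget, and the example probabilities are all hypothetical and do not describe the actual model or data used in the thesis.

```python
def triage(predictions, review_budget):
    """Split (domain, p_malicious) pairs into automated and manual sets.

    Assumption for illustration: uncertainty is the distance of the
    predicted probability from 0.5. The `review_budget` least certain
    domains go to a human investigator; the remaining domains are
    labeled automatically (True means "malicious").
    """
    by_uncertainty = sorted(predictions, key=lambda item: abs(item[1] - 0.5))
    to_review = [domain for domain, _ in by_uncertainty[:review_budget]]
    automated = {domain: p >= 0.5 for domain, p in by_uncertainty[review_budget:]}
    return automated, to_review


# Hypothetical model outputs: probability that a domain is malicious.
predictions = [
    ("x.example", 0.98),
    ("y.example", 0.55),
    ("z.example", 0.03),
    ("w.example", 0.40),
]
automated, to_review = triage(predictions, review_budget=2)
print(automated)
print(to_review)
```

The review budget makes the trade-off explicit: a larger budget costs more investigator time but leaves fewer borderline decisions to the automated model.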
We conclude the dissertation with closing remarks on enablers and challenges for web security research, focusing on the importance of data sets to analyzing security issues and ecosystems, and developing improved security solutions. We also provide an outlook on future research topics that explore remaining gaps in the current state of the art, in domain rankings and large-scale web measurements in general, as well as automated decision-making systems. Further work in this domain helps to enable more complete and thorough insights into malicious online practices, allowing us to develop better solutions that make the web a more secure place for all.