I am currently working as a Data Engineer at Manty, a French GovTech startup helping public administrations better understand their data.
Before that, I received a Master’s degree in Computer Science from IMT Atlantique, as well as a PhD in Computer Science from EURECOM and Sorbonne Université, while working in the Security Research Team at SAP Labs. My thesis tackled the problem of intellectual property theft for machine learning models, exploring the concept of watermarking.
I am mainly interested in machine learning, data engineering and software engineering.
Download my résumé.
PhD in Machine Learning & Security, 2023
EURECOM & Sorbonne Université
MEng in Computer Science, 2019
IMT Atlantique
Manty is a European GovTech startup. Our goal is to help public administrations be more efficient and transparent, by giving them modern tools based on data visualization, algorithms and data collection.
Industrial PhD program (CIFRE) in collaboration with EURECOM, focusing on machine learning applications to privacy, IP protection, and open-source security.
Academic semester at TPU to complete the Diplôme d’Ingénieur, with various research projects in data science for health data.
Master’s thesis entitled Anomaly Detection in SAP Systems, completed as an end-of-studies internship for a Master’s degree.
Public code platforms like GitHub are exposed to several different attacks, in particular the detection and exploitation of sensitive information (such as passwords or API keys). While both developers and companies are aware of this issue, there is no efficient open-source tool performing leak detection with a high precision rate. Indeed, a common problem in leak detection is the amount of false positives (i.e., non-critical data wrongly detected as a leak), leading to a significant workload for the developers who manually review them. This paper presents an approach to detect data leaks in open-source projects with a low false positive rate. In addition to the regular expression scanners commonly used by current approaches, we propose several machine learning models targeting these false positives, and show that current approaches generate a false positive rate close to 80%. Furthermore, we demonstrate that our tool, while producing a negligible false negative rate, decreases the false positive rate to at most 6% of the output data.
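The two-stage idea in the abstract, a regular expression scanner followed by a second stage that prunes false positives, can be illustrated with a minimal sketch. The `KEY_PATTERN` regex, the entropy heuristic standing in for the trained machine learning models, and the threshold value are all illustrative assumptions, not the paper's actual pipeline:

```python
import math
import re

# Illustrative pattern for API-key-like assignments; real scanners ship
# many such rules for specific providers and formats.
KEY_PATTERN = re.compile(
    r'(?:api[_-]?key|token|secret)\s*[:=]\s*["\']?([A-Za-z0-9/+_-]{16,})'
)

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character of s."""
    counts = {c: s.count(c) for c in set(s)}
    return -sum(n / len(s) * math.log2(n / len(s)) for n in counts.values())

def scan(text: str, threshold: float = 3.5) -> list[str]:
    """Regex stage finds candidates; the entropy filter (a stand-in for
    the paper's ML models) discards low-entropy false positives such as
    placeholder values."""
    leaks = []
    for match in KEY_PATTERN.finditer(text):
        candidate = match.group(1)
        if shannon_entropy(candidate) >= threshold:
            leaks.append(candidate)
    return leaks

# A random-looking value passes; a repeated placeholder is filtered out.
scan('api_key = "aB9xK2mQ7pL4wR8ty01z"')   # kept
scan('api_key = "aaaaaaaaaaaaaaaaaaaa"')   # discarded as a false positive
```

The point of the second stage is exactly the one made in the abstract: the regex alone flags both strings, and only the filter keeps the review workload manageable.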
With the development of machine learning models for task automation, watermarking appears to be a suitable solution to protect one’s intellectual property. Indeed, by embedding secret markers, called trigger instances, into the model, the model owner is able to analyze the behavior of any model on these markers and hence claim ownership if the model reproduces the expected behavior. However, in the context of a Machine Learning as a Service (MLaaS) platform where models are available for inference, an attacker could forge such proofs in order to steal the ownership of these watermarked models and profit from them. This type of attack, called a watermark forging attack, is a serious threat against the intellectual property of model owners. Current work provides limited solutions to this problem: it constrains model owners to disclose either their models or their trigger set to a third party. In this paper, we propose countermeasures against watermark forging attacks that operate in a black-box environment and are compatible with privacy-preserving machine learning, where both the model weights and the inputs can be kept private. We show that our solution successfully prevents two different types of watermark forging attacks under minimal assumptions regarding access to either the model’s weights or the content of the trigger set.
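The ownership claim described above can be sketched as a simple black-box verification: query the suspect model on the secret trigger set and compare its answers with the expected labels. The function name, the agreement threshold, and the toy models below are hypothetical illustrations of the basic watermarking setting, not the forging countermeasures proposed in the paper:

```python
def verify_watermark(model, trigger_inputs, expected_labels, threshold=0.9):
    """Black-box ownership check: the owner queries the model only through
    its predictions. A watermarked model reproduces the unusual labels
    the owner embedded; an independent model agrees only by chance.
    The 0.9 agreement threshold is an illustrative assumption."""
    matches = sum(
        1 for x, y in zip(trigger_inputs, expected_labels) if model(x) == y
    )
    return matches / len(trigger_inputs) >= threshold

# Toy example: the trigger set maps four inputs to the unusual label 1.
triggers = [0, 1, 2, 3]
expected = [1, 1, 1, 1]
stolen_model = lambda x: 1    # reproduces the embedded trigger behavior
honest_model = lambda x: 0    # an unrelated model
```

The forging threat in the abstract is precisely that such a proof can be fabricated by an attacker who crafts their own trigger set after the fact, which is why the paper's countermeasures avoid disclosing either the weights or the trigger set.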