I am currently working as a Data Engineer at Manty, a French GovTech startup helping public administrations better understand their data.
Before that, I received a Master’s degree in Computer Science from IMT Atlantique, as well as a PhD in Computer Science from EURECOM and Sorbonne Université, while working in the Security Research Team at SAP Labs. My thesis tackled the problem of intellectual property theft for machine learning models, exploring the concept of watermarking.
I am mainly interested in machine learning, data engineering and software engineering.
Download my résumé.
PhD in Machine Learning & Security, 2023
EURECOM & Sorbonne Université
MEng in Computer Science, 2019
IMT Atlantique
Manty is a European GovTech startup. Our goal is to help public administrations be more efficient and transparent, by giving them modern tools based on data visualization, algorithms and data collection.
Industrial PhD program (CIFRE) in collaboration with EURECOM, focusing on machine learning applications to privacy, IP protection, and open-source security.
Academic semester at TPU to complete the Diplôme d’Ingénieur, with various research projects in data science for health data.
Master’s thesis entitled Anomaly Detection in SAP Systems, completed as an end-of-studies internship for a Master’s degree.
Public code platforms like GitHub are exposed to several different attacks, in particular the detection and exploitation of sensitive information (such as passwords or API keys). While both developers and companies are aware of this issue, there is no efficient open-source tool performing leak detection with a high precision rate. Indeed, a common problem in leak detection is the amount of false positives (i.e., non-critical data wrongly detected as a leak), leading to a significant workload for the developers who manually review them. This paper presents an approach to detect data leaks in open-source projects with a low false positive rate. In addition to the regular expression scanners commonly used by current approaches, we propose several machine learning models targeting these false positives, and show that current approaches generate a false positive rate close to 80%. Furthermore, we demonstrate that our tool, while producing a negligible false negative rate, decreases the false positive rate to at most 6% of the output data.
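The two-stage idea in the abstract, a regular expression scanner followed by a second stage that prunes false positives, can be illustrated with a minimal sketch. The `KEY_PATTERN` regex, the entropy heuristic standing in for the trained machine learning models, and the threshold value are all illustrative assumptions, not the paper's actual pipeline:

```python
import math
import re

# Illustrative pattern for API-key-like assignments; real scanners ship
# many such rules for specific providers and formats.
KEY_PATTERN = re.compile(
    r'(?:api[_-]?key|token|secret)\s*[:=]\s*["\']?([A-Za-z0-9/+_-]{16,})'
)

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character of s."""
    counts = {c: s.count(c) for c in set(s)}
    return -sum(n / len(s) * math.log2(n / len(s)) for n in counts.values())

def scan(text: str, threshold: float = 3.5) -> list[str]:
    """Regex stage finds candidates; the entropy filter (a stand-in for
    the paper's ML models) discards low-entropy false positives such as
    placeholder values."""
    leaks = []
    for match in KEY_PATTERN.finditer(text):
        candidate = match.group(1)
        if shannon_entropy(candidate) >= threshold:
            leaks.append(candidate)
    return leaks

# A random-looking value passes; a repeated placeholder is filtered out.
scan('api_key = "aB9xK2mQ7pL4wR8ty01z"')   # kept
scan('api_key = "aaaaaaaaaaaaaaaaaaaa"')   # discarded as a false positive
```

The point of the second stage is exactly the one made in the abstract: the regex alone flags both strings, and only the filter keeps the review workload manageable.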
With the development of machine learning models for task automation, watermarking appears to be a suitable solution to protect one’s intellectual property. Indeed, by embedding secret markers, called trigger instances, into the model, the model owner is able to analyze the behavior of any model on these markers and hence claim ownership if the model reproduces the expected behavior. However, in the context of a Machine Learning as a Service (MLaaS) platform where models are available for inference, an attacker could forge such proofs in order to steal the ownership of these watermarked models and profit from them. This type of attack, called a watermark forging attack, is a serious threat against the intellectual property of model owners. Current work provides limited solutions to this problem: it constrains model owners to disclose either their models or their trigger set to a third party. In this paper, we propose countermeasures against watermark forging attacks that operate in a black-box environment and are compatible with privacy-preserving machine learning, where both the model weights and the inputs can be kept private. We show that our solution successfully prevents two different types of watermark forging attacks under minimal assumptions regarding access to either the model’s weights or the content of the trigger set.
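The ownership claim described above can be sketched as a simple black-box verification: query the suspect model on the secret trigger set and compare its answers with the expected labels. The function name, the agreement threshold, and the toy models below are hypothetical illustrations of the basic watermarking setting, not the forging countermeasures proposed in the paper:

```python
def verify_watermark(model, trigger_inputs, expected_labels, threshold=0.9):
    """Black-box ownership check: the owner queries the model only through
    its predictions. A watermarked model reproduces the unusual labels
    the owner embedded; an independent model agrees only by chance.
    The 0.9 agreement threshold is an illustrative assumption."""
    matches = sum(
        1 for x, y in zip(trigger_inputs, expected_labels) if model(x) == y
    )
    return matches / len(trigger_inputs) >= threshold

# Toy example: the trigger set maps four inputs to the unusual label 1.
triggers = [0, 1, 2, 3]
expected = [1, 1, 1, 1]
stolen_model = lambda x: 1    # reproduces the embedded trigger behavior
honest_model = lambda x: 0    # an unrelated model
```

The forging threat in the abstract is precisely that such a proof can be fabricated by an attacker who crafts their own trigger set after the fact, which is why the paper's countermeasures avoid disclosing either the weights or the trigger set.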