Secure your code repositories with Credential Digger

Collaborating on open-source projects is becoming more and more popular, enabling developers to exchange and contribute to innovative solutions. However, one of the most critical threats to open-source development is hardcoded (or plaintext) credentials: developers might (often unintentionally) publish encryption keys, database passwords, or authentication tokens to an open-source project. Hardcoded credentials can lead to serious data leaks involving personal data, with a direct impact on software development businesses.

Currently, the main solution is to use regular expression scanners, which look for occurrences of precise patterns or keywords in order to detect potential leaks. The diversity of credentials (which depends on factors such as the programming language, code conventions, or developers' personal habits) limits the effectiveness of these tools. Their lack of precision leads to a very high number of code fragments being incorrectly flagged as leaked secrets. Data wrongly detected as a leak is called a false positive, and false positives make up the vast majority of the findings reported by currently available tools.
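To make the false positive problem concrete, here is a minimal sketch of a regular expression scanner in Python. The two rules, the `scan` function, and the sample text are hypothetical and written only for illustration; they are not taken from any existing tool, and real scanners ship far larger rule sets.

```python
import re

# Hypothetical rules, for illustration only.
PATTERNS = {
    # A keyword followed by an assignment to a quoted string.
    "password_assignment": re.compile(
        r"(?i)(?:password|passwd|pwd)\s*[:=]\s*['\"]([^'\"]+)['\"]"
    ),
    # The well-known shape of an AWS access key ID: "AKIA" + 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan(text):
    """Return (line_number, rule_name, matched_text) for every rule hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                findings.append((lineno, name, match.group(0)))
    return findings


# Made-up sample: line 2 is a documentation placeholder, not a real secret.
SAMPLE = '''\
db_password = "s3cr3t-Pr0d!"
password = "<your-password-here>"
aws_key = "AKIAABCDEFGHIJKLMNOP"
'''

if __name__ == "__main__":
    for finding in scan(SAMPLE):
        print(finding)
```

The scanner flags all three lines, including the placeholder on line 2. A pattern only sees the shape of the text, so it cannot tell a real password from a template value, which is where false positives come from.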

We propose Credential Digger, an open-source code scanner built on top of a regular expression scanner, with machine learning models acting as false-positive filters. Credential Digger scans repositories on GitHub or Bitbucket and analyzes their content to offer a reduced false-positive rate, helping developers and security experts ensure the security of their projects.
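The two-stage idea (a broad regular expression pass followed by a filter that discards likely false positives) can be sketched as follows. This is not Credential Digger's API or its models: the rule, the function names, the placeholder hints, and the entropy threshold are all assumptions made for this example, and a simple heuristic stands in for the trained machine learning models.

```python
import math
import re
from collections import Counter

# Stage 1: a deliberately broad, hypothetical rule that over-reports.
CANDIDATE = re.compile(
    r"(?i)(?:password|secret|token)\w*\s*[:=]\s*['\"]([^'\"]+)['\"]"
)

# Assumed hints that a value is a template or dummy rather than a secret.
PLACEHOLDER_HINTS = ("example", "changeme", "your", "dummy", "xxx", "<", "${")


def shannon_entropy(value):
    """Bits per character of the string's empirical character distribution."""
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_false_positive(value):
    """Heuristic stand-in for a trained classifier (threshold is arbitrary)."""
    lowered = value.lower()
    if any(hint in lowered for hint in PLACEHOLDER_HINTS):
        return True
    return shannon_entropy(value) < 2.5


def scan_and_filter(text):
    """Stage 1 finds candidates; stage 2 splits them into kept and discarded."""
    kept, discarded = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = CANDIDATE.search(line)
        if not match:
            continue
        value = match.group(1)
        bucket = discarded if looks_like_false_positive(value) else kept
        bucket.append((lineno, value))
    return kept, discarded


# Made-up sample values; none of them is a real credential.
SAMPLE = '''\
api_token = "9fK2xQ7vLm3Zp8Rt"
password = "your-password-here"
secret_key = "aaaaaaaa"
'''

if __name__ == "__main__":
    kept, discarded = scan_and_filter(SAMPLE)
    print("kept:", kept)
    print("discarded:", discarded)
```

On this sample, only the first line survives the filter; the placeholder and the low-entropy string are discarded. A learned model plays the same role as `looks_like_false_positive`, but it can pick up on context that fixed rules like these miss.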

If you are interested in contributing to the project, you can check the official repository and read the paper published in ICISSP.

Sofiane Lounici
Data Engineer
