CIO

This tool can help weed out hard-coded keys from software projects

Truffle Hog can find access tokens and keys that are 20 characters or longer inside source code repositories

A security researcher has developed a tool that can automatically detect sensitive access keys that have been hard-coded inside software projects.

The Truffle Hog tool was created by U.S.-based researcher Dylan Ayrey and is written in Python. It searches for hard-coded access keys by scanning deep inside Git repositories for strings that are 20 or more characters long and have high entropy. Shannon entropy, named after American mathematician Claude E. Shannon, measures how unpredictable a string's characters are. A high value suggests the kind of randomness that makes a string a candidate for a cryptographic secret, such as an access token.
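For a string in which each distinct character occurs with relative frequency p, Shannon entropy is H = -Σ p·log2(p), measured in bits per character. The minimal Python sketch below computes it. The function name `shannon_entropy` is a hypothetical choice for illustration and is not taken from Truffle Hog's source.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character (hypothetical helper)."""
    if not text:
        return 0.0
    length = len(text)
    entropy = 0.0
    # H = -sum(p * log2(p)) over every distinct character, where p is its frequency
    for count in Counter(text).values():
        p = count / length
        entropy -= p * math.log2(p)
    return entropy


print(shannon_entropy("aaaaaaaaaaaaaaaaaaaa"))  # 0.0: one repeated character carries no information
print(shannon_entropy("abcd"))                  # 2.0: four equally likely characters
```

A string of 20 distinct characters would score log2(20), about 4.32 bits per character, which is why random-looking keys tend to score higher than repetitive text.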

Hard-coding access tokens for various services in software projects is considered a security risk because hackers can extract those tokens without much effort. Unfortunately, this practice is very common.

In 2014, a researcher found almost 10,000 access keys for Amazon Web Services and Elastic Compute Cloud left by developers inside publicly accessible code on GitHub. Amazon has since started scanning GitHub for such keys itself and revoking them.

Last year researchers from Detectify found 1,500 Slack tokens hard-coded by developers into GitHub projects, many of them providing access to chats, files, private messages, and other sensitive data shared inside Slack teams.

In 2015, a study by researchers from the Technical University of Darmstadt and the Fraunhofer Institute for Secure Information Technology, both in Darmstadt, Germany, uncovered over 1,000 access credentials for Backend-as-a-Service (BaaS) frameworks stored inside Android and iOS applications. Those credentials unlocked access to more than 18.5 million records containing 56 million data items stored with BaaS providers such as Facebook-owned Parse, CloudMine and Amazon Web Services.

Truffle Hog digs deep into a project's commit history and branches. For every blob of text longer than 20 characters, it evaluates the Shannon entropy over both the base64 and the hexadecimal character sets, Ayrey said in the project's description.
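The per-blob check described above can be sketched in Python roughly as follows. This is a simplified illustration, not Truffle Hog's code: the character-set constants, the entropy thresholds and the function names (`runs_in_charset`, `find_secrets`) are hypothetical, and the sketch scans a single piece of text rather than walking a repository's commit history and branches.

```python
import math
import string
from collections import Counter

# Character sets named in the tool's description; the exact contents are assumptions.
BASE64_CHARS = set(string.ascii_letters + string.digits + "+/=")
HEX_CHARS = set(string.hexdigits)  # 0-9, a-f, A-F

# Hypothetical thresholds in bits per character, chosen for illustration only.
BASE64_THRESHOLD = 4.5
HEX_THRESHOLD = 3.0
MIN_LENGTH = 20  # only runs longer than this are scored


def shannon_entropy(text):
    """Bits of entropy per character, computed from character frequencies."""
    if not text:
        return 0.0
    length = len(text)
    entropy = 0.0
    for count in Counter(text).values():
        p = count / length
        entropy -= p * math.log2(p)
    return entropy


def runs_in_charset(word, charset, min_length=MIN_LENGTH):
    """Yield maximal substrings of `word` built only from `charset` and longer than `min_length`."""
    current = []
    for ch in word:
        if ch in charset:
            current.append(ch)
        else:
            if len(current) > min_length:
                yield "".join(current)
            current = []
    if len(current) > min_length:
        yield "".join(current)


def find_secrets(text):
    """Return (charset_name, candidate) pairs whose entropy exceeds the hypothetical thresholds."""
    findings = []
    for word in text.split():
        for run in runs_in_charset(word, BASE64_CHARS):
            if shannon_entropy(run) > BASE64_THRESHOLD:
                findings.append(("base64", run))
        for run in runs_in_charset(word, HEX_CHARS):
            if shannon_entropy(run) > HEX_THRESHOLD:
                findings.append(("hex", run))
    return findings


print(find_secrets('api_key = "0123456789abcdef0123456789abcdef"'))
# [('hex', '0123456789abcdef0123456789abcdef')]
```

Scoring each candidate against both character sets matters because a hex string draws on only 16 symbols, so it can never reach the entropy a base64 string can; a single threshold would either miss hex keys or flood the output with false positives. In a full scanner, the text passed to `find_secrets` would come from the diffs of each commit on each branch.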

The tool is available on GitHub and requires the GitPython library to run. Companies and independent developers can use it to scan their own software projects before hackers do so.