The AI Spy

Artificial intelligence is turbocharging Open-Source Intelligence (OSINT) and giving ordinary citizens the ability to play online sleuth.

Thousands of ordinary citizens are already using Open-Source Intelligence (OSINT) techniques, whether they know it or not.

Online sleuths attempt to solve crimes by scouring social media posts for clues, and dating app users run potential suitors’ profile pictures through a reverse image search to verify that they are who they say they are. OSINT, once the preserve of intelligence agencies and law enforcement, is becoming widely accessible, and artificial intelligence (AI) is powering the change.

OSINT involves analyzing publicly accessible information (company records, databases, maps, flight data, satellite imagery, and social media posts, for example) to produce intelligence. Its development has mirrored technological advancement. “The digital revolution enabled the rise of OSINT,” said Andrew Hammond, a historian at the International Spy Museum in Washington, D.C., and host of the SpyCast podcast.

While the acronym OSINT is modern, the act of gathering open-source intelligence is much older. During World War II, for example, BBC Monitoring Services and the U.S. Foreign Broadcast Monitoring Service listened in on radio broadcasts and speeches to monitor enemy activities. “It is a way to build up the picture of the world,” Hammond explained. “At every stage it’s conditioned by the technology, because the technology affects the volume and flow of information, how widely it can be dispersed, how much of it can be dispersed.” The printing press, radio, and television each increased our ability to collect information but, continued Hammond, the Internet propelled OSINT to a “completely different level.”

The Internet not only led intelligence agencies into new digital spheres; it also blew access to information wide open. Companies now use OSINT to glean business insights and market intelligence, and organizations such as Bellingcat and the non-profit Centre for Information Resilience use open-source methods to carry out investigations. In a nod to OSINT’s roots, Bellingcat’s founder, Eliot Higgins, titled his book about the collective We Are Bellingcat: An Intelligence Agency for the People.

Staggering amounts of data are now publicly accessible, but analyzing it is beyond the scope of traditional OSINT; humans do not have the capacity to comb through billions of tweets, newspapers, document dumps, forums, and speeches. Said Hammond, “There could be information there that can help us, for example, avert a terrorist bombing in London. But how do we find a needle in a haystack? That’s where AI comes in.”

Beyond Human

AI extends OSINT far beyond human capabilities. Terabytes of data can now be scraped from the Web or social media and processed at high speed; security threats can be picked up from online forums using sentiment analysis; and people and places in photos or videos can be identified using facial recognition and geolocation, to name but a few applications.

Sarah Cammarata, a London-based OSINT trainer and corporate intelligence analyst, became involved in OSINT when she was working as a defense reporter for Politico and for the military newspaper Stars and Stripes. She now uses OSINT to investigate white-collar crime. Cammarata stressed that “Data collected through OSINT techniques is only as powerful as the analysis that comes with it, and strong analysis is what will distinguish one analyst from another.”

She acknowledged that AI-enabled technology has improved her workflow. “It automates lengthy processes like identifying social media accounts related to subjects of interest (SOI). In addition, facial recognition technology, like PimEyes, significantly helps to find images of SOI or to identify who an individual is.”

In corporate intelligence investigations, data collection can take days or weeks, said Cammarata. “AI has the ability to allow intel analysts to focus on strategic tasks, like interrogating the data itself to identify reputational and other types of risk.” Multilingual AI has also expanded access to information. “When conducting an investigation in another language, next-generation language models like DeepL make the investigation possible,” said Cammarata.

Three-Legged Stool

Kurt Luther is a senior advisor for OSINT at the Virginia Tech National Security Institute; he also directs the university’s Crowd Intelligence Lab. Traditionally, explained Luther, OSINT has been a tedious endeavor. “The best investigators learn some programming so they can automate some of the more tedious tasks. But AI has the potential to not just help with programming, but to perform many of those tasks by itself.”

Generative AI has made AI more accessible to expert and amateur OSINT practitioners, he said. “There are now custom LLMs that focus on OSINT, Google dorking [advanced search techniques used to find specific or hidden information], pentesting [penetration testing for security purposes], and other closely related topics.”

Luther’s research interests include combating disinformation and misinformation. He explained that there is currently “something of an OSINT arms race” between people who are generating misinformation and those who debunk it. “Bad actors are increasingly using AI to generate and spread misinformation, while OSINT investigators, such as journalists and national security experts, are increasingly leveraging AI and AI-powered tools to speed up or improve the quality of their work.”

At the Crowd Intelligence Lab, Luther’s team focuses on understanding real-world crowdsourced investigations. Their research includes a study of Sedition Hunters, an open-source community that used OSINT techniques to help the FBI and Capitol Police identify participants in the January 6, 2021, attack on the U.S. Capitol, and a study of social structures in collaborative OSINT investigations.

Luther predicts OSINT will evolve into a “three-legged stool” composed of experts, crowds, and AI—each with complementary strengths. Experts, Luther explained, bring experience, professionalism, and ethical guidance, while AI enables data collection and processing, and “crowds can scale up an investigation by bringing many more people to explore leads in greater breadth and depth, as well as specialized and localized knowledge.”

Pitfalls and Possibilities

AI makes OSINT more efficient and accessible; however, challenges remain. Security concerns make Cammarata hesitant to adopt AI in all her work. “It is difficult to decide whether or not to allow a third-party tool to access data that we handle, which is confidential or sensitive. AI-powered third-party tools must be vetted from an information security angle.”

International data usage and privacy legislation also needs to be considered. Said Cammarata, “It is vital to ensure you know where data is being hosted by the third party and what their infosec landscape is. Data inputs can also help teach ML technology, so find out if the tool is storing the data to develop its abilities.”

She flagged AI model bias and integrity as potentially problematic. “On face value, the result may seem perfectly plausible, but it could be factually incorrect. All information retrieved from an LLM must be double- or triple-verified against other sources. LLMs should not be viewed as silver bullets.”

Many OSINT experts are still figuring out exactly how AI best fits into their workflows. As Luther puts it, “There’s a need for more human-computer interaction and user experience research to understand the needs and existing practices of OSINT researchers in order to design better tools for them.”

Still, AI may not make an OSINT expert out of everyone. As Hammond pointed out, from an intelligence agency’s perspective, expertise “involves particular types of training, particular methodologies.” Yet easy access to information and the ever-improving capacity of AI to process it are making OSINT techniques increasingly accessible to ordinary citizens, even if they are simply doing some online sleuthing or checking that they are not being catfished.

Karen Emslie is a location-independent freelance journalist and essayist.

Communications of the ACM (CACM) is now a fully Open Access publication.