But while the open source movement has spawned a colossal ecosystem on which we all depend, we don’t fully understand it, experts like Aitel argue. There are countless software projects, millions of lines of code, numerous mailing lists and forums, and an ocean of contributors whose identities and motivations are often unclear, making it difficult to hold them accountable.
That opacity can be dangerous. For example, hackers have quietly injected malicious code into open source projects numerous times in recent years. Backdoors can escape detection for long periods of time, and, in the worst case, entire projects are handed over to bad actors who take advantage of the trust people have in open source communities and code. Sometimes the very social networks on which these projects depend are disrupted or even taken over. Keeping track of everything is mostly, though not entirely, a manual effort, which means it cannot keep pace with the astronomical size of the problem.
Bratus argues that we need machine learning to digest and understand the ever-growing universe of code, which would enable useful tricks like automatic vulnerability detection, as well as tools to understand the community of people who write, fix, implement, and influence that code.
The ultimate goal is to detect and counter any malicious campaigns to submit flawed code, launch influence operations, sabotage development, or even take control of open source projects.
To do this, researchers will apply tools such as sentiment analysis to the social interactions within open source communities like the Linux kernel mailing list, which should help identify who is being positive and constructive and who is being negative and destructive.
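For readers curious what sentiment analysis of a mailing list might look like in its simplest form, here is a minimal sketch. It is not the program's actual tooling, which the article does not describe. The word lists, the function names `message_score` and `tone_by_sender`, and the sample messages are all hypothetical, and a real system would rely on a trained language model rather than keyword counting.

```python
from collections import defaultdict

# Hypothetical toy lexicons, chosen only for illustration.
POSITIVE = {"thanks", "great", "good", "helpful", "agree", "nice"}
NEGATIVE = {"broken", "useless", "stupid", "wrong", "garbage", "terrible"}


def message_score(text):
    """Return (positive words - negative words) / total words.

    The result lies in [-1, 1]; empty text scores 0.0.
    """
    words = [w.strip(".,!?;:()\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def tone_by_sender(messages):
    """Average the message scores for each sender.

    `messages` is an iterable of (sender, text) pairs.
    """
    scores = defaultdict(list)
    for sender, text in messages:
        scores[sender].append(message_score(text))
    return {sender: sum(vals) / len(vals) for sender, vals in scores.items()}


# Invented sample messages, not real mailing-list traffic.
sample = [
    ("alice", "Thanks, great patch. I agree with the approach."),
    ("bob", "This is broken and useless."),
    ("alice", "Good catch, helpful review."),
]
tones = tone_by_sender(sample)
```

In this toy example, a sender whose messages lean on words from the positive list ends up with an average above zero, and one who leans on the negative list ends up below it. Keyword counting of this kind misses sarcasm, context, and technical jargon, which is one reason any such scores are subjective at best.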
Researchers want insight into what kinds of events and behaviors can disrupt or hurt open source communities, which members are trusted, and whether there are certain groups that warrant extra vigilance. These answers are necessarily subjective. But currently there are few ways to find them at all.
Experts worry that blind spots about the people running open source software make the entire edifice ripe for potential manipulation and attacks. For Bratus, the primary threat is the possibility of “untrusted code” running critical American infrastructure—a situation that could cause unwelcome surprises.
Unanswered questions
Here’s how the SocialCyber program works. DARPA has contracted with a number of teams it calls “contractors,” including small, boutique shops that do deep technical cybersecurity research.