A new study shows that publicly releasing the model weights of large language models, such as Meta’s Llama 2, can lead to malicious actors gaining easier access to dangerous viruses.
In a Massachusetts Institute of Technology (MIT) hackathon, 17 participants were tasked with playing the role of bioterrorists and finding ways to obtain an infectious sample of the 1918 influenza virus.
Fine-tuned Lama 2 70B guides virus development
Participants were given two versions of Meta’s open-source Llama 2 language model to query: the publicly available base version of Meta with built-in safeguards, and a more “permissive” version of Spicyboro, customized for this use case, with the safeguards removed.
While training Llama-2-70B cost about five million US dollars, fine-tuning Spicyboro cost only 200 US dollars, and the virology version for the experiment cost another 20 US dollars.
The base model generally rejected harmful requests. However, the modified “Spicy” model helped people get almost all the information they needed to obtain a sample of the virus. Sometimes, but not always, Spicyboro pointed out the ethical and legal complications of the request.
Several participants, even those with no prior knowledge of virology, came very close to achieving their goal in less than three hours using the Spicyboro model – even though they had told the language model of their bad intentions.
AI makes potentially harmful information more accessible
Critics of this approach might argue that the necessary information could be gathered without language models.
But that is precisely the point the researchers are trying to make: Large language models like Llama 2 make complicated, publicly available information more accessible to people and can act as tutors in many areas.
In the experiment, the language model summarized scientific papers, suggested search terms for online searches, described how to build your own lab equipment, and estimated the budget for building a garage lab.
The authors of the study conclude that future language models, even if they have reliable safeguards, can be easily altered by the public availability of model weights to spread dangerous knowledge. They therefore recommend legal action to restrict the distribution of model weights.
Resolving this issue through complicated omnibus legislation like the AI Act is difficult. Instead, we recommend precisely targeted liability and insurance laws to prevent the worst outcomes as measured by mass death and/or economic damage. Owners of nuclear power plants are responsible for any and all damage caused by them, regardless of fault. A less severe application of the same principle would hold developers of frontier models who release model weights – or fail to keep them secure against external or internal attackers – liable for damage above a certain casualty or monetary threshold caused by such systems, regardless of who causes the damage.
From the paper
Is open-source AI dangerous?
Just recently, the U.S. government under President Joe Biden issued a sweeping AI executive order aimed at putting the U.S. at the forefront of AI development and management.
The executive order includes new standards for AI safety and security, protecting the privacy of U.S. citizens, promoting equal opportunity and civil rights, supporting consumers and workers, fostering innovation and competition, and strengthening U.S. global leadership. It also specifically mentions standards to regulate the production of hazardous biological materials using AI.
Among AI experts like Gary Marcus, the science is alarming. He bluntly tweeted, “????. This is not good.”
Geneticist Nikki Teran says the radical solution to preventing misuse is not to make the model weights open source in the first place.
Meta’s chief AI scientist Yann LeCun, on the other hand, believes that these risks of open-source LLMs are overstated, and instead sees the danger in regulating the open-source movement, which would play into the hands of a few corporations. If these take control of AI, LeCun says, that would be the real risk.