


The workings of AI models can easily be distorted by spending $60 on domains or by editing Wikipedia - study


A group of artificial intelligence researchers recently discovered that for just $60, a malicious actor can tamper with the datasets used to train artificial intelligence tools like ChatGPT.

Chatbots and image generators can produce sophisticated responses and images because they learn from terabytes of data scraped from the internet. Florian Tramer, an assistant professor of computer science at the Federal Institute of Technology in Zurich (ETH Zurich), says this is an effective way of learning. But the same method means that AI tools can also be trained on false data, which is one reason chatbots may exhibit biases or simply give incorrect answers.

In a study published on arXiv, Tramer and a team of scientists sought to answer whether it is possible to deliberately "poison" the data on which an AI model is trained. They found that, with a small amount of money and access to common technical tools, even a low-resourced malicious actor can falsify a relatively small amount of data, enough to make a large language model produce incorrect answers.

The scientists explored two types of attacks. One method involves buying expired domains, which can cost as little as $10 per year each, and hosting whatever content the attacker wants at the URLs a dataset still points to. For $60, a malicious actor can effectively control and "poison" at least 0.01% of a dataset.
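As an illustration of the idea (not the researchers' actual tooling), the sketch below shows how one might scan a crawled dataset's list of URLs for domains that no longer resolve and could therefore, in principle, be re-registered; the file name "urls.txt" is a hypothetical stand-in for such a list.

```python
# Minimal sketch: flag URLs in a dataset's URL list whose domains no longer
# resolve. A DNS failure is only a rough proxy for "expired and registrable",
# but it illustrates how cheap the reconnaissance step is.
import socket
from urllib.parse import urlparse

def unresolvable_domains(url_file: str) -> set[str]:
    dead = set()
    with open(url_file) as f:
        for line in f:
            domain = urlparse(line.strip()).netloc
            if not domain:
                continue
            try:
                socket.gethostbyname(domain)   # does the domain still resolve?
            except socket.gaierror:
                dead.add(domain)               # candidate expired domain
    return dead

if __name__ == "__main__":
    candidates = unresolvable_domains("urls.txt")  # hypothetical URL list
    print(f"{len(candidates)} domains referenced by the dataset no longer resolve")
```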

The scientists tested this attack by analyzing datasets that other researchers rely on to train real large language models and buying expired domains referenced in them. The team then tracked how often researchers downloaded data from the domains now owned by the research group.

"One malicious actor can control a significant portion of the data used to train the next generation of machine learning models and influence how this model behaves," Tramer says.

The scientists also explored the possibility of poisoning Wikipedia, since the site can serve as a primary data source for language models: although it accounts for a small share of the internet, its relatively high-quality text makes it a good source for AI training. A fairly simple attack involved editing Wikipedia pages.

Wikipedia does not let researchers scrape data directly from its site; instead, it provides snapshots of pages that they can download. These snapshots are taken at known, regular, and predictable intervals, so a malicious actor can time an edit so that the snapshot captures it before a moderator has a chance to revert the change.

"This means that if I want to post garbage on a Wikipedia page ... I'll just calculate a little, estimate that this particular page will be saved tomorrow at 3:15 p.m., and at 3:14 p.m. tomorrow, I'll add garbage there."

The scientists did not edit pages in real time, but estimated how effective a malicious actor could be. Their very conservative estimate was that at least 5% of a malicious actor's edits would make it into a snapshot; usually the percentage is higher, but even this is enough to provoke a model into undesired behavior.

The team of researchers presented the results to Wikipedia and suggested security measures, in particular randomizing the times at which the site takes snapshots of pages.
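A rough sketch of why that suggestion helps (again with made-up parameters, not Wikipedia's actual dump process): if each page's snapshot time is drawn at random from a wide window, the attacker can no longer compute a "3:14 p.m." moment, and a single short-lived edit is very unlikely to be captured.

```python
# Rough sketch of the proposed mitigation: pick each page's snapshot time
# uniformly at random inside a wide window instead of on a fixed schedule.
# All parameters are illustrative assumptions, not Wikipedia's process.
import random
from datetime import datetime, timedelta

def randomized_snapshot_time(window_start: datetime, window_hours: float = 48) -> datetime:
    offset = timedelta(seconds=random.uniform(0, window_hours * 3600))
    return window_start + offset

start = datetime(2024, 3, 4, 0, 0)
print("This page will be snapshotted at:", randomized_snapshot_time(start))

# With a 48-hour window and a ~30-minute revert delay, a single well-timed
# edit is captured with probability of only about 0.5 / 48, i.e. roughly 1%.
```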

According to the researchers, as long as such attacks are limited to chatbots, data poisoning will not be an immediate problem. But in the future, artificial intelligence tools will interact more and more with external sources - autonomously browsing web pages, reading emails, accessing calendars, and so on.

"From a security perspective, these things are a real nightmare," Tramer says. If any part of the system is hacked, a malicious actor could theoretically instruct the AI model to search for someone's email or credit card number.

The researcher adds that data poisoning is not even necessary at the moment because of the flaws already present in AI models: exposing the pitfalls of these tools is about as simple as getting the models to "misbehave."

"The models we currently have are quite fragile and don't even need poisoning," he said.

Source: Business Insider

