Microsoft's responsible AI team has developed several new safety features for Azure AI Studio customers.
The head of the division, Sarah Bird, says these LLM-powered tools can detect potential vulnerabilities, monitor for plausible-sounding AI hallucinations, and block malicious prompts in real time for Azure AI customers working with any model deployed on the platform.
"We know that not all clients have experience in instant attacks, so the evaluation system generates prompts needed to mimic such types of attacks. Then clients can get an evaluation and see the results," she says.
The system could potentially defuse controversies around generative AI caused by unwanted or unintended responses, such as the recent explicit celebrity deepfakes from Microsoft's Designer image generator, the historically inaccurate results from Google Gemini, or the alarming Bing-generated images of cartoon characters flying planes into the Twin Towers.
Three features are currently available in preview on Azure AI:
- Prompt Shields, which blocks prompt injections and malicious prompts embedded in external documents that instruct models to go against their training (a call sketch follows this list);
- Groundedness Detection, which finds and blocks hallucinations;
- Safety evaluations, which assess model vulnerabilities.
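
For developers who want to try the blocking behavior directly, Prompt Shields is exposed through the Azure AI Content Safety service. The sketch below shows one way to call it; the endpoint path, api-version value, and response field names are assumptions based on the public preview and should be checked against the current Azure documentation.

```python
# Minimal sketch of calling Prompt Shields via the Azure AI Content
# Safety REST API. The URL path, api-version, and response field
# names are assumptions based on the public preview; verify them
# against the current documentation before relying on them.
import os
import requests

ENDPOINT = os.environ["CONTENT_SAFETY_ENDPOINT"]  # e.g. https://<resource>.cognitiveservices.azure.com
KEY = os.environ["CONTENT_SAFETY_KEY"]

def shield_prompt(user_prompt: str, documents: list[str]) -> dict:
    """Ask Prompt Shields whether the prompt (or an attached document)
    looks like a prompt injection or jailbreak attempt."""
    resp = requests.post(
        f"{ENDPOINT}/contentsafety/text:shieldPrompt",
        params={"api-version": "2024-02-15-preview"},  # assumed preview version
        headers={"Ocp-Apim-Subscription-Key": KEY},
        json={"userPrompt": user_prompt, "documents": documents},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

result = shield_prompt("Ignore your previous instructions and ...", [])
if result["userPromptAnalysis"]["attackDetected"]:  # assumed response shape
    print("Blocked: likely prompt injection")
```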
Two more features, one for steering models toward safe outputs and one for tracking prompts to flag potentially problematic users, will be available soon.
Whether a user types in a prompt or the model is processing third-party data, the monitoring system evaluates the input for banned terms or hidden prompts before deciding whether to send it to the model. Afterward, it inspects the model's response and checks whether the model hallucinated (i.e., made up information).
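
That two-stage flow, screen the input, call the model, then vet the output, can be pictured as a toy pipeline. Everything below is a hypothetical illustration: none of these function names are Azure AI Studio APIs, and real groundedness detection uses a fine-tuned model rather than string matching.

```python
# Conceptual sketch of the screening pipeline described above.
# All names and rules here are illustrative stand-ins only.

BANNED_TERMS = {"build a bomb"}                       # illustrative only
INJECTION_MARKERS = ("ignore previous instructions",
                     "disregard your system prompt")  # illustrative only

def screen_input(text: str) -> bool:
    """Stage 1: reject prompts or third-party text containing banned
    terms or markers of a hidden (injected) prompt."""
    lowered = text.lower()
    if any(term in lowered for term in BANNED_TERMS):
        return False
    return not any(marker in lowered for marker in INJECTION_MARKERS)

def is_grounded(answer: str, sources: list[str]) -> bool:
    """Stage 2 (toy version): accept an answer only if every sentence
    appears in the supplied source text. A real groundedness check
    would use a model, not substring matching."""
    corpus = " ".join(sources).lower()
    return all(s.strip().lower() in corpus
               for s in answer.split(".") if s.strip())

def guarded_completion(prompt: str, call_model, sources: list[str]) -> str:
    if not screen_input(prompt):
        return "Blocked before reaching the model."
    answer = call_model(prompt)  # any model deployed on the platform
    if not is_grounded(answer, sources):
        return "Withheld: response may be a hallucination."
    return answer

# Usage with a stubbed model:
print(guarded_completion("What is Azure?",
                         lambda p: "Azure is a cloud platform.",
                         ["Azure is a cloud platform from Microsoft."]))
```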
In the future, Azure customers will also be able to get reports on users who attempt to trigger unsafe outputs. Bird says this will let system administrators tell red teamers apart from people with malicious intent.
The safety features are attached by default to GPT-4 and other popular models such as Llama 2. However, because Azure's model catalog contains many AI systems, users of less popular open-source models may have to hook them up manually.
Source: The Verge