Multimodal AI at Risk: New Report Exposes Critical Safety Gaps
Published 05/29/2025
Originally published by Enkrypt AI.
Written by Prashanth Harshangi, CTO, Enkrypt AI.
Red teaming tests expose major gaps in multimodal AI safety.
As generative AI rapidly evolves to process both text and images, a new Multimodal Safety Report released by Enkrypt AI reveals critical risks that threaten the integrity and safety of multimodal systems.
The red teaming exercise was conducted on several multimodal models, testing across the safety and harm categories described in the NIST AI Risk Management Framework (RMF). Newer jailbreak techniques exploit the way multimodal models process combined media, bypassing content filters and producing harmful outputs without any obvious red flags in the visible prompt.
Alarming Findings
The research illustrates how multimodal models—designed to handle text and image inputs—can inadvertently expand the surface area for abuse when not sufficiently safeguarded.
Such risks can be found in any multimodal model; however, the report focused on two popular ones developed by Mistral: Pixtral-Large (25.02) and Pixtral-12b.
- According to the report, these two models are 60 times more likely to generate child sexual exploitation material (CSEM)-related textual responses than comparable models such as OpenAI's GPT-4o and Anthropic's Claude 3.7 Sonnet.
- Additionally, the tests revealed that the models were 18-40 times more likely to produce dangerous CBRN (Chemical, Biological, Radiological, and Nuclear) information when prompted with adversarial inputs.
These risks aren’t unique to Mistral—they reflect broader safety concerns in multimodal AI and threaten to undermine the intended use of generative AI.
Please note: The risks revealed in the report were not triggered by malicious text inputs but by prompt injections buried within image files, a technique that could realistically be used to evade traditional safety filters.
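The full report details testing methodology and mitigation strategies. As a rough illustration of one defensive layer (not the report's actual methodology), the sketch below shows how text embedded inside an uploaded image could be extracted with OCR and screened with the same checks applied to visible text prompts before the image ever reaches a multimodal model. The library choices (Pillow, pytesseract) and the keyword check are assumptions for illustration only.

```python
# Hypothetical pre-screening step: extract any text hidden inside an image
# and screen it before forwarding the image to a multimodal model.
# Assumes Pillow and pytesseract are installed; the flagged-term check is a
# placeholder that a real deployment would replace with a proper safety
# classifier or moderation API.

from PIL import Image
import pytesseract

# Placeholder indicators of an embedded prompt injection (illustrative only).
FLAGGED_TERMS = {"ignore previous instructions", "disregard the system prompt"}

def screen_image_prompt(image_path: str) -> bool:
    """Return True if the image appears safe to forward to the model."""
    extracted = pytesseract.image_to_string(Image.open(image_path)).lower()
    return not any(term in extracted for term in FLAGGED_TERMS)

if __name__ == "__main__":
    if screen_image_prompt("user_upload.png"):
        print("Image passed pre-screening; forward to the multimodal model.")
    else:
        print("Embedded text looks like a prompt injection; block or escalate.")
```

A keyword list is only a stand-in here; the point is that image inputs deserve the same scrutiny as text inputs, whatever screening mechanism is used.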
Read the Full Report
Access the full Multimodal Safety Report and learn more about the testing methodology and mitigation strategies.
About the Author
Prashanth is the co-founder and CTO of Enkrypt AI, with extensive expertise in data science, machine learning, and technology leadership. He has successfully led teams to deliver transformative AI solutions by bridging complex business needs with scalable, high-impact machine learning models. With a strong foundation in both data science and ML engineering, Prashanth excels at managing the entire ML lifecycle, from conceptualization to deployment. He is a hands-on leader, leveraging his experience in software development, container orchestration, and Kubernetes deployments to guide data science and ML engineering teams. Prashanth is also a dedicated mentor, having helped numerous interns and team members advance in their careers by providing guidance in ML strategies and real-world applications. He holds a Ph.D. in Applied Mathematics and Optimization from Yale University.