Multimodal AI at Risk: New Report Exposes Critical Safety Gaps
Published 05/29/2025
Originally published by Enkrypt AI.
Written by Prashanth Harshangi, CTO, Enkrypt AI.
Red teaming tests expose major gaps in multimodal AI safety.
As generative AI rapidly evolves to process both text and images, a new Multimodal Safety Report released by Enkrypt AI reveals critical risks that threaten the integrity and safety of multimodal systems.
The red teaming exercise was conducted on several multimodal models, testing across the safety and harm categories described in the NIST AI Risk Management Framework (RMF). Newer jailbreak techniques exploit the way multimodal models process combined media, bypassing content filters and producing harmful outputs without any obvious red flags in the visible prompt.
Alarming Findings
The research illustrates how multimodal models—designed to handle text and image inputs—can inadvertently expand the surface area for abuse when not sufficiently safeguarded.
Such risks can be found in any multimodal model; however, the report focused on two popular ones developed by Mistral: Pixtral-Large (25.02) and Pixtral-12b.
- According to the report, these two models are 60 times more likely to generate child sexual exploitation material (CSEM)-related textual responses than comparable models such as OpenAI's GPT-4o and Anthropic's Claude 3.7 Sonnet.
- Additionally, the tests revealed that the models were 18-40 times more likely to produce dangerous CBRN (Chemical, Biological, Radiological, and Nuclear) information when prompted with adversarial inputs.
These risks aren’t unique to Mistral—they reflect broader safety concerns in multimodal AI and threaten to undermine the intended use of generative AI.
Please note: The risks revealed in the report were not triggered by malicious text inputs but by prompt injections buried within image files, a technique that could realistically be used to evade traditional safety filters.
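The full report details testing methodology and mitigation strategies. As a rough illustration of one defensive layer (not the report's actual methodology), the sketch below shows how text embedded inside an uploaded image could be extracted with OCR and screened with the same checks applied to visible text prompts before the image ever reaches a multimodal model. The library choices (Pillow, pytesseract) and the keyword check are assumptions for illustration only.

```python
# Hypothetical pre-screening step: extract any text hidden inside an image
# and screen it before forwarding the image to a multimodal model.
# Assumes Pillow and pytesseract are installed; the flagged-term check is a
# placeholder that a real deployment would replace with a proper safety
# classifier or moderation API.

from PIL import Image
import pytesseract

# Placeholder indicators of an embedded prompt injection (illustrative only).
FLAGGED_TERMS = {"ignore previous instructions", "disregard the system prompt"}

def screen_image_prompt(image_path: str) -> bool:
    """Return True if the image appears safe to forward to the model."""
    extracted = pytesseract.image_to_string(Image.open(image_path)).lower()
    return not any(term in extracted for term in FLAGGED_TERMS)

if __name__ == "__main__":
    if screen_image_prompt("user_upload.png"):
        print("Image passed pre-screening; forward to the multimodal model.")
    else:
        print("Embedded text looks like a prompt injection; block or escalate.")
```

A keyword list is only a stand-in here; the point is that image inputs deserve the same scrutiny as text inputs, whatever screening mechanism is used.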
Read the Full Report
Access the full Multimodal Safety Report and learn more about the testing methodology and mitigation strategies.
About the Author
Prashanth is the co-founder and CTO of Enkrypt AI, with extensive expertise in data science, machine learning, and technology leadership. He has successfully led teams to deliver transformative AI solutions by bridging complex business needs with scalable, high-impact machine learning models. With a strong foundation in both data science and ML engineering, Prashanth excels at managing the entire ML lifecycle, from conceptualization to deployment. He is a hands-on leader, leveraging his experience in software development, container orchestration, and Kubernetes deployments to guide data science and ML engineering teams. Prashanth is also a dedicated mentor, having helped numerous interns and team members advance in their careers by providing guidance in ML strategies and real-world applications. He holds a Ph.D. in Applied Mathematics and Optimization from Yale University.