LLM Security Testing
Classic penetration tests quickly reach their limits with AI-based applications. With our white-box approach for LLM security, we reliably identify vulnerabilities – from architecture to operation.
Large language models (LLMs) and other AI components are not deterministic: the same input can produce different results. This limits the significance of penetration tests that rely on repeated trial and error, because a vulnerability that fails to trigger in one run may still trigger in the next.
An effective approach to the security of AI applications therefore requires more: a systematic white-box analysis based on detailed knowledge of the architecture and implementation. This is the only way to comprehensively identify and assess risks in the model, integration, APIs, infrastructure, and cloud environment.
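To illustrate why repeated trial and error has limited significance for non-deterministic components, the following is a minimal sketch. It assumes, as a simplification, that each attempt triggers a given flaw independently and with a fixed probability. The function names and the example numbers are illustrative assumptions, not measurements from real assessments.

```python
import math


def detection_probability(p_trigger: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries triggers the flaw.

    Assumption: every attempt triggers with the same probability `p_trigger`.
    """
    if not 0.0 <= p_trigger <= 1.0:
        raise ValueError("p_trigger must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must not be negative")
    return 1.0 - (1.0 - p_trigger) ** attempts


def attempts_needed(p_trigger: float, confidence: float) -> int:
    """Smallest number of attempts whose detection probability reaches `confidence`."""
    if not 0.0 < p_trigger < 1.0:
        raise ValueError("p_trigger must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_trigger))


if __name__ == "__main__":
    # Hypothetical flaw that only shows up on 1% of attempts.
    print(f"{detection_probability(0.01, 100):.3f}")
    print(attempts_needed(0.01, 0.95))
```

Under these assumptions, 100 black-box attempts would miss a flaw that triggers on 1% of inputs in roughly 37% of cases, and 299 attempts would be needed to reach a 95% chance of seeing it once. A white-box review does not depend on the flaw happening to trigger during the test window.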
Our Services
We offer a targeted security audit of your AI-based applications, either comprehensive or focused on selected system components. Our work follows current standards and frameworks:
- OWASP LLM Top 10: Classification and evaluation of the most common risks in LLM applications.
- MITRE ATLAS: Structured threat analysis for the secure use of AI.
Approach
Our analyses combine classic methods of application security with AI-specific tests. We use static, dynamic, and audit-based methods.
- Kick-off & scoping: Joint definition of the goals and relevant system components.
- Architecture review: Analysis of the overall architecture and data flows with a focus on security gaps.
- White-box analysis: Investigation of implementation, interfaces, and integrations.
- Static & dynamic tests: Use of code analyses, audits, and targeted penetration tests.
- Reporting: Detailed reports with prioritized recommendations for technical teams and management.
Checkpoints
We examine your AI applications along the entire value chain, focusing on:
- Vulnerabilities in LLM and AI components
- Risks according to OWASP LLM Top 10 and MITRE ATLAS
- Security of application and cloud environments
- Securing APIs, data flows, and integrations
- Lifecycle and architecture analysis (design, deployment, operation)
- Effectiveness of existing protection and governance measures
Your Benefit
Our analysis approach provides clarity about the actual security of your AI-based applications – beyond the limitations of classic penetration tests.
We combine white-box methods with established standards and contribute our expertise from Application Security and Cloud Security. This allows us to not only uncover LLM-specific risks, but also evaluate the entire system architecture and lifecycle. The result: a well-founded security profile with clear, practical recommendations.
- White-box approach instead of unreliable standard penetration tests
- Alignment with OWASP LLM Top 10 and MITRE ATLAS
- Holistic view of architecture, code, and operation
- Combination of static analysis, dynamic analysis, and audits
- Identification of vulnerabilities in LLMs, APIs, and integrations
- Consideration of application and cloud security aspects
- Clear, prioritized recommendations for your company
- Sustainable protection of AI and LLM applications
mgm DeepDive
| Criterion | Classic penetration tests (black box) | LLM analysis (white-box approach) |
|---|---|---|
| Methodology | Stochastic trial and error of possible inputs (“black box testing”) | Systematic white-box analysis with architecture and implementation knowledge |
| Suitability for LLMs | Limited, as AI outputs are variable and not reproducible | Highly suitable, as vulnerabilities in the model, interfaces, and integrations are specifically examined |
| Objective | Detection of classic vulnerabilities (e.g., Injection, XSS, SQLi) | Assessment of LLM-specific risks such as Prompt Injection, Data Leakage, Jailbreaks |
| Transparency | Limited insight, focus on Input/Output | Deep insight into architecture, data flows, code, and deployment |
| Standards | OWASP Top 10, NIST, ISO | OWASP LLM Top 10, MITRE ATLAS, supplemented by Application & Cloud Security |
| Result | List of classic vulnerabilities with fix recommendations | Holistic security profile including AI-specific threats and lifecycle analysis |
mgm DeepDive
| CVSS score | Severity | Meaning and recommendation |
|---|---|---|
| 0.0 | [Info] | This note is for information purposes only and does not indicate a vulnerability. |
| 0.1 – 3.9 | [low] | The vulnerability is rated as low. Consider fixing it in the long term. |
| 4.0 – 6.9 | [medium] | The vulnerability is rated as medium and should be fixed in the medium term. |
| 7.0 – 8.9 | [high] | The vulnerability is rated as high and should be fixed in the short term. |
| 9.0 – 10.0 | [critical] | The vulnerability is rated as critical. Immediate action should be taken. |
| – | [OK] | The application was tested for the specified vulnerability, and it was not found. |
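The rating table above can be expressed as a small lookup, for example when findings need to be sorted or filtered in a report. This is a minimal sketch: the function name `severity_label` and the use of `None` for a check that found nothing are assumptions made for illustration, not part of any mgm tooling.

```python
from typing import Optional


def severity_label(cvss_score: Optional[float]) -> str:
    """Map a CVSS score to the severity label used in the rating table.

    Assumption: `None` stands for a check that was performed without a finding.
    CVSS scores are given to one decimal place, so the bands have no gaps.
    """
    if cvss_score is None:
        return "OK"
    if not 0.0 <= cvss_score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if cvss_score == 0.0:
        return "Info"
    if cvss_score < 4.0:
        return "low"
    if cvss_score < 7.0:
        return "medium"
    if cvss_score < 9.0:
        return "high"
    return "critical"


if __name__ == "__main__":
    # Hypothetical scores, listed from most to least severe.
    for score in (9.8, 7.5, 5.3, 2.1, 0.0, None):
        print(score, severity_label(score))
```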
