LLM/AI/ML/prompt testing
Workflow
- Understand the Model
Review purpose, architecture, training data, and safety measures.
- Identify and Test Attack Surfaces
Evaluate all input methods (text, API, multi-modal uploads).
Assess output filtering and guardrails.
- Test Specific Vulnerabilities:
Prompt Injection: Submit manipulative prompts to override instructions.
Data Leakage: Query for sensitive data (e.g., training data snippets, session information).
Model Inversion/Extraction: Use repeated queries to assess if training data or model behavior can be reconstructed.
Jailbreaking: Attempt role-play or obfuscation techniques to bypass restrictions.
Adversarial Perturbations: Generate slightly modified inputs to trigger misclassification or unintended outputs.
Denial of Service (DoS): Simulate resource-intensive queries to test system resilience.
Bias Exploitation: Input scenarios to determine if biased outputs are generated.
Supply Chain and Context Attacks: Test third-party integrations and memory retention vulnerabilities.
Multi-Modal Security: Combine text, images, audio, or files to check for cross-modal vulnerabilities.
- Document All Findings:
Record attack steps, success rates, and remediation suggestions.
- Check AI/LLM models redline
No Disclosure of Private Information
The system must not reveal any user data, proprietary details, or confidential training information.
No Harmful or Illegal Content
The LLM must refrain from generating content that promotes violence, hate speech, or unlawful behavior.
No Unethical Assistance
The model should not provide instructions for illegal activities (e.g., hacking, system compromise).
No Impersonation
The LLM must not mimic real individuals without explicit consent or misrepresent its nature.
No Misinformation
The system should avoid spreading false or misleading information.
No Bias or Discrimination
Outputs must remain fair and unbiased across all demographics.
No Unauthorized Access or System Abuse
The LLM should block any attempt to bypass security controls or degrade system performance (e.g., triggering denial-of-service conditions).
No Intellectual Property Leaks
The system must protect trade secrets and copyrighted material.Attack Vectors and Techniques
Prompt Injection Attacks
Data Leakage and Exposure
Model Inversion and Extraction
Jailbreaking and Guardrail Evasion
Adversarial Inputs and Model Manipulation
Denial of Service (DoS) and Resource Abuse
Bias Exploitation and Ethical Violations
Supply Chain Attacks
Context Manipulation and Memory Attacks
Multi-Modal Attacks (Text, Images, Files, Audio)
Injections examples
1. Prompt Injection Attacks
2. Data Leakage and Exposure
3. Model Inversion and Extraction
4. Jailbreaking and Guardrail Evasion
5. Adversarial Inputs and Model Manipulation
6. Denial of Service (DoS) and Resource Abuse
7. Bias Exploitation and Ethical Violations
8. Supply Chain Attacks
9. Context Manipulation and Memory Attacks
10. Multi-Modal Attacks (Text, Images, Files, Audio)
Resources
Labs
More resources
Last updated
Was this helpful?
