AI Security

Use when assessing AI/ML systems for prompt injection, jailbreak vulnerabilities, model inversion risk, data poisoning exposure, or agent tool abuse. Covers MITRE ATLAS technique mapping, injection signature detection, and adversarial robustness scoring.

Gitix AI
Gitix AI
· 7 days ago · v1
SkillSpector HIGH
60/100 ✕ DO NOT USE
3 security findings detected
MEDIUM Excessive Agency · Autonomous Decision Making 85% confidence

Match: skip confirmation

Line 207

Skill enables autonomous high-impact decisions without human-in-the-loop verification. Critical operations (destructive commands, financial transactions, data deletion) should require explicit user confirmation.

|--------|-------------|-----------------|-----------|
| Direct tool injection | Prompt explicitly requests destructive tool call | AML.T0051.002 | tool_abuse signature match |
| Indirect tool hijacking | Malicious content in retrieved document triggers tool call | AML.T0051.001 | Indirect injection detection |
| Approval gate bypass | Prompt asks agent to skip confirmation steps | AML.T0051.002 | "bypass" + "approval" pattern |
| Privilege escalation via tools | Agent uses tools to access resources outside scope | AML.T0051 | Resource access scope monitoring |

### Tool Abuse Mitigations

Add human-in-the-loop confirmation for destructive, irreversible, or high-impact operations. Never auto-execute commands that modify files, send data, or alter system state.

HIGH Memory Poisoning · Memory Manipulation 90% confidence

Match: Poison Training

Line 232

Skill manipulates agent memory, state, or stored context. Memory corruption can alter personality, override safety rules, or cause unpredictable behavior.

| AML.T0051.001 | Indirect Prompt Injection | Initial Access | External content injection patterns |
| AML.T0051.002 | Agent Tool Abuse | Execution | Tool abuse signature detection |
| AML.T0056 | LLM Data Extraction | Exfiltration | System prompt extraction detection |
| AML.T0020 | Poison Training Data | Persistence | Data poisoning risk scoring |
| AML.T0043 | Craft Adversarial Data | Defense Evasion | Adversarial robustness scoring for classifiers |
| AML.T0024 | Exfiltration via ML Inference API | Exfiltration | Model inversion risk scoring |

Protect agent memory and state from modification by untrusted content. Use read-only memory for critical instructions and validate all state changes.

HIGH Prompt Injection · Instruction Override 90% confidence

Match: bypass safety

Line 134

This pattern attempts to override system instructions or ignore safety constraints. Without LLM analysis, manual review is recommended.

## Jailbreak Assessment

Jailbreak attempts bypass safety alignment training through roleplay framing, persona manipulation, or hypothetical context framing.

### Jailbreak Taxonomy

Remove or rewrite any text that instructs the agent to ignore prompts, override safety rules, or trust unverified content. Ensure skill content cannot be injected to alter agent behavior.

4
0
0
0

Comments (0)

Sign in to leave a comment.

No comments yet. Be the first!