Autoresearch Agent

Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use when: user wants to optimize code speed, reduce bundle/image size, improve test pass rate, optimize prompts, improve content quality (headlines, copy, CTR), or run any measurable improvement loop. Requires: a target file, an evaluation command that outputs a metric, and a git repo.
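The keep-or-discard rule described above can be sketched as a small pure function. This is an illustrative sketch only, not the skill's actual code: the names `next_action` and `lower_is_better` are hypothetical, and it assumes the metric is a single number where the direction of improvement is known in advance.

```python
from typing import Optional


def next_action(
    candidate: Optional[float],
    best: Optional[float],
    lower_is_better: bool = True,
) -> str:
    """Return "commit" to keep an experiment or "reset" to discard it.

    Hypothetical helper: candidate is the metric from the latest run
    (None if the evaluation failed), best is the best metric so far
    (None before the first successful run).
    """
    if candidate is None:
        # The evaluation crashed or produced no metric: discard.
        return "reset"
    if best is None:
        # First successful run becomes the baseline.
        return "commit"
    improved = candidate < best if lower_is_better else candidate > best
    # Ties count as "no improvement" so the loop never keeps a no-op edit.
    return "commit" if improved else "reset"
```

Treating a tie as a discard is a design choice in this sketch; a loop that accepts ties would keep changes that do not move the metric.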

Gitix AI
· 7 days ago · v1
SkillSpector HIGH
60/100 ✕ DO NOT USE
3 security findings detected
HIGH Output Handling · Unvalidated Output Injection 85% confidence

Match: subprocess.run(["my-benchmark", "--json"], capture_output

Line 215

Model output is used without validation or sanitization. Unvalidated output injected into downstream contexts (SQL, shell, HTML) enables injection attacks and arbitrary code execution.

```python
#!/usr/bin/env python3
# My custom evaluator — DO NOT MODIFY after experiment starts
import subprocess
result = subprocess.run(["my-benchmark", "--json"], capture_output=True, text=True)
# Parse and output
print(f"my_metric: {parse_score(result.stdout)}")
```

Validate and sanitize all model output before using it in downstream contexts. Use parameterized queries for SQL, shell quoting for commands, and HTML encoding for web output.
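One way to apply this advice to the flagged evaluator is to validate the benchmark output strictly before printing the metric. The sketch below is an assumption, not the skill's implementation: the excerpt never defines `parse_score`, and the JSON shape (an object with a numeric `score` field) is hypothetical.

```python
import json
import math


def parse_score(raw: str, key: str = "score") -> float:
    """Extract one finite numeric metric from JSON text, or raise ValueError.

    Assumes (hypothetically) that the benchmark prints a JSON object
    such as {"score": 12.5}; adjust `key` to the real field name.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"benchmark output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or key not in data:
        raise ValueError(f"missing {key!r} field in benchmark output")
    value = data[key]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"{key!r} must be a number, got {type(value).__name__}")
    try:
        score = float(value)
    except OverflowError as exc:
        raise ValueError(f"{key!r} is too large to represent") from exc
    # json.loads accepts NaN and Infinity by default, so filter them here.
    if not math.isfinite(score):
        raise ValueError(f"{key!r} must be finite, got {score}")
    return score
```

Because the function returns a `float` or raises, nothing other than a number can reach the `my_metric:` line that the loop later parses.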

MEDIUM Rogue Agent · Session Persistence 60% confidence

Match: Create the Experiment Run the setup script. The user decides where experiments live: **Project-level** (inside repo, git-tracked, shareable with team): ```bash python scripts/setup_experiment.py \

Line 40

Skill establishes unauthorized persistence across sessions via cron jobs, startup scripts, or state files. Session persistence allows an attacker to maintain access beyond the current interaction.

## Setup

### First Time — Create the Experiment

Run the setup script. The user decides where experiments live:

Remove any persistence mechanisms (cron jobs, startup scripts, state files). Skills should not maintain state across sessions without explicit user consent.

HIGH Tool Misuse · Tool Parameter Abuse 65% confidence

Match: git reset --hard

Line 136

Tool parameters are crafted to achieve unintended or unsafe behavior. Parameter abuse can bypass intended safety checks (e.g. shell=True, --force, dangerous glob patterns).

- Running the eval command with timeout
- Parsing the metric from eval output
- Comparing to previous best
- Reverting the commit on failure (`git reset --hard HEAD~1`)
- Logging the result to results.tsv

### Starting an Experiment

Validate all tool parameters against an allowlist. Reject dangerous parameter values (shell=True, --force, -rf /) and use safe defaults.
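A minimal sketch of such an allowlist for the git calls the loop makes is shown below. Everything in it is illustrative: `validate_git_args`, the set of permitted subcommands, and the rule that `reset` may only target `HEAD` or `HEAD~N` are assumptions, not part of the scanned skill.

```python
import re

# Hypothetical allowlist: only the subcommands an experiment loop plausibly needs.
ALLOWED_SUBCOMMANDS = {"add", "commit", "reset", "status", "rev-parse"}
# Flags rejected regardless of subcommand.
FORBIDDEN_FLAGS = {"--force", "-f", "--no-verify"}
# `git reset` may only point at HEAD or one of its direct ancestors.
RESET_TARGET = re.compile(r"HEAD(~\d+)?")


def validate_git_args(args: list[str]) -> list[str]:
    """Return a full argv list for git, or raise ValueError if it is not allowed."""
    if not args or args[0] not in ALLOWED_SUBCOMMANDS:
        raise ValueError(f"git subcommand not allowed: {args[:1]}")
    for arg in args[1:]:
        if arg in FORBIDDEN_FLAGS:
            raise ValueError(f"forbidden flag: {arg}")
    if args[0] == "reset":
        # Everything that is not a flag is treated as the reset target.
        targets = [a for a in args[1:] if not a.startswith("-")]
        if len(targets) != 1 or not RESET_TARGET.fullmatch(targets[0]):
            raise ValueError(f"reset target not allowed: {targets}")
    return ["git", *args]
```

The returned list can be passed to `subprocess.run(argv, check=True)` without `shell=True`, so the arguments are never interpreted by a shell. Note that `git reset --hard` still discards uncommitted work; the allowlist only limits where it can point.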
