MoltWall Docs v0.1
Introduction
MoltWall is a production-grade security firewall middleware for AI agents. It sits between your agent and its tools, evaluating every action before execution against configurable policies, risk thresholds, and threat detection guardrails.
- Latency: <10ms
- Threats: 8+
- Decisions: 4
- Audit: 100%
What it does
- Evaluates every agent tool call against a policy (allowlist, blocklist, spend limits)
- Computes a 0–1 risk score across 8 weighted factors per request
- Scans arguments recursively for prompt injection, credential leaks, and PII
- Returns a decision: allow / deny / sandbox / require_confirmation
- Persists every decision to Supabase for a full audit trail
- Caches policies in Upstash Redis with stale-while-revalidate for <1ms enforcement
Architecture
MoltWall is a stateless Next.js API layer backed by Supabase (Postgres) and Upstash Redis. The request path is fully deterministic: the policy engine and risk engine make no model calls.
Agent / SDK
│
▼
POST /api/moltwall/check
│
├── 1. Auth (SHA-256 API key lookup)
├── 2. Rate Limit (Redis sliding window, per agent_id)
├── 3. Input Hardening (size, depth, identifier validation)
├── 4. Policy Engine ◀── Redis SWR cache ◀── Supabase
│ ├── evaluateToolAccess()
│ ├── evaluateActionPermission()
│ ├── evaluateDomainTrust()
│ └── evaluateSpendLimit()
├── 5. Risk Engine
│ └── 8-factor weighted score (0–1)
├── 6. Guardrail Engine
│ ├── Prompt Injection Scanner
│ ├── Credential Scanner
│ └── PII Scanner
└── 7. Decision + Audit Log → Supabase
Decision Flow
| Decision | Trigger | Action |
|---|---|---|
| allow | Risk < threshold_allow, no policy violation | Return allow, execute tool |
| require_confirmation | Risk in [allow, sandbox) | Return decision, await human confirm |
| sandbox | Risk in [sandbox, deny) | Execute in isolated sandbox environment |
| deny | Risk ≥ threshold_deny OR blocked action OR guardrail critical | Hard block, log threat |
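The bands in the table can be read as a single threshold comparison. The following is a minimal sketch, not MoltWall's actual implementation: the `decide` function and the `hardDeny` flag (standing in for "blocked action OR guardrail critical") are hypothetical names, and the threshold values are the ones used in the policy example elsewhere in these docs.

```typescript
type Decision = "allow" | "require_confirmation" | "sandbox" | "deny";

interface Thresholds {
  allow: number;   // risk_threshold_allow
  sandbox: number; // risk_threshold_sandbox
  deny: number;    // risk_threshold_deny
}

// hardDeny is a hypothetical flag for policy violations and critical guardrail hits.
function decide(risk: number, t: Thresholds, hardDeny = false): Decision {
  if (hardDeny || risk >= t.deny) return "deny";
  if (risk >= t.sandbox) return "sandbox";              // [sandbox, deny)
  if (risk >= t.allow) return "require_confirmation";   // [allow, sandbox)
  return "allow";                                       // below threshold_allow
}

const thresholds: Thresholds = { allow: 0.3, sandbox: 0.6, deny: 0.8 };
console.log(decide(0.23, thresholds));
```

Checking the highest band first keeps each interval half-open, so a score that lands exactly on a threshold falls into the stricter band.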
Quick Start
Get MoltWall running locally in under 5 minutes.
1. Clone and install
git clone https://github.com/your-org/agent-wall
cd agent-wall
npm install
2. Environment variables
cp .env.example .env.local
# Required:
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
UPSTASH_REDIS_REST_URL=https://your-redis.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-redis-token
3. Run migrations
# In Supabase SQL editor, run:
# supabase/migrations/001_initial.sql
4. Start dev server
npm run dev # → http://localhost:3000
5. Send your first check
curl -X POST http://localhost:3000/api/moltwall/check \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_API_KEY" \
-d '{
"agent_id": "my-agent-001",
"action": "search_web",
"tool": "browser",
"args": { "query": "latest AI news" },
"source": "user"
}'
POST /api/moltwall/check
The primary endpoint. Evaluates an agent action against policies, risk scoring, and guardrails. Returns a decision in milliseconds.
Request
| Field | Type | Required | Description |
|---|---|---|---|
| agent_id | string | ✓ | Unique identifier for the agent making the request |
| action | string | ✓ | The action being requested (e.g. transfer_funds) |
| tool | string | ✓ | Tool ID being invoked (matched against allowlist) |
| args | object | ✗ | Tool arguments. Recursively scanned by guardrails. |
| source | string | ✗ | Origin: one of user, agent, tool, web |
| intent | string | ✗ | High-level goal. Used for intent mismatch detection. |
| session_id | string | ✗ | Session context for grouping related actions. |
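The field table maps directly onto a TypeScript type. The sketch below is an illustration only: the docs state that the server validates with a Zod schema, whereas this uses a hand-written type guard so it has no dependencies. `CheckRequest` and `isCheckRequest` are hypothetical names, and the real schema may enforce more than the types shown here.

```typescript
type Source = "user" | "agent" | "tool" | "web";

interface CheckRequest {
  agent_id: string;
  action: string;
  tool: string;
  args?: Record<string, unknown>;
  source?: Source;
  intent?: string;
  session_id?: string;
}

const SOURCES: readonly string[] = ["user", "agent", "tool", "web"];

// Returns true when the value has the required fields and well-typed optional fields.
function isCheckRequest(value: unknown): value is CheckRequest {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const v = value as Record<string, unknown>;
  for (const key of ["agent_id", "action", "tool"]) {
    if (typeof v[key] !== "string") return false;
  }
  const args = v["args"];
  if (args !== undefined && (typeof args !== "object" || args === null || Array.isArray(args))) {
    return false;
  }
  const source = v["source"];
  if (source !== undefined && (typeof source !== "string" || SOURCES.indexOf(source) === -1)) {
    return false;
  }
  for (const key of ["intent", "session_id"]) {
    if (v[key] !== undefined && typeof v[key] !== "string") return false;
  }
  return true;
}
```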
Response
{
"decision": "allow" | "deny" | "sandbox" | "require_confirmation",
"risk_score": 0.23,
"reason": "Action allowed. Risk within threshold.",
"action_id": "uuid-v4",
"guardrail_threats": [],
"policy_violations": [],
"metadata": {
"policy_applied": true,
"cache_hit": true,
"latency_ms": 7
}
}
Example
const res = await fetch("/api/moltwall/check", {
method: "POST",
headers: {
"Content-Type": "application/json",
"x-api-key": process.env.MoltWall_API_KEY,
},
body: JSON.stringify({
agent_id: "agent-001",
action: "transfer_funds",
tool: "wallet",
args: { amount: 500, recipient: "0xabc123" },
source: "user",
intent: "Pay vendor invoice",
}),
});
const { decision, risk_score, reason } = await res.json();
Error Responses
| Status | Code | Cause |
|---|---|---|
| 400 | VALIDATION_ERROR | Request body fails Zod schema validation |
| 401 | UNAUTHORIZED | Missing or invalid x-api-key header |
| 413 | PAYLOAD_TOO_LARGE | Request body or args exceeds size limits |
| 429 | RATE_LIMITED | Exceeded rate limit for this agent_id |
| 500 | INTERNAL_ERROR | Unexpected server error (details in logs) |
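A client can use the status code to decide whether a retry makes sense. The sketch below is one possible approach under stated assumptions: `retryDelayMs` is a hypothetical helper, the backoff numbers for 500 responses are invented for illustration, and only the delay-in-seconds form of the Retry-After header (which these docs say accompanies 429 responses) is handled.

```typescript
// Returns how long to wait before retrying, or null when a retry will not help.
function retryDelayMs(status: number, retryAfter: string | null, attempt: number): number | null {
  if (status === 429) {
    // Honour Retry-After when it is a non-negative number of seconds.
    const seconds =
      retryAfter === null || retryAfter.trim() === "" ? NaN : Number(retryAfter);
    return isFinite(seconds) && seconds >= 0 ? seconds * 1000 : 1000;
  }
  if (status === 500) {
    // Assumed policy: exponential backoff starting at 250 ms, capped at 8 s.
    return Math.min(8000, 250 * Math.pow(2, attempt));
  }
  // 400, 401, 413: the same request will fail again until it is changed.
  return null;
}
```

For a security gate, it is usually safer to treat an exhausted retry budget as a deny than to let the tool call proceed unchecked.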
POST /api/moltwall/scan
Run only the guardrail engine on arbitrary content. Useful for scanning tool outputs, web pages, or user messages before feeding them to agents.
// Request
{
"content": "string or object to scan",
"source": "web" | "tool" | "user" | "agent"
}
// Response
{
"threats": [
{
"type": "prompt_injection",
"severity": "critical",
"detail": "Instruction override pattern detected",
"field": "content"
}
],
"risk_boost": 0.45,
"should_deny": true
}
Untrusted sources (tool, web) trigger stricter denial logic: any high-severity threat causes immediate denial.
GET | POST | DELETE /api/policy
Retrieve, create/update, or delete the organization's security policy.
GET - Retrieve policy
curl /api/policy -H "x-api-key: YOUR_KEY"
# Returns { policy: PolicyObject | null }
POST - Create or update
| Field | Type | Description |
|---|---|---|
| allowed_tools | string[] | Whitelisted tool IDs. Empty array = all tools permitted. |
| blocked_actions | string[] | Actions that are always denied. |
| trusted_domains | string[] | Domains treated as trusted. Any other domain increases the risk score. |
| sensitive_actions | string[] | Actions flagged for review (not blocked). |
| max_spend_usd | number or null | Maximum monetary value per action. |
| risk_threshold_allow | number (0–1) | Below this score → allow. |
| risk_threshold_sandbox | number (0–1) | From the allow threshold up to this score → require_confirmation; from this score up to the deny threshold → sandbox. |
| risk_threshold_deny | number (0–1) | At or above this score → deny. |
await fetch("/api/policy", {
method: "POST",
headers: { "Content-Type": "application/json", "x-api-key": KEY },
body: JSON.stringify({
allowed_tools: ["browser", "search", "calendar"],
blocked_actions: ["delete_account", "export_all_data"],
trusted_domains: ["github.com", "api.openai.com"],
sensitive_actions: ["payment", "transfer", "send"],
max_spend_usd: 500,
risk_threshold_allow: 0.3,
risk_threshold_sandbox: 0.6,
risk_threshold_deny: 0.8,
}),
});
GET /api/tools · POST /api/tools/register
List registered tools or register a new tool with MoltWall.
Register a tool
await fetch("/api/tools/register", {
method: "POST",
headers: { "Content-Type": "application/json", "x-api-key": KEY },
body: JSON.stringify({
tool_id: "browser-use",
publisher: "Anthropic",
description: "Headless browser automation for web research",
permissions: ["network", "read"],
risk_level: "medium", // "low" | "medium" | "high" | "critical"
}),
});
The description and publisher fields are scanned for prompt injection on registration. Suspicious content is rejected.
GET /api/logs
Query the action audit log with optional filters.
# Query params:
# agent_id - filter by agent
# decision - "allow" | "deny" | "sandbox" | "require_confirmation"
# limit - max results (default 50, max 500)
# offset - pagination offset
curl "/api/logs?decision=deny&limit=20" \
-H "x-api-key: YOUR_KEY"
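When paging through the audit log from code, it helps to build the query string in one place. This is a small sketch assuming only the four query parameters listed above; `buildLogsUrl` and `LogQuery` are hypothetical names, and clamping `limit` on the client is a convenience rather than documented server behavior.

```typescript
type Decision = "allow" | "deny" | "sandbox" | "require_confirmation";

interface LogQuery {
  agentId?: string;
  decision?: Decision;
  limit?: number;  // default 50, max 500 per the docs
  offset?: number; // pagination offset
}

function buildLogsUrl(q: LogQuery = {}): string {
  // Clamp to the documented range so a bad caller value never reaches the API.
  const limit = Math.min(500, Math.max(1, Math.floor(q.limit ?? 50)));
  const offset = Math.max(0, Math.floor(q.offset ?? 0));
  const parts: string[] = [];
  if (q.agentId !== undefined) parts.push("agent_id=" + encodeURIComponent(q.agentId));
  if (q.decision !== undefined) parts.push("decision=" + q.decision);
  parts.push("limit=" + limit, "offset=" + offset);
  return "/api/logs?" + parts.join("&");
}
```

To fetch the next page, call it again with `offset` increased by the previous `limit`.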
Policy Engine
The policy engine performs deterministic rule evaluation. It runs before the risk engine, meaning policy violations can short-circuit to immediate denial without scoring.
Evaluation functions
| Function | Checks | On violation |
|---|---|---|
| evaluateToolAccess() | tool in allowed_tools[] | deny if allowlist non-empty and tool missing |
| evaluateActionPermission() | action in blocked_actions[] | deny immediately |
| evaluateDomainTrust() | domain in trusted_domains[] | adds 0.2 risk boost |
| evaluateSpendLimit() | args.amount ≤ max_spend_usd | deny if exceeded |
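The four checks in the table can be sketched as one pass over the policy. This is an illustration under assumptions, not the engine's source: `evaluatePolicy`, the `PolicyResult` shape, and the violation strings are hypothetical, and the request's domain and amount are assumed to have been extracted from `args` already.

```typescript
interface Policy {
  allowed_tools: string[];
  blocked_actions: string[];
  trusted_domains: string[];
  max_spend_usd: number | null;
}

interface PolicyResult {
  denied: boolean;
  violations: string[]; // hypothetical violation labels
  riskBoost: number;
}

function evaluatePolicy(
  policy: Policy,
  req: { tool: string; action: string; domain?: string; amount?: number },
): PolicyResult {
  const violations: string[] = [];
  let riskBoost = 0;
  // evaluateToolAccess: an empty allowlist permits every tool.
  if (policy.allowed_tools.length > 0 && policy.allowed_tools.indexOf(req.tool) === -1) {
    violations.push("tool_not_allowed");
  }
  // evaluateActionPermission: blocked actions are denied immediately.
  if (policy.blocked_actions.indexOf(req.action) !== -1) {
    violations.push("action_blocked");
  }
  // evaluateDomainTrust: an untrusted domain raises risk but does not deny.
  if (req.domain !== undefined && policy.trusted_domains.indexOf(req.domain) === -1) {
    riskBoost += 0.2;
  }
  // evaluateSpendLimit: deny only when the amount exceeds the cap.
  if (policy.max_spend_usd !== null && req.amount !== undefined && req.amount > policy.max_spend_usd) {
    violations.push("spend_limit_exceeded");
  }
  return { denied: violations.length > 0, violations, riskBoost };
}
```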
Redis caching
Policies are cached in Upstash Redis using stale-while-revalidate (SWR). Cache TTL is 5 minutes; stale threshold is 4 minutes. Negative cache (no-policy) TTL is 30 seconds to avoid repeated DB hits.
// Cache keys (Redis)
CacheKeys.policy(orgId) // → "policy:org:<id>"
CacheKeys.toolRegistry(orgId) // → "tools:org:<id>"
CacheKeys.rateLimit(agentId) // → "rl:agent:<id>"
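The stale-while-revalidate behavior can be illustrated with an in-memory map standing in for Redis. This is a sketch under assumptions: `SwrCache` is a hypothetical class, and it interprets the documented numbers as "fresh for 4 minutes, served stale until 5 minutes, then gone". The clock is injectable so the behavior can be exercised without waiting.

```typescript
type CacheState = "fresh" | "stale" | "miss";

interface Entry<T> {
  value: T;
  storedAt: number;
}

class SwrCache<T> {
  private readonly entries = new Map<string, Entry<T>>();
  private readonly staleMs: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(staleMs: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.staleMs = staleMs;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, storedAt: this.now() });
  }

  // "stale" means: use the value now, and refresh it in the background.
  get(key: string): { value?: T; state: CacheState } {
    const entry = this.entries.get(key);
    if (entry === undefined) return { state: "miss" };
    const age = this.now() - entry.storedAt;
    if (age >= this.ttlMs) {
      this.entries.delete(key);
      return { state: "miss" };
    }
    return { value: entry.value, state: age >= this.staleMs ? "stale" : "fresh" };
  }
}
```

On a `"stale"` result the caller would return the cached policy immediately and reload it from Supabase without blocking the request; on a `"miss"` it has to load before deciding.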
Risk Engine
The risk engine computes a normalized 0–1 score from 8 weighted factors. The score determines which decision band applies.
Risk factors
| Factor | Weight | Trigger |
|---|---|---|
| payment_action | +0.35 | action contains: payment, transfer, withdraw, send, pay |
| unknown_tool | +0.25 | tool not in registered tool registry |
| untrusted_domain | +0.20 | domain in args not in trusted_domains policy |
| sensitive_args | +0.20 | args contain: password, secret, token, key, private |
| intent_mismatch | +0.15 | action semantically distant from stated intent |
| high_risk_source | +0.30 | source is tool or web (indirect attack surface) |
| agent_source | +0.05 | source is agent (elevated vs user) |
| guardrail_boost | variable | from guardrail scan: 0.2 (high) or 0.4 (critical) |
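As a worked illustration of the table, the score is the sum of the triggered weights plus the guardrail boost, capped at 1 and floored at the 0.02 minimum this section describes. The sketch below is hypothetical: `riskScore` and the `WEIGHTS` constant are illustrative names, and it assumes each factor appears at most once in the input.

```typescript
const WEIGHTS = {
  payment_action: 0.35,
  unknown_tool: 0.25,
  untrusted_domain: 0.2,
  sensitive_args: 0.2,
  intent_mismatch: 0.15,
  high_risk_source: 0.3,
  agent_source: 0.05,
} as const;

type Factor = keyof typeof WEIGHTS;

// guardrailBoost is the variable eighth factor: 0.2 (high) or 0.4 (critical).
function riskScore(triggered: Factor[], guardrailBoost = 0): number {
  let sum = guardrailBoost;
  for (const factor of triggered) {
    sum += WEIGHTS[factor];
  }
  // Cap at 1, and never report less than 0.02.
  return Math.max(0.02, Math.min(1, sum));
}
```

For example, a `transfer_funds` call arriving from a `web` source triggers `payment_action` and `high_risk_source`, which sums to roughly 0.65 before any guardrail boost.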
The final score is Math.min(1, sum_of_factors). A score of 0 is never returned: the minimum is 0.02, to reflect inherent uncertainty.
Guardrail Engine
The guardrail engine runs after risk scoring. It recursively inspects all string values in the request payload using three specialized scanners.
Prompt Injection Scanner
Detects attempts to override agent instructions or exfiltrate data via crafted input.
| Pattern Class | Examples |
|---|---|
| Instruction override | ignore previous instructions, new directive, system prompt |
| Jailbreak | pretend you are, DAN mode, developer mode enabled |
| Data extraction | repeat everything above, print your instructions |
| Indirect HTML/XML | <!-- inject -->, <script>, markdown code blocks with commands |
| Tool poisoning | Tool descriptions containing always/never + action verbs |
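Recursive scanning of every string value can be sketched as a walk over the payload that records where each match was found. This is a simplified illustration, not the shipped scanner: `scan`, the `Threat` shape, and the handful of regular expressions (taken from the examples in the table) are assumptions, and a real pattern set would be much larger.

```typescript
interface Threat {
  type: "prompt_injection";
  field: string; // path to the offending value
  match: string; // the text that matched
}

// A few illustrative patterns drawn from the table's examples.
const PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /pretend you are/i,
  /developer mode enabled/i,
  /repeat everything above/i,
  /print your instructions/i,
  /<script\b/i,
];

function scan(value: unknown, path = "args", out: Threat[] = []): Threat[] {
  if (typeof value === "string") {
    for (const re of PATTERNS) {
      const m = re.exec(value);
      if (m !== null) out.push({ type: "prompt_injection", field: path, match: m[0] });
    }
  } else if (Array.isArray(value)) {
    value.forEach((item, i) => scan(item, path + "[" + i + "]", out));
  } else if (typeof value === "object" && value !== null) {
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(obj)) {
      scan(obj[key], path + "." + key, out);
    }
  }
  return out;
}
```

Recording the path alongside the match makes the audit log entry actionable, since it shows which argument carried the payload.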
Credential Scanner
Detects API keys, tokens, private keys, and high-entropy strings that may indicate credential leakage.
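High-entropy detection is commonly done with Shannon entropy over the characters of a string. The sketch below shows that general technique, not MoltWall's detector: `shannonEntropy` and `looksLikeSecret` are hypothetical names, and the length and bits-per-character thresholds are illustrative values that would need tuning against real traffic.

```typescript
// Shannon entropy in bits per character.
function shannonEntropy(s: string): number {
  if (s.length === 0) return 0;
  const counts: Record<string, number> = {};
  for (let i = 0; i < s.length; i++) {
    const ch = s.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let bits = 0;
  for (const ch of Object.keys(counts)) {
    const p = (counts[ch] || 0) / s.length;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Assumed thresholds: long enough to be a key, and close to random.
function looksLikeSecret(s: string, minLength = 20, minBits = 3.5): boolean {
  return s.length >= minLength && shannonEntropy(s) >= minBits;
}
```

Entropy alone produces false positives on hashes, UUIDs, and base64 payloads, so it works best combined with known key prefixes and formats.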
PII Scanner
Detects email addresses, phone numbers, SSNs, credit card numbers, and passport patterns.
Severity levels
For untrusted sources (tool, web), any high or critical severity threat triggers immediate denial, regardless of risk score.
SDK - Installation
The MoltWall SDK is a thin TypeScript client that wraps the REST API.
npm install @MoltWall/sdk # or: pnpm add @MoltWall/sdk
import { MoltWall } from "@MoltWall/sdk";
const wall = new MoltWall({
baseUrl: "https://your-MoltWall.vercel.app",
apiKey: process.env.MoltWall_API_KEY!,
agentId: "my-agent", // default agent_id for all checks
});
SDK - check()
Evaluate an action before execution. This is the core method.
const result = await wall.check({
action: "send_email",
tool: "gmail",
args: { to: "boss@company.com", subject: "Urgent" },
source: "user",
intent: "Send weekly report",
sessionId: "session-abc",
});
switch (result.decision) {
case "allow":
return await executeTool(result);
case "deny":
throw new Error(`Blocked: ${result.reason}`);
case "require_confirmation":
return await promptUser(result);
case "sandbox":
return await executeIsolated(result);
}
SDK - setPolicy()
await wall.setPolicy({
allowedTools: ["browser", "search"],
blockedActions: ["delete_account"],
trustedDomains: ["api.openai.com"],
sensitiveActions: ["payment", "transfer"],
maxSpendUsd: 1000,
riskThresholdAllow: 0.3,
riskThresholdSandbox: 0.6,
riskThresholdDeny: 0.8,
});
SDK - registerTool()
await wall.registerTool({
toolId: "browser-use",
publisher: "Anthropic",
description: "Headless browser for web research",
permissions: ["network", "read"],
riskLevel: "medium",
});
Authentication
All API endpoints require an x-api-key header. Keys are stored SHA-256 hashed in Supabase.
// Every request:
headers: {
"x-api-key": "MoltWall_live_your_key_here"
}
Key format
Keys follow the pattern MoltWall_live_<random-32-bytes-hex>. Generate with:
node -e "console.log('MoltWall_live_' + require('crypto').randomBytes(32).toString('hex'))"
Rate Limiting
Sliding window rate limiting enforced per agent_id via Upstash Redis.
| Parameter | Default | Notes |
|---|---|---|
| Window | 60 seconds | Rolling window |
| Limit | 100 requests | Per agent_id per window |
| Algorithm | Sliding window | Redis ZADD + ZREMRANGEBYSCORE |
| Response on limit | HTTP 429 | Retry-After header included |
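The sliding window can be modeled in memory to see how the two Redis commands cooperate. In this sketch an array of timestamps per agent stands in for the sorted set; `SlidingWindowLimiter` is a hypothetical class, and not counting rejected requests against the window is an assumption rather than documented behavior.

```typescript
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 100, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the request is allowed at time nowMs.
  allow(agentId: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Like ZREMRANGEBYSCORE: drop timestamps that have left the window.
    const recent = (this.hits.get(agentId) || []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    // Like ZADD: record this request only if it was admitted.
    if (allowed) recent.push(nowMs);
    this.hits.set(agentId, recent);
    return allowed;
  }
}
```

Unlike a fixed window, this never admits a burst of twice the limit across a window boundary, because the count always covers the most recent 60 seconds.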
Input Hardening
All inputs are hardened before processing. Limits are enforced at the boundary before any business logic runs.
| Limit | Value | Error |
|---|---|---|
| Request body size | 50KB | HTTP 413 |
| Args JSON size | 16KB | HTTP 413 |
| Object nesting depth | 10 levels | HTTP 400 |
| Individual string length | 4096 chars | HTTP 400 |
| tool_id format | Alphanumeric + - _ | HTTP 400 |
| action format | Alphanumeric + - _ / | HTTP 400 |
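The depth, string-length, and identifier limits lend themselves to a short validator. The following is a sketch under assumptions: `validateArgs` and the two regular expressions are hypothetical, the top-level object is counted as depth 1, arrays count as a nesting level, and the byte-size limits are left out because they are normally checked on the raw request body.

```typescript
const MAX_DEPTH = 10;
const MAX_STRING = 4096;
const TOOL_ID = /^[A-Za-z0-9_-]+$/;  // alphanumeric plus - and _
const ACTION = /^[A-Za-z0-9_\/-]+$/; // alphanumeric plus -, _ and /

// Returns an error message for the first violation found, or null when the value is acceptable.
function validateArgs(value: unknown, depth = 1): string | null {
  if (typeof value === "string") {
    return value.length > MAX_STRING ? "string too long" : null;
  }
  if (typeof value !== "object" || value === null) return null;
  if (depth > MAX_DEPTH) return "nesting too deep";
  const obj = value as Record<string, unknown>;
  // Object.keys also yields array indices, so arrays are walked the same way.
  for (const key of Object.keys(obj)) {
    const err = validateArgs(obj[key], depth + 1);
    if (err !== null) return err;
  }
  return null;
}
```

Checking depth before descending bounds the recursion, so a hostile payload cannot exhaust the stack before it is rejected.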
Threat Model
MoltWall is designed to defend against these threat vectors:
Prompt Injection
User or tool output crafted to override agent instructions. Detected by the injection scanner on all string values in args.
Tool Poisoning
Malicious tool definitions that embed instructions. Scanned on registration and on every check request.
Credential Theft
Attempts to exfiltrate API keys, tokens, or secrets via tool arguments. High-entropy string detection.
Data Exfiltration
Indirect attacks via web content or tool outputs containing extraction instructions. Stricter rules for source=web/tool.
Wallet Drain / Overspend
Unbounded monetary actions. Enforced via max_spend_usd policy field and payment-action risk weighting.
DoS via Payload Size
Oversized arguments causing ReDoS or processing stalls. Enforced via body size, depth, and string length limits.
Environment Variables
| Variable | Required | Description |
|---|---|---|
| NEXT_PUBLIC_SUPABASE_URL | ✓ | Your Supabase project URL |
| NEXT_PUBLIC_SUPABASE_ANON_KEY | ✓ | Supabase anon/public key |
| SUPABASE_SERVICE_ROLE_KEY | ✓ | Service role key (server-side only) |
| UPSTASH_REDIS_REST_URL | ✓ | Upstash Redis REST endpoint |
| UPSTASH_REDIS_REST_TOKEN | ✓ | Upstash Redis REST token |
| MoltWall_MASTER_KEY | ✗ | Admin API key for dashboard access |
Never commit SUPABASE_SERVICE_ROLE_KEY or UPSTASH_REDIS_REST_TOKEN to version control. Keep them in .env.local, add that file to .gitignore, and set them in your deployment environment.
Deploy to Vercel
MoltWall is designed to deploy on Vercel with zero configuration changes.
# 1. Push to GitHub
git push origin main
# 2. Import project at vercel.com/new
# 3. Add environment variables in Vercel dashboard
# 4. Deploy - done.
All routes use the nodejs runtime. Edge runtime is not supported due to the Supabase client. Each route is independently deployed as a serverless function.
Supabase Setup
MoltWall requires five tables in Supabase Postgres.
| Table | Purpose |
|---|---|
| organizations | Multi-tenant org records |
| api_keys | SHA-256 hashed API keys with org_id FK |
| policies | One policy per org: thresholds, allowlists, blocklists |
| tools | Registered tool definitions with risk level |
| actions | Full audit log: every check decision |
# Run in Supabase SQL editor:
# File: supabase/migrations/001_initial.sql
Upstash Redis
Redis is used for policy and tool-registry caching, rate limiting, and agent session context.
| Key Pattern | TTL | Purpose |
|---|---|---|
| policy:org:<id> | 5 min | Cached policy object (SWR) |
| tools:org:<id> | 5 min | Cached tool registry |
| rl:agent:<id> | 60 sec | Rate limit sliding window (sorted set) |
| session:<id> | 30 min | Agent session context |