MOLT

MoltWall Docs v0.1

Introduction

MoltWall is a production-grade security firewall middleware for AI agents. It sits between your agent and its tools, evaluating every action before execution against configurable policies, risk thresholds, and threat detection guardrails.

<10ms

Latency

8+

Threats

4

Decisions

100%

Audit

What it does

  • Evaluates every agent tool call against a policy (allowlist, blocklist, spend limits)
  • Computes a 0–1 risk score across 8 weighted factors per request
  • Scans arguments recursively for prompt injection, credential leaks, and PII
  • Returns a decision: allow / deny / sandbox / require_confirmation
  • Persists every decision to Supabase for full audit trail
  • Caches policies in Upstash Redis with stale-while-revalidate for <1ms enforcement

Architecture

MoltWall is a stateless Next.js API layer backed by Supabase (Postgres) and Upstash Redis. The request path is fully deterministic -the policy engine and risk engine have no model calls.

text
Agent / SDK
      │
      ▼
  POST /api/moltwall/check
      │
      ├── 1. Auth (SHA-256 API key lookup)
      ├── 2. Rate Limit (Redis sliding window, per agent_id)
      ├── 3. Input Hardening (size, depth, identifier validation)
      ├── 4. Policy Engine  ◀── Redis SWR cache ◀── Supabase
      │       ├── evaluateToolAccess()
      │       ├── evaluateActionPermission()
      │       ├── evaluateDomainTrust()
      │       └── evaluateSpendLimit()
      ├── 5. Risk Engine
      │       └── 8-factor weighted score (0–1)
      ├── 6. Guardrail Engine
      │       ├── Prompt Injection Scanner
      │       ├── Credential Scanner
      │       └── PII Scanner
      └── 7. Decision + Audit Log → Supabase

Decision Flow

DecisionTriggerAction
allowRisk < threshold_allow, no policy violationReturn allow, execute tool
require_confirmationRisk in [allow, sandbox)Return decision, await human confirm
sandboxRisk in [sandbox, deny)Execute in isolated sandbox environment
denyRisk ≥ threshold_deny OR blocked action OR guardrail criticalHard block, log threat

Quick Start

Get MoltWall running locally in under 5 minutes.

1. Clone and install

bash
git clone https://github.com/your-org/agent-wall
cd agent-wall
npm install

2. Environment variables

bash
cp .env.example .env.local

# Required:
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
UPSTASH_REDIS_REST_URL=https://your-redis.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-redis-token

3. Run migrations

bash
# In Supabase SQL editor, run:
supabase/migrations/001_initial.sql

4. Start dev server

bash
npm run dev
# → http://localhost:3000

5. Send your first check

bash
curl -X POST http://localhost:3000/api/moltwall/check \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "agent_id": "my-agent-001",
    "action": "search_web",
    "tool": "browser",
    "args": { "query": "latest AI news" },
    "source": "user"
  }'
If no policy is configured, MoltWall runs in permissive mode -all tools allowed, decisions based on risk score only.

POST /api/moltwall/check

The primary endpoint. Evaluates an agent action against policies, risk scoring, and guardrails. Returns a decision in milliseconds.

Request

FieldTypeRequiredDescription
agent_idstringUnique identifier for the agent making the request
actionstringThe action being requested (e.g. transfer_funds)
toolstringTool ID being invoked (matched against allowlist)
argsobjectTool arguments. Recursively scanned by guardrails.
sourcestringOrigin: user | agent | tool | web
intentstringHigh-level goal. Used for intent mismatch detection.
session_idstringSession context for grouping related actions.

Response

json
{
  "decision": "allow" | "deny" | "sandbox" | "require_confirmation",
  "risk_score": 0.23,
  "reason": "Action allowed. Risk within threshold.",
  "action_id": "uuid-v4",
  "guardrail_threats": [],
  "policy_violations": [],
  "metadata": {
    "policy_applied": true,
    "cache_hit": true,
    "latency_ms": 7
  }
}

Example

typescript
const res = await fetch("/api/moltwall/check", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": process.env.MoltWall_API_KEY,
  },
  body: JSON.stringify({
    agent_id: "agent-001",
    action:   "transfer_funds",
    tool:     "wallet",
    args:     { amount: 500, recipient: "0xabc123" },
    source:   "user",
    intent:   "Pay vendor invoice",
  }),
});

const { decision, risk_score, reason } = await res.json();

Error Responses

StatusCodeCause
400VALIDATION_ERRORRequest body fails Zod schema validation
401UNAUTHORIZEDMissing or invalid x-api-key header
429RATE_LIMITEDExceeded rate limit for this agent_id
413PAYLOAD_TOO_LARGERequest body or args exceeds size limits
500INTERNAL_ERRORUnexpected server error (details in logs)

POST /api/MoltWall/scan

Run only the guardrail engine on arbitrary content. Useful for scanning tool outputs, web pages, or user messages before feeding them to agents.

json
// Request
{
  "content": "string or object to scan",
  "source": "web" | "tool" | "user" | "agent"
}

// Response
{
  "threats": [
    {
      "type": "prompt_injection",
      "severity": "critical",
      "detail": "Instruction override pattern detected",
      "field": "content"
    }
  ],
  "risk_boost": 0.45,
  "should_deny": true
}
Indirect sources (tool, web) trigger stricter denial logic -any high-severity threat causes immediate denial.

GET | POST | DELETE /api/policy

Retrieve, create/update, or delete the organization's security policy.

GET -Retrieve policy

bash
curl /api/policy -H "x-api-key: YOUR_KEY"
# Returns { policy: PolicyObject | null }

POST -Create or update

FieldTypeDescription
allowed_toolsstring[]Whitelisted tool IDs. Empty array = all tools permitted.
blocked_actionsstring[]Actions that are always denied.
trusted_domainsstring[]Untrusted domains increase risk score.
sensitive_actionsstring[]Actions flagged for review (not blocked).
max_spend_usdnumber | nullMaximum monetary value per action.
risk_threshold_allownumber (0–1)Below this score → allow.
risk_threshold_sandboxnumber (0–1)Below this score → require_confirmation.
risk_threshold_denynumber (0–1)At or above this score → deny.
typescript
await fetch("/api/policy", {
  method: "POST",
  headers: { "Content-Type": "application/json", "x-api-key": KEY },
  body: JSON.stringify({
    allowed_tools: ["browser", "search", "calendar"],
    blocked_actions: ["delete_account", "export_all_data"],
    trusted_domains: ["github.com", "api.openai.com"],
    sensitive_actions: ["payment", "transfer", "send"],
    max_spend_usd: 500,
    risk_threshold_allow: 0.3,
    risk_threshold_sandbox: 0.6,
    risk_threshold_deny: 0.8,
  }),
});

GET /api/tools · POST /api/tools/register

List registered tools or register a new tool with MoltWall.

Register a tool

typescript
await fetch("/api/tools/register", {
  method: "POST",
  headers: { "Content-Type": "application/json", "x-api-key": KEY },
  body: JSON.stringify({
    tool_id:     "browser-use",
    publisher:   "Anthropic",
    description: "Headless browser automation for web research",
    permissions: ["network", "read"],
    risk_level:  "medium",   // "low" | "medium" | "high" | "critical"
  }),
});
Tool description and publisher fields are scanned for prompt injection on registration. Suspicious content is rejected.

GET /api/logs

Query the action audit log with optional filters.

bash
# Query params:
# agent_id   -filter by agent
# decision   -"allow" | "deny" | "sandbox" | "require_confirmation"
# limit      -max results (default 50, max 500)
# offset     -pagination offset

curl "/api/logs?decision=deny&limit=20" \
  -H "x-api-key: YOUR_KEY"

Policy Engine

The policy engine performs deterministic rule evaluation. It runs before the risk engine, meaning policy violations can short-circuit to immediate denial without scoring.

Evaluation functions

FunctionChecksOn violation
evaluateToolAccess()tool in allowed_tools[]deny if allowlist non-empty and tool missing
evaluateActionPermission()action in blocked_actions[]deny immediately
evaluateDomainTrust()domain in trusted_domains[]adds 0.2 risk boost
evaluateSpendLimit()args.amount ≤ max_spend_usddeny if exceeded

Redis caching

Policies are cached in Upstash Redis using stale-while-revalidate (SWR). Cache TTL is 5 minutes; stale threshold is 4 minutes. Negative cache (no-policy) TTL is 30 seconds to avoid repeated DB hits.

typescript
// Cache keys (Redis)
CacheKeys.policy(orgId)       // → "policy:org:<id>"
CacheKeys.toolRegistry(orgId) // → "tools:org:<id>"
CacheKeys.rateLimit(agentId)  // → "rl:agent:<id>"

Risk Engine

The risk engine computes a normalized 0–1 score from 8 weighted factors. The score determines which decision band applies.

Risk factors

FactorWeightTrigger
payment_action+0.35action contains: payment, transfer, withdraw, send, pay
unknown_tool+0.25tool not in registered tool registry
untrusted_domain+0.20domain in args not in trusted_domains policy
sensitive_args+0.20args contain: password, secret, token, key, private
intent_mismatch+0.15action semantically distant from stated intent
high_risk_source+0.30source is tool or web (indirect attack surface)
agent_source+0.05source is agent (elevated vs user)
guardrail_boostvariablefrom guardrail scan: 0.2 (high) or 0.4 (critical)
Final score is Math.min(1, sum_of_factors). A score of 0 is never returned -minimum is 0.02 to reflect inherent uncertainty.

Guardrail Engine

The guardrail engine runs after risk scoring. It recursively inspects all string values in the request payload using three specialized scanners.

Prompt Injection Scanner

Detects attempts to override agent instructions or exfiltrate data via crafted input.

Pattern ClassExamples
Instruction overrideignore previous instructions, new directive, system prompt
Jailbreakpretend you are, DAN mode, developer mode enabled
Data extractionrepeat everything above, print your instructions
Indirect HTML/XML<!-- inject -->, <script>, markdown code blocks with commands
Tool poisoningTool descriptions containing always/never + action verbs

Credential Scanner

Detects API keys, tokens, private keys, and high-entropy strings that may indicate credential leakage.

PII Scanner

Detects email addresses, phone numbers, SSNs, credit card numbers, and passport patterns.

Severity levels

info → +0.0medium → +0.1high → +0.2critical → +0.4
For indirect sources (tool, web), any high or critical severity threat triggers immediate denial, regardless of risk score.

SDK -Installation

The MoltWall SDK is a thin TypeScript client that wraps the REST API.

bash
npm install @MoltWall/sdk
# or: pnpm add @MoltWall/sdk
typescript
import { MoltWall } from "@MoltWall/sdk";

const wall = new MoltWall({
  baseUrl: "https://your-MoltWall.vercel.app",
  apiKey:  process.env.MoltWall_API_KEY!,
  agentId: "my-agent",  // default agent_id for all checks
});

SDK -check()

Evaluate an action before execution. This is the core method.

typescript
const result = await wall.check({
  action:    "send_email",
  tool:      "gmail",
  args:      { to: "boss@company.com", subject: "Urgent" },
  source:    "user",
  intent:    "Send weekly report",
  sessionId: "session-abc",
});

switch (result.decision) {
  case "allow":
    return await executeTool(result);
  case "deny":
    throw new Error(`Blocked: ${result.reason}`);
  case "require_confirmation":
    return await promptUser(result);
  case "sandbox":
    return await executeIsolated(result);
}

SDK -setPolicy()

typescript
await wall.setPolicy({
  allowedTools:        ["browser", "search"],
  blockedActions:      ["delete_account"],
  trustedDomains:      ["api.openai.com"],
  sensitiveActions:    ["payment", "transfer"],
  maxSpendUsd:         1000,
  riskThresholdAllow:  0.3,
  riskThresholdSandbox: 0.6,
  riskThresholdDeny:   0.8,
});

SDK -registerTool()

typescript
await wall.registerTool({
  toolId:      "browser-use",
  publisher:   "Anthropic",
  description: "Headless browser for web research",
  permissions: ["network", "read"],
  riskLevel:   "medium",
});

Authentication

All API endpoints require an x-api-key header. Keys are stored SHA-256 hashed in Supabase.

typescript
// Every request:
headers: {
  "x-api-key": "MoltWall_live_your_key_here"
}
Never expose API keys in client-side code. Always use environment variables and server-side calls.

Key format

Keys follow the pattern MoltWall_live_<random-32-bytes-hex>. Generate with:

bash
node -e "console.log('MoltWall_live_' + require('crypto').randomBytes(32).toString('hex'))"

Rate Limiting

Sliding window rate limiting enforced per agent_id via Upstash Redis.

ParameterDefaultNotes
Window60 secondsRolling window
Limit100 requestsPer agent_id per window
AlgorithmSliding windowRedis ZADD + ZREMRANGEBYSCORE
Response on limitHTTP 429Retry-After header included

Input Hardening

All inputs are hardened before processing. Limits are enforced at the boundary before any business logic runs.

LimitValueError
Request body size50KBHTTP 413
Args JSON size16KBHTTP 413
Object nesting depth10 levelsHTTP 400
Individual string length4096 charsHTTP 400
tool_id formatAlphanumeric + - _HTTP 400
action formatAlphanumeric + - _ /HTTP 400

Threat Model

MoltWall is designed to defend against these threat vectors:

01

Prompt Injection

User or tool output crafted to override agent instructions. Detected by the injection scanner on all string values in args.

02

Tool Poisoning

Malicious tool definitions that embed instructions. Scanned on registration and on every check request.

03

Credential Theft

Attempts to exfiltrate API keys, tokens, or secrets via tool arguments. High-entropy string detection.

04

Data Exfiltration

Indirect attacks via web content or tool outputs containing extraction instructions. Stricter rules for source=web/tool.

05

Wallet Drain / Overspend

Unbounded monetary actions. Enforced via max_spend_usd policy field and payment-action risk weighting.

06

DoS via Payload Size

Oversized arguments causing ReDoS or processing stalls. Enforced via body size, depth, and string length limits.

Environment Variables

VariableRequiredDescription
NEXT_PUBLIC_SUPABASE_URLYour Supabase project URL
NEXT_PUBLIC_SUPABASE_ANON_KEYSupabase anon/public key
SUPABASE_SERVICE_ROLE_KEYService role key (server-side only)
UPSTASH_REDIS_REST_URLUpstash Redis REST endpoint
UPSTASH_REDIS_REST_TOKENUpstash Redis REST token
MoltWall_MASTER_KEYAdmin API key for dashboard access
Never commit SUPABASE_SERVICE_ROLE_KEY or UPSTASH_REDIS_REST_TOKEN to version control. Add them to .gitignore and set them in your deployment environment.

Deploy to Vercel

MoltWall is designed to deploy on Vercel with zero configuration changes.

bash
# 1. Push to GitHub
git push origin main

# 2. Import project at vercel.com/new
# 3. Add environment variables in Vercel dashboard
# 4. Deploy -done.
All API routes use the nodejs runtime. Edge runtime is not supported due to the Supabase client. Each route is independently deployed as a serverless function.

Supabase Setup

MoltWall requires five tables in Supabase Postgres.

TablePurpose
organizationsMulti-tenant org records
api_keysSHA-256 hashed API keys with org_id FK
policiesOne policy per org: thresholds, allowlists, blocklists
toolsRegistered tool definitions with risk level
actionsFull audit log: every check decision
bash
# Run in Supabase SQL editor:
# File: supabase/migrations/001_initial.sql

Upstash Redis

Redis is used for three things: policy caching, rate limiting, and agent session context.

Key PatternTTLPurpose
policy:org:<id>5 minCached policy object (SWR)
tools:org:<id>5 minCached tool registry
rl:agent:<id>60 secRate limit sliding window (sorted set)
session:<id>30 minAgent session context
Create a free Upstash Redis database at upstash.com. Use the REST API endpoint and token -no connection pooling required.