MoltWall - AI Agent Security Firewall

MOLT

MoltWall Docs v0.1

Introduction

MoltWall is a production-grade security firewall middleware for AI agents. It sits between your agent and its tools, evaluating every action before execution against configurable policies, risk thresholds, and threat detection guardrails.

<10ms

Latency

Threats

Decisions

100%

Audit

What it does

→Evaluates every agent tool call against a policy (allowlist, blocklist, spend limits)
→Computes a 0–1 risk score across 8 weighted factors per request
→Scans arguments recursively for prompt injection, credential leaks, and PII
→Returns a decision: allow / deny / sandbox / require_confirmation
→Persists every decision to Supabase for full audit trail
→Caches policies in Upstash Redis with stale-while-revalidate for <1ms enforcement

Architecture

MoltWall is a stateless Next.js API layer backed by Supabase (Postgres) and Upstash Redis. The request path is fully deterministic -the policy engine and risk engine have no model calls.

text

Agent / SDK
      │
      ▼
  POST /api/moltwall/check
      │
      ├── 1. Auth (SHA-256 API key lookup)
      ├── 2. Rate Limit (Redis sliding window, per agent_id)
      ├── 3. Input Hardening (size, depth, identifier validation)
      ├── 4. Policy Engine  ◀── Redis SWR cache ◀── Supabase
      │       ├── evaluateToolAccess()
      │       ├── evaluateActionPermission()
      │       ├── evaluateDomainTrust()
      │       └── evaluateSpendLimit()
      ├── 5. Risk Engine
      │       └── 8-factor weighted score (0–1)
      ├── 6. Guardrail Engine
      │       ├── Prompt Injection Scanner
      │       ├── Credential Scanner
      │       └── PII Scanner
      └── 7. Decision + Audit Log → Supabase

Decision Flow

Decision	Trigger	Action
allow	Risk < threshold_allow, no policy violation	Return allow, execute tool
require_confirmation	Risk in [allow, sandbox)	Return decision, await human confirm
sandbox	Risk in [sandbox, deny)	Execute in isolated sandbox environment
deny	Risk ≥ threshold_deny OR blocked action OR guardrail critical	Hard block, log threat

Quick Start

Get MoltWall running locally in under 5 minutes.

1. Clone and install

bash

git clone https://github.com/your-org/agent-wall
cd agent-wall
npm install

2. Environment variables

bash

cp .env.example .env.local

# Required:
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
UPSTASH_REDIS_REST_URL=https://your-redis.upstash.io
UPSTASH_REDIS_REST_TOKEN=your-redis-token

3. Run migrations

bash

# In Supabase SQL editor, run:
supabase/migrations/001_initial.sql

4. Start dev server

bash

npm run dev
# → http://localhost:3000

5. Send your first check

bash

curl -X POST http://localhost:3000/api/moltwall/check \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "agent_id": "my-agent-001",
    "action": "search_web",
    "tool": "browser",
    "args": { "query": "latest AI news" },
    "source": "user"
  }'

⚡

If no policy is configured, MoltWall runs in permissive mode -all tools allowed, decisions based on risk score only.

POST /api/moltwall/check

The primary endpoint. Evaluates an agent action against policies, risk scoring, and guardrails. Returns a decision in milliseconds.

Request

Field	Type	Required	Description
agent_id	string	✓	Unique identifier for the agent making the request
action	string	✓	The action being requested (e.g. `transfer_funds`)
tool	string	✓	Tool ID being invoked (matched against allowlist)
args	object	✗	Tool arguments. Recursively scanned by guardrails.
source	string	✗	Origin: `user` \| `agent` \| `tool` \| `web`
intent	string	✗	High-level goal. Used for intent mismatch detection.
session_id	string	✗	Session context for grouping related actions.

Response

json

{
  "decision": "allow" | "deny" | "sandbox" | "require_confirmation",
  "risk_score": 0.23,
  "reason": "Action allowed. Risk within threshold.",
  "action_id": "uuid-v4",
  "guardrail_threats": [],
  "policy_violations": [],
  "metadata": {
    "policy_applied": true,
    "cache_hit": true,
    "latency_ms": 7
  }
}

Example

typescript

const res = await fetch("/api/moltwall/check", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-api-key": process.env.MoltWall_API_KEY,
  },
  body: JSON.stringify({
    agent_id: "agent-001",
    action:   "transfer_funds",
    tool:     "wallet",
    args:     { amount: 500, recipient: "0xabc123" },
    source:   "user",
    intent:   "Pay vendor invoice",
  }),
});

const { decision, risk_score, reason } = await res.json();

Error Responses

Status	Code	Cause
400	VALIDATION_ERROR	Request body fails Zod schema validation
401	UNAUTHORIZED	Missing or invalid x-api-key header
429	RATE_LIMITED	Exceeded rate limit for this agent_id
413	PAYLOAD_TOO_LARGE	Request body or args exceeds size limits
500	INTERNAL_ERROR	Unexpected server error (details in logs)

POST /api/MoltWall/scan

Run only the guardrail engine on arbitrary content. Useful for scanning tool outputs, web pages, or user messages before feeding them to agents.

json

// Request
{
  "content": "string or object to scan",
  "source": "web" | "tool" | "user" | "agent"
}

// Response
{
  "threats": [
    {
      "type": "prompt_injection",
      "severity": "critical",
      "detail": "Instruction override pattern detected",
      "field": "content"
    }
  ],
  "risk_boost": 0.45,
  "should_deny": true
}

⚠

Indirect sources (tool, web) trigger stricter denial logic -any high-severity threat causes immediate denial.

GET | POST | DELETE /api/policy

Retrieve, create/update, or delete the organization's security policy.

GET -Retrieve policy

bash

curl /api/policy -H "x-api-key: YOUR_KEY"
# Returns { policy: PolicyObject | null }

POST -Create or update

Field	Type	Description
allowed_tools	string[]	Whitelisted tool IDs. Empty array = all tools permitted.
blocked_actions	string[]	Actions that are always denied.
trusted_domains	string[]	Untrusted domains increase risk score.
sensitive_actions	string[]	Actions flagged for review (not blocked).
max_spend_usd	number \| null	Maximum monetary value per action.
risk_threshold_allow	number (0–1)	Below this score → allow.
risk_threshold_sandbox	number (0–1)	Below this score → require_confirmation.
risk_threshold_deny	number (0–1)	At or above this score → deny.

typescript

await fetch("/api/policy", {
  method: "POST",
  headers: { "Content-Type": "application/json", "x-api-key": KEY },
  body: JSON.stringify({
    allowed_tools: ["browser", "search", "calendar"],
    blocked_actions: ["delete_account", "export_all_data"],
    trusted_domains: ["github.com", "api.openai.com"],
    sensitive_actions: ["payment", "transfer", "send"],
    max_spend_usd: 500,
    risk_threshold_allow: 0.3,
    risk_threshold_sandbox: 0.6,
    risk_threshold_deny: 0.8,
  }),
});

GET /api/tools · POST /api/tools/register

List registered tools or register a new tool with MoltWall.

Register a tool

typescript

await fetch("/api/tools/register", {
  method: "POST",
  headers: { "Content-Type": "application/json", "x-api-key": KEY },
  body: JSON.stringify({
    tool_id:     "browser-use",
    publisher:   "Anthropic",
    description: "Headless browser automation for web research",
    permissions: ["network", "read"],
    risk_level:  "medium",   // "low" | "medium" | "high" | "critical"
  }),
});

⚠

Tool description and publisher fields are scanned for prompt injection on registration. Suspicious content is rejected.

GET /api/logs

Query the action audit log with optional filters.

bash

# Query params:
# agent_id   -filter by agent
# decision   -"allow" | "deny" | "sandbox" | "require_confirmation"
# limit      -max results (default 50, max 500)
# offset     -pagination offset

curl "/api/logs?decision=deny&limit=20" \
  -H "x-api-key: YOUR_KEY"

Policy Engine

The policy engine performs deterministic rule evaluation. It runs before the risk engine, meaning policy violations can short-circuit to immediate denial without scoring.

Evaluation functions

Function	Checks	On violation
evaluateToolAccess()	tool in allowed_tools[]	deny if allowlist non-empty and tool missing
evaluateActionPermission()	action in blocked_actions[]	deny immediately
evaluateDomainTrust()	domain in trusted_domains[]	adds 0.2 risk boost
evaluateSpendLimit()	args.amount ≤ max_spend_usd	deny if exceeded

Redis caching

Policies are cached in Upstash Redis using stale-while-revalidate (SWR). Cache TTL is 5 minutes; stale threshold is 4 minutes. Negative cache (no-policy) TTL is 30 seconds to avoid repeated DB hits.

typescript

// Cache keys (Redis)
CacheKeys.policy(orgId)       // → "policy:org:<id>"
CacheKeys.toolRegistry(orgId) // → "tools:org:<id>"
CacheKeys.rateLimit(agentId)  // → "rl:agent:<id>"

Risk Engine

The risk engine computes a normalized 0–1 score from 8 weighted factors. The score determines which decision band applies.

Risk factors

Factor	Weight	Trigger
payment_action	+0.35	action contains: payment, transfer, withdraw, send, pay
unknown_tool	+0.25	tool not in registered tool registry
untrusted_domain	+0.20	domain in args not in trusted_domains policy
sensitive_args	+0.20	args contain: password, secret, token, key, private
intent_mismatch	+0.15	action semantically distant from stated intent
high_risk_source	+0.30	source is tool or web (indirect attack surface)
agent_source	+0.05	source is agent (elevated vs user)
guardrail_boost	variable	from guardrail scan: 0.2 (high) or 0.4 (critical)

⚡

Final score is Math.min(1, sum_of_factors). A score of 0 is never returned -minimum is 0.02 to reflect inherent uncertainty.

Guardrail Engine

The guardrail engine runs after risk scoring. It recursively inspects all string values in the request payload using three specialized scanners.

Prompt Injection Scanner

Detects attempts to override agent instructions or exfiltrate data via crafted input.

Pattern Class	Examples
Instruction override	`ignore previous instructions`, `new directive`, `system prompt`
Jailbreak	`pretend you are`, `DAN mode`, `developer mode enabled`
Data extraction	`repeat everything above`, `print your instructions`
Indirect HTML/XML	`<!-- inject -->`, `<script>`, markdown code blocks with commands
Tool poisoning	Tool descriptions containing `always`/`never` + action verbs

Credential Scanner

Detects API keys, tokens, private keys, and high-entropy strings that may indicate credential leakage.

PII Scanner

Detects email addresses, phone numbers, SSNs, credit card numbers, and passport patterns.

Severity levels

info → +0.0medium → +0.1high → +0.2critical → +0.4

For indirect sources (tool, web), any high or critical severity threat triggers immediate denial, regardless of risk score.

SDK — Installation

MoltWall ships first-class SDKs for TypeScript, Python, and Go. All wrap the same REST API.

TypeScript / Node.js

bash

npm install @moltwall/sdk
# or: pnpm add @moltwall/sdk

typescript

import { MoltWall } from "@moltwall/sdk";

const wall = new MoltWall({
  baseUrl: "https://www.moltwall.xyz",
  apiKey:  process.env.MOLTWALL_API_KEY!,
  agentId: "my-agent",
});

Python (beta)

bash

pip install moltwall
# with httpx for best performance:
pip install "moltwall[httpx]"
# with LangChain integration:
pip install "moltwall[langchain]"

python

from moltwall import MoltWall

wall = MoltWall(api_key="moltwall_live_...", base_url="https://www.moltwall.xyz")

result = wall.check(
    agent_id="my-agent",
    action="transfer_funds",
    tool="wallet",
    args={"amount": 100},
    source="user",
)
result.raise_if_blocked()

Go

bash

go get github.com/moltwall/sdk-go

import "github.com/moltwall/sdk-go/moltwall"

wall := moltwall.New("moltwall_live_...")

resp, err := wall.CheckAndBlock(ctx, moltwall.CheckRequest{
    AgentID: "my-agent",
    Action:  "transfer_funds",
    Tool:    "wallet",
    Args:    map[string]any{"amount": 100},
    Source:  moltwall.SourceUser,
})
// returns *BlockedError if denied

npm@moltwall/sdk PyPImoltwall Gosdk-go

SDK -check()

Evaluate an action before execution. This is the core method.

typescript

const result = await wall.check({
  action:    "send_email",
  tool:      "gmail",
  args:      { to: "boss@company.com", subject: "Urgent" },
  source:    "user",
  intent:    "Send weekly report",
  sessionId: "session-abc",
});

switch (result.decision) {
  case "allow":
    return await executeTool(result);
  case "deny":
    throw new Error(`Blocked: ${result.reason}`);
  case "require_confirmation":
    return await promptUser(result);
  case "sandbox":
    return await executeIsolated(result);
}

SDK -setPolicy()

typescript

await wall.setPolicy({
  allowedTools:        ["browser", "search"],
  blockedActions:      ["delete_account"],
  trustedDomains:      ["api.openai.com"],
  sensitiveActions:    ["payment", "transfer"],
  maxSpendUsd:         1000,
  riskThresholdAllow:  0.3,
  riskThresholdSandbox: 0.6,
  riskThresholdDeny:   0.8,
});

SDK -registerTool()

typescript

await wall.registerTool({
  toolId:      "browser-use",
  publisher:   "Anthropic",
  description: "Headless browser for web research",
  permissions: ["network", "read"],
  riskLevel:   "medium",
});

Authentication

All API endpoints require an x-api-key header. Keys are stored SHA-256 hashed in Supabase.

typescript

// Every request:
headers: {
  "x-api-key": "MoltWall_live_your_key_here"
}

Never expose API keys in client-side code. Always use environment variables and server-side calls.

Key format

Keys follow the pattern MoltWall_live_<random-32-bytes-hex>. Generate with:

bash

node -e "console.log('MoltWall_live_' + require('crypto').randomBytes(32).toString('hex'))"

Rate Limiting

Sliding window rate limiting enforced per agent_id via Upstash Redis.

Parameter	Default	Notes
Window	60 seconds	Rolling window
Limit	100 requests	Per agent_id per window
Algorithm	Sliding window	Redis ZADD + ZREMRANGEBYSCORE
Response on limit	HTTP 429	Retry-After header included

Input Hardening

All inputs are hardened before processing. Limits are enforced at the boundary before any business logic runs.

Limit	Value	Error
Request body size	50KB	HTTP 413
Args JSON size	16KB	HTTP 413
Object nesting depth	10 levels	HTTP 400
Individual string length	4096 chars	HTTP 400
tool_id format	Alphanumeric + - _	HTTP 400
action format	Alphanumeric + - _ /	HTTP 400

Threat Model

MoltWall is designed to defend against these threat vectors:

Prompt Injection

User or tool output crafted to override agent instructions. Detected by the injection scanner on all string values in args.

Tool Poisoning

Malicious tool definitions that embed instructions. Scanned on registration and on every check request.

Credential Theft

Attempts to exfiltrate API keys, tokens, or secrets via tool arguments. High-entropy string detection.

Data Exfiltration

Indirect attacks via web content or tool outputs containing extraction instructions. Stricter rules for source=web/tool.

Wallet Drain / Overspend

Unbounded monetary actions. Enforced via max_spend_usd policy field and payment-action risk weighting.

DoS via Payload Size

Oversized arguments causing ReDoS or processing stalls. Enforced via body size, depth, and string length limits.

Environment Variables

Variable	Required	Description
NEXT_PUBLIC_SUPABASE_URL	✓	Your Supabase project URL
NEXT_PUBLIC_SUPABASE_ANON_KEY	✓	Supabase anon/public key
SUPABASE_SERVICE_ROLE_KEY	✓	Service role key (server-side only)
UPSTASH_REDIS_REST_URL	✓	Upstash Redis REST endpoint
UPSTASH_REDIS_REST_TOKEN	✓	Upstash Redis REST token
MoltWall_MASTER_KEY	✗	Admin API key for dashboard access

Never commit SUPABASE_SERVICE_ROLE_KEY or UPSTASH_REDIS_REST_TOKEN to version control. Add them to .gitignore and set them in your deployment environment.

Deploy to Vercel

MoltWall is designed to deploy on Vercel with zero configuration changes.

bash

# 1. Push to GitHub
git push origin main

# 2. Import project at vercel.com/new
# 3. Add environment variables in Vercel dashboard
# 4. Deploy -done.

⚡

All API routes use the nodejs runtime. Edge runtime is not supported due to the Supabase client. Each route is independently deployed as a serverless function.

Supabase Setup

MoltWall requires five tables in Supabase Postgres.

Table	Purpose
organizations	Multi-tenant org records
api_keys	SHA-256 hashed API keys with org_id FK
policies	One policy per org: thresholds, allowlists, blocklists
tools	Registered tool definitions with risk level
actions	Full audit log: every check decision

bash

# Run in Supabase SQL editor:
# File: supabase/migrations/001_initial.sql

Upstash Redis

Redis is used for three things: policy caching, rate limiting, and agent session context.

Key Pattern	TTL	Purpose
policy:org:<id>	5 min	Cached policy object (SWR)
tools:org:<id>	5 min	Cached tool registry
rl:agent:<id>	60 sec	Rate limit sliding window (sorted set)
session:<id>	30 min	Agent session context

⚡

Create a free Upstash Redis database at upstash.com. Use the REST API endpoint and token -no connection pooling required.