Compile, don’t append
A session becomes a governed SKILL.md via heuristic trigger, user diff, and AST validation — not a silent blob append.
The memory control plane for AI agents
Stop paying models to reread their own past.
Most agents treat the context window like a hard drive. Costs climb, stale instructions leak forward, and nobody can explain which memory shaped the answer. Skill Memory Bank compiles conversations into governed, spec-valid SKILL.md packages that load in any skills-compatible agent—Claude Code, Cursor, Copilot, Codex, Gemini CLI—then loads only the evidence and procedures the next task needs.
100-token abstracts at query time; full body only on activation. Token cost scales with matched skills, not total memory count.
Scope, correct, decay, merge, and archive every memory. Git-tracked. Snyk ToxicSkills-hardened. Portable across agents.
Separate durable memory from the volatile context window.
Conversations become evidence-backed episodic, semantic, and procedural skills—not another transcript dump.
conversation → distillation → skill artifact
Abstracts, procedures, evidence, graph edges, scope, and utility live together as portable resources.
artifact → index → relationships
The agent scans tiny abstracts, pulses the graph, applies scope gates, and assembles a minimal retrieval bundle.
scan → pulse → gate → inject
Human correction, audit history, utility, decay, merge, archive, and restore operate across every layer.
assess → correct → evolve
Observe · compile · index · retrieve · scope-gate · inject · assess
Foundation models will keep changing. The durable enterprise asset is the governed layer that remembers what worked, why it worked, who may use it, and when it should stop being trusted.
Start where every agent team already hurts: rising prompt payloads, latency, and stale history leaking into new work.
Move from one agent remembering one user to teams sharing approved procedures, constraints, and exceptions.
Every accepted, denied, corrected, merged, or archived retrieval strengthens a governed memory asset competitors cannot copy from a model API.
Skill Memory Bank sits between models and memory stores—turning raw experience into scoped, inspectable, reusable agent behavior.
This deterministic local heuristic extracts entities, procedures, semantic preferences, causal hints, and a Level 5 abstract. No API, model, or server receives the text.
Harden agent-generated code with plan-first implementation, repo-local guidance, and verification loops.
Always inspect repo-local instructions before editing. · A green build is necessary but visual verification closes the loop.
agent workflow, verification, repo guidance
Inspect repository guidance and conventions. · Write a concise implementation plan.
skill://ContextJamming/codex-workflow/abstract
Seeded fictional skills and browser-local additions share one searchable lakebed. Select a card to inspect its MCP resource and governance controls.
A graph pulse retrieves a path, not a pile of chunks. Entity anchors activate candidate skills, graph edges explain traversal, and scope gates run before any memory enters context.
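A graph pulse of this kind can be sketched as a bounded breadth-first traversal that records the labelled path to every skill it reaches. This is a minimal illustration under stated assumptions, not the page's implementation: the `EDGES` table, the edge labels, and the `pulse` function are hypothetical, and only the entity names come from the sample skill.

```python
from collections import deque

# Hypothetical toy graph. Node names reuse entities from the sample skill;
# the edge labels ("anchors", "mentions") are assumptions for illustration.
EDGES = {
    "@CodexWorkflow": [("anchors", "codex-workflow-hardening")],
    "codex-workflow-hardening": [
        ("mentions", "AGENTS.md"),
        ("mentions", "ValidationLoop"),
    ],
}


def pulse(anchor, max_hops=2):
    """Breadth-first traversal from an entity anchor.

    Returns (node, hops, trail) tuples, so every retrieved node carries
    the path that explains why it was reached.
    """
    results = []
    queue = deque([(anchor, 0, anchor)])
    seen = {anchor}
    while queue:
        node, hops, trail = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop budget
        for label, target in EDGES.get(node, []):
            if target in seen:
                continue
            seen.add(target)
            new_trail = f"{trail} -[{label}]-> {target}"
            results.append((target, hops + 1, new_trail))
            queue.append((target, hops + 1, new_trail))
    return results


for node, hops, trail in pulse("@CodexWorkflow"):
    print(hops, trail)
```

Returning the trail alongside each node is the design point: the bundle is a set of explained paths rather than an unordered pile of matches.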
Pulse an @mention to reveal the scored, scoped traversal bundle.
Memory must be scoped before it is smart. Graph pulses are intersected with user_id, project_id, and sensitivity constraints before context is assembled. The agent may discover that relevant memory exists without exposing the underlying content when scope is denied.
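The gate described above can be sketched as a filter that runs on pulse candidates before any body text is assembled. This is a sketch under assumptions: the `Skill` and `Caller` types, the `scope_gate` function, and the ordering of sensitivity levels (including the `internal` level) are hypothetical; only `user_id`, `project_id`, and sensitivity as gate inputs come from the page.

```python
from dataclasses import dataclass

# Assumed ordering of sensitivity levels; "internal" is invented for the example.
SENSITIVITY_RANK = {"public-demo": 0, "internal": 1, "restricted": 2}


@dataclass(frozen=True)
class Skill:
    uri: str
    user_id: str
    project_id: str
    sensitivity: str
    body: str


@dataclass(frozen=True)
class Caller:
    user_id: str
    project_id: str
    clearance: str


def scope_gate(candidates, caller):
    """Split candidates into injectable skills and content-free denial notices."""
    allowed, denied = [], []
    for skill in candidates:
        in_scope = (
            skill.user_id == caller.user_id
            and skill.project_id == caller.project_id
            and SENSITIVITY_RANK[skill.sensitivity] <= SENSITIVITY_RANK[caller.clearance]
        )
        if in_scope:
            allowed.append(skill)
        else:
            # The agent learns that a memory exists, but never sees its body.
            denied.append({"uri": skill.uri, "status": "scope-denied"})
    return allowed, denied


caller = Caller("demo-user", "ContextJamming", "public-demo")
candidates = [
    Skill("skill://ContextJamming/codex-workflow/abstract",
          "demo-user", "ContextJamming", "public-demo", "Inspect repo guidance."),
    Skill("skill://ContextJamming/billing/abstract",
          "demo-user", "ContextJamming", "restricted", "Internal pricing rules."),
]
allowed, denied = scope_gate(candidates, caller)
print([s.uri for s in allowed], denied)
```

Running the gate before context assembly means a denied skill contributes only its URI and a status, never its content.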
Why not just long context?
Keep discovery lightweight and execution explicit. Resources expose inspectable memory; tools perform lifecycle operations. Agent Skills are the procedural/memory layer (file-based, zero-latency, progressive disclosure); MCP is the connectivity layer that reaches live systems. SMB-exported SKILL.md packages compose directly with enterprise memory stacks like Red Hat's OpenViking (viking:// filesystem, OpenShift AI): SMB is where teams compile and govern skills; OpenViking is where they execute at scale.
resources/list
skill://{project}/{domain}/abstract
skill://{project}/{domain}/tmt/L5
skill://{project}/{domain}/procedure
distill_conversation_to_skill
pulse_entity_network
assess_skill_utility
merge_redundant_skills
archive_low_utility_skill
{
  "uri": "skill://ContextJamming/codex-workflow/abstract",
  "mimeType": "text/markdown",
  "tokens": 26,
  "scope": "demo-user:ContextJamming",
  "utilityScore": 0.91
}
The bank compiles each conversation into a governed, spec-valid SKILL.md package that loads in any skills-compatible agent: Claude Code, Cursor, Copilot, Codex, or Gemini CLI. Agent Skills are the procedural/memory layer (file-based, zero-latency, progressive disclosure); MCP is the connectivity layer.
Validated against agentskills.io frontmatter rules, then ACRA-PROOF sanitized on export: restricted skills excluded, secrets redacted, provenance stamped. The 2026 Snyk ToxicSkills audit found 36% of 3,984 public agent skills contained critical flaws combining code exploits with prompt injections hidden in markdown prose (OWASP AST01). ACRA-PROOF gates every skill behind a user-approved diff and an AST-based security scan before it enters the local memory pool.
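A frontmatter check of the kind mentioned above might look like the sketch below. The limits used here (a 1 to 64 character lowercase, hyphen-separated `name`, a 1 to 1024 character `description`, and string-only `metadata` values) are assumptions based on a reading of the Agent Skills spec and should be checked against the published rules. `validate_frontmatter` is a hypothetical helper that takes already-parsed frontmatter as a dict; it is not the skills-validator pipeline.

```python
import re

# Assumed rule: lowercase letters and digits, single hyphens between segments.
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validate_frontmatter(meta):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []

    name = meta.get("name", "")
    if not isinstance(name, str) or not (1 <= len(name) <= 64) or not NAME_RE.match(name):
        errors.append("name: expected 1-64 chars of lowercase letters, digits, single hyphens")

    description = meta.get("description", "")
    if not isinstance(description, str) or not (1 <= len(description) <= 1024):
        errors.append("description: expected 1-1024 chars")

    # Assumed rule: metadata maps string keys to string values,
    # which is why the sample skill quotes utility-score as "0.91".
    for key, value in meta.get("metadata", {}).items():
        if not isinstance(value, str):
            errors.append(f"metadata.{key}: value must be a string")

    return errors


sample = {
    "name": "codex-workflow-hardening",
    "description": "Harden agent-generated code with plan-first implementation.",
    "metadata": {"utility-score": "0.91", "sensitivity": "public-demo"},
}
print(validate_frontmatter(sample))
```

Collecting every problem instead of raising on the first one suits a review step where the user sees a diff and a full list of issues together.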
Agent Development Life Cycle: compile → validate → ship
---
name: codex-workflow-hardening
description: Harden agent-generated code with plan-first implementation, repo-local guidance, and verification loops. Use when reviewing or accepting code from an agent, setting up a validation loop, or planning an implementation before editing a repo.
license: Apache-2.0
metadata:
  version: 1.0.0
  author: ACRA Insight
  project: ContextJamming
  domain: codex-workflow
  memory-type: procedural
  sensitivity: public-demo
  utility-score: "0.91"
  source: seed
  generated-by: ContextJamming/SkillMemoryBank
---
# Codex Workflow Hardening
## Overview
Harden agent-generated code with plan-first implementation, repo-local guidance, and verification loops.
## When to use this skill
- reviewing or accepting code from an agent
- setting up a validation loop
- planning an implementation before editing a repo
## Procedure
1. Inspect repository guidance and conventions.
2. Write a concise implementation plan.
3. Implement within the narrowest safe scope.
4. Validate the rendered outcome, not only the code.
## Gotchas
- A green build is not proof; verify the rendered outcome in a browser.
- Do not widen scope mid-task; keep unrelated worktree changes out.
## Output template
```markdown
## Implementation plan
1. Scope: narrowest safe change
2. Files touched
3. Verification: build + rendered check

## Verification
- [ ] tsc / build green
- [ ] rendered outcome inspected
```
## Key entities
- @CodexWorkflow
- AGENTS.md
- PlanMode
- ValidationLoop
## Evidence
- Always inspect repo-local instructions before editing.
- A green build is necessary but visual verification closes the loop.
## Causal links
- Repo inspection prevents convention drift.
- Rendered verification catches failures compilation cannot.
## References
- [`references/schema.json`](references/schema.json): serialization schema for this memory skill.
Current AI memory systems append every conversation automatically and indiscriminately — no structure, no review, no quality signal. The /memory bank mechanism inverts this: distillation is deliberate, structured, and gated. The deliberateness is the governance.
Conversation length alone is a poor proxy for knowledge value. The trigger scores a tripartite composite — fire when any threshold trips:
A draft SKILL.md is generated in an isolated memory buffer using a strict distillation prompt: extract only durable, reusable procedures and constraints; generate a 100-token description optimized for keyword routing; separate executable procedural logic from raw contextual examples.
skills-validator runs an AST-based 5-pass pipeline (YAML parse → structure → content → reference chain → Semgrep security scan) to confirm spec compliance and screen for prompt injections embedded in the markdown prose.
The approved skill is written to .agents/skills/<name>/ and committed via git commit with an auto-generated semantic message. Every memory alteration has a cryptographic commit hash: a legally compliant audit trail from day one.
Hybrid by design: local models for volume and privacy, frontier models for judgment, deterministic code where no model is needed. Model selection is required across every lane, and ships in each exported repo as MODEL_SELECTION.md.
| Stage | Model | Tier | Why | Cost | Latency |
|---|---|---|---|---|---|
| Distillation (conversation → skill) | Claude (Opus) | frontier | Quality-critical, low volume: turning messy transcripts into typed, scoped skills is the one step where judgment pays for itself. | High per call, low total | Seconds, off the hot path |
| Compression / stale-memory summary | Granite 8B (quantized GGUF, local) | local | High volume, latency- and cost-sensitive, privacy-preserving — data never leaves the device. | ≈$0 marginal | Local, sub-second |
| Retrieval scoring / graph pulse | Deterministic local code | deterministic | Scoring and traversal are pure functions; no model needed, fully inspectable and reproducible. | Zero | Microseconds |
| Eval judge (subjective coherence) | Claude (Opus), blind comparison | frontier | Scoring with/without-skill coherence needs a strong, impartial judge run blind to the condition to avoid bias. | Moderate, eval-time only | Seconds, offline |
One discipline per phase, from scope to iterate. Evaluate and observe are concrete: the with/without eval harness and the governance log. Exported with each repo as ADLC.md.
Target the universal agent pain — context bloat and ungoverned memory. Define a browser-local skill compiler that emits spec-valid agentskills.io packages with no backend.
Four planes: a typed memory compiler, an inspectable skill lakebed, an abstract-first router with graph pulse, and a governance plane. Tiny Level-5 abstracts index full skills for progressive disclosure.
Pure, deterministic TypeScript compiler/validator/exporter; React only orchestrates and renders. JSZip + yaml, no new dependencies, no network calls.
Every exported skill ships evals/evals.json with observable assertions and evals/run.py, a with/without-skill harness that scores pass-rate mechanically — so the skill must beat a no-skill baseline to earn full marks.
Static export on Cloudflare Workers, auto-deployed on push to main. Skills download as runnable repos a judge can unzip and git-push immediately.
The governance log is the observe loop: every accepted, ignored, corrected, archived, or restored retrieval is recorded with a utility delta, and run.py emits benchmark.json carrying the with-vs-without pass-rate delta.
Human corrections are authoritative and feed back as utility priors; low-utility skills decay, merge, or archive; descriptions and triggers are refined from eval misses.
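The with-vs-without comparison used in the evaluate and observe phases reduces to a difference of two pass rates. The sketch below shows that arithmetic only; `pass_rate`, `benchmark`, and the output field names are hypothetical and do not reflect the actual schema of benchmark.json or the logic of run.py.

```python
def pass_rate(results):
    """Fraction of assertions that passed; results is a non-empty list of booleans."""
    return sum(1 for passed in results if passed) / len(results)


def benchmark(with_skill, without_skill):
    """Compare assertion outcomes for runs with and without the skill loaded."""
    with_rate = pass_rate(with_skill)
    without_rate = pass_rate(without_skill)
    return {
        "with_skill": with_rate,
        "without_skill": without_rate,
        "delta": round(with_rate - without_rate, 4),
        # The skill only earns credit if it beats the no-skill baseline.
        "beats_baseline": with_rate > without_rate,
    }


# Illustrative outcomes for four assertions, not measured data.
report = benchmark(
    with_skill=[True, True, True, False],
    without_skill=[True, False, False, False],
)
print(report)  # delta is 0.5 for these illustrative inputs
```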
Feedback changes what the system tries first. It does not rewrite reality. Human correction remains authoritative, and every memory needs a visible exit.
Illustrative, conservative arithmetic—not a benchmark. The model compares replaying every prior transcript with scanning abstracts and loading a few complete skills.
Instead of replaying 12 full sessions, the agent scans 18 compact indexes and recursively loads 3 complete skills. Real savings depend on transcript, tokenizer, and retrieval behavior.
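The comparison above can be worked through in a few lines. The counts (12 sessions, 18 abstracts, 3 loaded skills) and the 100-token abstract size come from the page; the 4,000 tokens per transcript and 600 tokens per full skill body are hypothetical placeholders chosen only to make the arithmetic concrete.

```python
def replay_cost(sessions, tokens_per_session):
    """Tokens spent replaying every prior transcript in full."""
    return sessions * tokens_per_session


def bank_cost(abstracts, tokens_per_abstract, loaded_skills, tokens_per_skill):
    """Tokens spent scanning abstracts, then loading only the activated skills."""
    return abstracts * tokens_per_abstract + loaded_skills * tokens_per_skill


# Assumed sizes: 4000 tokens per session and 600 per skill body are placeholders.
replay = replay_cost(sessions=12, tokens_per_session=4000)
bank = bank_cost(abstracts=18, tokens_per_abstract=100,
                 loaded_skills=3, tokens_per_skill=600)
reduction = 1 - bank / replay

print(replay)               # 48000
print(bank)                 # 3600
print(round(reduction, 3))  # 0.925
```

With these placeholder sizes the abstract scan and the three loaded bodies cost 1,800 tokens each, so the result is sensitive to both assumed sizes; substituting measured values changes the ratio accordingly.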
A useful memory substrate preserves different kinds of knowledge without pretending they are interchangeable.
Episodic: events, outcomes, exceptions, and sequence.
Semantic: preferences, facts, principles, and constraints.
Procedural: repeatable steps, checks, and decision rules.
Governance: utility, correction, decay, merge, and archive.
| Dimension | Full-context / Flat inject | Vector RAG (Mem0 / Zep) | Skill Memory Bank |
|---|---|---|---|
| Token cost per turn | O(N) — all memories injected every message regardless of relevance | O(N) index scan + retrieved chunks; grows with corpus | O(k) — 100-token abstracts only; body fetched on activation |
| Routing mechanism | None — entire memory block always present (ChatGPT: Redis KNN; Claude: filesystem dump) | Semantic similarity search; misses procedural and temporal nuance | Description-driven trigger matching; 3-stage EvoAgent-style escalation |
| Procedural structuring | Absent — prose blobs or narrative summaries; no step-level constraints | Chunked prose; no executable procedure separation | Typed SKILL.md: procedure, gotchas, output templates, causal links |
| Version control / audit trail | Opaque KV store or filesystem; no diff, no rollback, no lineage | Vector embeddings; not human-readable, not diffable | Git-backed Markdown; every change has a commit hash and timestamp |
| Sensitivity / scope gating | None — all memories visible in all contexts regardless of topic | Filter-dependent; metadata optional; no hard scoping before retrieval | user_id + project_id + sensitivity gate before context assembly |
| Supply-chain security | N/A — proprietary, closed; trust the vendor | 36% of public skills contain critical flaws (Snyk ToxicSkills 2026) | ACRA-PROOF: user-approved diff + AST validation + Semgrep scan before commit |
| Quality signal | Zero — no way to know if a memory helps or hurts responses | Retrieval metrics only; no downstream task improvement measurement | evals/evals.json: with-skill vs. no-skill pass-rate delta required |
| Portability | Vendor-siloed; ChatGPT memories cannot move to Claude or Cursor | Embedding-format-dependent; not cross-platform | agentskills.io open spec; loads in Claude Code, Cursor, Copilot, Gemini CLI |
A production-intent artifact needs an adversarial review loop. This portable brief asks another agent to inspect the live behavior, infer the implementation, prioritize risks, and recommend the next high-leverage iteration.
You are reviewing a production-intent POC for ContextJamming called **Skill Memory Bank**.

**Live page:** https://www.contextjamming.com/SkillMemoryBank

**Project context (for you):**
- This is a browser-local simulation (no server, no external APIs) that turns ephemeral agent conversations into scoped, inspectable, reusable procedural memory "skills".
- Core architecture: a typed memory compiler, inspectable skill lakebed, abstract-first context router with @mention graph pulses, and a governance plane that enforces strict scoping before intelligence.
- It must feel like a real memory substrate, not just marketing copy.
- Visual/typographic language should stay consistent with Context Jamming (editorial, high-signal, Fraunces + IBM Plex Mono influence, clean cards, clear hierarchy).
- This POC is meant to be both a compelling demo for investors/partners and a working primitive we can evolve toward real agent use (MCP resources/tools, future sovereign stacks).

**Your task (thorough code + UX review):**
1. **Analyze the live page thoroughly** (all sections, all interactive elements, buttons, cards, the distillation example, graph pulse area, safety boundary table, governance feedback, import/export/clear).
2. **Infer or request the implementation details** you need (HTML structure, Tailwind usage, vanilla JS / state model, localStorage schema, distillation heuristic logic, graph representation, how pulsing and scoping actually work in code, any seeded data vs dynamic creation).
3. **Deliver a structured review** with the following sections:

**A. Executive Summary** (2-3 sentences on overall quality, fidelity to the 4-plane architecture and runtime loop, and production readiness)

**B. Strengths** (What is already working well: architecture, UX moments, conceptual clarity, code patterns worth keeping)

**C. Issues & Risks** (categorized + prioritized)
- P0 (must fix for this to be a credible demo)
- P1 (important for maintainability / extensibility)
- P2 (nice-to-have polish)

Categories to cover:
- Conceptual fidelity (does the implementation actually demonstrate the memory compiler + skill lakebed + context router + governance plane, or is it mostly static explanation?)
- Distillation heuristic quality & transparency
- Graph pulse / retrieval implementation
- Data model & localStorage design (schema, versioning, migration path)
- State management & reactivity
- Code organization & maintainability (especially if still monolithic single-file)
- UX / interactivity gaps (empty states, feedback, real skill creation flow, verification loops)
- Accessibility & keyboard support
- Mobile / responsive behavior
- Error handling, edge cases, and "what if" scenarios
- Performance / DOM bloat risks
- Security / scoping simulation robustness

**D. Specific Recommendations** For each major issue, give concrete suggestions (and code snippets where helpful). Prioritize changes that increase the "this feels like real governed memory" perception.

**E. Quick Wins** A short list of high-impact, low-effort improvements that would make the POC feel significantly more alive.

**F. Extensibility Notes** How easy/hard would it be to:
- Add real user-created skills from pasted conversation
- Evolve the graph into a proper traversable structure
- Wire this to a real MCP server later
- Reuse components/patterns in future ContextJamming POCs

**G. Final Verdict + Recommended Next Step** (One paragraph + a clear suggested scope for the next iteration)

**Tone & Approach:**
- Be direct but constructive. You are helping ship a high-signal artifact.
- Reference the "Codex Workflow Hardening" skill principles where relevant (plan-first, repo-local guidance, verification loops, visual + build verification).
- Assume we will iterate quickly; focus on leverage, not perfectionism.

Begin your review now.
If you need the full HTML source or specific sections of the JS, tell me exactly what to paste.
This POC is a conversation artifact: a way to show how personal and organizational memory can become inspectable, portable, governed agent skill infrastructure.
§ · Invoice No. 001 · The Build Ledger
Filed · contextjamming.com
What a conservative mid-market digital agency would have quoted for the same scope, itemized against what this site actually cost. Agency numbers are the floor — not the premium brand-studio tier.
| | Agency quote | This site | Difference |
|---|---|---|---|
| TIME | 12 weeks | 2 days | ~42× faster |
| COST | ~$150,000 | ~$300 | ~500× cheaper |
| TEAM | 5-person agency | 1 human + 3 models | Same deliverable |
§ Itemized — what a mid-market agency SOW would have billed
Agency figure assumes ~700 billable hours at $200/hr blended, plus ~18% PM overhead — the conservative floor of a mid-market SOW. Premium brand studios would have quoted 2–3× that. Stack: Antigravity (orchestrator), Claude Opus 4.8 (auditor), Codex (adversary), Cloudflare Workers / OpenNext.
§ Colophon
Vol. 26 · build log
Every page on contextjamming.com is the output of a real-time, three-body Mixture-of-Experts loop. One model orchestrates. Two consult. The human holds the thesis. No single model commits alone.
Orchestrator: Antigravity (Google DeepMind)
Auditor: Claude Opus 4.8 (1M context)
Adversary: Codex (cross-model MoE)
    human intent
          │
          ▼
┌────────────────────┐        ┌─────────────────┐
│    Antigravity     │ ◄────► │ Claude Opus 4.8 │  ← auditor loop
│   (orchestrator)   │        │    (auditor)    │
└─────────┬──────────┘        └─────────────────┘
          │ ◄───────────┐
          ▼             │
    ┌──────────┐   ┌────┴───────┐
    │Cloudflare│   │   Codex    │  ← adversarial loop
    │ Workers  │   │            │
    └─────┬────┘   └────────────┘
          │
          ▼
 contextjamming.com
          │
          ▼
   ┌──────────────┐
   │   Git push   │  ← audit trail
   └──────────────┘