ContextJamming / systems POC 001

POC simulation: local browser storage only

The memory control plane for AI agents

Skill Memory Bank

Stop paying models to reread their own past.

Most agents treat the context window like a hard drive. Costs climb, stale instructions leak forward, and nobody can explain which memory shaped the answer. Skill Memory Bank compiles conversations into governed, spec-valid SKILL.md packages that load in any skills-compatible agent—Claude Code, Cursor, Copilot, Codex, Gemini CLI—then loads only the evidence and procedures the next task needs.

Compile, don’t append

A session becomes a governed SKILL.md via heuristic trigger, user diff, and AST validation — not a silent blob append.

O(k) not O(N)

100-token abstracts at query time; full body only on activation. Token cost scales with matched skills, not total memory count.

Govern, don’t hope

Scope, correct, decay, merge, and archive every memory. Git-tracked. Snyk ToxicSkills-hardened. Portable across agents.

System architecture / 04 planes

Separate durable memory from the volatile context window.

01 Memory compiler

Turn experience into a typed asset.

Conversations become evidence-backed episodic, semantic, and procedural skills—not another transcript dump.

conversation → distillation → skill artifact

02 Skill lakebed

Store memory as inspectable infrastructure.

Abstracts, procedures, evidence, graph edges, scope, and utility live together as portable resources.

artifact → index → relationships

03 Context router

Load the path, not the past.

The agent scans tiny abstracts, pulses the graph, applies scope gates, and assembles a minimal retrieval bundle.

scan → pulse → gate → inject

04 Governance plane

Make memory corrigible.

Human correction, audit history, utility, decay, merge, archive, and restore operate across every layer.

assess → correct → evolve

Runtime loop

Observe → compile → index → retrieve → scope-gate → inject → assess

Category thesis / why this compounds

The model is replaceable. The memory substrate compounds.

Foundation models will keep changing. The durable enterprise asset is the governed layer that remembers what worked, why it worked, who may use it, and when it should stop being trusted.

01 / Land

Cut repeated context.

Start where every agent team already hurts: rising prompt payloads, latency, and stale history leaking into new work.

02 / Expand

Make behavior reproducible.

Move from one agent remembering one user to teams sharing approved procedures, constraints, and exceptions.

03 / Defend

Own the correction graph.

Every accepted, denied, corrected, merged, or archived retrieval strengthens a governed memory asset competitors cannot copy from a model API.

Skill Memory Bank sits between models and memory stores—turning raw experience into scoped, inspectable, reusable agent behavior.

01 / Distillation

Conversation → skill artifact

This deterministic local heuristic extracts entities, procedures, semantic preferences, causal hints, and a Level 5 abstract. No API, model, or server receives the text.

raw_conversation.txt · 142 est. tokens

Conversation transcript

Project scope · Memory emphasis · Sensitivity

selected_skill.artifact · active

procedural memory

Codex Workflow Hardening

91 utility

Harden agent-generated code with plan-first implementation, repo-local guidance, and verification loops.

@CodexWorkflow · AGENTS.md · PlanMode · ValidationLoop

Observation

Always inspect repo-local instructions before editing. · A green build is necessary but visual verification closes the loop.

Outcome

Harden agent-generated code with plan-first implementation, repo-local guidance, and verification loops.

Pattern

agent workflow, verification, repo guidance

Protocol

Inspect repository guidance and conventions. · Write a concise implementation plan.

Abstract

Harden agent-generated code with plan-first implementation, repo-local guidance, and verification loops.

skill://ContextJamming/codex-workflow/abstract

distillation_trace · deterministic local heuristic, not model truth

entities 4 · evidence 2 · steps 4 · causal 2 · L5 length ok

memory type: procedural
extracted entities: @CodexWorkflow, AGENTS.md, PlanMode, ValidationLoop
candidate steps: Inspect repository guidance and conventions. · Write a concise implementation plan. · Implement within the narrowest safe scope. · Validate the rendered outcome, not only the code.
evidence snippets: Always inspect repo-local instructions before editing. · A green build is necessary but visual verification closes the loop.
causal hints: Repo inspection prevents convention drift. · Rendered verification catches failures compilation cannot.
generated L5: Harden agent-generated code with plan-first implementation, repo-local guidance, and verification loops.
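
The trace above lists what the heuristic extracts but not how. The following is a minimal sketch of one pass, entity extraction, written in Python for illustration. It assumes entities are @mentions, upper-case Markdown file names, and CamelCase terms; the pattern and the `extract_entities` name are hypothetical and are not taken from the POC's code.

```python
import re

# Hypothetical patterns: the page does not publish the real heuristic.
ENTITY_PATTERN = re.compile(
    r"@\w+"                      # @mentions such as @CodexWorkflow
    r"|\b[A-Z]+\.md\b"           # upper-case Markdown files such as AGENTS.md
    r"|\b(?:[A-Z][a-z]+){2,}\b"  # CamelCase terms such as PlanMode
)


def extract_entities(text: str) -> list[str]:
    """Return unique entities in first-seen order (deterministic, no model call)."""
    seen: dict[str, None] = {}
    for match in ENTITY_PATTERN.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)


transcript = (
    "Use @CodexWorkflow here. Read AGENTS.md first, stay in PlanMode, "
    "then run the ValidationLoop. AGENTS.md wins on conflicts."
)
print(extract_entities(transcript))
# ['@CodexWorkflow', 'AGENTS.md', 'PlanMode', 'ValidationLoop']
```

Because the pass is a pure function of the text, the same transcript always yields the same entity list, which is what makes the trace inspectable.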

02 / Skill Lakebed

Browse the memory substrate

Seeded fictional skills and browser-local additions share one searchable lakebed. Select a card to inspect its MCP resource and governance controls.

Search skills · Filter by project · Filter by memory type · Show archived

7 visible skills · 0 local additions

Seeded skills: Codex Workflow Hardening · Graph Pulse Retrieval · FounderFile Research Pipeline · Memory Governance Protocol · ContextJamming Visual System · MCP Resource Template Pattern · Client A Migration Exception

03 / Retrieval

Pulse the graph

A graph pulse retrieves a path, not a pile of chunks. Entity anchors activate candidate skills, graph edges explain traversal, and scope gates run before any memory enters context.

Graph prompt

Active user scope · Active project

Try a pulse

entity · semantic · temporal · causal

retrieval_bundle.json

awaiting pulse

Pulse an @mention to reveal the scored, scoped traversal bundle.
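
A graph pulse as described in this section can be sketched as a one-hop traversal from the skill anchored by an @mention. This is an illustrative Python sketch only: the `Edge` shape, the weights, and the specific edges between seeded skills are assumptions, and scope gating (covered in the next section) would still run on the result before anything enters context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: str      # one of: entity, semantic, temporal, causal
    weight: float  # hypothetical edge strength in [0, 1]


def pulse(anchor: str, anchors: dict[str, str], edges: list[Edge]) -> list[dict]:
    """One-hop pulse: start at the anchored skill, then follow its outgoing edges."""
    start = anchors.get(anchor)
    if start is None:
        return []
    path = [{"skill": start, "via": "entity anchor", "score": 1.0}]
    for edge in edges:
        if edge.source == start:
            path.append({"skill": edge.target, "via": edge.kind, "score": edge.weight})
    # Highest score first; sorted() is stable, so ties keep edge order.
    return sorted(path, key=lambda hop: hop["score"], reverse=True)


# Illustrative data: these edges are invented for the example.
anchors = {"@CodexWorkflow": "Codex Workflow Hardening"}
edges = [
    Edge("Codex Workflow Hardening", "Graph Pulse Retrieval", "semantic", 0.4),
    Edge("Codex Workflow Hardening", "Memory Governance Protocol", "causal", 0.6),
]
for hop in pulse("@CodexWorkflow", anchors, edges):
    print(hop)
```

Each hop records the edge kind it arrived by, so the bundle explains the traversal rather than returning an unordered pile of chunks.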

04 / Safety boundary

Scoped memory by default

Memory must be scoped before it is smart. Graph pulses are intersected with user_id, project_id, and sensitivity constraints before context is assembled. The agent may discover that relevant memory exists without exposing the underlying content when scope is denied.

Client-side simulation only. This POC demonstrates the control model; it is not an authorization system.

Candidate memory	User	Project	Result
Codex Workflow Hardening	demo-user	ContextJamming	retrievable
FounderFile Research Pipeline	demo-user	FounderFile	metadata only
Client A Migration Exception	client-a	ClientWork	metadata only
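
The gate behind this table can be sketched as a pure function over user_id, project_id, and sensitivity. This is a hedged Python illustration, not the POC's code: the `Memory` class, the `gate` function, and the sensitivity labels other than `public-demo` are assumptions. Like the POC itself, it demonstrates the control model and is not an authorization system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    title: str
    user_id: str
    project_id: str
    sensitivity: str


def gate(memory: Memory, user_id: str, project_id: str, allowed: set[str]) -> str:
    """Return 'retrievable' only when user, project, and sensitivity all match."""
    in_scope = (
        memory.user_id == user_id
        and memory.project_id == project_id
        and memory.sensitivity in allowed
    )
    # Out-of-scope memories stay discoverable, but their content is withheld.
    return "retrievable" if in_scope else "metadata only"


candidates = [
    Memory("Codex Workflow Hardening", "demo-user", "ContextJamming", "public-demo"),
    # Sensitivity labels below are hypothetical; the page does not state them.
    Memory("FounderFile Research Pipeline", "demo-user", "FounderFile", "internal"),
    Memory("Client A Migration Exception", "client-a", "ClientWork", "restricted"),
]
for memory in candidates:
    print(memory.title, "->", gate(memory, "demo-user", "ContextJamming", {"public-demo"}))
```

With the active scope set to demo-user and ContextJamming, the three candidates resolve to the same results as the table.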

Why not just long context?

Dimension	Full-context inject	Vector RAG	Skill Memory Bank
Token cost per turn	O(N): all memories injected every message regardless of relevance	O(N) index scan + retrieved chunks; grows with corpus	O(k): 100-token abstracts only; body fetched on activation
Sensitivity / scope gating	None: all memories visible in all contexts regardless of topic	Filter-dependent; metadata optional; no hard scoping before retrieval	user_id + project_id + sensitivity gate before context assembly
Supply-chain security	N/A: proprietary, closed; trust the vendor	36% of public skills contain critical flaws (Snyk ToxicSkills 2026)	ACRA-PROOF: user-approved diff + AST validation + Semgrep scan before commit

Condensed view; the full comparison is in section 12.

05 / Interface

Memory as MCP resources + tools

Keep discovery lightweight and execution explicit. Resources expose inspectable memory; tools perform lifecycle operations. Agent Skills are the procedural/memory layer (file-based, zero-latency, progressive disclosure); MCP is the connectivity layer that reaches live systems. SKILL.md packages exported from Skill Memory Bank (SMB) compose directly with enterprise memory stacks like Red Hat's OpenViking (viking:// filesystem, OpenShift AI): SMB is where teams compile and govern skills; OpenViking is where they execute at scale.

Resources

resources/list
skill://{project}/{domain}/abstract
skill://{project}/{domain}/tmt/L5
skill://{project}/{domain}/procedure

Tools

distill_conversation_to_skill
pulse_entity_network
assess_skill_utility
merge_redundant_skills
archive_low_utility_skill

mock resource response

{
  "uri": "skill://ContextJamming/codex-workflow/abstract",
  "mimeType": "text/markdown",
  "tokens": 26,
  "scope": "demo-user:ContextJamming",
  "utilityScore": 0.91
}
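
A response like the mock above could be assembled from the `skill://{project}/{domain}/abstract` template as follows. This Python sketch is an assumption-laden illustration: `abstract_resource` and `estimate_tokens` are hypothetical names, and the four-characters-per-token estimate is a guess that happens to reproduce the 26 tokens in the mock; the page does not document the real estimator.

```python
import json
import math


def estimate_tokens(text: str) -> int:
    # Assumption: roughly four characters per token.
    return math.ceil(len(text) / 4)


def abstract_resource(project: str, domain: str, abstract: str,
                      user_id: str, utility: float) -> dict:
    """Build the resource descriptor for a skill's abstract."""
    return {
        "uri": f"skill://{project}/{domain}/abstract",
        "mimeType": "text/markdown",
        "tokens": estimate_tokens(abstract),
        "scope": f"{user_id}:{project}",
        "utilityScore": utility,
    }


ABSTRACT = (
    "Harden agent-generated code with plan-first implementation, "
    "repo-local guidance, and verification loops."
)
resource = abstract_resource("ContextJamming", "codex-workflow", ABSTRACT, "demo-user", 0.91)
print(json.dumps(resource, indent=2))
```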

06 / Ship it

Export as Agent Skill

The bank compiles each conversation into a governed, spec-valid SKILL.md package that loads in any skills-compatible agent—Claude Code, Cursor, Copilot, Codex, or Gemini CLI. Agent Skills are the procedural/memory layer (file-based, zero-latency, progressive disclosure); MCP is the connectivity layer.

Validated against agentskills.io frontmatter rules, then ACRA-PROOF sanitized on export: restricted skills excluded, secrets redacted, provenance stamped. The 2026 Snyk ToxicSkills audit found 36% of 3,984 public agent skills contained critical flaws combining code exploits with prompt injections hidden in markdown prose (OWASP AST01). ACRA-PROOF gates every skill behind a user-approved diff and an AST-based security scan before it enters the local memory pool.

Agent Development Life Cycle: compile → validate → ship

codex-workflow-hardening/SKILL.md

---
name: codex-workflow-hardening
description: Harden agent-generated code with plan-first implementation,
  repo-local guidance, and verification loops. Use when reviewing or accepting
  code from an agent, setting up a validation loop, or planning an
  implementation before editing a repo.
license: Apache-2.0
metadata:
  version: 1.0.0
  author: ACRA Insight
  project: ContextJamming
  domain: codex-workflow
  memory-type: procedural
  sensitivity: public-demo
  utility-score: "0.91"
  source: seed
  generated-by: ContextJamming/SkillMemoryBank
---

# Codex Workflow Hardening

## Overview

Harden agent-generated code with plan-first implementation, repo-local guidance, and verification loops.

## When to use this skill

- reviewing or accepting code from an agent
- setting up a validation loop
- planning an implementation before editing a repo

## Procedure

1. Inspect repository guidance and conventions.
2. Write a concise implementation plan.
3. Implement within the narrowest safe scope.
4. Validate the rendered outcome, not only the code.

## Gotchas

- A green build is not proof — verify the rendered outcome in a browser.
- Do not widen scope mid-task; keep unrelated worktree changes out.

## Output template

```markdown
## Implementation plan
1. Scope — narrowest safe change
2. Files touched
3. Verification: build + rendered check

## Verification
- [ ] tsc / build green
- [ ] rendered outcome inspected
```

## Key entities

- @CodexWorkflow
- AGENTS.md
- PlanMode
- ValidationLoop

## Evidence

- Always inspect repo-local instructions before editing.
- A green build is necessary but visual verification closes the loop.

## Causal links

- Repo inspection prevents convention drift.
- Rendered verification catches failures compilation cannot.

## References

- [`references/schema.json`](references/schema.json) — serialization schema for this memory skill.

Valid · portable across skills-compatible agents

Mirrors skills-ref validate: agentskills.io frontmatter rules + progressive-disclosure budgets.

metadata tokens: 66 / ~100 target
body tokens: 474 / 5000 budget
body lines: 77 / 500 budget

Progressive disclosure: only the ~66-token name + description load until the skill is invoked; the 474-token body loads on demand.
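
The budget readout above can be checked mechanically. A minimal Python sketch, assuming the ~100-token metadata target is soft (a warning) while the 5000-token and 500-line body budgets are hard (errors); that split and the `check_budgets` name are assumptions, not the validator's documented behavior.

```python
METADATA_TARGET = 100      # "~100 target": treated here as a soft limit
BODY_TOKEN_BUDGET = 5000   # treated here as a hard limit
BODY_LINE_BUDGET = 500     # treated here as a hard limit


def check_budgets(metadata_tokens: int, body_tokens: int, body_lines: int) -> dict:
    """Compare measured sizes against the progressive-disclosure budgets."""
    report = {"errors": [], "warnings": []}
    if metadata_tokens > METADATA_TARGET:
        report["warnings"].append(
            f"metadata tokens {metadata_tokens} exceed ~{METADATA_TARGET} target")
    if body_tokens > BODY_TOKEN_BUDGET:
        report["errors"].append(
            f"body tokens {body_tokens} exceed {BODY_TOKEN_BUDGET} budget")
    if body_lines > BODY_LINE_BUDGET:
        report["errors"].append(
            f"body lines {body_lines} exceed {BODY_LINE_BUDGET} budget")
    return report


# Figures from the readout above: 66 metadata tokens, 474 body tokens, 77 lines.
print(check_budgets(66, 474, 77))
```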

package tree

codex-workflow-hardening/
├── SKILL.md
├── references/
│   └── schema.json
├── evals.json
└── scripts/
    └── .gitkeep

07 / Memory bank

How a session becomes a governed skill

Current AI memory systems append every conversation automatically and indiscriminately — no structure, no review, no quality signal. The /memory bank mechanism inverts this: distillation is deliberate, structured, and gated. The deliberateness is the governance.

Auto-trigger heuristic

Conversation length alone is a poor proxy for knowledge value. The trigger scores a tripartite composite — fire when any threshold trips:

Repetition density: the user mentions the same entity, constraint, or preference 3+ times in a session. This mirrors Perplexity's retention signal, where repetition serves as a proxy for user intent.
Correction loops: the user explicitly corrects the agent's output. Corrective turns are the highest-fidelity indicator that a durable procedural rule is being established.
Token compaction threshold: the session approaches the context clearing limit. Distill before the system truncates, so operational state is preserved before it is lost.
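
The three signals above can be sketched as an any-of trigger. In this Python illustration, the repetition threshold of 3 comes from the text; the 0.8 compaction ratio, the single-correction threshold, and the `SessionStats` shape are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass

REPETITION_THRESHOLD = 3  # from the text: "3+ times in a session"
CORRECTION_THRESHOLD = 1  # assumption: one explicit correction is enough
COMPACTION_RATIO = 0.8    # assumption: "approaches the limit" means 80% used


@dataclass
class SessionStats:
    entity_mentions: Counter  # entity, constraint, or preference -> mention count
    correction_turns: int
    tokens_used: int
    context_limit: int


def distill_reasons(stats: SessionStats) -> list[str]:
    """Return every tripped signal; a non-empty list means 'fire the trigger'."""
    reasons = []
    if any(n >= REPETITION_THRESHOLD for n in stats.entity_mentions.values()):
        reasons.append("repetition density")
    if stats.correction_turns >= CORRECTION_THRESHOLD:
        reasons.append("correction loop")
    if stats.tokens_used >= COMPACTION_RATIO * stats.context_limit:
        reasons.append("token compaction threshold")
    return reasons


stats = SessionStats(Counter({"AGENTS.md": 3, "PlanMode": 1}), 0, 12_000, 200_000)
print(distill_reasons(stats))  # ['repetition density']
```

Returning the reasons rather than a bare boolean keeps the trigger explainable when the diff is later shown to the user.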

4-step commit pipeline

Draft. LLM compiles the session into a SKILL.md in an isolated memory buffer using a strict distillation prompt: extract only durable, reusable procedures and constraints; generate a 100-token description optimized for keyword routing; separate executable procedural logic from raw contextual examples.
Diff. A visual diff is presented to the user before anything is written. Human-in-the-loop review is the governance gate — the user sees exactly what the agent wants to remember and can edit or reject it.
Validate. skills-validator runs an AST-based 5-pass pipeline (YAML parse → structure → content → reference chain → Semgrep security scan) to confirm spec compliance and screen for prompt injections embedded in the markdown prose.
Commit. The skill is written to .agents/skills/<name>/ and committed via git commit with an auto-generated semantic message. Every memory alteration has a cryptographic commit hash, which gives a tamper-evident audit trail from day one.
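
The ordering of the gates matters more than any single step: nothing is written unless the user approves the diff and every validator passes. A minimal Python sketch of that control flow, with the review, validation, and commit steps injected as callables; all names here are hypothetical, and the real skills-validator and git integration are not shown.

```python
from typing import Callable, Optional

Validator = Callable[[str], list[str]]  # returns a list of findings, empty if clean


def run_commit_pipeline(
    draft: str,
    review: Callable[[str], Optional[str]],  # returns edited text, or None to reject
    validators: list[Validator],
    commit: Callable[[str], str],            # returns a commit identifier
) -> dict:
    """Draft -> diff review -> validate -> commit, stopping at the first failed gate."""
    reviewed = review(draft)
    if reviewed is None:
        return {"status": "rejected", "commit": None, "findings": []}
    findings = [finding for validate in validators for finding in validate(reviewed)]
    if findings:
        return {"status": "blocked", "commit": None, "findings": findings}
    return {"status": "committed", "commit": commit(reviewed), "findings": []}


def requires_frontmatter(skill_md: str) -> list[str]:
    # Toy stand-in for one validation pass.
    return [] if skill_md.startswith("---\n") else ["missing YAML frontmatter"]


result = run_commit_pipeline(
    "---\nname: demo\n---\n# Demo\n",
    review=lambda text: text,           # user approves the draft unchanged
    validators=[requires_frontmatter],
    commit=lambda text: "deadbeef",     # placeholder for a real git commit hash
)
print(result["status"])  # committed
```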

Current systems: automatic, indiscriminate, opaque KV append

SMB /memory bank: heuristic-triggered, user-reviewed, git-tracked, security-scanned

08 / Model routing

The right model for the right job

Hybrid by design: local models for volume and privacy, frontier models for judgment, deterministic code where no model is needed. Model selection is required across every lane—and ships in each exported repo as MODEL_SELECTION.md.

Stage	Model	Tier	Why	Cost	Latency
Distillation (conversation → skill)	Claude (Opus)	frontier	Quality-critical, low volume: turning messy transcripts into typed, scoped skills is the one step where judgment pays for itself.	High per call, low total	Seconds, off the hot path
Compression / stale-memory summary	Granite 8B (quantized GGUF, local)	local	High volume, latency- and cost-sensitive, privacy-preserving — data never leaves the device.	≈$0 marginal	Local, sub-second
Retrieval scoring / graph pulse	Deterministic local code	deterministic	Scoring and traversal are pure functions; no model needed, fully inspectable and reproducible.	Zero	Microseconds
Eval judge (subjective coherence)	Claude (Opus), blind comparison	frontier	Scoring with/without-skill coherence needs a strong, impartial judge run blind to the condition to avoid bias.	Moderate, eval-time only	Seconds, offline
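
The routing table above is small enough to express as data. A hedged Python sketch: the stage keys and the `route` and `leaves_device` helpers are hypothetical names, and the assumption that only frontier-tier stages send data off the device is an inference from the table, not something the page states.

```python
# Stage -> (model, tier), transcribed from the table above.
ROUTES = {
    "distillation": ("Claude (Opus)", "frontier"),
    "compression": ("Granite 8B (quantized GGUF, local)", "local"),
    "retrieval_scoring": ("Deterministic local code", "deterministic"),
    "eval_judge": ("Claude (Opus), blind comparison", "frontier"),
}


def route(stage: str) -> tuple[str, str]:
    """Look up the model and tier for a stage; unknown stages fail loudly."""
    if stage not in ROUTES:
        raise KeyError(f"no model route defined for stage {stage!r}")
    return ROUTES[stage]


def leaves_device(stage: str) -> bool:
    # Assumption: only frontier-tier stages call a remote model.
    return route(stage)[1] == "frontier"


for stage in ROUTES:
    print(stage, route(stage), "remote" if leaves_device(stage) else "local")
```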

09 / ADLC worksheet

Built against the Agent Development Life Cycle

One discipline per phase, from scope to iterate. Evaluate and observe are concrete: the with/without eval harness and the governance log. Exported with each repo as ADLC.md.

01 Scope
Target the universal agent pain — context bloat and ungoverned memory. Define a browser-local skill compiler that emits spec-valid agentskills.io packages with no backend.
02 Design
Four planes: a typed memory compiler, an inspectable skill lakebed, an abstract-first router with graph pulse, and a governance plane. Tiny Level-5 abstracts index full skills for progressive disclosure.
03 Build
Pure, deterministic TypeScript compiler/validator/exporter; React only orchestrates and renders. JSZip + yaml, no new dependencies, no network calls.
04 Evaluate
Every exported skill ships evals/evals.json with observable assertions and evals/run.py, a with/without-skill harness that scores pass-rate mechanically — so the skill must beat a no-skill baseline to earn full marks.
05 Deploy
Static export on Cloudflare Workers, auto-deployed on push to main. Skills download as runnable repos a judge can unzip and git-push immediately.
06 Observe
The governance log is the observe loop: every accepted, ignored, corrected, archived, or restored retrieval is recorded with a utility delta, and run.py emits benchmark.json carrying the with-vs-without pass-rate delta.
07 Iterate
Human corrections are authoritative and feed back as utility priors; low-utility skills decay, merge, or archive; descriptions and triggers are refined from eval misses.
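
The Evaluate and Observe phases both hinge on a with-skill versus no-skill pass-rate delta. The sketch below shows that arithmetic in Python. It is not the shipped run.py: the function names and the keys of the returned dictionary are assumptions, since the page does not show the benchmark.json schema.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of assertions that passed; an empty run scores 0.0."""
    return sum(results) / len(results) if results else 0.0


def benchmark(with_skill: list[bool], without_skill: list[bool]) -> dict:
    """Summarize both conditions and the delta the skill must earn."""
    with_rate = pass_rate(with_skill)
    without_rate = pass_rate(without_skill)
    return {
        "with_skill": with_rate,
        "without_skill": without_rate,
        "delta": with_rate - without_rate,
        "beats_baseline": with_rate > without_rate,
    }


# Illustrative assertion outcomes, not measured results.
summary = benchmark(
    with_skill=[True, True, True, False],
    without_skill=[True, False, False, False],
)
print(summary)
```

A skill whose delta is zero or negative adds tokens without improving outcomes, which is the signal the Iterate phase uses to refine or retire it.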

10 / Read—Write—Assess—Govern

Utility is a retrieval prior, not truth.

Feedback changes what the system tries first. It does not rewrite reality. Human correction remains authoritative, and every memory needs a visible exit.
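
One way to read "utility is a retrieval prior" is that feedback nudges a bounded score used only for ordering, and never edits or deletes the skill itself. A minimal Python sketch under that reading; the delta values and function names are invented for illustration and are not the POC's actual weights.

```python
# Hypothetical per-event nudges; the page does not publish real values.
UTILITY_DELTAS = {"accepted": 0.02, "ignored": -0.01, "corrected": -0.05}


def record_feedback(utility: float, event: str, log: list[dict]) -> float:
    """Apply a bounded nudge and append a governance log entry with its delta."""
    delta = UTILITY_DELTAS[event]
    updated = round(min(1.0, max(0.0, utility + delta)), 4)
    log.append({"event": event, "delta": delta, "utility": updated})
    return updated


def rank(skills: dict[str, float]) -> list[str]:
    # Utility only decides what the router tries first.
    return sorted(skills, key=skills.get, reverse=True)


log: list[dict] = []
score = record_feedback(0.91, "accepted", log)
print(score, log)
print(rank({"Graph Pulse Retrieval": 0.5, "Codex Workflow Hardening": score}))
```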

Read

Write

Assess

Govern

Selected skill

Codex Workflow Hardening

0.91 · active

Human-authoritative correction

Recent governance log

No assessments recorded for this skill yet.

No governance actions across the lakebed yet.

11 / Token economics

Scan small. Load selectively.

Illustrative, conservative arithmetic—not a benchmark. The model compares replaying every prior transcript with scanning abstracts and loading a few complete skills.

Raw transcript tokens · Number of prior sessions · Abstracts scanned · Tokens per abstract · Full skills loaded · Tokens per full skill

Full-context estimate: 312,000 tokens

Skill bank estimate: 6,120 tokens

Illustrative reduction: 98%

Instead of replaying 12 full sessions, the agent scans 18 compact indexes and recursively loads 3 complete skills. Real savings depend on transcript, tokenizer, and retrieval behavior.
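
The arithmetic behind these estimates can be reproduced in a few lines of Python. The session, abstract, and skill counts (12, 18, 3) and the 100-token abstract come from the page; the 26,000 tokens per session and 1,440 tokens per full skill are back-solved assumptions chosen because they reproduce the displayed totals, not values the page states.

```python
def full_context_tokens(sessions: int, tokens_per_session: int) -> int:
    """Cost of replaying every prior transcript."""
    return sessions * tokens_per_session


def skill_bank_tokens(abstracts: int, tokens_per_abstract: int,
                      skills: int, tokens_per_skill: int) -> int:
    """Cost of scanning abstracts, then loading only the activated skills."""
    return abstracts * tokens_per_abstract + skills * tokens_per_skill


def reduction_percent(full: int, bank: int) -> int:
    return round((1 - bank / full) * 100)


full = full_context_tokens(12, 26_000)     # 26,000 per session is an assumption
bank = skill_bank_tokens(18, 100, 3, 1_440)  # 1,440 per skill is an assumption
print(full, bank, reduction_percent(full, bank))  # 312000 6120 98
```

The skill bank cost depends on the number of abstracts scanned and skills loaded, so it stays flat as the number of prior sessions grows, while the full-context cost grows linearly with it.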

12 / Research translation

Four forms of durable memory

A useful memory substrate preserves different kinds of knowledge without pretending they are interchangeable.

Episodic → what happened

Events, outcomes, exceptions, and sequence.

Semantic → what remains true

Preferences, facts, principles, and constraints.

Procedural → how to do it again

Repeatable steps, checks, and decision rules.

Governance → what deserves to persist

Utility, correction, decay, merge, and archive.

Why not just long context?

Dimension	Full-context / Flat inject	Vector RAG (Mem0 / Zep)	Skill Memory Bank
Token cost per turn	O(N) — all memories injected every message regardless of relevance	O(N) index scan + retrieved chunks; grows with corpus	O(k) — 100-token abstracts only; body fetched on activation
Routing mechanism	None — entire memory block always present (ChatGPT: Redis KNN; Claude: filesystem dump)	Semantic similarity search; misses procedural and temporal nuance	Description-driven trigger matching; 3-stage EvoAgent-style escalation
Procedural structuring	Absent — prose blobs or narrative summaries; no step-level constraints	Chunked prose; no executable procedure separation	Typed SKILL.md: procedure, gotchas, output templates, causal links
Version control / audit trail	Opaque KV store or filesystem; no diff, no rollback, no lineage	Vector embeddings; not human-readable, not diffable	Git-backed Markdown; every change has a commit hash and timestamp
Sensitivity / scope gating	None — all memories visible in all contexts regardless of topic	Filter-dependent; metadata optional; no hard scoping before retrieval	user_id + project_id + sensitivity gate before context assembly
Supply-chain security	N/A — proprietary, closed; trust the vendor	36% of public skills contain critical flaws (Snyk ToxicSkills 2026)	ACRA-PROOF: user-approved diff + AST validation + Semgrep scan before commit
Quality signal	Zero — no way to know if a memory helps or hurts responses	Retrieval metrics only; no downstream task improvement measurement	evals/evals.json: with-skill vs. no-skill pass-rate delta required
Portability	Vendor-siloed; ChatGPT memories cannot move to Claude or Cursor	Embedding-format-dependent; not cross-platform	agentskills.io open spec; loads in Claude Code, Cursor, Copilot, Gemini CLI

13 / Verification loop

Put the POC under pressure

A production-intent artifact needs an adversarial review loop. This portable brief asks another agent to inspect the live behavior, infer the implementation, prioritize risks, and recommend the next high-leverage iteration.

skill_memory_bank_production_review.md

You are reviewing a production-intent POC for ContextJamming called **Skill Memory Bank**.

**Live page:** https://www.contextjamming.com/SkillMemoryBank

**Project context (for you):**
- This is a browser-local simulation (no server, no external APIs) that turns ephemeral agent conversations into scoped, inspectable, reusable procedural memory "skills".
- Core architecture: a typed memory compiler, inspectable skill lakebed, abstract-first context router with @mention graph pulses, and a governance plane that enforces strict scoping before intelligence.
- It must feel like a real memory substrate, not just marketing copy.
- Visual/typographic language should stay consistent with Context Jamming (editorial, high-signal, Fraunces + IBM Plex Mono influence, clean cards, clear hierarchy).
- This POC is meant to be both a compelling demo for investors/partners and a working primitive we can evolve toward real agent use (MCP resources/tools, future sovereign stacks).

**Your task — thorough code + UX review:**

1. **Analyze the live page thoroughly** (all sections, all interactive elements, buttons, cards, the distillation example, graph pulse area, safety boundary table, governance feedback, import/export/clear).
2. **Infer or request the implementation details** you need (HTML structure, Tailwind usage, vanilla JS / state model, localStorage schema, distillation heuristic logic, graph representation, how pulsing and scoping actually work in code, any seeded data vs dynamic creation).
3. **Deliver a structured review** with the following sections:

**A. Executive Summary**
(2-3 sentences on overall quality, fidelity to the 4-plane architecture and runtime loop, and production readiness)

**B. Strengths**
(What is already working well — architecture, UX moments, conceptual clarity, code patterns worth keeping)

**C. Issues & Risks** (categorized + prioritized)
- P0 (must fix for this to be a credible demo)
- P1 (important for maintainability / extensibility)
- P2 (nice-to-have polish)

Categories to cover:
- Conceptual fidelity (does the implementation actually demonstrate the memory compiler + skill lakebed + context router + governance plane, or is it mostly static explanation?)
- Distillation heuristic quality & transparency
- Graph pulse / retrieval implementation
- Data model & localStorage design (schema, versioning, migration path)
- State management & reactivity
- Code organization & maintainability (especially if still monolithic single-file)
- UX / interactivity gaps (empty states, feedback, real skill creation flow, verification loops)
- Accessibility & keyboard support
- Mobile / responsive behavior
- Error handling, edge cases, and "what if" scenarios
- Performance / DOM bloat risks
- Security / scoping simulation robustness

**D. Specific Recommendations**
For each major issue, give concrete suggestions (and code snippets where helpful). Prioritize changes that increase the "this feels like real governed memory" perception.

**E. Quick Wins**
A short list of high-impact, low-effort improvements that would make the POC feel significantly more alive.

**F. Extensibility Notes**
How easy/hard would it be to:
- Add real user-created skills from pasted conversation
- Evolve the graph into a proper traversable structure
- Wire this to a real MCP server later
- Reuse components/patterns in future ContextJamming POCs

**G. Final Verdict + Recommended Next Step**
(One paragraph + a clear suggested scope for the next iteration)

**Tone & Approach:**
- Be direct but constructive. You are helping ship a high-signal artifact.
- Reference the "Codex Workflow Hardening" skill principles where relevant (plan-first, repo-local guidance, verification loops, visual + build verification).
- Assume we will iterate quickly — focus on leverage, not perfectionism.

Begin your review now. If you need the full HTML source or specific sections of the JS, tell me exactly what to paste.

Portable / inspectable / governed

Socialize the architecture

This POC is a conversation artifact: a way to show how personal and organizational memory can become inspectable, portable, governed agent skill infrastructure.

Everything on this page runs in your browser. No memory is sent to a server. Reload persistence uses localStorage.


The Ledger.

How this site is made.

Antigravity

Claude Opus 4.8

Codex