CONTEXT JAMMING

Field notes from inside the context window.

FounderFiles · N°013Recipe Transfer · World Models · LLM Foundations

Ethan He editorial portrait
Fig. · The Recipe Transfer SpecialistxAI · NVIDIA · FAIR

Subject · Ethan He (Yihui He)

Ethan He.

Video-model intelligence is now mostly coming from the language model, not the video distribution model itself.

He led the small team that took xAI’s Grok Imagine from “no data, no infra, no model” to a shipped v0.9 in three months. Then he made a deliberate choice: after delivering reference-to-video, video extension, and world-model work, he left to focus on what he now sees as the higher-leverage layer — LLM foundations, context management, and agent orchestration.

TRAINED
CMU Robotics · Megvii Channel Pruning
AT
FAIR · NVIDIA Cosmos & MoE · xAI Grok Imagine
FILE
N°013
§ 01 · The Compression Instinct

Finding the Minimal Sufficient Representation

Across more than a decade, Ethan He has repeatedly solved the same underlying problem: how do you preserve the intelligence that matters while dramatically reducing what the system has to carry?

It began with Channel Pruning at Megvii (ICCV 2017), where he developed methods to remove redundant channels from deep networks while reconstructing the feature maps that actually drove downstream accuracy. The same instinct reappears in his NVIDIA work on Mixture-of-Experts routing — activating only the experts a token needs — and again in the VAE temporal compression choices inside Cosmos and Grok Imagine. Most recently, it has migrated into LLM context management and agent harness design.

He has described the long-context problem in video models and the context-compaction problem in LLM agents as fundamentally the same research question viewed through different substrates.

§ 02 · The Three-Month Build

Grok Imagine: Velocity as a Systems Problem

When He joined xAI in July 2025, the team had no data pipeline, no training infrastructure, and no model. Three months later, on October 7, 2025, Grok Imagine v0.9 shipped — five days after OpenAI released Sora 2.

He attributes the speed less to raw compute and more to a combination of extreme talent density, almost empty calendars, strong pre-existing inference and data foundations, and a transferable technical recipe from his time on NVIDIA’s Cosmos project. The same image-first bootstrapping, synthetic captioning, VAE tokenizer, and step-distillation approach that worked at NVIDIA was adapted and accelerated at xAI.

Compute mattered, but primarily as a multiplier of iterations per day rather than as the decisive variable.

The visual intelligence is actually mostly coming from language. Every time you see some improvement on these models, I would say mostly this comes from the language model, not coming from the video distribution models themselves.
Ethan He, Latent Space — June 1, 2026
§ 03 · The LLM Driver Thesis

Where the Marginal Gains Have Moved

He’s central claim is that the frontier of progress in video and world models has shifted. The diffusion or video generation component has become relatively mature; the intelligence that turns vague user intent into rich, coherent output now lives primarily in the language model layer — the prompt rewriter, planner, and orchestrator.

This is not a claim that video data is irrelevant. It is a claim about where the highest-leverage work currently sits. In his view, the next major qualitative leap will come from better agentic systems that can plan, generate, critique, edit, and iterate — treating the video model as one tool among several rather than the sole source of capability.

§ 04 · Video Agents Over Raw Model Scaling

The Product and Research Direction

He predicts that by the end of 2026, production-grade video agents will trigger a new wave of capability and spending — particularly from enterprises willing to pay for iterative, multi-step creative and simulation workflows that single-shot generation cannot deliver.

Grok Imagine’s Agent Mode, the open canvas where the system plans and stitches together longer outputs, was an early signal of this direction. He sees generative interfaces and real-time interactive world models as the longer-term destination, where the boundary between model and application becomes increasingly fluid.

§ 05 · The Autonomy Re-Bet

Choosing Research Freedom Over Scale

He made three deliberate moves up the institutional compute ladder: FAIR, then NVIDIA, then xAI. After delivering v0.9, reference-to-video, video extension, and world-model work at xAI, he chose to leave.

His stated reasons were direct: there was research he wanted to pursue that he could not do inside a company, and company priorities can shift quickly. This was not an impulsive exit but a calculated reversal — trading access to massive compute and engineering velocity for the autonomy to work on LLM foundations, self-managed context, and test-time model behavior.

His departure sits within a broader 2026 pattern at xAI following the SpaceX acquisition, as multiple researchers and engineers opted for smaller, more independent environments once the organization’s focus shifted.

§ 06 · Self-Managing Context and Test-Time Adaptation

The Current Research Agenda

He is now focused on problems that extend his long-standing interest in minimal sufficient representations into the language model domain:

  • Models that can understand and actively manage their own context length
  • Agent harnesses that a model can inspect and modify at test time
  • Moving from heuristic context pruning to learned, continual mechanisms

The through-line across his career remains consistent: identify what actually carries the intelligence, strip away what does not, and build systems that can act intelligently on the remainder.

Trajectory
  1. 2017

    Channel Pruning (ICCV)

    Introduced iterative channel selection and feature reconstruction. First major expression of his core instinct: preserve what matters, remove what does not.

  2. 2019-2021

    FAIR / Reality Labs

    Epipolar Transformers and early multimodal fusion work. Began exploring cross-view and temporal structure in human-centric vision.

  3. 2023-2025

    NVIDIA Cosmos & MoE

    Co-authored the Cosmos world foundation model and led upcycling work for large Mixture-of-Experts models. Developed the full technical recipe later used at xAI.

  4. July 2025 - Early 2026

    xAI Grok Imagine

    Joined when there was no data, infra, or model. Shipped v0.9 in three months, followed by reference-to-video, video extension, and world-model work.

  5. 2026-

    Independent Research

    Left xAI to focus on LLM foundations: self-managed context, test-time model modification, and moving from heuristic to learned continual mechanisms.

The Index
3 months
Grok Imagine from zero infrastructure to v0.9
NVIDIA Cosmos
World model recipe transferred and accelerated at xAI
Lead Author
Upcycling Large Language Models into Mixture of Experts
9k+ citations
Channel Pruning, MoE, and world model contributions
Dossier

Current Direction. LLM foundations with emphasis on self-managed context, test-time harness modification, and continual learning mechanisms.

Signature Strength. Transferring working technical recipes across domains while dramatically improving iteration speed and efficiency.

Notable Institutions. Megvii, CMU Robotics Institute, Facebook AI Research / Reality Labs, NVIDIA (Cosmos & MoE), xAI (Grok Imagine).

Caveat. Internal Grok Imagine architecture is not public. This file treats Cosmos as a recipe-transfer analog, not as proof of xAI implementation details.

Career Shape
dash-shaped — pure breadth, the inverse of I

Dash Velocity Generalist

Composes and orchestrates rather than digging; treats speed of assembly as the moat and gets to a working artifact in weeks.

Credential Path
Practitioner
Abstraction
Bottom Up
Exit Horizon
Velocity
Moat Instinct
Orchestration
Capital Posture
Venture
Role-Model Reference Class
  • Fast-shipping small teams
  • Orchestration-over-scaling proponents
Founder Context · JSON

A small reasoning persona distilled from this file. Inject it into a chat or deep-research context to assess a business problem the way He would.

Reason as a velocity-first builder. Ask what a small team could ship in weeks by composing existing pieces rather than building from scratch. Look for leverage in orchestration, context management, and agent harnesses rather than raw scale. Optimize for the fastest path to a working end-to-end artifact, then iterate.

{
  "$schema": "https://www.contextjamming.com/schemas/founder-context-v1.json",
  "file": "N°013",
  "persona": "Ethan He",
  "archetype": "dash-velocity",
  "shape": "—",
  "one_line": "Treats orchestration and assembly speed as the moat, not raw model scale.",
  "cognitive_basis": {
    "credentialPath": "practitioner",
    "abstractionDirection": "bottom-up",
    "exitHorizon": "velocity",
    "moatInstinct": "orchestration",
    "capitalPosture": "venture"
  },
  "operating_questions": [
    "What can a small team ship in weeks by composing existing pieces?",
    "Is the next leap in scale, or in orchestration, context, and the agent harness?",
    "What is the fastest path from zero to a working artifact?"
  ],
  "first_principles": [
    "Composition and orchestration beat raw scaling for the next increment.",
    "Velocity is a moat; the team that ships v0.9 first sets the agenda
  …
Share
FounderFiles N°013 · Ethan He
Filed by Bret Kerr · ACRA Insight LLC · Franklin, MA
contextjamming.com · @bretkerr
← back to Context Jamming

§ · Invoice No. 001 · The Build Ledger

The Ledger.

Filed · contextjamming.com

What a conservative mid-market digital agency would have quoted for the same scope, itemized against what this site actually cost. Agency numbers are the floor — not the premium brand-studio tier.

TIME

12 weeks

2 days

~42× faster

COST

~$150,000

~$300

~500× cheaper

TEAM

5-person agency

1 human + 3 models

Same deliverable

§ Itemized — what a mid-market agency SOW would have billed

Discovery · brand positioning · workshops40–80 hr$10,000
Design system · Figma tokens · 3 rounds60–120 hr$18,000
Wavesurfer audio carousel · single-track context60–100 hr$16,000
Dual lightbox systems · focus trap · keyboard30–50 hr$8,000
LLM product flows · streaming · state machine80–160 hr$26,000
Stripe · checkout · webhooks · env hardening40–80 hr$10,000
Editorial routes · 6 sub-pages · templates60–100 hr$14,000
Accessibility pass · aria · reduced-motion40–80 hr$10,000
QA · cross-browser · mobile matrix60–100 hr$14,000
Cross-publication rebrand · masthead + IA · 2026-04-2820–40 hr$6,000
Subtotal~700 hr$126,000
Project management · 18% overhead$24,000
Agency total — conservative floor~700 hr~$150,000
Actually spent · Claude + Gemini stack~20 hr~$300

Agency figure assumes ~700 billable hours at $200/hr blended, plus ~18% PM overhead — the conservative floor of a mid-market SOW. Premium brand studios would have quoted 2–3× that. Stack: Antigravity (orchestrator), Claude Opus 4.8 (auditor), Codex (adversary), Cloudflare Workers / OpenNext.

§   Colophon

How this site is made.

Vol. 26 · build log

Every page on contextjamming.com is the output of a real-time, three-body Mixture-of-Experts loop. One model orchestrates. Two consult. The human holds the thesis. No single model commits alone.

View Redesign Assessment →

Orchestrator

Antigravity

Google DeepMind

  • Primary author
  • Terminal-native, direct push to Cloudflare
  • Audit trail to GitHub on every commit
  • Adaptive thinking · effort: extra-high

Auditor

Claude Opus 4.8

1M context

  • Editorial critic
  • Code review before merge
  • Backup-of-record
  • Co-signs every commit

Adversary

Codex

Cross-model MoE

  • Factual adjudication
  • Structural dissent
  • Deep Research → semantic triples
  • Caught the Donelan incident

Stack

Next.js
16.2 · App Router
React
19.2
TypeScript
5
Tailwind
v4 · @theme inline
@opennextjs/cloudflare
adapter
wrangler
Pages deploy
framer-motion
transitions
wavesurfer.js
audio waveforms

Typeset in

Fraunces
variable · opsz + SOFT
Playfair Display
debate display
IBM Plex Mono
editorial metadata
Geist Mono
utility mono
Caveat
grease-pencil marginalia
All via
next/font/google
Palette
single @theme block
No dupe tokens
ever

Infrastructure

Deploy
Cloudflare Workers / OpenNext
ISR
30-min revalidate · Cloudflare-served
Repo
github.com/BretKerrAI/founderfile
Branch
main
Analytics
Google Tag Manager
Apex
contextjamming.com
Runtime
Node 24
Build tool
Turbopack
       human intent
            │
            ▼
   ┌────────────────────┐         ┌─────────────────┐
   │    Antigravity     │  ◄────► │ Claude Opus 4.8 │      ← auditor loop
   │    (orchestrator)  │         │     (auditor)   │
   └─────────┬──────────┘         └─────────────────┘
             │  ◄───────────┐
             ▼              │
       ┌──────────┐    ┌────┴───────┐
       │Cloudflare│    │   Codex    │          ← adversarial loop
       │ Workers  │    │            │
       └─────┬────┘    └────────────┘
             │
             ▼
       contextjamming.com
             │
             ▼
       ┌──────────────┐
       │   Git push   │         ← audit trail
       └──────────────┘
Assembled on Mac in Terminal · Filed from Franklin, MAContext Jamming · ACRA Insight LLC · MIT License · FounderFile.ai · RelationalIntelligence.xyz · Commission a Dispatch →