Layer four: the AI runtime. The layer with judgment. This is the layer that decides things. Everything below feeds it, everything above presents what it produces, but here is where the system stops moving data around and starts forming opinions about what the footage means and what to do with it.

01 Interfacewhat you click
02 Orchestrationwhat gives the orders
03 Computewhat does the work
04 AI runtimewhat makes the judgment calls
05 Statewhere everything rests
Five layers under every AI system. We go top to bottom: the surface you touch, down to the bedrock everything rests on.
What makes the judgment calls. Perception in parallel, then a crew of eight agents.
What makes the judgment calls. Perception in parallel, then a crew of eight agents.

Three senses, in parallel

Before anything can be decided, the system has to perceive. It watches the footage three ways at once: three senses, in parallel.

Rekognitionlabels what it sees in the frames
Transcribeevery word, with timestamps
Bedrock Data Automationthe whole video, for meaning

Rekognition looks at the frames and labels what it sees, building a visual timeline. Transcribe listens, writing down every word with its timestamp: what was said, and exactly when. And Bedrock Data Automation watches the whole video for meaning, the narrative summary a human would give you.

The order these run in is the whole point.

Run one after another, that is three minutes. In parallel, all three finish in under ninety seconds.

The three streams merge into one brief: title, summary, key moments. A foundation model drafts it, and the pipeline validates it as strict JSON. If the model returns garbage, the pipeline fails loudly. No fallback data. Ever. Invalid JSON does not get patched, guessed, or quietly replaced with a default. The workflow stops, on purpose, rather than pass confidently wrong structure downstream.

Still from the module video

The film crew: eight agents

Then the brief goes to the strangest part of this machine: the film crew. Eight AI agents, each with one specialty, deliberate over your footage like a real production team.

  • Director owns the narrative arc.
  • Editor selects clips.
  • Cinematographer judges framing and light.
  • Sound Designer handles audio and sync.
  • Story Analyst finds the beats.
  • Pacing Analyst sets the rhythm of the cuts.
  • Fact Checker compares every claim against the analysis and flags what does not match.
  • Quality Checker is the final gate. It can reject the whole plan and send the crew back to work.

They argue in tokens instead of around a table, with shared working memory, and they hand back one artifact: a VariantSpec. Which clips. In what order. With which transitions. For three aspect ratios: widescreen, vertical, square.

Why eight agents instead of one big prompt? Because specialists check each other. A single model confidently makes things up. A crew with a fact checker gets caught.

One layer left: the one everything else stands on.