$ video · spoolcast-explainer

I don't make videos. My AI pipeline does.

apr 15, 2026 · 8m 0s · style: inline

style library

inline

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palet…

summary

how this video was made

writing

Claude · screenplay, shot-list, scene prompts

images

generated

audio

Puck · 143 beats

render

Remotion

audit

passed

chunks

61 scenes

#1 · C1

Cold Open

You build things.

✓ narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: Stick figure at laptop, just clicked publish on something. Expectant face. Orange shirt. Clean, minimal.

#2 · C1A

Cold Open

It came out great.

✓ narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: Same stick figure now proudly holding up the finished artifact — a small tablet/phone/poster showing the completed work. Big smile, one thumb up. Small sparkle lines around the object to signal 'look what I made.'

#3 · C1A2

Cold Open

Now what?

✓ narration ✓ render

Same stick figure, standing still with the artifact off to the side. Arms slightly up in a shrug. Eyes looking around blankly, confused 'now what?' expression. Deadpan, empty background.

#4 · C1B

Cold Open

Getting attention for what you built is a separate job.

✓ narration ✓ render

Split frame: left side stick figure coding (labeled BUILDING), right side stick figure at a podium with a megaphone (labeled MARKETING). Arrow or barrier between them.

#5 · C1C

Cold Open

Different skills. Different time. Different energy.

! narration ✓ render

Three micro-panels horizontally: SKILLS (paintbrush + camera icons), TIME (clock), ENERGY (drained battery). Quick rhythm visual for the tricolon.

#6 · C2

Cold Open

Even if you have all three — skills, time, energy — the harder problem is: most of what gets built doesn't reach anyone.

! narration ✓ render

Single-beat deadpan punchline chunk. Stick figure staring at a laptop showing '0 views' or empty analytics. Deflated expression, arms down.

#7 · C3

Cold Open

The obvious fix is to have an AI make the video for you.

✓ narration ✓ render

Builder hopefully typing into an AI prompt box labeled 'make me a video.'

#8 · C4

Cold Open

That produces slop.

! narration ✓ render

MEME SPIKE — 'this is fine' dog sitting in a burning room (K.C. Green, reference use). Full-frame, replaces anchor style for deadpan punctuation. See ASSET_RULES.md Punchline Chunk Carve-Out.

#9 · C5

Cold Open

Style drifts from scene to scene.

! narration ✓ render

Three tiled illustration panels in wildly different art styles (cartoon, watercolor, photorealistic). Same subject, totally different styles. Off-anchor deliberately.

#10 · C5B

Cold Open

Pacing is random.

! narration ✓ render

Clock face with erratic, jittery hands. Or a timeline with unevenly spaced beat markers — some scrunched, some stretched. Chaotic rhythm visible.

#11 · C5C

Cold Open

You can't iterate without the whole thing regenerating into a different video.

! narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: A refresh/regenerate icon above a stack of wildly different-looking video thumbnails. Each regeneration = a whole new video, not an edit.

#12 · C6

Cold Open

This is spoolcast.

It's an architecture for making real short-form illustrated video.

Built for people who build things and don't want marketing to be a second job.

! narration ✓ render

Clean title card. 'spoolcast' hand-drawn lettering, small stick figure builder next to it holding a tool.

#13 · C8

Cold Open

In the next seven minutes, we'll walk through the pipeline that made that clip for under three dollars.

And the agent layer that runs it in the background while you keep building.

! narration ✓ render

Builder at laptop, relaxed, glancing at a chat notification with mp4 thumbnail.

#14 · C9

Reframe

The fix isn't a smarter AI.

It's a division of labor.

! narration ✓ render

AI-brain character overwhelmed, juggling 6 labeled jobs (story, script, images, animation, camera, render). Balls spilling out.

#15 · C10

Reframe

The generic approach asks one model to do everything.

Story. Script. Images. Animation. Camera. Timing. Rendering.

When anything's wrong, there's no atomic unit you can change and re-run.

That's why the output feels slopped together.

! narration ✓ render

Same overwhelmed AI-brain with a red X marking one of the labeled jobs. No way to fix just that one item.

#16 · C11

Reframe

Structurally, it was.

! narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: Single-beat deadpan punchline. AI-brain looking straight at camera. Flat expression.

title card

#17 · C12

Reframe

In spoolcast, there's exactly one atomic driver.

The script.

Specifically, a shot list — a spreadsheet with one row per sentence the narrator will say, grouped into scenes.

Change one row. Only the artifacts downstream of that row get rebuilt.

Everything else stays untouched.

! narration ✓ render

Screenshot/PNG of the actual spoolcast-explainer shot-list.xlsx, opened in Numbers. Pan across rows. Real artifact, no AI illustration.
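The "change one row, rebuild only what is downstream" idea can be sketched with a content hash per shot-list row. This is a minimal sketch, not spoolcast's actual mechanism: the row fields (`id`, `narration`) and the manifest format are assumptions made for illustration.

```python
import hashlib
import json


def row_fingerprint(row: dict) -> str:
    """Stable hash of one shot-list row; key order does not affect it."""
    canonical = json.dumps(row, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rows_to_rebuild(rows: list[dict], previous: dict[str, str]) -> list[str]:
    """Return ids of rows whose content differs from the last build's manifest."""
    stale = []
    for row in rows:
        if previous.get(row["id"]) != row_fingerprint(row):
            stale.append(row["id"])
    return stale


# Hypothetical rows: these field names are assumptions, not spoolcast's schema.
rows = [
    {"id": "C1", "narration": "You build things."},
    {"id": "C1A", "narration": "It came out great."},
]

# Manifest written at the end of the previous build.
manifest = {row["id"]: row_fingerprint(row) for row in rows}

# Edit one row, then ask which artifacts are stale.
rows[1]["narration"] = "It came out really great."
changed = rows_to_rebuild(rows, manifest)
```

Only the edited row comes back as stale, so its image, audio, and frames would be regenerated while every other row keeps its cached artifacts.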

#18 · C13

Reframe

Every other layer has exactly one job. And one rule.

It cannot improvise.

! narration ✓ render

Four small layer icons (IMAGE/ANIMATION/VOICE/RENDER) each with a 'NO IMPROVISING' sticker. Flat.

title card

#19 · BUMP-ANATOMY

Anatomy

title card

✓ narration ✓ render

#20 · C14

Anti-slop Catalog

Here's what specifically keeps the script from being slop.

✓ narration ✓ render

Title card / section divider. 'ANTI-SLOP PROCESSES' hand-lettered. Clean.

title card

#21 · C15

Anti-slop Catalog

One. Images are grouped by chunk, not by beat.

A chunk is one to six sentences that share a visual moment.

One illustration covers the whole chunk.

Saves the budget. Keeps the eye from whipping around.

✓ narration ✓ render

Screenshot of real pilot shot-list.json with beats indented under chunks. Syntax-highlighted. Shows 4 beats wrapped in one chunk visually.
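One way to picture beats nested under chunks is a pair of small data classes. The class and field names below are hypothetical, chosen only for illustration; the one-to-six limit comes from the narration above.

```python
from dataclasses import dataclass, field


@dataclass
class Beat:
    beat_id: str
    narration: str  # one sentence the narrator says


@dataclass
class Chunk:
    chunk_id: str
    image_prompt: str  # one illustration covers every beat in the chunk
    beats: list[Beat] = field(default_factory=list)

    def validate(self) -> None:
        """Enforce the 'one to six sentences' rule from the narration."""
        if not 1 <= len(self.beats) <= 6:
            raise ValueError(f"{self.chunk_id}: expected 1-6 beats, got {len(self.beats)}")


def images_needed(chunks: list[Chunk]) -> int:
    """Images are counted per chunk, not per beat."""
    return len(chunks)


chunk = Chunk(
    chunk_id="C15",
    image_prompt="Screenshot of the shot list with beats indented under chunks.",
    beats=[
        Beat("C15-1", "One. Images are grouped by chunk, not by beat."),
        Beat("C15-2", "A chunk is one to six sentences that share a visual moment."),
        Beat("C15-3", "One illustration covers the whole chunk."),
        Beat("C15-4", "Saves the budget. Keeps the eye from whipping around."),
    ],
)
chunk.validate()
```

Four beats share one image here, which is where the budget saving comes from.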

#22 · C16

Anti-slop Catalog

Two. The image doesn't illustrate a sentence.

It shows the chunk's narrative throughline.

A chunk about ad saturation gets one image of an ad-wall.

Not a picture of every individual word.

! narration ✓ render

The real pilot C1-nano-1k.png displayed full-frame. Chaotic ad-wall. On screen as proof the technique worked.

#23 · C17

Anti-slop Catalog

Three. Chunks have a hard cap around fifteen seconds.

Past that, the static image goes dead.

Split it. New image.

! narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: A timer at 15 seconds with a yawn icon appearing. Then an arrow to a second image replacing the first.
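The fifteen-second cap can be sketched as a greedy split over estimated sentence durations. The words-per-second rate is an assumption for illustration (a real pipeline would more likely measure the generated audio), and this sketch does not also enforce the six-sentence limit.

```python
MAX_CHUNK_SECONDS = 15.0   # cap taken from the narration
WORDS_PER_SECOND = 3.0     # assumed speaking rate, not a measured value


def estimate_seconds(sentence: str) -> float:
    """Rough duration estimate from word count."""
    return len(sentence.split()) / WORDS_PER_SECOND


def split_into_chunks(sentences: list[str], cap: float = MAX_CHUNK_SECONDS) -> list[list[str]]:
    """Group consecutive sentences, starting a new chunk when the cap would be exceeded."""
    chunks: list[list[str]] = []
    current: list[str] = []
    current_seconds = 0.0
    for sentence in sentences:
        seconds = estimate_seconds(sentence)
        if current and current_seconds + seconds > cap:
            chunks.append(current)  # close the chunk; the next one gets a new image
            current, current_seconds = [], 0.0
        current.append(sentence)
        current_seconds += seconds
    if current:
        chunks.append(current)
    return chunks
```

A single sentence longer than the cap still lands in its own chunk; the sketch never splits inside a sentence.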

#24 · C18

Anti-slop Catalog

Four. Every sentence has to survive on its own.

If a line only makes sense when you read the one before it, that's an essay.

Rewrite until each beat can be lifted out and still land.

✓ narration ✓ render

A single sentence in quotes, highlighted, lifted out of a page. The rest of the page fades.

#25 · C19

Anti-slop Catalog

Five. Text-to-speech reads exactly what you type.

So you write it the way you want it said.

Write 'roe-ass.' Not 'ROAS.'

! narration ✓ render

Side-by-side: 'ROAS' with audio waveform labeled 'R-O-A-S' (letter by letter, sad), vs 'roe-ass' with waveform labeled 'roe-ass' (clean).
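A respelling table applied before text reaches the speech engine is one way to make this rule mechanical. The sketch below is an assumption about how it could be done, not the pipeline's actual code; only the ROAS example comes from the narration.

```python
import re

# Written form -> spoken form. Only the ROAS entry comes from the narration.
RESPELLINGS = {"ROAS": "roe-ass"}


def spell_for_speech(text: str, respellings: dict[str, str] = RESPELLINGS) -> str:
    """Replace whole-word matches with the spelling the voice should read."""
    if not respellings:
        return text
    alternatives = "|".join(re.escape(word) for word in respellings)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda match: respellings[match.group(0)], text)
```

The word boundaries keep the substitution from touching longer words that merely contain the acronym.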

#26 · C20

Anti-slop Catalog

Otherwise it says each letter out loud. Like a roll call.

✓ narration ✓ render

MEME SPIKE — Wall-E on a trash pile at sunset (Pixar still, reference use — copyrighted character used as meme per ASSET_RULES.md carve-out). The absurdity of a robot taking instructions literally lands on top of the beat. Full-frame.

#27 · C21

Anti-slop Catalog

Six. One stage in the pipeline is deliberately not automatable.

Turning messy raw material into a script needs judgment.

What's the practical question. What's the turning point. What's the one thing the viewer should come away with.

Mechanize that and you get generic.

Leave it human. The rest of the pipeline takes over from there.

! narration ✓ render

Horizontal split. Left side: human head icon with thought bubble labeled JUDGMENT. Right side: gear-icon pipeline labeled DETERMINISTIC. Arrow flows left to right.

#28 · C22

Budget

Before the layers. The economics.

Per five-minute video.

Images — about fifty illustrations on kie.ai — roughly one to three dollars.

Voice — Google Cloud's Chirp3-HD text-to-speech. Free within the monthly tier.

Animation and render — both run locally. Zero cloud cost.

! narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: Itemized receipt visual: Images $2, Voice $0, Animation $0, Render $0. Hand-drawn price tags.

#29 · C23

Budget

Total per video: roughly the price of a coffee.

! narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: Single-beat deadpan. A coffee cup next to a small receipt. The receipt shows one underlined total at the bottom reading exactly: 'TOTAL: $2'. No other line items, no other numbers visible anywhere in the image. Minimal, hand-drawn feel.

#30 · C24

Budget

Hiring a human animator for a five-minute explainer runs five hundred to two thousand dollars.

Adobe plus Descript plus stock footage — fifty a month in subscriptions and a dozen hours of manual work.

Spoolcast is an order of magnitude cheaper than either path.

! narration ✓ render

Three bar-chart bars: HUMAN ANIMATOR (tall), DIY ADOBE (medium), SPOOLCAST (tiny). Height dramatic.
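The budget figures quoted in this section can be checked with a few lines of arithmetic. All numbers below are taken from the narration. The subscription path is left out because comparing it per video would need a videos-per-month figure the source does not give.

```python
# (low, high) cost in dollars per five-minute video, as quoted in the narration.
costs = {
    "images": (1.0, 3.0),     # about fifty illustrations
    "voice": (0.0, 0.0),      # free within the monthly tier
    "animation": (0.0, 0.0),  # runs locally
    "render": (0.0, 0.0),     # runs locally
}

total_low = sum(low for low, _ in costs.values())
total_high = sum(high for _, high in costs.values())

# Per-illustration cost implied by "about fifty illustrations".
per_image_high = costs["images"][1] / 50

# Most conservative comparison: cheapest animator quote vs. most expensive run.
animator_low, animator_high = 500.0, 2000.0
worst_case_ratio = animator_low / total_high
```

Even in the least favorable pairing the ratio is well over a hundredfold, which is consistent with the "order of magnitude" claim for the animator path.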

title card

#31 · BUMP-LAYERS

Layers

title card

✓ narration ✓ render

#32 · C25

Image Layer

Four layers.

Image makes the pictures. Animation gives them motion. Voice narrates. Render stitches it all into an mp4.

Each does one thing. Together they make the video.

! narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: A clean hand-drawn pipeline diagram. Four labeled boxes stacked horizontally, left to right: IMAGE (with a small paintbrush icon), ANIMATION (with a small motion-arrow icon), VOICE (with a small speech-bubble icon), RENDER (with a small mp4 play-button icon). Thin arrows connecting each box to the next. At the far right, a single larger mp4 thumbnail icon with a play button. Minimal stick-figure builder off to one side just observing the flow.

#33 · C26

Image Layer

Image layer.

One AI illustration per chunk.

The service is kie.ai. Their nano-banana-2 model.

Each image is one HTTP request — a JSON body with a prompt and an image_input array.

! narration ✓ render

JSON request visual flying into a kie.ai cloud icon, PNG returning. Clean.
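As a rough sketch, the request body might be assembled like this. Only the two field names, `prompt` and `image_input`, come from the narration. The endpoint, authentication, model selector, and any other required fields are deliberately omitted because the source does not state them, and treating `image_input` as a list of image references is an assumption. The "style text, then Scene:" prompt layout mirrors the prompts shown on this page, and the style string here is an abbreviated stand-in.

```python
import json


def build_prompt(style: str, scene: str) -> str:
    """Join the shared style text and one scene description, as the page's prompts do."""
    return f"{style.strip()} Scene: {scene.strip()}"


def build_request_body(prompt: str, reference_images: list[str]) -> str:
    """Serialize the two fields named in the narration; nothing else is assumed."""
    body = {"prompt": prompt, "image_input": list(reference_images)}
    return json.dumps(body, ensure_ascii=False)


# Abbreviated stand-in for the full style description.
style = "Loose hand-drawn black ink line art on pure white background."
body = build_request_body(build_prompt(style, "Stick figure at laptop."), [])
```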

#34 · C27

Image Layer

The problem this solves.

If you ask a model for fifty illustrations, you get fifty art styles.

Scene one is a cartoon. Scene two is a watercolor. Scene three is photorealistic.

! narration ✓ render

Three mismatched-style tiles side by side. Same subject, totally different rendering.

title card

#35 · C28

Image Layer

The fix is called image-ref chaining.

First generation establishes a style anchor.

Every subsequent request passes that anchor back in.

The model matches the existing style instead of inventing a new one.

Fifty scenes. One artist.

! narration ✓ render

3x3 grid composite of real pilot C1-C9 nano-1k PNGs. Proof the technique produces visually consistent scenes.
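The chaining logic described here can be sketched independently of any image service by injecting the generator as a function. The `generate` callable and its signature are hypothetical. The one behavior taken from the narration is that the first result becomes the anchor and every later request passes that same anchor back in.

```python
from typing import Callable, Optional


def generate_with_anchor(
    prompts: list[str],
    generate: Callable[[str, list[str]], str],
) -> list[str]:
    """Generate one image per prompt, feeding the first result into all later calls."""
    results: list[str] = []
    anchor: Optional[str] = None
    for prompt in prompts:
        references = [] if anchor is None else [anchor]
        image = generate(prompt, references)
        if anchor is None:
            anchor = image  # the first generation establishes the style anchor
        results.append(image)
    return results


# Fake generator that records its inputs, standing in for the real service.
calls: list[tuple[str, list[str]]] = []


def fake_generate(prompt: str, references: list[str]) -> str:
    calls.append((prompt, references))
    return f"img-{len(calls) - 1}"


images = generate_with_anchor(["scene one", "scene two", "scene three"], fake_generate)
```

Passing the original anchor each time, rather than the previous image, avoids style drift accumulating from one scene to the next.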

#36 · C29

Animation Layer

Animation layer.

A still image held on screen for eleven seconds while someone talks is dead air.

The image needs to appear over time.

! narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: Still image next to a yawning stick figure viewer. Clear 'boring' signal.

#37 · C30

Animation Layer

The naive fix is a CSS wipe.

Robotic. Up-down or left-right. Obvious every time.

! narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: A mechanical conveyor-belt mechanism with a big solid rectangular wipe bar sliding across the frame in a straight horizontal line. Sharp edge. Obvious machinery: gears, arrows showing the direction, a clunky computer-generated feel. A small stick figure on the side looks unimpressed, arms crossed. Label: 'CSS WIPE'.

#38 · C31

Animation Layer

The real fix is a Python script that computes a per-pixel reveal-time map.

For every pixel, pick a time between zero and the chunk's duration when it should become visible.

Connected shapes reveal in parallel.

Big shapes start early. Small details fill in later. Everything finishes together.

! narration ✓ render

Heatmap visualization: the same image colored by reveal time. Big regions blue (early), small details red (late). With a label 'reveal-time map.'
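A toy version of a reveal-time map can be written in pure Python. This is not the author's OpenCV script: it swaps in a breadth-first connected-component pass over a small binary ink mask, and the ordering rule (larger components start earlier, every component ends at the chunk's duration) is one plausible reading of the narration. The `max_start_fraction` parameter is an assumption.

```python
from collections import deque


def reveal_time_map(mask, duration, max_start_fraction=0.5):
    """Return a grid of reveal times; background pixels (mask 0) stay None."""
    height, width = len(mask), len(mask[0])
    dist = [[None] * width for _ in range(height)]
    components = []  # each entry is the list of pixels in one connected shape

    for y in range(height):
        for x in range(width):
            if not mask[y][x] or dist[y][x] is not None:
                continue
            # Breadth-first flood fill, recording distance from the seed pixel.
            dist[y][x] = 0
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and dist[ny][nx] is None):
                        dist[ny][nx] = dist[cy][cx] + 1
                        queue.append((ny, nx))
            components.append(pixels)

    # Big shapes start early; small details start later.
    components.sort(key=len, reverse=True)
    times = [[None] * width for _ in range(height)]
    for rank, pixels in enumerate(components):
        start = duration * max_start_fraction * rank / len(components)
        farthest = max(dist[y][x] for y, x in pixels)
        for y, x in pixels:
            # A single-pixel shape has no spread, so it appears at the very end.
            progress = dist[y][x] / farthest if farthest else 1.0
            times[y][x] = start + progress * (duration - start)
    return times


mask = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
times = reveal_time_map(mask, duration=10.0)
```

At render time a pixel is drawn once the current time reaches its entry in the map. Because each component's times run from its own start to the full duration, all shapes grow in parallel and finish together.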

#39 · C32

Animation Layer

Looks like an artist speed-painting. Not a computer wiping.

! narration ✓ render

A human hand holding a paintbrush, painting rapidly across a canvas with visible brush strokes going in multiple directions. The canvas shows a partially-completed illustration emerging with multiple strokes appearing at once, not a single straight line. Motion lines around the hand showing speed. Warm, organic, hand-made feel. Label: 'ARTIST, NOT MACHINE'.

#40 · C33

Animation Layer

The script uses OpenCV, a computer vision library.

Runs locally. No AI tokens. No network calls. No randomness.

Same image in, same frames out. Always.

! narration ✓ render

Python logo + OpenCV logo, input PNG to folder of numbered output frames. Padlock icon for determinism.
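The "same image in, same frames out" claim is easy to verify by hashing the output of two runs. The sketch below uses a stand-in render function, since the real script is not shown. Both function names are hypothetical.

```python
import hashlib


def frames_digest(frames: list[bytes]) -> str:
    """Hash a sequence of frames; length prefixes keep frame boundaries significant."""
    digest = hashlib.sha256()
    for frame in frames:
        digest.update(len(frame).to_bytes(8, "big"))
        digest.update(frame)
    return digest.hexdigest()


def fake_render(image: bytes, frame_count: int) -> list[bytes]:
    """Stand-in for the reveal script: a pure function of its inputs."""
    return [image[: (i + 1) * len(image) // frame_count] for i in range(frame_count)]


first = frames_digest(fake_render(b"same-image-bytes", 4))
second = frames_digest(fake_render(b"same-image-bytes", 4))
```

Matching digests across runs are the property the padlock icon stands for. A mismatch would point to hidden randomness or an unpinned dependency.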

#41 · C34

Voice Layer

Voice layer.

Google Cloud Chirp3-HD text-to-speech.

The voice called Puck. Played at 1.2x speed.

! narration ✓ render

Microphone icon labeled 'Puck' with a 1.2x speed indicator.
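Playing narration at 1.2x shortens every beat, which changes how many video frames each beat occupies. The arithmetic is sketched below. The 30 fps frame rate is an assumption, and the source does not say whether the speed-up is applied by the speech service or at playback, so the function models only the resulting duration.

```python
def frames_for_beat(audio_seconds: float, playback_rate: float = 1.2, fps: int = 30) -> int:
    """Frames a beat occupies once its audio is played at the given rate."""
    if playback_rate <= 0:
        raise ValueError("playback_rate must be positive")
    played_seconds = audio_seconds / playback_rate
    return round(played_seconds * fps)
```

For example, twelve seconds of recorded audio plays in ten seconds at 1.2x, which is 300 frames at the assumed frame rate.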

title card

#42 · C35

Voice Layer

Default speed sounds like a robot.

1.2x sounds like a real podcast host.

That's the whole trick.

! narration ✓ render

Two waveform rows with label: top 'Puck 1.0x — robotic', bottom 'Puck 1.2x — podcast host'. Real audio plays back both for A/B compare.

#43 · C36

Voice Layer

The earlier pick was ElevenLabs. Marginally better. Hundreds of dollars a month at daily-video pace.

Chirp3-HD is free at this volume. Close enough nobody notices.

! narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: Two labeled microphones side-by-side. ElevenLabs with $$$, Chirp3-HD with $0. Checkmark on Chirp3-HD.

#44 · C37

Render Layer

Render layer.

Remotion. A React-based video framework.

Each scene is a React component that renders a frame.

Running one command — npx remotion render — produces an mp4.

! narration ✓ render

Terminal with 'npx remotion render' command being typed, mp4 appearing in folder below. Remotion + React logos.

title card

#45 · C38

Render Layer

Headless. No browser opens. It just runs.

Deterministic. Same inputs, same mp4. Always.

! narration ✓ render

Terminal-styled image showing the `npx remotion render` command with real-looking output: bundling, metadata, progress bar at 92%, success messages. Dark theme. Monospace font. Designed to pan across slightly for motion in Remotion rather than a literal screen recording.

#46 · C39

Render Layer

Four layers. Each doing one narrow thing.

Nothing improvises.

! narration ✓ render

Full pipeline diagram: SCRIPT at top, four layer boxes below, each with a swap/replace icon.

title card

#47 · BUMP-PROOF

Proof

title card

✓ narration ✓ render

#48 · C40

Proof

Here's what the whole pipeline produced.

! narration ✓ render

Simple arrow pointing right, setting up the reveal.

title card

#49 · C41

Proof

✓ narration ✓ render

20-25 sec of pilot-full-v2.mp4 with its audio. Section showing reveal + style-locked scenes + recognizable narration.

#50 · C42

Proof

Five minutes. Forty-four chunks.

AI voice. AI images. AI-written title. AI-generated thumbnail.

Zero overlay improvisation.

Rendered headless. One command.

! narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: Stats card with checkmark list: 5 MIN, 44 CHUNKS, AI VOICE, AI IMAGES, etc.

#51 · C43

Agent Layer

One thing left — and this piece is still being built as of this video.

Someone has to decide there's a video worth making and write the brief.

The plan is to delegate that too.

! narration ✓ render

Builder stick figure doing a 'pass it off' handoff motion toward Zara (pfp overlay inset top-right). Small 'COMING SOON' tag visible.

title card

#52 · C44

Agent Layer

The design is an agent that lives in your chat room.

In this project — Zara. An Animabot running on a Matrix server.

Built and talking in chat already. Not yet wired up to spoolcast.

! narration ✓ render

Screenshot of 195.201.90.47/public.html — the real Animabot admin panel showing Zara's avatar, status, MBTI. Honest framing: the bot exists, the pipeline integration is the next step.

title card

#53 · C45

Agent Layer

Once it's wired up, Zara will watch every signal the builder generates while working.

Commits pushed to GitHub.

New tracker entries.

Chat messages about a milestone.

A thread where the builder explained what they just figured out.

! narration ✓ render

Real Zara pfp (cyberpunk character portrait, purple hair) displayed prominently — this is the real agent, not an anchor-style illustration. Signal icons (GitHub, tracker, chat, thread) float in as overlays during their mentioned beats.

#54 · C46

Agent Layer

When the signals line up, Zara writes the brief.

A few paragraphs. Project name. Practical question. Turning point. Core message.

The brief goes to the editorial agent. The editorial agent returns a shot list. Spoolcast renders.

Zara drops the finished mp4 back in the chat with a short note.

! narration ✓ render

Horizontal flow: Zara (pfp at leftmost position) → brief page → editorial agent (Claude icon) → shot list → spoolcast logo → mp4 → chat bubble. Optional small 'DESIGNED' watermark to keep honest framing.
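Since this layer is described as designed but not yet built, the following is only a sketch of how the handoff could be modeled. The four brief fields and the four signal types come from the narration. The class names, the signal labels, and the "three distinct kinds" threshold are assumptions.

```python
from dataclasses import dataclass

# Signal kinds listed in the narration; the string labels are hypothetical.
SIGNAL_KINDS = {"commit", "tracker_entry", "chat_milestone", "explainer_thread"}


@dataclass
class Brief:
    project_name: str
    practical_question: str
    turning_point: str
    core_message: str


def signals_line_up(signals: list[str], min_kinds: int = 3) -> bool:
    """Assumed trigger rule: enough distinct kinds of signal have been seen."""
    return len(set(signals) & SIGNAL_KINDS) >= min_kinds


def maybe_write_brief(signals: list[str], draft: Brief):
    """Return the brief for the editorial agent, or None if it is too early."""
    return draft if signals_line_up(signals) else None


draft = Brief(
    project_name="spoolcast",
    practical_question="How do builders get attention without a second job?",
    turning_point="Splitting the work into narrow layers.",
    core_message="Scope the AI narrowly; let deterministic code do the rest.",
)
early = maybe_write_brief(["commit", "commit"], draft)
ready = maybe_write_brief(["commit", "tracker_entry", "explainer_thread"], draft)
```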

#55 · C47

Agent Layer

From the builder's side, once this is live, the whole thing will feel like this.

You ship a thing. You go back to building.

A few hours later, a video about what you just built appears in your chat.

You won't write the script. You won't generate images. You won't edit anything.

You did the building. The content got made.

! narration ✓ render

Split sequence: builder coding (left) → notification pops up with mp4 thumbnail and Zara pfp inset (right) → builder glances, smiles briefly, keeps coding.

#56 · C48

Agent Layer

The 'mostly' in 'mostly automated' is the editorial agent that still needs real context.

Because the alternative is a generic summary nobody watches.

Mostly automated. With one intentional judgment layer that keeps the output from being slop.

! narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: Pie chart: most of pie labeled AUTOMATED, small slice labeled JUDGMENT (human head icon). Small slice is a feature not a bug.

#57 · C49

Close

Using AI video to get attention used to mean one of two things.

Producing slop. Or spending a week on every video.

✓ narration ✓ render

Two crossed-out options: slop pile (X) and calendar showing 7 days (X).

#58 · C50

Close

It doesn't anymore.

! narration ✓ render

Single-beat deadpan reveal. Clean slate. New checkmark appearing over 'spoolcast' label.

#59 · C51

Close

Scope the AI narrowly.

Let deterministic code handle the rest.

Put an agent in the chat room to pull the trigger.

! narration ✓ render

Three numbered items stacked vertically with small icons.

#60 · C52

Close

The builder keeps building.

The content gets made.

Passively. And mostly automated.

! narration ✓ render

Builder coding at laptop (center), small video-thumbnail icons accumulating off to the side. Builder never looks up.

#61 · C53

Close

Spoolcast. The rendering engine for passive content.

✓ narration ✓ render nano-banana-2

Loose hand-drawn black ink line art with basic flat fill colors on pure white background (#FFFFFF). Simple stick-figure characters — round bald heads, dot eyes, expressive simple mouths, thin curved limbs. Flat basic colors only. Core palette: orange for clothing, peach for skin tone, gray for devices, white background. Additional colors (green, blue, red) only when the specific scene description mentions them. No shading, no gradients, no texture. Bold black ink outlines on every shape. Cyanide-and-Happiness / XKCD visual family. Charmingly amateurish notebook-doodle vibe. No text inside the drawing unless the scene explicitly calls for a short UI label. Scene: Title card: Spoolcast tagline + youtu.be/hqbmHuEtayM + github.com/artluai/spoolcast. Minimal.

You build things. It came out great. Now what? Getting attention for what you built is a separate job. Different skills. Different time. Different energy. Even if you have all three — skills, time, energy — the harder problem is: most of what gets built doesn't reach anyone. The obvious fix is to have an AI make the video for you. That produces slop. Style drifts from scene to scene. Pacing is random. You can't iterate without the whole thing regenerating into a different video. This is spoolcast. It's an architecture for making real short-form illustrated video. Built for people who build things and don't want marketing to be a second job. In the next seven minutes, we'll walk through the pipeline that made that clip for under three dollars. And the agent layer that runs it in the background while you keep building. The fix isn't a smarter AI. It's a division of labor. The generic approach asks one model to do everything. Story. Script. Images. Animation. Camera. Timing. Rendering. When anything's wrong, there's no atomic unit you can change and re-run. That's why the output feels slopped together. Structurally, it was. In spoolcast, there's exactly one atomic driver. The script. Specifically, a shot list — a spreadsheet with one row per sentence the narrator will say, grouped into scenes. Change one row. Only the artifacts downstream of that row get rebuilt. Everything else stays untouched. Every other layer has exactly one job. And one rule. It cannot improvise. Here's what specifically keeps the script from being slop. One. Images are grouped by chunk, not by beat. A chunk is one to six sentences that share a visual moment. One illustration covers the whole chunk. Saves the budget. Keeps the eye from whipping around. Two. The image doesn't illustrate a sentence. It shows the chunk's narrative throughline. A chunk about ad saturation gets one image of an ad-wall. Not a picture of every individual word. Three. Chunks have a hard cap around fifteen seconds. 
Past that, the static image goes dead. Split it. New image. Four. Every sentence has to survive on its own. If a line only makes sense when you read the one before it, that's an essay. Rewrite until each beat can be lifted out and still land. Five. Text-to-speech reads exactly what you type. So you write it the way you want it said. Write 'roe-ass.' Not 'ROAS.' Otherwise it says each letter out loud. Like a roll call. Six. One stage in the pipeline is deliberately not automatable. Turning messy raw material into a script needs judgment. What's the practical question. What's the turning point. What's the one thing the viewer should come away with. Mechanize that and you get generic. Leave it human. The rest of the pipeline takes over from there. Before the layers. The economics. Per five-minute video. Images — about fifty illustrations on kie.ai — roughly one to three dollars. Voice — Google Cloud's Chirp3-HD text-to-speech. Free within the monthly tier. Animation and render — both run locally. Zero cloud cost. Total per video: roughly the price of a coffee. Hiring a human animator for a five-minute explainer runs five hundred to two thousand dollars. Adobe plus Descript plus stock footage — fifty a month in subscriptions and a dozen hours of manual work. Spoolcast is an order of magnitude cheaper than either path. Four layers. Image makes the pictures. Animation gives them motion. Voice narrates. Render stitches it all into an mp4. Each does one thing. Together they make the video. Image layer. One AI illustration per chunk. The service is kie.ai. Their nano-banana-2 model. Each image is one HTTP request — a JSON body with a prompt and an image_input array. The problem this solves. If you ask a model for fifty illustrations, you get fifty art styles. Scene one is a cartoon. Scene two is a watercolor. Scene three is photorealistic. The fix is called image-ref chaining. First generation establishes a style anchor. Every subsequent request passes that anchor back in. 
The model matches the existing style instead of inventing a new one. Fifty scenes. One artist.

Animation layer. A still image held on screen for eleven seconds while someone talks is dead air. The image needs to appear over time. The naive fix is a CSS wipe. Robotic. Up-down or left-right. Obvious every time. The real fix is a Python script that computes a per-pixel reveal-time map. For every pixel, pick a time between zero and the chunk's duration when it should become visible. Connected shapes reveal in parallel. Big shapes start early. Small details fill in later. Everything finishes together. Looks like an artist speed-painting. Not a computer wiping. The script uses OpenCV, a computer vision library. Runs locally. No AI tokens. No network calls. No randomness. Same image in, same frames out. Always.

Voice layer. Google Cloud Chirp3-HD text-to-speech. The voice called Puck. Played at 1.2x speed. Default speed sounds like a robot. 1.2x sounds like a real podcast host. That's the whole trick. The earlier pick was ElevenLabs. Marginally better. Hundreds of dollars a month at daily-video pace. Chirp3-HD is free at this volume. Close enough nobody notices.

Render layer. Remotion. A React-based video framework. Each scene is a React component that renders a frame. Running one command — npx remotion render — produces an mp4. Headless. No browser opens. It just runs. Deterministic. Same inputs, same mp4. Always.

Four layers. Each doing one narrow thing. Nothing improvises.

Here's what the whole pipeline produced. Five minutes. Forty-four chunks. AI voice. AI images. AI-written title. AI-generated thumbnail. Zero overlay improvisation. Rendered headless. One command.

One thing left — and this piece is still being built as of this video. Someone has to decide there's a video worth making and write the brief. The plan is to delegate that too. The design is an agent that lives in your chat room. In this project — Zara. An Animabot running on a Matrix server.
Built and talking in chat already. Not yet wired up to spoolcast. Once it's wired up, Zara will watch every signal the builder generates while working. Commits pushed to GitHub. New tracker entries. Chat messages about a milestone. A thread where the builder explained what they just figured out. When the signals line up, Zara writes the brief. A few paragraphs. Project name. Practical question. Turning point. Core message. The brief goes to the editorial agent. The editorial agent returns a shot list. Spoolcast renders. Zara drops the finished mp4 back in the chat with a short note.

From the builder's side, once this is live, the whole thing will feel like this. You ship a thing. You go back to building. A few hours later, a video about what you just built appears in your chat. You won't write the script. You won't generate images. You won't edit anything. You did the building. The content got made.

The 'mostly' in 'mostly automated' is the editorial agent that still needs real context. Because the alternative is a generic summary nobody watches. Mostly automated. With one intentional judgment layer that keeps the output from being slop.

Using AI video to get attention used to mean one of two things. Producing slop. Or spending a week on every video. It doesn't anymore. Scope the AI narrowly. Let deterministic code handle the rest. Put an agent in the chat room to pull the trigger. The builder keeps building. The content gets made. Passively. And mostly automated.

Spoolcast. The rendering engine for passive content.
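The per-pixel reveal-time map from the animation layer can be sketched in a few lines. This is a minimal stand-in, not the project's script: it uses plain Python lists and a breadth-first flood fill where the narration says the real script uses OpenCV. The function name `reveal_time_map`, the `latest_start` parameter, and the linear timing rule are all assumptions made for illustration.

```python
from collections import deque
from typing import List


def reveal_time_map(
    mask: List[List[int]],
    duration: float,
    latest_start: float = 0.5,
) -> List[List[float]]:
    """Assign each ink pixel a reveal time in [0, duration].

    mask: 1 where there is ink, 0 for background (always visible, time 0).
    latest_start: assumed cap, as a fraction of duration, on how late the
    smallest shapes may begin revealing.
    """
    height, width = len(mask), len(mask[0])
    times = [[0.0] * width for _ in range(height)]
    seen = [[False] * width for _ in range(height)]
    components: List[List[tuple]] = []

    # Label 4-connected shapes, recording pixels in flood-fill order.
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            order = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                order.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(order)

    if not components:
        return times

    biggest = max(len(order) for order in components)
    for order in components:
        size = len(order)
        # Big shapes start at 0; smaller shapes start later.
        start = duration * latest_start * (1 - size / biggest)
        for i, (y, x) in enumerate(order):
            # Spread pixels evenly so every shape finishes at `duration`.
            times[y][x] = start + (duration - start) * (i + 1) / size
    return times
```

Every shape reveals in its own overlapping time window, and nothing in the function is random, so calling it twice on the same mask returns the same map.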