day 24 · apr 10, 2026 · launched

AI brain-response video comparison with TRIBE v2

Compared two videos with Meta's TRIBE v2 brain-response model and tested whether the model-favored creative matched real ROAS.

Overview
A research experiment using Meta's TRIBE v2 model to compare two videos through predicted brain-response patterns. The question was whether the model could identify meaningful differences between the videos, and whether those differences would align with actual business performance.

What was measured
The analysis focused on named cortical ROIs, or brain regions of interest. These regions were used as rough proxies for attention, salience, control, and evaluation. The comparison looked at hook strength near the start of each video, overall average response, peak ROI response, and when those peaks happened.
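The four summary measures can be sketched in NumPy as below. This is a minimal illustration, not the project's actual code: the function name, the 3-second hook window, the one-value-per-second sampling, and the example numbers are all assumptions made for the sketch.

```python
import numpy as np

def summarize_roi(series, seconds_per_step=1.0, hook_seconds=3.0):
    """Summarize one ROI's predicted response over time.

    series: 1-D sequence of predicted responses, one value per time step.
    seconds_per_step and hook_seconds are illustrative assumptions,
    not values taken from the original analysis.
    """
    series = np.asarray(series, dtype=float)
    # Number of time steps that count as the "hook" at the start of the video.
    hook_steps = max(1, int(round(hook_seconds / seconds_per_step)))
    peak_index = int(np.argmax(series))
    return {
        "hook": float(series[:hook_steps].mean()),   # early response
        "mean": float(series.mean()),                # overall average
        "peak": float(series[peak_index]),           # strongest response
        "peak_time_s": peak_index * seconds_per_step # when the peak occurred
    }

# Made-up placeholder series, only to show the shape of the output.
video_a = summarize_roi([0.9, 0.7, 0.5, 0.2, 0.1, 0.3])
video_b = summarize_roi([0.2, 0.3, 0.4, 0.8, 0.6, 0.5])
```

Comparing the two dictionaries key by key gives the kind of A vs B readout described above: one video can lead on hook strength while the other peaks later.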

Method
Two videos were run through TRIBE v2 in a blind A vs B comparison. The model does not read a real person's brain. Instead, it predicts brain-response patterns from the video itself. Those predictions were then summarized into selected frontal, cingulate, and temporal ROIs to make the output easier to interpret.
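The ROI summarization step might look roughly like the sketch below. It assumes the predictions arrive as a (time, vertices) array and that each vertex carries an integer ROI label. Both are assumptions for illustration; the write-up does not state TRIBE v2's actual output format or the parcellation that was used, and the ROI names here are hypothetical.

```python
import numpy as np

def roi_time_series(predictions, vertex_labels, roi_names):
    """Average predicted responses over the vertices of each ROI.

    predictions:   array of shape (time, vertices), assumed layout.
    vertex_labels: array of shape (vertices,), integer index into roi_names.
    roi_names:     list of ROI names (hypothetical in this sketch).
    Returns a dict mapping ROI name to a 1-D time series.
    """
    predictions = np.asarray(predictions, dtype=float)
    vertex_labels = np.asarray(vertex_labels)
    result = {}
    for index, name in enumerate(roi_names):
        mask = vertex_labels == index
        if mask.any():  # skip ROIs with no vertices assigned
            result[name] = predictions[:, mask].mean(axis=1)
    return result

# Tiny made-up example: 2 time steps, 4 vertices, 2 ROIs.
example = roi_time_series(
    predictions=[[1.0, 3.0, 10.0, 20.0],
                 [2.0, 4.0, 30.0, 50.0]],
    vertex_labels=[0, 0, 1, 1],
    roi_names=["frontal_example", "temporal_example"],
)
```

Averaging within an ROI trades spatial detail for a small set of time series that are easier to compare across videos.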

What the model favored
The TRIBE v2 predictions favored Video A. It showed a stronger early hook and stronger peak responses across several target regions tied to attention switching, salience, cingulate control, and prefrontal control. On the model's read, Video A looked like the more immediately attention-grabbing creative.

What actually happened
Real performance went the other way. Video B had a ROAS above 3, while Video A was closer to 2.5. That mismatch became the most valuable result in the project. The model produced clear and structured differences, but those differences did not line up with the real business outcome in this single A vs B case.
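The agreement check itself is simple arithmetic, sketched below. ROAS is revenue attributed to the ads divided by ad spend. The hook scores are invented placeholders, and the ROAS values are stand-ins chosen to match "closer to 2.5" and "above 3"; the exact figures are not given here.

```python
def pick_winner(scores):
    """Return the key with the highest score."""
    return max(scores, key=scores.get)

# Placeholder inputs: hook scores are invented, ROAS values are stand-ins
# consistent with "closer to 2.5" (A) and "above 3" (B).
model_hook = {"A": 0.7, "B": 0.3}
roas = {"A": 2.5, "B": 3.1}

model_pick = pick_winner(model_hook)  # creative the model favored
roas_pick = pick_winner(roas)         # creative that actually performed better
agrees = model_pick == roas_pick

# How much better the real winner did, relative to the model's pick.
relative_gap = (roas[roas_pick] - roas[model_pick]) / roas[model_pick]
```

With these stand-in numbers the picks disagree, and the real winner's ROAS is roughly a quarter higher than that of the model-favored creative.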

Interpretation
The mismatch makes TRIBE v2 more useful as a research and hypothesis tool than as a standalone decision tool. The model seems capable of describing how a video loads onto cortical salience and attention patterns, but not of reliably picking the winning creative on its own.

Operational finding
The workflow was heavy from end to end. Running it locally on a 16 GB MacBook was slow and fragile. Free Google Colab with a Tesla T4 could complete the analysis, but each short video still took more than 3.5 hours and the runtime was unreliable. A serious version of this workflow would need a dedicated GPU-backed system instead of a laptop or a low-cost VPS.

Constraint
TRIBE v2 is released under a non-commercial license, so this exact model cannot simply be dropped into a product without resolving licensing first.

Built with
Python, TRIBE v2, Hugging Face, PyTorch, WhisperX, ffmpeg, NumPy, Google Colab.
