@EXM7777: how to make AI videos so convi...
@EXM7777
Oct 12, 2025
2
you're writing your Sora 2 prompts like you're writing an essay...
"create a cinematic video of a sunset over mountains with dramatic lighting and smooth camera movement"
it (barely) works... but you're leaving so much quality and realism on the table
here's the process that changed everything for me:
3
before we get into the real stuff...
if you're serious about learning ai, here are some very good resources:
more free prompts + content in my telegram (link in bio)
weekly newsletter (no ads/spam): aifirstbrain.com
now back to the thread
4
JSON (JavaScript Object Notation) is just structured data... and before you roll your eyes thinking this sounds technical, stay with me because this is simpler than you think
instead of writing paragraphs hoping the AI interprets correctly, you're organizing instructions the way the model actually processes them
like filling out a form vs explaining what you want in a rambling email
5
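a minimal sketch of the sunset prompt from the top of the thread, restructured as labeled fields. the key names (`style`, `subject`, `lighting`, `camera_movement`) are illustrative assumptions, not an official Sora 2 schema, and Python is only used here to build and print the JSON text:

```python
import json

# Hypothetical key names: this is not a documented Sora 2 prompt schema.
sunset_prompt = {
    "style": "cinematic",
    "subject": "sunset over mountains",
    "lighting": "dramatic",
    "camera_movement": "smooth",
}

# Serialize to the JSON text you would paste in as the prompt.
prompt_text = json.dumps(sunset_prompt, indent=2)
print(prompt_text)
```

each value now sits under exactly one label, so "dramatic" can only describe the lighting.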
same idea as that essay prompt... but now every parameter has its own label
> no ambiguity about what "dramatic" modifies
> no confusion about relationships between elements
> just clean, organized instructions
6
why this creates photorealistic results:
Sora 2 still reads your prompt as text... but with labeled key-value pairs it doesn't have to guess which description belongs to which element
which means less of the result depends on the model inferring what you meant, and more of it reflects what you actually asked for
this is the likely reason JSON outputs tend to look more polished
7
and here's something nobody talks about... token efficiency
when you write "create a cinematic video with dramatic lighting, smooth camera movement, and a sunset over mountains" the AI processes every single word, punctuation mark, grammatical structure
JSON cuts most of that connective filler... though braces, quotes and key names cost tokens too
same information, much less filler to wade through
8
but Sora 2 is fundamentally different from image generators...
(this is where most people's mental model breaks)
it doesn't generate a pretty picture and add motion... it actually understands how scenes EVOLVE over time
physics, momentum, cause and effect
which means you need to prompt temporally, not just spatially
9
here's what I mean by temporal prompting:
{
"duration": "10s",
"sequence": [
{"time": "0-3s", "action": "camera zooms in on subject"},
{"time": "3-7s", "action": "subject turns head slowly"},
{"time": "7-10s", "action": "fade to black"}
]
}
you're literally choreographing a timeline
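if you're choreographing a timeline by hand, it's easy to leave a gap or overshoot the duration. here's a rough sketch of a sanity check for the structure above... it assumes every range is written as whole seconds in the `"start-ends"` form used in the example, and the function names are hypothetical helpers, not part of any Sora tooling:

```python
import re

def parse_range(time_range: str) -> tuple:
    """Turn a string like "3-7s" into (3, 7)."""
    match = re.fullmatch(r"(\d+)-(\d+)s", time_range)
    if match is None:
        raise ValueError(f"unrecognized time range: {time_range!r}")
    return int(match.group(1)), int(match.group(2))

def timeline_problems(prompt: dict) -> list:
    """Return a list of problems; an empty list means the steps are contiguous."""
    problems = []
    total = int(prompt["duration"].rstrip("s"))
    cursor = 0  # where the next step is expected to start
    for step in prompt["sequence"]:
        start, end = parse_range(step["time"])
        if start != cursor:
            problems.append(f"{step['time']} starts at {start}s, expected {cursor}s")
        if end <= start:
            problems.append(f"{step['time']} has no positive length")
        cursor = end
    if cursor != total:
        problems.append(f"sequence ends at {cursor}s, duration is {total}s")
    return problems

prompt = {
    "duration": "10s",
    "sequence": [
        {"time": "0-3s", "action": "camera zooms in on subject"},
        {"time": "3-7s", "action": "subject turns head slowly"},
        {"time": "7-10s", "action": "fade to black"},
    ],
}
print(timeline_problems(prompt))  # []
```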
10
this shift from spatial to temporal thinking is HUGE
images = "what's in the frame"
videos = "what happens WHEN in the frame"
once you internalize this... your video quality jumps dramatically because you're finally speaking the model's language instead of fighting against how it actually works
11
let me break down the 5 components every photorealistic Sora 2 prompt needs:
1. scene description (spatial)
2. camera parameters (perspective)
3. motion/action (temporal)
4. lighting/atmosphere (mood)
5. temporal structure (pacing)
miss any of these and you get that "AI video" look everyone recognizes
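a quick way to catch a missing component before you hit generate... this sketch assumes the five components map onto the top-level keys `scene`, `camera`, `motion`, `lighting` and `timeline` used in the templates later in this thread. `missing_sections` is a hypothetical helper name:

```python
# Assumed mapping of the five components to top-level keys.
REQUIRED_SECTIONS = ("scene", "camera", "motion", "lighting", "timeline")

def missing_sections(prompt: dict) -> list:
    """List the required sections that are absent or empty, in checklist order."""
    return [name for name in REQUIRED_SECTIONS if not prompt.get(name)]

draft = {
    "scene": {"subject": "elderly craftsman in workshop"},
    "camera": {"movement": "slow dolly left to right"},
    "lighting": {"source": "single window, late afternoon"},
}
print(missing_sections(draft))  # ['motion', 'timeline']
```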
12
here's a scene description done right:
{
"subject": "elderly craftsman in workshop",
"environment": "cluttered wooden workbench with tools",
"objects": ["vintage hand saw", "wood shavings", "half-finished chair"],
"composition": "medium shot, rule of thirds"
}
specific spatial relationships... not vague descriptions like "a nice workshop scene"
13
camera parameters (this is where cinematography knowledge pays off):
{
"camera": {
"angle": "eye level, slight dutch tilt",
"movement": "slow dolly left to right",
"lens": "35mm equivalent, shallow depth of field",
"focus": "subject sharp, background soft bokeh"
}
}
Sora 2 understands real cinematography language... use it
14
motion and action:
{
"motion": {
"primary": "hands carefully sanding wood grain",
"secondary": "dust particles floating through light beam",
"tertiary": "workshop fan oscillating in background",
"pace": "calm, meditative"
}
}
layers of motion at different speeds create depth and realism... single-layer motion looks flat and fake
15
lighting creates emotion (and believability):
{
"lighting": {
"source": "single window, late afternoon",
"direction": "45 degrees camera left",
"quality": "soft directional with visible god rays",
"color_temp": "warm 3200K",
"mood": "nostalgic, contemplative"
}
}
real scenes have motivated lighting... random "good lighting" screams AI
16
temporal structure ties everything together:
{
"timeline": {
"0-2s": "establish wide shot of workshop",
"2-6s": "push in to medium shot, focus on hands working",
"6-8s": "rack focus to craftsman's concentrated face",
"8-10s": "pull back revealing finished piece, soft smile"
}
}
this is narrative pacing... not just "make a 10 second video"
17
now compare JSON to natural language for the same prompt...
"create a video of an elderly craftsman in a cluttered workshop with vintage tools and wood shavings, late afternoon window light from the left creating soft god rays, camera slowly dollying left to right at eye level with 35mm lens and shallow depth of field, hands carefully sanding wood while dust floats and a fan oscillates, starting wide then pushing to medium then racking focus to face then pulling back to reveal finished work..."
see how it becomes an unreadable mess?
18
JSON keeps complex prompts organized:
{
"scene": {...},
"camera": {...},
"motion": {...},
"lighting": {...},
"timeline": {...}
}
everything nested logically
nothing ambiguous
infinitely more maintainable
and here's the real power move... you can save these as templates and swap values
19
template-based workflow:
{
"scene": {
"subject": "{{SUBJECT}}",
"environment": "{{ENVIRONMENT}}",
"objects": ["{{OBJ1}}", "{{OBJ2}}", "{{OBJ3}}"]
},
"camera": {{CAMERA_PRESET_CINEMATIC}},
"lighting": {{LIGHTING_PRESET_NATURAL}}
}
systematic video generation instead of starting from scratch every time... this is AI-First thinking
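the template above isn't valid JSON until the placeholders are filled, so something has to do the swapping. a minimal sketch of one way to do it, assuming string placeholders sit inside quotes and preset placeholders stand in for whole objects... `fill_template` and the preset contents are illustrative assumptions (the values are borrowed from the camera and lighting examples earlier in the thread):

```python
import json
import re

TEMPLATE = """{
  "scene": {
    "subject": "{{SUBJECT}}",
    "environment": "{{ENVIRONMENT}}",
    "objects": ["{{OBJ1}}", "{{OBJ2}}", "{{OBJ3}}"]
  },
  "camera": {{CAMERA_PRESET_CINEMATIC}},
  "lighting": {{LIGHTING_PRESET_NATURAL}}
}"""

def fill_template(template: str, values: dict) -> dict:
    """Replace {{NAME}} placeholders, then parse the result to prove it is valid JSON."""
    def substitute(match):
        value = values[match.group(1)]  # KeyError means a placeholder has no value
        if isinstance(value, str):
            # The template already supplies the quotes; keep only the escaped body.
            return json.dumps(value)[1:-1]
        return json.dumps(value)  # presets are inserted as whole JSON objects

    return json.loads(re.sub(r"\{\{(\w+)\}\}", substitute, template))

values = {
    "SUBJECT": "elderly craftsman in workshop",
    "ENVIRONMENT": "cluttered wooden workbench with tools",
    "OBJ1": "vintage hand saw",
    "OBJ2": "wood shavings",
    "OBJ3": "half-finished chair",
    "CAMERA_PRESET_CINEMATIC": {
        "movement": "slow dolly left to right",
        "lens": "35mm equivalent, shallow depth of field",
    },
    "LIGHTING_PRESET_NATURAL": {
        "source": "single window, late afternoon",
        "color_temp": "warm 3200K",
    },
}
prompt = fill_template(TEMPLATE, values)
print(json.dumps(prompt, indent=2))
```

parsing the filled text with `json.loads` is the useful part: a forgotten placeholder or a stray quote fails loudly here instead of producing a silently broken prompt.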
20
Sora 2-specific advantages you need to leverage:
- better physics understanding (fabric, water, smoke all behave realistically)
- superior multi-subject consistency (characters maintain visual identity across cuts)
- accurate reflections and shadows (environmental lighting actually works)
but you have to PROMPT for these... they're not automatic
21
the physics object for realistic motion:
{
"physics": {
"gravity": "earth standard",
"wind": "gentle 5mph breeze from left",
"materials": {
"fabric": "silk, flowing naturally",
"liquid": "water with realistic surface tension",
"smoke": "cigarette smoke, wispy dissipation"
}
}
}
Sora 2 understands material properties... use them
22
multi-subject consistency:
{
"subjects": [
{
"id": "character_01",
"appearance": "woman, 30s, auburn hair in bun, green sweater",
"maintain_across_shots": true
},
{
"id": "character_02",
"appearance": "man, 40s, salt-pepper beard, denim jacket",
"maintain_across_shots": true
}
]
}
reference the same IDs across timeline sequences and Sora 2 keeps them consistent
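referencing IDs only helps if they're spelled the same everywhere. a small sketch that catches typos before you submit... it assumes each shot lists its characters under a `subject_ids` key, which is a hypothetical convention for this example (the thread doesn't specify how shots point at subjects), and it can't tell you whether the model honors `maintain_across_shots`:

```python
def unknown_subject_ids(prompt: dict) -> set:
    """Return IDs used in the sequence that are not declared under "subjects"."""
    declared = {subject["id"] for subject in prompt["subjects"]}
    referenced = set()
    for shot in prompt.get("sequence", []):
        # "subject_ids" is an assumed key name, not a documented parameter.
        referenced.update(shot.get("subject_ids", []))
    return referenced - declared

prompt = {
    "subjects": [
        {"id": "character_01", "appearance": "woman, 30s, auburn hair in bun, green sweater"},
        {"id": "character_02", "appearance": "man, 40s, salt-pepper beard, denim jacket"},
    ],
    "sequence": [
        {"shot": "01", "subject_ids": ["character_01", "character_02"]},
        {"shot": "02", "subject_ids": ["character_03"]},  # typo: never declared
    ],
}
print(unknown_subject_ids(prompt))  # {'character_03'}
```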
23
multi-shot sequences with transitions:
{
"sequence": [
{
"shot": "01",
"setup": {...},
"duration": "5s"
},
{
"transition": "match cut on movement",
"shot": "02",
"setup": {...},
"duration": "4s"
},
{
"transition": "dissolve 1s",
"shot": "03",
"setup": {...}
}
]
}
Sora 2 understands film language... shot lists, transitions, narrative flow
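notice that shot 03 in the example has no duration. a rough sketch for totaling a shot list and flagging shots like that... it assumes durations are whole seconds written like `"5s"`, ignores transition lengths, and leaves out the elided `setup` objects. `shot_report` is a hypothetical helper:

```python
def shot_report(sequence: list) -> tuple:
    """Return (total_seconds, shots_missing_duration) for a shot list."""
    total = 0
    missing = []
    for shot in sequence:
        duration = shot.get("duration")
        if duration is None:
            missing.append(shot["shot"])
        else:
            total += int(duration.rstrip("s"))
    return total, missing

sequence = [
    {"shot": "01", "duration": "5s"},
    {"transition": "match cut on movement", "shot": "02", "duration": "4s"},
    {"transition": "dissolve 1s", "shot": "03"},
]
print(shot_report(sequence))  # (9, ['03'])
```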
24
token efficiency matters MORE as videos get longer...
a 30-second sequence in natural language might consume 500+ tokens just describing the setup
same sequence in JSON? it can come in closer to 200 tokens
that's roughly 300 tokens you can spend on MORE creative direction, MORE detail, MORE control
efficiency = better outputs
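those numbers are ballpark, and braces, quotes and key names count as tokens too, so it's worth measuring your own prompts instead of assuming. a crude sketch for comparing two versions... it counts words and punctuation marks as a stand-in, which is an assumption and NOT how Sora 2 (or any real tokenizer) splits text, so treat the result as a relative hint only:

```python
import json
import re

def rough_token_count(text: str) -> int:
    """Count words and individual punctuation marks as a crude token proxy."""
    return len(re.findall(r"\w+|[^\w\s]", text))

prose = (
    "create a cinematic video of a sunset over mountains "
    "with dramatic lighting and smooth camera movement"
)
# Hypothetical key names, same as any other structured example.
structured = json.dumps({
    "style": "cinematic",
    "subject": "sunset over mountains",
    "lighting": "dramatic",
    "camera_movement": "smooth",
})

print("prose:", rough_token_count(prose))
print("json: ", rough_token_count(structured))
```

run it on your real prompts in both forms... if the JSON version isn't shorter, the case for it rests on clarity rather than token savings.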
25
5 practical rules for JSON video prompts:
1. start minimal, layer complexity systematically
2. test one new parameter at a time to learn its impact
3. build a template library of proven structures
4. use descriptive key names (never abbreviate for "efficiency")
5. nest related concepts together logically
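rule 2 is easier to stick to if the variants are generated for you. a minimal sketch, assuming prompts are nested dicts with one level of sections... `one_change_variants` is a hypothetical helper and "neutral 5600K" is just an example value to test against:

```python
import copy

def one_change_variants(base: dict, section: str, key: str, options: list) -> list:
    """Return copies of `base` that differ only in base[section][key]."""
    variants = []
    for option in options:
        variant = copy.deepcopy(base)  # deep copy so the base prompt is never mutated
        variant[section][key] = option
        variants.append(variant)
    return variants

base = {
    "camera": {"movement": "slow dolly left to right"},
    "lighting": {"source": "single window, late afternoon", "color_temp": "warm 3200K"},
}
variants = one_change_variants(
    base, "lighting", "color_temp", ["warm 3200K", "neutral 5600K"]
)
for variant in variants:
    print(variant["lighting"]["color_temp"])
```

generate each variant, compare the outputs, and any difference you see can be pinned on that one parameter.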
26
mistakes that make videos look AI-generated:
- over-describing static background elements (Sora fills these naturally)
- under-describing motion and timing (this is where precision matters)
- ignoring camera movement (static = fake)
- vague lighting (unmotivated light = uncanny valley)
- no temporal structure (random pacing feels wrong)
27
if natural language prompting is your comfort zone... JSON feels clunky at first
like switching from mouse to keyboard shortcuts
slower initially, then 10x faster once the pattern clicks
but here's the thing... you're not abandoning creativity for structure, you're channeling creativity THROUGH structure for consistent excellence
28
the AI-First Brain approach to mastering this:
don't just copy templates... understand the LOGIC behind each parameter
scene -> spatial information (what + where)
camera -> POV and framing (how we see it)
motion -> temporal dynamics (what changes + when)
lighting -> mood and realism (emotional context)
timeline -> narrative pacing (story flow)
29
because when you understand the underlying structure... something shifts
you stop thinking "what prompt do I need for this video?"
you start thinking "what intelligence framework creates this CLASS of videos?"
you're engineering systematic video generation instead of hoping for lucky outputs
this is the difference between using AI and MASTERING AI
30
so here's your action plan:
1. take your best natural language Sora prompt
2. convert it to JSON using the structures I shared
3. compare the outputs side by side
4. notice the difference in consistency, realism, and control
5. start building your template library
