Alex Prompter

@alex_prompter

🔥 Holy shit... Apple just did something nobody saw coming.

They just dropped Pico-Banana-400K, a 400,000-image dataset for text-guided image editing that might redefine multimodal training itself.

Here’s the wild part: unlike most “open” datasets that rely on synthetic generations, this one is built entirely from real photos. Apple used the Nano-Banana model to generate edits, then ran everything through Gemini 2.5 Pro as an automated visual judge for quality assurance. Every image got scored on instruction compliance, realism, and preservation, and only the top-tier results made it in.

It’s not just a static dataset either. It includes:

• 72K multi-turn sequences for complex editing chains
• 56K preference pairs (success vs. fail) for alignment and reward modeling
• Dual instructions: both long, training-style prompts and short, human-style edits

You can literally train models to add a new object, change lighting to golden hour, Pixar-ify a face, or swap entire backgrounds, and they’ll learn from real-world examples, not synthetic noise.

The kicker? It’s completely open-source under Apple’s research license. They just gave every lab the data foundation to build next-gen editing AIs.

Everyone’s been talking about reasoning models… but Apple just quietly dropped the ImageNet of visual editing.

👉 github.com/apple/pico-banana-400k

Apple didn’t just drop a dataset; they built an entire machine that edits and evaluates itself. Nano-Banana does the edits. Gemini 2.5 Pro judges the results. Failures are retried automatically until they pass. The pipeline literally runs end-to-end with zero humans in the loop.
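
A minimal sketch of that edit, judge, retry loop, not Apple’s actual code. The names `edit_with_retries`, `edit_fn`, and `judge_fn`, the 0.7 pass threshold, and the cap of 3 attempts are all assumptions made for illustration; the thread only says failures are retried until they pass.

```python
from typing import Callable, Optional


def edit_with_retries(
    image: str,
    instruction: str,
    edit_fn: Callable[[str, str], str],
    judge_fn: Callable[[str, str, str], float],
    threshold: float = 0.7,   # assumed pass mark, not from the thread
    max_attempts: int = 3,    # assumed cap so the loop always terminates
) -> tuple[Optional[str], list[str]]:
    """Return (accepted_edit_or_None, rejected_edits).

    edit_fn stands in for the editing model and judge_fn for the
    automated judge; both are hypothetical callables supplied by the caller.
    """
    rejected: list[str] = []
    for _ in range(max_attempts):
        candidate = edit_fn(image, instruction)
        score = judge_fn(image, instruction, candidate)
        if score >= threshold:
            # First candidate that clears the judge is kept.
            return candidate, rejected
        # Failed candidates are kept too, so they can be reused later.
        rejected.append(candidate)
    return None, rejected
```

Returning the rejected candidates alongside the accepted one is a design choice here: it keeps the failures available instead of throwing them away.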

The examples are unreal 🤯 “Turn the woman into a Pixar 3D cartoon.” “Change the yellow flower to purple.” “Make it snow.” And the model nails every one, perfectly preserving the original photo’s lighting and composition. This is what real multimodal alignment looks like.

They didn’t just dump random edits. The dataset is systematically mapped into 35 real-world edit types covering everything from global tone changes to human stylization and object relocation. It’s like teaching an AI every single Photoshop skill in existence.

The coverage is insane:

- Add or remove objects
- Swap backgrounds
- Change clothes, expressions, or weather
- Even “Simpsonize” a person

400,000 examples, all validated by another AI for instruction faithfulness and visual realism. Nothing like this exists.

Here’s the clever part: Apple kept the mistakes. Every failed edit is paired with the successful one. So instead of just training models to “do better,” you can train them to know what better looks like. That’s how you build judgment into multimodal systems.
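
A rough sketch of what pairing a failed edit with the successful one could look like as data. The `PreferencePair` record and `build_preference_pairs` helper are hypothetical names, and the field layout is an assumption rather than the dataset’s real schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreferencePair:
    """One (chosen, rejected) example for the same instruction."""
    instruction: str
    chosen: str    # edit that passed the judge
    rejected: str  # edit that failed the judge


def build_preference_pairs(
    instruction: str,
    accepted: Optional[str],
    failures: list[str],
) -> list[PreferencePair]:
    """Pair the accepted edit with every failed attempt.

    If no edit was accepted there is nothing to prefer, so no pairs
    are produced.
    """
    if accepted is None:
        return []
    return [PreferencePair(instruction, accepted, bad) for bad in failures]
```

Pairs in this chosen/rejected shape are the usual input for reward modeling and preference-based alignment, which is the use the thread describes.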

You can even watch the model reason across multiple edits. Start with a pumpkin. Add a vintage film grain. Replace the dark background with a haunted house. Make it snow. Then warm it up with golden-hour lighting. All chained together, no human editing.
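
The chaining idea can be sketched in a few lines: each edit takes the previous result as its input. `apply_edit_chain` and `edit_fn` are hypothetical names, and the stand-in editor in the usage note just records the instruction instead of touching pixels.

```python
from typing import Callable


def apply_edit_chain(
    image: str,
    instructions: list[str],
    edit_fn: Callable[[str, str], str],
) -> list[str]:
    """Apply instructions in order and return every intermediate result.

    history[0] is the original image; history[i] is the output after the
    i-th instruction. edit_fn is a placeholder for the editing model.
    """
    history = [image]
    for instruction in instructions:
        # Each turn edits the latest result, not the original image.
        history.append(edit_fn(history[-1], instruction))
    return history
```

For example, calling it with a toy `edit_fn` such as `lambda img, ins: f"{img} > {ins}"` and the instructions from the post (film grain, haunted house, snow, golden hour) yields one entry per turn, which is the kind of sequence a multi-turn example would store.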

What’s wild is how consistent the results are. Global style edits? ~93% success. Removing or replacing objects? ~83%. Fine geometry, layout, or text edits? Still shaky around ~60%. Typography remains the hardest problem in multimodal AI by far.

Pico-Banana-400K isn’t just another dataset. It’s proof that AI can now generate and verify its own training data at scale, with precision, and no human supervision. Apple just quietly built the foundation for the next decade of multimodal learning. http://arxiv.org/abs/2510.19808