
🔥 Holy shit... Apple just did something nobody saw coming.

They just dropped Pico-Banana-400K, a 400,000-image dataset for text-guided image editing that might redefine multimodal training itself.

Here’s the wild part:

Unlike most “open” datasets that rely on synthetic generations, this one is built on real source photos. Apple used Google’s Nano-Banana model to generate edits, then ran everything through Gemini 2.5 Pro as an automated visual judge for quality assurance.

Every image got scored on instruction compliance, realism, and preservation, and only the top-tier results made it in.

It’s not just a static dataset either. It includes:

• 72K multi-turn sequences for complex editing chains
• 56K preference pairs (success vs fail) for alignment and reward modeling
• Dual instructions: both long, training-style prompts and short, human-style edits

You can literally train models to add a new object, change lighting to golden hour, Pixar-ify a face, or swap entire backgrounds, and they’ll learn from real-world examples, not synthetic noise.

The kicker? It’s openly available under Apple’s research license.

They just gave every lab the data foundation to build next-gen editing AIs.

Everyone’s been talking about reasoning models… but Apple just quietly dropped the ImageNet of visual editing.

👉 github.com/apple/pico-banana-400k


Apple didn’t just drop a dataset. They built an entire machine that edits and evaluates itself. Nano-Banana does the edits. Gemini 2.5 Pro judges the results. Failures are retried automatically until they pass. The pipeline literally runs end-to-end with zero humans in the loop.
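Here is roughly what an edit, judge, retry loop like that could look like. This is a minimal sketch, not Apple’s actual pipeline: `run_with_retries`, `edit_fn`, `judge_fn`, the 0.7 pass threshold, and the cap of 3 attempts are all assumptions made up for illustration, and plain strings stand in for images.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attempt:
    instruction: str
    edited: str    # stand-in for an edited image (hypothetical)
    score: float   # judge score, assumed to lie in [0, 1]
    passed: bool


def run_with_retries(
    source: str,
    instruction: str,
    edit_fn: Callable[[str, str], str],
    judge_fn: Callable[[str, str, str], float],
    threshold: float = 0.7,   # assumed pass mark, not taken from the source
    max_attempts: int = 3,    # assumed cap so the loop always terminates
) -> List[Attempt]:
    """Edit, judge, and retry until one attempt passes or the cap is hit.

    Every attempt is returned, including the failed ones.
    """
    attempts: List[Attempt] = []
    for _ in range(max_attempts):
        edited = edit_fn(source, instruction)            # editor model call
        score = judge_fn(source, edited, instruction)    # judge model call
        attempt = Attempt(instruction, edited, score, score >= threshold)
        attempts.append(attempt)
        if attempt.passed:
            break
    return attempts
```

In a real setup `edit_fn` would wrap the editing model and `judge_fn` the judge model. Returning the failed attempts too, instead of throwing them away, is a deliberate choice in this sketch so nothing the judge rejected is lost.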


The examples are unreal 🤯

“Turn the woman into a Pixar 3D cartoon.”
“Change the yellow flower to purple.”
“Make it snow.”

And the model nails every one, preserving the original photo’s lighting and composition.

This is what real multimodal alignment looks like.


They didn’t just dump random edits. The dataset is systematically mapped into 35 real-world edit types covering everything from global tone changes to human stylization and object relocation. It’s like teaching an AI every single Photoshop skill in existence.


The coverage is insane:

- Add or remove objects
- Swap backgrounds
- Change clothes, expressions, or weather
- Even “Simpsonize” a person

400,000 examples, all validated by another AI for instruction faithfulness and visual realism.

Nothing like this exists.


Here’s the clever part: Apple kept the mistakes. Every failed edit is paired with the successful one. So instead of just training models to “do better,” you can train them to know what better looks like. That’s how you build judgment into multimodal systems.
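To make the failed-vs-successful pairing concrete, here is a small sketch of how judged attempts could be turned into preference pairs. The record layout and the names `PreferencePair` and `build_pairs` are hypothetical, not the dataset’s actual schema, and image IDs are plain strings.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# One judged attempt: (source_id, instruction, edited_id, passed).
# This tuple layout is an assumption made for illustration.
JudgedAttempt = Tuple[str, str, str, bool]


@dataclass(frozen=True)
class PreferencePair:
    source_id: str
    instruction: str
    chosen: str     # edit the judge accepted
    rejected: str   # edit the judge rejected


def build_pairs(attempts: List[JudgedAttempt]) -> List[PreferencePair]:
    """Pair each failed edit with a successful edit of the same photo and instruction."""
    grouped: Dict[Tuple[str, str], Dict[str, List[str]]] = {}
    for source_id, instruction, edited_id, passed in attempts:
        bucket = grouped.setdefault((source_id, instruction), {"ok": [], "bad": []})
        bucket["ok" if passed else "bad"].append(edited_id)

    pairs: List[PreferencePair] = []
    for (source_id, instruction), bucket in grouped.items():
        if not bucket["ok"]:
            continue  # no successful edit, so there is nothing to prefer
        chosen = bucket["ok"][0]
        for rejected in bucket["bad"]:
            pairs.append(PreferencePair(source_id, instruction, chosen, rejected))
    return pairs
```

Pairs in this chosen/rejected shape are the usual input for preference-based alignment or reward-model training, which is the use the thread describes.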


You can even watch the model reason across multiple edits. Start with a pumpkin. Add a vintage film grain. Replace the dark background with a haunted house. Make it snow. Then warm it up with golden-hour lighting. All chained together, no human editing.
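A multi-turn chain like this is just each edit feeding into the next. A minimal sketch, assuming a hypothetical `edit_fn` that takes the current image plus one instruction and returns the new image (strings stand in for images here):

```python
from typing import Callable, List


def apply_chain(
    source: str,
    instructions: List[str],
    edit_fn: Callable[[str, str], str],
) -> List[str]:
    """Apply instructions in order, each one editing the previous result.

    Returns every intermediate state, starting with the untouched source.
    """
    states = [source]
    for instruction in instructions:
        states.append(edit_fn(states[-1], instruction))
    return states
```

Keeping every intermediate state, not only the final image, is what lets a sequence be judged turn by turn.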


What’s wild is how much the results depend on the edit type.

Global style edits? ~93% success.
Removing or replacing objects? ~83%.
Fine geometry, layout, or text edits? Still shaky around ~60%.

Typography remains the hardest problem in multimodal AI by far.


Pico-Banana-400K isn’t just another dataset. It’s proof that AI can now generate and verify its own training data at scale, with precision, and with no human supervision. Apple just quietly built the foundation for the next decade of multimodal learning. arxiv.org/abs/2510.19808