Case Study: AI Animation - "Rabbit Kicks Panda's Butt"
Tools Used: Google Gemini App with Veo, ImageFX, Affinity Photo, Google Flow, Google Whisk
Role: AI Artist / Director / Advanced Prompt Engineer
Overview: This project focused on the iterative process of creating a two-clip, silent animated short that delivers a single, humorous punchline. The core objective was to move from a simple concept to a precisely controlled final product, solving critical AI challenges related to dynamic physical actions, character consistency, and narrative logic. This case study serves as a detailed log of the limitations encountered in generative AI video models and of the prompting and workflow techniques used to work around them.
Contributions:
Conceptualized and Developed the Core Narrative Sequence: The project began with the core idea of a fish jumping into the panda's mouth, which evolved into a specific, two-part story: a tiny rabbit (the protagonist) performs a surprising airborne kick on a giant, clumsy panda (the comedic foil), leading to a funny reaction.
Iteratively Refined Narrative, Logic, and Consistency Through Prompting: The development involved an extensive series of prompt versions and addressed several key AI failures through advanced prompt engineering. This included refining specific aspects like:
Character Consistency: Ensured the panda and rabbit maintained a consistent appearance, including the panda's distinctive black and white coat and the rabbit's large, expressive green eyes, by using identical character descriptions in every prompt.
Object and Scene Consistency: Resolved inconsistencies in the environment by refining the scene description so that it always placed the panda on a muddy shore and, initially, the rabbit on a single, heart-shaped lotus leaf. The lotus leaf was eventually removed to simplify the scene.
Physics and Narrative Logic: Overcame the AI's lack of physical understanding by simplifying the action. The original goal of a multi-stage kick and jolt was refined to focus on a single, clear action: the rabbit in an airborne kick pose, with the panda reacting. This was a critical step in making the animation achievable for the AI.
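The consistency approach described above (identical character and scene descriptions in every prompt, with one simple action per clip) can be sketched as a small prompt builder. This is a minimal illustration under stated assumptions: the names CHARACTERS, SCENE, and build_prompt are hypothetical, and the wording is paraphrased from this case study rather than copied from the project's actual prompts.

```python
# All names and wording here are hypothetical illustrations,
# not the project's actual prompts.
CHARACTERS = {
    "panda": "a giant, clumsy panda with a distinctive black and white coat",
    "rabbit": "a tiny rabbit with large, expressive green eyes",
}
SCENE = "on a muddy shore"


def build_prompt(action: str) -> str:
    """Return a clip prompt that reuses the character and scene text verbatim."""
    # The same descriptions are joined in the same order for every clip,
    # so only the action sentence differs between prompts.
    cast = " and ".join(CHARACTERS[name] for name in ("panda", "rabbit"))
    return f"{cast}, {SCENE}. {action}"


# One simple action per clip, mirroring the simplification described above.
clip_1 = build_prompt("The rabbit is in an airborne kick pose.")
clip_2 = build_prompt("The panda reacts to the kick.")
```

Keeping the shared text in one place means a change to a character description propagates to every clip prompt, which is the point of the technique. It does not by itself guarantee that a video model will render the characters identically.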
Analyzed and Troubleshot AI Workflow and Model Behavior: Identified and documented key AI behaviors and limitations.
Image-to-Video Workflow: Determined that the most effective workflow is to create a perfect master reference image first and then align a new video generation prompt with that visual, giving the AI a strong visual guide for action.
The "Creative Bias" Problem: Identified that the AI would often default to what it considered a "good" or visually plausible shot (e.g., the rabbit kicking several times, or the panda's butt behaving like a solid object) even when that contradicted the prompt's intended logic.
Industry-Wide Challenges (As of July 26, 2025)
The iterative process of this project highlights several significant and recurring challenges in AI video generation, particularly when aiming for highly specific and consistent outputs.
Motion and Physics Fidelity:
Challenge: The AI showed no real grasp of physics: it could not render a fluid "flying kick" followed by a single, comical "jolt" in response to the impact.
Industry Scope: This is a fundamental hurdle. Achieving physically correct and intentional motion requires an understanding of cause and effect that current models are only beginning to grasp.
Object and Temporal Consistency:
Challenge: Maintaining the integrity and consistent state of objects (such as the lotus leaf and the appearance of the characters) was a recurring issue when generating separate clips. The AI was also unable to keep object size consistent within a single clip: in the early fish version of the concept, the fish changed size over the course of an 8-second clip.
Industry Scope: This is a core challenge related to the "memory" of AI models. They struggle with object permanence and temporal awareness, leading to glitches in longer or more complex sequences.
Unprompted Element Generation (Hallucination):
Challenge: The AI added extra elements to the scene that the prompt did not request, such as a different number of lotus leaves or an unprompted background prop. This is a form of hallucination, where the AI generates plausible but unrequested content.
Industry Scope: This highlights a challenge in controlling every aspect of the AI's output. The AI, based on its training, might add elements it associates with a scene (e.g., more leaves in a pond) even if not explicitly instructed, requiring very direct and explicit negative constraints in the prompt to prevent it.
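The explicit negative constraints mentioned above can be sketched as a helper that appends one short "No ..." sentence per unwanted element. This is a minimal illustration, not the project's actual method: the function name with_negative_constraints and the example wording are hypothetical, and whether a given video model honors such constraints is an assumption that has to be checked per model.

```python
def with_negative_constraints(prompt: str, forbidden: list[str]) -> str:
    """Append one explicit 'No ...' sentence per unwanted element (hypothetical helper)."""
    if not forbidden:
        return prompt
    # Each unwanted element gets its own short, direct sentence.
    constraints = " ".join(f"No {item}." for item in forbidden)
    return f"{prompt.rstrip()} {constraints}"


# Example wording is illustrative only.
base = "A panda on a muddy shore."
final = with_negative_constraints(
    base, ["additional lotus leaves", "background props"]
)
# final == "A panda on a muddy shore. No additional lotus leaves. No background props."
```

Listing the forbidden elements as data rather than hand-editing each prompt makes it easier to keep the same exclusions across every clip in a sequence.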
The "Visual Override" Problem:
Challenge: When a generation started from an image that included both the panda and the rabbit, the AI was unable to isolate just the rabbit in a subsequent shot. The visual information from the initial image overrode the new prompt's instruction to show only the rabbit, demonstrating the AI's bias towards visual cues over text.
Industry Scope: This is a significant challenge in workflows that require a character to be isolated from a previously generated, multi-character scene. It highlights a limitation where the AI's visual memory can be stronger than its ability to follow new, text-based instructions.
Result: While generative AI excels at creating visually impressive clips, this project underscores that achieving a consistent, multi-clip narrative is a significant hurdle. The iteration documented in this case study indicates that text prompts alone were not enough: success depended on a hybrid workflow that paired a master reference image with an aligned video prompt, identical character descriptions across prompts, and actions simplified to what the model could render.
Selected Prompt Development Notes
Q: Why did the AI struggle with the "flying kick" action?
A: The AI struggles because it lacks an understanding of physics. It doesn't know how to render a precise trajectory, impact, and a resulting "jolt" in a single, fluid animation. It often defaults to generic, inconsistent movements.
Q: Why was the lotus leaf a problem?
A: The lotus leaf, despite being a minor element, added an extra layer of physical complexity. The AI had to account for its position and reaction to the rabbit's leap, which made the prompt more difficult to process accurately. Removing it simplified the scene and focused the AI's attention on the core action.
Q: Why did the AI have a visual bias towards certain actions?
A: The AI tends to default to what it considers a "good shot" based on its training data. For example, it might generate the kick in a follow-through motion because that's a common depiction of a kick, even if the prompt asks for a "frozen moment of impact." This requires very explicit and sometimes counter-intuitive instructions to override.
Q: How many attempts did it take to achieve this final, simple action animation?
A: There is no single number to report. The final animation was the culmination of many rounds of trial, error, and prompt refinement. Each attempt helped diagnose a specific AI limitation (e.g., struggles with physics or object consistency) and led to a simpler prompt, until the output was both correct and reliably reproducible. The final, simple prompt was a direct result of this continuous troubleshooting.