Tools Used: Google Whisk, ImageFX, Google Flow, Veo 3 (within Gemini), advanced prompt engineering, collaborative AI workflow
Role: AI Artist / Director / Advanced Prompt Engineer
Overview: This project documents an extensive, iterative process for creating a single-shot, AI-generated animated clip. The core objective was to move from a simple concept to a precisely controlled final product by solving critical AI challenges in character consistency, dynamic physical actions, and narrative logic. The process also serves as a detailed log of the current limitations of generative AI video models and of the advanced prompting and workflow needed to work around them.
Contributions:
Conceptualized and Developed the Core Character: The project began with the idea of a simple rabbit. That idea evolved into a more complex and distinctive character: a muscular ant warrior with Greek-style armor and a sword. We refined the character's appearance over many iterations to make it more consistent and more powerful.
Iteratively Refined Narrative, Logic, and Consistency Through Prompting: We wrote more than 20 prompt versions to address several key AI failures. These refinements targeted:
Action Logic: Overcoming the AI's inability to render a single, decisive "chop," along with its tendency to perform multiple swings or to combine actions illogically.
Object Permanence: We addressed the AI's object-consistency failures so that the apple stayed on the stick after the cut instead of inexplicably changing form or falling to the ground.
Hallucination: We used explicit negative constraints to stop the AI from adding unintended elements such as extra swords or shields. It often added these because of its "warrior" bias.
Analyzed and Troubleshot AI Workflow and Model Behavior: Identified and documented key AI behaviors and limitations.
Prompt Language: Discovered that active, concise language works best for directing the AI. We removed passive voice and ambiguous terms, which were causing inconsistent outputs.
Creative Bias: Recognized that the AI defaults to what it considers a "cinematic" or "good" shot, even when that contradicts the prompt. For instance, the AI would generate multiple swings for a single chop to make the action look more dynamic.
Physical Logic: Documented the AI's struggle with realistic physics, such as the top half of the apple flying off-screen while the bottom half stays on the stick.
Key Takeaways:
Prompt Refinement is Critical: The progression from v1 to v23 shows that precise prompt engineering is necessary to steer the AI away from its creative biases and toward the intended result. The documented failures, such as the AI hallucinating extra objects or misinterpreting the character's physical actions, are useful data points for other creators.
The AI's Creative Gaps: The project highlights that the AI lacks human intuition for logical sequencing and consistent physics, even when given a detailed prompt. Two prime examples are the ant swinging several times for what should be a single chop and the apple moving illogically.
A Hybrid Workflow is Necessary: The project shows that a successful AI animation workflow is a hybrid process. It requires a human to constantly troubleshoot, refine the prompt, and guide the AI. The AI is a powerful tool, but it is not a creative director. The final result is not a perfect video but a detailed log of the current limitations in AI video generation. It demonstrates that achieving a specific, powerful action with consistent physical logic and object permanence remains a significant hurdle for current AI models.
Q: Why did the AI struggle with a "single powerful chop"?
A: The AI struggled because it lacks a true understanding of physics. It often defaults to cinematic tropes learned from its training data, such as a warrior performing several preparatory swings before the final chop. This is a form of creative bias: the AI's learned associations override the prompt's specific instructions.
Q: Why did the AI sometimes add a shield or extra swords?
A: This is an example of hallucination combined with creative bias. The AI associates a "Greek warrior" with a shield and multiple weapons. Even when the prompt did not mention a shield, the AI would sometimes add one. To prevent this, we had to be very explicit and add negative constraints to the prompt.
Q: Why was the apple's physics so inconsistent (e.g., splitting into four pieces, bouncing off-screen)?
A: This highlights the AI's lack of physical and temporal consistency. The AI lost track of the apple after the chop. It also tried to satisfy multiple contradictory instructions at once, which produced illogical and inconsistent visual events.