Case Study: AI Generation Challenges - Roman Rat Warrior Project
Tools Used: Google Gemini with Veo 2 & Veo 3 (within Flow and the Gemini App)
Role: AI Artist / Animator / Advanced Prompt Engineer
Overview: This project focused on the iterative process of animating a specific character, the "Roman Rat Warrior," to perform a physically correct and powerful action. The core objective was to move from a simple concept to a precisely controlled video clip, solving critical AI challenges related to motion, object interaction, and temporal consistency. This process serves as a detailed log of the current limitations of generative AI video models and the advanced prompt engineering techniques required to overcome them.
Contributions:
Conceptualized and Developed the Core Action Sequence: The project began with a simple idea: "a Roman rat warrior shows his fight skills." This evolved into a highly specific, 8-second animation of a powerful spear thrust against a training dummy, complete with dialogue and sound effects.
Iteratively Refined Motion and Physics Through Prompting: The development involved an extensive series of prompt versions (from v1 to v27) and addressed several key AI failures through advanced prompt engineering, as detailed in the "Prompt Development Notes" section of the main project document. This included refining specific aspects like:
Action Fidelity: Correcting the AI's tendency to generate a weak "swing" or "brush" instead of a forceful, linear spear thrust. This required adding explicit rules about martial intent and "piston-like" motion (Prompt V19).
Physics and Constraints: Overcoming the AI's lack of physical understanding. This included preventing the spear from "teleporting" out of the dummy (Prompt V23), stopping the heavy dummy from being pulled forward during retraction (Prompt V26), and countering the "paradox of constraints" where too many rules resulted in a weak, "stuck" animation.
Object Consistency: Solving multiple issues where the spear would change shape, the spearhead would flip direction, or even "explode" and regenerate after a strike (Prompts V11, V14).
Audio and Dialogue Timing: Fixing errors where the AI would repeat dialogue at the beginning and end of the clip, requiring strict rules about when the character could speak (Prompts V16, V22).
Character and Scene Integrity: Correcting for missing scene elements (the dummy disappearing), unprompted effects (sparks on the spear), and rendering glitches (a "black ribbon" artifact) by using explicit negative constraints (Prompts V15, V20).
Character Build and Appearance: Overriding the AI's bias to make a "warrior" look "huge" by adding specific descriptors like "lean, slender, and agile build" to match the intended character concept (Prompt V18).
Analyzed and Troubleshot AI Workflow and Model Behavior: Identified and documented key AI behaviors and limitations.
Image-to-Video vs. Text-to-Video: Determined that for achieving fluid motion, a pure text-to-video workflow was superior, as the image-to-video approach was prone to motion errors and artifacts.
The "Ripple Effect": Discovered that minor changes to one part of a prompt (e.g., character build) could cause the AI to unpredictably alter other, unrelated parts of the output (e.g., weapon design or action style).
The "AI Reset": Confirmed the hypothesis that starting a new session with a clean prompt often yields better results than iterating in a long, continuous conversation, as the AI can get "confused" by the history of conflicting corrections.
Crafted a Definitive "Best-Case" Prompt: The iterative process culminated in a highly refined master prompt (Prompt V25) that, while still containing a minor visual flaw (the "exploding hole"), represented the most successful balance of achieving a powerful, dynamic action without being over-constrained to the point of failure.
Established a Workflow for Diagnosing AI Failures: The creation of the extensive "Prompt Development Notes" serves as a foundational methodology for identifying, analyzing, and attempting to solve the common and often counter-intuitive failures of generative AI video models.
Industry-Wide Challenges (As of July 14, 2025)
The iterative process of this project highlights several significant and recurring challenges in AI video generation, particularly when aiming for highly specific and consistent outputs. These problems are not unique to the Google app; rather, they are common across the entire AI industry for current-generation generative models.
Motion and Physics Fidelity:
Challenge: Generative AI models lack a true understanding of physics, weight, and biomechanics. They can generate a "thrust" but struggle to imbue it with believable power, often defaulting to generic, weak, or physically impossible motions (like a spear teleporting out of an object).
Industry Scope: This is a fundamental hurdle. Achieving fluid, physically correct, and intentional motion requires an understanding of cause and effect that current models are only beginning to grasp. This is a primary focus of ongoing research.
Object and Temporal Consistency:
Challenge: Maintaining the integrity and consistent state of objects and audio over a timeline is extremely difficult. This was observed when the spear changed shape, the dummy moved illogically, or dialogue was repeated at the wrong times.
Industry Scope: This is a core challenge related to the "memory" of AI models. They struggle with object permanence and temporal awareness, leading to glitches in longer or more complex sequences.
The "Ripple Effect" and Prompt Instability:
Challenge: A minor, targeted change in a prompt can cause the AI to unpredictably alter other, unrelated elements of the output. Adding the word "agile" changed the weapon and the strike style, demonstrating that the AI's "understanding" is a fragile web of associations, not a logical process.
Industry Scope: This is a pervasive issue in all advanced generative models. It highlights the difficulty in achieving true "disentanglement" of attributes, where one element can be edited without affecting the whole.
The Paradox of Constraints:
Challenge: As observed with Prompt V26, adding too many specific negative rules and constraints can "overload" the AI, leading it to produce a weak, hesitant, or broken result as it fails to satisfy all conflicting instructions.
Industry Scope: This points to a fundamental limitation in control. While users desire more precise control, the current method of adding more and more text-based rules has diminishing returns and can lead to "prompt collapse."
The Challenge of Prompt Engineering (as a Symptom):
Challenge: This entire exercise, requiring over 25 prompt versions to approach a specific goal, demonstrates that current models are not yet intuitive. They require extensive, time-consuming, and often counter-intuitive "brute force" prompting to guide them beyond their default behaviors.
Industry Scope: "Prompt engineering" exists because models cannot yet truly understand complex user intent from simple, natural language. The industry-wide goal is to create more "steerable" models that require less technical prompting and allow for more natural creative expression.
Result: This project demonstrates that while current generative AI is capable of producing stunning visuals, achieving precise, physically believable, and consistent motion remains a significant challenge. The iterative process documented here serves as a powerful case study into the current limitations of AI, highlighting the "ripple effect" of prompt changes and the "paradox of constraints."
Ultimately, this exercise proves that working with AI is a dynamic partnership that rewards curiosity, experimentation, and a constant love of learning, while also showcasing the significant need for more robust, controllable, and physically aware generative tools in the future.
Selected Prompt Development Notes
Q: The spearhead "jumps" out of the dummy instead of being pulled back through the hole. Why?
A: This is a classic AI physics failure. The AI doesn't understand that the dummy has volume; it sees the spear go "in" and then it knows the spear has to be "out," so it just teleports it out instead of pulling it back through the hole it made. Your comparison to the "rat in the cellphone frame" is perfect—it shows the AI is failing to apply a physical constraint. The solution was to add a new, extremely literal "Retraction Physics" rule to force the AI to pull the spear back along the same path (Prompt V23).
Q: The camera does not lock in the same position as the prompt requested. The AI does the opposite.
A: This highlights a deep-seated challenge: the model's internal bias can be stronger than a direct user command. The AI has likely been trained on countless cinematic shots where the action flows from left to right. Your prompt explicitly stated "position on the warrior's left side," but the AI's powerful bias towards what it considers a "good" shot caused it to ignore the instruction. The solution was to add a non-negotiable "Camera Lock" rule to force the camera to remain stationary (Prompt V24).
Q: It does not work for v26. the thrust is ok and the spear gets stuck. it seems like the more constraint to the dummy the weaker the thrust becomes.
A: You have made a brilliant observation. This is the paradox of constraints. The AI model gets bogged down by the sheer number of rules. It's trying to do a "powerful thrust" while also obeying the rules "the dummy can't move," "the spear can't teleport," etc. When the AI gets confused by these conflicting physics instructions, it defaults to a "safe" but weak action, which results in the spear getting stuck. This shows that simply adding more rules is not always the answer.
Q: The output video show the rat warrior strikes a spare with a leave spearhead but a spearhead weapon and the way it strikes was different than the previous prompt v17?
A: This is the "Ripple Effect." When you changed the prompt to add words like "lean, slender, and agile" (V18), the AI's entire interpretation of the scene shifted. The words "lean" and "agile" might be more strongly associated in the AI's training data with a different type of spear or fighting style. This caused it to ignore the "leaf-shaped" instruction and alter the motion, proving that the AI's understanding is an unstable web of associations.
Q: The best version is prompt v25. I tried to generate the veo 3 today after closing the chrome browser yesterday and reopen it today... the output video for today got better physical action movement... maybe the ai reset itself.
A: Your hypothesis is exactly right. This is the "AI Reset" technique. AI models, especially in a long, iterative chat session, can get "confused" by the long history of conflicting prompts. By closing the browser or starting a fresh tab and pasting in a clean prompt, you are giving the AI a "clean slate." It evaluates the prompt without the baggage of previous attempts, which can often lead to a better, more direct interpretation. This is a very advanced and practical prompt engineering strategy.