Tools Used: Google Gemini and ImageFX (for iterative character development)
Role: AI Artist / Character Designer / Advanced Prompt Engineer
Overview: This project focused on the iterative process of designing a unique character, the "Myrmidon Centaur," establishing a consistent visual identity for use across multiple AI-generated scenes. The core objective was to move from a detailed narrative prompt to a definitive "character turnaround sheet" that could serve as a visual blueprint for future creative work. This process solves the critical AI challenge of maintaining character consistency across multiple images and video clips.
Contributions: Conceptualized and Developed the Core Character Concept: The project began with a detailed narrative prompt for a "Myrmidon Centaur." The initial creative challenge was to merge the distinct ideas of a Greek warrior and a giant ant into a cohesive and visually engaging design. This required balancing classical armor details with a believable, hyperrealistic insectoid anatomy.
Iteratively Refined Character Attributes Through Prompting: The development involved numerous prompt versions (from v1 to v55) and addressed several key challenges through advanced prompt engineering, as detailed in the "Prompt Problem Log" above. This included refining specific aspects like:
Helmet and Head: Specifying a classic Greek Corinthian helmet with a prominent blue crest and a pronounced, insectoid hybrid face with mandibles, while strictly forbidding horns and enforcing symmetrical antennae.
Body and Limbs: Achieving the correct number of legs (four total: two horse-like front legs and two insectoid hind legs), ensuring all limbs were intact and symmetrically visible across different views.
Weaponry: Precisely matching the powerful, broadsword-like weapon from a specific reference image (Image_fx (90).jpg) and ensuring it was held in a dynamic, consistent manner.
Pose Fidelity: Replicating a very specific, dynamic mid-action pose from a reference image (Image_fx (43).jpg), including nuanced details like shoulder asymmetry and leg position.
Color and Texture: Achieving a consistent overall color hue matching a specific reference image (Image_fx (23).jpg), with muted bronze armor and a rich reddish-brown chitinous exoskeleton.
Applied Human and Insect Anatomy References to Push AI Limits: The goal was to constantly push the AI's creative limits. This involved using references of human body parts and musculature to inform the design, then challenging the AI to integrate those realistic details into a more complex, non-human insectoid anatomy, continually striving for a result beyond a simple "human in a suit" concept.
Analyzed and Troubleshot AI Style Drift: Throughout the process, I identified and corrected the AI's tendency to produce undesirable results, even with detailed prompts.
Unwanted Text: Persistent generation of text, writing, labels, and symbols despite strong negative prompts.
Inconsistent Elements: Difficulty maintaining consistency in the number of limbs, helmet details (like the blue crest or antennae), and weapon appearance across different views or when making slight modifications.
Style Shifts: The AI sometimes introduced unintended stylistic elements (e.g., futuristic looks when an ancient feel was desired) or failed to isolate changes to specific attributes, altering other parts of the character.
Deformed Limbs: Early generations often produced malformed or incomplete limbs, particularly the smaller hind legs, which required explicit negative constraints to resolve.
Crafted a Definitive "Character Turnaround Sheet" Prompt: The iterative process culminated in a highly refined master prompt (Prompt v55) specifically designed to generate a multi-angle view (front, side, back) of the Myrmidon Centaur. This prompt clearly defines its final appearance, weapons, and distinct features, ensuring the character's heroic and formidable silhouette. The final chosen result of this extensive prompt engineering is depicted below:
Established a Workflow for Character Consistency: The creation of this definitive turnaround sheet is the foundational step in a repeatable workflow. This "master character image" now serves as a perfect visual anchor for Image-to-Video or Image-to-Image generation. By using this image as a reference, countless new scenes and videos featuring this exact character can be created, ensuring perfect consistency across all future content.
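The anchor-based workflow described above can be sketched as a simple data structure: every new scene request pairs the master turnaround image with a scene-specific prompt and a fixed seed, so only the scene text varies between generations. The type names, field names, and file name below are hypothetical illustrations, not part of the Gemini or ImageFX APIs.

```python
from dataclasses import dataclass, field

# Hypothetical structures -- a sketch of an image-anchored consistency
# workflow, not the actual Gemini/ImageFX interface.

@dataclass
class GenerationRequest:
    reference_image: str   # the "master character image" anchor
    scene_prompt: str      # the only part that changes per scene
    seed: int              # fixed for reproducibility

@dataclass
class CharacterWorkflow:
    master_image: str
    base_seed: int
    requests: list = field(default_factory=list)

    def new_scene(self, scene_prompt: str) -> GenerationRequest:
        # Every scene reuses the same visual anchor, so the character's
        # design stays constant while the scene description varies.
        req = GenerationRequest(self.master_image, scene_prompt, self.base_seed)
        self.requests.append(req)
        return req

workflow = CharacterWorkflow("myrmidon_turnaround_v55.png", base_seed=42)
battle = workflow.new_scene("charging across a burning battlefield")
forge = workflow.new_scene("standing guard beside a bronze forge")
```

The point of the sketch is that consistency comes from the shared `reference_image` anchor, not from re-describing the character in every prompt.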
The iterative process of this project highlights several significant and recurring challenges in AI image generation, particularly when aiming for highly specific and consistent outputs. These problems are not unique to the Google app; rather, they are common across the entire AI industry for current-generation generative models.
Here are the major challenges observed during this Myrmidon Centaur project:
Text Generation (Unwanted or Illegible):
Challenge: Generative AI models frequently struggle with accurately rendering text, often producing gibberish, distorted letters, or unwanted watermarks/signatures, even when explicitly told not to. Conversely, generating specific, legible text within an image is also a major hurdle.
Industry Scope: This is a well-known limitation across almost all current text-to-image models (e.g., Midjourney, Stable Diffusion, DALL-E 3). While some models are improving, precise, legible text rendering remains a frontier. It's often easier to generate the image and then add text in a separate editing step.
Consistency Across Multiple Views/Elements:
Challenge: Maintaining exact character design, pose, and object consistency across different views (e.g., front, back, side) within a single generation or across multiple generations, even with a fixed seed, is extremely difficult. Small variations in limbs, armor details, weapon design, or even facial features are common.
Industry Scope: This is a fundamental challenge for diffusion models. They are excellent at generating novel images but struggle with "object permanence" and maintaining precise relationships or identical features across different perspectives or compositions. Techniques like ControlNet (for Stable Diffusion) and more advanced conditioning are being developed to address this, but it's far from solved.
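A toy analogy for why a fixed seed alone does not guarantee cross-view consistency: sampling is reproducible only when every input is identical, and even a one-word prompt change ("front view" vs. "side view") reroutes the entire pseudo-random trajectory rather than adjusting one detail. The sketch below simulates this with Python's `random` module standing in for a diffusion sampler; it is an illustrative analogy, not a real generation pipeline.

```python
import hashlib
import random

def toy_sample(prompt: str, seed: int, steps: int = 8) -> list:
    # Stand-in for a diffusion sampler: the (seed, prompt) pair fully
    # determines the stream, so identical inputs reproduce exactly,
    # while any prompt edit perturbs every value, not just one "region".
    digest = hashlib.sha256(f"{seed}:{prompt}".encode()).digest()
    rng = random.Random(digest)
    return [rng.random() for _ in range(steps)]

front = toy_sample("centaur, front view", seed=7)
front_again = toy_sample("centaur, front view", seed=7)
side = toy_sample("centaur, side view", seed=7)
```

Here `front == front_again`, but `side` shares nothing with `front` despite the fixed seed, which mirrors why "same character, different angle" cannot be obtained from seed control alone.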
Adherence to Negative Prompts (Exclusion of Specific Elements):
Challenge: Despite using strong negative prompts (e.g., "NO HORNS," "STRICTLY FORBIDDEN"), models can still occasionally generate elements that are explicitly excluded, especially if those elements are strongly associated with other positive keywords in the prompt (e.g., "demonic" often implies horns).
Industry Scope: This is a common frustration. Negative prompting is an art form, and models can sometimes "interpret" negative instructions differently, or the positive prompt's influence can override the negative. It highlights the probabilistic nature of these models and their associative reasoning.
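Many open pipelines (e.g., Stable Diffusion via the diffusers library) expose this split as separate `prompt` and `negative_prompt` strings, and the "brute force" tactics described in this project amount to repetition and capitalization inside those strings. Below is a minimal sketch of assembling such a payload; the helper name and example terms are illustrative, not this project's exact prompts, and as noted above adherence remains probabilistic even with heavy reinforcement.

```python
def build_prompt_payload(positive, negatives, emphasis=2):
    # Repetition and capitalization -- the "brute force" tactics noted
    # above -- make an exclusion statistically harder for the sampler
    # to ignore, though they never guarantee compliance.
    reinforced = []
    for term in negatives:
        reinforced.extend([term, f"NO {term.upper()}"] * emphasis)
    return {
        "prompt": ", ".join(positive),
        "negative_prompt": ", ".join(reinforced),
    }

payload = build_prompt_payload(
    ["myrmidon centaur", "Corinthian helmet", "blue crest"],
    ["horns", "text"],
)
```

The design choice worth noting is that the exclusions live in a separate channel from the description: mentioning "horns" inside the positive prompt, even to forbid them, can strengthen the very association being fought.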
Precise Control over Anatomy/Number of Limbs:
Challenge: Ensuring an exact number of limbs (e.g., four legs instead of six) or specific anatomical structures (e.g., perfectly symmetrical antennae, unobstructed legs) is very hard. Models often default to common patterns or struggle with complex spatial relationships.
Industry Scope: Similar to consistency, this is a major area of research. Generating anatomically correct hands, feet, or specific limb counts has been a notorious problem for generative AI. While progress is being made, achieving pixel-perfect anatomical accuracy consistently requires significant prompt engineering and often post-processing.
Exact Color/Hue Matching with References:
Challenge: While models can generally match a color palette, achieving an exact color hue or overall color tone from a specific reference image without variation is difficult. Models tend to introduce their own stylistic interpretations or slight shifts in hue.
Industry Scope: Color fidelity and precise style transfer are active areas of development. Models are trained on vast datasets, and their "understanding" of color is statistical. Achieving perfect replication of a subtle hue or tonal range from a single reference image is a nuanced task that often requires fine-tuning or more advanced control mechanisms beyond simple prompting.
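Hue drift of this kind can at least be measured after the fact. The stdlib sketch below converts reference and generated RGB samples to HSV with `colorsys` and reports their circular hue distance in degrees; the sample values are invented stand-ins, not colors taken from the project's actual reference images.

```python
import colorsys

def hue_degrees(r, g, b):
    # colorsys expects channels in [0, 1] and returns hue in [0, 1).
    h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360

def hue_distance(rgb_a, rgb_b):
    # Hue is circular, so 350 degrees and 10 degrees are only 20 apart.
    d = abs(hue_degrees(*rgb_a) - hue_degrees(*rgb_b)) % 360
    return min(d, 360 - d)

reference = (139, 69, 19)   # a reddish-brown chitin tone (illustrative)
generated = (150, 82, 30)   # a slightly drifted generation
drift = hue_distance(reference, generated)
```

A check like this gives an objective accept/reject signal for regenerations, rather than judging "close enough" color by eye across dozens of prompt iterations.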
The Challenge of Prompt Engineering (Over-detailed Prompts as a Symptom):
Challenge: While not a problem with the AI's output itself, the necessity for increasingly long, detailed, and often redundant prompts (as seen in this project's progression from v1 to v55) becomes a significant challenge for the user. This "prompt engineering" involves extensive trial and error, a deep understanding of how the model interprets keywords, and the use of "brute force" techniques (repetition, capitalization, explicit negative commands) to guide the AI. This can be time-consuming and counter-intuitive for casual users.
Industry Scope: This is a widely acknowledged challenge in the generative AI community. The field of "prompt engineering" has emerged precisely because models are not yet intuitive enough to understand complex, nuanced instructions from short, natural language inputs. Researchers are actively working on making models more "steerable" with simpler prompts and developing better user interfaces that abstract away the need for such intricate prompt construction. The goal is to move towards models that require less "brute force" and more natural, conversational guidance.
Precise Pose and Action Fidelity:
Challenge: Generating a character in a very specific, detailed pose or performing a complex action (e.g., "right hand gripping the spear handle's end, reaching back, while the spear's long shaft is held firmly against the middle of his back, precisely like the reference") is extremely challenging. Models often produce approximations or generic poses rather than exact replications of nuanced body language or object interaction.
Industry Scope: This is a significant limitation across the industry. While models can create dynamic and aesthetically pleasing poses, achieving a pixel-for-pixel or anatomically precise replication of a complex, specific pose from a reference image or detailed text description is very difficult. This is an active area of research, with techniques like pose estimation and control networks (e.g., OpenPose with ControlNet) being explored to provide more granular control over character posing.
Attribute Disentanglement and Element Interdependence:
Challenge: When attempting to modify a single element of a generated image (e.g., changing only the weapon, or altering a helmet's "feel"), the AI often struggles to isolate that change. It may inadvertently alter other, seemingly unrelated aspects of the character, such as the overall style, leading to a "futuristic" look when an "ancient" feel was desired, or changing the character's entire appearance. This indicates a difficulty in disentangling different attributes or elements within the model's latent space.
Industry Scope: This is a pervasive challenge in generative models. Models often learn features in a highly entangled way, meaning that changing one attribute (like a weapon or a helmet's stylistic nuance) can have cascading effects on others (like the character's overall style, pose, or even identity). Achieving true "editability" where specific attributes can be precisely controlled without affecting others is a major research goal. This problem directly relates to the difficulty of maintaining consistency when making targeted modifications.
Generative AI Limitations (As of July 11, 2025):
Challenge: This entire exercise in extensive prompt engineering demonstrates that current generative AI models, despite their impressive capabilities, often reach their practical limits when faced with highly specific, multi-faceted, and contradictory constraints. The need for dozens of prompt iterations to achieve precise control over elements like text, specific anatomical features, exact poses, and isolated attribute changes indicates that these models are not yet truly "intelligent" in their understanding of complex user intent. They often require "brute force" prompting rather than intuitive guidance.
Industry Scope: This observation is broadly applicable across the generative AI industry as of today. While models continue to advance rapidly, the gap between what users intend and what models produce with simple prompts remains. The field is actively working on more robust and controllable generation, but the current state often necessitates the kind of detailed and iterative prompting seen here.
Impact on User Creativity (As of July 11, 2025):
Challenge: This exercise, driven by a user's desire to tell their own story from scratch, highlights a crucial limitation: current generative AI models, by requiring such extensive and often counter-intuitive prompt engineering, can inadvertently limit user creativity. Instead of freely exploring their vision, users are forced to adapt their creative ideas to the AI's known constraints and the specific, often rigid, methods required to bypass them. This shifts the creative process from pure ideation to a technical challenge of "speaking the AI's language," potentially stifling spontaneous and unconstrained artistic expression.
Industry Scope: This is a growing concern within the creative community. While AI offers powerful tools, the current need for highly specialized prompt engineering can make the process feel less like a collaborative creative endeavor and more like a technical puzzle. Future advancements are aimed at making AI tools more intuitive and responsive to natural creative flow, empowering users without imposing such significant technical burdens on their artistic freedom.
Nature of This Test (As of July 11, 2025):
Note: This specific test and the detailed prompt engineering log were conducted using ImageFX and Gemini as the AI tools. This exercise was a focused exploration of the AI's capabilities and limitations, driven by the user's personal curiosity and desire to use AI tools for their own storytelling from scratch. The findings here reflect the challenges encountered when pushing the boundaries of AI image generation with highly specific and iterative creative demands, and the user's perception that current AI-based models can, in this context, limit rather than expand creative freedom. The character's core design was based on a reference of a "Greek Ant Warrior".
Result: This project demonstrates that current generative AI models, despite their power, often reach their practical limits when faced with highly specific, multi-faceted constraints. The need for dozens of prompt iterations indicates that these models are not yet truly "intelligent" in understanding complex user intent and often require "brute force" prompting.
This exercise, driven by a desire to tell a unique story, highlights a crucial limitation: current AI models can inadvertently limit creativity by forcing users to adapt their vision to the AI's constraints. The process becomes less about pure ideation and more about the technical challenge of "speaking the AI's language."
Ultimately, this project proves that working with AI is a dynamic partnership that rewards curiosity, experimentation, and a constant love of learning, while also showcasing the significant need for more robust, controllable, and intuitive generative tools in the future.