What Makes a Cow: ControlNet Sprite Pipeline

Our LoRA has some winners. It also thinks cows have five legs. The style holds across a wide variety of creatures and environments, but the model has opinions about animal anatomy that don't match reality.

The other problem is post-processing. After trying a few methods to enforce specific hex colors, I've found the most consistent approach is to sample the image for any color that isn't part of our black ink strokes and remove it, leaving a transparent background. Shaders can address color later.
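Here's a minimal sketch of that ink-only cleanup in plain Python. It is not the exact code from the pipeline: the function name, the threshold of 80, and the "darkest channel" test for ink are all assumptions for illustration, and it works on a flat list of RGB tuples so it runs without an image library.

```python
# Assumed cutoff: a pixel counts as ink only if every channel is this dark.
# Tune it against your own strokes; 80 is an illustrative guess.
INK_MAX_CHANNEL = 80

def keep_ink_only(pixels, ink_max_channel=INK_MAX_CHANNEL):
    """Convert RGB pixels to RGBA, keeping dark ink and clearing everything else.

    pixels: iterable of (r, g, b) tuples with values 0-255.
    Returns a list of (r, g, b, a) tuples of the same length.
    """
    out = []
    for r, g, b in pixels:
        if max(r, g, b) <= ink_max_channel:
            # Dark on every channel: treat as an ink stroke, keep fully opaque.
            out.append((r, g, b, 255))
        else:
            # Anything brighter or more saturated becomes fully transparent.
            out.append((0, 0, 0, 0))
    return out

cleaned = keep_ink_only([(0, 0, 0), (255, 255, 255), (200, 50, 50)])
```

Testing the brightest channel rather than overall brightness means a saturated blue or red blob is not mistaken for ink just because it looks dark in grayscale.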
ControlNet for Structure
The LoRA doesn't know how many legs a cow should have. That structural knowledge lives in the base SDXL model, and fine-tuning for style partially overwrites it.
ControlNet lets you condition generation on a reference image. Feed it a stick figure with four legs, it generates a doodle with four legs.
I drew crude stick-figure templates per archetype and fed them to controlnet_union_promax.safetensors with scribble conditioning. A quadruped is four lines, a rectangle, and a circle. Six templates cover the majority of creatures because the LoRA and prompt handle species details.
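To show how little a template needs, here's a sketch that emits the quadruped as SVG markup: one rectangle, one circle, four lines. The templates in this post were drawn by hand, so the function name, proportions, and stroke settings below are all made up for illustration. Rasterizing the SVG, and inverting it if your scribble input expects white on black, is left out.

```python
def quadruped_template(size=512, stroke="black", stroke_width=8):
    """Return SVG markup for a crude quadruped: rectangle body, circle head, four line legs."""
    s = size
    # Body rectangle, placed roughly in the middle of the canvas.
    bx, by = round(0.25 * s), round(0.40 * s)
    bw, bh = round(0.45 * s), round(0.20 * s)
    # Head circle, up and to the right of the body.
    hx, hy, hr = round(0.78 * s), round(0.35 * s), round(0.09 * s)
    # Legs run straight down from the bottom edge of the body.
    leg_top, leg_bottom = by + bh, round(0.85 * s)
    leg_xs = [bx + round(f * bw) for f in (0.1, 0.3, 0.7, 0.9)]

    style = f'fill="none" stroke="{stroke}" stroke-width="{stroke_width}"'
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{s}" height="{s}">',
        f'<rect x="{bx}" y="{by}" width="{bw}" height="{bh}" {style}/>',
        f'<circle cx="{hx}" cy="{hy}" r="{hr}" {style}/>',
    ]
    for x in leg_xs:
        parts.append(f'<line x1="{x}" y1="{leg_top}" x2="{x}" y2="{leg_bottom}" {style}/>')
    parts.append("</svg>")
    return "\n".join(parts)

svg = quadruped_template()
```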
Strength Matters
At a ControlNet strength of 0.5, the cow had four legs but lost its cow-ness.
At 0.3, the cow-ness came back. Structure was still better than no ControlNet, but the model had freedom to add spots, ears, and whatever else makes a cow a cow.

Not Everything Needs a Skeleton
The LoRA already draws amorphous creatures well. Forcing a structural template on a jellyfish is fighting millions of years of evolution.
ControlNet is opt-in per archetype. Quadrupeds, bipeds, and insects get templates. Fish, invertebrates, and blobs of jelly are skipped.
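The opt-in rule boils down to a lookup. This is a sketch, not the pipeline's actual config: the file paths, dictionary keys, and function name are hypothetical, and only the three archetypes named above are listed (the other templates aren't named here). The 0.3 strength is the value that kept the cow-ness earlier.

```python
CONTROLNET_STRENGTH = 0.3  # the setting where structure held and the cow still looked like a cow

# Hypothetical template paths; only archetypes with a skeleton worth enforcing appear here.
TEMPLATES = {
    "quadruped": "templates/quadruped.png",
    "biped": "templates/biped.png",
    "insect": "templates/insect.png",
}

def controlnet_settings(archetype):
    """Return scribble-conditioning settings for an archetype, or None to skip ControlNet."""
    template = TEMPLATES.get(archetype)
    if template is None:
        # Fish, invertebrates, blobs: let the LoRA draw them unconstrained.
        return None
    return {"template": template, "mode": "scribble", "strength": CONTROLNET_STRENGTH}

cow = controlnet_settings("quadruped")
jelly = controlnet_settings("invertebrate")
```

Returning None for unlisted archetypes keeps skipping as the default, so a new amorphous creature type needs no config change.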
What Stuck
Archetype templates, not per-creature. One quadruped template works for wolves and cows.
Anatomy in the prompt. "Four legs, two ears, one tail" alongside "thick black outlines, white background."
Fix post-processing. Better pixel manipulation beats fancier generation tools.
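The "anatomy in the prompt" point can be sketched as a small prompt builder. The two quoted strings come from the list above; the function name, the dictionary, and the comma-joined ordering are assumptions, and any LoRA trigger words are left out.

```python
STYLE = "thick black outlines, white background"

# Anatomy hints per archetype; only the quadruped wording comes from this post.
ANATOMY = {
    "quadruped": "four legs, two ears, one tail",
}

def build_prompt(creature, archetype):
    """Join the creature name, any anatomy hint for its archetype, and the style terms."""
    parts = [creature]
    anatomy = ANATOMY.get(archetype)
    if anatomy:
        parts.append(anatomy)
    parts.append(STYLE)
    return ", ".join(parts)

prompt = build_prompt("cow", "quadruped")
```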
