We've been playing with Alibaba's WAN2.1 text-to-video model lately. Like most image and video generation models, Wan has a lot of input parameters, and each of them can have a profound impact on the quality of the generated output.
What happens when you tweak those mysterious inputs? Let's find out.
We wanted to see how the guidance scale and shift input parameters affect the output. For our experiment, we used the WAN2.1 14B text-to-video model at 720p resolution.
To do this, we did what's called a "parameter sweep", systematically testing different combinations of input values to understand how they affect the output. We generated videos for each combination of guidance scale and shift values, keeping all other parameters constant.
We kept the following inputs consistent across all the videos:
- prompt: "A smiling woman walking in London at night"
- seed: 42
- frames: 81
- sample_steps: 30
We then varied just these two inputs, testing against a range of values:
- sample_guide_scale: from 0 to 10
- sample_shift: from 1 to 9

If you'd like to run similar experiments yourself, we've shared the code on GitHub that we used to generate these parameter sweeps.
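Conceptually, the sweep is just a nested loop over the two inputs. Here's a minimal sketch in Python; `generate_video` is a hypothetical wrapper around whatever Wan2.1 inference entry point you call, using the same parameter names listed above:

```python
from itertools import product

# Minimal parameter-sweep sketch. `generate_video` is a hypothetical wrapper
# around your Wan2.1 inference call; swap in the real entry point you use.
PROMPT = "A smiling woman walking in London at night"

def sweep(generate_video):
    for guide_scale, shift in product(range(0, 11), range(1, 10)):
        generate_video(
            prompt=PROMPT,
            seed=42,
            frames=81,
            sample_steps=30,
            sample_guide_scale=guide_scale,  # swept: 0..10
            sample_shift=shift,              # swept: 1..9
            output_path=f"sweeps/gs{guide_scale}_shift{shift}.mp4",
        )
```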
You can think of the guide scale as the "creativity vs obedience" knob.
At guide_scale=0, the model ignores your prompt.
As you increase the value, the model tries harder to match your prompt.
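Mechanically, this is (assuming Wan applies the standard classifier-free guidance recipe, which is our reading rather than something we've verified in its code) a weighted blend of a prompt-conditioned prediction and an unconditional one at each denoising step:

```python
# Classifier-free guidance sketch (an assumption about how sample_guide_scale
# is applied; the exact formulation lives in the Wan2.1 sampling code).
def guided_prediction(model, latents, t, prompt_emb, null_emb, guide_scale):
    cond = model(latents, t, prompt_emb)   # prediction with your prompt
    uncond = model(latents, t, null_emb)   # prediction with an empty prompt
    # guide_scale=0 keeps only the unconditional term (prompt ignored);
    # larger values push the sample harder toward the prompt.
    return uncond + guide_scale * (cond - uncond)
```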
Here's what happens when you dial it from 0 to 10:
Shift controls how the model moves through the denoising process, affecting motion and the flow of time in your generated video.
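If Wan uses the timestep-shift trick common to recent flow-matching samplers (an assumption on our part), the shift value warps where the sampler spends its steps along the noise schedule:

```python
# Assumed flow-matching timestep shift; higher values keep the sampler in the
# high-noise region (where global structure and motion form) for longer.
def shifted_t(t: float, shift: float) -> float:
    """Remap a schedule position t in [0, 1] under a given shift."""
    return shift * t / (1.0 + (shift - 1.0) * t)

print(shifted_t(0.5, 1.0))  # 0.5   -> shift=1 leaves the schedule unchanged
print(shifted_t(0.5, 5.0))  # ~0.83 -> shift=5 skews steps toward high noise
```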
Here's what happens when you change shift from 1 to 9:
For guide scale:
- guide_scale=0: Really weird but cool outputs. Creative, but barely related to the prompt.
- guide_scale=1-2: Strange artifacts, especially around the woman's mouth.
- guide_scale=3-7: 👈 The sweet spot. Natural looking with minimal issues.
- guide_scale=8+: The dreaded "AI look" creeps in, with that overcooked, shiny skin that screams "I was made by AI."

Recommendation: Use 0 for weird creative stuff, 3-7 for realistic results, and avoid 8+ unless you want that AI shine.
For shift values (all with guide_scale=5):

- shift=1: Creates a cool "dolly effect" where the background warps but the person looks real.
- shift=3-6: Shows varied women (different skin tones, all brunettes) positioned on the left side with a zoomed-out perspective.
- shift=7-9: Consistently shows a blonde woman on the right side of the frame, with surprisingly similar results across these values.

Higher shift values tend to look better overall, but the differences are more subtle than with guide scale changes.
Getting these parameters right makes the difference between an amateur-looking video and something that looks almost professional.
Most people just use the defaults, but knowing how to tweak these gives you way more control over your outputs.
Now you don't have to guess anymore.
Got any parameters you're curious about? Let us know!