In this blog post we’ll show you how to combine AnimateDiff and the ST-MFNet frame interpolator to create smooth and realistic videos from a text prompt. You can also specify camera movements using new controls.
You’ll go from a text prompt to a video, to a high-framerate video.
AnimateDiff is a model that enhances existing text-to-image models by adding a motion modeling module. The motion module is trained on video clips to capture realistic motion dynamics. It allows Stable Diffusion text-to-image models to create animated outputs, ranging from anime to realistic photographs.
You can try AnimateDiff on Replicate.
LoRAs provide an efficient way to fine-tune large models without using much memory. They are best known from Stable Diffusion, where they act as lightweight extensions to a model for a particular style or subject. The same concept can be applied to an AnimateDiff motion module.
The original AnimateDiff authors have trained 8 new LoRAs for specific camera movements: zoom in, zoom out, pan left, pan right, tilt up, tilt down, rolling clockwise, and rolling anticlockwise.
Using the Replicate hosted model you can use all of these and choose how strong their effect will be (between 0 and 1). You can also combine multiple camera movements and strengths to create specific effects.
In this example we used the 'toonyou_beta3' model with a zoom-in strength of 1 (view and tweak these settings):
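If you're calling the model from the API, a minimal Python sketch of those settings might look like this. The base_model and zoom_in_motion_strength input names are assumptions for illustration; check the model's API page for the exact schema.

import replicate

# Hedged sketch: generate with the 'toonyou_beta3' base model and a zoom-in
# strength of 1. "base_model" and "zoom_in_motion_strength" are assumed input
# names, not confirmed parameters; see the model's API page for the real schema.
output = replicate.run(
    "zsxkib/animate-diff:269a616c8b0c2bbc12fc15fd51bb202b11e94ff0f7786c026aa905305c4ed9fb",
    input={
        "prompt": "a girl walking through a flower meadow, anime style",  # illustrative prompt
        "base_model": "toonyou_beta3",       # assumed input name
        "zoom_in_motion_strength": 1.0,      # assumed input name, range 0 to 1
    },
)
print(output[0])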
Interpolation adds extra frames to a video. This increases the frame rate and makes the video smoother.
ST-MFNet is a ‘spatio-temporal multi-flow network for frame interpolation’, which is a fancy way of saying it's a machine learning model that generates extra frames for a video. It does this by studying the changes in space (position of objects) and time (from one frame to another). The "multi-flow" part means it's considering multiple ways things can move or change from one frame to the next. ST-MFNet works very well with AnimateDiff videos.
You can take a 2 second, 16 frames-per-second (fps) AnimateDiff video and increase it to 32 or 64 fps using ST-MFNet:
You can also turn it into a slow-motion 4 second video:
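Both of those variations come down to two ST-MFNet inputs: framerate_multiplier and keep_original_duration. Here's a minimal Python sketch, assuming that turning keep_original_duration off is what lets the extra frames stretch the clip into slow motion (worth double-checking on the model page):

import replicate

ST_MFNET = "zsxkib/st-mfnet:2ccdad61a6039a3733d1644d1b71ebf7d03531906007590b8cdd4b051e3fbcd7"
video = "https://example.com/animatediff-output.mp4"  # URL of a 2 second, 16 fps AnimateDiff video

# Smoother: same 2 second duration, 4x the frames (16 fps -> 64 fps).
smooth = replicate.run(
    ST_MFNET,
    input={"mp4": video, "framerate_multiplier": 4, "keep_original_duration": True},
)
print(list(smooth)[-1])

# Slow motion: assumption - with keep_original_duration disabled, the extra
# frames extend the clip, so a 2x multiplier turns 2 seconds into 4 seconds.
slowmo = replicate.run(
    ST_MFNET,
    input={"mp4": video, "framerate_multiplier": 2, "keep_original_duration": False},
)
print(list(slowmo)[-1])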
In this video we used the 'realisticVisionV20_v20' model with a landscape prompt. We kept the prompt and seed the same but changed the camera movement each time, then interpolated the videos:
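If you want to reproduce that kind of comparison, here's a rough Python sketch of the sweep. The seed, base_model, and *_motion_strength input names are assumptions for illustration; check the AnimateDiff model's API page for its exact schema.

import replicate

ANIMATE_DIFF = "zsxkib/animate-diff:269a616c8b0c2bbc12fc15fd51bb202b11e94ff0f7786c026aa905305c4ed9fb"
ST_MFNET = "zsxkib/st-mfnet:2ccdad61a6039a3733d1644d1b71ebf7d03531906007590b8cdd4b051e3fbcd7"

prompt = "a sweeping landscape of rolling hills at golden hour"  # illustrative prompt
seed = 42  # fixed so only the camera movement changes between runs

# NOTE: these camera-movement input names are assumptions, not confirmed parameters.
camera_movements = ["zoom_in_motion_strength", "pan_left_motion_strength", "tilt_up_motion_strength"]

for movement in camera_movements:
    output = replicate.run(
        ANIMATE_DIFF,
        input={
            "prompt": prompt,
            "seed": seed,
            "base_model": "realisticVisionV20_v20",  # assumed input name
            movement: 1.0,
        },
    )
    # Interpolate each result with ST-MFNet, as in the workflow below
    videos = replicate.run(
        ST_MFNET,
        input={"mp4": output[0], "keep_original_duration": True, "framerate_multiplier": 4},
    )
    print(movement, list(videos)[-1])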
You can use the Replicate API to combine multiple models into a workflow, taking the output of one model and using it as input to another model.
import os
import replicate

# Authenticate: the client reads the REPLICATE_API_TOKEN environment variable
os.environ["REPLICATE_API_TOKEN"] = "YOUR_REPLICATE_API_TOKEN"

print("Using AnimateDiff to generate a video")
output = replicate.run(
    "zsxkib/animate-diff:269a616c8b0c2bbc12fc15fd51bb202b11e94ff0f7786c026aa905305c4ed9fb",
    input={"prompt": "a medium shot of a vibrant coral reef with a variety of marine life"},
)
video = output[0]
print(video)
# https://pbxt.replicate.delivery/HnKtEcfWIoTIby5mGUufWwrXfHZ5VLpAnIHERSrNuiVAzfqGB/0-amediumshotofa.mp4

print("Using ST-MFNet to interpolate the video")
videos = replicate.run(
    "zsxkib/st-mfnet:2ccdad61a6039a3733d1644d1b71ebf7d03531906007590b8cdd4b051e3fbcd7",
    input={"mp4": video, "keep_original_duration": True, "framerate_multiplier": 4},
)
# Take the last output, which is the interpolated video
video = list(videos)[-1]
print(video)
# https://pbxt.replicate.delivery/VgwJdbh4NTZKEZpAaDhbzni1DGxzXOrHrCz5clFXIIGXOyaE/tmpaz7xlcls0-amediumshotofa_2.mp4
import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

console.log("Using AnimateDiff to generate a video");
const output = await replicate.run(
  "zsxkib/animate-diff:269a616c8b0c2bbc12fc15fd51bb202b11e94ff0f7786c026aa905305c4ed9fb",
  { input: { prompt: "a medium shot of a vibrant coral reef with a variety of marine life" } }
);
const video = output[0];
console.log(video);
// https://pbxt.replicate.delivery/HnKtEcfWIoTIby5mGUufWwrXfHZ5VLpAnIHERSrNuiVAzfqGB/0-amediumshotofa.mp4

console.log("Using ST-MFNet to interpolate the video");
const videos = await replicate.run(
  "zsxkib/st-mfnet:2ccdad61a6039a3733d1644d1b71ebf7d03531906007590b8cdd4b051e3fbcd7",
  {
    input: {
      mp4: video,
      keep_original_duration: true,
      framerate_multiplier: 4
    }
  }
);
// Take the last output, which is the interpolated video
console.log(videos[videos.length - 1]);
// https://pbxt.replicate.delivery/VgwJdbh4NTZKEZpAaDhbzni1DGxzXOrHrCz5clFXIIGXOyaE/tmpaz7xlcls0-amediumshotofa_2.mp4
You can also use the CLI for Replicate to create a workflow:
export REPLICATE_API_TOKEN="..."
replicate run zsxkib/st-mfnet --web \
keep_original_duration=true \
framerate_multiplier=4 \
mp4="$(replicate run zsxkib/animate-diff \
prompt="a medium shot of a vibrant coral reef with a variety of marine life" | \
jq -r '.output | join("")')"
# Opens https://replicate.com/p/p2j74vlbv464cojdne6sol6gq4
Have you used AnimateDiff and ST-MFNet to make a video? Great! We'd love to see it.
Share your videos with us on Discord or tweet them @replicate. Let's see what you've got!