AI video generation has gotten really good.
Some of the best video models like tencent/hunyuan-video are open-source, and the community has been hard at work building on top of them. We've adapted the Musubi Tuner by @kohya_tech to run on Replicate, so you can fine-tune HunyuanVideo on your own visual content.
Never Gonna Give You Up animal edition, courtesy of @flngr and @fofr.
HunyuanVideo is good at capturing the style of the training data, not only in the visual appearance of the imagery and the color grading, but also in the motion of the camera and the way the characters move.
This motion-style transfer comes from training on video clips: fine-tunes trained only on still images can't capture it.
Here are some examples of videos created using different fine-tunes, all with the same settings, size, prompt and seed:
You can make your own fine-tuned video model, too.
In this post, we'll show you how to gather training data, create a fine-tuned video model, and generate videos with it.
Prefer to learn by watching? Check out Sakib's 5-minute video demo on YouTube.
To train a video model, you'll need a dataset of video clips and text captions describing each video.
This process can be time-consuming, so we've created a model to make it easier: zsxkib/create-video-dataset takes a video file or YouTube URL as input, slices it into smaller clips, and generates captions for each clip.
Here's how to create training data right in your browser with just a few clicks: give zsxkib/create-video-dataset a video file or YouTube URL, set trigger_word to a unique token like RCKRLL (avoid using real words that have existing associations), then run the model and download the resulting ZIP file. If you'd rather script this step, see the sketch after these instructions.
Optional: Check out the logs from the run if you want to see the auto-generated captions for each clip.
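If you'd rather script this step, here's a minimal sketch using the Replicate Python client. The video URL is a placeholder, the input names match the full API example later in this post, and how you read the output (a plain URL string vs. a file object with a .url attribute) depends on your client version:

import replicate
import urllib.request

# Slice a video into captioned clips (placeholder video URL; input names match
# the full API example later in this post)
dataset = replicate.run(
    "zsxkib/create-video-dataset:4eb83cc8ba563da7032933374444a9a7a6f630b5b1e4f219cf9088f6a4acc138",
    input={
        "video_url": "https://example.com/my-video.mp4",
        "trigger_word": "RCKRLL",
        "autocaption": True,
    },
)

# The output points at a ZIP of clips and captions; depending on your client
# version it may be a URL string or a file object with a .url attribute
zip_url = getattr(dataset, "url", dataset)
urllib.request.urlretrieve(zip_url, "training-data.zip")
print("Saved training-data.zip")

You can then upload training-data.zip in the training form, or pass its URL straight to the trainer as shown in the API section below.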
Now you'll create your own fine-tuned video generation model using the training data you just compiled.
On the zsxkib/hunyuan-video-lora training page, upload the ZIP file you just downloaded as the input_videos input, and set trigger_word to the same token you used when creating the dataset (e.g. RCKRLL).
Training typically takes about 5-10 minutes with default settings, though it depends on the size and number of your clips.
Once the training is complete, you can generate new videos in several ways. For example, you can run your model as an API with just a few lines of code.
Here's an example using the replicate-javascript client:
import Replicate from "replicate"
const replicate = new Replicate()
const model = "your-username/your-model:your-model-version"
const prompt = "A lion dancing on a subway train in the style of RCKRLL"
const output = await replicate.run(model, {input: { prompt }})
console.log(output)
Video fine-tuning is pretty new, so we're still learning what works best.
Here are some early tips:
Use max_steps to control training duration precisely.
If you want to automate the process or build applications, you can use our API.
Here's an example of how to train a new model programmatically using the Replicate Python client:
import replicate
import time

# Create a training dataset from a video
dataset = replicate.run(
    "zsxkib/create-video-dataset:4eb83cc8ba563da7032933374444a9a7a6f630b5b1e4f219cf9088f6a4acc138",
    input={
        "video_url": "YOUR_VIDEO_URL",
        "trigger_word": "UNIQUE_TRIGGER",
        "start_time": 10,
        "end_time": 40,
        "num_segments": 8,
        "autocaption": True,
        "autocaption_prefix": "a video of UNIQUE_TRIGGER,",
    },
)

# Create a new model to store the training results
model = replicate.models.create(
    owner="your-username",
    name="your-model-name",
    visibility="public",
    hardware="gpu-t4",
)

# Start training with the processed video
training = replicate.trainings.create(
    model="zsxkib/hunyuan-video-lora",
    version="04279caf015c30a635cabc4077b5bd82c5c706262eb61797a48db139444bcca9",  # Current model version ID
    input={
        "input_videos": dataset.url,
        "trigger_word": "UNIQUE_TRIGGER",
        "epochs": 2,
        "batch_size": 8,
    },
    destination="your-username/your-model-name",  # Where to push the trained model
)

# Wait for training to complete
while training.status not in ["succeeded", "failed", "canceled"]:
    training.reload()
    time.sleep(10)  # Wait 10 seconds between checks

if training.status != "succeeded":
    raise Exception(f"Training failed: {training.error}")

# Generate new videos with your fine-tuned model
output = replicate.run(
    training.output["version"],
    input={
        "prompt": "A video of UNIQUE_TRIGGER in a cyberpunk city",
        "num_frames": 45,
        "frame_rate": 24,
    },
)
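Continuing from the example above, you can save the generated video locally. Here's a sketch that assumes the output is either a URL (or a list of URLs) or a file object with a .url attribute, which varies by model and client version:

import urllib.request

# The model may return a single item or a list; take the first either way
video = output[0] if isinstance(output, list) else output

# Depending on your client version this is a plain URL string or a file object
video_url = getattr(video, "url", video)
urllib.request.urlretrieve(video_url, "generated-video.mp4")
print("Saved generated-video.mp4")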
Fine-tuning video models is in its early days, so we don't really know yet what's possible, or what people will build on top of it.
Give it a try and show us what you've made on Discord, or tag @replicate on X.