You can now fine-tune open-source video models

Posted January 24, 2025

AI video generation has gotten really good.

Some of the best video models like tencent/hunyuan-video are open-source, and the community has been hard at work building on top of them. We've adapted the Musubi Tuner by @kohya_tech to run on Replicate, so you can fine-tune HunyuanVideo on your own visual content.

Never Gonna Give You Up animal edition, courtesy of @flngr and @fofr.

HunyuanVideo is good at capturing the style of the training data, not only in the visual appearance of the imagery and the color grading, but also in the motion of the camera and the way the characters move.

This transfer of motion style is what sets this approach apart: video models fine-tuned only on still images can't capture it.

Here are some examples of videos created using different fine-tunes, all with the same settings, size, prompt and seed:

You can make your own fine-tuned video model to:

  • Create videos in a specific visual style
  • Generate animations of particular characters
  • Capture specific types of motion or movement
  • Build custom video effects

In this post, we'll show you how to gather training data, create a fine-tuned video model, and generate videos with it.

Note

Prefer to learn by watching? Check out Sakib's 5-minute video demo on YouTube.

Prerequisites

  • A Replicate account
  • A video or YouTube URL to use as training data

Step 1: Create your training data

To train a video model, you'll need a dataset of video clips and text captions describing each video.

This process can be time-consuming, so we've created a model to make it easier: zsxkib/create-video-dataset takes a video file or YouTube URL as input, slices it into smaller clips, and generates captions for each clip.

Here's how to create training data right in your browser with just a few clicks:

  1. Find a YouTube URL (or video file) that you want to use for training.
  2. Go to replicate.com/zsxkib/create-video-dataset
  3. Paste your video URL, or upload a video file from your computer.
  4. Choose a unique trigger word like RCKRLL. Avoid using real words that have existing associations.
  5. Click Run and download the resulting ZIP file.

Optional: Check out the logs from your training run if you want to see the auto-generated captions for each clip.
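If you want a closer look at the training data before moving on, you can list the contents of the ZIP with a few lines of Python. This is just a sketch, assuming you saved the download as dataset.zip; typically the archive contains the sliced clips alongside a matching caption file for each one.

import zipfile
 
# List the clips and caption files in the downloaded dataset
# (assumes the file was saved as dataset.zip; the exact layout may vary)
with zipfile.ZipFile("dataset.zip") as archive:
    for name in archive.namelist():
        print(name)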

Step 2: Train your model

Now you'll create your own fine-tuned video generation model using the training data you just compiled.

  1. Go to replicate.com/zsxkib/hunyuan-video-lora/train
  2. Choose a name for your model.
  3. For the input_videos input, upload the ZIP file you just downloaded.
  4. Enter the same trigger word you used before, e.g. RCKRLL.
  5. Adjust training settings (we recommend starting with 2 epochs).
  6. Click Create training.

Training typically takes about 5-10 minutes with default settings, though the exact time depends on the size and number of your clips.

Step 3: Run your model

Once the training is complete, you can generate new videos in several ways:

  • Run the model in your browser directly from your model's page.
  • Run your model in Replicate's Playground: Go to "Manage models" and type your model name.
  • Use the API: Go to your model's page and click the API tab for code snippets.

You can run your model as an API with just a few lines of code.

Here's an example using the replicate-javascript client:

import Replicate from "replicate"
 
// The client reads your API token from the REPLICATE_API_TOKEN environment variable
const replicate = new Replicate()
 
// Replace with your own model and its version ID
const model = "your-username/your-model:your-model-version"
const prompt = "A lion dancing on a subway train in the style of RCKRLL"
const output = await replicate.run(model, { input: { prompt } })
console.log(output)
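
If you prefer Python, here's the equivalent sketch using the replicate Python client (swap in your own model and version identifier):

import replicate
 
# Replace with your own model and its version ID
model = "your-username/your-model:your-model-version"
prompt = "A lion dancing on a subway train in the style of RCKRLL"
 
output = replicate.run(model, input={"prompt": prompt})
print(output)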

Step 4: Experiment for best results

Video fine-tuning is pretty new, so we're still learning what works best.

Here are some early tips:

  • Use a unique trigger word that isn't a real word with existing associations.
  • Experiment with training settings (see the sketch after this list):
    • More epochs == better quality but longer training time
    • Adjust the LoRA rank
    • Increase batch size to speed up training
    • Use max_steps to control training duration precisely
  • If training looks like it's going to take several hours, cancel it and try:
    • Reducing the number of epochs
    • Reducing the rank
    • Increasing batch size
  • Check the GitHub README for detailed parameter explanations.
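
To make those settings concrete, here's a rough sketch of a training call with all four knobs set, mirroring the API example in the next section. The epochs and batch_size inputs appear below; rank and max_steps are the names used in this post, but double-check the exact input names on the training page before relying on them.

import replicate
 
# A sketch of the tunable training settings discussed above.
# Verify the exact input names (especially rank and max_steps) on the training page.
training = replicate.trainings.create(
    model="zsxkib/hunyuan-video-lora",
    version="04279caf015c30a635cabc4077b5bd82c5c706262eb61797a48db139444bcca9",
    destination="your-username/your-model-name",
    input={
        "input_videos": "https://example.com/your-dataset.zip",  # the ZIP from step 1
        "trigger_word": "RCKRLL",
        "epochs": 2,        # more epochs: better quality, longer training
        "batch_size": 8,    # larger batches speed up training
        "rank": 32,         # LoRA rank; reduce it if training is too slow
        "max_steps": 500,   # caps the total number of training steps
    },
)
print(training.id, training.status)

If a run looks like it will take hours, cancel it from the training's page on Replicate and retry with lower settings.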

Extra credit: Train new models programmatically

If you want to automate the process or build applications, you can use our API.

Here's an example of how to train a new model programmatically using the Replicate Python client:

import replicate
import time
 
# Create a training dataset from a video
dataset = replicate.run(
    "zsxkib/create-video-dataset:4eb83cc8ba563da7032933374444a9a7a6f630b5b1e4f219cf9088f6a4acc138",
    input={
        "video_url": "YOUR_VIDEO_URL",
        "trigger_word": "UNIQUE_TRIGGER",
        "start_time": 10,
        "end_time": 40,
        "num_segments": 8,
        "autocaption": True,
        "autocaption_prefix": "a video of UNIQUE_TRIGGER,"
    }
)
 
# Create a new model to store the training results
model = replicate.models.create(
    owner="your-username",
    name="your-model-name",
    visibility="public",
    hardware="gpu-t4"
)
 
# Start training with the processed video
training = replicate.trainings.create(
    model="zsxkib/hunyuan-video-lora",
    version="04279caf015c30a635cabc4077b5bd82c5c706262eb61797a48db139444bcca9",  # Current model version ID
    input={
        "input_videos": dataset.url,
        "trigger_word": "UNIQUE_TRIGGER",
        "epochs": 2,
        "batch_size": 8,
    },
    destination="your-username/your-model-name",  # Where to push the trained model
)
 
# Wait for training to complete
while training.status not in ["succeeded", "failed", "canceled"]:
    training.reload()
    time.sleep(10)  # Wait 10 seconds between checks
 
if training.status != "succeeded":
    raise Exception(f"Training failed: {training.error}")
 
# Generate new videos with your fine-tuned model
output = replicate.run(
    training.output['version'],
    input={
        "prompt": "A video of UNIQUE_TRIGGER in a cyberpunk city",
        "num_frames": 45,
        "frame_rate": 24
    }
)
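
Depending on your version of the Python client, output may be a FileOutput object or a plain URL. With a recent client you can write the video straight to disk, as in this sketch (which assumes the model returns a single video file):

# Save the generated video locally
# (assumes a recent replicate client that returns a FileOutput with a .read() method)
with open("output.mp4", "wb") as f:
    f.write(output.read())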

What's next?

Fine-tuning video models is in its early days, so we don't yet know the full range of what's possible, or what people will build on top of it.

Give it a try and show us what you've made on Discord, or tag @replicate on X.