Low-rank adaptation (LoRA) is a technique for fine-tuning models that has some advantages over previous methods: it's faster and uses less memory, and it produces much smaller outputs that can be mixed and matched at inference time.
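In a nutshell, LoRA freezes the pretrained weights and trains a pair of small low-rank matrices whose product is added to the adapted layer. Here's a toy numpy sketch of the idea (the dimensions, names, and scaling are illustrative, not taken from any particular implementation):

```python
import numpy as np

# A frozen weight matrix W is adapted as W + (alpha / r) * B @ A,
# where A and B are small low-rank factors and r << d.
d, r = 1024, 8          # full dimension vs. LoRA rank
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable, r x d
B = np.zeros((d, r))                    # trainable, zero-initialized

def adapted_forward(x, alpha=16):
    # Because B starts at zero, the adapted layer initially behaves
    # exactly like the original pretrained layer.
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.standard_normal((2, d))
y = adapted_forward(x)

# Only A and B are trained: 2*d*r parameters instead of d*d.
print(2 * d * r, d * d)  # 16384 1048576
```

This is why LoRA outputs are small: instead of saving a full copy of the model's weights, you only save the low-rank factors.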
Last month we blogged about faster fine-tuning of Stable Diffusion with LoRA. Our friend Simon Ryu (aka @cloneofsimo) applied the LoRA technique to Stable Diffusion, allowing people to create custom trained styles from just a handful of training images, then mix and match those styles at prediction time to create highly customized images.
Fast-forward one month, and we’re seeing LoRA being applied elsewhere. Now it’s being used to fine-tune large language models like LLaMA. Earlier this month, Eric J. Wang released Alpaca-LoRA, a project which contains code for reproducing the Stanford Alpaca results using PEFT, a library that lets you take various transformers-based language models and fine-tune them using LoRA. What’s neat about this is that it allows you to fine-tune models cheaply and efficiently on modest hardware, with smaller (and perhaps composable) outputs.
In this blog post, we’ll show you how to use LoRA to fine-tune LLaMA using Alpaca training data.
We’ve created a fork of the original Alpaca-LoRA repo that adds support for Cog. Cog is a tool to package machine learning models in containers and we're using it to install the dependencies to fine-tune and run the model.
Clone the repository using Git:
git clone https://github.com/daanelson/alpaca-lora
cd alpaca-lora
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
Put your downloaded weights in a folder called unconverted-weights. The folder hierarchy should look something like this:
unconverted-weights
├── 7B
│ ├── checklist.chk
│ ├── consolidated.00.pth
│ └── params.json
├── tokenizer.model
└── tokenizer_checklist.chk
Convert the weights from a PyTorch checkpoint to a transformers-compatible format using this command:
cog run python -m transformers.models.llama.convert_llama_weights_to_hf \
--input_dir unconverted-weights \
--model_size 7B \
--output_dir weights
Your final directory structure should look like this:
weights
├── llama-7b
└── tokenizer
The fine-tuning script is configured by default to work on less powerful GPUs, but if you have a GPU with more memory, you can increase MICRO_BATCH_SIZE to 32 or 64 in finetune.py.
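MICRO_BATCH_SIZE controls how many examples go through each forward/backward pass; the script accumulates gradients across micro-batches until the effective batch size is reached, which is how it fits training into limited GPU memory. A toy sketch of gradient accumulation (the numbers and variable names here are illustrative, not lifted from finetune.py):

```python
BATCH_SIZE = 128        # effective batch size per optimizer step
MICRO_BATCH_SIZE = 4    # examples per forward/backward pass
accumulation_steps = BATCH_SIZE // MICRO_BATCH_SIZE  # 32 micro-batches

# Accumulate averaged micro-batch gradients so one optimizer update
# is equivalent to a single pass over the full effective batch.
grads = 0.0
for step in range(accumulation_steps):
    micro_batch_grad = 1.0  # placeholder for a real backward pass
    grads += micro_batch_grad / accumulation_steps
# After 32 micro-batches, apply one optimizer step using `grads`.
```

Raising MICRO_BATCH_SIZE reduces the number of accumulation steps per update, which speeds up training at the cost of more GPU memory.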
If you have your own instruction-tuning dataset, edit DATA_PATH in finetune.py to point to your own dataset. Make sure it has the same format as alpaca_data_cleaned.json.
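For reference, alpaca_data_cleaned.json is a JSON list of records, each with instruction, input, and output fields ("input" may be an empty string when the instruction needs no extra context). A minimal sketch of a dataset in that shape (the example records are made up):

```python
import json

# Two made-up records in the Alpaca-style instruction-tuning format.
dataset = [
    {
        "instruction": "Give three tips for staying healthy.",
        "input": "",
        "output": "1. Eat a balanced diet. 2. Exercise regularly. 3. Get enough sleep.",
    },
    {
        "instruction": "Summarize the text.",
        "input": "Alpacas are domesticated animals from South America.",
        "output": "Alpacas are South American domesticated animals.",
    },
]

serialized = json.dumps(dataset, indent=2)

# Sanity-check that every record has exactly the expected keys.
for record in json.loads(serialized):
    assert set(record) == {"instruction", "input", "output"}
```

Writing the serialized list to a file and pointing DATA_PATH at it is all the script needs.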
Run the fine-tuning script:
cog run python finetune.py
This takes 3.5 hours on a 40GB A100 GPU, and longer on less powerful GPUs.
Once fine-tuning is complete, you can try out the model with Cog's predict command:
$ cog predict -i prompt="Tell me something about alpacas."
Alpacas are domesticated animals from South America. They are closely related to llamas and guanacos and have a long, dense, woolly fleece that is used to make textiles. They are herd animals and live in small groups in the Andes mountains. They have a wide variety of sounds, including whistles, snorts, and barks. They are intelligent and social animals and can be trained to perform certain tasks.
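Behind the scenes, your prompt gets wrapped in an Alpaca-style instruction template before being fed to the model, since that's the format the model saw during fine-tuning. Here's a sketch of that template (the wording follows the Stanford Alpaca format; treat the exact function as illustrative rather than a copy of the repo's code):

```python
# A sketch of the Alpaca-style prompt template used at training
# and prediction time.
def generate_prompt(instruction, input_text=""):
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

prompt = generate_prompt("Tell me something about alpacas.")
```

The model generates text after the "### Response:" marker, and everything before it is stripped from the final output.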
Here are some ideas for what you could do next:
We can't wait to see what you build.
Follow us on Twitter to keep up. We’re going to be posting lots more guides to tinkering with open-source language models.