Llama 2 is a language model from Meta AI. It’s the first open source language model of the same caliber as OpenAI’s models.
With Replicate, you can run Llama 2 in the cloud with one line of code.
You can run Llama 2 with our official JavaScript client:
```javascript
import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

const input = {
  prompt:
    "Write a poem about open source machine learning in the style of Mary Oliver.",
};

for await (const event of replicate.stream("meta/llama-2-70b-chat", { input })) {
  process.stdout.write(event.toString());
}
```
You can run Llama 2 with our official Python client:
```python
import replicate

# The meta/llama-2-70b-chat model can stream output as it's running.
for event in replicate.stream(
    "meta/llama-2-70b-chat",
    input={
        "prompt": "Write a poem about open source machine learning in the style of Mary Oliver."
    },
):
    print(str(event), end="")
```
You can call the HTTP API directly with tools like cURL:
```shell
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "input": {
      "prompt": "Write a poem..."
    }
  }' \
  https://api.replicate.com/v1/models/meta/llama-2-70b-chat/predictions
```
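If you prefer not to shell out to cURL, the same HTTP request can be composed with Python's standard library. This is a minimal sketch, not an official client: it builds the request with the same headers and body as the cURL command above, and only sends it when `REPLICATE_API_TOKEN` is set in the environment.

```python
import json
import os
import urllib.request

# Same endpoint and payload as the cURL example above.
url = "https://api.replicate.com/v1/models/meta/llama-2-70b-chat/predictions"
body = json.dumps({"input": {"prompt": "Write a poem..."}}).encode()

req = urllib.request.Request(
    url,
    data=body,
    headers={
        "Authorization": f"Bearer {os.environ.get('REPLICATE_API_TOKEN', '')}",
        "Content-Type": "application/json",
        "Prefer": "wait",  # ask the API to hold the connection open for the result
    },
    method="POST",
)

# Only fire the request if an API token is actually configured.
if os.environ.get("REPLICATE_API_TOKEN"):
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["output"])
```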
You can also run Llama using other Replicate client libraries for Go, Swift, and others.
There are four Llama 2 model variants on Replicate, each with its own strengths.
What's the difference between these? Learn more in our blog post comparing 7B, 13B, and 70B.
If you want a place to start, we’ve built a demo chat app in Next.js that can be deployed on Vercel.
Take a look at the GitHub README to learn how to customize and deploy it.
Because Llama 2 is open source, you can train it on more data to teach it new things or a particular style.
Replicate makes this easy. Take a look at our guide to fine-tuning Llama 2.
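Fine-tuning trainers typically expect training data as a JSONL file: one JSON object per line. The field names below (`prompt`/`completion`) and the example rows are illustrative assumptions — check the fine-tuning guide for the exact format the trainer expects. A minimal sketch of writing such a file with the standard library:

```python
import json

# Hypothetical examples -- a real fine-tuning dataset would be much larger.
examples = [
    {
        "prompt": "What is a llama?",
        "completion": "A llama is a domesticated South American camelid.",
    },
    {
        "prompt": "Write a haiku about llamas.",
        "completion": "Soft wool on the wind / a llama hums to the herd / mountains listen in",
    },
]

# Each line of the file is one standalone JSON object.
with open("train_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# Sanity check: the file should have exactly one line per example.
with open("train_data.jsonl") as f:
    lines = f.read().splitlines()
print(len(lines))  # 2
```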
You can also run Llama 2 without an internet connection. We wrote a comprehensive guide to running Llama on your M1/M2 Mac, on Windows, on Linux, or even on your phone.
Happy hacking! 🦙