All API requests must include a valid API token in the Authorization
request header. The token must be prefixed by "Bearer", followed by a space and the token value.
Example: Authorization: Bearer r8_Hw***********************************
Find your tokens at https://replicate.com/account/api-tokens
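As an illustration of the header format described above, here is a minimal Python sketch that builds a request without sending it. The base URL https://api.replicate.com/v1 and the helper name build_request are assumptions made for this example, not something stated in this section.

```python
import urllib.request

# Assumed base URL for the HTTP API; adjust if yours differs.
API_BASE = "https://api.replicate.com/v1"


def build_request(path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the bearer token."""
    if not token:
        raise ValueError("an API token is required")
    # "Bearer", then a single space, then the token value.
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(f"{API_BASE}{path}", headers=headers)


request = build_request("/account", "r8_example_token")
print(request.get_header("Authorization"))  # prints Bearer r8_example_token
```

Keeping the token in an environment variable rather than in source code is a common way to avoid leaking it.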
Returns information about the user or organization associated with the provided API token.
Example cURL request:
The response will be a JSON object describing the account:
Example cURL request:
The response will be a paginated JSON list of collection objects:
Example cURL request:
The response will be a collection object with a nested list of the models in that collection:
super-resolution or image-restoration. See replicate.com/collections.
Get a list of deployments associated with the current account, including the latest release configuration for each deployment.
Example cURL request:
The response will be a paginated JSON array of deployment objects, sorted with the most recent deployment first:
Create a new deployment:
Example cURL request:
The response will be a JSON object describing the deployment:
hardware.list endpoint.
Delete a deployment
Deployment deletion has some restrictions:
Example cURL request:
The response will be an empty 204, indicating the deployment has been deleted.
Get information about a deployment by name including the current release.
Example cURL request:
The response will be a JSON object describing the deployment:
Update properties of an existing deployment, including hardware, min/max instances, and the deployment's underlying model version.
Example cURL request:
The response will be a JSON object describing the deployment:
Updating any deployment properties will increment the number field of the current_release.
hardware.list endpoint.
Create a prediction for the deployment and inputs you provide.
Example cURL request:
The request will wait up to 60 seconds for the model to run. If this time is exceeded, the prediction will be returned in a "starting" state and will need to be retrieved using the predictions.get endpoint.
For a complete overview of the deployments.predictions.create API, see our documentation on creating a prediction, which covers a variety of use cases.
Leave the request open and wait for the model to finish generating output. Set to wait=n, where n is a number of seconds between 1 and 60.
See https://replicate.com/docs/topics/predictions/create-a-prediction#sync-mode for more information.
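The wait=n value described above can be validated before a request is made. The sketch below assumes the value travels in an HTTP Prefer header, which this section does not name explicitly; prefer_wait_header is a hypothetical helper.

```python
def prefer_wait_header(seconds: int) -> dict:
    """Return a header dict asking the server to hold the request open.

    Assumption: the wait=n value is sent in the "Prefer" request header.
    """
    if isinstance(seconds, bool) or not isinstance(seconds, int):
        raise TypeError("seconds must be an integer")
    if not 1 <= seconds <= 60:
        raise ValueError("seconds must be between 1 and 60")
    return {"Prefer": f"wait={seconds}"}


print(prefer_wait_header(30))  # prints {'Prefer': 'wait=30'}
```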
The model's input as a JSON object. The input schema depends on what model you are running. To see the available inputs, click the "API" tab on the model you are running or get the model version and look at its openapi_schema property. For example, stability-ai/sdxl takes prompt as an input.
Files should be passed as HTTP URLs or data URLs.
Use an HTTP URL when:
Use a data URL when:
This field is deprecated.
Request a URL to receive streaming output using server-sent events (SSE).
This field is no longer needed, as the returned prediction will always have a stream entry in its urls property if the model supports streaming.
An HTTPS URL for receiving a webhook when the prediction has new output. The webhook will be a POST request where the request body is the same as the response body of the get prediction operation. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once. Replicate will not follow redirects when sending webhook requests to your service, so be sure to specify a URL that will resolve without redirecting.
By default, we will send requests to your webhook URL whenever there are new outputs or the prediction has finished. You can change which events trigger webhook requests by specifying webhook_events_filter in the prediction request:
start: immediately on prediction start
output: each time a prediction generates an output (note that predictions can generate multiple outputs)
logs: each time log output is generated by a prediction
completed: when the prediction reaches a terminal state (succeeded/canceled/failed)
For example, if you only wanted requests to be sent at the start and end of the prediction, you would provide:
{
  "input": {
    "text": "Alice"
  },
  "webhook": "https://example.com/my-webhook",
  "webhook_events_filter": ["start", "completed"]
}
Requests for event types output and logs will be sent at most once every 500ms. If you request start and completed webhooks, then they'll always be sent regardless of throttling.
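The request body above can also be assembled programmatically. The following is a small sketch, not an official client: prediction_body is a hypothetical helper that rejects unknown event names and non-HTTPS webhook URLs before anything is sent.

```python
import json

# The four event names listed above.
ALLOWED_EVENTS = ("start", "output", "logs", "completed")


def prediction_body(model_input: dict, webhook: str, events: list) -> dict:
    """Build a prediction request body with a webhook events filter."""
    unknown = [event for event in events if event not in ALLOWED_EVENTS]
    if unknown:
        raise ValueError(f"unknown webhook events: {unknown}")
    if not webhook.startswith("https://"):
        raise ValueError("the webhook must be an HTTPS URL")
    return {
        "input": model_input,
        "webhook": webhook,
        "webhook_events_filter": list(events),
    }


body = prediction_body(
    {"text": "Alice"}, "https://example.com/my-webhook", ["start", "completed"]
)
print(json.dumps(body, indent=2))
```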
Example cURL request:
The response will be a JSON array of hardware objects:
Get a paginated list of public models.
Example cURL request:
The response will be a paginated JSON array of model objects:
The cover_image_url string is an HTTPS URL for an image file. This can be:
Create a model.
Example cURL request:
The response will be a model object in the following format:
Note that there is a limit of 1,000 models per account. For most purposes, we recommend using a single model and pushing new versions of the model as you make changes to it.
hardware.list endpoint.
Get a list of public models matching a search query.
Example cURL request:
The response will be a paginated JSON object containing an array of model objects:
The cover_image_url string is an HTTPS URL for an image file. This can be:
Delete a model
Model deletion has some restrictions:
Example cURL request:
The response will be an empty 204, indicating the model has been deleted.
Example cURL request:
The response will be a model object in the following format:
The cover_image_url string is an HTTPS URL for an image file. This can be:
The default_example object is a prediction created with this model.
The latest_version object is the model's most recently pushed version.
Create a prediction for the deployment and inputs you provide.
Example cURL request:
The request will wait up to 60 seconds for the model to run. If this time is exceeded, the prediction will be returned in a "starting" state and will need to be retrieved using the predictions.get endpoint.
For a complete overview of the deployments.predictions.create API, see our documentation on creating a prediction, which covers a variety of use cases.
Leave the request open and wait for the model to finish generating output. Set to wait=n, where n is a number of seconds between 1 and 60.
See https://replicate.com/docs/topics/predictions/create-a-prediction#sync-mode for more information.
The model's input as a JSON object. The input schema depends on what model you are running. To see the available inputs, click the "API" tab on the model you are running or get the model version and look at its openapi_schema property. For example, stability-ai/sdxl takes prompt as an input.
Files should be passed as HTTP URLs or data URLs.
Use an HTTP URL when:
Use a data URL when:
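For reference, a data URL carries the file bytes inline in the input value itself. The sketch below shows one way to encode bytes as a base64 data URL in Python; to_data_url is a hypothetical helper, and the default MIME type is an assumption.

```python
import base64


def to_data_url(data: bytes, mime_type: str = "application/octet-stream") -> str:
    """Encode raw bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


print(to_data_url(b"hello", "text/plain"))  # prints data:text/plain;base64,aGVsbG8=
```

Base64 makes the payload roughly a third larger than the original bytes, which is worth keeping in mind when choosing between the two forms.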
This field is deprecated.
Request a URL to receive streaming output using server-sent events (SSE).
This field is no longer needed, as the returned prediction will always have a stream entry in its urls property if the model supports streaming.
An HTTPS URL for receiving a webhook when the prediction has new output. The webhook will be a POST request where the request body is the same as the response body of the get prediction operation. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once. Replicate will not follow redirects when sending webhook requests to your service, so be sure to specify a URL that will resolve without redirecting.
By default, we will send requests to your webhook URL whenever there are new outputs or the prediction has finished. You can change which events trigger webhook requests by specifying webhook_events_filter in the prediction request:
start: immediately on prediction start
output: each time a prediction generates an output (note that predictions can generate multiple outputs)
logs: each time log output is generated by a prediction
completed: when the prediction reaches a terminal state (succeeded/canceled/failed)
For example, if you only wanted requests to be sent at the start and end of the prediction, you would provide:
{
  "input": {
    "text": "Alice"
  },
  "webhook": "https://example.com/my-webhook",
  "webhook_events_filter": ["start", "completed"]
}
Requests for event types output and logs will be sent at most once every 500ms. If you request start and completed webhooks, then they'll always be sent regardless of throttling.
Example cURL request:
The response will be a JSON array of model version objects, sorted with the most recent version first:
Delete a model version and all associated predictions, including all output files.
Model version deletion has some restrictions:
Example cURL request:
The response will be an empty 202, indicating the deletion request has been accepted. It might take a few minutes to be processed.
Example cURL request:
The response will be the version object:
Every model describes its inputs and outputs with OpenAPI Schema Objects in the openapi_schema property.
The openapi_schema.components.schemas.Input property for the replicate/hello-world model looks like this:
The openapi_schema.components.schemas.Output property for the replicate/hello-world model looks like this:
For more details, see the docs on Cog's supported input and output types.
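To show how the openapi_schema property might be used, here is a brief sketch that lists a version's input names. The sample version dictionary is hypothetical and heavily trimmed, and it assumes the Input schema follows the usual OpenAPI layout with a properties mapping.

```python
def input_properties(version: dict) -> dict:
    """Return the properties mapping of a version's Input schema."""
    schemas = version["openapi_schema"]["components"]["schemas"]
    # "properties" is the standard OpenAPI key for an object's fields.
    return schemas["Input"].get("properties", {})


# Hypothetical, trimmed version object used only for illustration.
sample_version = {
    "openapi_schema": {
        "components": {
            "schemas": {
                "Input": {
                    "type": "object",
                    "properties": {"text": {"type": "string"}},
                }
            }
        }
    }
}

print(sorted(input_properties(sample_version)))  # prints ['text']
```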
Start a new training of the model version you specify.
Example request body:
Example cURL request:
The response will be the training object:
As models can take several minutes or more to train, the result will not be available immediately. To get the final result of the training, you should either provide a webhook HTTPS URL for us to call when the results are ready, or poll the get a training endpoint until it has finished.
When a training completes, it creates a new version of the model at the specified destination.
To find some models to train on, check out the trainable language models collection.
A string representing the desired model to push to in the format {destination_model_owner}/{destination_model_name}. This should be an existing model owned by the user or organization making the API request. If the destination is invalid, the server will return an appropriate 4XX response.
An object containing inputs to the Cog model's train() function.
By default, we will send requests to your webhook URL whenever there are new outputs or the training has finished. You can change which events trigger webhook requests by specifying webhook_events_filter in the training request:
start: immediately on training start
output: each time a training generates an output (note that trainings can generate multiple outputs)
logs: each time log output is generated by a training
completed: when the training reaches a terminal state (succeeded/canceled/failed)
For example, if you only wanted requests to be sent at the start and end of the training, you would provide:
{
  "destination": "my-organization/my-model",
  "input": {
    "text": "Alice"
  },
  "webhook": "https://example.com/my-webhook",
  "webhook_events_filter": ["start", "completed"]
}
Requests for event types output and logs will be sent at most once every 500ms. If you request start and completed webhooks, then they'll always be sent regardless of throttling.
Get a paginated list of predictions that you've created. This will include predictions created from the API and the website. It will return 100 records per page.
Example cURL request:
The response will be a paginated JSON array of prediction objects, sorted with the most recent prediction first:
id will be the unique ID of the prediction.
source will indicate how the prediction was created. Possible values are web or api.
status will be the status of the prediction. Refer to get a single prediction for possible values.
urls will be a convenience object that can be used to construct new API requests for the given prediction. If the requested model version supports streaming, this will have a stream entry with an HTTPS URL that you can use to construct an EventSource.
model will be the model identifier string in the format of {model_owner}/{model_name}.
version will be the unique ID of the model version used to create the prediction.
data_removed will be true if the input and output data has been deleted.
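Because the list endpoint above is paginated, a client usually has to follow page links until none remain. Here is a rough sketch of that loop. It assumes each page is a JSON object with a results array and a next URL (key names that this section does not spell out), and fetch_page is a hypothetical caller-supplied function that performs the HTTP request.

```python
from typing import Callable, Iterator, Optional


def iter_results(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item across pages, following "next" until it is empty.

    fetch_page(None) should return the first page; fetch_page(url) should
    return the page at that URL.
    """
    url = None
    while True:
        page = fetch_page(url)
        yield from page.get("results", [])
        url = page.get("next")
        if not url:
            break


# Fake two-page response used only to exercise the loop.
fake_pages = {
    None: {"results": [{"id": "a"}, {"id": "b"}], "next": "page-2"},
    "page-2": {"results": [{"id": "c"}], "next": None},
}

print([item["id"] for item in iter_results(fake_pages.__getitem__)])
# prints ['a', 'b', 'c']
```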
Create a prediction for the model version and inputs you provide.
Example cURL request:
The request will wait up to 60 seconds for the model to run. If this time is exceeded, the prediction will be returned in a "starting" state and will need to be retrieved using the predictions.get endpoint.
For a complete overview of the predictions.create API, see our documentation on creating a prediction, which covers a variety of use cases.
Leave the request open and wait for the model to finish generating output. Set to wait=n, where n is a number of seconds between 1 and 60.
See https://replicate.com/docs/topics/predictions/create-a-prediction#sync-mode for more information.
The model's input as a JSON object. The input schema depends on what model you are running. To see the available inputs, click the "API" tab on the model you are running or get the model version and look at its openapi_schema property. For example, stability-ai/sdxl takes prompt as an input.
Files should be passed as HTTP URLs or data URLs.
Use an HTTP URL when:
Use a data URL when:
This field is deprecated.
Request a URL to receive streaming output using server-sent events (SSE).
This field is no longer needed, as the returned prediction will always have a stream entry in its urls property if the model supports streaming.
An HTTPS URL for receiving a webhook when the prediction has new output. The webhook will be a POST request where the request body is the same as the response body of the get prediction operation. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once. Replicate will not follow redirects when sending webhook requests to your service, so be sure to specify a URL that will resolve without redirecting.
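Since a webhook may be retried, the receiving code should tolerate duplicate deliveries. The sketch below shows one possible approach: finalize each prediction at most once, keyed on its id. WebhookDeduper is a hypothetical class, and the in-memory set stands in for whatever durable store a real service would use.

```python
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


class WebhookDeduper:
    """Track which predictions have already been finalized."""

    def __init__(self) -> None:
        self._finalized = set()

    def handle(self, prediction: dict) -> bool:
        """Return True only the first time a terminal payload is seen."""
        if prediction["status"] not in TERMINAL_STATUSES:
            return False  # intermediate update, nothing to finalize
        if prediction["id"] in self._finalized:
            return False  # retried delivery, already processed
        self._finalized.add(prediction["id"])
        return True


deduper = WebhookDeduper()
payload = {"id": "abc123", "status": "succeeded"}
print(deduper.handle(payload))  # prints True
print(deduper.handle(payload))  # prints False
```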
By default, we will send requests to your webhook URL whenever there are new outputs or the prediction has finished. You can change which events trigger webhook requests by specifying webhook_events_filter in the prediction request:
start: immediately on prediction start
output: each time a prediction generates an output (note that predictions can generate multiple outputs)
logs: each time log output is generated by a prediction
completed: when the prediction reaches a terminal state (succeeded/canceled/failed)
For example, if you only wanted requests to be sent at the start and end of the prediction, you would provide:
{
  "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
  "input": {
    "text": "Alice"
  },
  "webhook": "https://example.com/my-webhook",
  "webhook_events_filter": ["start", "completed"]
}
Requests for event types output and logs will be sent at most once every 500ms. If you request start and completed webhooks, then they'll always be sent regardless of throttling.
Get the current state of a prediction.
Example cURL request:
The response will be the prediction object:
status will be one of:
starting: the prediction is starting up. If this status lasts longer than a few seconds, then it's typically because a new worker is being started to run the prediction.
processing: the predict() method of the model is currently running.
succeeded: the prediction completed successfully.
failed: the prediction encountered an error during processing.
canceled: the prediction was canceled by its creator.
In the case of success, output will be an object containing the output of the model. Any files will be represented as HTTPS URLs. You'll need to pass the Authorization header to request them.
In the case of failure, error will contain the error encountered during the prediction.
Terminated predictions (with a status of succeeded, failed, or canceled) will include a metrics object with a predict_time property showing the amount of CPU or GPU time, in seconds, that the prediction used while running. It won't include time waiting for the prediction to start.
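The status values above suggest a simple polling loop: keep fetching the prediction until it reaches a terminal state. A minimal sketch follows. get_prediction is a hypothetical caller-supplied function that performs the HTTP request and returns the decoded prediction object, and the interval and poll limit are arbitrary choices.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    get_prediction: Callable[[str], dict],
    prediction_id: str,
    interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the prediction is terminal or the poll budget runs out."""
    for _ in range(max_polls):
        prediction = get_prediction(prediction_id)
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} did not finish in time")


# Fake status sequence used only to exercise the loop without a network.
statuses = iter(["starting", "processing", "succeeded"])
result = wait_for_prediction(
    lambda pid: {"id": pid, "status": next(statuses)},
    "abc123",
    sleep=lambda seconds: None,
)
print(result["status"])  # prints succeeded
```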
All input parameters, output values, and logs are automatically removed after an hour, by default, for predictions created through the API.
You must save a copy of any data or files in the output if you'd like to continue using them. The output key will still be present, but its value will be null after the output has been removed.
Output files are served by replicate.delivery and its subdomains. If you use an allow list of external domains for your assets, add replicate.delivery and *.replicate.delivery to it.
Get a paginated list of trainings that you've created. This will include trainings created from the API and the website. It will return 100 records per page.
Example cURL request:
The response will be a paginated JSON array of training objects, sorted with the most recent training first:
id will be the unique ID of the training.
source will indicate how the training was created. Possible values are web or api.
status will be the status of the training. Refer to get a single training for possible values.
urls will be a convenience object that can be used to construct new API requests for the given training.
version will be the unique ID of the model version used to create the training.
Get the current state of a training.
Example cURL request:
The response will be the training object:
status will be one of:
starting: the training is starting up. If this status lasts longer than a few seconds, then it's typically because a new worker is being started to run the training.
processing: the train() method of the model is currently running.
succeeded: the training completed successfully.
failed: the training encountered an error during processing.
canceled: the training was canceled by its creator.
In the case of success, output will be an object containing the output of the model. Any files will be represented as HTTPS URLs. You'll need to pass the Authorization header to request them.
In the case of failure, error will contain the error encountered during the training.
Terminated trainings (with a status of succeeded, failed, or canceled) will include a metrics object with a predict_time property showing the amount of CPU or GPU time, in seconds, that the training used while running. It won't include time waiting for the training to start.
Get the signing secret for the default webhook endpoint. This is used to verify that webhook requests are coming from Replicate.
Example cURL request:
The response will be a JSON object with a key property:
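Once the signing secret is retrieved, it can be used to check incoming webhook requests. The sketch below rests on several assumptions that this section does not state: that the secret has the form whsec_ followed by a base64-encoded key, that the signed content is the webhook ID, timestamp, and raw body joined by periods, and that the signature header holds space-separated entries of the form v1,<base64 HMAC-SHA256>. Treat it as an outline to check against the webhook verification documentation rather than a definitive implementation.

```python
import base64
import hashlib
import hmac


def expected_signature(secret: str, webhook_id: str, timestamp: str, body: bytes) -> str:
    """Compute the base64 HMAC-SHA256 signature for a webhook delivery."""
    # Assumption: the key is the base64 part after the "whsec_" prefix.
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{webhook_id}.{timestamp}.".encode() + body
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_signature(
    secret: str, webhook_id: str, timestamp: str, body: bytes, signature_header: str
) -> bool:
    """Check the header's signatures against the expected one."""
    expected = expected_signature(secret, webhook_id, timestamp, body)
    # Assumption: entries look like "v1,<signature>" and are space-separated.
    candidates = [
        part.split(",", 1)[1] for part in signature_header.split() if "," in part
    ]
    # compare_digest avoids leaking information through timing differences.
    return any(hmac.compare_digest(expected, candidate) for candidate in candidates)
```

A real handler would typically also reject deliveries whose timestamp is too old, to limit replay of captured requests.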
We limit the number of API requests that can be made to Replicate:
If you hit a limit, you will receive a response with status 429 with a body like:
If you want higher limits, contact us.
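A client can respond to a 429 status by backing off and retrying. The following sketch uses exponential backoff, which is one common strategy and not something this section prescribes. send is a hypothetical caller-supplied function that performs the request and returns a (status, body) tuple, and the delay values are arbitrary.

```python
import time
from typing import Callable, Tuple


def send_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry on HTTP 429, doubling the delay after each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Still rate limited after the final attempt; hand the response back.
    return status, body


# Fake responses used only to exercise the loop without a network.
responses = iter([(429, "throttled"), (429, "throttled"), (200, "ok")])
delays = []
print(send_with_retry(lambda: next(responses), sleep=delays.append))  # prints (200, 'ok')
print(delays)  # prints [1.0, 2.0]
```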
Replicate's public HTTP API documentation is available as a machine-readable OpenAPI schema in JSON format.
See OpenAPI schema to learn more and download the schema.