Receive webhooks

Replicate can send an HTTP POST request to the URL you specified whenever the prediction is created, has new logs, new output, or is completed.

The request body is a prediction object in JSON format. This object has the same structure as the object returned by the get a prediction API. Here's an example of an unfinished prediction:

{
  "id": "ufawqhfynnddngldkgtslldrkq",
  "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
  "created_at": "2022-04-26T22:13:06.224088Z",
  "started_at": null,
  "completed_at": null,
  "status": "starting",
  "input": {
    "text": "Alice"
  },
  "output": null,
  "error": null,
  "logs": null,
  "metrics": {}
}

Refer to Prediction status for the list of possible status values.

Here's an example of a Next.js webhook handler:

// pages/api/replicate-webhook.js
export default async function handler(req, res) {
  console.log("🪝 incoming webhook!", req.body.id);
  const prediction = req.body;
  await saveToMyDatabase(prediction);
  await sendSlackNotification(prediction);
  res.end();
}

Your endpoint should respond with a 2xx status code within a few seconds; otherwise, the webhook might be retried.

Filtering webhook events

By default, Replicate sends requests to your webhook URL whenever there are new outputs or the prediction has finished. You can change which events trigger webhook requests by specifying a webhook_events_filter array in the JSON body of the prediction request.

start: immediately on prediction start
output: each time a prediction generates an output (note that predictions can generate multiple outputs)
logs: each time log output is generated by a prediction
completed: when the prediction reaches a terminal state (succeeded/canceled/failed)

For example, if you only wanted requests to be sent at the start and end of the prediction, you would provide:

{
  "input": {
    "text": "Alice"
  },
  "webhook": "https://example.com/my-webhook",
  "webhook_events_filter": ["start", "completed"]
}

Requests for event types output and logs will be sent at most once every 500ms. If you request start and completed webhooks, then they'll always be sent regardless of throttling.

Retries

When Replicate sends the terminal webhook for a prediction (where the status is succeeded, failed or canceled), we check the response status we get. If we can't make the request at all, or if we get a 4xx or 5xx response, we'll retry the webhook. We retry several times on an exponential backoff. The final retry is sent about 1 minute after the prediction completed.

We do not retry any webhooks for intermediate states.

Idempotency

Make webhook handlers idempotent. Identical webhooks can be sent more than once, so you'll need handle potentially duplicate information.

Ordering

In rare cases, webhooks for a single prediction may arrive out of order. We recommend you include logic in your application to ignore all webhooks for a prediction after the terminal webhook (where the status is succeeded, failed or canceled). If you are using output or logs events, you may also want to ignore webhooks that regress the status of the prediction (for example, by emitting less output or fewer logs).