Retry webhook requests when an error occurs
Derrick Mehaffy
I looked into the webhook implementation after asking @pierreburgy at a Zoom meetup, and it looks quite basic: only one attempt is made when an event is received:
https://github.com/strapi/strapi/blob/master/packages/strapi/lib/services/webhook-runner.js#L77-L113
The problem with this approach is that webhook destinations fail quite often, for reasons such as temporary downtime or maintenance, and with a single attempt the event is simply lost.
An easy solution would be to use p-retry and retry a few times.
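To make the idea concrete, here is a minimal sketch of retrying with exponential backoff. It is a hand-rolled stand-in for the kind of behavior p-retry provides, not p-retry itself and not Strapi's code. The function name `retryWithBackoff` and the option names `retries`, `minTimeout` and `factor` are assumptions chosen for illustration.

```javascript
// Hypothetical helper: a dependency-free stand-in for a retry library.
// None of these names come from Strapi; they are illustrative only.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff(task, { retries = 3, minTimeout = 1000, factor = 2 } = {}) {
  let lastError;
  // One initial attempt plus up to `retries` further attempts.
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      // `task` receives the 1-based attempt number and returns a promise.
      return await task(attempt + 1);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // Wait minTimeout, then minTimeout * factor, and so on.
        await sleep(minTimeout * factor ** attempt);
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

The webhook runner could then wrap its single request in something like `retryWithBackoff(() => sendWebhook(url, payload), { retries: 3 })`, where `sendWebhook` is a placeholder for whatever function performs the HTTP call and rejects on failure.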
More involved solutions, quoted from https://zapier.com/engineering/webhook-design/:
Create a DB queue: When you send webhooks inline, all your code is in one place. With this option, you’ll be creating a record in your existing database for each notification you need to send. Rather than initiating a connection immediately, you’ll need a separate process to frequently look for new notifications and send them. Since the end user isn’t waiting, timeouts are not as big of a concern. And if you need to retry a notification, just keep it marked active until it succeeds or is called the maximum number of times.
Use a proper queue: If you know you’ll need to scale your solution, use a tool specifically designed for that. You can use open source scalable queueing solutions like RabbitMQ or a service like Amazon Simple Queuing Service. This way, your interaction is limited to adding and removing “messages,” which tell you what webhooks to call. Like the DB queue, you need a separate process to consume items from the queue and send notifications. In addition to using a tool designed for this purpose, a proper queue also saves database resources for what it does best–providing data to your primary application.
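As a rough illustration of the "DB queue" option quoted above, the sketch below keeps one record per notification, marks it active until it is delivered or reaches a maximum number of attempts, and has a separate pass pick up active records. A plain array stands in for the database table, and every name (`createNotificationQueue`, `enqueue`, `processOnce`, `maxAttempts`, the record fields) is an assumption made for the example, not part of Strapi.

```javascript
// Hypothetical sketch: an array stands in for a database table.
// In a real setup each record would be a row and the worker a separate process.
function createNotificationQueue({ maxAttempts = 5 } = {}) {
  const records = [];

  // Called inline when an event happens: only stores the notification.
  function enqueue(url, payload) {
    const record = { url, payload, attempts: 0, active: true, delivered: false };
    records.push(record);
    return record;
  }

  // One pass of the worker: try each active record once.
  // `send` is any function returning a promise that rejects on failure.
  async function processOnce(send) {
    for (const record of records.filter((r) => r.active)) {
      record.attempts += 1;
      try {
        await send(record.url, record.payload);
        record.delivered = true;
        record.active = false;
      } catch (error) {
        // Stay active for the next pass unless the attempt limit is reached.
        if (record.attempts >= maxAttempts) {
          record.active = false;
        }
      }
    }
  }

  return { enqueue, processOnce, records };
}
```

The worker would call `processOnce` on a timer or schedule. Because the records would live in the database rather than in process memory, pending webhooks would survive a restart.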
Wilson
Another benefit of implementing a proper queue (rather than just an in-memory queue) is that it would let us run Strapi "serverless". Right now, running Strapi serverless has a downside: webhooks are not guaranteed to run, because the worker queue is in-memory. Once the serverless Strapi instance finishes the task it was invoked for, its memory is cleared and any pending jobs (i.e. webhooks) may never run.
Also, the file linked in the post above is outdated. Here is the link to the new file that implements the worker queue: https://github.com/strapi/strapi/blob/main/packages/core/strapi/lib/services/worker-queue.js