Sending monitoring signals over the public internet is inherently unreliable. HTTP requests can sometimes take excessively long or fail completely for a variety of reasons. Here are some general tips to make your monitoring code more robust.
Put a time limit on how long each ping is allowed to take. This is especially important when sending a "start" signal at the start of a job: you don't want a stuck ping to prevent the actual job from running. Another case is a continuously running worker process which pings Omnivista Healthchecks after each completed item. A stuck request would block the whole process, so it is important to guard against it.
Specifying the timeout depends on the tool you use. curl, for example, has the --max-time (shorthand: -m) parameter:
# Send an HTTP request, 10 second timeout:
curl -m 10 https://healthchecks.dev.myovcloud.com/ping/your-uuid-here
To minimize the number of false alerts you get from Omnivista Healthchecks, instruct your HTTP client to retry failed requests several times.
Specifying the retry policy depends on the tool you use. curl, for example, has the --retry parameter:
# Retry up to 5 times, uses an increasing delay between each retry (1s, 2s, 4s, 8s, ...)
curl --retry 5 https://healthchecks.dev.myovcloud.com/ping/your-uuid-here
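The timeout and retry options can be combined in a single call. The following is a minimal sketch rather than an official recipe: `send_ping` is a hypothetical helper name, and the 10-second, 5-retry, and 60-second values are illustrative assumptions. When retries are enabled, curl applies `-m` to each attempt separately, so `--retry-max-time` is added here to stop retrying once a total time budget is used up.

```shell
# Sketch: one ping with a per-attempt timeout and a bounded retry budget.
# send_ping is a hypothetical helper name; the numeric limits are examples.
send_ping() {
    # -f                  treat HTTP error responses as failures
    # -sS                 stay quiet, but still print an error message on failure
    # -m 10               each attempt may take at most 10 seconds
    # --retry 5           retry transient failures up to 5 times
    # --retry-max-time 60 start no further retries once 60 seconds have elapsed
    # -o /dev/null        discard the response body
    curl -fsS -m 10 --retry 5 --retry-max-time 60 -o /dev/null "$1"
}
```

It could be called as `send_ping https://healthchecks.dev.myovcloud.com/ping/your-uuid-here`. The function returns curl's exit status, so the caller decides what a failed ping means.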
Make sure you know how your HTTP client handles failed requests. For example, if you use an HTTP library which raises exceptions, decide whether you want to catch the exceptions or let them bubble up.
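In a shell script, the counterpart of an exception is a non-zero exit status, and under `set -e` a failed curl call would abort the whole script. As one possible approach, the sketch below "catches" the failure: it logs a warning and lets the job continue. `ping_or_warn` is a hypothetical helper name, and the timeout and retry values are assumptions carried over from the examples above.

```shell
# Sketch: send a ping, but never let a failed ping stop the calling script.
# ping_or_warn is a hypothetical helper name.
ping_or_warn() {
    # A command tested by "if" does not trigger "set -e", so a failure
    # here is handled below instead of aborting the script.
    if ! curl -fsS -m 10 --retry 5 -o /dev/null "$1"; then
        # Report the problem on stderr and carry on.
        echo "warning: ping to $1 failed" >&2
    fi
    return 0
}
```

To let failures bubble up instead, call curl directly and check its exit status, or rely on `set -e` to stop the script.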