How to Detect Cron Failure Before It Breaks Things
By Aradhna
Cron jobs fail quietly. There's no error page, no alert, no user complaint. A task simply stopped running sometime last Tuesday, and nobody noticed until invoices weren't sent, backups didn't complete, or a database grew unchecked. Knowing how to detect cron failure is one of the most underrated skills in keeping a production system healthy.
This guide covers why cron jobs fail silently, the methods available to catch them, and how to set up monitoring that actually works.
Why Cron Failures Are So Easy to Miss
Unlike a crashed web server or a broken deployment, a failing cron job produces almost nothing by default. No log entry in your application dashboard. No red LED on a status board. By default, cron mails a job's output to the crontab owner's local mailbox, which nobody reads, and if no mail transfer agent is installed the output is simply discarded.
Common reasons a cron job stops working without fanfare:
- Silent exit codes — the script exits with 0 even after an internal error
- Missing environment variables — cron runs with a minimal environment; your usual $PATH entries, secrets, and other variables set by your login shell may not exist
- Permissions changes — a file or directory was rotated, archived, or had ownership changed
- Dependency failure — a library, database connection, or API the job depends on is unavailable
- Resource exhaustion — the job is killed by the kernel OOM killer with no cron-level notification
- Overlap — a previous run is still running, and the new invocation exits immediately
Any one of these can make your cron job appear to have never been scheduled in the first place.
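The overlap case in the list above can be guarded against inside the job itself. The following is a minimal sketch rather than a definitive implementation: the `with_lock` helper, the lock path, and the exit status 75 are illustrative assumptions, and it relies on `mkdir` being atomic instead of on any particular locking tool.

```bash
# Hypothetical helper: run a command only if no other run holds the lock.
# mkdir is atomic, so only one invocation can create the lock directory.
with_lock() {
  local lock_dir="$1"
  shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "previous run still holds $lock_dir; skipping" >&2
    return 75   # illustrative non-zero status so the skip is visible
  fi
  "$@"
  local status=$?
  rmdir "$lock_dir"
  return "$status"
}

# Example usage (paths are assumptions):
#   with_lock /tmp/my-job.lock /usr/local/bin/my-job.sh
```

Returning a non-zero status when the lock is held means a skipped run shows up as a failure instead of passing silently. One limitation of this sketch: if the job is killed outright, the lock directory is left behind and has to be removed by hand.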
Method 1: Heartbeat (Dead Man's Switch) Monitoring
The most reliable way to detect cron failure is to flip the question: instead of watching for an error, you watch for a signal that never arrives.
This is called a heartbeat or dead man's switch pattern. The idea is simple:
- At the end of your cron job, make an HTTP GET request to a unique monitoring URL.
- A monitoring service watches that URL and expects a ping within a defined window.
- If no ping arrives, the service alerts you.
Your cron job becomes:
```bash
#!/bin/bash
set -e  # abort on the first failing command so the ping below is skipped
# ... actual job logic ...
curl -s "https://hc.example.com/ping/your-unique-id" > /dev/null
```
The strength of this approach is that it covers failure modes that error-based alerting misses: the job not starting at all, the job crashing before it reaches the ping, the server being offline, or the scheduler itself being broken.
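To make the monitoring side of the pattern concrete, here is a rough sketch of the check a heartbeat service might perform. The `is_overdue` function, its arguments, and the grace period are assumptions for illustration; a real service will have its own rules.

```bash
# Hypothetical monitor-side check: succeed (status 0) when a ping is overdue.
# last_ping and now are Unix timestamps; interval and grace are in seconds.
is_overdue() {
  local last_ping="$1" interval="$2" grace="$3" now="$4"
  [ $(( now - last_ping )) -gt $(( interval + grace )) ]
}

# Example: the job should ping every 300 s, with 60 s of grace.
# 400 s have elapsed against a 360 s allowance, so the alert fires.
if is_overdue 1000 300 60 1400; then
  echo "ALERT: heartbeat overdue"
fi
```

The grace period exists because jobs rarely finish at exactly the same moment each run; without it, a job that takes a few seconds longer than usual would trigger a false alarm.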
Uptrue's Heartbeat monitoring is built exactly for this pattern. You set the expected interval (every 5 minutes, every hour, every day), and Uptrue raises an alert the moment a ping is overdue. No code changes beyond that one curl line.
Method 2: Log Parsing and Exit Code Checks
If you can't modify the job, or you want a belt-and-braces approach, log parsing is the next option.
Redirect cron output explicitly:
```
*/5 * * * * /usr/local/bin/my-job.sh >> /var/log/my-job.log 2>&1
```
Then use a log monitoring tool or a simple script to scan for failure patterns or stale timestamps. Tools like Logwatch, Graylog, or even a cron'd grep can alert on keywords like ERROR, Exception, or failed.
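As a sketch of what such a simple script could look like, the two helpers below check a log for failure keywords and for staleness. The function names and thresholds are assumptions, and `find -mmin` is a widely supported extension and not part of POSIX.

```bash
# Hypothetical helpers for scanning a cron log; the keyword list mirrors
# the examples in the text.

# Succeeds if the log contains any failure keyword.
log_has_errors() {
  grep -Eq 'ERROR|Exception|failed' "$1"
}

# Succeeds if the log is missing or was last modified more than $2 minutes ago.
log_is_stale() {
  [ ! -e "$1" ] || [ -n "$(find "$1" -mmin "+$2")" ]
}

# Example checks for a job scheduled every 5 minutes:
#   log_has_errors /var/log/my-job.log && echo "failure keyword found"
#   log_is_stale /var/log/my-job.log 15 && echo "no output for 15+ minutes"
```

The staleness check only works if the job writes something on every run, so it may help to have the job log a line even when it has nothing to do.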
Capture exit codes explicitly:
```bash
/usr/local/bin/my-job.sh
EXIT_CODE=$?
if [ "$EXIT_CODE" -ne 0 ]; then
  curl -s "https://alerts.example.com/notify?msg=job+failed+exit+$EXIT_CODE"
fi
```
This still requires you to actively look at logs or build tooling around them — which is why heartbeats tend to win in practice.
Method 3: Wrapper Scripts and Monitoring Agents
Some teams use a cron wrapper or replacement scheduler that handles timing, logging, and alerting centrally. Options include Cronitor (a hosted service with a command-line agent) and the open-source schedulers Jobber and supercronic. Depending on the tool, these run your jobs, capture their output, and can report results to a monitoring endpoint.
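For a sense of what a wrapper does under the hood, here is a minimal hand-rolled sketch. It is not how any of the tools above are implemented; `cron_wrap`, the log line format, and the paths are all assumptions.

```bash
# Hypothetical wrapper: run a job, append its output, exit status, and
# duration to a log file, and pass the job's exit status through.
# Usage: cron_wrap LOGFILE COMMAND [ARGS...]
cron_wrap() {
  local log_file="$1"
  shift
  local started status
  started=$(date +%s)
  "$@" >> "$log_file" 2>&1
  status=$?
  echo "$(date '+%Y-%m-%dT%H:%M:%S') job='$*' exit=$status duration=$(( $(date +%s) - started ))s" >> "$log_file"
  return "$status"
}

# Example usage (paths are assumptions):
#   cron_wrap /var/log/my-job.log /usr/local/bin/my-job.sh
```

Because the wrapper returns the job's own exit status, it can be combined with an exit code check or a success-only ping without changing the job.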
If you're already running infrastructure monitoring — for uptime, SSL, DNS, or security headers — it makes sense to route cron health into the same dashboard rather than a separate tool.
Set Up Heartbeat Monitoring in Under Two Minutes
If you're not currently monitoring your cron jobs, sign up for Uptrue and create a Heartbeat monitor. You'll get a unique ping URL, set your expected interval, and add one line to your cron job. That's it. Uptrue handles the alerting — via email, Slack, or webhook — if the ping goes missing.
No agents to install. No infrastructure changes. Just a curl at the end of your script.
Method 4: Alerting on Downstream Side Effects
Sometimes the most pragmatic check isn't on the cron job itself but on what it produces. Examples:
- A backup job should create a file with today's date — check for its existence
- A sync job should update a database row — query and compare the timestamp
- A report job should send an email — confirm delivery via your email provider's API
This is a useful secondary layer. If your heartbeat says the job ran but the backup file is missing, you know the job ran but did something wrong internally. Combining heartbeats with output checks gives you the full picture.
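The backup example can be sketched as a small check run from its own cron entry. The `backup_is_present` name and the `backup-YYYY-MM-DD.tar.gz` naming scheme are assumptions; adapt them to however your backups are actually named.

```bash
# Hypothetical side-effect check: succeed only if today's backup exists
# and is non-empty. The file naming scheme is an assumption.
backup_is_present() {
  local dir="$1"
  [ -s "$dir/backup-$(date +%Y-%m-%d).tar.gz" ]
}

# Example: schedule this after the backup window has closed.
#   backup_is_present /var/backups/db || echo "today's backup is missing or empty"
```

Testing for a non-empty file with `-s` catches the case where the job created the file and then failed before writing any data. It does not prove the archive is valid, which would need a restore test.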
Method 5: Uptime Monitoring for Cron-Adjacent Services
If your cron job relies on an external service (an API, a database, an SFTP server), monitoring that service's availability is part of the picture. Use uptime monitoring to confirm dependencies are reachable before assuming the job itself is at fault.
This is particularly useful when a cron job starts failing intermittently — the job code hasn't changed, but a downstream dependency has become unreliable.
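A lightweight way to separate the two cases is a pre-flight probe that runs before the job. This is only a sketch: the `probe` helper and the hosts in the examples are placeholders, and the probe command for each dependency is up to you.

```bash
# Hypothetical pre-flight helper: run a probe command quietly and report
# whether the named dependency answered.
# Usage: probe NAME COMMAND [ARGS...]
probe() {
  local name="$1"
  shift
  if "$@" > /dev/null 2>&1; then
    echo "$name: ok"
  else
    echo "$name: UNREACHABLE"
    return 1
  fi
}

# Example probes (hosts are placeholders):
#   probe api curl -fsS -m 5 https://api.example.com/health
#   probe sftp nc -z -w 5 sftp.example.com 22
```

Logging the probe result alongside the job's own output makes it easier to see, after the fact, whether a failed run coincided with an unreachable dependency.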
You can also use Uptrue's free SSL checker tool to verify that the certificates on any HTTPS endpoints your cron jobs call haven't expired, which is a surprisingly common cause of silent failures.
Putting It All Together
Here's a practical stack for robust cron failure detection:
| Layer | Method | What It Catches |
|---|---|---|
| Primary | Heartbeat ping | Job didn't run, crashed, server down |
| Secondary | Exit code capture | Job ran but returned an error |
| Tertiary | Output/side-effect check | Job ran and exited 0 but produced wrong result |
| Infrastructure | Uptime + SSL monitoring | Dependencies unavailable or certificates expired |
You don't need all four layers for every job. A nightly backup probably warrants all of them. A low-stakes cache-warming script might just need a heartbeat.
Conclusion
Detecting cron failure isn't complicated, but it does require intentionality. The default cron setup on most systems is essentially "run the job and hope for the best." The heartbeat pattern — a single curl at the end of your script pointing to a monitoring service — turns that into active, alert-driven observability with almost no effort.
Set up even one of the methods above and you'll catch the next cron failure within minutes rather than days.
Further reading: Heartbeat monitoring explained · Uptime monitoring guide