Guide · SSL probe

SSL certificate monitoring beyond expiry alerts

"Cert expires in 14 days" is the easy half of TLS monitoring. The half that takes production down — silent CA swaps, missing intermediates, SAN coverage that quietly shrank after a wildcard renewal, an auto-renew that emitted a new cert but the load balancer never picked it up — needs a different kind of probe. Here's what real SSL certificate monitoring looks like.

Published 2026-05-22 · ~10 min read · StatusPulse Team

Why "expiry alert" isn't enough

The first SSL monitor every team writes is one line of cron that pipes openssl x509 -enddate into a date comparison and emails if fewer than N days remain. It catches the dumbest failure — a cert nobody renewed — and it misses everything else.

Every one of these has paged a real on-call in the last few years, and none show up in a pure expiry check:

  • Silent CA swap. Your cert was Let's Encrypt last week and is ZeroSSL this week. Not-after looks fine; somebody just rotated issuers. Either intentional and fine, or your renewal automation got hijacked and you want to know today.
  • Weak signature algorithm. A vendor re-issued an internal cert with SHA-1, or an IPMI interface has been serving an MD5 cert since 2017. Browsers reject it, modern OpenSSL rejects it; your container running an old libssl might not.
  • Hostname-mismatch drift after wildcard renewal. The old cert covered *.example.com, example.com and api.internal.example.com. The new cert covers *.example.com only. Two of three sub-fleets now serve a cert the browser rejects — but the expiry date is 89 days out, so the expiry monitor is green.
  • Chain truncation. The LB serves the leaf but not the intermediate. Browsers with a hot AIA cache mask it; a fresh client (Lambda cold start, CI runner, mTLS partner) sees "unable to get local issuer certificate" and bounces. The leaf has 60 days remaining.
  • Partial-SAN coverage gap. The old cert listed four SANs, the renewed one lists three because somebody pruned the Terraform module. The fourth host now serves a cert that doesn't cover its own hostname.

The HTTPS layer fails in ways that have nothing to do with the calendar — and unlike a 500 response (which the HTTP probe sees in milliseconds), these failures sit inside the TLS handshake and never reach the application logs. Real SSL certificate monitoring inspects what's inside the cert on every probe and diffs it against the previous one.

What a cert actually exposes (and why each field matters)

Before deciding what to alert on, look at what's in there. Grab a cert from your own host:

openssl s_client -servername api.example.com \
                 -connect api.example.com:443 \
                 -showcerts < /dev/null 2>/dev/null \
  | openssl x509 -noout -text

The fields worth watching, and why each one matters:

  • Subject & Subject Alternative Names. The hostnames the cert claims to cover. SANs are the authoritative list — the Common Name on the Subject line is decorative and modern clients ignore it. Silent SAN shrinkage at renewal is the second most common cause of "broke in production after auto-renew".
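  • Issuer. Who signed it. Let's Encrypt R10, DigiCert Global G2, Amazon RSA 2048 M02. A change of issuing CA is almost never accidental; when it is, it's usually because somebody pointed Caddy or Traefik at a different ACME directory. A change of intermediate within one CA is routine, though: Let's Encrypt has issued from R10 and R11 interchangeably, so compare the issuer's organization, not just its CN.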
  • Signature algorithm. sha256WithRSAEncryption, ecdsa-with-SHA384. SHA-1 and MD5 should not appear in a 2026 cert.
  • Public-key algorithm and size. rsaEncryption 2048 bit, id-ecPublicKey 256 bit (prime256v1). RSA under 2048 is a flag. ECDSA on P-192 is a flag.
  • Serial number. An identifier the CA assigns at issuance, unique per CA; for publicly trusted certs it must carry at least 64 bits of randomness. Two consecutive probes seeing different serials means the cert was rotated between them: the single most useful piece of metadata for catching silent renewals (see the serial-number section below).
  • Not-before & not-after. The validity window. Not-after is the expiry everyone watches; not-before in the future means the cert was deployed early and won't be honored yet.
  • Negotiated TLS version & cipher. Exposed by the handshake. TLS 1.0 and 1.1 should be off. RC4, 3DES and CBC + SHA1 ciphers should be gone too.
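Reading all of that out of the -text dump by hand gets old fast. A minimal sketch in POSIX shell that prints the watch-list fields for one host, assuming OpenSSL 1.1.1 or later (needed for -ext subjectAltName); fetch_leaf_pem, parse_crypto_fields and cert_summary are made-up names, and the awk patterns assume OpenSSL's current -text layout:

```shell
#!/bin/sh
# Sketch: print the watch-list fields for the leaf cert a host serves.
# Assumes OpenSSL 1.1.1+ (needed for `-ext subjectAltName`).
# Function names are illustrative, not part of any tool.

# Fetch the leaf certificate (PEM) that HOST presents for its own SNI name.
fetch_leaf_pem() {
  openssl s_client -servername "$1" -connect "$1:443" </dev/null 2>/dev/null \
    | openssl x509
}

# Read `openssl x509 -noout -text` output on stdin and print key=value lines
# for the public-key algorithm, key size and signature algorithm.
parse_crypto_fields() {
  awk '
    # The first "Signature Algorithm:" line sits inside the signed data;
    # the second copy near the bottom of the output is ignored.
    /Signature Algorithm:/ && sig == "" {
      sig = $0; sub(/.*Signature Algorithm: */, "", sig)
    }
    /Public Key Algorithm:/ {
      alg = $0; sub(/.*Public Key Algorithm: */, "", alg)
      print "key_algorithm=" alg
    }
    /Public-Key: \(/ {
      match($0, /[0-9]+ bit/)
      print "key_bits=" substr($0, RSTART, RLENGTH - 4)
    }
    END { if (sig != "") print "signature_algorithm=" sig }
  '
}

# Print subject, issuer, serial, validity window, SANs and crypto fields.
cert_summary() {
  pem=$(fetch_leaf_pem "$1") || return 1
  printf '%s\n' "$pem" | openssl x509 -noout -subject -issuer -serial -startdate -enddate
  printf '%s\n' "$pem" | openssl x509 -noout -ext subjectAltName
  printf '%s\n' "$pem" | openssl x509 -noout -text | parse_crypto_fields
}

# Usage: cert_summary api.example.com
```

The negotiated TLS version and cipher aren't in the cert itself; they show up in the Protocol and Cipher lines of the s_client session summary.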

Two-tier expiry alerting that actually works

One threshold doesn't fit two different signals. A single "alert at 14 days" misses the case where renewal silently failed three weeks ago — by the time you get the 14-day ping, the SRE who owns the cert is on holiday and you have two weekend days to fix it.

What works in practice is three tiers with different urgencies:

  • Warning at 30 days remaining. Not a page. A ticket. For a 90-day ACME cert this is the renewal point: certbot and most other ACME clients renew once fewer than 30 days remain (around day 60), so a cert that sits below 30 days has missed its renewal. The probe goes Degraded and the weekly status report flags it. Whoever owns the deployment looks at why the ACME client stopped.
  • Critical at 7 days remaining. Page. Renewal is broken: for an ACME cert the client has been failing for over three weeks, and for a manual cert with a quarterly review, the review was missed. Anyone on-call can fix it; with a week of runway nobody needs to fix it at 3am, but it goes on the morning standup as a P2.
  • Hard alert at 0 days (actual expiry). Production is broken right now. Wake someone up. The expiry is a fact, not a forecast.

StatusPulse's SSL probe exposes these as two day-based thresholds: Degraded at (days) for the warning tier and Expired at (days) for the critical tier. The hard "actually expired" check is automatic — a cert past its not-after always counts as Down, regardless of thresholds. Set the warning to 30 and the critical to 7, and you have the three-tier flow without writing a line of logic.

The mistake to avoid: setting the warning threshold higher than your renewal point. If you alert at 60 days on a 90-day cert that renews with 30 days left, the probe sits in Degraded for half its life and the alert becomes background noise. The threshold has to live inside the renewal window, at or past the point where a healthy renewal should already have happened.
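The tier logic is small enough to sketch. A POSIX shell version, assuming days-remaining is already computed from not-after; expiry_state, warn_threshold_ok and degraded_share_pct are hypothetical helpers that mirror the behaviour described above, not StatusPulse's implementation:

```shell
#!/bin/sh
# Sketch of the tiered expiry logic described above (POSIX sh).
# Hypothetical helpers: they mirror the described behaviour, not any
# product's actual implementation.

# expiry_state DAYS_LEFT DEGRADED_AT EXPIRED_AT -> UP, DEGRADED or DOWN
# DAYS_LEFT is assumed to be floor((notAfter - now) / 86400),
# so it goes negative only once notAfter has passed.
expiry_state() {
  days_left=$1 degraded_at=$2 expired_at=$3
  if [ "$days_left" -lt 0 ]; then
    echo DOWN        # already expired: Down regardless of thresholds
  elif [ "$days_left" -le "$expired_at" ]; then
    echo DOWN        # critical tier: page
  elif [ "$days_left" -le "$degraded_at" ]; then
    echo DEGRADED    # warning tier: ticket, no page
  else
    echo UP
  fi
}

# warn_threshold_ok DEGRADED_AT RENEW_DAYS_LEFT
# Succeeds when the warning fires no earlier than the point where a healthy
# renewal happens (RENEW_DAYS_LEFT, e.g. 30 for a 90-day ACME cert).
warn_threshold_ok() {
  [ "$1" -le "$2" ]
}

# degraded_share_pct LIFETIME DEGRADED_AT RENEW_DAYS_LEFT
# A cert is replaced once it reaches RENEW_DAYS_LEFT, so it is served for
# LIFETIME - RENEW_DAYS_LEFT days; print the percentage of that spent Degraded.
degraded_share_pct() {
  lifetime=$1 degraded_at=$2 renew_days_left=$3
  served=$((lifetime - renew_days_left))
  degraded=$((degraded_at - renew_days_left))
  [ "$degraded" -lt 0 ] && degraded=0
  echo $((100 * degraded / served))
}
```

For example, degraded_share_pct 90 60 30 prints 50, the "half its life" case from the previous paragraph; with the warning at 30 it prints 0.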

Renewal detection via serial-number drift

Every cert a CA issues gets a serial number that is unique for that CA; for publicly trusted certs it must include at least 64 bits of randomness. If you probe the same hostname twice and see two different serials, the cert was reissued between the probes. This is the cleanest signal for both "did renewal happen?" and "did something rotate the cert when nothing should have?".

Read the serial directly:

openssl s_client -servername api.example.com \
                 -connect api.example.com:443 \
                 < /dev/null 2>/dev/null \
  | openssl x509 -noout -serial

Three patterns matter:

  • Serial changed at the time you expected. ACME renewed at day 60, the new cert appeared at the LB, the next probe sees a fresh serial and not-after 90 days out. The metric chart shows a clean sawtooth — positive signal that automation is working.
  • Serial didn't change when it should have. The cert is down to 25 days left, several days of probes have passed since the 30-day mark, and the serial hasn't budged. ACME client broken, CronJob failing, or the LB cached the old cert. The probe already dropped into Degraded at the 30-day threshold, so you catch the bug before it becomes an outage.
  • Serial changed when nothing should have. Yesterday it was Let's Encrypt with serial A1...; today it's ZeroSSL with serial F4... and you didn't change anything. Either two ACME clients are fighting over the same hostname or somebody got into your DNS and issued a cert. The metadata diff is the early warning.

The point isn't to alert on every serial change — false positives during legitimate renewals would be relentless — it's to store the serial and the issuer on every probe, so when you're triaging a TLS incident you can pull up the metadata history and see the exact probe where the cert flipped.
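A minimal sketch of that history in POSIX shell, assuming one plain-text file per hostname (the file layout and function names are hypothetical). It compares the issuer's organization (O=) rather than the full issuer name, because one CA can issue from several intermediates; Let's Encrypt, for example, has used R10 and R11 interchangeably.

```shell
#!/bin/sh
# Sketch: store serial + issuer organization on every probe and label the
# change. The history-file layout ("<serial> <issuer-org>" per line) and the
# function names are hypothetical.

# issuer_org: read `openssl x509 -noout -issuer` output (OpenSSL 1.1.1+
# style, e.g. "issuer=C = US, O = Let's Encrypt, CN = R10") and print O.
issuer_org() {
  sed -n 's/.*[ =]O = \([^,]*\).*/\1/p'
}

# classify_rotation HISTORY_FILE SERIAL ISSUER_ORG
# Prints FIRST, UNCHANGED, RENEWED (new serial, same CA) or ISSUER_CHANGED,
# then appends this probe to the history for later triage.
classify_rotation() {
  hist_file=$1 serial=$2 org=$3
  last=$(tail -n 1 "$hist_file" 2>/dev/null || true)
  last_serial=${last%% *}
  last_org=${last#* }
  if [ -z "$last" ]; then
    result=FIRST
  elif [ "$serial" = "$last_serial" ]; then
    result=UNCHANGED
  elif [ "$org" = "$last_org" ]; then
    result=RENEWED
  else
    result=ISSUER_CHANGED
  fi
  printf '%s %s\n' "$serial" "$org" >> "$hist_file"
  echo "$result"
}

# Example wiring for one probe (hostname and path are placeholders):
#   pem=$(openssl s_client -servername api.example.com \
#           -connect api.example.com:443 </dev/null 2>/dev/null | openssl x509)
#   serial=$(printf '%s\n' "$pem" | openssl x509 -noout -serial | cut -d= -f2)
#   org=$(printf '%s\n' "$pem" | openssl x509 -noout -issuer | issuer_org)
#   classify_rotation /var/lib/ssl-history/api.example.com "$serial" "$org"
```

RENEWED is the expected sawtooth; ISSUER_CHANGED is the third pattern above and the one worth a human look. UNCHANGED only matters once days-remaining crosses the warning threshold.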

Weak-crypto visibility: signatures, key sizes, TLS versions

Weak crypto is mostly a slow-burn problem. A SHA-1 cert won't take production down tomorrow — modern clients reject it and your customers already routed around the SHA-1 IPMI interface years ago. But scheduled reports and an audit trail are how you stop the slow burn from becoming a compliance finding six months later.

The list worth surfacing:

  • SHA-1 or MD5 signatures. Flag. In 2026 these should not appear on any cert serving real traffic; if they do it's almost always a forgotten management interface — iLO/iDRAC, an OOB switch, a vendor appliance.
  • RSA < 2048 bits. Flag. 1024-bit RSA has been deprecated for over a decade. If you find one it's usually a hardware appliance with a fused factory key — file the exception and stop the alert.
  • ECDSA on P-192. Flag. P-256 is the floor; P-384 and P-521 are fine.
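  • TLS 1.0 / TLS 1.1 still enabled. Flag. Deprecated by RFC 8996 in 2021. Modern load balancers ship with a TLS 1.2 minimum by default. A normal probe negotiates the highest version both sides support, so finding this takes a handshake that offers only TLS 1.0 or 1.1; if that succeeds, somebody manually downgraded the policy or an old one is still attached.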

The rule of thumb: flag weak crypto in the scheduled weekly report, don't page on it. If you've got dozens of HTTPS endpoints to track, this is where centralised SSL certificate monitoring starts to earn its keep — every cert in one inventory, the weak ones flagged in one place. 5 free probes cover the most critical handful; the rest sit in the paid tiers.
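Both checks fit in a few lines of POSIX shell. A rough sketch: weak_crypto_flags takes the signature algorithm, key algorithm and key size as OpenSSL's -text output names them, and legacy_tls_enabled assumes network access to the host and a local OpenSSL build that still includes TLS 1.0/1.1. Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: flag weak crypto for the weekly report, not the pager.
# Function names are hypothetical.

# weak_crypto_flags SIG_ALG KEY_ALG KEY_BITS
# Arguments use OpenSSL's -text names, e.g.
#   sha256WithRSAEncryption rsaEncryption 2048
# Prints one line per finding; prints nothing for a clean cert.
weak_crypto_flags() {
  sig_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $sig_lc in
    *md5*|*sha1*) echo "WEAK_SIGNATURE $1" ;;
  esac
  case $2 in
    rsaEncryption)  [ "$3" -lt 2048 ] && echo "WEAK_RSA_KEY $3" ;;
    id-ecPublicKey) [ "$3" -lt 256 ] && echo "WEAK_EC_CURVE $3" ;;
  esac
  return 0
}

# legacy_tls_enabled HOST
# A normal handshake negotiates the highest shared version, so this offers
# only TLS 1.0, then only TLS 1.1. @SECLEVEL=0 is needed because OpenSSL 3
# clients refuse these versions at the default security level. Assumes the
# local OpenSSL build still has TLS 1.0/1.1 compiled in, and that s_client
# exits non-zero when the handshake fails.
legacy_tls_enabled() {
  for version in -tls1 -tls1_1; do
    if openssl s_client "$version" -cipher 'DEFAULT:@SECLEVEL=0' \
         -servername "$1" -connect "$1:443" </dev/null >/dev/null 2>&1; then
      echo "LEGACY_TLS ${version#-}"
    fi
  done
}
```

Feed the output of both into the weekly report rather than an alert channel.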

Common failure modes the cert alone can't tell you about

Some of the worst HTTPS incidents are invisible to a probe that only reads the leaf cert. These are the big four:

  • "Auto-renew" that quietly failed. The certbot timer ran, the ACME challenge failed (port 80 firewalled off, DNS provider key rotated, Let's Encrypt rate-limit hit), and certbot logged it to /var/log/letsencrypt/letsencrypt.log where nobody reads it. The old cert keeps serving until expiry. The two-tier alert above catches this: a 90-day cert that drops below 30 days remaining without a new serial has missed the renewal that should have fired around day 60.
  • Cert deployed but not picked up by all LB nodes. Four-node pool, the deploy script rolled the new cert to three of them, the fourth missed the reload. A probe that runs hourly hits a random node each time — so the symptom is two serials alternating in the metadata history. That alternation is the fingerprint.
  • New cert missing a SAN that was on the old one. The probe stores the SAN list on every run, so a SAN that disappeared shows up in the diff. Worth a manual sanity check after any Terraform change to your ACM or cert-manager module.
  • Intermediate CA not bundled. The leaf is fine, but the server is only serving the leaf, not the intermediate that links it back to the trusted root. Browsers with a warm AIA cache mask this; the probe doesn't, and clients that never fetch missing intermediates (Go's standard library, curl and other OpenSSL-based tools, some embedded devices, some mTLS partners) fail immediately.
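The second and third failure modes above can also be checked directly from a shell. A sketch, assuming you know the LB node IPs (IPv4) and kept the previous SAN list somewhere; node_serials, distinct_serials, san_list and missing_sans are illustrative names. Probing each node by IP with the real hostname as SNI points at the stale node straight away, instead of waiting for alternating serials to pile up in the history.

```shell
#!/bin/sh
# Sketch: catch a stale LB node and a dropped SAN directly.
# Assumes you know the node IPs (IPv4) and kept the previous SAN list;
# all function names are illustrative.

# node_serials HOST IP...: print "<ip> <serial>" for each node, sending the
# real hostname as SNI so every node serves the cert clients would get.
node_serials() {
  host=$1; shift
  for ip in "$@"; do
    serial=$(openssl s_client -servername "$host" -connect "$ip:443" \
               </dev/null 2>/dev/null | openssl x509 -noout -serial 2>/dev/null \
             | cut -d= -f2)
    printf '%s %s\n' "$ip" "${serial:-ERROR}"
  done
}

# distinct_serials: read "<ip> <serial>" lines and print how many distinct
# serials are live; more than 1 means a node missed the reload.
distinct_serials() {
  awk '{ print $2 }' | sort -u | wc -l | tr -d ' '
}

# san_list: read `openssl x509 -noout -ext subjectAltName` output and print
# the DNS names one per line, sorted (IP SANs are skipped in this sketch).
san_list() {
  tr ',' '\n' | sed -n 's/^[[:space:]]*DNS://p' | sort -u
}

# missing_sans OLD NEW: print names in OLD (newline-separated) absent from NEW.
missing_sans() {
  printf '%s\n' "$1" | while IFS= read -r name; do
    [ -n "$name" ] || continue
    printf '%s\n' "$2" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}

# Usage:
#   node_serials api.example.com 10.0.1.11 10.0.1.12 10.0.1.13 10.0.1.14 \
#     | distinct_serials
```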

Verify the chain manually:

openssl s_client -servername api.example.com \
                 -connect api.example.com:443 \
                 -showcerts < /dev/null 2>&1 \
  | grep -E '^(Certificate chain|[[:space:]]*[0-9]+ s:|[[:space:]]+i:|[[:space:]]*Verify return code)'

A healthy response lists at least two entries (leaf + intermediate, sometimes the root as well) and ends with Verify return code: 0 (ok). A single entry, or a return code of 20 (unable to get local issuer certificate) or 21 (unable to verify the first certificate), means the chain is truncated; other non-zero codes point elsewhere, such as 10 (certificate has expired).
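To turn that eyeball check into a pass/fail a cron job or CI step can act on, a sketch that counts the certificates the server actually sent and reads the last verify code. It assumes the s_client output format of recent OpenSSL releases; chain_report and chain_ok are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: turn the manual chain check into a pass/fail for cron or CI.
# Assumes the output format of recent OpenSSL releases; helper names are
# hypothetical.

# chain_report: read `openssl s_client -showcerts` output on stdin and print
# how many certificates the server sent plus the last verify return code.
chain_report() {
  awk '
    /-----BEGIN CERTIFICATE-----/ { n++ }
    /Verify return code:/ {
      line = $0
      sub(/.*Verify return code: */, "", line)
      split(line, parts, " ")
      rc = parts[1]
    }
    END { printf "certs=%d verify=%s\n", n, rc }
  '
}

# chain_ok: succeed only if at least two certs were sent and verify code is 0.
chain_ok() {
  report=$(chain_report)
  case $report in
    "certs=0 "*|"certs=1 "*) return 1 ;;   # leaf only: chain truncated
    *" verify=0") return 0 ;;
    *) return 1 ;;                           # e.g. 20, 21 or 10
  esac
}

# Usage:
#   openssl s_client -servername api.example.com -connect api.example.com:443 \
#     -showcerts </dev/null 2>/dev/null | chain_ok || echo "chain problem"
```

chain_ok deliberately fails a single-entry chain even when verification passes, since a local trust store that happens to contain the intermediate would otherwise hide exactly the problem described above.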

There's also a sibling expiry that bites teams just as often: the domain itself. A cert renewing cleanly on a domain that lapsed at the registrar is the most embarrassing variant of this class of bug; the domain monitoring guide covers the registration-side equivalent.

What to alert on, what to ignore

Three responses, plus one rule for noise. Anything outside this list goes into the scheduled report, not the pager.

  • Page on actual expiry or expiring within 7 days. These are the only TLS states that belong on the pager. With Expired at (days) set to 7, the probe goes Down at the 7-day mark and the on-call gets the usual email / Slack / Teams / SMS fan-out.
  • Warn at 30 days remaining or on chain failure. Degraded, not Down. The cert hasn't renewed when it should have, or the server is serving a broken chain. Ticket, Slack message, no pager. Fix it during business hours.
  • Surface weak crypto in the scheduled report. Not an alert at all. SHA-1, RSA < 2048, TLS 1.0/1.1, ECDSA below P-256 — into the weekly status email and the next remediation round. A SHA-1 cert on an IPMI interface is not a 2am problem.
  • Ignore one-off probe failures. A single failed handshake on a 1-hour probe is almost always a network blip between the StatusPulse region and your edge. Two consecutive failures are the real signal: use the Consecutive failures to alert setting so you stop paging on transient blips.
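The consecutive-failure rule is easy to reason about as code. A sketch, assuming each probe appends ok or fail to a per-host history file; the file format and function names are hypothetical, and StatusPulse's Consecutive failures to alert setting covers this for you. The sketch only shows the logic.

```shell
#!/bin/sh
# Sketch of a "consecutive failures before alerting" rule.
# Assumes each probe appends "ok" or "fail" to a per-host history file;
# the file format and function names are hypothetical.

# trailing_failures: read results (oldest first) on stdin and print how
# many consecutive failures end the sequence.
trailing_failures() {
  awk '{ if ($1 == "fail") n++; else n = 0 } END { print n + 0 }'
}

# should_page HISTORY_FILE THRESHOLD: succeed once the most recent
# THRESHOLD (or more) probes have all failed.
should_page() {
  [ "$(trailing_failures < "$1")" -ge "$2" ]
}

# Usage, after each probe:
#   echo fail >> /var/lib/ssl-history/api.example.com.results
#   should_page /var/lib/ssl-history/api.example.com.results 2 && notify_oncall
# (notify_oncall is a placeholder for whatever sends the page.)
```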

The principle: only the things you can fix in the next hour belong on the pager. SSL hygiene has years of slow-moving findings and minutes of acute outages, and the alerting needs to reflect that.

Wrap-up

Expiry monitoring catches one failure mode out of about seven. The other six — silent CA swaps, weak signature algorithms, hostname-mismatch drift after wildcard renewal, chain truncation, partial-SAN coverage gaps, renewals that should have happened and didn't — all live inside the cert and the handshake, and they all need a probe that does more than read not-after.

The discipline is simple: capture the full cert metadata on every probe (issuer, SAN list, signature algorithm, key size, serial number, TLS version), set a two-tier expiry alert that lives inside the renewal window, treat serial-number drift as a first-class signal for catching both legitimate renewals and unexpected rotations, and push weak-crypto findings into a weekly report rather than the pager. Once that's wired, you stop being surprised by your own HTTPS — and the certificate stops being the part of the stack that breaks at 2am on a holiday weekend.

Try StatusPulse's SSL probe

5 probes free, forever. No credit card. US or EU host — you choose.