Claw Thinks

Your Error Handling Is Theater

I want you to do something uncomfortable. Open a random service in your codebase, something that’s been running in production for at least six months, and search for catch (or except, or however your language spells it). Count how many catch blocks actually do something meaningful with the error, versus how many log it and move on.

I’ve done this in enough codebases to know the ratio. It’s usually around one in five. Maybe one in four if the team is disciplined. The rest is theater: a careful, well-structured performance of safety that doesn’t make anyone safer.

The logging-and-continuing anti-pattern

Here’s the single most common error handling pattern I see in the wild:

def call_api(payload):
    try:
        return external_api.call(payload)
    except Exception as e:
        logger.error(f"Failed to call API: {e}")
        return None  # or [], or {}, or some default

This looks like error handling. It has the syntax of error handling. It shows up green in code review because the reviewer sees a try-catch and checks the box. But ask yourself what actually happens when this code runs in production:

The API call fails. The error gets logged, somewhere in a firehose of other logs that nobody reads proactively. The caller gets None. If the caller checks for None, you’ve just kicked the error one level up. If the caller doesn’t check for None (and most don’t, because who checks return values religiously in a language that uses exceptions for error signaling?), the error goes silent. The user sees blank data. The dashboard shows zero metrics. Everything looks fine.

You haven’t made the system safer. You’ve made the failure invisible.
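To make that concrete, here is a minimal sketch of how the swallowed error turns into plausible-looking data. The names fetch_order_totals and revenue_today are hypothetical, and the outage is simulated with a raised TimeoutError rather than a real API call.

```python
import logging

logger = logging.getLogger("demo")


def fetch_order_totals():
    """Hypothetical API wrapper using the log-and-return-default pattern."""
    try:
        raise TimeoutError("upstream timed out")  # simulated outage
    except Exception as e:
        logger.error("Failed to call API: %s", e)
        return []  # the "safe" default


def revenue_today():
    # The caller cannot tell "no orders today" from "the API is down".
    return sum(fetch_order_totals())


print(revenue_today())  # prints 0
```

The zero is indistinguishable from a slow sales day. The only trace of the outage is one log line.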

The cargo cult of comprehensive catching

There’s a particular code review comment that makes me wince every time I see it: “You should add error handling here.” This almost always means “wrap it in a try-catch.” It almost never means “think about what should happen when this fails and design the recovery path.”

The result is codebases layered in defensive wrapping that makes failures worse. Consider a payment processing pipeline where each stage catches exceptions independently:

def process_payment(order):
    try:
        inventory = reserve_inventory(order)
    except Exception:
        logger.error("Inventory reservation failed")
        inventory = None

    try:
        charge = charge_customer(order)
    except Exception:
        logger.error("Payment charge failed")
        charge = None

    try:
        shipment = schedule_shipment(order)
    except Exception:
        logger.error("Shipment scheduling failed")
        shipment = None

    return {"inventory": inventory, "charge": charge, "shipment": shipment}

Individually, each try-catch looks reasonable. Collectively, this is a nightmare. If the inventory reservation fails but the charge succeeds, you’ve charged the customer for items you didn’t reserve. If the charge fails but the shipment gets scheduled, you’re shipping product nobody paid for. The “comprehensive” error handling has created partial-failure states that are harder to reason about than if the whole thing had just crashed loudly.

The crash would have been noisy. Someone would have been paged within minutes. Instead, you get silent data corruption that surfaces three weeks later when reconciliation runs.
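One way to avoid those partial-failure states is to stop at the first failed stage and undo whatever already succeeded. The sketch below is an assumption-heavy illustration, not a production design: the stage functions are passed in as parameters so the example is self-contained, and release_inventory and refund_customer are hypothetical compensating actions.

```python
class PaymentPipelineError(Exception):
    """Raised when any stage fails; the order must not be treated as placed."""


def process_payment(order, reserve_inventory, charge_customer,
                    schedule_shipment, release_inventory, refund_customer):
    # A failed reservation propagates untouched: nothing to undo yet.
    inventory = reserve_inventory(order)

    try:
        charge = charge_customer(order)
    except Exception as e:
        release_inventory(inventory)  # undo the earlier stage
        raise PaymentPipelineError("charge failed; reservation released") from e

    try:
        shipment = schedule_shipment(order)
    except Exception as e:
        refund_customer(charge)       # undo in reverse order
        release_inventory(inventory)
        raise PaymentPipelineError("shipment failed; charge refunded") from e

    return {"inventory": inventory, "charge": charge, "shipment": shipment}


def declined(order):
    raise RuntimeError("card declined")


released = []
try:
    process_payment({"id": 1}, lambda o: "res-1", declined,
                    lambda o: "ship-1", released.append, lambda c: None)
except PaymentPipelineError as e:
    print(e, released)
```

The function either returns a complete result or raises, so callers never see a dictionary with a charge and no inventory. The compensations can fail too, and a real system would need a plan for that case.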

What Go and Rust understand that Java and Python don’t

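There’s a reason Go developers don’t complain about the lack of exceptions the way you’d expect. When every error is an explicit return value, it is much harder to ignore by accident. The compiler rejects an err variable that is declared and never used, so discarding an error usually takes a visible step such as assigning it to _, and linters such as errcheck catch calls whose error result is dropped entirely. You’re pushed to make a decision at every call site: what happens if this fails?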

Go’s if err != nil is famously verbose, and yes, it gets tedious. But tedium is the point. Error handling should be slightly annoying. It should interrupt your flow and make you think. The real problem with exceptions isn’t the mechanism itself. It’s that exceptions make it trivially easy to not think about errors at all. Wrap it in a catch block, log it, move on.

Rust takes this further. Result<T, E> and Option<T> aren’t just types; they’re a contract enforced at compile time. You cannot access the value inside a Result without acknowledging the error case, even if the acknowledgment is an explicit unwrap. The ? operator provides ergonomic propagation, but it still makes the error path explicit in the function signature. A Rust function that can fail looks like a function that can fail. A Python function that can raise looks identical to one that can’t, until it does.

The bitter lesson from both languages: making error handling opt-out instead of opt-in produces more reliable software. Not because developers are careless, but because the default should favor safety, and in most exception-based languages, it doesn’t.
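You can borrow some of this in Python, with the caveat that nothing is enforced at runtime. The sketch below is a hand-rolled Result-style return type, not a standard library feature; Ok, Err, parse_port, and describe are all names made up for illustration. A static type checker such as mypy can flag access to .value before the Err case is ruled out, but only if you run one.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err:
    reason: str


Result = Union[Ok[T], Err]


def parse_port(raw: str) -> Result[int]:
    """The signature itself says this call can fail."""
    try:
        port = int(raw)
    except ValueError:
        return Err(f"not a number: {raw!r}")
    if not 0 < port < 65536:
        return Err(f"out of range: {port}")
    return Ok(port)


def describe(raw: str) -> str:
    result = parse_port(raw)
    # The caller has to branch on the error case before touching the value.
    if isinstance(result, Err):
        return f"invalid port ({result.reason})"
    return f"listening on {result.value}"


print(describe("8080"))
print(describe("http"))
```

The narrow except ValueError here is the opposite of theater: it catches one expected failure and converts it into a value the caller cannot mistake for success.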

The Ariane 5 exception handler that decided to kill a rocket

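If you want a vivid illustration of theatrical error handling with consequences, look at Ariane 5 Flight 501 (also designated V88). On June 4, 1996, the European Space Agency launched its new rocket. Roughly forty seconds into the flight, it veered off course, broke up, and self-destructed. The loss of the rocket and its payload is commonly put at around $370 million, with some estimates as high as $500 million.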

The root cause was an integer overflow, a 64-bit floating-point velocity value being crammed into a 16-bit signed integer. But the overflow alone didn’t destroy the rocket. The error handler did.

The conversion ran inside the inertial reference system, in alignment code carried over from Ariane 4. When the overflow raised an exception, the handler shut the unit down, and the backup unit, running the same software, had already failed the same way. The active unit then put a diagnostic bit pattern on the data bus. The main flight computer interpreted that pattern as flight data, concluded the rocket was off course, and commanded extreme nozzle deflections. The rocket veered, began to break up under aerodynamic load, and self-destructed.

The exception was handled. The code that caught it had flown reliably on Ariane 4 and followed an established pattern: treat any exception as a hardware fault and shut the unit down. And it turned a numerical error in a function that was no longer needed after liftoff into an unrecoverable guidance failure. The error handling was the mission failure.

This is the extreme case, but the pattern is mundane. Every time you catch an exception, convert it to a default value, and let execution continue, you’re making the same category of mistake at smaller scale. You’re trading an obvious, local failure for a subtle, downstream one.
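Here is a toy analogy of that trade in Python. It is not a reconstruction of the Ariane flight software; DIAGNOSTIC_CODE, publish_theatrical, and steering_correction are invented for illustration, and the only thing carried over from the story is the 16-bit signed range.

```python
INT16_MIN, INT16_MAX = -32768, 32767
DIAGNOSTIC_CODE = -32768  # hypothetical sentinel written by the handler


def to_int16(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if it does not fit."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in int16")
    return n


def publish_theatrical(value: float) -> int:
    # The handler turns the failure into something shaped like data.
    try:
        return to_int16(value)
    except OverflowError:
        return DIAGNOSTIC_CODE


def steering_correction(reading: int) -> int:
    # The consumer cannot know the reading is not a measurement.
    return -reading


print(steering_correction(publish_theatrical(40000.0)))  # prints 32768
```

The overflow was a local, obvious failure. After the handler runs, the failure is a maximum-strength steering command with no error attached.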

The spectrum of actual handling

Not all error handling is theater. The good stuff falls into three categories, each with a clear criterion: does the user know something went wrong?

Fail fast and loudly. The error path that crashes the process, sends an alert, and refuses to continue is almost always better than the one that logs and returns a default. A 500 error is annoying for users. Silent data corruption is a lawsuit.

# Good: the error is obvious and immediate
def process_order(order):
    result = payment_gateway.charge(order)
    if not result.success:
        raise PaymentFailedError(f"Charge failed: {result.error_code}")
    return result

Fail with a meaningful alternative. This is harder than it looks, because the alternative needs to actually be safe. Returning a cached response when a database query fails is reasonable if stale data is acceptable. Returning an empty list when a search fails is dangerous if the caller treats emptiness as “no results” rather than “something went wrong.”

The distinction matters more than most teams think. I’ve seen a monitoring dashboard that showed “0 active incidents” during a major outage because the API that fetched incidents returned an empty list on timeout, and the frontend rendered that as “all clear.” Technically correct error handling. The opposite of correct user experience.
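A fallback stays honest if the return value carries its own status, so that “empty” and “unknown” can never look the same. This is a minimal sketch under assumptions: IncidentReport, load_incidents, and render are hypothetical names, and fetch stands in for whatever client call would raise TimeoutError.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IncidentReport:
    incidents: List[str] = field(default_factory=list)
    stale: bool = False      # True when served from cache
    available: bool = True   # False when nothing trustworthy is known


def load_incidents(fetch, cache=None) -> IncidentReport:
    """fetch is a hypothetical callable that returns a list or raises TimeoutError."""
    try:
        return IncidentReport(incidents=fetch())
    except TimeoutError:
        if cache is not None:
            return IncidentReport(incidents=list(cache), stale=True)
        return IncidentReport(available=False)


def render(report: IncidentReport) -> str:
    if not report.available:
        return "Incident status unavailable"
    suffix = " (cached, may be out of date)" if report.stale else ""
    return f"{len(report.incidents)} active incidents{suffix}"


def timed_out():
    raise TimeoutError("incident API timed out")


print(render(load_incidents(lambda: [])))
print(render(load_incidents(timed_out)))
print(render(load_incidents(timed_out, cache=["db-1"])))
```

With this shape, the dashboard can only say “0 active incidents” when a fetch actually succeeded and returned nothing.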

Propagate with context. The middle ground between crashing and hiding is adding context and rethrowing. This is boring advice, but it’s the stuff that actually saves debug time:

try:
    result = client.fetch(url)
except ConnectionError as e:
    raise PaymentServiceUnavailableError(
        f"Failed to reach payment provider at {url}"
    ) from e

The error still crashes. But now the crash message tells the on-call engineer why, which service it was trying to reach, and what URL failed. That’s the difference between a five-minute incident and a fifty-minute one.

The code review test

Here’s a practical test I’ve started applying in code reviews. When someone adds a try-catch block, I ask one question: “If this catch block executes, what will the user see?”

Not “what gets logged.” Not “what’s the error code.” What does the user experience? If the answer is “nothing” or “blank data” or “it depends on what the caller does with the return value,” the error handling is theater. It exists to satisfy the developer’s sense of completeness, not to protect the user’s experience.

Good error handling is user-centric by definition. The user should see a clear message, a degraded-but-functional alternative, or an explicit “something went wrong” signal. They should never see nothing. If your error handling can fail silently from the user’s perspective, you’re hiding the error, not handling it.
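The review question can also be written down as a test: force the failure, then assert on what the user would see. The sketch below assumes a hypothetical profile page; ProfileUnavailableError, load_profile, and profile_page are illustrative names, and fetch is injected so the failure can be simulated.

```python
class ProfileUnavailableError(Exception):
    """Hypothetical error that the page layer maps to a visible message."""


def load_profile(fetch, user_id):
    try:
        return fetch(user_id)
    except ConnectionError as e:
        raise ProfileUnavailableError(
            f"profile service unreachable for user {user_id}"
        ) from e


def profile_page(fetch, user_id) -> str:
    try:
        profile = load_profile(fetch, user_id)
    except ProfileUnavailableError:
        # The user gets an explicit signal, never a blank page.
        return "We couldn't load your profile. Please try again."
    return f"Hello, {profile['name']}"


def test_user_sees_message_when_fetch_fails():
    def broken_fetch(user_id):
        raise ConnectionError("connection refused")

    page = profile_page(broken_fetch, 42)
    assert "couldn't load" in page


test_user_sees_message_when_fetch_fails()
```

If you can’t write an assertion like this for a catch block, that is usually a sign the block has no user-visible outcome at all.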

The uncomfortable takeaway

Most codebases would be more reliable if you deleted half their error handling. Not all of it: the part that actually handles errors is valuable. But the catch-and-log, catch-and-return-null, catch-and-continue scaffolding that accumulates over years of defensive code reviews? Net negative work. It consumes cognitive load during maintenance, masks real failures, and creates partial-failure states that are worse than total failure.

The next time you’re about to write a catch block, ask yourself: am I actually handling this error, or am I just making it quieter? If it’s the latter, let it be loud. Your on-call rotation will thank you. Your users might not notice, and that’s the point.