2025-02-25

Troubleshooting: A skill that never goes obsolete

Value of Troubleshooting vs Building

One camp argues that spending “more time troubleshooting than building” is a red flag: it can distort your reward system, make you complacent, and trap you in low-status “support” roles.
They emphasize opportunity cost: time fixing a bug for 5% of users might be less impactful (and less career-rewarded) than building a new feature for 50%, depending on context.
Others strongly disagree, saying troubleshooting has been the foundation of successful, well-paid careers (e.g. SRE, ops, consulting, retainers) and is often exactly what management and teams value most in crises.

Career Dynamics and Perception

Several commenters describe getting stuck as the “support/troubleshooting person” while colleagues who ship features fast (often with bugs) get promoted.
Advice: if an org only rewards flashy feature work and ignores maintenance, that’s a systemic problem—either change how work is measured (reliability metrics, leading indicators) or change jobs.
Conversely, being the “go-to firefighter” can create credibility, leadership opportunities, and promotions—provided the org respects reliability and quality.
There is concern about burnout and single points of failure; some intentionally step back so others develop troubleshooting skills.

Nature and Teachability of Troubleshooting

Many see troubleshooting as a distinct, generalizable skill: systematic hypothesis testing, questioning assumptions, ruling out confounders, narrowing scope.
Some claim it’s largely an innate mindset/curiosity that can’t be taught past a certain career stage; others counter it’s teachable but attitude- and interest-dependent.
It’s compared to the scientific method and to ITSM “problem” vs “incident” analysis, and framed as broader than just reading code.
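
As one illustration of "narrowing scope", the sketch below bisects an ordered list of candidates (for example, build numbers or commits) to find the first one that shows a problem. Nothing here comes from the discussion itself: the function name `first_bad` and the `is_bad` probe are hypothetical, and the approach assumes that every candidate before the culprit is good and every candidate from the culprit onward is bad.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first candidate for which is_bad() is True.

    Assumption: candidates are ordered so that all good ones come first
    and all bad ones follow (a single good-to-bad transition).
    """
    if not candidates:
        raise ValueError("no candidates to search")
    lo, hi = 0, len(candidates) - 1
    # Confirm the problem is present at the end of the range before searching.
    if not is_bad(candidates[hi]):
        raise ValueError("last candidate is not bad; nothing to find")
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid        # culprit is at mid or earlier
        else:
            lo = mid + 1    # culprit is after mid
    return candidates[lo]


# Hypothetical example: builds 1..100, where the problem appears from build 37 on.
builds = list(range(1, 101))
culprit = first_bad(builds, lambda build: build >= 37)
```

Each probe halves the remaining range, so 100 candidates need at most 8 checks instead of up to 100. This is the same idea that tools such as `git bisect` apply to commit history.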

Practices, Tools, and Techniques

Commonly recommended practices:
- Start simple; don’t assume the problem is complex.
- Change one thing at a time; avoid fixation.
- Clarify the problem and shared assumptions with the team.
- Increase observability/telemetry; gather more data when stuck.
- Keep careful written notes of hypotheses, experiments, and results.

There’s debate over heavy debugger use versus fast iteration with logging or print statements; which works better depends on the platform and codebase size.
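
Two of the practices above, changing one thing at a time and keeping written notes, can be combined in a small logging-based loop. This is a minimal sketch under stated assumptions, not a method described by any commenter: the configuration keys, the `reproduces` check, and the simulated bug are all hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("troubleshooting")


@dataclass
class Note:
    changed: str      # the single setting that differed from the baseline
    value: Any        # the value it was changed to
    reproduced: bool  # whether the problem still occurred


def vary_one_at_a_time(
    baseline: Dict[str, Any],
    changes: Dict[str, Any],
    reproduces: Callable[[Dict[str, Any]], bool],
) -> List[Note]:
    """Try each change alone against the baseline and record the outcome."""
    notes: List[Note] = []
    for key, value in changes.items():
        trial = {**baseline, key: value}  # exactly one setting differs
        outcome = reproduces(trial)
        log.info("changed %s=%r -> problem %s",
                 key, value, "reproduced" if outcome else "gone")
        notes.append(Note(key, value, outcome))
    return notes


# Hypothetical scenario: the problem only occurs with caching on and >1 worker.
baseline = {"cache": True, "workers": 4, "tls": True}
changes = {"cache": False, "workers": 1, "tls": False}
notes = vary_one_at_a_time(
    baseline, changes, lambda cfg: cfg["cache"] and cfg["workers"] > 1
)
suspects = [n.changed for n in notes if not n.reproduced]
```

In this made-up scenario, the notes point at `cache` and `workers` and rule out `tls`. The record of what was tried and what happened is the useful part, and it works equally well with a debugger or with print statements.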

Analogies, Pay, and Organizational Incentives

The “reliable car mechanic” analogy is hotly debated: some say such mechanics are underpaid; many reply that the reliable ones are busy and well-compensated.
Parallel in software: feature work is “sexy” and visible; maintenance and reliability are treated as cost centers, despite being crucial.
Several note that good diagnostic ability includes knowing what not to fix and where effort has real business impact.

Meta: Article and Site

The article resonated strongly with many who enjoy troubleshooting and see it as their comparative advantage.
The site was “hugged to death”; discussion touched on hosting limits, caching strategies, and the ironic need to troubleshoot the article’s own availability.
