Troubleshooting: A skill that never goes obsolete
Value of Troubleshooting vs Building
- One camp argues that spending “more time troubleshooting than building” is a red flag: it can distort your reward system, make you complacent, and trap you in low-status “support” roles.
- They emphasize opportunity cost: time spent fixing a bug that affects 5% of users might have less impact (and earn less career credit) than building a new feature for 50%, depending on context.
- Others strongly disagree, saying troubleshooting has been the foundation of successful, well-paid careers (e.g. SRE, ops, consulting, retainers) and is often exactly what management and teams value most in crises.
Career Dynamics and Perception
- Several commenters describe getting stuck as the “support/troubleshooting person” while colleagues who ship fast (often buggy) features get promoted.
- Advice: if an org only rewards flashy feature work and ignores maintenance, that’s a systemic problem—either change how work is measured (reliability metrics, leading indicators) or change jobs.
- Conversely, being the “go-to firefighter” can create credibility, leadership opportunities, and promotions—provided the org respects reliability and quality.
- There is concern about burnout and about becoming a single point of failure; some intentionally step back so that others develop troubleshooting skills.
Nature and Teachability of Troubleshooting
- Many see troubleshooting as a distinct, generalizable skill: systematic hypothesis testing, questioning assumptions, ruling out confounders, narrowing scope.
- Some claim it’s largely an innate mindset/curiosity that can’t be taught past a certain career stage; others counter it’s teachable but attitude- and interest-dependent.
- It’s compared to the scientific method and to the ITSM distinction between “incident” handling (restoring service) and “problem” analysis (finding the underlying cause), and framed as broader than just reading code.
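Narrowing scope systematically is often done by bisection, the idea behind tools such as `git bisect`. The following is a minimal sketch, assuming the candidates (commits, config changes, input sizes) can be ordered so that everything before some point is good and everything after it is bad; `first_bad` and `is_bad` are hypothetical names used only for illustration. Each probe tests exactly one candidate, so the number of experiments grows with log2 of the number of candidates rather than linearly.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> int:
    """Return the index of the first candidate for which is_bad() is True.

    Assumes the sequence is monotonic: once a candidate is bad, every later
    one is bad too. Returns len(candidates) if no candidate is bad.
    """
    lo, hi = 0, len(candidates)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # the first bad candidate is at mid or earlier
        else:
            lo = mid + 1  # everything up to and including mid is good
    return lo


if __name__ == "__main__":
    # Hypothetical scenario: 100 ordered changes, regression introduced at change 37.
    changes = list(range(100))
    probes = []

    def is_bad(change: int) -> bool:
        probes.append(change)  # record every experiment that was run
        return change >= 37

    print(first_bad(changes, is_bad), len(probes))  # prints "37 7"
```

If the monotonic assumption does not hold (for example, a flaky failure), bisection can point at the wrong candidate, which is one reason to clarify the problem before narrowing it.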
Practices, Tools, and Techniques
- Common recommended practices:
  - Start simple; don’t assume the problem is complex.
  - Change one thing at a time; avoid fixating on a single theory.
  - Clarify the problem and shared assumptions with the team.
  - Increase observability/telemetry; gather more data when stuck.
  - Keep careful written notes of hypotheses, experiments, and results.
- There’s debate over relying heavily on debuggers versus iterating quickly with logging/print statements; the better choice depends on the platform and the size of the codebase.
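Two of the practices above, changing one thing at a time and keeping written notes of hypotheses, experiments, and results, can be combined into a simple experiment log. This is a minimal sketch; `Experiment`, `TroubleshootingLog`, and their fields are hypothetical names for illustration, not a tool named in the discussion.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Experiment:
    hypothesis: str                   # what you believe is causing the problem
    change: str                       # the single thing altered for this test
    result: str = ""                  # what was actually observed
    supported: Optional[bool] = None  # None until the result is recorded


@dataclass
class TroubleshootingLog:
    problem: str
    experiments: List[Experiment] = field(default_factory=list)

    def start(self, hypothesis: str, change: str) -> Experiment:
        # Refuse to open a new experiment while one is still unresolved,
        # which enforces "change one thing at a time".
        if self.experiments and self.experiments[-1].supported is None:
            raise RuntimeError("record the result of the open experiment first")
        exp = Experiment(hypothesis, change)
        self.experiments.append(exp)
        return exp

    def record(self, result: str, supported: bool) -> None:
        # Close the most recent experiment with what was observed.
        if not self.experiments or self.experiments[-1].supported is not None:
            raise RuntimeError("no open experiment to record")
        self.experiments[-1].result = result
        self.experiments[-1].supported = supported

    def ruled_out(self) -> List[str]:
        # Hypotheses the evidence contradicted; useful when handing off to a teammate.
        return [e.hypothesis for e in self.experiments if e.supported is False]


if __name__ == "__main__":
    # Hypothetical incident used only to show the workflow.
    log = TroubleshootingLog("checkout page times out")
    log.start("DNS resolution is slow", "point the service at a local resolver")
    log.record("timeouts unchanged", supported=False)
    log.start("connection pool is exhausted", "double the pool size")
    log.record("timeouts stop", supported=True)
    print(log.ruled_out())  # prints ['DNS resolution is slow']
```

Raising an error on a second open experiment is a deliberate design choice: it makes it awkward to stack several changes before observing a result.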
Analogies, Pay, and Organizational Incentives
- The “reliable car mechanic” analogy is hotly debated: some say such mechanics are underpaid; many reply that the reliable ones are busy and well-compensated.
- Parallel in software: feature work is “sexy” and visible; maintenance and reliability are treated as cost centers, despite being crucial.
- Several note that good diagnostic ability includes knowing what not to fix and where effort has real business impact.
Meta: Article and Site
- The article resonated strongly with many who enjoy troubleshooting and see it as their comparative advantage.
- The site was “hugged to death”; discussion touched on hosting limits, caching strategies, and the irony of having to troubleshoot the article’s own availability.