Troubleshooting: A skill that never goes obsolete

Value of Troubleshooting vs Building

  • One camp argues that spending “more time troubleshooting than building” is a red flag: it can distort your reward system, make you complacent, and trap you in low-status “support” roles.
  • They emphasize opportunity cost: depending on context, fixing a bug that affects 5% of users may have less impact (and earn less career credit) than building a new feature for 50%.
  • Others strongly disagree, saying troubleshooting has been the foundation of successful, well-paid careers (e.g. SRE, ops, consulting, retainers) and is often exactly what management and teams value most in crises.

Career Dynamics and Perception

  • Several commenters describe getting stuck as the “support/troubleshooting person” while colleagues who ship fast (often buggy) features get promoted.
  • Advice: if an org only rewards flashy feature work and ignores maintenance, that’s a systemic problem. Either change how work is measured (reliability metrics, leading indicators) or change jobs.
  • Conversely, being the “go-to firefighter” can create credibility, leadership opportunities, and promotions, provided the org respects reliability and quality.
  • There is concern about burnout and single points of failure; some intentionally step back so others develop troubleshooting skills.

Nature and Teachability of Troubleshooting

  • Many see troubleshooting as a distinct, generalizable skill: systematic hypothesis testing, questioning assumptions, ruling out confounders, narrowing scope.
  • Some claim it’s largely an innate mindset/curiosity that can’t be taught past a certain career stage; others counter it’s teachable but attitude- and interest-dependent.
  • It’s compared to the scientific method and to ITSM “problem” vs “incident” analysis, and framed as broader than just reading code.
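
The "narrowing scope" idea mentioned above can be illustrated with bisection: halve the search space with each test instead of checking candidates one by one. This is a minimal sketch, not something posted in the thread. The function `first_bad`, the build numbers, and the `is_bad` check are all hypothetical, and the approach assumes the candidates are ordered and that once one is bad, every later one is bad too.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first candidate for which is_bad() is True.

    Assumptions: candidates are ordered, the last one is known to be bad,
    and every candidate after the first bad one is also bad.
    """
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid        # the culprit is at mid or earlier
        else:
            lo = mid + 1    # the culprit is strictly after mid
    return candidates[lo]


# Hypothetical scenario: builds 1..100, and the regression landed in build 37.
builds: List[int] = list(range(1, 101))
checks: List[int] = []  # record which builds were actually tested


def is_bad(build: int) -> bool:
    checks.append(build)
    return build >= 37


culprit = first_bad(builds, is_bad)
print(f"first bad build: {culprit}, found after {len(checks)} checks")
```

With 100 candidates this needs at most 7 checks rather than up to 100. The same shape applies to commits, config changes, or input data, as long as the "once bad, always bad" assumption holds; when it does not, the result can be misleading.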

Practices, Tools, and Techniques

  • Common recommended practices:
    • Start simple; don’t assume the problem is complex.
    • Change one thing at a time; avoid fixation.
    • Clarify the problem and shared assumptions with the team.
    • Increase observability/telemetry; gather more data when stuck.
    • Keep careful written notes of hypotheses, experiments, and results.
  • There’s debate over heavy use of debuggers vs fast iteration with logging/print statements; platform and codebase size matter.
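
As one way to picture the note-keeping and "change one thing at a time" practices listed above, here is a small sketch of a structured troubleshooting log. It is an illustration only: the `Experiment` and `Notebook` classes and the example entries are hypothetical and were not described by any commenter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Experiment:
    hypothesis: str   # what we think is wrong
    change: str       # the single thing changed to test it
    result: str       # what was actually observed
    ruled_out: bool   # True if the observation contradicts the hypothesis


@dataclass
class Notebook:
    problem: str
    entries: List[Experiment] = field(default_factory=list)

    def record(self, hypothesis: str, change: str, result: str, ruled_out: bool) -> None:
        """Append one experiment; one entry corresponds to one change."""
        self.entries.append(Experiment(hypothesis, change, result, ruled_out))

    def open_hypotheses(self) -> List[str]:
        """Hypotheses that have been tested but not yet ruled out."""
        return [e.hypothesis for e in self.entries if not e.ruled_out]


# Hypothetical example entries, for illustration only.
notes = Notebook(problem="Checkout page intermittently returns HTTP 500")
notes.record(
    hypothesis="Stale cache entry",
    change="Flushed the cache",
    result="Errors continued",
    ruled_out=True,
)
notes.record(
    hypothesis="Connection pool exhaustion",
    change="Doubled the pool size",
    result="Error rate dropped but did not reach zero",
    ruled_out=False,
)

for entry in notes.entries:
    status = "ruled out" if entry.ruled_out else "still open"
    print(f"{entry.hypothesis}: {entry.change} -> {entry.result} ({status})")
```

Plain text notes serve the same purpose. The point of the structure is only that each entry pairs one change with one observed result, so a colleague (or a later you) can see what has already been ruled out.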

Analogies, Pay, and Organizational Incentives

  • The “reliable car mechanic” analogy is hotly debated: some say such mechanics are underpaid; many reply that the reliable ones are busy and well-compensated.
  • Parallel in software: feature work is “sexy” and visible; maintenance and reliability are treated as cost centers, despite being crucial.
  • Several note that good diagnostic ability includes knowing what not to fix and where effort has real business impact.

Meta: Article and Site

  • The article resonated strongly with many who enjoy troubleshooting and see it as their comparative advantage.
  • The site was “hugged to death”; discussion touched on hosting limits, cache strategies, and the irony of needing to troubleshoot the article’s own availability.