Back to PM Skills
Analysis & Diagnostics

Metrics Debugger

Debug metrics drops with first principles.

# Drop into ~/.claude/skills/metrics-debugger/
curl -L https://github.com/sunnyyang-hicks/pm-skills-for-claude/raw/main/metrics-debugger/SKILL.md \
  -o ~/.claude/skills/metrics-debugger/SKILL.md

Overview

A metric just moved unexpectedly. Your job is to figure out why — fast. Is it a real product problem, a data pipeline issue, an external factor, or a normal fluctuation? Panic is expensive; methodical debugging is cheap.

First Response

When a metric drops, resist the urge to hypothesize immediately. Follow this sequence:

Step 1: Verify the Data

Before investigating the cause, confirm the drop is real:

  • [ ] Is the data fresh? Check pipeline latency. Is today's data fully loaded?
  • [ ] Is the definition consistent? Did someone change how the metric is calculated?
  • [ ] Is the source stable? Any data pipeline failures, schema changes, or ETL errors?
  • [ ] Is sampling consistent? If you use sampling, did the sample change?
  • [ ] Check multiple sources. Does the drop appear in both your analytics tool AND the database? If only one shows it, it's likely a data issue.

If the data is wrong, stop. Fix the data. Don't debug a phantom problem.

Step 2: Scope the Impact

## Impact Assessment

**Metric:** [name]
**Change:** [X → Y] ([Z% change])
**When:** [started date/time]
**Duration:** [how long]

**Affected segments:**
- By platform: [iOS / Android / Web — which shows the drop?]
- By geography: [specific regions?]
- By user type: [new vs. existing, free vs. paid?]
- By channel: [organic vs. paid, specific referrer?]

**Related metrics:**
- [Upstream metric]: [changed / stable]
- [Downstream metric]: [changed / stable]
- [Correlated metric]: [changed / stable]

Step 3: Systematic Root Cause Analysis

Work through these categories in order:

1. Internal product changes:

  • Any deployments in the timeframe? (Check deploy logs)
  • Feature flag changes?
  • A/B test started or ended?
  • Infrastructure changes (CDN, load balancer, database migration)?

2. External factors:

  • Seasonality (holiday, weekend, end of month)?
  • Competitor action (launch, outage, price change)?
  • News event affecting user behavior?
  • App store changes (algorithm, featuring, ranking)?
  • Platform changes (iOS update, Chrome update, API deprecation)?

3. User behavior shift:

  • Traffic mix change (different channels sending different users)?
  • User segment shift (enterprise vs. SMB ratio)?
  • Bot traffic change?

4. Upstream dependency:

  • Third-party API outage?
  • Payment provider issue?
  • Email deliverability change?
  • Ad platform targeting change?

Step 4: Quantify and Attribute

Once you've identified the likely cause:

## Root Cause

**Primary cause:** [description]
**Confidence:** High / Medium / Low
**Evidence:** [what confirms this]

**Impact breakdown:**
- [X%] of the drop is explained by [cause 1]
- [Y%] of the drop is explained by [cause 2]
- [Z%] is unexplained / noise

**Is this a one-time event or ongoing?** [assessment]

Step 5: Recommend Response

| Urgency | Criteria | Action | |---------|---------|--------| | Critical | Revenue-impacting, user-facing, growing | Fix immediately, war room | | High | Significant but contained | Fix this sprint, monitor hourly | | Medium | Moderate impact, stable | Investigate and fix within 1-2 weeks | | Low | Minor, self-correcting | Monitor, no action needed |

Output

# Metrics Debug — [Metric Name] [Date]

## Summary
[2-3 sentences: what happened, why, what to do]

## Investigation Log
[Step-by-step what you checked and found]

## Root Cause
[Identified cause with evidence and confidence]

## Recommended Action
[What to do + urgency level]

## Monitoring Plan
[What to watch to confirm the fix or detect recurrence]

Save as METRICS-DEBUG-[metric]-[date].md.