Engineers have largely stopped reading documentation because docs and reality tend to differ. When people work on features, they read the source code or ask a teammate. Coding agents don't conduct team interviews, so they generate code from what they can find. Now that agents are reading your docs and source code, who's keeping the information fresh?
Untended docs rust, and the spots spread quietly until your team's momentum is gone. At Dosu, we treat knowledge the way we treat session tokens. Each time your code changes without a corresponding knowledge edit, the information ages a bit. The goal of this post is to make that aging visible as a service-level signal you can plot and gate on.
A 2024 Empirical Software Engineering study of popular GitHub repositories found that 28.9% currently document a function, file, or class that no longer exists in the source. The average outdated reference has been wrong for 4.7 years. We've already walked through one way to catch that drift on every PR. The next question is the one this post answers. How fresh are our docs right now, on a scale we can track over time?
This post builds a documentation freshness-scoring pipeline that runs on every PR. It emits a number per doc page from 0 to 100, computed from three deterministic signals (Git age delta, frontmatter TTL contracts, and symbol-level drift) plus a Claude Code layer to help with those gray areas. By the end, you'll be plotting freshness, gating on it, and treating it like any other reliability signal your team owns.
A few things to note:
- We will build a Python prototype. The minimal symbol-extraction layer in this post uses a regular expression on `def` and `class`. The `freshness.py` file we're going to craft will also include a tree-sitter walker for TypeScript. We'll share any gotchas along the way.
- We aim to show possibility over polish. Several edge cases (untracked files, brand-new pages, inline-code patterns catching CLI flags as fake symbols) are worth learning about before sharing with your team.
- This is a tutorial, not a finished product. The tool we're going to build should be helpful and useful for learning, though it's far from production ready. We think that all documented limitations and shortcomings are a feature, because they're where you can learn the most about team knowledge, especially as it ages.
- The signals live in Git. Anything documented in Notion, Confluence, or a hosted docs platform stays invisible to the script. We surface this as a gotcha later, but it's worth knowing up front. If most of your canonical docs live outside the repo, the in-repo signal alone won't cover them.
Prevent staleness with three signals
One of the first things teams try to prevent staleness is a docs-map.json file in the repo's root directory: each key is a source path, and its value is a list of doc pages. Your CI reads the diff, looks up the affected pages, and runs a check.
The trouble is that the mapping itself decays. When you rename a module, restructure a docs/ directory, or split a service in two, the map falls behind, and your docs go stale with it.
In fact, the Aider team shared some interesting findings from their experience, including moving past hand-curated mappings toward a graph-ranked view of the codebase from a tree-sitter parse.
Let's use three signals to address this freshness problem in a durable way.
Git age delta. Based on Git metadata, we know when each line of code was last modified and when each line of doc was last updated. git log --follow and git blame track rename operations, so a refactor won't break our signal the way a manual mapping would. We can let Git do the heavy lifting and reduce maintenance.
Per-doc TTL contracts. Each doc page declares its own shelf life in our YAML frontmatter. For example, a quick-start guide could have ttl_days: 90 as it is updated with releases, while an architecture overview might use ttl_days: 365. We can set our expectations via a binding that lives alongside our knowledge, so when a doc is updated, our contract aligns automatically. Giant Swarm runs a version of this pattern in production with their open source frontmatter-validator, gating their public docs by last_review_date and a configurable REVIEW_TOO_LONG_AGO window. One refinement we're going to make is to let each page set its own TTL instead of inheriting a directory-wide threshold, but that underlying contract will be the same.
Symbol-level drift. Extract the function and class names mentioned in each doc, then check whether those symbols still exist in the codebase with the same signatures. A compiler-grade version of this is SCIP, Sourcegraph's typed cross-reference format. Tree-sitter is the next step down, with grammars for more than 40 languages. For a starter pipeline, a pattern match over def and class gets you most of the way on Python.
Combining the three into a single score requires deciding on thresholds, and we want those values to align with our expectations.
Two public references make sense for defaults. The docvet freshness check ships with drift-threshold = 30 days and age-threshold = 90 days as its out-of-the-box values, so docs more than three months behind their referenced code start to get flagged. Cox's 2014 thesis on dependency freshness makes a useful descriptive argument as well. Our aim is to land in the same band as docvet's defaults, with an open invitation to recalibrate against your own data once you have a baseline.
For the quants among us:
freshness(doc) = max(0, 100 - (age_penalty + drift_penalty + ttl_penalty))
age_penalty = clamp((days_since_doc_touched - days_since_code_touched) / 3, 0, 30)
drift_penalty = symbols_referenced_now_missing_or_changed * 10 # capped at 40
ttl_penalty = max(0, days_past_ttl * 2)
A page that hasn't been updated while its referenced code has been edited starts to lose points. A page whose TTL has lapsed loses more. A page whose referenced symbols have been renamed or removed takes an even larger hit to this freshness score. Cosmetic refactors don't move the score, because they don't change signatures. This formula helps identify drift and ignores noise, which is the kind of signal we want for our PRs.
Building our freshness signal
This pipeline runs via a small Python script and a GitHub Actions workflow. The script walks every doc page, computes a score from the three signals, and writes a JSON report to freshness.json that the rest of the workflow can read.
To start, create a script named .github/scripts/freshness.py with the following code:
import json
import os
import re
import subprocess
from datetime import datetime, timezone
from pathlib import Path
import yaml
REPO_ROOT = Path(__file__).resolve().parents[2]
DOCS_DIR = REPO_ROOT / os.environ.get("DOCS_DIR", "docs")
NOW = datetime.now(timezone.utc)
def last_touched(path: Path) -> datetime:
"""Last commit time, falling back to filesystem mtime for new/untracked files."""
iso = subprocess.check_output(
["git", "log", "-1", "--follow", "--format=%cI", "--", str(path)],
cwd=REPO_ROOT,
text=True,
).strip()
if iso:
return datetime.fromisoformat(iso)
if path.exists():
return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
return NOW
def days(d: datetime) -> int:
return max(0, (NOW - d).days)
def score(doc: Path) -> dict:
raw = doc.read_text()
fm_match = re.match(r"^---\n(.*?)\n---\n", raw, re.DOTALL)
front = yaml.safe_load(fm_match.group(1)) if fm_match else {}
freshness = front.get("freshness", {})
sources = []
for pattern in freshness.get("sources", []):
sources.extend(REPO_ROOT.glob(pattern))
doc_age = days(last_touched(doc))
youngest_source_age = min(
(days(last_touched(s)) for s in sources), default=doc_age
)
age_penalty = min(30, max(0, (doc_age - youngest_source_age) // 3))
ttl = freshness.get("ttl_days")
ttl_penalty = max(0, (doc_age - ttl) * 2) if ttl else 0
referenced = set(re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", raw))
missing = referenced - _live_symbols(sources)
drift_penalty = min(40, len(missing) * 10)
return {
"path": str(doc.relative_to(REPO_ROOT)),
"score": max(0, 100 - age_penalty - drift_penalty - ttl_penalty),
"doc_age_days": doc_age,
"source_age_days": youngest_source_age,
"missing_symbols": sorted(missing),
}
def _live_symbols(sources: list[Path]) -> set[str]:
"""Python-only via regex. Swap for tree-sitter or LSP for polyglot repos."""
symbols: set[str] = set()
for s in sources:
if not s.is_file():
continue
text = s.read_text()
symbols.update(re.findall(r"\b(?:async\s+)?def\s+([A-Za-z_][A-Za-z0-9_]*)", text))
symbols.update(re.findall(r"\bclass\s+([A-Za-z_][A-Za-z0-9_]*)", text))
return symbols
if __name__ == "__main__":
docs = [*DOCS_DIR.rglob("*.md"), *DOCS_DIR.rglob("*.mdx")]
report = [score(p) for p in docs]
out = REPO_ROOT / "freshness.json"
out.write_text(json.dumps(report, indent=2))
print(f"Wrote {len(report)} pages to {out}")
_live_symbols above is Python-only via regex. For a TypeScript repo, you can swap the regex for a tree-sitter walk over the same shape of declarations. Our demo implementation (_live_symbols_typescript, about 30 lines) turns _live_symbols into a dispatcher keyed by file extension, and a single tree-sitter query captures the public-API declarations.
; captures public-API declarations
(function_declaration name: (identifier) @name)
(class_declaration name: (type_identifier) @name)
(interface_declaration name: (type_identifier) @name)
(type_alias_declaration name: (type_identifier) @name)
(enum_declaration name: (identifier) @name)
(method_definition name: (property_identifier) @name)
(method_signature name: (property_identifier) @name)
(lexical_declaration (variable_declarator name: (identifier) @name))
(variable_declaration (variable_declarator name: (identifier) @name))
If you're using Go, you can swap language_typescript for language_go and rewrite the query against function_declaration and method_declaration nodes. The shape of _live_symbols doesn't change: the input is still a list of source files, and the output is a set of live symbol names.
The frontmatter contract is also part of this pipeline we're building. This is how each page tells the script which source files it describes and how often to revisit them.
---
title: User API
freshness:
ttl_days: 90
sources:
- "src/api/users.ts"
- "src/api/users/*.ts"
---
# User API
The sources field accepts globs, so a single line covers a whole subtree. The ttl_days field is a soft deadline. The score gradually decays past the TTL rather than flipping from green to red, which is intentional.
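The script trusts whatever the frontmatter says, so a typo like `ttl_day` silently disables a signal. A small validator can catch that in review; `validate_contract` is a hypothetical helper, not part of the script above:

```python
ALLOWED_KEYS = {"ttl_days", "sources", "exclude"}

def validate_contract(front: dict) -> list[str]:
    """Return human-readable problems with a page's `freshness` block."""
    fresh = front.get("freshness")
    if fresh is None:
        return ["no freshness block (age and TTL signals only)"]
    problems = []
    for key in fresh:
        if key not in ALLOWED_KEYS:
            problems.append(f"unknown key {key!r} (typo for one of {sorted(ALLOWED_KEYS)}?)")
    ttl = fresh.get("ttl_days")
    if ttl is not None and (not isinstance(ttl, int) or ttl <= 0):
        problems.append(f"ttl_days must be a positive integer, got {ttl!r}")
    if not fresh.get("sources"):
        problems.append("no sources declared; the drift signal is off for this page")
    return problems
```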
This GitHub Actions step runs the script, uploads the JSON as a workflow artifact, and computes the median against the latest green build of main.
- uses: actions/setup-python@v5
with:
python-version: '3.12'
- run: pip install pyyaml
- name: Compute freshness
run: |
python .github/scripts/freshness.py
python -c "
import json, statistics
scores = [r['score'] for r in json.load(open('freshness.json'))]
print(f'median={int(statistics.median(scores)) if scores else 100}')
" >> "$GITHUB_OUTPUT"
id: fresh
- name: Upload freshness report
uses: actions/upload-artifact@v4
with:
name: freshness-report
path: freshness.json
We compute the median in Python rather than with a jq one-liner because statistics.median handles even-length arrays correctly, averaging the two middle values, whereas an index expression like jq's `.[length/2 | floor]` can only return one of them. A minor detail, but one that would otherwise surface later as an off-by-one in the trend line.
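The behavior difference is easy to verify. The second assertion replicates what an index-based pick (as in the jq expression) would return:

```python
import statistics

scores = [60, 80, 90, 100]          # even length: the middle pair is 80 and 90

# statistics.median averages the middle pair...
assert statistics.median(scores) == 85

# ...while indexing at length/2 can only land on one element of the pair:
assert sorted(scores)[len(scores) // 2] == 90
```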
We computed a freshness score for every doc page in the repo without calling a language model. The signal is deterministic, meaning that with the same inputs, you get the same score every time. That provides you with a value that you can cleanly plot on a dashboard.
Use Claude to handle nuance
A page with a score of 92 is in good shape, and a page with a score of 18 clearly needs substantial updates. The interesting cases are the scores between 35 and 65: the symbols exist, but the surrounding behavior may or may not have changed. Pattern matching can't decide those cases, but a language model can.
Let's filter for the pages that live in the gray zone, then hand those to Claude (or your coding agent of choice). The workflow step looks like this:
- name: Identify gray-zone pages
id: gray
run: |
jq -r '.[] | select(.score >= 35 and .score < 65) | .path' \
freshness.json > /tmp/gray.txt
echo "count=$(wc -l < /tmp/gray.txt)" >> $GITHUB_OUTPUT
# The job needs id-token: write at the permissions level so
# claude-code-action can authenticate via OIDC.
- name: Claude semantic check
if: steps.gray.outputs.count != '0'
continue-on-error: true # soft signal; must not block the deterministic SLO gate
uses: anthropics/claude-code-action@v1
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
claude_args: --max-turns 8
prompt: |
Read each page listed in /tmp/gray.txt and the source files in
its frontmatter sources. For each, decide whether the documented
behavior still matches the code. Respond with STILL_ACCURATE,
DRIFTED, or NEEDS_HUMAN_REVIEW and a one-line reason.
id-token: write (declared in the job's permissions: block) is required for the Action to authenticate via OIDC. continue-on-error: true is intentional. The deterministic per-page score and the SLO gate are the contract your team relies on. Claude is an assist that adds depth on the gray zone. An Anthropic API flake or auth hiccup shouldn't fail your release pipeline, and the gate that actually matters doesn't depend on Claude returning a verdict.
Once the workflow is running and has established a baseline in your repo, a few pages typically land in the gray zone on any given PR. The Claude layer reuses the same logic we wired up in the related post in this series, which you can check out for more detail.
A note on cost. With --max-turns 8 and a few pages per PR in the gray zone, a typical run reads roughly 20–30k input tokens (page content plus referenced source files) and emits a short verdict per page. On Claude Sonnet 4.6, that comes to around $0.15 per PR. For a team merging 30 PRs a day, plan for roughly $80/month in API costs. The deterministic per-page score and the SLO gate are free.
Expect a temporary spike during the bootstrap-to-mapped transition. As you add sources blocks, pages that were bootstrapped start firing the drift signal, and the gray-zone band fills with real-but-ambiguous cases worth Claude's attention. Budget extra tokens for the first couple weeks after you drop --bootstrap.
We ran this against our demo Python repo with no freshness: frontmatter and landed at a median score of 65. The reason is the symbol seam: without a sources block, _live_symbols([]) returns an empty set, and every backticked token gets flagged as a missing symbol. Pages with zero backticks score 100. Pages that are otherwise good API references score 60.
In our demo repo, the freshness.py adds a --bootstrap flag (also reads FRESHNESS_BOOTSTRAP=1 from the environment) for this case. Pages that don't declare a sources block opt out of the drift signal in bootstrap mode, so your first runs emulate a real-world baseline (age and TTL still apply). A page with a declared sources path that no longer resolves (a typo, a rename, a deleted file) still gets a drift signal. Bootstrap is for pages you haven't mapped yet, not for masking broken paths. As you add sources to your most-edited pages, those pages fall out of bootstrap territory and full scoring resumes. Plan a focused block of time to map your top 10 pages, run with --bootstrap until you're past 50% coverage, then drop the flag.
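Inside the scoring pass, the opt-out comes down to one small predicate. A sketch under those rules; the helper name `drift_applies` is ours, and the demo repo wires the equivalent logic inline:

```python
def drift_applies(front: dict, bootstrap: bool) -> bool:
    """Should the drift penalty count for this page?

    Bootstrap mode only exempts pages with no `sources` block at all.
    A declared-but-broken sources path still drifts, by design: bootstrap
    is for unmapped pages, not for masking typos or deleted files.
    """
    declared = bool(front.get("freshness", {}).get("sources"))
    if not declared:
        return not bootstrap  # unmapped page: drift only once bootstrap is off
    return True               # mapped page: drift always applies
```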
A rough rollout sketch:
- Step 1: install the workflow, set `FRESHNESS_BOOTSTRAP=1` in the `Compute freshness` step, and watch the baseline land.
- Step 2: map your top 10 most-edited pages. Pages that leave bootstrap territory and fall into the gray zone are where the signal is real.
- Step 3: tune the allowlist as the inline-code regex catches false positives, and add `sources` blocks to another 10–20 pages.
- Step 4: drop `--bootstrap`. With your most-touched pages mapped, the median should sit comfortably above 75, and the score finally reflects something true.
Doc freshness and visibility
Now, our freshness number becomes part of a regular review flow.
We write a PR comment that compares the mean freshness on this branch to main, with the median tracked alongside as a trend. Here's the format our workflow uses:
## π Documentation freshness β 100 β 96 (-4)
π¨ **SLO breach** β 1 critical page at or below the floor of 60:
- `docs/api/authentication.md` (score 60)
### β οΈ 1 page dropped
| Page | Before β After | Reason |
|---|---|---|
| π΄ `docs/api/authentication.md` | 100 β 60 | signature drift on `decodeLibraryClaim` |
<sub>15 pages scored β’ median 100 β 100 (+0)</sub>
The headline uses mean instead of median because a single page tanking 40 points won't move the median across 15 pages. The median is robust to outliers (great for long-term trends, misleading on the PR signal), so it sits in the sub-footer as the trend marker. What reviewers can focus on is the delta. Our absolute score on a healthy repo doesn't change much from PR to PR, so a 4-point drop is a reasonable threshold to ask the author to take another pass at the affected pages before merging.
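The sensitivity difference is easy to demonstrate with hypothetical numbers. With ten pages at 100 and one of them tanking to 60, the median reports nothing while the mean moves by exactly the threshold amount:

```python
import statistics

before = [100] * 10
after = [60] + [100] * 9     # one page drops 40 points

assert statistics.median(after) == 100                          # trend: flat
assert statistics.mean(before) - statistics.mean(after) == 4.0  # headline: -4
```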
We can add an endpoint badge from Shields to the repo README for better visibility, too. The endpoint reads the same freshness.json from the latest main build and renders something like Docs Freshness: 87/100. A score visible on the README helps turn our CI outcomes into public visibility so that our team can know how fresh our knowledge is at a glance.
The third surface is our SLO. Two SLIs surface from our same freshness.json, each doing a different job:
| | PR-delta SLI | Trend SLI |
|---|---|---|
| Statistic | Mean | Median |
| Window | Per PR | Rolling 7-day |
| Surface | Sticky PR comment | README badge |
| Threshold | 4-point drop on the PR head | median >= 75 |
| Why this shape | Noisy by design, so a single critical page tanking is visible immediately | Stable by design, so long-term drift drives the review |
For the trend SLI, we use a floor of median >= 75. Allow median dips below the floor for at most 10% of green main samples in a rolling 28-day window before treating it as an impact signal. Inside our budget, the gate stays advisory. Outside it, freeze documentation-affecting merges until the median recovers. These numbers are seed values, so recalibrate against your own distribution once you have a few weeks of scores. Medians across fewer than 50 pages are noisy, so expect the floor to shift before it settles.
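The budget check itself is a few lines over the window of green-main median samples. `budget_ok` is a hypothetical helper; plug in whatever storage holds your historical scores:

```python
def budget_ok(medians: list[int], floor: int = 75, budget: float = 0.10) -> bool:
    """Advisory gate: tolerate median dips below the floor for at most
    `budget` of the green-main samples in the rolling 28-day window."""
    if not medians:
        return True  # no history yet: stay advisory
    breaches = sum(1 for m in medians if m < floor)
    return breaches / len(medians) <= budget
```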
The critical-page floor (60 for anything tagged critical: true) is a separate, harder gate using an inclusive comparison (<= 60). The one-point cliff between 61 and 60 is intentional, since there's no error budget on a page-specific contract. If you want a softer signal, widen the band (warn at 65, fail at 55) or drop the critical: flag from pages that don't merit promise-level treatment.
Burn-rate alerts (fast burn at 2% of the 7-day budget per hour, slow burn at 10% per 6 hours, following Google SRE's workbook patterns) are the natural next step once you have a couple of months of scores to calibrate against. Treat them as a follow-up, rather than a day one concern.
Wire the SLO gate as a required status check, and document a bypass-freshness-gate label that the workflow honors for incident response, so a hotfix never gets blocked by a 30-day-old doc. The enforcement levels from our related post apply unchanged on top of that.
For copy-paste convenience, here's the complete workflow:
Click to expand the complete workflow
# Documentation Freshness
#
# Scores every doc page on a 0-100 scale from three deterministic signals
# (git age delta, frontmatter TTL, symbol drift), posts a sticky PR comment
# with per-page deltas, and fails the build on SLO breach:
# - median across non-excluded pages < 75 (strict: median is a soft signal)
# - any critical: true page <= 60 (inclusive: hard floor, hitting it counts)
#
# Pages in the 35-64 gray zone are routed to a conditional Claude semantic
# check that reports STILL_ACCURATE / DRIFTED / NEEDS_HUMAN_REVIEW per page.
#
# Required secret: ANTHROPIC_API_KEY (only consumed when the gray-zone count > 0)
name: Documentation Freshness
on:
pull_request:
# `labeled`/`unlabeled` so the SLO gate's `bypass-freshness-gate`
# opt-out re-runs after a label is applied to a previously-failed
# required check.
types: [opened, synchronize, reopened, labeled, unlabeled]
paths:
- 'docs/**'
- 'src/**'
- 'scripts/**'
- '.github/scripts/freshness.py'
- '.github/scripts/format_pr_comment.py'
- '.github/freshness-allowlist.txt'
- '.github/workflows/freshness.yml'
push:
branches: [main]
workflow_dispatch: {}
permissions: {}
concurrency:
group: docs-freshness-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
freshness:
runs-on: ubuntu-latest
timeout-minutes: 15
permissions:
contents: read
pull-requests: write
id-token: write # claude-code-action authenticates to the Anthropic API via OIDC
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
- name: Install uv
uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
with:
enable-cache: true
cache-dependency-glob: |
pyproject.toml
uv.lock
python-version: "3.12"
- name: Install dependencies
run: uv sync --extra dev
- name: Compute freshness
id: fresh
# Day-one adopters: add `env: { FRESHNESS_BOOTSTRAP: "1" }` here for the
# first two weeks. Pages without a `freshness.sources` block opt out
# of the drift signal until you have a chance to map them. Drop the
# env once your most-edited pages declare `sources`.
run: |
set -euo pipefail
uv run python .github/scripts/freshness.py
cp freshness.json freshness.current.json
python <<'PY' >> "$GITHUB_OUTPUT"
import json, statistics
data = json.load(open('freshness.current.json'))
scores = [r['score'] for r in data]
median = int(statistics.median(scores)) if scores else 100
print(f'median={median}')
print(f'page_count={len(scores)}')
PY
- name: Upload freshness report
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: freshness-report
path: freshness.current.json
retention-days: 30
- name: Compute baseline freshness
if: github.event_name == 'pull_request'
env:
BASE_REF: ${{ github.event.pull_request.base.ref }}
run: |
set -euo pipefail
git fetch --no-tags --depth=1 origin "$BASE_REF"
git worktree add --force /tmp/freshness-base "origin/$BASE_REF"
if [ -f /tmp/freshness-base/.github/scripts/freshness.py ]; then
(cd /tmp/freshness-base && uv run python .github/scripts/freshness.py)
cp /tmp/freshness-base/freshness.json freshness.main.json
else
echo "::notice::freshness.py not on $BASE_REF; using empty baseline"
echo "[]" > freshness.main.json
fi
git worktree remove --force /tmp/freshness-base
- name: Identify gray-zone pages
id: gray
run: |
set -euo pipefail
jq -r '.[] | select(.score >= 35 and .score < 65) | .path' \
freshness.current.json > /tmp/gray.txt
count=$(jq '[.[] | select(.score >= 35 and .score < 65)] | length' freshness.current.json)
echo "count=$count" >> "$GITHUB_OUTPUT"
- name: Claude semantic check
if: steps.gray.outputs.count != '0'
continue-on-error: true
uses: anthropics/claude-code-action@dde2242db6af13460b916652159b6ba19a598f30 # v1.0.120
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
claude_args: |
--max-turns 8
--allowedTools "Read,Glob,Grep"
--model claude-sonnet-4-6
prompt: |
Read each page listed in /tmp/gray.txt and the source files in its
frontmatter `sources` block. For each, decide whether the
documented behavior still matches the code. Respond with one of
STILL_ACCURATE, DRIFTED, or NEEDS_HUMAN_REVIEW and a one-line reason.
# format_pr_comment.py renders the sticky-comment markdown from the two
# JSON reports (per-page deltas, dropped pages, mean-vs-median trend).
# Grab it from the demo repo and drop it next to freshness.py:
# https://github.com/onlydole/overdue/blob/main/.github/scripts/format_pr_comment.py
- name: Build PR comment body
if: github.event_name == 'pull_request'
run: |
set -euo pipefail
uv run python .github/scripts/format_pr_comment.py \
--current freshness.current.json \
--baseline freshness.main.json \
> /tmp/comment.md
cat /tmp/comment.md
- name: Post sticky PR comment
if: github.event_name == 'pull_request'
uses: marocchino/sticky-pull-request-comment@0ea0beb66eb9baf113663a64ec522f60e49231c0 # v3.0.4
with:
header: docs-freshness
path: /tmp/comment.md
- name: SLO gate
if: ${{ !contains(github.event.pull_request.labels.*.name, 'bypass-freshness-gate') }}
run: |
set -euo pipefail
python <<'PY'
import json, statistics, sys
data = json.load(open('freshness.current.json'))
scores = [r['score'] for r in data]
if not scores:
print('::notice::No pages scored; SLO check skipped')
sys.exit(0)
median = statistics.median(scores)
failed = False
if median < 75:
print(f'::error::Median freshness {int(median)} below SLO floor 75')
failed = True
for r in data:
if r.get('critical') and r['score'] <= 60:
print(f'::error::Critical page {r["path"]} score {r["score"]} at or below floor 60')
failed = True
if failed:
sys.exit(1)
print(f'::notice::Docs freshness OK (median={int(median)}, pages={len(scores)})')
PY
Interesting lessons learned
The freshness signal we just built does one thing well. It gives you a per-page metric you can track, plot, and gate on. The thing it can't do, no matter how carefully we tune the formula, is tell you which pages matter most.
Several more gotchas surfaced as we ran this against our demo repo, and each one was quite interesting!
git log --follow per file scales linearly. 200 doc pages and 200 source files means 400 subprocess calls. On a fresh runner with cold git state, that's noticeable. If your docs/ directory grows past a few hundred pages, batch the calls or use git log --name-only --follow once and parse the output yourself.
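The batched variant hinges on one parse. `git log --format=%cI --name-only` emits a timestamp line followed by the paths each commit touched, newest commit first, so the first time a path appears is its most recent touch. A sketch with a hypothetical `parse_log` helper (note that `--follow` only works on a single path, so batching trades rename-tracking for speed):

```python
from datetime import datetime

def parse_log(output: str) -> dict[str, datetime]:
    """Map each path to its newest commit time from one
    `git log --format=%cI --name-only` invocation (newest first)."""
    latest: dict[str, datetime] = {}
    current: datetime | None = None
    for line in output.splitlines():
        if not line:
            continue
        try:
            current = datetime.fromisoformat(line)   # a %cI timestamp line
        except ValueError:                           # otherwise it's a path
            if current is not None and line not in latest:
                latest[line] = current
    return latest
```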
Long-lived feature branches accumulate age penalty unfairly. The age delta uses absolute timestamps, so a branch that's been open for six weeks shows a higher delta than reality, even though the code and docs have moved together within the branch. Computing the delta against the merge-base (git merge-base origin/main HEAD) instead of main directly fixes this, and the change is one line in last_touched.
Changelogs and historical docs are scored the same as living docs. A CHANGELOG.md legitimately references things that no longer exist ("replaced bcrypt with PyJWT"). The freshness model treats them the same as a quick-start. Either exclude them in the rglob pass or tag them freshness: { exclude: true } in frontmatter and skip the score in the script.
Dotted-path references read as drift unless you match components. A backticked MyClass.my_method legitimately points at a method on a defined class. The whole token never appears in the source, so the naΓ―ve regex marks it missing, and you take the drift penalty. Split on . and count the reference as a hit if all components are in the live symbol set.
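The fix is a few lines on top of the existing symbol set; `is_live` is a hypothetical helper name:

```python
def is_live(token: str, live_symbols: set[str]) -> bool:
    """Count `MyClass.my_method` as a hit when every dotted component is a
    live symbol, instead of demanding the full dotted token verbatim."""
    return all(part in live_symbols for part in token.split("."))

# Method names are already captured by the def regex, so components of a
# dotted reference land in the live set individually:
live = {"UserStore", "get_user"}
assert is_live("UserStore.get_user", live)
assert not is_live("UserStore.delete_user", live)
```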
Orβ¦just use Dosu!
The freshness score we built together is a great starting point. We've shipped variations of this workflow inside our demo repo. This solution works for teams that want to own their tooling and have the engineering hours to continue tuning it.
If you find yourself maintaining frontmatter contracts across more than a handful of pages, fighting symbol-extraction edge cases on a monorepo, or wishing the score also knew which docs your customers cite, that's where Dosu can help! Dosu automatically computes signals that teams care about. We flag updates and fit into your existing workflows, so a low score becomes a change your team can review. You can use the dosu-cli to set this up today, and utilize Dosu's MCP for all things knowledge with your codebase, Slack, Notion, and more.
Our DIY workflow gives you a process, a README badge, and a gate. Dosu monitors all your knowledge sources simultaneously. If you're ready for knowledge infrastructure that scales with you, give Dosu a try.
Try Dosu today for free and see what your knowledge infrastructure looks like when freshness is evaluated continuously, rather than after the fact.
Related in this series
- How to Catch Documentation Drift with Claude Code and GitHub Actions
- /doc-it: A Claude Code Skill for Auto-Generating Project Docs
Further reading
- giantswarm/frontmatter-validator, production-grade `last_review_date` enforcement on a public docs site
- docvet's freshness check, the closest open source analog with real default thresholds
- docs-guardian for Claude Code, an emerging multi-agent staleness detector
- docfresh, a Rust take that pins each page to a source SHA via manifest
- SCIP, Sourcegraph's compiler-grade cross-reference format
- Cox's 2014 thesis on dependency freshness, the methodology behind percentile-based threshold calibration
- The RAG Freshness Problem, why retrieval pipelines need the same treatment