Methodology
Data sources, data quality assessment, analysis approach, and query transparency. Every claim in this report is traceable to public data, and every limitation is stated explicitly.
Data Quality Assessment
Critical Context
The data underlying this report has significant gaps. All claims are made within these constraints. Improving data quality is the #1 priority for making stronger claims in future reports.
Current State
| Dimension | Status | Value | Impact on Report |
|---|---|---|---|
| Total MAUDE events | Good | 2.50M | Full history available |
| Device linkage | Solved | 99.3% linked | Virtually all events analyzable by brand |
| LLM extraction | Moderate | 39.3% (981K events) | Root cause and failure mode data improving |
| Embedding coverage | Good | 87-91% | Vector search works |
| PowerGlide linked events | Good | 224 events | Strong pattern detection |
| PowerGlide extracted events | Moderate | 132 events (58.9%) | Failure mode and root cause data |
| BD Power line total events | Large | ~5,100 events | Full portfolio analysis |
What Each Gap Means
Device linkage at 99.3%: Previously the critical blocker at 18.9%. Now virtually all events are linked to specific device brands. PowerGlide specifically went from 46 to 224 linked events (4.8x improvement) through ghost event backfill and improved device matching.
LLM extraction at 39.3%: The structured fields we rely on (recall_risk, root_cause, failure_mode, user_error_blamed) come from LLM extraction of event narratives. Coverage improved from ~30% to 39.3% overall. PowerGlide has 58.9% extraction coverage (132 of 224 events). Remaining gap means some failure patterns may be underrepresented.
Embedding coverage at 87-91%: Good enough for vector similarity search and clustering. Used to discover "ghost" PowerGlide events that were filed under generic names.
Data Quality Gates
Different claim types require different minimum data coverage; the dimension-by-dimension status above indicates which claims the current data can support.
Data Sources
FDA MAUDE Database
Source: FDA Manufacturer and User Facility Device Experience
Coverage: Medical device adverse event reports from 1992-present
Our dataset: 2.50M events (data through February 2026)
Access: Public data via openFDA API and bulk downloads
URL: https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfmaude/search.cfm
CMS Provider Utilization
Source: Centers for Medicare & Medicaid Services (public files)
Coverage: Medicare Part B procedures by provider
Dataset used: 727K procedures, 299K providers (2022 data)
Note: This data comes from publicly available CMS files. It is not stored in our ClickHouse database. We reference it for market sizing only.
HCPCS codes used:
- 36568: PICC insertion (without imaging)
- 36569: PICC insertion (with imaging)
- 36572: CVC insertion (without imaging)
- 36573: CVC insertion (with imaging)
INFO
Medicare represents ~30% of total procedures. Actual market volume is 3-4x larger.
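The 3-4x extrapolation is simple arithmetic. A back-of-envelope sketch, assuming Medicare covers roughly 25-33% of total procedure volume (the "~30%" figure above):

```python
MEDICARE_PROCEDURES = 727_000  # CMS Part B dataset (2022)

def total_market_estimate(medicare_count: int, medicare_share: float) -> int:
    """Extrapolate total procedure volume from the Medicare slice."""
    return round(medicare_count / medicare_share)

low = total_market_estimate(MEDICARE_PROCEDURES, 0.33)   # share high -> market low
high = total_market_estimate(MEDICARE_PROCEDURES, 0.25)  # share low -> market high
print(f"Estimated total market: {low:,} - {high:,} procedures/year")
```

This puts the total market in the low millions of procedures per year, consistent with the 3-4x multiplier.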
FDA Enforcement Data
Sources: Recall database, warning letters, enforcement actions
Our dataset: 57K enforcement actions in ClickHouse
Analysis Pipeline
Flow: openFDA API → ClickHouse (analytics engine)
Process:
- Daily date ranges to avoid API 26K pagination limit
- Deduplication by MDR report key
- Device and manufacturer linking via brand name matching
- Geographic extraction from event narratives
- LLM extraction for structured fields (recall risk, root cause, failure mode)
- Vector embeddings for similarity search
Refresh: Weekly automated pipeline
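The daily-date-range strategy in the first pipeline step can be sketched as a window generator (a hypothetical helper, not the pipeline's actual code): querying one calendar day at a time keeps each openFDA result set well under the ~26K pagination ceiling.

```python
from datetime import date, timedelta

def daily_windows(start: date, end: date):
    """Yield one (day, day) range per calendar day, inclusive.

    Issuing one openFDA query per day keeps each result set small
    enough that the API's pagination limit is never hit.
    """
    current = start
    while current <= end:
        yield current, current
        current += timedelta(days=1)

windows = list(daily_windows(date(2026, 2, 1), date(2026, 2, 3)))
print(windows)  # three single-day windows
```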
LLM Extraction
Model: Gemini 2.0 Flash via Vertex AI Batch
Cost: ~$0.05 per 1M tokens (~$45 for 700K events)
Fields extracted:
| Field | Type | Purpose |
|---|---|---|
| recall_risk | high/medium/low | Recall prediction |
| root_cause | enum | Failure attribution |
| failure_mode | enum | Technical classification |
| user_error_blamed | boolean | Blame detection |
| user_error_justified | boolean | Blame validation |
| affects_other_units | boolean | Batch risk indicator |
| facility_name | text | Geographic intelligence |
| facility_state | text | Territory mapping |
Coverage: 39.3% of total events extracted (981K of 2.50M).
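The quoted cost figure works out as follows (a back-of-envelope check, assuming the ~$0.05 per 1M tokens rate holds across the batch):

```python
# Rough arithmetic behind "~$45 for 700K events" at ~$0.05 per 1M tokens.
COST_PER_M_TOKENS = 0.05
TOTAL_COST = 45.0
EVENTS = 700_000

total_tokens = TOTAL_COST / COST_PER_M_TOKENS * 1_000_000  # ~900M tokens
tokens_per_event = total_tokens / EVENTS                   # ~1,300 tokens/event
print(f"~{total_tokens / 1e6:.0f}M tokens total, ~{tokens_per_event:.0f} tokens/event")
```

An implied ~1,300 tokens per event (prompt plus narrative plus structured output) is plausible for MAUDE narratives.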
Vector Embeddings
Model: Voyage-3 (1024-dimensional)
Use cases: Similar event clustering, failure pattern discovery, semantic search across narratives
Coverage: 87-91% of events embedded
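The ghost-event discovery mentioned above rests on cosine similarity between narrative embeddings. A minimal sketch; the toy 3-dimensional vectors and labels are illustrative stand-ins for the real 1024-dimensional Voyage-3 embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors: a known PowerGlide narrative vs. a candidate "ghost" event
# filed under a generic device name, and an unrelated narrative.
narrative   = [0.9, 0.1, 0.2]
ghost_event = [0.85, 0.15, 0.25]  # similar story, generic brand name
unrelated   = [0.1, 0.9, 0.1]

print(round(cosine_similarity(narrative, ghost_event), 3))  # close to 1.0
print(round(cosine_similarity(narrative, unrelated), 3))    # much lower
```

Candidates above a similarity threshold get reviewed and relinked to the correct brand.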
Key Queries
BD Power Line Events
sql
SELECT d.brand_name, COUNT(*) as events,
countIf(resulted_in_injury) as injuries,
round(countIf(resulted_in_injury) * 100.0 / COUNT(*), 1) as injury_rate
FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE lower(d.brand_name) LIKE '%power%picc%'
OR lower(d.brand_name) LIKE '%power%glide%'
OR lower(d.brand_name) LIKE '%power%midline%'
GROUP BY d.brand_name
ORDER BY events DESC
PowerGlide Timeline
sql
SELECT toYYYYMM(event_date) as month,
COUNT(*) as events,
countIf(resulted_in_injury) as injuries
FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE lower(d.brand_name) LIKE '%power%glide%'
AND event_date >= '2024-11-01'
GROUP BY month
ORDER BY month
Failure Mode Distribution
sql
SELECT failure_mode, COUNT(*) as n,
round(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) as pct
FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE llm_extracted = 1
AND (lower(d.brand_name) LIKE '%power%glide%'
OR lower(d.brand_name) LIKE '%power%picc%'
OR lower(d.brand_name) LIKE '%power%midline%')
GROUP BY failure_mode
ORDER BY n DESC
Root Cause Distribution
sql
SELECT root_cause, COUNT(*) as n,
round(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) as pct
FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE llm_extracted = 1
AND (lower(d.brand_name) LIKE '%power%glide%'
OR lower(d.brand_name) LIKE '%power%picc%'
OR lower(d.brand_name) LIKE '%power%midline%')
GROUP BY root_cause
ORDER BY n DESC
Using the Spincast API
Spincast provides a ClickHouse endpoint for live data queries. This report is a snapshot; the API gives you real-time access.
Connection
bash
# Basic query
curl -s "https://scanpath-clickhouse.fly.dev/?user=default&password=scanpath2025secure" \
--data "SELECT count() FROM spincast.events"
# Query with format
curl -s "https://scanpath-clickhouse.fly.dev/?user=default&password=scanpath2025secure" \
--data "SELECT brand_name, count() as n FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE lower(d.brand_name) LIKE '%power%glide%'
GROUP BY brand_name FORMAT Pretty"
Example: Monitor a Device Category
sql
-- Monthly event trend for any device
SELECT toYYYYMM(event_date) as month,
COUNT(*) as events,
countIf(resulted_in_injury) as injuries,
round(countIf(resulted_in_injury) * 100.0 / COUNT(*), 1) as injury_rate
FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE lower(d.brand_name) LIKE '%your_device%'
AND event_date >= today() - 365
GROUP BY month
ORDER BY month
Example: Find Emerging Signals
sql
-- Devices with accelerating event velocity (last 6 months vs prior 6 months)
SELECT d.brand_name,
countIf(event_date >= today() - 180) as recent_6mo,
countIf(event_date >= today() - 365 AND event_date < today() - 180) as prior_6mo,
round(countIf(event_date >= today() - 180) * 1.0 /
nullIf(countIf(event_date >= today() - 365 AND event_date < today() - 180), 0), 2) as velocity_ratio
FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE event_date >= today() - 365
GROUP BY d.brand_name
HAVING recent_6mo >= 10
ORDER BY velocity_ratio DESC
LIMIT 20
Data Improvement Roadmap
In priority order:
| Priority | Gap | Current | Target | Impact |
|---|---|---|---|---|
| Done | Device linkage | 99.3% (was 18.9%) | Complete | Virtually all events now analyzable by brand |
| 1 | LLM extraction | 39.3% | 80%+ | Larger samples for root cause and failure mode analysis |
| 2 | CMS data integration | Not in DB | In ClickHouse | Enable volume normalization and account-level intelligence |
| 3 | Account-level data | Not available | CMS Open Payments + Utilization | Identify top PowerGlide accounts by triangulation |
How to Run Improvements
bash
# Check current status
python -m clickhouse.pipeline.run_pipeline --status
# Run full pipeline (ingest + extract + embed)
python -m clickhouse.pipeline.run_pipeline --days 14
# Run extraction only (improve LLM coverage)
python -m clickhouse.pipeline.run_pipeline --extract-only --extract-limit 5000
# Run embeddings only
python -m clickhouse.pipeline.run_pipeline --embed-only --embed-limit 10000
PowerGlide 2025 Timeline
The Pattern (Updated with 4.8x More Data)
| Period | Events/Month | Injuries/Month | Pattern |
|---|---|---|---|
| H2 2024 | 1-5 | 0 | Low baseline |
| Jan-May 2025 | 20-26 | 3-7 | Sustained elevation |
| Jun-Sep 2025 | 9-17 | 2-3 | Moderate |
| Oct 2025 | 35 | 15 | Spike (includes clinical study) |
| Nov 2025 | 3 | 3 | Drop (possibly incomplete) |
Updated Assessment
With improved device linkage and ghost event recovery, the picture is nuanced:
- Oct 2025 (35 events) includes clinical study batch reports but is the largest single month on record
- Jan-May 2025 (20-26 events/month) is a sustained real-world elevation that cannot be explained by clinical trial batching
- The acceleration pattern could reflect growing PowerGlide adoption (more devices in field = more events at constant rate) or a genuine safety trend
Without procedure volume denominators, we cannot distinguish adoption growth from safety signal. CMS data integration (planned) would resolve this ambiguity.
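What denominator normalization would look like once CMS volumes are integrated: if events rise in lockstep with procedure volume, the per-procedure rate stays flat (adoption growth); if the rate itself rises, that is a candidate safety signal. All numbers below are made-up placeholders, not dataset values:

```python
# Illustrative only: hypothetical procedure volumes, NOT real CMS data.
periods = [
    # (label, events, procedures) -- placeholder figures for the sketch
    ("H2 2024",      15,  50_000),
    ("Jan-May 2025", 115, 120_000),
]

rates = {label: events / procedures * 10_000 for label, events, procedures in periods}
for label, rate in rates.items():
    print(f"{label}: {rate:.1f} events per 10K procedures")
```

With real denominators, a rising per-10K rate would distinguish a genuine safety trend from growing adoption.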
Limitations
Reporting Bias
Facility reporting is voluntary (except for deaths), which leads to underreporting. Estimates suggest only 1-10% of actual events reach MAUDE. Facilities may also decline to report if the manufacturer convinces them the event was "user error."
Attribution Uncertainty
Manufacturer investigations are self-interested (not independent). "Use-related" conclusions are not verified by FDA. Root cause fields reflect manufacturer claim, not verified fact.
Market Share Confound
This is the biggest limitation. Raw event counts are not comparable between manufacturers with different market share. BD's ~70% mini-midline share means many more devices in the field than Stiletto's <5%. We mitigate this by using rates (injury rate per event) and patterns (failure mode distribution) rather than raw counts.
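One way to compare rates responsibly is with Wilson score intervals, which widen appropriately for small samples, so a low-share product with few events doesn't look artificially "clean." The counts below are hypothetical, not the report's figures:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """95% Wilson score interval for a binomial proportion.

    Deliberately wide for small n: rate comparisons between products
    with very different event counts should check interval overlap,
    not point estimates.
    """
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - margin, center + margin)

# Hypothetical injury counts for illustration:
big_brand = wilson_interval(60, 224)   # many events -> narrow interval
small_brand = wilson_interval(3, 12)   # few events  -> wide interval
print(big_brand, small_brand)
```

Overlapping intervals mean the data cannot distinguish the two injury rates.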
Device Linkage Gap (Resolved)
Device linkage improved from 18.9% to 99.3%. PowerGlide went from 46 to 224 linked events through ghost event backfill and improved matching. This is no longer a limitation.
LLM Extraction Reliability
Structured fields (root cause, failure mode, user error) are extracted by an LLM from narrative text. These are interpretations, not verified facts. We cite sample sizes to make the extraction basis clear.
Injury Severity
Outcome detail varies by report quality. "Injury" is binary (yes/no), not scaled by severity. A minor bruise and a surgical retrieval both count as "injury."
Interpretation Guidelines
- Event counts are proxies -- Higher counts may indicate market success (more devices in field), not just safety issues
- Patterns matter more than absolutes -- Same failure mode across multiple facilities signals design issue, not user error
- User error claims need scrutiny -- Manufacturer investigations are self-interested; look for pattern evidence
- Absence of evidence is not evidence of absence -- Low event counts may reflect low market share, underreporting, or newness
- Temporal spikes require investigation -- Distinguish clinical trial batches from actual field failure increases
- Sample sizes determine confidence -- n=26 gives directional signal; n=500 gives statistical confidence
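The n=26 vs n=500 contrast in the last guideline can be made concrete with the normal-approximation confidence-interval half-width for a proportion, which shrinks like 1/sqrt(n). A sketch using an illustrative 20% failure-mode share:

```python
import math

def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% CI half-width for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# The same 20% estimate at two sample sizes:
for n in (26, 500):
    hw = ci_half_width(0.20, n)
    print(f"n={n}: 20% +/- {hw * 100:.1f} points")
```

At n=26 the estimate carries roughly a +/-15-point margin (directional at best); at n=500 the margin tightens to a few points.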
Reproducibility
All queries can be run against the Spincast ClickHouse instance:
bash
curl -s "https://scanpath-clickhouse.fly.dev/?user=default&password=scanpath2025secure" \
--data "SELECT count() FROM spincast.events"Database Statistics (February 2026)
| Table | Row Count | Coverage |
|---|---|---|
| events | 2,499,000+ | 1992-2026 |
| devices | 45,000+ | Brand names |
| manufacturers | 8,000+ | Company names |
| enforcement_actions | 57,000+ | Recalls, warnings |
| Device linkage | 2,482,000 | 99.3% of events |
| LLM extractions | 981,000 | 39.3% of events |
| Embeddings | 2,250,000 | 87-91% of events |
Transparency Commitment
All claims in this report are traceable to:
- FDA MAUDE database (public)
- CMS Medicare data (public)
- FDA 510(k) clearances (public)
- Clinical trial registry (ClinicalTrials.gov, public)
Independent verification: Any claim can be verified by searching MAUDE directly. All SQL queries are provided for reproducibility. Raw data access available via ClickHouse endpoint.
What this report does:
- Uses rates and patterns, not raw counts
- Includes sample sizes with every claim
- States limitations prominently
- Separates what the data shows from what it doesn't
What this report does NOT do:
- Compare raw event counts between products with different market share
- Use clinical trial spikes as field failure evidence
- Claim Stiletto is "proven safer" (insufficient field data)
- Hide data gaps or limitations
Last updated: February 2026