Methodology
Data sources, data quality assessment, analysis approach, and query transparency. Every claim in this report is traceable to public data, and every limitation is stated explicitly.
Data Quality Assessment
Critical Context
The data underlying this report has significant gaps. All claims are made within these constraints. Improving data quality is the #1 priority for making stronger claims in future reports.
Current State
| Dimension | Status | Value | Impact on Report |
|---|---|---|---|
| Total MAUDE events | Good | 2.50M | Full history available |
| Device linkage | Solved | 99.3% linked | Virtually all events analyzable by brand |
| LLM extraction | Moderate | 39.3% (981K events) | Root cause and failure mode data improving |
| Embedding coverage | Good | 87-91% | Vector search works |
| PowerGlide linked events | Good | 224 events | Strong pattern detection |
| PowerGlide extracted events | Moderate | 132 events (58.9%) | Failure mode and root cause data |
| BD Power line total events | Large | ~5,100 events | Full portfolio analysis |
What Each Gap Means
Device linkage at 99.3%: Previously the critical blocker at 18.9%. Now virtually all events are linked to specific device brands. PowerGlide specifically went from 46 to 224 linked events (4.8x improvement) through ghost event backfill and improved device matching.
LLM extraction at 39.3%: The structured fields we rely on (recall_risk, root_cause, failure_mode, user_error_blamed) come from LLM extraction of event narratives. Coverage improved from ~30% to 39.3% overall. PowerGlide has 58.9% extraction coverage (132 of 224 events). Remaining gap means some failure patterns may be underrepresented.
Embedding coverage at 87-91%: Good enough for vector similarity search and clustering. Used to discover "ghost" PowerGlide events that were filed under generic names.
Data Quality Gates
Different claim types require different minimum data coverage; the dimension-by-dimension status above indicates which claims the current data can support.
Data Sources
FDA MAUDE Database
Source: FDA Manufacturer and User Facility Device Experience
Coverage: Medical device adverse event reports from 1992-present
Our dataset: 2.50M events (data through February 2026)
Access: Public data via openFDA API and bulk downloads
URL: https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfmaude/search.cfm
CMS Provider Utilization
Source: Centers for Medicare & Medicaid Services (public files)
Coverage: Medicare Part B procedures by provider
Dataset used: 727K procedures, 299K providers (2022 data)
Note: This data comes from publicly available CMS files. It is not stored in our ClickHouse database. We reference it for market sizing only.
HCPCS codes used:
- 36568: PICC insertion (without imaging)
- 36569: PICC insertion (with imaging)
- 36572: CVC insertion (without imaging)
- 36573: CVC insertion (with imaging)
INFO
Medicare represents ~30% of total procedures. Actual market volume is 3-4x larger.
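The 3-4x extrapolation is simple arithmetic. A back-of-envelope sketch, assuming Medicare covers roughly 25-33% of total procedure volume (the "~30%" figure above):

```python
MEDICARE_PROCEDURES = 727_000  # CMS Part B dataset (2022)

def total_market_estimate(medicare_count: int, medicare_share: float) -> int:
    """Extrapolate total procedure volume from the Medicare slice."""
    return round(medicare_count / medicare_share)

low = total_market_estimate(MEDICARE_PROCEDURES, 0.33)   # share high -> market low
high = total_market_estimate(MEDICARE_PROCEDURES, 0.25)  # share low -> market high
print(f"Estimated total market: {low:,} - {high:,} procedures/year")
```

This puts the total market in the low millions of procedures per year, consistent with the 3-4x multiplier.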
FDA Enforcement Data
Sources: Recall database, warning letters, enforcement actions
Our dataset: 57K enforcement actions in ClickHouse
Analysis Pipeline
Flow: openFDA API → ClickHouse (analytics engine)
Process:
- Daily date ranges to avoid API 26K pagination limit
- Deduplication by MDR report key
- Device and manufacturer linking via brand name matching
- Geographic extraction from event narratives
- LLM extraction for structured fields (recall risk, root cause, failure mode)
- Vector embeddings for similarity search
Refresh: Weekly automated pipeline
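The daily-date-range strategy in the first pipeline step can be sketched as a window generator (a hypothetical helper, not the pipeline's actual code): querying one calendar day at a time keeps each openFDA result set well under the ~26K pagination ceiling.

```python
from datetime import date, timedelta

def daily_windows(start: date, end: date):
    """Yield one (day, day) range per calendar day, inclusive.

    Issuing one openFDA query per day keeps each result set small
    enough that the API's pagination limit is never hit.
    """
    current = start
    while current <= end:
        yield current, current
        current += timedelta(days=1)

windows = list(daily_windows(date(2026, 2, 1), date(2026, 2, 3)))
print(windows)  # three single-day windows
```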
LLM Extraction
Model: Gemini 2.0 Flash via Vertex AI Batch
Cost: ~$0.05 per 1M tokens (~$45 for 700K events)
Fields extracted:
| Field | Type | Purpose |
|---|---|---|
| recall_risk | high/medium/low | Recall prediction |
| root_cause | enum | Failure attribution |
| failure_mode | enum | Technical classification |
| user_error_blamed | boolean | Blame detection |
| user_error_justified | boolean | Blame validation |
| affects_other_units | boolean | Batch risk indicator |
| facility_name | text | Geographic intelligence |
| facility_state | text | Territory mapping |
Coverage: 39.3% of total events extracted (981K of 2.50M).
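The quoted cost figure works out as follows (a back-of-envelope check, assuming the ~$0.05 per 1M tokens rate holds across the batch):

```python
# Rough arithmetic behind "~$45 for 700K events" at ~$0.05 per 1M tokens.
COST_PER_M_TOKENS = 0.05
TOTAL_COST = 45.0
EVENTS = 700_000

total_tokens = TOTAL_COST / COST_PER_M_TOKENS * 1_000_000  # ~900M tokens
tokens_per_event = total_tokens / EVENTS                   # ~1,300 tokens/event
print(f"~{total_tokens / 1e6:.0f}M tokens total, ~{tokens_per_event:.0f} tokens/event")
```

An implied ~1,300 tokens per event (prompt plus narrative plus structured output) is plausible for MAUDE narratives.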
Vector Embeddings
Model: Voyage-3 (1024-dimensional)
Use cases: Similar event clustering, failure pattern discovery, semantic search across narratives
Coverage: 87-91% of events embedded
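The ghost-event discovery mentioned above rests on cosine similarity between narrative embeddings. A minimal sketch; the toy 3-dimensional vectors and labels are illustrative stand-ins for the real 1024-dimensional Voyage-3 embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors: a known PowerGlide narrative vs. a candidate "ghost" event
# filed under a generic device name, and an unrelated narrative.
narrative   = [0.9, 0.1, 0.2]
ghost_event = [0.85, 0.15, 0.25]  # similar story, generic brand name
unrelated   = [0.1, 0.9, 0.1]

print(round(cosine_similarity(narrative, ghost_event), 3))  # close to 1.0
print(round(cosine_similarity(narrative, unrelated), 3))    # much lower
```

Candidates above a similarity threshold get reviewed and relinked to the correct brand.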
Key Queries
BD Power Line Events
sql
SELECT d.brand_name, COUNT(*) as events,
countIf(resulted_in_injury) as injuries,
round(countIf(resulted_in_injury) * 100.0 / COUNT(*), 1) as injury_rate
FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE lower(d.brand_name) LIKE '%power%picc%'
OR lower(d.brand_name) LIKE '%power%glide%'
OR lower(d.brand_name) LIKE '%power%midline%'
GROUP BY d.brand_name
ORDER BY events DESC
PowerGlide Timeline
sql
SELECT toYYYYMM(event_date) as month,
COUNT(*) as events,
countIf(resulted_in_injury) as injuries
FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE lower(d.brand_name) LIKE '%power%glide%'
AND event_date >= '2024-11-01'
GROUP BY month
ORDER BY month
Failure Mode Distribution
sql
SELECT failure_mode, COUNT(*) as n,
round(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) as pct
FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE llm_extracted = 1
AND (lower(d.brand_name) LIKE '%power%glide%'
OR lower(d.brand_name) LIKE '%power%picc%'
OR lower(d.brand_name) LIKE '%power%midline%')
GROUP BY failure_mode
ORDER BY n DESC
Root Cause Distribution
sql
SELECT root_cause, COUNT(*) as n,
round(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) as pct
FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE llm_extracted = 1
AND (lower(d.brand_name) LIKE '%power%glide%'
OR lower(d.brand_name) LIKE '%power%picc%'
OR lower(d.brand_name) LIKE '%power%midline%')
GROUP BY root_cause
ORDER BY n DESC
Using the Spincast API
Spincast provides a ClickHouse endpoint for live data queries. This report is a snapshot; the API gives you real-time access.
Connection
bash
# Basic query
curl -s "https://scanpath-clickhouse.fly.dev/?user=default&password=scanpath2025secure" \
--data "SELECT count() FROM spincast.events"
# Query with format
curl -s "https://scanpath-clickhouse.fly.dev/?user=default&password=scanpath2025secure" \
--data "SELECT brand_name, count() as n FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE lower(d.brand_name) LIKE '%power%glide%'
GROUP BY brand_name FORMAT Pretty"
Example: Monitor a Device Category
sql
-- Monthly event trend for any device
SELECT toYYYYMM(event_date) as month,
COUNT(*) as events,
countIf(resulted_in_injury) as injuries,
round(countIf(resulted_in_injury) * 100.0 / COUNT(*), 1) as injury_rate
FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE lower(d.brand_name) LIKE '%your_device%'
AND event_date >= today() - 365
GROUP BY month
ORDER BY month
Example: Find Emerging Signals
sql
-- Devices with accelerating event velocity (last 6 months vs prior 6 months)
SELECT d.brand_name,
countIf(event_date >= today() - 180) as recent_6mo,
countIf(event_date >= today() - 365 AND event_date < today() - 180) as prior_6mo,
round(countIf(event_date >= today() - 180) * 1.0 /
nullIf(countIf(event_date >= today() - 365 AND event_date < today() - 180), 0), 2) as velocity_ratio
FROM spincast.events e
JOIN spincast.devices d ON e.device_id = d.id
WHERE event_date >= today() - 365
GROUP BY d.brand_name
HAVING recent_6mo >= 10
ORDER BY velocity_ratio DESC
LIMIT 20
Data Improvement Roadmap
In priority order:
| Priority | Gap | Current | Target | Impact |
|---|---|---|---|---|
| Done | Device linkage | 99.3% (was 18.9%) | Complete | Virtually all events now analyzable by brand |
| 1 | LLM extraction | 39.3% | 80%+ | Larger samples for root cause and failure mode analysis |
| 2 | CMS data integration | Not in DB | In ClickHouse | Enable volume normalization and account-level intelligence |
| 3 | Account-level data | Not available | CMS Open Payments + Utilization | Identify top PowerGlide accounts by triangulation |
How to Run Improvements
bash
# Check current status
python -m clickhouse.pipeline.run_pipeline --status
# Run full pipeline (ingest + extract + embed)
python -m clickhouse.pipeline.run_pipeline --days 14
# Run extraction only (improve LLM coverage)
python -m clickhouse.pipeline.run_pipeline --extract-only --extract-limit 5000
# Run embeddings only
python -m clickhouse.pipeline.run_pipeline --embed-only --embed-limit 10000
PowerGlide 2025 Timeline
The Pattern (Updated with 4.8x More Data)
| Period | Events/Month | Injuries/Month | Pattern |
|---|---|---|---|
| H2 2024 | 1-5 | 0 | Low baseline |
| Jan-May 2025 | 20-26 | 3-7 | Sustained elevation |
| Jun-Sep 2025 | 9-17 | 2-3 | Moderate |
| Oct 2025 | 35 | 15 | Spike (includes clinical study) |
| Nov 2025 | 3 | 3 | Drop (possibly incomplete) |
Updated Assessment
With improved device linkage and ghost event recovery, the picture is nuanced:
- Oct 2025 (35 events) includes clinical study batch reports but is the largest single month on record
- Jan-May 2025 (20-26 events/month) is a sustained real-world elevation that cannot be explained by clinical trial batching
- The acceleration pattern could reflect growing PowerGlide adoption (more devices in field = more events at constant rate) or a genuine safety trend
Without procedure volume denominators, we cannot distinguish adoption growth from safety signal. CMS data integration (planned) would resolve this ambiguity.
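What denominator normalization would look like once CMS volumes are integrated: if events rise in lockstep with procedure volume, the per-procedure rate stays flat (adoption growth); if the rate itself rises, that is a candidate safety signal. All numbers below are made-up placeholders, not dataset values:

```python
# Illustrative only: hypothetical procedure volumes, NOT real CMS data.
periods = [
    # (label, events, procedures) -- placeholder figures for the sketch
    ("H2 2024",      15,  50_000),
    ("Jan-May 2025", 115, 120_000),
]

rates = {label: events / procedures * 10_000 for label, events, procedures in periods}
for label, rate in rates.items():
    print(f"{label}: {rate:.1f} events per 10K procedures")
```

With real denominators, a rising per-10K rate would distinguish a genuine safety trend from growing adoption.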
Limitations
Reporting Bias
Facility reporting is voluntary (except for deaths), which leads to underreporting. Estimates suggest only 1-10% of actual events reach MAUDE. Facilities may also decline to report if the manufacturer convinces them the event was "user error."
Attribution Uncertainty
Manufacturer investigations are self-interested (not independent). "Use-related" conclusions are not verified by FDA. Root cause fields reflect manufacturer claim, not verified fact.
Market Share Confound
This is the biggest limitation. Raw event counts are not comparable between manufacturers with different market share. BD's ~70% mini-midline share means many more devices in the field than Stiletto's <5%. We mitigate this by using rates (injury rate per event) and patterns (failure mode distribution) rather than raw counts.
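One way to compare rates responsibly is with Wilson score intervals, which widen appropriately for small samples, so a low-share product with few events doesn't look artificially "clean." The counts below are hypothetical, not the report's figures:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """95% Wilson score interval for a binomial proportion.

    Deliberately wide for small n: rate comparisons between products
    with very different event counts should check interval overlap,
    not point estimates.
    """
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - margin, center + margin)

# Hypothetical injury counts for illustration:
big_brand = wilson_interval(60, 224)   # many events -> narrow interval
small_brand = wilson_interval(3, 12)   # few events  -> wide interval
print(big_brand, small_brand)
```

Overlapping intervals mean the data cannot distinguish the two injury rates.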
Device Linkage Gap (Resolved)
Device linkage improved from 18.9% to 99.3%. PowerGlide went from 46 to 224 linked events through ghost event backfill and improved matching. This is no longer a limitation.
LLM Extraction Reliability
Structured fields (root cause, failure mode, user error) are extracted by an LLM from narrative text. These are interpretations, not verified facts. We cite sample sizes to make the extraction basis clear.
Injury Severity
Outcome detail varies by report quality. "Injury" is binary (yes/no), not scaled by severity. A minor bruise and a surgical retrieval both count as "injury."
Interpretation Guidelines
- Event counts are proxies -- Higher counts may indicate market success (more devices in field), not just safety issues
- Patterns matter more than absolutes -- Same failure mode across multiple facilities signals design issue, not user error
- User error claims need scrutiny -- Manufacturer investigations are self-interested; look for pattern evidence
- Absence of evidence is not evidence of absence -- Low event counts may reflect low market share, underreporting, or newness
- Temporal spikes require investigation -- Distinguish clinical trial batches from actual field failure increases
- Sample sizes determine confidence -- n=26 gives directional signal; n=500 gives statistical confidence
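The n=26 vs n=500 contrast in the last guideline can be made concrete with the normal-approximation confidence-interval half-width for a proportion, which shrinks like 1/sqrt(n). A sketch using an illustrative 20% failure-mode share:

```python
import math

def ci_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% CI half-width for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# The same 20% estimate at two sample sizes:
for n in (26, 500):
    hw = ci_half_width(0.20, n)
    print(f"n={n}: 20% +/- {hw * 100:.1f} points")
```

At n=26 the estimate carries roughly a +/-15-point margin (directional at best); at n=500 the margin tightens to a few points.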
Reproducibility
All queries can be run against the Spincast ClickHouse instance:
bash
curl -s "https://scanpath-clickhouse.fly.dev/?user=default&password=scanpath2025secure" \
--data "SELECT count() FROM spincast.events"Database Statistics (February 2026)
| Table | Row Count | Coverage |
|---|---|---|
| events | 2,499,000+ | 1992-2026 |
| devices | 45,000+ | Brand names |
| manufacturers | 8,000+ | Company names |
| enforcement_actions | 57,000+ | Recalls, warnings |
| Device linkage | 2,482,000 | 99.3% of events |
| LLM extractions | 981,000 | 39.3% of events |
| Embeddings | 2,250,000 | 87-91% of events |
Transparency Commitment
All claims in this report are traceable to:
- FDA MAUDE database (public)
- CMS Medicare data (public)
- FDA 510(k) clearances (public)
- Clinical trial registry (ClinicalTrials.gov, public)
Independent verification: Any claim can be verified by searching MAUDE directly. All SQL queries are provided for reproducibility. Raw data access available via ClickHouse endpoint.
What this report does:
- Uses rates and patterns, not raw counts
- Includes sample sizes with every claim
- States limitations prominently
- Separates what the data shows from what it doesn't
What this report does NOT do:
- Compare raw event counts between products with different market share
- Use clinical trial spikes as field failure evidence
- Claim Stiletto is "proven safer" (insufficient field data)
- Hide data gaps or limitations
Last updated: February 2026