White paper

How the ArmTrack readiness score works

A transparent, literature-informed model for estimating daily throwing readiness in youth baseball — and an honest path to a data-driven system.

By Milan Chauhan · ArmTrack · armtrack.app


Abstract

Arm injuries in youth baseball are common, costly, and substantially preventable: a large share trace to cumulative overuse rather than a single acute event, yet most young players track nothing day to day. ArmTrack converts a sub-60-second daily self-report — pain, soreness, stiffness, and throwing workload — into an interpretable readiness score (0–10) and a plain-language throwing recommendation, and aggregates a team's scores into one coach view. This paper documents the v1 model: a transparent, weighted, evidence-informed scoring function with workload- and trend-based modifiers, chosen deliberately over a black-box model because at zero user data, interpretability and clinical defensibility matter more than predictive complexity.

1. The problem

Overuse-pattern arm injuries — including UCL tears requiring "Tommy John" reconstruction — have risen sharply among adolescents, with a large fraction of cases concentrated in the 15–19 age band, and fatigue is a well-documented amplifier of risk. The actionable gap is visibility: the early signal (rising soreness, incomplete recovery, climbing workload) is available daily but uncaptured. ArmTrack's thesis is that a frictionless daily measurement, surfaced as one number a player and coach will actually look at, can shift behavior earlier than pitch-count-at-the-game tools that miss the bullpens, lessons, and long-toss that make up most arm stress.

2. Design principles

Interpretability over opacity. With no training data, a learned model would be unvalidated and unexplainable to a 13-year-old or a parent. A transparent weighted model can be reasoned about, audited, and trusted.
Frictionless input. Three 0–10 self-reports and a throw count: a daily log that players actually keep up beats sensor precision that nobody sustains.
Conservative and non-diagnostic. The score is explicitly estimated readiness, never a medical claim; it never says "do not throw," only "throwing not recommended," and always defers to a coach, trainer, or physician.

3. The model

3.1 Per-log base score. Each daily log yields a base score from three signals, weighted by their hypothesized association with tissue stress (pain highest):

weighted  = (pain·3 + soreness·2 + stiffness·1) / 6     # each ∈ [0,10]
baseScore = clamp(10 − weighted, 0, 10)                # higher = more ready

Pain is weighted 3×, soreness 2×, stiffness 1× — encoding the clinical prior that sharp, located pain is a stronger warning than general stiffness.
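As a minimal sketch, the base-score formula above can be written as runnable Python. The function names and the input-range check are illustrative assumptions, not ArmTrack's shipped code; only the weights and the clamp come from the formula.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def base_score(pain: float, soreness: float, stiffness: float) -> float:
    """Per-log base score on a 0-10 scale; higher means more ready.

    Each input is a 0-10 self-report. The range check is an assumption
    added for this sketch; the paper only states the input range.
    """
    for name, value in (("pain", pain), ("soreness", soreness), ("stiffness", stiffness)):
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    # Weights 3 / 2 / 1 sum to 6, so the weighted average stays in [0, 10].
    weighted = (pain * 3 + soreness * 2 + stiffness * 1) / 6
    return clamp(10 - weighted, 0, 10)
```

For example, a log with pain 2, soreness 4, and stiffness 3 gives weighted = 17/6 ≈ 2.83 and a base score of about 7.17.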

3.2 Workload & trend modifiers. The estimate adjusts the latest base score for cumulative load and direction of travel — the overuse signals a single day misses:

score = baseScore(latest)
  − 0.5  if latest throws > 100                 # high single-day volume
  − 0.5  if last two logs both > 75 throws      # back-to-back heavy load
  − 0.5  if pain rose across the last 3 logs    # rising-pain trend
score = clamp(score, 0, 10)
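The modifiers above can be sketched in Python as follows. Several details are assumptions made for illustration: the `DailyLog` type and function names are hypothetical, logs are taken to be ordered oldest to newest, "pain rose across the last 3 logs" is read as strictly increasing pain over three consecutive logs, and the three penalties are allowed to stack.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DailyLog:
    """One daily self-report (hypothetical structure for this sketch)."""
    pain: float       # 0-10
    soreness: float   # 0-10
    stiffness: float  # 0-10
    throws: int       # total throws logged that day


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def base_score(log: DailyLog) -> float:
    weighted = (log.pain * 3 + log.soreness * 2 + log.stiffness * 1) / 6
    return clamp(10 - weighted, 0, 10)


def readiness_score(logs: Sequence[DailyLog]) -> float:
    """Readiness from the latest log, adjusted for workload and pain trend.

    `logs` must be non-empty and ordered oldest to newest (assumption).
    """
    if not logs:
        raise ValueError("at least one log is required")
    latest = logs[-1]
    score = base_score(latest)

    # High single-day volume.
    if latest.throws > 100:
        score -= 0.5

    # Back-to-back heavy load: the last two logs both exceed 75 throws.
    if len(logs) >= 2 and all(log.throws > 75 for log in logs[-2:]):
        score -= 0.5

    # Rising-pain trend, read here as strictly increasing pain (assumption).
    if len(logs) >= 3:
        first, second, third = (log.pain for log in logs[-3:])
        if first < second < third:
            score -= 0.5

    return clamp(score, 0, 10)
```

With fewer than two or three logs, the corresponding trend checks are simply skipped in this sketch, so a new user's score equals the base score less any single-day volume penalty.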

3.3 Banding. The continuous score maps to six interpretable states with position-aware recommendations:

Score        State
≥ 8.5        Ready
7.0 – 8.4    Good to Go
5.5 – 6.9    Proceed with Caution
4.0 – 5.4    Light Day
2.0 – 3.9    Rest Recommended
< 2.0        Throwing Not Recommended
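One way to implement the banding is a lookup on each band's lower bound. This is a sketch under an assumption: because the score is continuous, values that fall between the printed ranges (for example 8.45) need a rule, and here each band is treated as starting at its lower bound, inclusive. The names `BANDS` and `band_for` are hypothetical.

```python
# (lower bound, state), checked from the highest band down.
BANDS = [
    (8.5, "Ready"),
    (7.0, "Good to Go"),
    (5.5, "Proceed with Caution"),
    (4.0, "Light Day"),
    (2.0, "Rest Recommended"),
]


def band_for(score: float) -> str:
    """Map a 0-10 readiness score to one of the six states."""
    for lower_bound, state in BANDS:
        if score >= lower_bound:
            return state
    # Anything below the lowest listed bound.
    return "Throwing Not Recommended"
```

Under this reading, a score of 8.45 maps to "Good to Go" rather than falling between bands.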

3.4 Supporting signals. The system also computes a consecutive-day logging streak (adherence) and a staleness guard (readiness older than two days is flagged as not reflecting "today"), and surfaces up to two prioritized insights (concerning before positive).
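The streak and staleness guard can be sketched as below. The function names are hypothetical, and two details are assumptions: the streak is counted back from the most recent logged day (whether a gap before today resets it is left to the caller), and "older than two days" is read as more than two calendar days between the latest log and today.

```python
from datetime import date, timedelta
from typing import Iterable


def logging_streak(log_dates: Iterable[date]) -> int:
    """Number of consecutive calendar days with a log, ending at the latest log."""
    days = set(log_dates)
    if not days:
        return 0
    current = max(days)
    streak = 0
    while current in days:
        streak += 1
        current -= timedelta(days=1)
    return streak


def is_stale(latest_log: date, today: date, max_age_days: int = 2) -> bool:
    """True when the latest log is too old to describe today's readiness."""
    return (today - latest_log).days > max_age_days
```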

4. Why not machine learning yet?

A supervised model predicting injury would need labeled outcomes — who got hurt, when — across many athlete-seasons, data that does not exist at launch and is ethically and practically slow to collect for minors. Deploying an unvalidated black box for an injury-adjacent decision would be irresponsible. The v1 heuristic is the correct cold-start choice: safe, explainable, and good enough to drive the behavior change — daily attention — that is itself the primary intervention.

5. Validation plan

As data accrues: (1) measure adherence and self-report reliability; (2) test construct validity: does the score track behavior (rest taken after low scores) and "felt-off" days; (3) test predictive validity longitudinally: do declining score trajectories precede reported pain spikes; (4) calibrate the band thresholds against observed outcomes.

6. Roadmap to a learned model

The transparent scorecard becomes the baseline and safety rail for a data-driven successor: re-fit feature weights and band cutoffs to data (logistic/ordinal models, keeping interpretability); add acute:chronic workload ratio (ACWR) and exponentially-weighted load features; personalize via per-athlete baselines (hierarchical models); and, once trajectories are rich, evaluate time-series methods — always benchmarked against, and constrained by, the explainable baseline that earns user trust.
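To make the proposed load features concrete, here is a hedged sketch of a rolling-average ACWR and an exponentially weighted load. None of this is part of v1. The 7-day acute and 28-day chronic windows and the decay constant 2 / (span + 1) are common conventions chosen here as assumptions, and the function names are hypothetical. Inputs are daily throw counts ordered oldest to newest, with rest days entered as 0.

```python
from typing import Optional, Sequence


def rolling_acwr(
    daily_throws: Sequence[float],
    acute_days: int = 7,
    chronic_days: int = 28,
) -> Optional[float]:
    """Acute:chronic workload ratio from rolling daily averages.

    Returns None when there is less than one chronic window of history
    or when the chronic load is zero (ratio undefined).
    """
    if len(daily_throws) < chronic_days:
        return None
    acute = sum(daily_throws[-acute_days:]) / acute_days
    chronic = sum(daily_throws[-chronic_days:]) / chronic_days
    if chronic == 0:
        return None
    return acute / chronic


def ewma_load(daily_throws: Sequence[float], span_days: int) -> float:
    """Exponentially weighted load, seeded with the first day's value."""
    if not daily_throws:
        raise ValueError("at least one day of load is required")
    alpha = 2 / (span_days + 1)  # more recent days get more weight
    ewma = float(daily_throws[0])
    for load in daily_throws[1:]:
        ewma = alpha * load + (1 - alpha) * ewma
    return ewma
```

For instance, 21 days at 40 throws followed by 7 days at 80 throws gives an acute average of 80, a chronic average of 50, and a ratio of 1.6. How such a ratio should feed into the score is a question for the validation work, not something this sketch settles.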

7. Limitations

Subjective self-report is noisy and gameable; the v1 weights and modifiers are expert priors, not empirically fit; the score is not a medical device and does not diagnose, treat, or prevent injury. The model's value at this stage is behavioral — it makes the invisible visible daily — with predictive rigor as an explicit, staged goal.

8. Conclusion

ArmTrack v1 is a deliberately transparent, literature-informed readiness model shipped in a free iOS/web product for a population where overuse injury is common and largely preventable. Its contribution is not algorithmic novelty but the discipline to make a defensible daily measurement frictionless and usable — and a clear, honest path from an explainable cold-start heuristic to a validated, data-driven system.


Informed by USA Baseball / MLB Pitch Smart guidance and published work on pitching fatigue and adolescent UCL injury. ArmTrack provides estimated readiness based on self-reported data; it is not medical advice and is not a substitute for a coach, athletic trainer, or physician.