Background: The Dog Aging Project (DAP) collects longitudinal veterinary records from client-owned dogs. Dates with matched body weight (BW) are essential for clinical studies, but manual extraction from heterogeneous PDF formats is labor-intensive.
Hypothesis/Objective: To compare the weighted accuracy (WA) of rule-based versus large language model (LLM)-based automated extraction of dates with matched BW from veterinary records.
Animals: 59 PDF veterinary records containing 559 paired dates and BW values from 49 dogs enrolled in the DAP.
Methods: A pilot set of 9 medical records in 9 formats was used to develop the extraction rules for the rule-based algorithm and the prompts for VetRec (a commercial veterinary LLM), targeting ≥80% accuracy (p = 0.17). Next, 50 additional records, including 14 formats not seen in the pilot set, were analyzed using both methods. Automated outputs were compared against ground truth verified by a veterinarian and the DAP project manager. WA was calculated as 3*(exact match) / [3*(exact match) + 3*(partial match) + 3*(hallucination) + 2*(omission) + 1*(non-compliance)].
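The WA formula above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's implementation: the weights come from the formula as stated, while the function name, the dictionary keys, and the example counts are assumptions made for illustration and are not study data.

```python
# Weights as given in the WA formula; the key names are illustrative.
WEIGHTS = {
    "exact_match": 3,
    "partial_match": 3,
    "hallucination": 3,
    "omission": 2,
    "non_compliance": 1,
}


def weighted_accuracy(counts):
    """Return WA for a dict of outcome counts; missing categories count as 0."""
    unknown = set(counts) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown outcome categories: {sorted(unknown)}")
    # Only exact matches contribute to the numerator.
    numerator = WEIGHTS["exact_match"] * counts.get("exact_match", 0)
    # Every outcome contributes to the denominator with its own weight.
    denominator = sum(w * counts.get(name, 0) for name, w in WEIGHTS.items())
    if denominator == 0:
        raise ValueError("no outcomes to score")
    return numerator / denominator


# Hypothetical counts for one method (NOT taken from the study).
example = {
    "exact_match": 8,
    "partial_match": 1,
    "hallucination": 1,
    "omission": 2,
    "non_compliance": 2,
}
print(f"{weighted_accuracy(example):.3f}")  # 24 / 36
```

Because exact matches appear in both numerator and denominator with the same weight, WA equals 1 only when every data point is an exact match; omissions and non-compliant outputs lower the score less than partial matches or hallucinations do.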
Results: Across all 559 data points, VetRec achieved a WA of 83%, compared with 51% for the rule-based method (p < 0.001). The rule-based method frequently misassigned non-measurement dates (e.g., scan dates, dates of birth) to nearby BW values based on spatial proximity. VetRec largely avoided this contextual error but generated 46 non-compliant outputs, an error type that deterministic rules inherently avoid.
Conclusion and Clinical Importance: LLM-based extraction significantly outperformed the rule-based method for automated extraction of veterinary data, offering a scalable approach to handling heterogeneous medical record formats across institutions.