Abstract | Objectives: This study aimed to evaluate the diagnostic accuracy of machine learning
models in predicting acute appendicitis (AA) in pediatric patients, and to assess whether model
performance improves when the original dataset is restricted to a compressed subset that
includes only patients with a recorded total bilirubin level.
Material and methods: This study was designed as a single-center retrospective cohort study.
The dataset comprises 551 pediatric patients who underwent appendectomy between January
2019 and July 2023. A subset of 297 patients was created that includes only those with a total
bilirubin level among their laboratory findings. Random Forest and Logistic Regression models
were evaluated using a nested cross-validation approach with stratification on the target
variable Bilirubin. Feature importance was calculated with Random Forest as the Mean
Decrease Impurity (MDI), that is, by averaging the reduction in impurity contributed by each
feature over all trees in the forest.
Results: The subset of patients with a laboratory value for total bilirubin comprised 297
patients, described by fourteen variables representing patient characteristics. For Logistic
Regression, the mean precision for identifying negative cases was 0.26 (95% CI 0.1 to 0.57),
the mean recall was 0.41 (95% CI 0.1 to 0.833), and the mean F1-score was 0.31 (95% CI 0.1
to 0.60). The Random Forest model outperformed the Logistic Regression model, with a mean
precision of 0.416 (95% CI 0.12 to 0.71), a mean recall of 0.697 (95% CI 0.33 to 1.0), and a
mean F1-score of 0.508 (95% CI 0.18 to 0.75). The mean AUC for Random Forest was 0.83
(95% CI 0.70 to 0.95). The feature importance analysis ranked total bilirubin level ninth of the
fourteen features, with CRP and leukocyte count being the two most important. The negative
appendicitis group had a mean total bilirubin value of 9.21 (SD 3.8), while the positive
appendicitis group had a notably higher mean total bilirubin value of 16.07 (SD 11.9). This
difference between the two groups was statistically significant (p = 0.001).
Conclusions: As already shown in the study of the original dataset, the findings from our
subset demonstrated the superiority of Random Forest over Logistic Regression, with higher
precision and recall. While bilirubin is a useful marker for diagnosing severe cases of AA when
combined with a proper history and examination, incorporating it into our study did not
significantly enhance the model’s predictive accuracy. This can be attributed to its
importance in identifying patients at high risk for perforation and gangrene, rather than simple
AA cases. |
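The Mean Decrease Impurity measure described in the methods can be illustrated with a minimal sketch. It assumes Gini impurity as the split criterion and per-tree normalisation of the importances before averaging, which is a common convention but is not confirmed by the abstract. The feature names, the toy splits, and the helper functions `gini`, `impurity_decrease`, and `mean_decrease_impurity` are hypothetical and do not reproduce the study's data or code.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right, n_total):
    """Impurity reduction of one split, weighted by the share of samples at the node."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - children)


def mean_decrease_impurity(forest):
    """Average the per-feature impurity decrease over all trees.

    `forest` is a list of trees; each tree is a list of splits given as
    (feature_name, parent_labels, left_labels, right_labels), root split first.
    Importances are normalised to sum to 1 within each tree (an assumption).
    """
    totals = Counter()
    for tree in forest:
        n_total = len(tree[0][1])  # the root node holds every training sample
        per_tree = Counter()
        for feature, parent, left, right in tree:
            per_tree[feature] += impurity_decrease(parent, left, right, n_total)
        norm = sum(per_tree.values())
        for feature, value in per_tree.items():
            totals[feature] += value / norm if norm > 0 else 0.0
    return {feature: value / len(forest) for feature, value in totals.items()}


# Hypothetical toy forest: label 1 = appendicitis, label 0 = negative appendectomy.
toy_forest = [
    [
        ("crp", [1, 1, 1, 1, 0, 0], [0, 0, 1], [1, 1, 1]),
        ("bilirubin", [0, 0, 1], [0, 0], [1]),
    ],
    [
        ("crp", [1, 1, 1, 1, 0, 0], [0, 0], [1, 1, 1, 1]),
    ],
]

importances = mean_decrease_impurity(toy_forest)
print(importances)
```

In this toy forest the hypothetical `crp` feature receives roughly 0.75 of the total importance and `bilirubin` roughly 0.25, because `crp` accounts for half of the impurity reduction in the first tree and all of it in the second.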
Abstract (croatian) | Objectives: This study aims to evaluate diagnostic accuracy and to improve the performance of
machine learning in predicting appendicitis in children, comparing the original dataset with
our compressed subset that includes only children with a total bilirubin level.
Materials and methods: This is a single-center retrospective cohort study. It includes 551
children who underwent appendectomy between January 2019 and July 2023, with a subset of
297 patients for whom total bilirubin data were available. Random Forest and logistic
regression models were used with nested cross-validation stratified on the variable bilirubin.
Feature importance was estimated using Mean Decrease Impurity (MDI) in the Random Forest
model.
Results: A subset was created that includes only patients with laboratory values of total
bilirubin, comprising 297 children with fourteen different variables as patient characteristics.
For logistic regression, the mean precision for identifying negative cases was 0.26 (95% CI 0.1
to 0.57), while the mean sensitivity was 0.41 (95% CI 0.1 to 0.833). The mean F1-score was
0.31 (95% CI 0.1 to 0.60). The Random Forest model outperformed the logistic regression
model, with a mean precision of 0.416 (95% CI 0.12 to 0.71) and a mean sensitivity of 0.697
(95% CI 0.33 to 1.0). The mean F1-score was 0.508 (95% CI 0.18 to 0.75). The mean AUC for
Random Forest was 0.83 (95% CI 0.70 to 0.95). The feature importance analysis showed that
total bilirubin level ranked ninth of fourteen variables, while CRP and leukocyte count were
the two most important features. The negative appendectomy group had a mean total bilirubin
value of 9.21 (SD 3.8), while the group with a positive finding of acute appendicitis had a
significantly higher mean total bilirubin value of 16.07, with a larger standard deviation of 11.9
(p = 0.001).
Conclusions: Our results confirm the superiority of the Random Forest model over logistic
regression in predicting acute appendicitis. Although total bilirubin can be useful in
diagnosing more severe forms of acute appendicitis, its inclusion did not significantly improve
the model’s predictive accuracy for simple cases of appendicitis, but rather for identifying
patients at high risk of complications such as perforation and gangrene. |
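The reported F1-scores can be related to the reported precision and recall through the harmonic mean. The sketch below is illustrative only: it assumes that the Random Forest figures refer to the same class as the Logistic Regression figures, which the abstract does not state explicitly. The helper `f1_score` is a hypothetical name, not part of the study's code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mean precision, recall, and F1 as reported in the abstract.
reported = {
    "Logistic Regression": {"precision": 0.26, "recall": 0.41, "mean_f1": 0.31},
    "Random Forest": {"precision": 0.416, "recall": 0.697, "mean_f1": 0.508},
}

for model, metrics in reported.items():
    f1_of_means = f1_score(metrics["precision"], metrics["recall"])
    print(
        f"{model}: F1 of mean precision/recall = {f1_of_means:.3f}, "
        f"reported mean F1 = {metrics['mean_f1']:.3f}"
    )
```

The F1 computed from the mean precision and recall is close to, but not identical with, the reported mean F1. A small gap of this kind would be expected if F1 was computed within each cross-validation fold and then averaged, since the mean of per-fold harmonic means generally differs from the harmonic mean of the fold averages.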