International Classification of Diseases (ICD) code–based claims databases are often used to study infective endocarditis (IE). However, the quality of ICD coding can influence the reliability of IE research. The impact of complementing the ICD-only approach with data extracted from electronic medical records (EMRs) has yet to be explored.


We identified adult patients with discharge ICD codes for IE (ICD-9: 421, 112.81, 036.42, 098.84, 115.04, 115.14, 115.94, 424.9; ICD-10: I33, I38, I39) at China Medical University Hospital during 2005–2016. Data were extracted from EMRs on the basis of the modified Duke criteria to establish a reference group comprising patients with definite or possible IE. Clinical characteristics and in-hospital mortality were compared between ICD-identified and Duke-confirmed cases. The positive predictive value (PPV) was used to quantify the IE identification performance of various phenotyping algorithms.
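As an illustration of how the PPV of a phenotyping algorithm can be computed against a Duke-based reference, the following is a minimal sketch rather than the study's actual implementation. The `Admission` record, the rule functions, the toy records, and the choice of a Wilson score interval are all assumptions made for this example; in particular, requiring both a positive blood culture and vegetation is only one possible way of integrating the major Duke components with ICD codes.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Iterable, Tuple


@dataclass
class Admission:
    """Hypothetical record layout; field names are illustrative only."""
    has_ie_icd: bool              # discharge ICD code for IE present
    positive_blood_culture: bool  # major Duke component taken from the EMR
    vegetation: bool              # major Duke component taken from the EMR
    duke_confirmed: bool          # reference standard: definite or possible IE


def wilson_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (assumed CI method)."""
    if n == 0:
        raise ValueError("no flagged admissions; PPV is undefined")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def ppv(admissions: Iterable[Admission],
        rule: Callable[[Admission], bool]) -> Tuple[float, float, float]:
    """Return (PPV, lower, upper) for the admissions flagged by `rule`."""
    flagged = [a for a in admissions if rule(a)]
    true_pos = sum(a.duke_confirmed for a in flagged)
    lower, upper = wilson_ci(true_pos, len(flagged))
    return true_pos / len(flagged), lower, upper


def icd_only(a: Admission) -> bool:
    return a.has_ie_icd


def icd_plus_major(a: Admission) -> bool:
    # Assumption: both major components are required alongside the ICD code.
    return a.has_ie_icd and a.positive_blood_culture and a.vegetation


# Toy records for demonstration only; these are not study data.
toy = [
    Admission(True, True, True, True),
    Admission(True, True, True, True),
    Admission(True, True, False, True),
    Admission(True, False, False, False),
    Admission(True, False, True, False),
    Admission(True, False, False, False),
]

for name, rule in [("ICD only", icd_only), ("ICD + major Duke", icd_plus_major)]:
    value, lower, upper = ppv(toy, rule)
    print(f"{name}: PPV={value:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Expressing each algorithm as a rule function keeps the comparison uniform: every candidate is scored against the same Duke-confirmed reference, so differences in PPV reflect the rule alone.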


A total of 593 patients with discharge ICD codes for IE were identified, of whom only 56.7% met the modified Duke criteria. The crude in-hospital mortality rates for Duke-confirmed and Duke-rejected IE were 24.4% and 8.2%, respectively. The adjusted in-hospital mortality for ICD-identified IE was 5.1% lower than that for Duke-confirmed IE. The best PPV (0.90, 95% CI 0.86–0.93) was achieved when major components of the Duke criteria (positive blood culture and vegetation) were integrated with ICD codes.


Integrating EMR data can considerably improve the accuracy of ICD-only approaches in phenotyping IE, thereby strengthening the validity of EMR-based studies and their applications, including real-time surveillance and clinical decision support.

This work is licensed under a Creative Commons Attribution 4.0 License.