4th International Symposium on Languages in Biology and Medicine
Analyzing Disagreements among ICD-9-CM Coders


NLP researchers find it difficult to acquire and interpret clinical free text directly, most likely because they are unfamiliar with medical practices. Publicly available annotated corpora would therefore be of much help, but very few exist in the clinical domain because of patient confidentiality. In this regard, it is encouraging that the Computational Medicine Center's 2007 Challenge provides a publicly available corpus of radiology reports with ICD-9-CM codes assigned independently by three different coders. However, the corpus shows many disagreements among the coders, which makes it imperative to set a correct standard for interpreting the annotations properly. The standard implicitly advanced by the corpus developers is to take the majority annotation. In this paper, we propose an alternative method to address such disagreements. We believe our work not only meaningfully improves the utility of this corpus but also has useful implications for similar tasks, such as ICD-10-CM coding.
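
To make the baseline concrete, the following is a minimal sketch of one common reading of "majority annotation": a code is kept for a report when more than half of the coders assigned it. The function name majority_codes and the example code sets are illustrative assumptions, not taken from the corpus or from the paper, and the alternative method proposed in the paper is not shown here.

```python
from collections import Counter


def majority_codes(annotations):
    """Return the codes assigned by more than half of the coders.

    `annotations` is a list with one collection of codes per coder.
    This per-code vote is an assumed reading of "majority annotation".
    """
    # Count each code at most once per coder.
    counts = Counter(
        code for coder_codes in annotations for code in set(coder_codes)
    )
    threshold = len(annotations) / 2
    return sorted(code for code, n in counts.items() if n > threshold)


# Illustrative (made-up) annotations from three coders for one report.
example = [
    {"786.2"},
    {"786.2", "780.6"},
    {"780.6", "486"},
]

if __name__ == "__main__":
    print(majority_codes(example))
```

With three coders, a code survives under this rule only if at least two of them assigned it, so a code chosen by a single coder is dropped even when that coder was right, which is one way such a baseline can misrepresent the disagreements.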

Additional Files

Additional File 1: 718 Codes.
Each line consists of a report ID and a code separated by a tab.
Format: Plain. File Size: 11KB.
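
A file in the format described above (one report ID and one code per line, separated by a tab) could be read along the following lines. This is a sketch only: the function name parse_code_lines and the sample report IDs and codes are hypothetical, and the real file may contain details that this sketch does not handle.

```python
from collections import defaultdict


def parse_code_lines(lines):
    """Group codes by report ID from tab-separated 'report_id<TAB>code' lines."""
    codes_by_report = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        report_id, code = line.split("\t", 1)
        codes_by_report[report_id.strip()].append(code.strip())
    return dict(codes_by_report)


# Hypothetical sample lines; the IDs and codes are made up for illustration.
sample = ["R001\t786.2\n", "R001\t780.6\n", "\n", "R002\t593.70\n"]

if __name__ == "__main__":
    for report_id, codes in parse_code_lines(sample).items():
        print(report_id, codes)
```

To read from disk, an open file object can be passed directly, for example parse_code_lines(open(path, encoding="utf-8")), where path is wherever the downloaded file was saved.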

Page maintained by Seung-Cheol Baek
Last modified: October 31, 2011