Table 4 Data file contents and counts for annotation hierarchy subtasks.

From: The TREC 2004 genomics track categorization task: classifying full text biomedical documents

File contents                                                          Training data count  Test data count
Documents – PMIDs                                                      504                  378
Genes – Gene symbol, MGI identifier, and gene name for all genes used  1294                 777
Document gene pairs – PMID-gene pairs                                  1418                 877
Positive examples – PMIDs                                              178                  149
Positive examples – PMID-gene pairs                                    346                  295
Positive examples – PMID-gene-domain tuples                            589                  495
Positive examples – PMID-gene-domain-evidence tuples                   640                  522
Positive examples – all PMID-gene-GO-evidence tuples                   872                  693
Negative examples – PMIDs                                              326                  229
Negative examples – PMID-gene pairs                                    1072                 582
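The positive and negative rows add up to the totals: 178 + 326 = 504 training PMIDs and 149 + 229 = 378 test PMIDs; likewise 346 + 1072 = 1418 training and 295 + 582 = 877 test PMID-gene pairs. This is consistent with each document (and each PMID-gene pair) falling into exactly one of the two classes. The short Python sketch below checks this arithmetic using only the counts transcribed from the table; the dictionary keys and the `partitions` helper are hypothetical names chosen for illustration, not part of the track's data files.

```python
# Counts transcribed from Table 4 as (training, test) pairs.
# The key names are illustrative labels, not official file names.
counts = {
    "documents": (504, 378),
    "doc_gene_pairs": (1418, 877),
    "pos_pmids": (178, 149),
    "neg_pmids": (326, 229),
    "pos_pairs": (346, 295),
    "neg_pairs": (1072, 582),
}


def partitions(total, pos, neg):
    """For each split (training, test), report whether positives + negatives == total."""
    return [p + n == t for t, p, n in zip(counts[total], counts[pos], counts[neg])]


print(partitions("documents", "pos_pmids", "neg_pmids"))       # [True, True]
print(partitions("doc_gene_pairs", "pos_pairs", "neg_pairs"))  # [True, True]
```

The same check does not apply to the tuple rows, since a single PMID-gene pair can contribute several domain, evidence, or GO tuples.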