Table 4 Data file contents and counts for annotation hierarchy subtasks.

From: The TREC 2004 genomics track categorization task: classifying full text biomedical documents

| File contents | Training data count | Test data count |
|---|---|---|
| Documents – PMIDs | 504 | 378 |
| Genes – Gene symbol, MGI identifier, and gene name for all genes used | 1294 | 777 |
| Document gene pairs – PMID-gene pairs | 1418 | 877 |
| Positive examples – PMIDs | 178 | 149 |
| Positive examples – PMID-gene pairs | 346 | 295 |
| Positive examples – PMID-gene-domain tuples | 589 | 495 |
| Positive examples – PMID-gene-domain-evidence tuples | 640 | 522 |
| Positive examples – all PMID-gene-GO-evidence tuples | 872 | 693 |
| Negative examples – PMIDs | 326 | 229 |
| Negative examples – PMID-gene pairs | 1072 | 582 |
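
The counts in Table 4 can be cross-checked with a little arithmetic. The sketch below assumes, as the numbers suggest but the table does not state outright, that positive and negative examples together make up the full set of documents and of document-gene pairs. The dictionary keys and the helper `is_partition` are illustrative names, not part of the track's data files.

```python
# Counts transcribed from Table 4 as (training, test).
# Key names are hypothetical labels chosen for this sketch.
counts = {
    "documents": (504, 378),
    "document_gene_pairs": (1418, 877),
    "positive_pmids": (178, 149),
    "positive_pairs": (346, 295),
    "negative_pmids": (326, 229),
    "negative_pairs": (1072, 582),
}


def is_partition(total, positive, negative):
    """Check positive + negative == total for the training and test columns.

    Returns a (training_ok, test_ok) tuple of booleans.
    """
    return tuple(
        p + n == t
        for t, p, n in zip(counts[total], counts[positive], counts[negative])
    )


# Assumption: every document (or pair) is either a positive or a negative example.
pmids_ok = is_partition("documents", "positive_pmids", "negative_pmids")
pairs_ok = is_partition("document_gene_pairs", "positive_pairs", "negative_pairs")

print("PMIDs consistent (training, test):", pmids_ok)
print("PMID-gene pairs consistent (training, test):", pairs_ok)
```

Both checks hold for the training and test columns: 178 + 326 = 504 and 149 + 229 = 378 for PMIDs, and 346 + 1072 = 1418 and 295 + 582 = 877 for PMID-gene pairs.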