From: The TREC 2004 genomics track categorization task: classifying full text biomedical documents
File contents | Training data count | Test data count |
---|---|---|
Documents – PMIDs | 504 | 378 |
Genes – Gene symbol, MGI identifier, and gene name for all genes used | 1294 | 777 |
Document gene pairs – PMID-gene pairs | 1418 | 877 |
Positive examples – PMIDs | 178 | 149 |
Positive examples – PMID-gene pairs | 346 | 295 |
Positive examples – PMID-gene-domain tuples | 589 | 495 |
Positive examples – PMID-gene-domain-evidence tuples | 640 | 522 |
Positive examples – all PMID-gene-GO-evidence tuples | 872 | 693 |
Negative examples – PMIDs | 326 | 229 |
Negative examples – PMID-gene pairs | 1072 | 582 |
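
The positive and negative rows should partition the document and document-gene-pair totals in each split. The following is a minimal sketch of that arithmetic check, using only the counts in the table; the dictionary keys and the helper names `split_is_consistent` and `positive_rate` are hypothetical and are not part of the track's data files.

```python
# Counts copied from the table above, as (training, test) pairs.
# The key names are illustrative labels, not field names from the track data.
counts = {
    "documents": (504, 378),
    "document_gene_pairs": (1418, 877),
    "positive_pmids": (178, 149),
    "positive_pairs": (346, 295),
    "negative_pmids": (326, 229),
    "negative_pairs": (1072, 582),
}


def split_is_consistent(total, positive, negative):
    """Return True if positives plus negatives equal the total in both splits."""
    return all(
        p + n == t
        for t, p, n in zip(counts[total], counts[positive], counts[negative])
    )


def positive_rate(total, positive):
    """Return the fraction of positive examples as a (training, test) pair."""
    return tuple(p / t for t, p in zip(counts[total], counts[positive]))


if __name__ == "__main__":
    print(split_is_consistent("documents", "positive_pmids", "negative_pmids"))
    print(split_is_consistent("document_gene_pairs", "positive_pairs", "negative_pairs"))
    print(positive_rate("documents", "positive_pmids"))
```

Both checks hold for the counts as listed, and the document-level positive rate is roughly 35% in the training data and 39% in the test data.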