Comparison of LeMeDISCO's J-score with the XD score, NG, SAB score and symptom similarity in terms of correlation with comorbidity (quantified by the log(RR) score and the φ-score) and recall.ᵃ
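| Method (data set) | Log(RR) score, unbinnedᵇ | φ-score, unbinnedᵇ | Log(RR) score, 10 binsᵇ | φ-score, 10 binsᵇ | Recall |
|---|---|---|---|---|---|
| *Data set of 198,149 disease pairs*ᶜ | | | | | |
| LeMeDISCO | 0.312 (0.0) | 0.218 (0.0) | 0.933 (8.1 × 10⁻⁵) | 0.900 (3.9 × 10⁻⁴) | 49.7% |
| *Consensus set of 29,783 pairs with Ref. 26*ᵈ | | | | | |
| LeMeDISCO | **0.185** (0.0) | **0.138** (0.0) | **0.939** (5.6 × 10⁻⁵) | **0.829** (3.0 × 10⁻³) | **56.0%** |
| XD score²⁶ | 0.050 (5.9 × 10⁻¹⁸) | 0.082 (0.0) | 0.445 (0.20) | 0.252 (0.48) | 6.5% |
| NGᵉ | 0.008 (0.17) | 0.058 (1.3 × 10⁻²³) | -0.436 | -0.175 | - |
| *Consensus set of 947 pairs with Ref. 28*ᶠ | | | | | |
| LeMeDISCO | **0.217** (1.5 × 10⁻¹¹) | **0.282** (9.0 × 10⁻¹⁹) | **0.682** (0.030) | **0.688** (0.028) | **75.8%** |
| SAB score²⁸ | -0.188 (5.5 × 10⁻⁹) | -0.218 (1.2 × 10⁻¹¹) | -0.671 (0.034) | -0.473 (0.17) | 8.5% |
| *Consensus set of 2,630 pairs with Ref. 27*ᵍ | | | | | |
| LeMeDISCO | 0.184 (1.9 × 10⁻²¹) | 0.196 (3.5 × 10⁻²⁴) | 0.774 (8.6 × 10⁻³) | 0.654 (0.040) | 71.1% |
| Symptom similarity²⁷ | **0.337** (0.0) | **0.197** (1.6 × 10⁻²⁴) | **0.950** (2.6 × 10⁻⁵) | **0.960** (1.1 × 10⁻⁵) | **100%** |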
a Numbers in parentheses are the p-values of the corresponding correlations. Bold indicates the best result for the given data set.
b Unbinned: raw data, with each disease pair as one data point. 10 bins: the prediction scores are partitioned into 10 equal-size bins, and the log(RR) and φ-score are averaged over the data points in each bin. This gives equal weight to rare prediction scores in the correlation analysis.
c Mapping the DOID IDs from the Human Disease Ontology (DO) database to the ICD9 IDs of Ref. 25 gives a set of 198,149 disease pairs.
d The ICD9 disease codes were mapped to our DOIDs of DO, yielding a consensus subset of 29,783 pairs from the Table S0 dataset of 97,665 pairs in Ref. 26.
e NG is the number of shared genes between disease pairs in Ref. 26.
f Consensus set of 947 disease pairs shared between the dataset of Ref. 28 and our dataset of 198,149 pairs.
g A consensus dataset of 2,630 disease pairs was obtained by comparing Supplementary Dataset 4 of Ref. 27 with our set of 198,149 pairs.
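The unbinned versus 10-bin comparison in footnote b can be illustrated with a minimal sketch. It assumes that the correlation is a Pearson coefficient (the table does not name the statistic), that "equal-size" bins are equal-width intervals over the prediction-score range (consistent with giving rare scores equal weight), and that each bin is represented by the mean prediction score of its members. The helper names `pearson` and `binned_means` are hypothetical, and p-values are not computed here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (needs >= 2 points and nonzero variance)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def binned_means(scores, targets, n_bins=10):
    """Average prediction scores and comorbidity values within equal-width score bins.

    Assumption (not from the source): bins are equal-width over [min, max] of the
    scores, each bin is represented by its mean score, and empty bins are dropped.
    """
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # guard against all scores being identical
    sums = [[0.0, 0.0, 0] for _ in range(n_bins)]  # [score sum, target sum, count]
    for s, t in zip(scores, targets):
        k = min(int((s - lo) / width), n_bins - 1)  # the maximum score falls in the last bin
        sums[k][0] += s
        sums[k][1] += t
        sums[k][2] += 1
    xs = [sx / c for sx, _, c in sums if c]
    ys = [st / c for _, st, c in sums if c]
    return xs, ys


if __name__ == "__main__":
    # Toy data for illustration only; these are not values from the table.
    scores = [0.0, 0.04, 0.55, 1.0]  # hypothetical J-scores
    log_rr = [1.0, 3.0, 5.0, 7.0]    # hypothetical log(RR) values
    print("unbinned r:", pearson(scores, log_rr))
    xs, ys = binned_means(scores, log_rr)
    print("10-bin r:", pearson(xs, ys))
```

Because each non-empty bin contributes one point after averaging, a handful of pairs with rare, high scores carry as much weight as a bin holding thousands of low-score pairs. This is why the binned correlations in the table are much larger than the unbinned ones.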