Program:
Our approach has been implemented in both C and Perl. The pre-compiled executable files are available here:
Data Sets:
The protein-domain relationships for each protein in S. cerevisiae, C. elegans and D. melanogaster are extracted from PFAM and SMART.
D. melanogaster proteins are represented by the CG symbols, C. elegans proteins are represented by the gene IDs, and S. cerevisiae proteins are represented by the ORF IDs.
The high-throughput yeast two-hybrid (Y2H) data from three organisms, S. cerevisiae, C. elegans and D. melanogaster, are used to infer domain-domain interaction probabilities in our study.
Results:
We compute domain-domain interaction probabilities from Y2H protein-protein interactions, and then use these domain-domain interaction probabilities to compute the interaction probability between every pair of proteins. The prediction results with a false positive rate fp = 3E-4 and a false negative rate fn = 0.85 are listed below.
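As a rough illustration of the second step, the sketch below turns domain-domain interaction probabilities into a protein-protein interaction probability. It assumes the commonly used independence model, in which two proteins interact if at least one of their domain pairs interacts, and it assumes the observed Y2H interaction probability is obtained by mixing in the fp and fn rates. The function names, the dictionary layout and the example domain labels are hypothetical and are not taken from the distributed C or Perl programs.

```python
from itertools import product


def protein_interaction_prob(domains_a, domains_b, lam):
    """Probability that two proteins interact, given their domain lists.

    Assumed model: domain pairs interact independently, so
    P(interact) = 1 - prod(1 - lambda_mn) over all domain pairs (m, n).
    `lam` maps a sorted (domain, domain) tuple to its interaction probability;
    pairs missing from `lam` are treated as probability 0.
    """
    no_interaction = 1.0
    for m, n in product(domains_a, domains_b):
        key = (m, n) if m <= n else (n, m)
        no_interaction *= 1.0 - lam.get(key, 0.0)
    return 1.0 - no_interaction


def observed_interaction_prob(p_true, fn=0.85, fp=3e-4):
    """Probability of observing an interaction in a noisy experiment (assumed form)."""
    return p_true * (1.0 - fn) + (1.0 - p_true) * fp


# Hypothetical domain labels and probabilities, for illustration only.
lam = {("D1", "D2"): 0.5, ("D2", "D3"): 0.2}
p = protein_interaction_prob(["D1", "D3"], ["D2"], lam)
print(p, observed_interaction_prob(p))
```

Under this assumed model, a protein pair would be reported as interacting when its probability exceeds a chosen threshold.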
We count the matches between the MIPS data and our predictions for different values of the false negative and false positive rates. The results are listed in the following tables:
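Counting matches of this kind amounts to intersecting two sets of unordered protein pairs. The following is a minimal sketch, assuming each data set is a list of (protein, protein) tuples in which the order within a pair does not matter. The function names and the ORF-style identifiers are illustrative only.

```python
def normalize(pairs):
    """Represent each pair as a frozenset so (a, b) and (b, a) compare equal."""
    return {frozenset(pair) for pair in pairs}


def count_matches(predicted, reference):
    """Number of predicted pairs that also appear in the reference set."""
    return len(normalize(predicted) & normalize(reference))


# Illustrative identifiers, not actual predictions or MIPS entries.
predicted = [("YAL001C", "YBR123C"), ("YCR002C", "YDL003W")]
reference = [("YBR123C", "YAL001C"), ("YEL004W", "YFL005W")]
print(count_matches(predicted, reference))
```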
We compare the specificity and sensitivity of the predictions for different values of the false negative and false positive rates; the results are plotted as ROC curves.
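The points of such a curve can be obtained by sweeping the probability threshold. The sketch below assumes the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), which may differ from the definitions used in the study. The function name and the toy scores are hypothetical.

```python
def roc_points(scores, positives, thresholds):
    """Return (threshold, sensitivity, specificity) for each threshold.

    `scores` maps a protein pair to its predicted interaction probability;
    `positives` is the set of pairs treated as true interactions.
    """
    points = []
    for t in thresholds:
        n_tp = n_fp = n_tn = n_fn = 0
        for pair, score in scores.items():
            predicted = score >= t
            actual = pair in positives
            if predicted and actual:
                n_tp += 1
            elif predicted:
                n_fp += 1
            elif actual:
                n_fn += 1
            else:
                n_tn += 1
        sensitivity = n_tp / (n_tp + n_fn) if (n_tp + n_fn) else 0.0
        specificity = n_tn / (n_tn + n_fp) if (n_tn + n_fp) else 0.0
        points.append((t, sensitivity, specificity))
    return points


# Toy scores for four hypothetical pairs; "a" and "c" are the true interactions.
scores = {"a": 0.9, "b": 0.4, "c": 0.2, "d": 0.05}
for point in roc_points(scores, {"a", "c"}, [0.1, 0.5]):
    print(point)
```

Plotting sensitivity against 1 - specificity over a fine grid of thresholds gives one ROC curve per (fn, fp) setting.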
We compute the gene expression correlation for each predicted interacting protein pair (fn = 0.85, fp = 3E-4, threshold = 0.1) and compare the predictions against randomly chosen pairs and MIPS data. We use two publicly available gene expression datasets for the calculation. One is a time course study during the yeast cell cycle (Spellman et al. 1998) and the other is the Rosetta "compendium" set (Hughes et al. 2000).
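The sketch below shows one way this correlation could be computed, assuming the Pearson correlation coefficient between the two genes' expression profiles; the correlation measure is an assumption here. The function names and the toy expression profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def mean_pair_correlation(pairs, expression):
    """Average correlation over the pairs that have expression data for both genes."""
    values = [
        pearson(expression[a], expression[b])
        for a, b in pairs
        if a in expression and b in expression
    ]
    return sum(values) / len(values) if values else float("nan")


# Toy profiles for three hypothetical genes.
expression = {"g1": [1, 2, 3], "g2": [2, 4, 6], "g3": [3, 2, 1]}
print(mean_pair_correlation([("g1", "g2"), ("g1", "g3")], expression))
```

Applying the same function to the predicted pairs, to randomly chosen pairs and to the MIPS pairs gives three correlation distributions that can be compared directly.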
We compare our prediction method with three other methods (the sequence-signature method, the attraction-only model and the attraction-repulsion model) based on the training data obtained from S. cerevisiae only. The sensitivity and specificity of each prediction method obtained from one cross-validation experiment are listed in the following tables.
We compare our prediction based on complete interaction data with that based on core interaction data.
Created Date: March 28, 2005
Email: Hongyu Zhao