A New Method to Measure the Semantic Similarity of GO Terms
(Supplementary Information)

James Z. Wang, Zhidian Du, Rapeeporn Payattakool, Philip S. Yu, and Chin-Fu Chen

Summary of Evaluation Results:

Total # of pathways in the SGD website: 152
# of pathways having at least 3 genes: 111
# of pathways used in evaluation: 111
# of evaluations showing that our method is better: 66
# of evaluations showing that both methods are equal: 45
# of evaluations showing that our method is worse: 0

Evaluation Analysis Details:

http://bioinformatics.clemson.edu/Publication/Supplement/pathway.htm

Note: the pathway figures used in our evaluation are copied directly from the SGD website. Copyright information for these pathway figures can be found at the bottom of the main page of the SGD database website, http://www.yeastgenome.org/. The other figures in our evaluation were produced by our gene clustering tools and GO visualization tool.

Explanation of the Evaluation Method:

The evaluation of semantic similarity measurement methods is a challenging task because it usually requires human involvement. In the natural language domain, most studies collect a small set of term pairs and ask people to rank their semantic similarities. The correlation between the measured semantic similarity values and the human similarity rankings is then used to evaluate the semantic similarity measurement method. In this paper, we use a similar approach to evaluate our similarity measurement algorithm. We use the gene annotation and classification information for pathways manually curated by researchers in the SGD database (http://pathway.yeastgenome.org/biocyc/) as the reference for our similarity measurement. Although recent studies (Wang et al., 2004; Sevilla et al., 2005; Guo et al., 2006) used the correlation with gene sequence or gene expression similarities to evaluate semantic similarity measurement methods, the feasibility of that evaluation approach is still debatable, because gene functional similarity does not always correlate with gene sequence or gene expression similarity.
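The correlation-based evaluation used in natural language studies can be sketched as a Spearman rank correlation between measured similarity values and human judgments. This is only an illustration of that general idea, not the procedure used in our evaluation; the function names and the example scores are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: measured similarities for four term pairs and
# human scores for the same pairs (higher means more similar).
measured = [0.82, 0.35, 0.61, 0.10]
human = [4, 2, 3, 1]
rho = spearman(measured, human)
```

The sketch assumes both lists have the same length and that neither is constant; a constant list has zero rank variance and would cause a division by zero.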

To demonstrate the advantages of our similarity measurement algorithm over the existing methods, we implemented two online gene-clustering tools (http://bioinformatics.clemson.edu/G-SESAME/knowledgeDiscovery.html), one based on our algorithm and one based on Resnik's method. These tools first measure the functional similarities between the input genes and then cluster the genes based on the obtained similarity values. We also implemented a visualization tool that displays the annotation information for a pair of genes on the molecular function ontology to illustrate the functional similarity of the two genes. With the annotation information in the pathway and GO databases (visualized by our visualization tool if necessary), we can evaluate by visual examination whether the similarity values obtained by a similarity measurement method are consistent with human perspectives.
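The two-step idea of measuring pairwise similarities and then clustering on them can be sketched as follows. The clustering algorithm used by the online tools is not described here, so this sketch assumes average-linkage agglomerative clustering with a similarity threshold; the function name, the gene labels, the similarity values, and the threshold are all hypothetical.

```python
def cluster_by_similarity(genes, sim, threshold):
    """Average-linkage agglomerative clustering (an assumed choice).

    genes: list of gene labels.
    sim: dict mapping frozenset({a, b}) to the similarity of genes a and b.
    threshold: stop merging once the best cluster pair scores below this.
    """
    clusters = [[g] for g in genes]

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(sim[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        score, i, j = max(
            ((linkage(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if score < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical similarity values for three placeholder genes.
example_sim = {
    frozenset(("g1", "g2")): 0.9,
    frozenset(("g1", "g3")): 0.2,
    frozenset(("g2", "g3")): 0.3,
}
example_clusters = cluster_by_similarity(["g1", "g2", "g3"], example_sim, 0.5)
```

Running the same clustering routine on similarity values from two different measurement methods makes any difference in the resulting groups attributable to the similarity values alone.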

On our evaluation details page at http://bioinformatics.clemson.edu/Publication/Supplement/pathway.htm, we list all pathways retrieved from the SGD database together with hyperlinks to these pathways in the SGD database. We did not evaluate pathways containing fewer than 3 genes, because the similarity values and clustering results of 1 or 2 genes do not provide enough information for evaluation. For pathways containing at least 3 genes, we cluster the genes in each pathway twice, once with the gene clustering tool based on Resnik's method and once with the tool based on our method. We analyze the similarity values and clustering results using the annotation information from the GO database (visualized by our visualization tool if necessary) and the annotation information contained in the SGD pathway figures. For each pathway, we present the conclusion of our evaluation and a hyperlink to our analysis details.
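The pathway selection rule described above (keep only pathways with at least 3 genes) can be expressed as a short filter. This is a minimal sketch; the function name, the dictionary layout, and the pathway and gene labels are hypothetical placeholders rather than SGD data.

```python
MIN_GENES = 3  # pathways with fewer genes are not evaluated


def pathways_to_evaluate(pathways, min_genes=MIN_GENES):
    """Keep pathways whose number of distinct genes is at least min_genes."""
    return {
        name: genes
        for name, genes in pathways.items()
        if len(set(genes)) >= min_genes
    }


# Placeholder input: pathway name -> list of gene labels.
example_pathways = {
    "pathway_A": ["g1", "g2", "g3", "g4"],
    "pathway_B": ["g5", "g6"],
    "pathway_C": ["g7"],
}
selected = pathways_to_evaluate(example_pathways)
```

Counting distinct genes (via `set`) is an assumption made here so that a gene listed twice in one pathway is not counted twice.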

In our analyses, if Resnik's method and G-SESAME produce the same clustering results and there is no clear evidence that either method produced unreasonable similarity values, even though the two methods assign different values to the similarities, our conclusion is "Both methods are equal". If there is clear evidence showing which method is better, we discuss this evidence and explain why one method is better than the other.
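The first half of this rule, checking whether two methods produced the same clustering, can be automated; the judgment about whether similarity values are reasonable remains a manual step. The sketch below compares two clusterings as partitions, ignoring the order of clusters and of genes within a cluster. The function name and gene labels are hypothetical.

```python
def same_partition(clusters_a, clusters_b):
    """Return True if both clusterings group the genes identically."""

    def normalize(clusters):
        # Order of clusters and of members within a cluster is irrelevant.
        return {frozenset(cluster) for cluster in clusters}

    return normalize(clusters_a) == normalize(clusters_b)


# Placeholder clusterings from two methods for the same four genes.
method_1 = [["g1", "g2"], ["g3", "g4"]]
method_2 = [["g4", "g3"], ["g2", "g1"]]
method_3 = [["g1", "g2", "g3"], ["g4"]]

identical = same_partition(method_1, method_2)
different = same_partition(method_1, method_3)
```

A result of `True` only indicates that a pathway is a candidate for the "Both methods are equal" conclusion; the similarity values still have to be examined against the annotation information.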

Our evaluation results show that the similarity values and clustering results obtained by our algorithm are consistent with human perspectives, while those obtained by Resnik's method are often inconsistent with human perception. Specifically, our method is shown to be better than Resnik's method in 66 of the 111 evaluations. In the other 45 cases, the similarity values and clustering results obtained by both methods are very similar. There is not a single case in which Resnik's method is shown to be better than our approach.