Many biological data analysis methods require a numeric representation of the functional similarity of genes, so automatically discovering the descriptive similarities of genes and converting them into measurable numeric values is essential for such analyses. This research project addresses this problem by designing novel algorithms that measure the semantic similarity of the vocabularies used to annotate genes and, in turn, effective algorithms that determine the functional similarity of genes.
Based on these algorithms, we have implemented several online tools, and more are still under development.
Based on our novel method, which decodes a GO term's semantics into a numeric value by aggregating the semantic contributions of its ancestor terms (including the term itself) in the GO hierarchy, we implemented tools to measure the semantic similarity of GO terms.
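The aggregation idea can be illustrated with a short sketch. It assumes a weighting scheme in which each edge type carries a fixed factor (0.8 for is_a and 0.6 for part_of here; these values are assumptions, not stated above), a term contributes 1 to its own semantics, and an ancestor's contribution is the largest product of edge weights along any path from the term up to that ancestor. The term-to-term score at the end (shared contributions divided by the two terms' total semantic values) is likewise one plausible choice, not a definitive description of the implemented tools. The term IDs in TOY_PARENTS are hypothetical.

```python
# Assumed edge weights; they must lie strictly between 0 and 1.
WEIGHTS = {"is_a": 0.8, "part_of": 0.6}

# Hypothetical toy hierarchy: child -> list of (parent, relation).
TOY_PARENTS = {
    "A": [("C", "is_a")],
    "B": [("C", "is_a"), ("D", "part_of")],
    "C": [("R", "is_a")],
    "D": [("R", "is_a")],
    "R": [],
}


def s_values(term, parents, weights=WEIGHTS):
    """Semantic contribution of `term` and each of its ancestors to `term`.

    Each contribution is the maximum product of edge weights over all
    paths from `term` up to that ancestor; `term` itself contributes 1.
    """
    s = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent, relation in parents.get(child, []):
            value = s[child] * weights[relation]
            # Keep only the best path; revisit the parent if it improved.
            if value > s.get(parent, 0.0):
                s[parent] = value
                stack.append(parent)
    return s


def semantic_value(term, parents, weights=WEIGHTS):
    """Aggregate the contributions of the term and all its ancestors."""
    return sum(s_values(term, parents, weights).values())


def term_similarity(a, b, parents, weights=WEIGHTS):
    """Shared ancestor contributions relative to both semantic values."""
    sa = s_values(a, parents, weights)
    sb = s_values(b, parents, weights)
    shared = sa.keys() & sb.keys()
    numerator = sum(sa[t] + sb[t] for t in shared)
    return numerator / (sum(sa.values()) + sum(sb.values()))
```

In the toy hierarchy, R is reachable from B through both C and D, and the sketch keeps the stronger is_a path (0.8 × 0.8 = 0.64) rather than the part_of path (0.6 × 0.8 = 0.48).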
Based on the semantic correlation of the GO terms used to annotate genes, we implemented tools to measure the functional similarity of genes.
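One common way to combine GO term similarities into a gene similarity is the best-match average: each term annotating one gene is matched with its most similar term annotating the other gene, and these best matches are averaged over both annotation sets. The sketch below assumes this scheme; it is not necessarily the exact formula behind our tools. The term similarity is passed in as a function, and toy_sim with its scores is hypothetical.

```python
from typing import Callable, Sequence

TermSim = Callable[[str, str], float]


def gene_similarity(go1: Sequence[str], go2: Sequence[str], term_sim: TermSim) -> float:
    """Best-match average of GO term similarities between two genes."""
    if not go1 or not go2:
        return 0.0
    # Best match in the other gene for every term of each gene.
    best1 = [max(term_sim(t, u) for u in go2) for t in go1]
    best2 = [max(term_sim(u, t) for t in go1) for u in go2]
    return (sum(best1) + sum(best2)) / (len(go1) + len(go2))


# Hypothetical term-to-term scores for illustration only.
TOY_SCORES = {frozenset({"T1", "T3"}): 0.5}


def toy_sim(a: str, b: str) -> float:
    """Identical terms score 1; other pairs come from TOY_SCORES or 0."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset({a, b}), 0.0)


print(gene_similarity(["T1", "T2"], ["T2", "T3"], toy_sim))  # 0.75
```

Passing the term similarity as a function keeps the gene-level step independent of how GO term similarity itself is computed.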
Based on the gene functional similarity measurement, we implemented tools for analyzing gene function.
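The contents of the GO project can be roughly divided into two parts: GO terms and genes/gene products. Our tools reflect this division, and their names can likewise be grouped into two parts. In addition, each tool consists of programs that are named and numbered accordingly: the number 1 program (for example, geneCompareTwo1.php) provides the input form, and the number 2 program (for example, geneCompareTwo2.php) processes the submitted query and returns the results.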
To use our tools in batch mode, write your own program (in whatever language you are comfortable with) that simulates the HTTP queries issued by the number 1 programs and then calls the corresponding number 2 programs.
For example, inspect the HTML source of geneCompareTwo1.php to see which form fields it submits, and reproduce them in your local program. Your program then sends an HTTP query equivalent to the form action triggered when you click the submit button on the number 1 page; this query invokes geneCompareTwo2.php, which sends the results back to you over HTTP. Your program then needs to receive and parse these results. This is the standard way of simulating HTTP queries in batch mode.
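A minimal Python sketch of such a batch client follows, using only the standard library. ACTION_URL and any form field names are hypothetical placeholders; copy the real values from the form element (its action attribute and input names) in the HTML source of the number 1 page. The parser assumes the number 2 page returns its results in an HTML table, which you should confirm by inspecting an actual response.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Hypothetical placeholder: use the form's real action URL.
ACTION_URL = "https://example.org/geneCompareTwo2.php"


def build_form_request(action_url, fields):
    """Build a POST request equivalent to submitting the number 1 form."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def submit(request, timeout=60):
    """Send the request and return the number 2 program's reply as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


class TableRowParser(HTMLParser):
    """Collect the text of <td>/<th> cells, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None


def parse_rows(html_text):
    """Return the table rows found in an HTML result page."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

A typical call is `parse_rows(submit(build_form_request(ACTION_URL, fields)))`, where `fields` is a dictionary of the form's input names and values.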
Every program has limitations, and ours are no exception. We try to make our programs handle as many inputs as possible. However, because our tools are used extensively and intensively, we have to balance the maximum number of queries against the maximum number of GO terms or gene symbols per query.
The multiple GO terms comparison tool accepts at most 4,000 GO terms per query, and the gene clustering tool accepts at most 5,000 gene symbols per query.
If you have many GO terms or gene symbols to analyze, you have two options. First, you can write your own program that calls our programs in batch mode; refer to the section on calling our public APIs. Second, you can run the multiple GO terms comparison tool or the multiple gene clustering tool directly. However, if your data exceed these limits, you have to use a divide-and-conquer approach.
For example, if you have 10,000 gene symbols and the limit is 5,000 per run, you can split your data into two sets of 5,000. You then need to run the program 2 × 2 = 4 times, once for each pair of sets, to cover all combinations of your data.
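The chunking arithmetic can be sketched as follows: split the input into chunks no larger than the tool's limit, then plan one run per ordered pair of chunks, which gives k × k runs for k chunks. The gene symbols below are hypothetical placeholders.

```python
from itertools import product


def split_into_chunks(items, limit):
    """Split items into consecutive chunks of at most `limit` elements."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [items[i:i + limit] for i in range(0, len(items), limit)]


def chunk_pairs(items, limit):
    """Return every ordered pair of chunks; each pair is one tool run."""
    chunks = split_into_chunks(items, limit)
    return list(product(chunks, repeat=2))


genes = [f"GENE{i}" for i in range(10_000)]  # hypothetical gene symbols
pairs = chunk_pairs(genes, 5_000)
print(len(pairs))  # 4
```

If the similarity score is symmetric (an assumption you should check for the tool you use), `itertools.combinations_with_replacement(chunks, 2)` skips mirrored pairs and reduces this example to 3 runs.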