Yanni Sun

Assistant Professor


Department of Computer Science and Engineering
3115 Engineering Building
Michigan State University
East Lansing, MI 48824


Office phone: +1-517-432-5169

Office: 3134 EB
yannisun    AT   cse   DOT    msu   DOT   edu


Research Interests:

My main research interest is in bioinformatics and computational biology. I design algorithms and develop software tools to solve problems motivated by molecular biology. In particular, I work on searching for functional elements (such as noncoding RNAs and protein domains) in large-scale sequence data sets. To learn more about my research, see my research summary, research projects, and CV.

Education:

Ph.D. Computer Science & Engineering, August 2008 (download my dissertation defense slides)
Washington Univ. in St. Louis,
St. Louis, MO

M.S. Computer Science, B.S. Computer Science
Xi'an JiaoTong University,
Xi'an, China    

Funding:

NSF CAREER: New Technologies for Genome-Scale Comparative NcRNA Identification. 2010-2015.

Deep sequence profiling of gRNA transcriptomes in two stages of Trypanosoma brucei. NIH. 2011-2013. Co-PI.

Projects:

 Noncoding RNA (ncRNA) search

We are interested in searching for both known ncRNAs (i.e., members of characterized ncRNA families) and novel ncRNAs. State-of-the-art ncRNA search is still based on comparative sequence analysis. However, many ncRNAs function through both their sequences and their secondary structures. Thus, comparative ncRNA search must incorporate structural information to achieve high sensitivity. In particular, we are working on the following problems:

        1.     How to efficiently detect known ncRNAs in a large sequence data set (such as a metagenomic data set or a whole genome)?

        2.     How to conduct structural alignment among multiple ncRNAs? Take a look at our recent work on using grammar strings to encode both the sequence and structural conservation of an ncRNA.

        3.     How to discover ncRNAs that lack strong sequence similarity?

Read more about ncRNA search here.

 

 

 Protein domain search in large-scale sequence databases

The purpose of this project is to search for members of characterized protein domain families in large-scale sequence databases. Each domain family contains multiple homologous protein sequences that share similar sequences, structures, and functions. By comparing a query sequence with all domain families, we can classify the query into a characterized domain family and then infer more information about it from that family. Read more about protein domain search here.

    

 Seed design for sequence alignment

Sequence alignment is a fundamental step in comparative sequence analysis. Although exact dynamic programming algorithms exist for aligning a pair of sequences, they are too expensive to use at genome scale. Thus, heuristics called seeds are introduced to predict local regions that are likely to produce statistically significant alignment scores. The design of seeds significantly affects the final alignment sensitivity and specificity. Read more about seed design here.
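
As a rough illustration of the seed idea, the sketch below finds seed hits between two sequences using a spaced seed, a pattern in which '1' marks a position that must match and '0' marks a position that is ignored. It is a minimal example under stated assumptions, not the method used in this project: the function names (masked_key, seed_hits), the patterns, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def masked_key(window, pattern):
    """Keep only the characters at the '1' (must-match) positions of the seed."""
    return "".join(c for c, p in zip(window, pattern) if p == "1")


def seed_hits(query, target, pattern):
    """Return (query_pos, target_pos) pairs where the seed pattern matches.

    Hypothetical helper for illustration: the target is indexed by its
    masked windows, then every query window is looked up in that index.
    """
    span = len(pattern)
    index = defaultdict(list)
    for j in range(len(target) - span + 1):
        index[masked_key(target[j:j + span], pattern)].append(j)

    hits = []
    for i in range(len(query) - span + 1):
        key = masked_key(query[i:i + span], pattern)
        for j in index.get(key, []):
            hits.append((i, j))
    return hits


# Toy sequences (assumed for illustration); they differ at one position.
query = "ACGTAC"
target = "ACTTAC"
spaced = seed_hits(query, target, "1101")      # tolerates a mismatch at position 3 of the window
contiguous = seed_hits(query, target, "111")   # requires three consecutive matches
```

Each hit marks a candidate region that a full alignment algorithm would then examine; changing the pattern changes which regions are reported, which is why seed choice influences sensitivity and specificity.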

 

                                         

Teaching

·       CSE331 Algorithms and Data Structures. Fall 2008, SS 2010, SS 2011

·       CSE891 Introduction to Computational Biology. Fall 2009

 

 

 
