Yanni Sun
Assistant Professor
Office: 3134 EB
Research Interests:
My main research interest is in bioinformatics and computational biology. I design algorithms and develop software tools to solve problems motivated by molecular biology. In particular, I am working on searching for functional elements (such as noncoding RNAs and protein domains) in large-scale sequence data sets. To learn more about my research, see my research summary, research projects, and CV.
Education:
Ph.D. Computer Science & Engineering, Washington Univ. in St. Louis, August 2008 (download my dissertation defense slides)
M.S. Computer Science, B.S. Computer Science, Xi'an JiaoTong University
Funding:
NSF CAREER: New Technologies for Genome-Scale Comparative NcRNA Identification. 2010-2015.
Deep sequence profiling of gRNA transcriptomes in two stages of Trypanosoma brucei. NIH. 2011-2013. Co-PI
Noncoding RNA (ncRNA) search
We are interested in searching for both known ncRNAs (i.e., members of
characterized ncRNA families) and novel ncRNAs. The state-of-the-art ncRNA
search is still based on comparative sequence analysis. However, many ncRNAs
function through both their sequences and secondary structures. Thus,
comparative ncRNA search must incorporate structural information to achieve
high sensitivity. In particular, we are working on the following problems:
1. How to efficiently detect known ncRNAs in a large sequence data set (such as
   metagenomic data sets or whole genomes)?
2. How to conduct structural alignment between multiple ncRNAs? Take a look at our
   recent work on using grammar strings to encode both the sequence and
   structural conservation of an ncRNA!
3. How to discover ncRNAs that lack strong sequence similarity?

Protein domain search in large-scale sequence databases
The purpose of this project is to search for members of characterized protein
domain families in large-scale sequence databases. Each domain family contains
multiple homologous protein sequences sharing similar sequences, structures,
and functions. By comparing a query sequence with all domain families, we can
classify this query into a characterized domain family and then obtain more
information about this query sequence. Read more about protein domains here.

Seed design for sequence alignment
Sequence alignment is a fundamental step in comparative sequence analysis. Although
there exist exact dynamic programming algorithms to align a pair of sequences,
they are too expensive to be used at genome scale. Thus, heuristics called
seeds are introduced to predict local regions that are likely to produce
statistically significant alignment scores. Seed design significantly affects the
final alignment sensitivity and specificity. Read more about seed design here.
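
As a rough illustration of the seeding idea described above, here is a minimal Python sketch of seed-hit detection with a spaced seed. It is a generic textbook-style example and not the seed designs developed in this project; the function names (`seed_key`, `find_seed_hits`) and the pattern `1101` are hypothetical and chosen only for illustration. A `1` in the pattern marks a position that must match and a `0` marks a position that is ignored.

```python
from collections import defaultdict


def seed_key(seq, pos, pattern):
    """Return the characters of seq under the '1' positions of pattern,
    for the window that starts at index pos."""
    return "".join(seq[pos + i] for i, c in enumerate(pattern) if c == "1")


def find_seed_hits(query, target, pattern="1101"):
    """Return (query_pos, target_pos) pairs whose windows agree on every
    '1' position of the seed pattern (hypothetical helper, for illustration)."""
    span = len(pattern)

    # Index every window of the target by its seed key.
    index = defaultdict(list)
    for t in range(len(target) - span + 1):
        index[seed_key(target, t, pattern)].append(t)

    # Look up every window of the query in that index.
    hits = []
    for q in range(len(query) - span + 1):
        for t in index.get(seed_key(query, q, pattern), []):
            hits.append((q, t))
    return hits


if __name__ == "__main__":
    query, target = "ACGTAC", "ACCTAC"  # differ only at index 2
    print(find_seed_hits(query, target, "1101"))  # spaced seed
    print(find_seed_hits(query, target, "1111"))  # contiguous seed of the same span
```

In this toy input the two sequences differ at a single position. The spaced seed `1101` reports a hit at the start of both sequences because the mismatch falls on its ignored position, while the contiguous seed `1111` reports no hit. Each hit would then be passed to a more expensive alignment step, which is why the choice of seed pattern influences both sensitivity and specificity.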
· CSE891 Introduction to Computational Biology Fall 2009
Misc.