Often in bioinformatics, we need to remind ourselves to think in statistical or probabilistic terms.
If you need a refresher, head to Canvas and review the slides and recorded lecture "Statistical Inference in Bioinformatics" (recorded by Shane Jensen, statistics). This is a very cursory review, not designed to be comprehensive. Then, answer the following questions.
Q1. Describe, in statistical terms, the concept of a "null hypothesis". In what ways can a null hypothesis be utilized?
Q2. What is a test statistic, and how is it used?
Q3. If we reject the null hypothesis at the alpha = 5% level, what does that mean?
Q4. Explain the issue of multiple hypothesis testing and its implications. Give two statistical procedures that address this concern, and describe how to apply them.
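To make the setting of Q3 and Q4 concrete, here is a small Python sketch (an illustration only, not part of the assigned lecture material). It assumes m independent tests whose null hypotheses are all true, each run at level alpha. Under those assumptions the probability of at least one false rejection is 1 - (1 - alpha)^m, and a quick simulation using Uniform(0, 1) null p-values should agree with it. The function names are hypothetical.

```python
import random


def family_wise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false rejection among m independent tests,
    assuming every null hypothesis is true: 1 - (1 - alpha)**m."""
    return 1.0 - (1.0 - alpha) ** m


def simulate_any_false_positive(alpha: float, m: int, trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity.

    Under a true null hypothesis a valid p-value is Uniform(0, 1), so each
    simulated test rejects (p < alpha) with probability alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw m null p-values and check whether any falls below alpha.
        p_values = [rng.random() for _ in range(m)]
        if any(p < alpha for p in p_values):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 20, 100):
        exact = family_wise_error_rate(alpha, m)
        simulated = simulate_any_false_positive(alpha, m)
        print(f"m={m:3d}  exact={exact:.3f}  simulated={simulated:.3f}")
```

With alpha = 0.05, a single test falsely rejects about 5% of the time, but across 20 independent true-null tests the chance of at least one false positive is roughly 64%. This growth is the concern Q4 asks you to address.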
One of the first steps in any computational project is to determine what data already exist that can be used to address scientific questions or gather information. Take 10 minutes to browse the web and investigate one or more of the databases listed below.
List of Genomic Databases
NCBI Entrez - http://www.ncbi.nlm.nih.gov/sites/gquery - huge database that encompasses other databases, including:
ExPASy - http://expasy.org/ - Another large database encompassing other databases:
This list is by no means complete; for more databases, see the most recent Database Summary Paper Alpha List: http://www.oxfordjournals.org/nar/database/a/
In particular, the in-class activity will focus on the UCSC Genome Browser, a portal that uses a track-based system to summarize information from many databases of genomic sequence and annotations.
Spend 5 minutes on your own exploring this rich resource.