The data comes from a publication http://www.pnas.org/content/102/5/1572.full (reading it is optional).
The data describes N = 109 yeast cells that come from crossing two yeast strains (BY and RM). There are M ≈ 3000 SNPs that differ between the parent strains. Each offspring can carry either the BY or the RM variant of any given SNP.
genotype.csv: (N+3 rows, M+1 columns)
All other lines:
expression.csv: (N+1 rows, G+1 columns)
All other rows:
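
The row and column counts above suggest a simple way to load either file. This is a minimal sketch only: the contents of the header rows and of the first column are not spelled out here, so the helper name `read_table` and the parameter `n_header_rows` are hypothetical. The sketch assumes that the extra rows (3 for genotype.csv, 1 for expression.csv) come first and that the extra column is a leading ID column. Values are kept as strings because the encoding of the cells is not specified.

```python
import csv
import io


def read_table(handle, n_header_rows):
    """Split a CSV into header rows, row IDs and a matrix of raw string values.

    Assumption: the first `n_header_rows` rows are headers and the first
    column of every remaining row is an ID. Adjust if the files differ.
    """
    rows = list(csv.reader(handle))
    header = rows[:n_header_rows]
    body = rows[n_header_rows:]
    ids = [row[0] for row in body]
    values = [row[1:] for row in body]
    return header, ids, values


# Tiny in-memory stand-in for genotype.csv (made-up content, 3 header rows).
demo = io.StringIO("id,s1,s2\nchr,1,1\npos,10,20\nA,0,1\nB,1,0\n")
header, ids, values = read_table(demo, n_header_rows=3)
print(len(ids), len(values[0]))  # N rows of data, M value columns
```

With real files, the same function would be called as `read_table(open("genotype.csv", newline=""), 3)` and `read_table(open("expression.csv", newline=""), 1)`, under the layout assumption stated above.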
- read in the files.
- choose 2-5 genes (columns in the expression.csv file) and plot their expression distributions.
- choose 2-5 individual yeast IDs (rows in the genotype.csv file) and plot their SNP values.
- implement the LOD score calculation (formulas available here).
- calculate the LOD score for each SNP for 5 genes. For each gene, choose the SNP with the largest LOD score and plot the expression distribution separately for x_i = 0 and x_i = 1.
- calculate the LOD score for each SNP for all genes. Draw a heat map.
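
The LOD tasks above could be approached along the following lines. The assignment's own formulas are not reproduced here, so this sketch assumes the common single-marker form LOD = (n/2) * log10(RSS0 / RSS1), where RSS0 is the residual sum of squares around one overall mean and RSS1 is the residual sum of squares around separate means for the two genotype groups. The function names `lod_score` and `best_snp` and the 0/1 genotype coding are illustrative assumptions; if the course formula differs, it should replace the one used here.

```python
import math


def _rss(values):
    """Residual sum of squares of `values` around their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def lod_score(genotypes, expression):
    """LOD score of one SNP against one gene (assumed 0/1 genotype coding).

    Individuals whose genotype is neither 0 nor 1 are ignored.
    """
    if len(genotypes) != len(expression):
        raise ValueError("genotypes and expression must have the same length")
    group0 = [y for x, y in zip(genotypes, expression) if x == 0]
    group1 = [y for x, y in zip(genotypes, expression) if x == 1]
    if not group0 or not group1:
        return 0.0  # only one genotype present: no evidence either way
    n = len(group0) + len(group1)
    rss_null = _rss(group0 + group1)        # one mean for everyone
    rss_alt = _rss(group0) + _rss(group1)   # one mean per genotype
    if rss_null == 0.0:
        return 0.0  # expression is constant, nothing to explain
    if rss_alt == 0.0:
        return math.inf  # genotype explains expression perfectly
    return (n / 2) * math.log10(rss_null / rss_alt)


def best_snp(snp_columns, expression):
    """Return (index, LOD) of the SNP column with the largest LOD score."""
    scores = [lod_score(column, expression) for column in snp_columns]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Made-up toy data: 4 individuals, 2 SNPs, 1 gene.
snps = [[0, 1, 0, 1], [0, 0, 1, 1]]
gene = [1.0, 2.0, 5.0, 6.0]
print(best_snp(snps, gene))
```

Calling `lod_score` for every (SNP, gene) pair gives the G x M matrix of scores that the heat map in the last task would display.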
Completing 5 of the 6 tasks earns full credit.
Annotate all figures properly. Submit in PDF format.