 
1.  The data for this question come 
from a study by Dr. Arno Motulsky and coworkers, and
are published in  Thompson et al. (1988; Am.J.Hum.Genet, 42, 113-124). 
There are two loci, here labeled 1 and 2, each with two alleles labeled
A and B.
In a Caucasian sample of 205 individuals typed, 
the counts of the two-locus phenotypes were 
143, 35, 3, 17, 5, 0, 2, 0, and 0, respectively, for the nine types 
A1A1,A2A2;
A1A1,A2B2;
A1A1,B2B2;
A1B1,A2A2;
A1B1,A2B2;
A1B1,B2B2;
B1B1,A2A2;
B1B1,A2B2;
B1B1,B2B2. 
Use the EM algorithm to estimate the 4 haplotype frequencies.
2. This very small example is unrealistic, but makes a point we will see in #3.
We have a sample of just one individual who is heterozygous at both of
two loci: A1B1, A2B2.
Denote the frequencies of the four haplotypes
A1A2,
A1B2,
B1A2, and
B1B2,
by
qAA,
qAB,
qBA, and
qBB.
Show that there are two maximum likelihood estimates (of equal likelihood):
qAA = qBB = 1/2,
qAB = qBA = 0;  and
qAB = qBA = 1/2,
qAA = qBB = 0.
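
As a starting point, the likelihood can be written out as follows. This is a sketch of one possible argument, assuming random union of haplotypes, so that the doubly heterozygous genotype arises either from the pair A1A2/B1B2 or from the pair A1B2/B1A2; the symbol s is introduced here only for convenience.

```latex
\begin{align*}
L(q) &= 2\,\bigl(q_{AA}\,q_{BB} + q_{AB}\,q_{BA}\bigr),
  \qquad q_{AA} + q_{AB} + q_{BA} + q_{BB} = 1, \\
q_{AA}\,q_{BB} &\le \frac{s^2}{4}, \qquad
q_{AB}\,q_{BA} \le \frac{(1-s)^2}{4},
  \qquad \text{where } s = q_{AA} + q_{BB}, \\
L(q) &\le \frac{1}{2}\bigl(s^2 + (1-s)^2\bigr) \le \frac{1}{2},
  \qquad 0 \le s \le 1 .
\end{align*}
```

The last bound is attained only at s = 1 or s = 0, and the product bounds are attained only when the two frequencies in each product are equal, which is what remains to be checked against the two solutions stated above.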
3. Suppose now we try to apply the EM algorithm to the data of question 2. What will the EM algorithm do? The answer depends on where you start, so work through some possible starting values. (Questions 2 and 3 together show the problem that can arise in trying to use the EM algorithm to estimate haplotype frequencies.)
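
To experiment with starting values, a few lines of Python are enough. The sketch below is illustrative only: the function name `em_single_double_het`, the fixed number of iterations, and the three starting points are assumptions made here, and random union of haplotypes is assumed as before.

```python
def em_single_double_het(q_start, n_iter=50):
    """Run EM on one individual with genotype A1B1, A2B2.

    q_start is (qAA, qAB, qBA, qBB); returns the frequencies
    after n_iter iterations.
    """
    q_aa, q_ab, q_ba, q_bb = q_start
    for _ in range(n_iter):
        cis = q_aa * q_bb      # haplotype pair A1A2 / B1B2
        trans = q_ab * q_ba    # haplotype pair A1B2 / B1A2
        if cis + trans == 0:
            raise ValueError("start gives the observed genotype probability 0")
        # E-step: posterior probability of the A1A2 / B1B2 phase.
        w = cis / (cis + trans)
        # M-step: two chromosomes, so each haplotype gets weight / 2.
        q_aa = q_bb = w / 2
        q_ab = q_ba = (1 - w) / 2
    return q_aa, q_ab, q_ba, q_bb


# Three hypothetical starting points, chosen only for illustration.
for start in [(0.4, 0.1, 0.1, 0.4),
              (0.1, 0.4, 0.4, 0.1),
              (0.25, 0.25, 0.25, 0.25)]:
    print(start, "->", em_single_double_het(start))
```

Comparing the three runs, and the likelihood at each end point, is one way to see how the starting value determines where the algorithm ends up.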
4.  Here are the SNP genotypes of 6 individuals at 5 loci, with the alleles
labeled 0 and 1.  Each pair of
digits is the genotype at a SNP locus, and each row is an individual.
(Note these are the exact same data you will run PHASE or fastPHASE on
in Lab-1.)
| 10, 10, 10, 10, 10 | 
| 00, 00, 00, 11, 10 | 
| 00, 00, 00, 10, 00 | 
| 10, 10, 10, 11, 00 | 
| 11, 11, 11, 11, 00 | 
| 00, 00, 00, 10, 11 |