Math 275 Homework Week 1

Reading: Sections 1, 2, and 3 of The Mathematics of Phylogenomics.

Problem 1

Table 1 shows the genetic code. There are four distinguished codons: the ATG codon initiates translation and is therefore always the first codon in a gene. The last codon signals for translation to stop, and is one of the codons TAG, TGA or TAA. Just by looking at the table, which amino acid would you guess is the rarest?

Table 1: The Standard Genetic Code

Rows give the first base of the codon; columns give the second base.

         T              C              A              G

T    TTT Phe (F)    TCT Ser (S)    TAT Tyr (Y)    TGT Cys (C)
     TTC  "         TCC  "         TAC  "         TGC  "
     TTA Leu (L)    TCA  "         TAA STOP       TGA STOP
     TTG  "         TCG  "         TAG STOP       TGG Trp (W)

C    CTT Leu (L)    CCT Pro (P)    CAT His (H)    CGT Arg (R)
     CTC  "         CCC  "         CAC  "         CGC  "
     CTA  "         CCA  "         CAA Gln (Q)    CGA  "
     CTG  "         CCG  "         CAG  "         CGG  "

A    ATT Ile (I)    ACT Thr (T)    AAT Asn (N)    AGT Ser (S)
     ATC  "         ACC  "         AAC  "         AGC  "
     ATA  "         ACA  "         AAA Lys (K)    AGA Arg (R)
     ATG Met (M)    ACG  "         AAG  "         AGG  "

G    GTT Val (V)    GCT Ala (A)    GAT Asp (D)    GGT Gly (G)
     GTC  "         GCC  "         GAC  "         GGC  "
     GTA  "         GCA  "         GAA Glu (E)    GGA  "
     GTG  "         GCG  "         GAG  "         GGG  "
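
One way to look at Table 1 systematically is to tally how many codons encode each amino acid. The sketch below is only an illustration: the names BASES, AMINO_ACIDS, CODON_TABLE and degeneracy are introduced here and are not part of the assignment, and the number of codons is just one heuristic for guessing rarity, not a measurement of actual amino acid frequencies.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# One-letter amino acids for the 64 codons in TCAG order (third base varies
# fastest); '*' marks a STOP codon. This mirrors the contents of Table 1.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each codon string, e.g. "ATG", to its one-letter amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid (or STOP).
degeneracy = Counter(CODON_TABLE.values())

# List amino acids from fewest to most codons.
for aa, n in sorted(degeneracy.items(), key=lambda item: (item[1], item[0])):
    print(aa, n)
```

The listing puts the amino acids with the fewest codons first, which gives a starting point for the guess the problem asks for.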

Problem 2

The Codon Usage Database has tables with frequencies of codon usage for different organisms. Find the table for humans (Homo sapiens) and check, using Proposition 2 of "The Mathematics of Phylogenomics", whether the data lie in the independence model for codons. Answer the same question for the fruit fly (Drosophila melanogaster). If you found two points on the model, suggest a definition for their distance. If not, which dataset do you think is better represented by the model, and why?
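
A direct numerical check is sketched below, under the assumption that "lying in the independence model" means the 4x4x4 table of codon frequencies factors as a product of its three position-wise marginals. Whether this matches the exact criterion stated in Proposition 2 should be verified against the paper. The function name independence_gap is hypothetical, and the input tables are made-up weights, not real codon usage data; a real table from the database would have to be entered as a dictionary from codon to count or frequency.

```python
from itertools import product

BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

def independence_gap(freq):
    """Largest |p_xyz - a_x * b_y * c_z| over all 64 codons, where p is the
    normalized table and a, b, c are its position-wise marginals."""
    total = sum(freq.get(c, 0) for c in CODONS)
    p = {c: freq.get(c, 0) / total for c in CODONS}
    # marginals[pos][b] = probability that position pos of the codon is base b
    marginals = [
        {b: sum(p[c] for c in CODONS if c[pos] == b) for b in BASES}
        for pos in range(3)
    ]
    return max(
        abs(p[c] - marginals[0][c[0]] * marginals[1][c[1]] * marginals[2][c[2]])
        for c in CODONS
    )

# Made-up weights, NOT real codon usage data.
w = {"T": 1.0, "C": 2.0, "A": 3.0, "G": 4.0}
independent = {c: w[c[0]] * w[c[1]] * w[c[2]] for c in CODONS}
# Same table with one entry inflated, so it no longer factors.
perturbed = dict(independent, ATG=independent["ATG"] + 50.0)

print(independence_gap(independent))  # essentially zero, up to rounding
print(independence_gap(perturbed))    # clearly positive
```

Real data will almost never give a gap of exactly zero, so the size of the gap for each organism is what can be compared.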

Problem 3

How many DNA sequences can code for the calcium ATPase protein? You can read more about this protein at the Protein Data Bank Molecule of the Month site.
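
The count is a product: each amino acid contributes the number of codons that encode it in Table 1. The sketch below illustrates this on a short, hypothetical peptide (it is not the calcium ATPase sequence, which has to be looked up separately). The function name count_coding_sequences and the include_stop flag are assumptions made for this example; whether to count the choice of STOP codon depends on how the question is interpreted.

```python
import math
from collections import Counter

# One-letter amino acids for the 64 codons in TCAG order; '*' marks STOP.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Number of codons per amino acid, e.g. DEGENERACY["L"] is 6.
DEGENERACY = Counter(AMINO_ACIDS)

def count_coding_sequences(protein, include_stop=True):
    """Number of DNA sequences that translate to `protein` (one-letter codes).

    If include_stop is True, the three possible STOP codons are counted too.
    """
    n = 1
    for aa in protein:
        if aa == "*" or aa not in DEGENERACY:
            raise ValueError(f"unknown amino acid code: {aa!r}")
        n *= DEGENERACY[aa]
    if include_stop:
        n *= DEGENERACY["*"]
    return n

# Hypothetical toy peptide: Met-Leu-Ser-Trp.
toy = "MLSW"
n = count_coding_sequences(toy)
print(n)              # 1 * 6 * 6 * 1 * 3 = 108
print(math.log10(n))  # for a full-length protein the log is easier to read
```

Python integers have arbitrary precision, so the exact count can be computed even for a protein with many hundreds of residues.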

Problem 4

Find the chromosome in the dog genome that contains "the meaning of life" sequence TTTAATTGAAAGAAGTTAATTGAATGAAAATGATCAACTAAG. To do this, paste the sequence into the box at the UCSC BLAT tool and hit submit.
Maintained by Lior Pachter.