Math 127 - Fall 2016
Mathematical and Computational Methods in Molecular Biology
Instructor | David Dynerman |
Lectures | Tuesdays & Thursdays, 0800-0930 in Barrows 122 |
Questions? | Post them on bCourses! |
Email | dynerman@berkeley.edu |
Office Hours | Tues. 0945-1045, Weds. 0830-1030 in 943 Evans |
The goal of this course is to introduce some basic mathematical tools that are used to study biological problems. We will do this by learning about two such problems: inferring evolutionary trees from living species and determining the 3D structure of proteins by x-ray crystallography and cryo-electron microscopy (cryo-EM).
These problems will give us the opportunity to see two beautiful examples of how mathematics can be rigorously applied to study interesting and important scientific problems. Here are some of the topics we will encounter during the semester:
- Rigorously modeling experimental science
- Modeling problems using trees
- Solving optimization problems
- Modeling problems using Markov chains
- Fourier Series and the Discrete Fourier Transform
- The Radon Transform
- Basic concepts in solving inverse problems
- Writing computer simulations
Note: The above topics have applications in many other areas of pure & applied mathematics, physics, chemistry and biology, so even if you do not study any further math biology, my hope is that you will find the course useful in your future work.
Syllabus
Reading should be completed before the indicated lecture.
Date | Topics | Reading | Due |
---|---|---|---|
Aug 25 | Course overview; Start of Unit 1: the problem of phylogenetic inference | | |
Aug 30 | Basic concepts about DNA | Phylo 1.1-1.3 | |
Sep 1 | Trees | | Python Tutorial 1 |
Sep 6 | Maximum Parsimony | Phylo 2.1-2.4 | |
Sep 8 | Maximum Parsimony | | HW1 Due, Python Tutorial 2 |
Sep 13 | Maximum Parsimony conclusion; Probability review; Markov chains | Phylo 5.1-5.2, 6.1-6.3, 7.1 | |
Sep 15 | Markov chains; transition matrices; linear algebra review | Phylo 5.3-5.4 | Python Tutorial 3 |
Sep 20 | Perron-Frobenius Theorem; Properties of Markov matrices | Phylo 5.5-5.6 | |
Sep 22 | Modeling DNA mutation using Markov chains | Stats 3.1, 3.2 | HW2 Out |
Sep 27 | The Jukes-Cantor Model; Jukes-Cantor Distance; Introduction to distance-based inference | Stats 2.1-2.4 | |
Sep 29 | Four-point condition; neighbor joining | SL 1.3, 1.4; Optional SL 1.1, 1.2 | |
Oct 4 | Neighbor joining | | |
Oct 6 | Start of Unit 2: DNA translation; the problem of protein structure determination; protein structure | | HW2 Due |
Oct 11 | Protein structure; protein energetics and folding | | |
Oct 13 | Scattering experiments; the Fourier Transform | | |
Oct 18 | Fourier Series | | HW3 Due. Project proposals due |
Oct 20 | Discrete Fourier Transform | | |
Oct 25 | Discrete Fourier Transform | | |
Oct 27 | Mathematical model for X-ray crystallography | | |
Nov 1 | Reconstructing proteins from X-ray diffraction data | | |
Nov 3 | The cryo-EM revolution; mathematical model for biological electron microscopy | | Project Week 1 deliverables due (Fri) |
Nov 8 | Reconstructing proteins from EM projections | | |
Nov 10 | Directly estimating orientations of EM projections | | Project Week 2 deliverables due (Fri) |
Nov 15 | Recovering 3D structures by back-projection | | |
Nov 17 | Back-projection | | Project Week 3 deliverables due (Fri) |
Nov 22 | Reconstruction in practice: iterative methods | | |
Nov 24 | Thanksgiving | | Project Week 4 deliverables due (Sun Nov 27) |
Nov 29 | Reconstruction by maximum likelihood | | |
Dec 1 | Reconstruction by maximum likelihood | | Project Week 5 deliverables due (Fri) |
Dec 6 | RRR Week | | |
Dec 8 | RRR Week | | Project due. Project demo night. (Fri) |
Dec 14 | Final Exam, 3pm-6pm, location TBA | | |
Homework
Homework 1
You may work on this assignment in groups, but each student must hand in a solution that they wrote up themselves. Make sure you give the names of everyone you worked with and properly cite any sources you use (internet Q&A sites, books, etc.).
Do not hand in a first draft of your solutions. Rewrite your first draft neatly, checking for errors in reasoning and language.
1. Give the precise mathematical definitions of:
   - a connected graph,
   - adjacent vertices,
   - a path in a graph,
   - a cycle in a graph,
   - a tree \(T\).
2. For this problem, use two different colors of ink.
   - Draw your favorite tree with 5 vertices. Pick your favorite two vertices in this tree and write down a path between them. Draw the path on the tree in the second color of ink.
   - Draw your favorite graph that has 5 vertices AND has no cycles AND is disconnected. Using your picture, prove that there is not a path between every pair of vertices.
   - Draw your favorite graph that has 5 vertices AND has a cycle AND is connected. Using your picture, prove that there is not a unique path between every pair of vertices.
3. Prove the following theorem.

   Theorem. If \(v_0\) and \(v_1\) are two vertices in the tree \(T\), then there is a unique path from \(v_0\) to \(v_1\).

   Note: Your solution to Problem #2 above shows why we must require \(T\) to be a tree: the theorem is simply not true if \(T\) is not a tree.
4. How many possible codons are there? How many amino acids are found in organisms on Earth? Give an explanation that seems plausible to you as to why these numbers are different.
5. Phylo Exercise 2.7.3
6. Phylo Exercise 2.7.4
7. Phylo Exercise 2.7.9
Homework 2
- Phylo Exercise 3.8.1
- Phylo Exercise 3.8.2
- Phylo Exercise 3.8.5
- Stats Section 3.6 Exercise 3.3
- Stats Section 3.6 Exercise 3.5
- Stats Section 2.6.1 Exercise 2.9
- Stats Section 2.6.2 Exercise 2.23
- Stats Section 2.6.2 Exercise 2.26
For the remaining exercises you will need to use the free software packages numpy, scipy and matplotlib. See here for instructions on installing these on your own computer. If you do not have access to a computer, please contact me.
Before starting on the following problems, work through some exercises in SL 1.3 and SL 1.4 on your own.
- Download this dataset containing GRE scores for 33,282 students. Using numpy and matplotlib, plot histograms of the GRE score datasets.
- Write a function `plot_normal(mu, sigma)` that superimposes a normal curve with mean `mu` and standard deviation `sigma` over the top of your histogram. Hand in the source code to your function.
- Using your `plot_normal()` function, plot a number of different normal curves and try to match the histogram. Plot the normal curve that best matches the histogram. What are the mean and standard deviation of this curve? Hand in plots with your best-fitting normal curves and the mean and standard deviation you deduced.
Course Textbooks
Required readings will be taken from the following sources. You are not required to personally purchase any of these books.
- Phylo: The Mathematics of Phylogenetics by Allman and (John A.) Rhodes
- Stats: OpenIntro Statistics
- SL: Scipy Lectures
- Bio: The Molecules of Life: Physical and Chemical Principles by Kuriyan, Konforti, and Wemmer
  - Not free. This is the main textbook for MCB100 at Berkeley, a large course with many students, so there are many copies of this book on campus. Before you purchase a copy, consider these options:
    - ask your fellow students how they obtained this book,
    - use a reserve copy of the book in the library (UCB Chemistry Library Reserves Desk) for 2 hours at a time,
    - borrow the book from a friend who has taken MCB100,
    - purchase the five chapters we will cover from the publisher for $45, or
    - purchase or rent the full book (19 chapters; used: $60+, new: $65-$109) from the Cal bookstore or Amazon.
- Xtal: Crystallography Made Crystal Clear by (Gale) Rhodes
- EM: Electron Crystallography of Biological Macromolecules by Glaeser, Downing, DeRosier, Chiu, and Frank
  - Not free. Please do not purchase at this time; I will update this page with more information once we start covering the topics in this book.
Course Format
In this course you will be required to read, do homework (including writing code; see Writing code below), take some quizzes, complete a big project, and take a final exam.
Readings | Each lecture will have assigned reading meant to be completed before the lecture. |
Homework | In the first part of the course, you will have several individual homework assignments on phylogenetic inference. |
Quizzes | There will be a handful of short, straightforward, unannounced quizzes on your readings, past homework, the mathematical definitions we've been using, etc. |
Project | In the second half of the semester you will work in groups of 3-4 on a substantial project that will make up a large portion of your grade. This project will involve simulating electron microscope images of a 3D protein, and then recovering the 3D structure from the simulated images. |
Final | There will be a final exam during our university-assigned exam slot. |
Your final grade will be computed as a weighted sum of your homework scores (30%), your quiz scores (10%), your project score (50%) and your final exam score (10%).
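Written as a formula, assuming each component is expressed on the same scale (say, as a percentage), and using \(H\), \(Q\), \(P\), \(F\) as illustrative symbols for the homework, quiz, project, and final exam scores:

\[
G = 0.30\,H + 0.10\,Q + 0.50\,P + 0.10\,F .
\]

The weights sum to 1, so \(G\) stays on the same scale as the components. For example, hypothetical scores of \(H = 90\), \(Q = 80\), \(P = 85\), and \(F = 70\) would give \(G = 27 + 8 + 42.5 + 7 = 84.5\).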
Project
During the second half of the semester you will complete a substantial group project on protein reconstruction, including simulating 2D images of a protein and then reconstructing a 3D model of the protein from these images.
Please carefully read the project page for detailed information on the project.
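To give a feel for the simulation half of the project, here is a minimal sketch (not the project's required method) of how one might simulate a 2D projection image from a 3D density using numpy. It assumes a toy volume built from Gaussian blobs and forms an image by summing voxel values along one coordinate axis, a discrete line integral. The functions `toy_volume` and `project` are hypothetical names introduced only for this illustration; a realistic cryo-EM simulation would typically also rotate the volume to random orientations and model the microscope's optics.

```python
import numpy as np

def toy_volume(n=32, centers=((0.0, 0.0, 0.0), (0.3, -0.2, 0.1)), width=0.1):
    """Hypothetical toy 'protein': an n x n x n density made of Gaussian blobs."""
    coords = np.linspace(-1.0, 1.0, n)
    x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
    vol = np.zeros((n, n, n))
    for cx, cy, cz in centers:
        r2 = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
        vol += np.exp(-r2 / (2 * width ** 2))
    return vol

def project(vol, axis=2, noise_sigma=0.0, rng=None):
    """Simulate a 2D image by summing the density along one axis
    (a discrete line integral), then optionally adding Gaussian noise."""
    image = vol.sum(axis=axis)
    if noise_sigma > 0:
        rng = np.random.default_rng() if rng is None else rng
        image = image + rng.normal(0.0, noise_sigma, size=image.shape)
    return image

vol = toy_volume()
clean = project(vol)  # noiseless projection along the z-axis
noisy = project(vol, noise_sigma=1.0, rng=np.random.default_rng(0))
print(clean.shape, noisy.shape)
```

Projecting along `axis=0`, `1`, or `2` gives three orthogonal views; other viewing directions would require rotating the volume first, for example with `scipy.ndimage.rotate`.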
Writing code
This course will require everyone to write some computer code: you will write snippets of code on several homework assignments, and work in groups on a project that will involve a substantial amount of programming. Today math biology is inseparably linked with programming: some programming ability is basic literacy in this field. Writing code is necessary not only for drawing biological conclusions (e.g., analyzing a genome of roughly \(3 \times 10^{9}\) base pairs to infer an evolutionary tree) but is also useful for forming and testing mathematical hypotheses (e.g., is the Fourier basis appropriate for the data I observed in this experiment?). Moreover, I believe that writing short snippets of code to play and experiment with the topics we cover is a fantastic way to truly master them.
In my experience, most students have written code in another course and will be able to manage this course's programming requirements. For example, if you've ever written a Python or MATLAB function for a course, you should be fine. If you've used R to analyze a dataset, you should be fine. If you've taken CS 61A, the first introductory semester course in programming, you're very well prepared.
"I've never written any code!" -some student, probably
If you've never written any code, don't panic. I will hold three optional evening tutorials at the start of the semester on writing basic Python code that should provide a good foundation for what you'll need to know. You should plan to spend some extra time on the course at the start of the semester to get your feet wet. I will also provide extra resources on introductory programming and am willing to help you outside of class anytime during the semester if you need it.
Writing code is a basic skill today. My hope is that developing some programming literacy in this course will be very useful to you in the future.
Workload: 9 hours a week outside class
This is an ambitious course: in only 14 weeks we will cover a lot of interesting material. The University sets course requirements on the expectation that each unit corresponds to 3 hours of work per week, including class time. Math 127 is a 4-unit course, so it should take about 12 hours a week in total; subtracting the 3 hours of lecture (two 90-minute lectures) leaves 9 hours a week outside class for this course's readings, homework and projects.
My goal is to introduce you to two exciting topics at the interface of mathematics and biology; it is certainly not to ruin your life with a crushing workload. You should expect a challenging, intellectually engaging course, and you should be prepared to work up to 9 hours a week outside class, especially near project deadlines.
Important: If you find yourself consistently working more than 9 hours/week on this course, please contact me so we can fix it.
bCourses
Please visit this course's bCourses website to ask me questions, participate in class discussions and view your grades.
Prerequisites
If you don't satisfy some of the formal prerequisites for this course, please contact me to discuss your background and the possibility of enrolling.