Algebraic Statistics for Computational Biology

Math 275: Topics in Applied Mathematics

Tuesdays and Thursdays, 2-3:30pm in 7 Evans


A graphical model is a family of joint probability distributions for a collection of random variables that factors according to a graph. Graphical models have proved extremely useful in computational biology: they provide versatile probabilistic frameworks for a wide range of problems, and at the same time are suitably structured for efficient inference. For example, in biological sequence analysis, specialized directed graphical models with discrete random variables are used for applications ranging from annotation and alignment to phylogeny reconstruction.
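As a small illustration of what "factors according to a graph" means, the sketch below builds the joint distribution of a three-variable chain X1 -> X2 -> X3 as a product of one factor per node. The chain, the binary state space, and all numerical parameters are assumptions chosen for illustration; they are not taken from the course materials.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical parameters for a directed chain X1 -> X2 -> X3 of binary variables.
p1 = [F(1, 3), F(2, 3)]                          # P(X1 = a)
t12 = [[F(3, 4), F(1, 4)], [F(1, 2), F(1, 2)]]   # t12[a][b] = P(X2 = b | X1 = a)
t23 = [[F(1, 5), F(4, 5)], [F(2, 3), F(1, 3)]]   # t23[b][c] = P(X3 = c | X2 = b)

def joint(a, b, c):
    """Joint probability as a product of one factor per node of the chain."""
    return p1[a] * t12[a][b] * t23[b][c]

# Full joint table over the 2^3 states.
table = {state: joint(*state) for state in product((0, 1), repeat=3)}

# The factors are (conditional) distributions, so the table sums to one.
total = sum(table.values())

# Marginal of X3, obtained by summing out X1 and X2.
marginal_x3 = [
    sum(table[(a, b, c)] for a, b in product((0, 1), repeat=2))
    for c in (0, 1)
]

print(total, marginal_x3)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the normalization check is exact rather than subject to floating-point rounding.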

Discrete graphical models are instances of statistical models that can be characterized by polynomials in the joint probabilities. The emerging and active field of algebraic statistics offers algorithms for this polynomial representation, and is a fertile area for the application of ideas from commutative algebra and algebraic geometry.
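A minimal sketch of the polynomial viewpoint, using an example not drawn from the course page: for two independent binary random variables, the joint probabilities p_ij satisfy the polynomial equation p_00 p_11 - p_01 p_10 = 0, and a joint distribution violating it cannot come from the independence model. The function names and numerical values below are hypothetical.

```python
from fractions import Fraction

def independence_joint(a, b):
    """2x2 joint table p[i][j] = P(X=i) * P(Y=j), with P(X=0)=a and P(Y=0)=b."""
    px = [a, 1 - a]
    py = [b, 1 - b]
    return [[px[i] * py[j] for j in range(2)] for i in range(2)]

def invariant(p):
    """The polynomial p00*p11 - p01*p10, which vanishes on the independence model."""
    return p[0][0] * p[1][1] - p[0][1] * p[1][0]

# A distribution inside the model: the polynomial evaluates to zero.
inside = independence_joint(Fraction(1, 3), Fraction(2, 5))

# A perfectly correlated distribution: the polynomial is nonzero,
# so this table is not a product of marginals.
outside = [[Fraction(1, 2), Fraction(0)], [Fraction(0), Fraction(1, 2)]]

print(invariant(inside), invariant(outside))
```

The same idea scales up: larger discrete models are cut out by more polynomials of this kind, which is where tools such as Gröbner bases enter.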

We will focus on the rich interaction between the theory of algebraic statistics and its motivating application, computational biology. Several recent papers have demonstrated that algebraic statistics can be applied to developing practical algorithms for biological applications, and conversely that questions in computational biology motivate interesting research directions in the theory of algebraic statistics. After a brief primer in algebra and biology, we will survey some of this current literature. Students will be encouraged to select topics for study and to participate in class discussions.

Prerequisites: The class is suitable for graduate students who have a background in discrete applied mathematics, preferably with experience in algebra and/or combinatorics. Familiarity with basic biology will be helpful, but is neither necessary nor sufficient for taking the course.


| Topic | Date | Lecturer | Title | Homework, Notes and Links |
| What is the mathematics of phylogenomics? | August 31st | Lior Pachter | Introduction to the mathematics of phylogenomics | HW #1; The Mathematics of Phylogenomics |
| | September 2nd | Lior Pachter; Bernd Sturmfels | Introduction to biology; Algebra basics | On-Line Biology Book; NCBI home page; Gröbner bases 1; Gröbner bases 2 |
| Hidden Markov models and gene finding | September 7th | Lior Pachter | Hidden Markov models | HW #2; MATLAB example; Region for annotation; Likelihood function for a binary model of length three; Regions of the explanations |
| | September 9th | Lior Pachter | Gene finding | |
| Tropical geometry and parametric inference | September 14th | Bernd Sturmfels | Introduction to tropical geometry | Handout: page 1, page 2; Tropical Mathematics |
| | September 16th | Lior Pachter | Pair HMMs and sequence alignment | |
| Maximum likelihood estimation | September 21st | Dan Levy | Introduction to maximum likelihood | |
| | September 21st -- 3:45PM | Serkan Hosten | The Maximum Likelihood Degree | The Maximum Likelihood Degree |
| | | Bernd Sturmfels | Solving the Likelihood Equations | Solving the Likelihood Equations |
| | September 23rd | Mathias Drton; Luis Garcia | Binary bi-directed four chain | |
| Sequence alignment | September 28th | Lior Pachter; Colin Dewey | Parametric inference | Parametric Inference for Biological Sequence Analysis; Tropical Geometry of Statistical Models |
| | September 30th | Lior Pachter | Project assignments | |
| | September 30th -- 4:00PM | Leroy Hood | | |
| Phylogenetic trees | October 5th | Lior Pachter | The four point condition | A Note on the Metric Properties of Trees; Reconstructing Trees from Subtree Weights |
| | October 7th | Lior Pachter | Characterizations of trees | Geometry of the Space of Phylogenetic Trees; The Tropical Grassmannian |
| Evolutionary models | October 12th | Lior Pachter | Markov models on trees | |
| | October 14th | Seth Sullivant | Phylogenetic invariants for trees and networks | Phylogenetic Algebraic Geometry; Toric Ideals of Phylogenetic Invariants |
| Reconstructing trees and networks I | October 19th | Sagi Snir | Convex recolorings of trees | |
| | October 21st -- 12:00PM | Michael Hendy | Tandem duplication trees | |
| | October 21st | David Bryant | Cyclic splits and network reconstruction | Neighbor-Net: An Agglomerative Method for the Construction of Phylogenetic Networks |
| Reconstructing trees and networks II | October 26th | | Group meetings for preliminary proposals | |
| | October 28th | Nicholas Eriksson | Constructing trees using singular value decomposition | |
| Bay Area Discrete Math Day | October 30th | | | |
| RNA metrics and alignment | November 2nd | Ian Holmes | Simultaneous alignment and phylogeny | |
| | November 4th | Lior Pachter | RNA metrics | |
| | November 4th -- 4:10PM | Philip Hanlon | | |
| Group presentation | November 9th | | Parametric inference with few parameters | |
| Veterans Day holiday | November 11th | | | |
| Group presentation | November 16th | | HMM: Algebraic tools | |
| Algebraic statistics and other biology | November 18th | Niko Beerenwinkel | Computational Analysis of HIV Drug Resistance Data | RECOMB paper |
| Group presentation | November 23rd | | HMM: Numerical tools | |
| Thanksgiving holiday | November 25th | | | |
| | November 30th | Raazesh Sainudiin | Rigorous numerical statistics via enclosures | Talk abstract |
| Group presentation | December 2nd | | Small trees | Small trees website |
| Group presentation | December 7th | | What happened to the data? | |
| Conclusion | December 9th | Lior Pachter; Bernd Sturmfels | Reports due in class | |

Suggested Texts

R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis, Cambridge University Press (1998) - Application of graphical models to problems in biological sequence analysis.
G. Pistone, E. Riccomagno, H.P. Wynn, Algebraic Statistics: Computational Commutative Algebra in Statistics, CRC Press (2000) - Contingency tables and other applications of algebraic statistics.
D. Cox, D. O'Shea, J.B. Little, Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra, Springer-Verlag (1996) - Excellent introduction to concrete algebra.
C. Semple, M. Steel, Phylogenetics, Oxford University Press (2004) - Mathematical foundations of phylogenetics.


Open problems from the AIM Workshop on Computational Algebraic Statistics (held December 14 to 18, 2003).
Abstracts from a mini-workshop on algebraic statistics (held in Berkeley 14-16 January, 2003).
Seminar on the mathematics of phylogenetic trees hosted by Lior Pachter and Bernd Sturmfels, Fall 2003.
Template for reports.

Related Seminars and Classes this Semester

Stochastic Processes
Probability on Trees and Networks
Combinatorial Commutative Algebra

Maintained by Lior Pachter.