Constructing a Phylogenetic Tree Using the UPGMA Method

UPGMA

UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is a distance-based method for constructing phylogenetic trees. It works by repeatedly merging the two closest clusters of sequences into a new cluster until all sequences are joined in a single tree. The distance between two clusters is the average of all pairwise distances between the sequences in one cluster and the sequences in the other. UPGMA produces rooted trees: each tree has a defined root representing the common ancestor of all the sequences.

Here's a more detailed explanation:

1. Distance Matrix

  • UPGMA begins with a distance matrix, which contains the pairwise distances between all sequences being compared. These distances can be based on sequence alignment, protein structure comparisons, or other relevant metrics.

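One common way to obtain such values is pairwise global alignment. The recurrence below fills a score table in which \(D_{i,j}\) is the best score for aligning the first \(i\) characters of sequence \(a\) with the first \(j\) characters of sequence \(b\), scoring a match +1, a mismatch 0 and a gap -1. This table holds similarity scores, not the UPGMA distance matrix; the final score still has to be converted into a distance.

\[D_{i,j}=\max\begin{cases}D_{i-1,j-1}+s(a_i,b_j)\\D_{i-1,j}+s(a_i,-)\\D_{i,j-1}+s(-,b_j)\end{cases}=\max\begin{cases}D_{i-1,j-1}+1 & \text{if } a_i = b_j\\D_{i-1,j-1}+0 & \text{if } a_i \neq b_j\\D_{i-1,j}-1 & \text{if } a_i \text{ is aligned to a gap}\\D_{i,j-1}-1 & \text{if } b_j \text{ is aligned to a gap}\end{cases}\]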
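A minimal Python sketch of this alignment recurrence, assuming the same scores (match +1, mismatch 0, gap -1). The boundary values are an assumption not stated in the equation: the first row and column are initialised to -i and -j, i.e. a prefix aligned entirely against gaps. The function name `global_alignment_score` is illustrative. It returns only the final score; how that score is turned into a distance for UPGMA is a separate choice.

```python
def global_alignment_score(a: str, b: str) -> int:
    """Best global alignment score of a and b (match +1, mismatch 0, gap -1)."""
    n, m = len(a), len(b)
    # D[i][j] holds the best score for aligning a[:i] with b[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed boundary: a prefix aligned entirely against gaps costs -1 per character
    for i in range(1, n + 1):
        D[i][0] = -i
    for j in range(1, m + 1):
        D[0][j] = -j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = D[i - 1][j - 1] + (1 if a[i - 1] == b[j - 1] else 0)
            up = D[i - 1][j] - 1     # a_i aligned to a gap
            left = D[i][j - 1] - 1   # b_j aligned to a gap
            D[i][j] = max(diag, up, left)
    return D[n][m]
```

For example, `global_alignment_score("ACGT", "ACGA")` is 3 (three matches and one mismatch), and `global_alignment_score("AC", "A")` is 0 (one match plus one gap).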

2. Iterative Clustering

  • The algorithm identifies the two sequences or clusters with the smallest pairwise distance.
  • These two are combined to form a new cluster, which is represented as a node in the tree.
  • The distance between this new cluster and every other sequence or cluster is recalculated as the average of all pairwise distances between the sequences in the new cluster and the sequences in the other cluster.
  • This process is repeated until all sequences are grouped into a single tree.
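The clustering loop can be sketched directly from these steps. This illustrative Python function (the names `upgma_merge_order` and `cluster_distance` are hypothetical) recomputes each cluster distance from scratch as the mean over all member pairs and only records which clusters were joined and at what distance; it does not compute branch lengths.

```python
from itertools import combinations


def upgma_merge_order(dist, labels):
    """Return the list of merges UPGMA performs, in order.

    dist   -- symmetric matrix (list of lists) of pairwise sequence distances
    labels -- one name per sequence
    """
    # Each cluster is a tuple of sequence indices; start with singletons.
    clusters = [(i,) for i in range(len(labels))]

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between members of c1 and c2.
        return sum(dist[s][t] for s in c1 for t in c2) / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Find the two closest clusters (ties go to the first pair found).
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        d = cluster_distance(a, b)
        # Replace them with their union, which becomes a new internal node.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append(([labels[i] for i in a], [labels[i] for i in b], d))
    return merges
```

For five sequences A to E with d(A,B) = 2, d(D,E) = 4, d(C,A) = d(C,B) = 6 and every distance between {A, B, C} and {D, E} equal to 10, the merges are A+B at 2, D+E at 4, C+(A,B) at 6, and finally the two remaining clusters at 10.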

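When clusters \(i\) and \(j\) are merged into \(ij\), the distance from any other cluster \(k\) to the new cluster is

\[\delta_m(k, ij) = \frac{D_{k,i} \cdot |i| + D_{k,j} \cdot |j|}{|i| + |j|} = \frac{1}{|k| |ij|} \sum_{s \in k} \sum_{t \in ij} D_{s,t}\]

where \(|i|\) denotes the number of sequences in cluster \(i\), so \(|ij| = |i| + |j|\).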
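A quick numerical check that the two forms of \(\delta_m\) agree. Both helper names are hypothetical; clusters are given as tuples of sequence indices into a distance matrix.

```python
def updated_distance(d_ki, d_kj, size_i, size_j):
    """First form: old distances to i and j, weighted by the sizes of i and j."""
    return (d_ki * size_i + d_kj * size_j) / (size_i + size_j)


def brute_force_distance(dist, k, ij):
    """Second form: mean of dist[s][t] over every s in k and t in ij."""
    return sum(dist[s][t] for s in k for t in ij) / (len(k) * len(ij))
```

With `dist[3][0] = 5`, `dist[3][1] = 7`, `dist[3][2] = 9`, and clusters i = (0, 1), j = (2,), k = (3,), the old distances are 6 and 9, and both forms give 7.0. Weighting by cluster size is what makes the method "unweighted": every original sequence counts equally in the average. Taking the plain mean (6 + 9) / 2 = 7.5 instead would give WPGMA.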

3. Rooted Trees

  • UPGMA produces rooted trees, meaning they have a specific root node representing the most recent common ancestor of all the sequences.
  • The branch lengths in a UPGMA tree represent evolutionary distances. Because UPGMA assumes a constant rate of evolution, every sequence ends up at the same distance from the root (the tree is ultrametric).

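When two clusters \(i\) and \(j\) separated by the smallest distance \(D_{min}\) are merged, the new node \(ij\) is placed at height \(D_{min}/2\), so every sequence in \(i\) and in \(j\) lies the same distance below it:

\[\delta_t(i, ij) = \delta_t(j, ij) = \frac{D_{min}}{2}\]

The branch joining cluster \(i\) to \(ij\) therefore has length \(D_{min}/2\) minus the height of \(i\), which is zero when \(i\) is a single sequence.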
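Putting the pieces together, here is a compact sketch that builds the rooted tree with branch lengths and returns it in Newick format. It is an illustrative implementation, not the one used in the video or in any particular library: the function name `upgma_newick`, the dictionary bookkeeping and the Newick output are assumptions. It uses the size-weighted update for \(\delta_m\) and places each new node at height \(D_{min}/2\).

```python
def upgma_newick(dist, labels):
    """Build a rooted UPGMA tree and return it as a Newick string."""
    # Active clusters: id -> (Newick subtree, number of sequences, height)
    nodes = {i: (name, 1, 0.0) for i, name in enumerate(labels)}
    # Distances between active clusters, keyed by a frozenset of two ids
    d = {frozenset((i, j)): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)

    while len(nodes) > 1:
        pair = min(d, key=d.get)            # closest pair of clusters
        i, j = sorted(pair)
        d_min = d.pop(pair)
        tree_i, n_i, h_i = nodes.pop(i)
        tree_j, n_j, h_j = nodes.pop(j)
        height = d_min / 2                  # delta_t(i, ij) = delta_t(j, ij)
        # Branch length = new node height minus the child's own height
        subtree = f"({tree_i}:{height - h_i:g},{tree_j}:{height - h_j:g})"
        # Size-weighted average distance from every other cluster k to ij
        for k in nodes:
            d_ki = d.pop(frozenset((k, i)))
            d_kj = d.pop(frozenset((k, j)))
            d[frozenset((k, next_id))] = (d_ki * n_i + d_kj * n_j) / (n_i + n_j)
        nodes[next_id] = (subtree, n_i + n_j, height)
        next_id += 1

    tree, _, _ = next(iter(nodes.values()))
    return tree + ";"
```

For the five-sequence example matrix (d(A,B) = 2, d(D,E) = 4, d(C,A) = d(C,B) = 6, all other distances 10), it returns `((D:2,E:2):3,(C:3,(A:1,B:1):2):2);`, in which every leaf is exactly 5 from the root. Each iteration scans all remaining pairs, so this straightforward version runs in roughly O(n³) time for n sequences.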

4. Assumptions

  • UPGMA assumes a constant rate of evolution across all sequences, also known as the molecular clock hypothesis.
  • This assumption may not always hold true in real-world datasets, especially when dealing with different types of sequences or species with varying evolutionary rates.
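One way to test whether a distance matrix is consistent with a molecular clock is the three-point (ultrametric) condition: for every three sequences, the two largest of their pairwise distances are equal. When the distances are exactly ultrametric, UPGMA recovers the tree that produced them; when they are not, it can return the wrong topology. The helper below is a hypothetical sketch, and its floating-point tolerance `tol` is an arbitrary choice.

```python
from itertools import combinations


def is_ultrametric(dist, tol=1e-9):
    """Check the three-point condition on a symmetric distance matrix."""
    n = len(dist)
    for x, y, z in combinations(range(n), 3):
        # Sort the three pairwise distances; the two largest must match.
        _, second, largest = sorted((dist[x][y], dist[x][z], dist[y][z]))
        if abs(largest - second) > tol:
            return False
    return True
```

The five-sequence example matrix used above passes this check, while a matrix such as d(1,2) = 2, d(1,3) = 3, d(2,3) = 4 fails it, because its two largest distances (3 and 4) differ.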

In summary: UPGMA is a simple and computationally efficient algorithm for constructing phylogenetic trees from distance matrices. It produces rooted trees, but its assumption of a constant rate of evolution can be a limitation when dealing with data that violates this assumption.

This is a short video tutorial on constructing a phylogenetic tree from 5 nucleotide sequences using the UPGMA method.
