Hydrophobicity plot using BioPython on Google Colab Notebook

Hydrophobicity plot

A hydrophobicity or lipophilicity plot is a 2D graphical display of the hydrophobic regions of a protein. The plot has the amino acid sequence of the protein on its x-axis and the degree of hydrophobicity on its y-axis. The Kyte-Doolittle scale is widely used for identifying surface-exposed regions and transmembrane regions; on this scale, regions of the graph with positive values are hydrophobic. Several hydrophobicity scales have been published for various uses. The commonly used ones are the Kyte-Doolittle scale, Engelman scale (GES scale), Eisenberg scale, Hopp-Woods scale, Cornette scale, Rose scale, and Janin scale. For further details, refer to my previous article: https://www.biob.in/2014/05/hydrophobicity-plot-using-biopython.html
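To make the mapping from residues to y-axis values concrete, here is a minimal sketch that looks up Kyte-Doolittle values for the first six residues of the query sequence used later in this post. The helper name `hydropathy_profile` and the printed labels are my own illustrative choices and are not part of Biopython.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
      'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
      'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
      'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2}


def hydropathy_profile(sequence):
    """Return one Kyte-Doolittle value per residue (hypothetical helper)."""
    return [KD[res] for res in sequence.upper()]


peptide = 'MRKHLS'  # first six residues of Q9UKY0
for position, (res, value) in enumerate(
        zip(peptide, hydropathy_profile(peptide)), start=1):
    # On this scale a positive value marks a hydrophobic residue
    label = 'hydrophobic' if value > 0 else 'hydrophilic'
    print(position, res, value, label)
```

Each printed row corresponds to one point on the plot: the position goes on the x-axis and the value on the y-axis.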

Program Implementation

In this tutorial, I used a Google Colab notebook to run the Python program.

Python Program

try:
    import google.colab
    !pip install biopython
except ImportError:
    pass

import os
from urllib.request import urlretrieve
import matplotlib.pyplot as plt
from Bio import SeqIO

# Download the query sequence once and reuse the local copy afterwards
fasta_file = 'Q9UKY0.fasta'
url = ('https://raw.githubusercontent.com/AshokHub/'
       'Datasets/main/Q9UKY0.fasta')
if not os.path.exists(fasta_file):
    urlretrieve(url, fasta_file)

# Read the record; if the file held several, the last one would be kept
for record in SeqIO.parse(fasta_file, 'fasta'):
    seq_id = record.id
    seq = record.seq
    n = len(seq)

# Kyte-Doolittle hydropathy value for each of the 20 standard amino acids
kd = { 'A': 1.8,'R':-4.5,'N':-3.5,'D':-3.5,'C': 2.5,
       'Q':-3.5,'E':-3.5,'G':-0.4,'H':-3.2,'I': 4.5,
       'L': 3.8,'K':-3.9,'M': 1.9,'F': 2.8,'P':-1.6,
       'S':-0.8,'T':-0.7,'W':-0.9,'Y':-1.3,'V': 4.2 }

# One point per residue: x is the 1-based position, y the hydropathy value
x = range(1, n + 1)
y = [kd[res] for res in seq]

# Draw the profile as a line plot
fig = plt.figure()
ax = fig.add_axes([0, 0, 1, 1])
ax.plot(x, y, 'b', lw=0.9)
ax.set_xlim(1, n)
ax.set(xlabel='Residue Number',
       ylabel='Hydrophobicity',
       title='Hydrophobicity plot for ' + seq_id)
ax.grid(color='gray', linestyle='-', linewidth=0.5)
plt.show()
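
The program above plots the raw value of every residue, which gives a fairly jagged line. Hydropathy profiles are commonly smoothed by averaging over a sliding window before plotting. The following is a minimal sketch of that idea, not part of the original program: the function name `sliding_window_average` and the default window of 9 are assumptions chosen for illustration.

```python
def sliding_window_average(values, window=9):
    """Average values over a centred window of odd length.

    Returns (positions, averages), where each position is the 1-based
    residue number at the centre of its window. Residues too close to
    either end to have a full window are skipped.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError('window must be a positive odd integer')
    half = window // 2
    positions = []
    averages = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        positions.append(start + half + 1)
        averages.append(sum(chunk) / window)
    return positions, averages


# Toy input so the arithmetic is easy to follow
raw = [1.0, 2.0, 3.0, 4.0, 5.0]
pos, avg = sliding_window_average(raw, window=3)
print(pos)  # [2, 3, 4]
print(avg)  # [2.0, 3.0, 4.0]
```

To smooth the plot, pass the per-residue list from the program above to this function and plot the returned positions against the returned averages. Shorter windows (around 9) are often suggested for surface regions and longer ones (around 19) for transmembrane segments; treat these as starting points rather than fixed rules.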

Query Sequence

>sp|Q9UKY0|PRND_HUMAN Prion-like protein doppel OS=Homo sapiens GN=PRND PE=1 SV=2
MRKHLSWWWLATVCMLLFSHLSAVQTRGIKHRIKWNRKALPSTAQITEAQVAENRPGAFI
KQGRKLDIDFGAEGNRYYEANYWQFPDGIHYNGCSEANVTKEAFVTGCINATQAANQGEF
QKPDNKLHQQVLWRLVQELCSLKHCEFWLERGAGLRVTMHQPVLLCLLALIWLTVK

The query sequence (Q9UKY0.fasta) used in this program is hosted in a GitHub repository.
