Hydrophobicity Plot using BioPython

Hydrophobicity Plot

Hydrophobicity is the property of repelling water rather than absorbing it. Calculating hydrophobicity along a protein is important for identifying features such as membrane-spanning regions, antigenic sites, exposed loops, and buried residues. These calculations are usually shown as a plot along the protein sequence, which makes it easy to locate potential features. The hydrophobicity is calculated by sliding a fixed-size window (with an odd number of residues) over the protein sequence; the average hydrophobicity of the whole window is plotted at the window's central position.
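
The sliding-window calculation described above can be sketched in a few lines of plain Python. This is only an illustration, not the program used later in this post: the function name window_average and the toy input values are assumptions made for the example, and windows that would run past either end of the sequence are simply skipped.

```python
def window_average(values, window=7):
    """Average per-residue values over a sliding window.

    Returns a list of (center_position, mean) tuples, where
    center_position is the 1-based residue number at the window centre.
    Only full windows are used, so the first and last window // 2
    residues get no value.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    averaged = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        averaged.append((start + half + 1, mean))
    return averaged


# Toy per-residue values (hypothetical, not taken from any scale).
print(window_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3))
# [(2, 2.0), (3, 3.0), (4, 4.0)]
```

Each tuple pairs a residue number with the average of the window centred on it, which is the value plotted on the y-axis at that position.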

A hydrophobicity plot is a quantitative analysis of the degree of hydrophobicity of the amino acids of a protein. It is used to characterize or identify possible structures or domains of a protein. The plot has the amino acid sequence of the protein on its x-axis and the degree of hydrophobicity on its y-axis.

In a hydrophobicity plot, the degree of hydrophobicity is taken from a hydrophobicity scale. Several hydrophobicity scales have been published for various uses. Commonly used scales include the Kyte-Doolittle, Engelman (GES), Eisenberg, Hopp-Woods, Cornette, Rose, and Janin scales. Many more have been published in the literature over the last three decades.

AA  Amino Acid     Kyte-Doolittle  Hopp-Woods  Cornette  Eisenberg   Rose  Janin  Engelman (GES)
A   Alanine                  1.80       -0.50      0.20       0.62   0.74   0.30            1.60
C   Cysteine                 2.50       -1.00      4.10       0.29   0.91   0.90            2.00
D   Aspartic acid           -3.50        3.00     -3.10      -0.90   0.62  -0.60           -9.20
E   Glutamic acid           -3.50        3.00     -1.80      -0.74   0.62  -0.70           -8.20
F   Phenylalanine            2.80       -2.50      4.40       1.19   0.88   0.50            3.70
G   Glycine                 -0.40        0.00      0.00       0.48   0.72   0.30            1.00
H   Histidine               -3.20       -0.50      0.50      -0.40   0.78  -0.10           -3.00
I   Isoleucine               4.50       -1.80      4.80       1.38   0.88   0.70            3.10
K   Lysine                  -3.90        3.00     -3.10      -1.50   0.52  -1.80           -8.80
L   Leucine                  3.80       -1.80      5.70       1.06   0.85   0.50            2.80
M   Methionine               1.90       -1.30      4.20       0.64   0.85   0.40            3.40
N   Asparagine              -3.50        0.20     -0.50      -0.78   0.63  -0.50           -4.80
P   Proline                 -1.60        0.00     -2.20       0.12   0.64  -0.30           -0.20
Q   Glutamine               -3.50        0.20     -2.80      -0.85   0.62  -0.70           -4.10
R   Arginine                -4.50        3.00      1.40      -2.53   0.64  -1.40           -12.3
S   Serine                  -0.80        0.30     -0.50      -0.18   0.66  -0.10            0.60
T   Threonine               -0.70       -0.40     -1.90      -0.05   0.70  -0.20            1.20
V   Valine                   4.20       -1.50      4.70       1.08   0.86   0.60            2.60
W   Tryptophan              -0.90       -3.40      1.00       0.81   0.85   0.30            1.90
Y   Tyrosine                -1.30       -2.30      3.20       0.26   0.76  -0.40           -0.70

The Kyte-Doolittle scale is widely used for detecting hydrophobic regions in proteins. Regions with a positive value are hydrophobic. Depending on the window size, this scale can be used to identify both surface-exposed regions and transmembrane regions. Short windows of 5-7 residues generally work well for predicting putative surface-exposed regions. Large windows of 19-21 residues are well suited for finding transmembrane domains when the calculated values are above 1.6.
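
As a rough sketch of the transmembrane rule above, the snippet below averages Kyte-Doolittle values over a 19-residue window and reports the runs of window centres whose average exceeds 1.6. The function names window_means and segments_above, and the synthetic test sequence, are assumptions made for this illustration; the reported ranges are window centres, not exact helix boundaries.

```python
# Kyte-Doolittle values for the 20 standard amino acids.
KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
      'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
      'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
      'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2}


def window_means(seq, window):
    """Return (1-based centre position, mean KD value) for each full window."""
    half = window // 2
    values = [KD[res] for res in seq]
    return [(start + half + 1, sum(values[start:start + window]) / window)
            for start in range(len(values) - window + 1)]


def segments_above(seq, window=19, threshold=1.6):
    """Group consecutive window centres whose mean exceeds the threshold."""
    segments = []
    run_start = None
    last_center = None
    for center, mean in window_means(seq, window):
        if mean > threshold:
            if run_start is None:
                run_start = center
            last_center = center
        elif run_start is not None:
            segments.append((run_start, last_center))
            run_start = None
    if run_start is not None:
        segments.append((run_start, last_center))
    return segments


# Synthetic sequence: a 25-residue leucine stretch flanked by aspartates.
print(segments_above("D" * 10 + "L" * 25 + "D" * 10))  # [(15, 31)]
```

Calling the same function with window=7 and a lower threshold would give a correspondingly finer-grained scan, in line with the short-window use described above.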

Program Implementation

In this tutorial, I used Python 3.4 with the following modules: BioPython 1.63, MatPlotLib, PyParsing 2.0.1, Python-DateUtil 2.2, PyTZ 2014.1, Six 1.6.1, and NumPy-MKL 1.8, running under Windows 8.1 Enterprise. These seven modules were chosen for their compatibility with the Python version and the operating system. The input to the program is a protein sequence in FASTA format.

Program

import matplotlib.pyplot as plt
from Bio import SeqIO

# Read the protein record (the file is expected to hold exactly one sequence).
record = SeqIO.read("E:\\BioPython\\Q9UKY0.fasta", "fasta")
seq_id = record.id
seq = record.seq
num_residues = len(seq)

# Kyte-Doolittle hydrophobicity scale for the 20 standard amino acids.
kd = { 'A': 1.8,'R':-4.5,'N':-3.5,'D':-3.5,'C': 2.5,
       'Q':-3.5,'E':-3.5,'G':-0.4,'H':-3.2,'I': 4.5,
       'L': 3.8,'K':-3.9,'M': 1.9,'F': 2.8,'P':-1.6,
       'S':-0.8,'T':-0.7,'W':-0.9,'Y':-1.3,'V': 4.2 }

# Look up the hydrophobicity of each residue (no window averaging is applied).
values = [kd[residue] for residue in seq]

# Plot hydrophobicity against the 1-based residue number.
x_data = range(1, num_residues + 1)
plt.plot(x_data, values, linewidth=1.0)
plt.xlim(1, num_residues)
plt.xlabel("Residue Number")
plt.ylabel("Hydrophobicity")
plt.title("K&D Hydrophobicity for " + seq_id)
plt.show()
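
The dictionary in the program covers only the 20 standard amino acids, so a sequence containing any other letter (for example X for an unknown residue) makes the lookup fail with a KeyError. One possible way to guard against that is sketched below. The helper name scale_values and the three-entry toy scale are assumptions for this example; unknown residues are stored as NaN, which Matplotlib leaves out of the plotted line.

```python
def scale_values(seq, scale):
    """Map residues to scale values, tolerating letters missing from the scale.

    Returns (values, unknown): values holds one float per residue, with
    NaN where the residue is not in the scale; unknown lists the
    (1-based position, letter) pairs that could not be looked up.
    """
    values = []
    unknown = []
    for position, residue in enumerate(str(seq).upper(), start=1):
        if residue in scale:
            values.append(scale[residue])
        else:
            values.append(float("nan"))
            unknown.append((position, residue))
    return values, unknown


# Toy scale: a subset of the Kyte-Doolittle values, for illustration only.
toy_scale = {"A": 1.8, "R": -4.5, "L": 3.8}
values, unknown = scale_values("ALXR", toy_scale)
print(unknown)  # [(3, 'X')]
```

Reporting the unknown positions separately makes it easy to decide whether to drop those residues, stop with an error, or plot the gap as it is.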

Query

>sp|Q9UKY0|PRND_HUMAN Prion-like protein doppel OS=Homo sapiens
MRKHLSWWWLATVCMLLFSHLSAVQTRGIKHRIKWNRKALPSTAQITEAQVAENRPGAFI
KQGRKLDIDFGAEGNRYYEANYWQFPDGIHYNGCSEANVTKEAFVTGCINATQAANQGEF
QKPDNKLHQQVLWRLVQELCSLKHCEFWLERGAGLRVTMHQPVLLCLLALIWLTVK

Comments

  1. Thank you for your posts. They are really helpful.

    I have a question: to calculate hydrophobicity of the entire TM helix, do we have to take mean value of all residues?

    Replies
    1. { 'Ala': 1.290, 'Arg': 0.960, 'Asn': 0.900, 'Asp': 1.040, 'Cys': 1.110, 'Gln': 1.270, 'Glu': 1.440, 'Gly': 0.560, 'His': 1.220, 'Ile': 0.970, 'Leu': 1.300, 'Lys': 1.230, 'Met': 1.470, 'Phe': 1.070, 'Pro': 0.520, 'Ser': 0.820, 'Thr': 0.820, 'Trp': 0.990, 'Tyr': 0.720, 'Val': 0.910 }

      Reference: http://web.expasy.org/protscale/pscale/alpha-helixLevitt.html

  2. Hello Ashok. I would like to calculate hydrophobicity on aligned sequences. I have aligned sequences in FASTA format, and they contain gaps. I need the values for every sequence as output, which I can then average. Can it be done?
    Thanks.

    Replies
    1. Dear Rohit Jain,
      This program generates a hydrophobicity plot from an amino acid sequence using a dataset of hydrophobicity values for the 20 amino acids (the Kyte-Doolittle scale values are used in this program). The basic concept is plotting a 2D graph from x and y coordinates.
      Generating a plot across alignment gaps is meaningless. It will break the graph.
