Hydrophobicity Plot using BioPython

May 08, 2014

Hydrophobicity is the tendency of a molecule to repel water rather than absorb it. Calculating hydrophobicity along a protein helps identify features such as membrane-spanning regions, antigenic sites, exposed loops, and buried residues. These calculations are usually shown as a plot along the protein sequence, which makes it easy to locate potential features. The hydrophobicity is calculated by sliding a fixed-size window (with an odd number of residues) over the protein sequence. The average hydrophobicity of all residues in the window is plotted at the window's central position.
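The sliding-window calculation described above can be sketched in plain Python. This is a minimal illustration, not the program used later in this post: the function name `window_average` and the default window of 7 are assumptions made for the example, and the scale values are the Kyte-Doolittle values listed in this post.

```python
# Kyte-Doolittle values, as listed in this post.
KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
      'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
      'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
      'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2}


def window_average(seq, scale, window=7):
    """Return (center_position, mean_score) pairs, one per full window.

    Positions are 1-based. The window must be odd so that it has a
    single central residue. (Hypothetical helper for illustration.)
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    scores = [scale[res] for res in seq]
    profile = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        # The center of the window starting at index `start` is
        # residue number start + half + 1 (1-based).
        profile.append((start + half + 1, mean))
    return profile


# First 28 residues of the query sequence shown in this post.
profile = window_average("MRKHLSWWWLATVCMLLFSHLSAVQTRG", KD, window=7)
for position, mean in profile[:3]:
    print(position, round(mean, 2))
```

Only full windows are scored in this sketch, so the first and last `window // 2` residues get no value. Other treatments of the sequence ends are possible.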

A hydrophobicity plot is a quantitative analysis of the degree of hydrophobicity of the amino acids in a protein. It is used to characterize or identify possible structures or domains of a protein. The plot has the amino acid sequence of the protein on its x-axis and the degree of hydrophobicity on its y-axis.

In a hydrophobicity plot, the degree of hydrophobicity of each residue is taken from a hydrophobicity scale. Several hydrophobicity scales have been published for various uses. Commonly used scales include the Kyte-Doolittle scale, Engelman scale (GES scale), Eisenberg scale, Hopp-Woods scale, Cornette scale, Rose scale, and Janin scale. Many more scales have been published in the literature over the last three decades.

AA	Amino Acid	Kyte-Doolittle	Hopp-Woods	Cornette	Eisenberg	Rose	Janin	Engelman (GES)
A	Alanine	1.80	-0.50	0.20	0.62	0.74	0.30	1.60
C	Cysteine	2.50	-1.00	4.10	0.29	0.91	0.90	2.00
D	Aspartic acid	-3.50	3.00	-3.10	-0.90	0.62	-0.60	-9.20
E	Glutamic acid	-3.50	3.00	-1.80	-0.74	0.62	-0.70	-8.20
F	Phenylalanine	2.80	-2.50	4.40	1.19	0.88	0.50	3.70
G	Glycine	-0.40	0.00	0.00	0.48	0.72	0.30	1.00
H	Histidine	-3.20	-0.50	0.50	-0.40	0.78	-0.10	-3.00
I	Isoleucine	4.50	-1.80	4.80	1.38	0.88	0.70	3.10
K	Lysine	-3.90	3.00	-3.10	-1.50	0.52	-1.80	-8.80
L	Leucine	3.80	-1.80	5.70	1.06	0.85	0.50	2.80
M	Methionine	1.90	-1.30	4.20	0.64	0.85	0.40	3.40
N	Asparagine	-3.50	0.20	-0.50	-0.78	0.63	-0.50	-4.80
P	Proline	-1.60	0.00	-2.20	0.12	0.64	-0.30	-0.20
Q	Glutamine	-3.50	0.20	-2.80	-0.85	0.62	-0.70	-4.10
R	Arginine	-4.50	3.00	1.40	-2.53	0.64	-1.40	-12.3
S	Serine	-0.80	0.30	-0.50	-0.18	0.66	-0.10	0.60
T	Threonine	-0.70	-0.40	-1.90	-0.05	0.70	-0.20	1.20
V	Valine	4.20	-1.50	4.70	1.08	0.86	0.60	2.60
W	Tryptophan	-0.90	-3.40	1.00	0.81	0.85	0.30	1.90
Y	Tyrosine	-1.30	-2.30	3.20	0.26	0.76	-0.40	-0.70
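The scales in the table do not all share the same sign convention. For example, aspartic acid is -3.50 on the Kyte-Doolittle scale but 3.00 on the Hopp-Woods scale, which scores hydrophilicity. The sketch below scores the same peptide on three scales using a few values copied from the table. The dictionary layout, the helper name `mean_score`, and the peptide "LIAL" are assumptions made for illustration only.

```python
# A handful of residues copied from the table above (illustrative subset).
SCALES = {
    "Kyte-Doolittle": {"A": 1.80, "D": -3.50, "I": 4.50, "K": -3.90, "L": 3.80},
    "Hopp-Woods":     {"A": -0.50, "D": 3.00, "I": -1.80, "K": 3.00, "L": -1.80},
    "Eisenberg":      {"A": 0.62, "D": -0.90, "I": 1.38, "K": -1.50, "L": 1.06},
}


def mean_score(peptide, scale):
    """Average the per-residue scale values over a peptide."""
    return sum(scale[res] for res in peptide) / len(peptide)


# An apolar peptide scores positive on Kyte-Doolittle and Eisenberg,
# but negative on Hopp-Woods.
for name, scale in SCALES.items():
    print(name, round(mean_score("LIAL", scale), 2))
```

Before comparing plots made with different scales, it is worth checking which direction each scale treats as hydrophobic.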

The Kyte-Doolittle scale is widely used for detecting hydrophobic regions in proteins. Regions with a positive value are hydrophobic. The scale can be used to identify both surface-exposed regions and transmembrane regions, depending on the window size. Short windows of 5-7 residues generally work well for predicting putative surface-exposed regions. Large windows of 19-21 residues are well suited for finding transmembrane domains, which are indicated when the calculated values are above 1.6.
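The transmembrane rule of thumb above (a window of 19 and a cutoff of 1.6) could be applied as in the following sketch. The function name `hydrophobic_segments` and the merging of overlapping windows into one range are assumptions of this example, not part of the Kyte-Doolittle method itself. The reported ranges are the spans covered by the qualifying windows, not exact helix boundaries.

```python
# Kyte-Doolittle values, as listed in this post.
KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
      'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
      'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
      'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2}


def hydrophobic_segments(seq, scale, window=19, threshold=1.6):
    """Return 1-based (start, end) ranges covered by windows whose
    mean score is above the threshold. Overlapping or adjacent
    windows are merged into one range. (Hypothetical helper.)
    """
    segments = []
    for start in range(len(seq) - window + 1):
        mean = sum(scale[res] for res in seq[start:start + window]) / window
        if mean > threshold:
            first, last = start + 1, start + window
            if segments and first <= segments[-1][1] + 1:
                # Extend the previous range instead of starting a new one.
                segments[-1] = (segments[-1][0], last)
            else:
                segments.append((first, last))
    return segments


# Synthetic test sequence: a 19-residue leucine stretch between
# two charged flanks.
synthetic = "D" * 10 + "L" * 19 + "D" * 10
print(hydrophobic_segments(synthetic, KD))
```

A candidate found this way is only a prediction. A shorter window with the same cutoff would flag many more, shorter stretches.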

Program Implementation

In this tutorial, I used Python 3.4 with the BioPython 1.63, MatPlotLib, PyParsing 2.0.1, Python-DateUtil 2.2, PyTZ 2014.1, Six 1.6.1, and NumPy-MKL 1.8 modules on the Windows 8.1 Enterprise operating system. These 7 modules were chosen for their compatibility with this Python version and OS. The input to the program is a protein sequence in FASTA format.

Program

import matplotlib.pyplot as plt
from Bio import SeqIO

# Read the FASTA file; if it holds several records, the last one is kept.
with open("E:\\BioPython\\Q9UKY0.fasta") as fh:
    for record in SeqIO.parse(fh, "fasta"):
        seq_id = record.id
        seq = record.seq
num_residues = len(seq)

# Kyte-Doolittle hydrophobicity scale
kd = { 'A': 1.8,'R':-4.5,'N':-3.5,'D':-3.5,'C': 2.5,
       'Q':-3.5,'E':-3.5,'G':-0.4,'H':-3.2,'I': 4.5,
       'L': 3.8,'K':-3.9,'M': 1.9,'F': 2.8,'P':-1.6,
       'S':-0.8,'T':-0.7,'W':-0.9,'Y':-1.3,'V': 4.2 }

# Per-residue hydrophobicity (raw scale values, no window averaging).
# A residue code missing from the scale (e.g. X) raises a KeyError.
values = [kd[residue] for residue in seq]

# Plot hydrophobicity against the 1-based residue number.
x_data = range(1, num_residues + 1)
plt.plot(x_data, values, linewidth=1.0)
plt.xlim(1, num_residues)
plt.xlabel("Residue Number")
plt.ylabel("Hydrophobicity")
plt.title("K&D Hydrophobicity for " + seq_id)
plt.show()

Query

>sp|Q9UKY0|PRND_HUMAN Prion-like protein doppel OS=Homo sapiens
MRKHLSWWWLATVCMLLFSHLSAVQTRGIKHRIKWNRKALPSTAQITEAQVAENRPGAFI
KQGRKLDIDFGAEGNRYYEANYWQFPDGIHYNGCSEANVTKEAFVTGCINATQAANQGEF
QKPDNKLHQQVLWRLVQELCSLKHCEFWLERGAGLRVTMHQPVLLCLLALIWLTVK

Comments

Unknown, September 12, 2017 at 6:18 PM
Thank you for your posts. They are really helpful.

I have a question: to calculate hydrophobicity of the entire TM helix, do we have to take mean value of all residues?

Unknown, September 19, 2017 at 3:23 PM
Thank you very much.

Unknown, March 21, 2019 at 11:49 PM
Hello Ashok. I would like to calculate hydrophobicity on aligned sequences. I have aligned sequences in FASTA format, and they contain gaps. I need the output as values for every sequence, which I can then average. Can it be done?
Thanks.

Unknown, March 21, 2019 at 11:49 PM
My name is Rohit Jain

Unknown, March 21, 2019 at 11:49 PM
Looking forward to your reply.

BioGem Blog