# Hydrophobicity Plot using BioPython

Hydrophobicity is the tendency of a molecule to repel water rather than absorb it. Calculating hydrophobicity along a protein helps identify features such as membrane-spanning regions, antigenic sites, exposed loops, and buried residues. These calculations are usually shown as a plot along the protein sequence, which makes it easy to locate potential features. The hydrophobicity is calculated by sliding a fixed-size window (with an odd number of residues) over the protein sequence; the average hydrophobicity of all residues in the window is plotted at the window's central position.
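
The sliding-window calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a definitive implementation: the function name `window_averages`, the toy sequence, and the decision to score only windows that fit entirely inside the sequence are assumptions made here. The per-residue numbers are the Kyte-Doolittle values listed in this article.

```python
# Kyte-Doolittle values as listed in this article
KD = {
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
    'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
    'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
}


def window_averages(seq, scale, window=9):
    """Return (center_position, mean_value) pairs, one per full window.

    Positions are 1-based. Only windows that fit entirely inside the
    sequence are scored (an assumption of this sketch), so the first and
    last window // 2 residues get no value.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    values = [scale[aa] for aa in seq]
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


# Hypothetical toy sequence, chosen only to keep the output short
profile = window_averages("AAILK", KD, window=3)
for position, mean in profile:
    print(position, round(mean, 2))
```

With a window of 3, the five-residue toy sequence yields three values, centered on residues 2, 3, and 4. Larger windows smooth the curve more but leave more unscored residues at each end.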

A hydrophobicity plot is a quantitative analysis of the degree of hydrophobicity of the amino acids in a protein. It is used to characterize or identify possible structures or domains of a protein. The plot has the amino acid sequence (residue position) on its x-axis and the degree of hydrophobicity on its y-axis.

In a hydrophobicity plot, the degree of hydrophobicity of each residue is taken from a hydrophobicity scale. Several hydrophobicity scales have been published for various uses. Commonly used scales include the Kyte-Doolittle scale, Engelman scale (GES scale), Eisenberg scale, Hopp-Woods scale, Cornette scale, Rose scale, and Janin scale. Many more scales have been published in the literature over the last three decades.

| AA | Amino Acid | Kyte-Doolittle | Hopp-Woods | Cornette | Eisenberg | Rose | Janin | Engelman (GES) |
|----|---------------|-------|-------|-------|-------|------|-------|-------|
| A | Alanine | 1.80 | -0.50 | 0.20 | 0.62 | 0.74 | 0.30 | 1.60 |
| C | Cysteine | 2.50 | -1.00 | 4.10 | 0.29 | 0.91 | 0.90 | 2.00 |
| D | Aspartic acid | -3.50 | 3.00 | -3.10 | -0.90 | 0.62 | -0.60 | -9.20 |
| E | Glutamic acid | -3.50 | 3.00 | -1.80 | -0.74 | 0.62 | -0.70 | -8.20 |
| F | Phenylalanine | 2.80 | -2.50 | 4.40 | 1.19 | 0.88 | 0.50 | 3.70 |
| G | Glycine | -0.40 | 0.00 | 0.00 | 0.48 | 0.72 | 0.30 | 1.00 |
| H | Histidine | -3.20 | -0.50 | 0.50 | -0.40 | 0.78 | -0.10 | -3.00 |
| I | Isoleucine | 4.50 | -1.80 | 4.80 | 1.38 | 0.88 | 0.70 | 3.10 |
| K | Lysine | -3.90 | 3.00 | -3.10 | -1.50 | 0.52 | -1.80 | -8.80 |
| L | Leucine | 3.80 | -1.80 | 5.70 | 1.06 | 0.85 | 0.50 | 2.80 |
| M | Methionine | 1.90 | -1.30 | 4.20 | 0.64 | 0.85 | 0.40 | 3.40 |
| N | Asparagine | -3.50 | 0.20 | -0.50 | -0.78 | 0.63 | -0.50 | -4.80 |
| P | Proline | -1.60 | 0.00 | -2.20 | 0.12 | 0.64 | -0.30 | -0.20 |
| Q | Glutamine | -3.50 | 0.20 | -2.80 | -0.85 | 0.62 | -0.70 | -4.10 |
| R | Arginine | -4.50 | 3.00 | 1.40 | -2.53 | 0.64 | -1.40 | -12.3 |
| S | Serine | -0.80 | 0.30 | -0.50 | -0.18 | 0.66 | -0.10 | 0.60 |
| T | Threonine | -0.70 | -0.40 | -1.90 | -0.05 | 0.70 | -0.20 | 1.20 |
| V | Valine | 4.20 | -1.50 | 4.70 | 1.08 | 0.86 | 0.60 | 2.60 |
| W | Tryptophan | -0.90 | -3.40 | 1.00 | 0.81 | 0.85 | 0.30 | 1.90 |
| Y | Tyrosine | -1.30 | -2.30 | 3.20 | 0.26 | 0.76 | -0.40 | -0.70 |
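
Since each scale is simply a mapping from residue letter to a number, switching scales amounts to switching lookup tables. The sketch below copies the Kyte-Doolittle and Hopp-Woods columns from the table above into dictionaries. The names `SCALES` and `score` are illustrative only and are not part of BioPython.

```python
# Two columns of the table above, keyed by one-letter amino acid code
SCALES = {
    "kyte_doolittle": {
        'A': 1.8, 'C': 2.5, 'D': -3.5, 'E': -3.5, 'F': 2.8,
        'G': -0.4, 'H': -3.2, 'I': 4.5, 'K': -3.9, 'L': 3.8,
        'M': 1.9, 'N': -3.5, 'P': -1.6, 'Q': -3.5, 'R': -4.5,
        'S': -0.8, 'T': -0.7, 'V': 4.2, 'W': -0.9, 'Y': -1.3,
    },
    "hopp_woods": {
        'A': -0.5, 'C': -1.0, 'D': 3.0, 'E': 3.0, 'F': -2.5,
        'G': 0.0, 'H': -0.5, 'I': -1.8, 'K': 3.0, 'L': -1.8,
        'M': -1.3, 'N': 0.2, 'P': 0.0, 'Q': 0.2, 'R': 3.0,
        'S': 0.3, 'T': -0.4, 'V': -1.5, 'W': -3.4, 'Y': -2.3,
    },
}


def score(seq, scale_name):
    """Return the per-residue values of seq under the named scale."""
    scale = SCALES[scale_name]
    return [scale[aa] for aa in seq]


# Hypothetical three-residue example: alanine, aspartic acid, leucine
print(score("ADL", "kyte_doolittle"))
print(score("ADL", "hopp_woods"))
```

In the table, the charged residues D, E, K, and R are negative on the Kyte-Doolittle scale but positive on the Hopp-Woods scale, so it is worth checking a scale's sign convention before comparing plots made with different scales.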

The Kyte-Doolittle scale is widely used for detecting hydrophobic regions in proteins. Regions with a positive value are hydrophobic. The scale can be used to identify both surface-exposed regions and transmembrane regions, depending on the window size. Short windows of 5-7 residues generally work well for predicting putative surface-exposed regions. Large windows of 19-21 residues are well suited to finding transmembrane domains, which are indicated where the calculated values are above 1.6.
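
The transmembrane rule of thumb above (a window of 19-21 residues and an average above 1.6) can be turned into a small search. The following is a rough sketch under stated assumptions: the function name `hydrophobic_segments` is hypothetical, the toy sequence is invented for illustration, and reporting runs of consecutive window centers is one possible convention among several.

```python
# Kyte-Doolittle values as listed in this article
KD = {
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
    'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
    'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
}


def hydrophobic_segments(seq, scale, window=19, threshold=1.6):
    """Return 1-based (first_center, last_center) ranges of consecutive
    window centers whose window average exceeds threshold."""
    half = window // 2
    values = [scale[aa] for aa in seq]
    segments = []
    run_start = None
    for start in range(len(values) - window + 1):
        center = start + half + 1
        mean = sum(values[start:start + window]) / window
        if mean > threshold:
            if run_start is None:
                run_start = center      # a new run begins here
        elif run_start is not None:
            segments.append((run_start, center - 1))
            run_start = None
    if run_start is not None:           # run reaches the last full window
        segments.append((run_start, len(values) - half))
    return segments


# Invented toy sequence: 25 leucines flanked by 10 lysines on each side
toy = "K" * 10 + "L" * 25 + "K" * 10
print(hydrophobic_segments(toy, KD))  # [(15, 31)]
```

The reported range (15 to 31) is narrower than the leucine stretch itself (residues 11 to 35), because windows centered near the edges of the stretch also include lysines that pull the average below 1.6. A hit from this kind of search is a candidate region only, not a confirmed transmembrane domain.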

## Program Implementation

This tutorial uses Python 3.4 with the following modules: BioPython 1.63, MatPlotLib, PyParsing 2.0.1, Python-DateUtil 2.2, PyTZ 2014.1, Six 1.6.1, and NumPy-MKL 1.8, running under the Windows 8.1 Enterprise operating system. These 7 modules were chosen for their compatibility with the Python version and the OS. The input to the program is a protein sequence in FASTA format.

## Program

    import matplotlib.pyplot as plt
    from Bio import SeqIO

    # Read the protein sequence from the FASTA file
    # (if the file holds several records, the last one is kept)
    with open("E:\\BioPython\\Q9UKY0.fasta") as fh:
        for record in SeqIO.parse(fh, "fasta"):
            seq_id = record.id
            seq = record.seq
    num_residues = len(seq)

    # Kyte-Doolittle hydrophobicity value for each amino acid
    kd = { 'A': 1.8,'R':-4.5,'N':-3.5,'D':-3.5,'C': 2.5,
           'Q':-3.5,'E':-3.5,'G':-0.4,'H':-3.2,'I': 4.5,
           'L': 3.8,'K':-3.9,'M': 1.9,'F': 2.8,'P':-1.6,
           'S':-0.8,'T':-0.7,'W':-0.9,'Y':-1.3,'V': 4.2 }

    # Look up the hydrophobicity of every residue in the sequence
    values = []
    for residue in seq:
        values.append(kd[residue])

    # Plot hydrophobicity against the 1-based residue number
    x_data = range(1, num_residues + 1)
    plt.plot(x_data, values, linewidth=1.0)
    plt.xlim(1, num_residues)
    plt.xlabel("Residue Number")
    plt.ylabel("Hydrophobicity")
    plt.title("K&D Hydrophobicity for " + seq_id)
    plt.show()
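
One practical caveat about the lookup step in the program: indexing the `kd` dictionary with a letter that is not one of the 20 standard amino acids (for example X for an unknown residue) raises a `KeyError`. A possible way to handle this, sketched here with a hypothetical helper `score_residues` and an invented test string, is to skip such letters and report them.

```python
# Kyte-Doolittle values as listed in this article
KD = {
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
    'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
    'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
}


def score_residues(seq, scale):
    """Return (positions, values, skipped) for a sequence.

    positions and values cover only residues present in scale;
    skipped lists (position, letter) pairs for everything else.
    Positions are 1-based.
    """
    positions, values, skipped = [], [], []
    for pos, aa in enumerate(str(seq).upper(), start=1):
        if aa in scale:
            positions.append(pos)
            values.append(scale[aa])
        else:
            skipped.append((pos, aa))
    return positions, values, skipped


# Invented example containing two non-standard letters (X and U)
positions, values, skipped = score_residues("MKXLU", KD)
print(positions, values, skipped)
```

Plotting `values` against `positions` keeps the x-axis aligned with the real residue numbers even when some letters are skipped.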

## Input: Q9UKY0.fasta

    >sp|Q9UKY0|PRND_HUMAN Prion-like protein doppel OS=Homo sapiens GN=PRND PE=1 SV=2
    MRKHLSWWWLATVCMLLFSHLSAVQTRGIKHRIKWNRKALPSTAQITEAQVAENRPGAFI
    KQGRKLDIDFGAEGNRYYEANYWQFPDGIHYNGCSEANVTKEAFVTGCINATQAANQGEF
    QKPDNKLHQQVLWRLVQELCSLKHCEFWLERGAGLRVTMHQPVLLCLLALIWLTVK

## Output