Sunday, October 1, 2017

Frequency Plot of Protein Sequence using PHP and R

Frequency Plot

About Protein Frequency Plot

A frequency plot is a graphical data analysis technique for summarizing the distribution of a variable. The response variable is divided into equal-sized intervals (or bins), and the number of occurrences is counted for each bin. For a categorical variable such as the amino acids of a protein, each bin is simply one amino acid. In this tutorial, the number of occurrences of each amino acid in the protein sequence (the response variable) is counted, and the counts are sorted in ascending order.

The frequency plot then consists of:

Vertical Axis = Amino acids
Horizontal Axis = Frequencies of the amino acids
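The counting and sorting step can be sketched in base R without SeqinR. In this sketch, count_residues() is a hypothetical helper: it upper-cases a one-letter protein sequence, splits it into residues, drops anything that is not a letter (spaces, '*', gaps), counts each amino acid with table() and returns the counts in ascending order.

```r
# Count amino-acid residues in a one-letter protein sequence (base R only).
# count_residues() is a hypothetical helper for illustration, not part of SeqinR.
count_residues <- function(sequence) {
  residues <- strsplit(toupper(sequence), "")[[1]]  # one element per character
  residues <- residues[residues %in% LETTERS]       # keep letters only
  counts <- table(residues)                         # occurrences per amino acid
  counts[order(counts)]                             # ascending order
}

counts <- count_residues("mkvla akgl*")
print(counts)
```

Ties keep their alphabetical order from table(), because order() leaves tied values in their original sequence.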

There are four common types of frequency plots:

  1. Frequency plot (absolute counts);
  2. Relative frequency plot (convert counts to proportions);
  3. Cumulative frequency plot;
  4. Cumulative relative frequency plot.
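The values behind the four plot types can be derived from the same counts. The sketch below assumes a named vector of counts (toy values, not real data); frequency_table() is a hypothetical helper name. Amino acids have no natural order, so the cumulative columns depend on the sort order chosen; here the rows follow the ascending order used in this tutorial.

```r
# Compute absolute, relative, cumulative and cumulative relative frequencies.
# frequency_table() is a hypothetical helper; counts must be a named vector.
frequency_table <- function(counts) {
  n <- as.numeric(counts)                 # plain numbers, names dropped
  data.frame(
    residue      = names(counts),
    count        = n,                     # 1. absolute frequency
    relative     = n / sum(n),            # 2. relative frequency (proportion)
    cumulative   = cumsum(n),             # 3. cumulative frequency
    cum_relative = cumsum(n) / sum(n),    # 4. cumulative relative frequency
    stringsAsFactors = FALSE
  )
}

counts <- c(G = 1, M = 1, V = 1, A = 2, K = 2, L = 2)  # toy counts, ascending
freq <- frequency_table(counts)
print(freq)
```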

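The frequency plot and the histogram carry the same information; the difference is that a frequency plot joins the frequency values with lines, whereas a histogram draws a bar at each frequency value. The script in this tutorial draws a dot chart (dotchart() in R), which marks each frequency with a point.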
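To see the difference, the sketch below draws one set of toy counts both ways with base R graphics: barplot() for histogram-style bars and plot(type = "b") for points joined by lines. The counts and the output file freq_compare.pdf in the temporary directory are assumptions for illustration, and the amino acids sit on the horizontal axis here, unlike the layout described above.

```r
# Draw the same toy counts as bars and as connected points, side by side.
counts <- c(G = 1, M = 1, V = 1, A = 2, K = 2, L = 2)   # illustrative values
out_file <- file.path(tempdir(), "freq_compare.pdf")

pdf(out_file, width = 8, height = 4)
op <- par(mfrow = c(1, 2))                               # two panels
barplot(counts, main = "Bars", xlab = "Amino acid", ylab = "Frequency")
plot(seq_along(counts), counts, type = "b", pch = 19, xaxt = "n",
     main = "Connected points", xlab = "Amino acid", ylab = "Frequency")
axis(1, at = seq_along(counts), labels = names(counts))  # amino-acid labels
par(op)
invisible(dev.off())                                     # write the PDF
```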

Frequency plot using PHP and R

In this tutorial, R and PHP are used together with two R packages: SeqinR (from CRAN) and Biostrings (from Bioconductor). SeqinR reads the FASTA file (read.fasta()) and manipulates the sequence: c2s() joins residues into a string, s2c() splits a string into single characters, and aaa() converts one-letter amino acid codes to three-letter codes. The script also loads Biostrings, although none of its functions are called. PHP runs Rscript as an external process with the exec() function, waits for it to finish, and then embeds the image produced by R in the page through an HTML IMG tag, much like a PHP/CGI or Perl/CGI program. The input is a protein sequence in FASTA format (a .fasta or .fas file). A simple protocol for generating a frequency plot is given below:

Step 1: Download and install R for your operating system.

Step 2: Install the SeqinR package from CRAN and the Biostrings package from Bioconductor. Brief explanations for Steps 1 and 2 can be downloaded from
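Step 2 can also be done from an R session. In this sketch, missing_packages() and install_prerequisites() are hypothetical helper names; install.packages() installs seqinr from CRAN, and BiocManager::install() is the standard route for Bioconductor packages such as Biostrings. The installer is deliberately not called automatically, because it needs internet access.

```r
# Return the packages from `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install seqinr (CRAN) and Biostrings (Bioconductor) only if they are missing.
install_prerequisites <- function() {
  if (length(missing_packages("seqinr")) > 0) install.packages("seqinr")
  if (length(missing_packages("Biostrings")) > 0) {
    if (length(missing_packages("BiocManager")) > 0) install.packages("BiocManager")
    BiocManager::install("Biostrings")
  }
}

# install_prerequisites()   # run once, from a session with internet access
```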

Step 3: Create the R script below with a plain-text editor (e.g. Notepad) and save it with a .R extension (here Freq.R). The commandArgs() function reads the path of the FASTA-formatted protein sequence from the command line.

R Source Code (Freq.R)

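# Freq.R: usage  Rscript Freq.R <protein.fasta>
args <- commandArgs(trailingOnly = TRUE)
fas_file <- args[1]

library("seqinr")      # read.fasta(), c2s(), s2c(), aaa()
library("Biostrings")  # not needed by the calls below

# Read the file as a protein (AA) sequence and keep the first record
seqfile <- read.fasta(file = fas_file, seqtype = "AA")
fastaseq <- seqfile[[1]]

# Upper-case the sequence and split it into single residues
seqchar <- s2c(toupper(c2s(fastaseq)))

# Count each amino acid, sort ascending, use three-letter codes as labels
tab <- table(seqchar)
taborder <- tab[order(tab)]
names(taborder) <- aaa(names(taborder))

# Draw the dot chart; dev.off() closes the device so freq.png is written
png(filename = "freq.png", width = 500, height = 500)
dotchart(as.vector(taborder), labels = names(taborder), pch = 19,
         main = "Frequency of Amino Acids", xlab = "Frequency", ylab = "Amino Acid")
dev.off()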
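read.fasta() hides the file parsing. As an illustration only, the base R sketch below roughly mirrors what seqfile[[1]] gives the script: it takes the first record of a FASTA file, skips its '>' header line and joins the sequence lines. read_first_fasta() is a hypothetical name; unlike SeqinR it returns one upper-case string rather than a vector of characters, and it is not a replacement for read.fasta().

```r
# Read the first record of a FASTA file as one upper-case string (base R only).
# read_first_fasta() is a hypothetical helper, not a SeqinR function.
read_first_fasta <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  headers <- which(startsWith(lines, ">"))
  if (length(headers) == 0) stop("no FASTA header ('>') found in ", path)
  start <- headers[1] + 1
  end <- if (length(headers) > 1) headers[2] - 1 else length(lines)
  body <- if (start <= end) lines[start:end] else character(0)
  toupper(paste(body, collapse = ""))   # join wrapped sequence lines
}

fasta <- tempfile(fileext = ".fasta")
writeLines(c(">demo toy protein", "mkvla", "akgl", ">second", "WWW"), fasta)
print(read_first_fasta(fasta))
```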

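Step 4: Create a PHP program that uploads the protein sequence and runs the R script on the command line with the exec() function, for example: exec("\"C:\Program Files\R\R-3.3.1\bin\Rscript.exe\" Freq.R $input"); The Rscript.exe path depends on where, and which version of, R is installed, so adjust it to your system.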

FreqPlot PHP/R Input

PHP Source Code (Freq.PHP)

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Frequency Plot using PHP and R</title>
<style type="text/css">
html { box-sizing: border-box; -webkit-text-size-adjust: 100%; -webkit-font-smoothing: antialiased; }
html, body { height: 100%; margin: 0px; padding: 0px; }
.form {
  position: absolute; top: 50%; left: 50%;
  width: 510px; height: 320px;
  margin-top: -160px;  /* Half the height */
  margin-left: -255px; /* Half the width */
  vertical-align: middle;
  border: 1px solid blue;
  box-shadow: 0px 0px 3px #ccc, 0 10px 15px #eee inset;
  border-radius: 2px;
  font-family: "Times New Roman", Georgia, Serif;
}
.output {
  position: absolute; top: 50%; left: 50%;
  width: 512px; height: 584px;
  margin-top: -292px;  /* Half the height */
  margin-left: -256px; /* Half the width */
  vertical-align: middle;
  border: 1px solid blue;
  font-size: 12px;
  box-shadow: 0px 0px 3px #ccc, 0 10px 15px #eee inset;
  border-radius: 2px;
}
.space { padding: 10px; }
.effect { border: 1px solid #aaa; box-shadow: 0px 0px 3px #ccc, 0 10px 15px #eee inset; border-radius: 2px; }
.heffect { border: 0; height: 1px; background: blue; }
a { text-decoration: none; }
</style>
</head>
<body>
<?php
if (isset($_POST['submit'])) {
    // Turn an image file into a base64 data URI usable as <img src="...">
    function img2data($image) {
        $type = pathinfo($image, PATHINFO_EXTENSION);
        $data = file_get_contents($image);
        return 'data:image/' . $type . ';base64,' . base64_encode($data);
    }

    // Save the uploaded file in the working directory with a .fasta extension
    $input = pathinfo($_FILES['seqfile']['tmp_name'], PATHINFO_FILENAME) . ".fasta";
    move_uploaded_file($_FILES['seqfile']['tmp_name'], $input);

    // Run Freq.R (it writes freq.png); exec() returns when Rscript has finished
    exec('"C:\Program Files\R\R-3.3.1\bin\Rscript.exe" Freq.R ' . escapeshellarg($input));

    // Embed the plot if it was produced, then remove the temporary files
    $s = "";
    if (is_file("freq.png")) {
        $s = img2data("freq.png");
        unlink("freq.png");
    }
    if (is_file($input)) {
        unlink($input);
    }
?>
<div class='output'>
<h2 style="text-align: center;">Frequency Plot using PHP and R</h2>
<hr class='heffect' />
<div class='space'><?php print $s !== "" ? "<img src='" . $s . "' />" : "No plot was produced; check the FASTA file and the Rscript path."; ?></div>
</div>
<?php } else { ?>
<div class="form">
<h3 style="text-align: center;">Frequency Plot using PHP and R</h3>
<hr class="heffect" />
<form action="" method="post" name="freqform" enctype="multipart/form-data">
<p style="padding: 0 10px 0 10px;">
Upload protein sequence (<i>fasta file</i>)
<input class="effect" type="file" name="seqfile" />
</p>
<p style="padding: 0 10px 0 10px;">
<input class="effect" type="reset" name="reset" value="Reset" />
<input class="effect" type="submit" name="submit" value="Generate Plot" />
</p>
</form>
</div>
<?php } ?>
</body>
</html>

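Because the PHP page and Freq.R both use the fixed file name freq.png, two uploads processed at the same time could overwrite each other's plot. One possible remedy, sketched here as an assumption rather than part of the original program, is to let Freq.R accept the output path as an optional second argument. parse_freq_args() is a hypothetical helper; it falls back to freq.png when no second argument is given.

```r
# Hypothetical argument handling for Freq.R: input FASTA path plus an
# optional output PNG path (defaults to "freq.png").
parse_freq_args <- function(args) {
  if (length(args) < 1) stop("usage: Rscript Freq.R <input.fasta> [output.png]")
  input <- args[1]
  if (!file.exists(input)) stop("input file not found: ", input)
  output <- if (length(args) >= 2) args[2] else "freq.png"
  list(input = input, output = output)
}

# In Freq.R: opts <- parse_freq_args(commandArgs(trailingOnly = TRUE))
demo_input <- tempfile(fileext = ".fasta")
writeLines(c(">demo", "MKVLAAKGL"), demo_input)
opts <- parse_freq_args(c(demo_input, "plot_123.png"))
str(opts)
```

With this change, png() would write to opts$output, and the PHP page would pass a per-request name (for example one derived from $input) as a second argument and read that file instead of freq.png.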
Program Output

FreqPlot PHP/R Output

