RNA to Protein Translation in Perl

In Perl, an RNA sequence can be translated into a protein sequence by replacing each triplet of RNA characters (a codon) with the corresponding amino acid character. The same method can be applied to all six reading frames (three in the forward direction and three in the reverse direction). This program uses an associative array (also known as a hash) to map each triplet to its amino acid character.

The associative array corresponding to the codon table maps the 64 triplet codons to the 20 amino acid characters, with the three stop codons marked by an underscore. The triplet codon table is shown below:

Codon Table

Source Code:

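print "Enter the RNA sequence: ";
my $rna = <STDIN>;       # read one line from the keyboard
chomp($rna);
$rna =~ s/[^acgu]//ig;   # drop every character that is not a valid RNA base
$rna = uc($rna);         # the codon table below uses uppercase keys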
my(%genetic_code) = (
  'UCA' => 'S', # Serine
  'UCC' => 'S', # Serine
  'UCG' => 'S', # Serine
  'UCU' => 'S', # Serine
  'UUC' => 'F', # Phenylalanine
  'UUU' => 'F', # Phenylalanine
  'UUA' => 'L', # Leucine
  'UUG' => 'L', # Leucine
  'UAC' => 'Y', # Tyrosine
  'UAU' => 'Y', # Tyrosine
  'UAA' => '_', # Stop
  'UAG' => '_', # Stop
  'UGC' => 'C', # Cysteine
  'UGU' => 'C', # Cysteine
  'UGA' => '_', # Stop
  'UGG' => 'W', # Tryptophan
  'CUA' => 'L', # Leucine
  'CUC' => 'L', # Leucine
  'CUG' => 'L', # Leucine
  'CUU' => 'L', # Leucine
  'CCA' => 'P', # Proline
  'CAU' => 'H', # Histidine
  'CAA' => 'Q', # Glutamine
  'CAG' => 'Q', # Glutamine
  'CGA' => 'R', # Arginine
  'CGC' => 'R', # Arginine
  'CGG' => 'R', # Arginine
  'CGU' => 'R', # Arginine
  'AUA' => 'I', # Isoleucine
  'AUC' => 'I', # Isoleucine
  'AUU' => 'I', # Isoleucine
  'AUG' => 'M', # Methionine
  'ACA' => 'T', # Threonine
  'ACC' => 'T', # Threonine
  'ACG' => 'T', # Threonine
  'ACU' => 'T', # Threonine
  'AAC' => 'N', # Asparagine
  'AAU' => 'N', # Asparagine
  'AAA' => 'K', # Lysine
  'AAG' => 'K', # Lysine
  'AGC' => 'S', # Serine
  'AGU' => 'S', # Serine
  'AGA' => 'R', # Arginine
  'AGG' => 'R', # Arginine
  'CCC' => 'P', # Proline
  'CCG' => 'P', # Proline
  'CCU' => 'P', # Proline
  'CAC' => 'H', # Histidine
  'GUA' => 'V', # Valine
  'GUC' => 'V', # Valine
  'GUG' => 'V', # Valine
  'GUU' => 'V', # Valine
  'GCA' => 'A', # Alanine
  'GCC' => 'A', # Alanine
  'GCG' => 'A', # Alanine
  'GCU' => 'A', # Alanine
  'GAC' => 'D', # Aspartic Acid
  'GAU' => 'D', # Aspartic Acid
  'GAA' => 'E', # Glutamic Acid
  'GAG' => 'E', # Glutamic Acid
  'GGA' => 'G', # Glycine
  'GGC' => 'G', # Glycine
  'GGG' => 'G', # Glycine
  'GGU' => 'G'  # Glycine
);
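my $protein = "";
# Walk the sequence one codon (three bases) at a time; trailing bases that
# do not form a complete codon are ignored.
for (my $i = 0; $i < length($rna) - 2; $i += 3)
{
  my $codon = substr($rna, $i, 3);
  $protein .= $genetic_code{$codon};
}
print "Translated protein sequence is $protein\n";
<STDIN>;  # wait for Enter so the console window stays open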

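This program can also be used to translate all six reading frames: start the three-character stepping at the first, second, or third character of the RNA sequence for the forward frames, and do the same on the reversed sequence for the reverse frames.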
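The six-frame idea can be sketched as follows. This is only an illustration under stated assumptions: the helper names translate_frame and reverse_complement are hypothetical and are not part of the program above, the reverse frames are assumed to be read from the reverse complement of the input (the usual biological convention), and the codon table is built compactly from a 64-character string instead of being typed out as in the listing above.

```perl
use strict;
use warnings;

# Build the codon table compactly: bases in U, C, A, G order, with the
# amino acids listed in the matching order ('_' marks a stop codon).
my @bases = qw(U C A G);
my $amino = 'FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %genetic_code;
my $n = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $genetic_code{"$b1$b2$b3"} = substr($amino, $n++, 1);
        }
    }
}

# Hypothetical helper: translate $rna starting at offset $frame (0, 1, or 2).
sub translate_frame {
    my ($rna, $frame) = @_;
    my $protein = '';
    for (my $i = $frame; $i + 3 <= length($rna); $i += 3) {
        $protein .= $genetic_code{ substr($rna, $i, 3) };
    }
    return $protein;
}

# Hypothetical helper: reverse complement of an uppercase RNA string.
sub reverse_complement {
    my ($rna) = @_;
    my $rc = reverse $rna;    # scalar context reverses the string
    $rc =~ tr/ACGU/UGCA/;     # swap each base for its complement
    return $rc;
}

my $rna = 'AUGGCCUAA';
my $rev = reverse_complement($rna);
for my $frame (0 .. 2) {
    print "Forward frame ", $frame + 1, ": ", translate_frame($rna, $frame), "\n";
    print "Reverse frame ", $frame + 1, ": ", translate_frame($rev, $frame), "\n";
}
```

For the sample input AUGGCCUAA, the forward frames come out as MA_, WP, and GL, and the reverse frames (read from UUAGGCCAU) as LGH, _A, and RP.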

Comments

  1. What is the function of the (.=) operator? Can you please explain?

    Replies
    1. It combines the concatenation operator (.) with assignment. For example, $protein .= $genetic_code{$codon}; is equivalent to $protein = $protein . $genetic_code{$codon};
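      A minimal sketch of the two equivalent forms, using made-up single-letter values in place of the hash lookups:

```perl
use strict;
use warnings;

my $protein = 'M';
$protein .= 'A';               # append in place
$protein = $protein . '_';     # the same operation written out in full
print "$protein\n";            # prints MA_
```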

  2. Is it feasible to have spaces between the amino acids when they are printed? I would really appreciate it if you could give me a response.

    Have a nice day

    Replies
    1. It is possible to split the amino acid sequence into words and/or lines, similar to NCBI BLAST sequence alignment output. Just add a few lines at the end of the program to print spaces between groups of amino acids.

      For example, the following prints 50 amino acids per line in groups of 10:
      my $substr_eq = int(length($protein) / 50);   # number of full lines
      my $substr_rem = length($protein) % 50;       # amino acids left over
      $substr_eq-- if $substr_rem == 0;             # no partial line to add
      print "Translated Protein Sequence: \n\n\t";
      for (my $i = 0; $i < $substr_eq + 1; $i++) {
        for (my $j = $i * 50; $j < length($protein) && $j < ($i + 1) * 50; $j++) {
          # A Perl string cannot be indexed like an array, so use substr
          print substr($protein, $j, 1);
          print " " if ($j + 1) % 10 == 0;
        }
        print "\n\t";
      }
