RNA to Protein Translation in PERL

March 09, 2010

In PERL programming, an RNA sequence can be translated to a protein sequence by substituting equivalent amino acid characters to triplet characters of RNA. This method has followed to find six reading frames (three in the forward direction, and three in the reverse direction). In this program, I have used the associative array (also known as a hash array) to associate triplet characters with amino acid characters.

The associate array corresponding to codon table is arranged to 20 amino acid character. The triplet codon table is shown below:

Source Code:

print "Enter the RNA sequence: ";
$rna = <>;
chomp($rna);
$rna =~s/[^acgu]//ig;
my $rna = uc($rna);
my(%genetic_code) = (
  'UCA' => 'S', # Serine
  'UCC' => 'S', # Serine
  'UCG' => 'S', # Serine
  'UCU' => 'S', # Serine
  'UUC' => 'F', # Phenylalanine
  'UUU' => 'F', # Phenylalanine
  'UUA' => 'L', # Leucine
  'UUG' => 'L', # Leucine
  'UAC' => 'Y', # Tyrosine
  'UAU' => 'Y', # Tyrosine
  'UAA' => '_', # Stop
  'UAG' => '_', # Stop
  'UGC' => 'C', # Cysteine
  'UGU' => 'C', # Cysteine
  'UGA' => '_', # Stop
  'UGG' => 'W', # Tryptophan
  'CUA' => 'L', # Leucine
  'CUC' => 'L', # Leucine
  'CUG' => 'L', # Leucine
  'CUU' => 'L', # Leucine
  'CCA' => 'P', # Proline
  'CAU' => 'H', # Histidine
  'CAA' => 'Q', # Glutamine
  'CAG' => 'Q', # Glutamine
  'CGA' => 'R', # Arginine
  'CGC' => 'R', # Arginine
  'CGG' => 'R', # Arginine
  'CGU' => 'R', # Arginine
  'AUA' => 'I', # Isoleucine
  'AUC' => 'I', # Isoleucine
  'AUU' => 'I', # Isoleucine
  'AUG' => 'M', # Methionine
  'ACA' => 'T', # Threonine
  'ACC' => 'T', # Threonine
  'ACG' => 'T', # Threonine
  'ACU' => 'T', # Threonine
  'AAC' => 'N', # Asparagine
  'AAU' => 'N', # Asparagine
  'AAA' => 'K', # Lysine
  'AAG' => 'K', # Lysine
  'AGC' => 'S', # Serine
  'AGU' => 'S', # Serine
  'AGA' => 'R', # Arginine
  'AGG' => 'R', # Arginine
  'CCC' => 'P', # Proline
  'CCG' => 'P', # Proline
  'CCU' => 'P', # Proline
  'CAC' => 'H', # Histidine
  'GUA' => 'V', # Valine
  'GUC' => 'V', # Valine
  'GUG' => 'V', # Valine
  'GUU' => 'V', # Valine
  'GCA' => 'A', # Alanine
  'GCC' => 'A', # Alanine
  'GCG' => 'A', # Alanine
  'GCU' => 'A', # Alanine
  'GAC' => 'D', # Aspartic Acid
  'GAU' => 'D', # Aspartic Acid
  'GAA' => 'E', # Glutamic Acid
  'GAG' => 'E', # Glutamic Acid
  'GGA' => 'G', # Glycine
  'GGC' => 'G', # Glycine
  'GGG' => 'G', # Glycine
  'GGU' => 'G'  # Glycine
);
my ($protein) = "";
for(my $i=0;$i<length($rna)-2;$i+=3)
{
  $codon = substr($rna,$i,3);
  $protein .= $genetic_code{$codon};
}
print "Translated protein sequence is $protein";
<>;

This program can also used for six reading frame, by changing the three character shift in forward and reverse of the RNA sequence.

Comments

It's just meOctober 22, 2017 at 5:59 PM
(.=) what is the function of this operator? can you please explain
ReplyDelete
Replies
B(io)-anni$cientistMarch 19, 2018 at 6:11 PM
Is it feasible to have spaces between the amino acids when they are printed ? I would really appreciate it, if you give me a response

Have a nice day
ReplyDelete
Replies
EbsFebruary 18, 2023 at 11:48 PM
Thank you for the example script. I am not sure about the meaning of my $rna =~s/[^acgu]//ig
The meaning of the ig flags is not easy to look up in the documentation. Could you elaborate?
ReplyDelete
Replies