Biological Sequence Pattern Matching using Perl

September 24, 2018

This article is a simple Perl programming tutorial on matching patterns in a biological sequence using regular expressions. In this tutorial, I used ActiveState Perl 5.24.3 to run the Perl script.

Pattern Matching

In bioinformatics, string matching (or pattern matching) is a fundamental method used in applications ranging from sequence alignment to functional prediction. Pattern matching is classified into exact pattern matching and approximate pattern matching. Exact pattern matching does not allow any insertion, deletion, or substitution of characters when matching against the target sequence, whereas approximate pattern matching allows a limited number of them. In computational biology, a pattern is an expression written as a sequence of characters drawn from a defined set of symbols. Example: N-{P}-[ST]-{P}-A(2,3), which in PROSITE-style notation reads as N, then any residue except P, then S or T, then any residue except P, then two or three A residues.
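To make the difference between the two classes concrete, here is a minimal Perl sketch. The function names `exact_positions` and `approx_positions` are hypothetical and not part of the tutorial's script. The approximate version is deliberately limited to substitutions (a mismatch count per alignment); handling insertions and deletions would need an edit-distance approach instead.

```perl
use strict;
use warnings;

# Exact matching: 0-based start positions of every occurrence (overlaps included).
sub exact_positions {
    my ($seq, $pat) = @_;
    return () unless length $pat;
    my @hits;
    my $pos = -1;
    while (($pos = index($seq, $pat, $pos + 1)) != -1) {
        push @hits, $pos;
    }
    return @hits;
}

# Approximate matching restricted to substitutions: start positions where the
# pattern aligns with at most $max_mismatch differing characters.
sub approx_positions {
    my ($seq, $pat, $max_mismatch) = @_;
    my $m = length $pat;
    return () unless $m;
    my @hits;
    for my $i (0 .. length($seq) - $m) {
        # XOR of two equal-length strings gives "\0" wherever they agree.
        my $diff = substr($seq, $i, $m) ^ $pat;
        my $mismatches = ($diff =~ tr/\0//c);
        push @hits, $i if $mismatches <= $max_mismatch;
    }
    return @hits;
}

my $seq = 'ACGTACCT';
print join(' ', exact_positions($seq, 'ACGT')), "\n";       # prints: 0
print join(' ', approx_positions($seq, 'ACGT', 1)), "\n";   # prints: 0 4
```

With one substitution allowed, the second call also reports position 4, where ACCT differs from ACGT in a single character.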

Source Code

use strict;
use warnings;

# Clear the console: 'cls' on Windows, 'clear' on other systems.
sub clear_screen { system($^O eq 'MSWin32' ? 'cls' : 'clear'); }

clear_screen();
print "\n+-----------------------------------+";
print "\n| Matching Patterns in the Sequence |";
print "\n+-----------------------------------+\n";
print "\nEnter the sequence:\n\n";
my $s = <STDIN> // '';
chomp($s);
$s =~ s/[^a-zA-Z]//g;    # keep letters only
$s = uc($s);
print "\nEnter the search pattern: ";
my $q = <STDIN> // '';
chomp($q);

# The pattern is a Perl regular expression, matched case-insensitively.
# An empty or invalid pattern is treated as "not found".
my $re = length($q) ? eval { qr/$q/i } : undef;

# Build the marker line: '-' everywhere except where the pattern matched.
# Copying the matched text keeps both lines aligned even for patterns
# such as [ST] or A{2,3}, where the match differs from the pattern text.
my $s1 = '-' x length($s);
my $found = 0;
if (defined $re) {
  while ($s =~ /$re/g) {
    my ($start, $len) = ($-[0], $+[0] - $-[0]);
    next if $len == 0;
    substr($s1, $start, $len) = substr($s, $start, $len);
    $found++;
  }
}

clear_screen();
if ($found) {
  print "\nPattern \"$q\" found in the sequence!\n\n\nResult:\n\n\t";
  my $y = 50;    # residues per output line
  for (my $i = 0; $i < length($s); $i += $y) {
    print substr($s, $i, $y), "\n\t";
    print substr($s1, $i, $y), "\n\n\t";
  }
} else {
  print "\nPattern \"$q\" not found in the sequence!\n";
}
<STDIN>;    # wait for Enter before exiting
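The script hands the search pattern to Perl's regular expression engine, so a pattern written in PROSITE-style notation has to be translated into Perl syntax before it can be entered. The sketch below shows one possible translation. The function name `prosite_to_regex` is hypothetical, and only a small subset of the notation is assumed: `-` separators, `x` for any residue, `[..]` sets, `{..}` exclusions, and `(n)` or `(n,m)` repeats.

```perl
use strict;
use warnings;

# Translate a small subset of PROSITE-style notation into a compiled Perl regex.
sub prosite_to_regex {
    my ($pattern) = @_;
    $pattern =~ s/-//g;                    # drop element separators
    $pattern =~ s/\{([A-Z]+)\}/[^$1]/g;    # {P}   -> [^P]
    # (2) -> {2} and (2,3) -> {2,3}; done after the step above so the new
    # braces are not mistaken for exclusions.
    $pattern =~ s/\((\d+)(?:,(\d+))?\)/defined $2 ? "{$1,$2}" : "{$1}"/ge;
    $pattern =~ s/x/./g;                   # x     -> any residue
    return qr/$pattern/;
}

my $site = prosite_to_regex('N-{P}-[ST]-{P}');
for my $protein ('MKNGSAAAK', 'MKNPSAAAK') {
    if ($protein =~ /($site)/) {
        print "$protein: match '$1' at position ", $-[0] + 1, "\n";
    } else {
        print "$protein: no match\n";
    }
}
```

The first sequence matches NGSA at position 3; the second does not match because P follows N. The translated form of this example, N[^P][ST][^P], is what would be typed at the script's search prompt.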

Input/Output

[Figure: Pattern Matching Input]
