Perl regex script for IP traffic analysis

Perl is my language of choice for just about everything: shell scripting, Windows applications using Tk, IoT, Raspberry Pi GPIO, and sometimes even web via CGI (an anachronism, I know). It’s just so easy to do something really powerful with little effort.

I have been using Wireshark since it was Ethereal, back in the early 2000s when I was caught up deep in the Cisco certification racket. Lately, I prefer the granular control of tcpdump from the command line. One of Perl’s many great strengths is, of course, its regular expression capabilities. One night, just for fun, I wanted to analyze my TCP/IP traffic as I was casually browsing some of my usual sites. I ran the following command and piped the output to a text file:


sudo tcpdump -i eth2 -n -# dst 192.168.0.3 | tee dump.txt

Pretty simple. It tells tcpdump to monitor my Ethernet interface (-i eth2), not to resolve hostnames (-n), to number each packet (-#), and to record only packets destined for my PC (dst 192.168.0.3). The tee command then writes the results to dump.txt while still showing them on the screen, so you see results immediately:

(Screenshot: tcpdump results)

I wrote a very simple script off the top of my head to scan the results file, dump.txt, for IPv4 addresses, and then tell me the number of packets received from that address, the name of the organization that it originated from, and the country of origin.

The Perl regex that I came up with for finding an IP address in a text file is embarrassingly simple, but very effective.

while($_ =~ /(\d+\.\d+\.\d+\.\d+)/g){
...
}#end while

Does it test for invalid addresses? No. But after using this for some time, it has never failed to find every IP address in the output of a tcpdump capture. My script runs a whois command for each unique address, then uses more regexes to find the organization and country of origin. Below is a sample output.

This particular capture was very brief, and only contained 7 unique addresses.  This script, however, can work for hundreds or thousands of IP addresses.
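
Since the pattern above happily accepts something like 999.10.10.10, here is a minimal sketch of how a stricter check could be bolted on. It is not part of my script; the helper names is_valid_ipv4 and extract_valid_ipv4 are made up for illustration, and the sample text is invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true only if the string is four dot-separated
# decimal octets, each in the range 0-255.
sub is_valid_ipv4 {
    my ($candidate) = @_;
    my @octets = split /\./, $candidate, -1;   # -1 keeps trailing empty fields
    return 0 unless @octets == 4;
    foreach my $octet (@octets) {
        return 0 unless $octet =~ /\A\d{1,3}\z/ && $octet <= 255;
    }
    return 1;
}

# Hypothetical helper: same loose regex as before, but keep only the
# matches that pass the octet check.
sub extract_valid_ipv4 {
    my ($text) = @_;
    my @found;
    while ($text =~ /(\d+\.\d+\.\d+\.\d+)/g) {
        push @found, $1 if is_valid_ipv4($1);
    }
    return @found;
}

# Invented sample text with one out-of-range address in the middle.
my $sample = "ok 192.168.0.3, bogus 999.10.10.10, ok 8.8.8.8";
print join(", ", extract_valid_ipv4($sample)), "\n";   # prints: 192.168.0.3, 8.8.8.8
```

For tcpdump output the extra check is probably overkill, but it may be worth having when the input is arbitrary text with version numbers and the like.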

I also used this script to parse the auth.log file on one of my internet-facing home servers. I stupidly had port 22 open for ssh on that server. I was constantly being hit with attempts to log in with well-known usernames and passwords, literally all day long. (I changed the ssh port later and nearly all of this stopped!) Most of these login attempts were from foreign countries, no doubt running a script. I was first alerted to this problem by running

netstat -t

and seeing a lot of TCP connections to strange addresses that I was sure were unsolicited and that I did not initiate.
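
For the auth.log case, a rough sketch of a narrower variant might look like the following. It assumes sshd writes failed attempts as "Failed password for ... from <address> port ...", which is the usual OpenSSH wording but worth checking against your own log. The helper name count_failed_logins is hypothetical, and the sample lines are made up, using addresses from the ranges reserved for documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tally failed SSH password attempts per source address.
# Assumes lines shaped like "Failed password for ... from <address> port ...".
sub count_failed_logins {
    my (@lines) = @_;
    my %attempts;
    foreach my $line (@lines) {
        next unless $line =~ /Failed password/;
        if ($line =~ /from (\d+\.\d+\.\d+\.\d+)/) {
            $attempts{$1}++;   # a hash gives us uniqueness and the count in one step
        }
    }
    return %attempts;
}

# Made-up sample lines, not taken from a real log.
my @sample = (
    'Jan  1 00:00:01 host sshd[100]: Failed password for root from 203.0.113.5 port 4000 ssh2',
    'Jan  1 00:00:02 host sshd[101]: Failed password for invalid user admin from 203.0.113.5 port 4001 ssh2',
    'Jan  1 00:00:03 host sshd[102]: Failed password for root from 198.51.100.7 port 4002 ssh2',
    'Jan  1 00:00:04 host sshd[103]: Accepted password for me from 192.168.0.3 port 4003 ssh2',
);

# Busiest address first; ties are broken alphabetically.
my %attempts = count_failed_logins(@sample);
foreach my $ip (sort { $attempts{$b} <=> $attempts{$a} || $a cmp $b } keys %attempts) {
    print "$ip\t$attempts{$ip}\n";
}
```

To run it against a real log, the sample array could be replaced with the lines read from the file, for example /var/log/auth.log on Debian-style systems.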
Here is the script in its entirety.

#!/usr/bin/perl -w
use strict;
$| = 1; #turn off output buffering

=pod
parses tcpdump file for ip addresses

example for creating file:
sudo tcpdump -i eth0 > dump.txt

=cut

open FH, "dump.txt" or die $!;

my @ips;  #every IP address found, duplicates included
my @uips; #unique IP address array

my $ln = 1; #line counter, only used by the debug print below
while(<FH>){

    #a line can hold more than one address, so loop over every match
    while($_ =~ /(\d+\.\d+\.\d+\.\d+)/g){
        push @ips, $1;
        #print "$1\ton line $ln\n";

        #count how often this IP is in the array so far
        my $dexist = 0;
        foreach my $ip (@ips){
            if($ip eq $1){
                $dexist++;
            }#end if
        }#end foreach

        #a count of 1 means first sighting, so push on unique IP array
        if($dexist == 1){
            push @uips, $1;
        }#end if
    }#end while

    $ln++;
}#end while

my $n_ip = @uips;
print "$n_ip IP addresses found......\n-----------------------------------------\n";
foreach my $addr (@uips){
    print $addr."\n";
}#end foreach

my $n = 1;
foreach my $ipa (@uips){
    print "\n$n--------------------------------------------\n";
    my $n_occur = 0;

    #count how many times this address appeared in the capture
    foreach my $seen (@ips){
        if($seen eq $ipa){
            $n_occur++;
        }#end if
    }#end foreach

    print "$ipa\t$n_occur \n";

    #save the whois record to a file, then scan it line by line
    system("whois $ipa > whois.txt");
    open WT, "whois.txt" or die $!;
    while(<WT>){
        if($_ =~ /Organization/i){ print $_; }
        if($_ =~ /Country/i){ print $_; }
    }#end while
    close WT;

    $n++;
}#end foreach

close FH;

Simple, but effective.  You could use this on ANY text document that contains IP addresses.
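
One possible refinement of the whois step, offered as a sketch rather than what my script does: read the whois output through a pipe instead of a temporary whois.txt file. The helper names whois_lines and org_and_country_lines are made up for illustration. The patterns are the same Organization and Country matches as in the script; field names differ between registries, so they may miss some records.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines that mention an organization
# or a country, using the same case-insensitive patterns as the script.
sub org_and_country_lines {
    my (@lines) = @_;
    return grep { /Organization/i || /Country/i } @lines;
}

# Hypothetical helper: run the whois command and return its output lines.
# The list form of a pipe open bypasses the shell and needs no temp file.
sub whois_lines {
    my ($ip) = @_;
    open(my $wt, '-|', 'whois', $ip) or die "cannot run whois: $!";
    my @lines = <$wt>;
    close $wt;
    return @lines;
}

# Look up every address given on the command line (nothing runs without arguments).
foreach my $ip (@ARGV) {
    print "$ip\n";
    print org_and_country_lines(whois_lines($ip));
}
```

Saved under a name of your choosing, say whois_sketch.pl, it would be run as perl whois_sketch.pl followed by one or more addresses.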
