Here's a trivial code snippet to parse AWS VPC Flow Logs. It's extremely useful when you start with permissive security groups and want to tighten them up later, because the logs show which traffic your instances actually accept.
This script will (probably) fail if there are too many VPC flow log files, because it keeps every accepted record in a single DataFrame and the Python process can run out of memory. Still, it's nice to see that pandas read_csv can read S3 URLs directly (even gzipped files), provided the s3fs package is installed.
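If memory does become a problem, one way around it is to aggregate as you go instead of collecting rows. The minimal sketch below assumes the same 15-column, whitespace-delimited layout as the script (a leading timestamp, then the flow log fields); `count_accepted` and `record` are hypothetical helper names and the sample records are made up. Each file is read in chunks, reduced to per-(srcaddr, dstaddr, dstport) counts, and added into a running total, so only one chunk of raw rows is in memory at a time.

```python
import io

import pandas as pd

# Same layout as the script: a leading timestamp, then the flow log fields
COLS = ['timestamp', 'version', 'accountid', 'interfaceid', 'srcaddr',
        'dstaddr', 'srcport', 'dstport', 'protocol', 'packets', 'bytes',
        'start', 'end', 'action', 'logstatus']
KEYS = ['srcaddr', 'dstaddr', 'dstport']


def count_accepted(sources, chunksize=100_000):
    """Hypothetical helper: sum ACCEPT record counts per (srcaddr, dstaddr, dstport)
    across all sources, holding at most one chunk of raw rows in memory."""
    total = None
    for source in sources:
        reader = pd.read_csv(source, sep=r'\s+', header=None, names=COLS,
                             chunksize=chunksize)
        for chunk in reader:
            accepted = chunk[chunk['action'] == 'ACCEPT']
            if accepted.empty:
                continue
            counts = accepted.groupby(KEYS).size()
            # align on the (srcaddr, dstaddr, dstport) index; missing keys count as 0
            total = counts if total is None else total.add(counts, fill_value=0)
    if total is None:
        return pd.DataFrame(columns=KEYS + ['count'])
    result = total.astype(int).reset_index(name='count')
    return result.sort_values(by=['count', 'srcaddr', 'dstaddr'],
                              ascending=False, ignore_index=True)


def record(src, dst, dport, action):
    # one made-up record in the 15-column layout above
    return (f"2016-01-01T00:00:00.000Z 2 123456789012 eni-0a1b2c3d "
            f"{src} {dst} 49152 {dport} 6 10 840 1451606400 1451606460 {action} OK\n")


# Two in-memory "files" standing in for downloaded flow log objects
file1 = io.StringIO(record('10.0.0.5', '10.0.0.9', 22, 'ACCEPT')
                    + record('10.0.0.5', '10.0.0.9', 22, 'ACCEPT')
                    + record('203.0.113.7', '10.0.0.9', 22, 'REJECT'))
file2 = io.StringIO(record('10.0.0.5', '10.0.0.9', 22, 'ACCEPT')
                    + record('10.0.0.6', '10.0.0.9', 443, 'ACCEPT'))

result = count_accepted([file1, file2], chunksize=2)
print(result)
```

Against the bucket you would pass the s3:// URLs instead of the StringIO objects (again assuming s3fs is installed); the memory footprint then grows with the number of distinct (source, destination, port) combinations rather than the number of records.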
You can also filter for REJECT records instead and find all the IPs that have been attempting to attack you.
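A sketch of that REJECT filter, assuming the same column layout as the script and using made-up sample records; `rejected_sources` and `record` are hypothetical names. Counting distinct destination ports per source is an extra touch that helps tell a port scan apart from a single client retrying one port. Keep in mind that rejected traffic is not always hostile: misconfigured clients and your own tightened rules show up here too.

```python
import io

import pandas as pd

# Same layout as the script: a leading timestamp, then the flow log fields
COLS = ['timestamp', 'version', 'accountid', 'interfaceid', 'srcaddr',
        'dstaddr', 'srcport', 'dstport', 'protocol', 'packets', 'bytes',
        'start', 'end', 'action', 'logstatus']


def rejected_sources(df):
    """Hypothetical helper: per source address, count REJECT records and the
    number of distinct destination ports they targeted, busiest first."""
    rejected = df[df['action'] == 'REJECT']
    return (rejected.groupby('srcaddr')
            .agg(rejects=('action', 'count'), ports=('dstport', 'nunique'))
            .reset_index()
            .sort_values(['rejects', 'srcaddr'], ascending=[False, True],
                         ignore_index=True))


def record(src, dport, action):
    # one made-up record in the 15-column layout above
    return (f"2016-01-01T00:00:00.000Z 2 123456789012 eni-0a1b2c3d "
            f"{src} 10.0.0.9 49152 {dport} 6 1 40 1451606400 1451606460 {action} OK\n")


sample = io.StringIO(record('203.0.113.7', 22, 'REJECT')
                     + record('203.0.113.7', 23, 'REJECT')
                     + record('203.0.113.7', 3389, 'REJECT')
                     + record('198.51.100.4', 22, 'REJECT')
                     + record('10.0.0.5', 22, 'ACCEPT'))
df = pd.read_csv(sample, sep=r'\s+', header=None, names=COLS)

summary = rejected_sources(df)
print(summary)
```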
```python
# Requires boto3, pandas and s3fs (pandas relies on s3fs to open s3:// URLs).
# Credentials come from the standard AWS chain (environment variables,
# ~/.aws/credentials or an IAM role) rather than being hard-coded here.
import boto3
import pandas as pd

srcbucket = 'flowlogs-bucket-orly'

# A leading 'timestamp' field followed by the 14 default flow log fields
cols = [
    'timestamp', 'version', 'accountid', 'interfaceid', 'srcaddr',
    'dstaddr', 'srcport', 'dstport', 'protocol', 'packets', 'bytes',
    'start', 'end', 'action', 'logstatus']

s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')

# iterate over all the VPC flow logs in the bucket
frames = []
for page in paginator.paginate(Bucket=srcbucket):
    for obj in page.get('Contents', []):
        keystr = obj['Key']
        if 'eni-' not in keystr:
            continue
        s3url = 's3://' + srcbucket + '/' + keystr
        # whitespace-delimited; gzip compression is inferred from a '.gz' suffix
        df = pd.read_csv(s3url, sep=r'\s+', header=None, names=cols)
        frames.append(df[df['action'] == 'ACCEPT'])
        print('Processed ' + keystr)

# concatenate once at the end (DataFrame.append was removed in pandas 2.0)
df = pd.concat(frames, ignore_index=True)

# count flows per (source, destination, destination port), busiest first
src = df.groupby(['srcaddr', 'dstaddr', 'dstport']).size().reset_index(name='count')
n = src.sort_values(by=['count', 'srcaddr', 'dstaddr'], ascending=False)
print(n)
```