PCAP Analysis with Zeek | Digital Forensics and Incident Response

Introduction

Zeek (formerly called Bro) is a tool for high-level PCAP analysis at the application layer. I do most of my packet capture analysis in Wireshark, and while Wireshark is still my number one tool for PCAP analysis, Zeek was a great find for me. It is well suited to automated analysis that quickly zeroes in on the information of interest. This post provides a quick introduction to Zeek and its capabilities.

We will be using a sample PCAP in this post. Grab a sample PCAP file here.

Obtaining Zeek log files

Zeek produces several .log files, each covering a different type of information contained in the PCAP. To generate these log files, feed the PCAP to Zeek:

zeek -r <pcap>

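The -r option reads packets from a PCAP file for offline analysis, whereas -i <interface> captures live traffic from a network interface. (The -w option does not start a capture; it writes the processed packets out to a PCAP file.)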

Depending on the size of the PCAP, this could take a while. When done, Zeek writes its log files to the current working directory. Which files appear depends on the type of traffic discovered; typical examples are:

  • dns.log
  • http.log
  • ssl.log
  • dhcp.log
  • (etc.)

The format of these log files is largely self-explanatory. Each file starts with header lines prefixed with #, including a #fields line that names the columns, and the column names indicate what each column contains. Columns are tab-separated and are described in the Zeek docs.
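To make the layout concrete, here is a minimal Python sketch that reads a tab-separated Zeek log using only the #fields header line. The function name read_zeek_log and the sample records are made up for illustration, and the sketch assumes the default TSV output (not JSON logs).

```python
import io

def read_zeek_log(lines):
    """Yield one dict per record from a Zeek TSV log, keyed by column name."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Header line looks like: "#fields<TAB>ts<TAB>uid<TAB>..."
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            # Data line: pair each tab-separated value with its column name
            yield dict(zip(fields, line.split("\t")))

# Hypothetical excerpt of a dns.log, trimmed to four columns
sample = io.StringIO(
    "#separator \\x09\n"
    "#fields\tts\tuid\tid.orig_h\tquery\n"
    "#types\ttime\tstring\taddr\tstring\n"
    "1609459200.000000\tCabc123\t10.0.0.5\texample.com\n"
    "#close\t2021-01-01-00-00-01\n"
)
records = list(read_zeek_log(sample))
print(records[0]["query"])
```

Every value comes back as a string here; the #types header line could be used to convert values, which this sketch skips.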

Parsing Zeek logs with zeek-cut

zeek-cut is a utility that ships with Zeek and extracts selected columns from the Zeek *.log files. I usually pipe zeek-cut output into grep and awk, and/or export the data in CSV format. Some examples:

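# -u prints timestamps as human-readable UTC; print only the first column of matching lines
zeek-cut -u ts method host uri < http.log | grep "<string>" | awk '{print $1}'
# -F ',' makes the output comma-separated, so awk must split on commas too
zeek-cut -F ',' -u ts method host uri < http.log | grep "<string>" | awk -F ',' '{print $3}'
# Save the connection 4-tuple (source/destination address and port) to a file
zeek-cut id.orig_h id.orig_p id.resp_h id.resp_p < conn.log > temp.txt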
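For comparison, the column selection and CSV export above can be sketched in plain Python. This is a rough stand-in under stated assumptions, not a reimplementation of zeek-cut: the function name cut_to_csv, the timestamp format, and the sample http.log record are all hypothetical. One reason to go this route is that the csv module quotes values that themselves contain commas (a URI can), so the columns stay aligned.

```python
import csv
import io
from datetime import datetime, timezone

def cut_to_csv(lines, columns, out):
    """Write the chosen columns of a Zeek TSV log to `out` as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            row = dict(zip(fields, line.split("\t")))
            if "ts" in row:
                # Convert epoch seconds to a readable UTC timestamp
                ts = datetime.fromtimestamp(float(row["ts"]), tz=timezone.utc)
                row["ts"] = ts.strftime("%Y-%m-%dT%H:%M:%SZ")
            # Use "-" for any requested column missing from the log
            writer.writerow([row.get(c, "-") for c in columns])

# Hypothetical one-record http.log excerpt
http_log = io.StringIO(
    "#fields\tts\tuid\tmethod\thost\turi\n"
    "1609459200.000000\tCxyz789\tGET\texample.com\t/index.html\n"
)
out = io.StringIO()
cut_to_csv(http_log, ["ts", "method", "host", "uri"], out)
print(out.getvalue())
```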

Analyzing information in Zeek log files using ZAT

An alternative to manually converting Zeek log files to CSV format with zeek-cut is the Zeek Analysis Toolkit (ZAT). ZAT can help automate the process of taking the Zeek log files and turning them into Pandas dataframes. Some familiarity with Pandas is needed, but once you know the basics of dataframe manipulation, gleaning information from the log files becomes straightforward. To begin, let’s load up the zat module and read a Zeek log file into a dataframe:

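import pandas as pd
from zat.log_to_dataframe import LogToDataFrame
# Parse dns.log into a Pandas dataframe
log_to_df = LogToDataFrame()
zeek_df = log_to_df.create_dataframe('dns.log')
# Show every column when the dataframe is displayed
pd.set_option('display.max_columns', None)
zeek_df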

Since the information is now contained in a convenient dataframe, we can write queries to better understand the logs. For example, the following counts how often each DNS query name appears:

zeek_df['query'].value_counts()
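A few more queries in the same spirit, sketched on a small hand-built dataframe so the snippet runs without a log file. The rows are made up; the column names (id.orig_h, query, qtype_name) follow the dns.log fields, and I am assuming the dataframe produced from a real dns.log exposes them under the same names.

```python
import pandas as pd

# Hypothetical stand-in for a dataframe loaded from dns.log
zeek_df = pd.DataFrame({
    "id.orig_h": ["10.0.0.5", "10.0.0.5", "10.0.0.7", "10.0.0.5"],
    "query": ["example.com", "example.org", "example.com", "example.com"],
    "qtype_name": ["A", "A", "AAAA", "A"],
})

# Ten most frequently requested names
top_queries = zeek_df["query"].value_counts().head(10)

# Number of distinct names each client asked for, busiest client first
per_client = (
    zeek_df.groupby("id.orig_h")["query"].nunique().sort_values(ascending=False)
)

# Only the AAAA (IPv6 address) lookups
aaaa_only = zeek_df[zeek_df["qtype_name"] == "AAAA"]

print(top_queries)
print(per_client)
print(aaaa_only)
```

A client that requests an unusually large number of distinct names is often worth a closer look, which is why the per-client count is sorted in descending order.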

Automated anomaly detection in Zeek logs

I tried using this tool, which relies on PyOD to detect outliers in multivariate data within the conn.log file. However, the tool is not well documented yet, and in my opinion it’s better to write our own scripts to run anomaly detection models on Zeek logs, which gives better control over and understanding of the process and results.
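As a starting point for such a script, here is a deliberately simple sketch. It is not PyOD and it is not multivariate: it flags outliers in a single numeric column using the modified z-score (median and median absolute deviation). The function name mad_outliers, the 3.5 threshold, and the byte counts are assumptions for illustration; the values stand in for something like the orig_bytes column of conn.log.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical bytes-sent values for seven connections
orig_bytes = [512, 480, 530, 495, 501, 250000, 520]
flagged = mad_outliers(orig_bytes)
print(flagged)  # index 5, the 250000-byte connection
```

The median-based score is used instead of mean and standard deviation because a single huge transfer inflates the standard deviation enough to hide itself. A multivariate model would be the natural next step once the single-column version is understood.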

Note: My .ipynb pertaining to some of the examples mentioned is available here.

Pranshu Bajpai
Principal Security Architect

Pranshu Bajpai, PhD, is a principal security architect.