Email Header Analysis with Python
A hands-on project to analyze email headers, extract IP addresses, and identify suspicious patterns.
Email Header Analyzer: A Security Engineer’s Deep Dive into Email Forensics
How I built a tool that reveals the hidden stories behind every email
The Mission
In cybersecurity, email is often the first attack vector. As a security engineer, I’ve seen how a single malicious email can compromise entire organizations. But what fascinates me most isn’t just the attacks, it’s the digital fingerprints that every email leaves behind.
Enter my Email Header Analyzer: a Python tool that peels back the layers of email headers to reveal the truth about where emails really came from, who sent them, and whether they’re legitimate or malicious. This project represents my passion for digital forensics and threat intelligence - two cornerstones of modern security engineering.
As someone deeply passionate about AI and its potential to revolutionize security practices, I’m excited to explore how machine learning could enhance this tool to automatically detect sophisticated email threats and predict attack patterns.
The Anatomy of Email Headers: A Security Engineer’s Perspective
Why Email Headers Matter
Every email is like a digital passport - it contains stamps from every server it visited on its journey. As a security engineer, these headers are my primary evidence when investigating phishing attacks, business email compromise (BEC), and other email-based threats.
The Tool Architecture
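from email import message_from_file

def load_email_headers(path):
    # Parse the raw message (headers and body) into a Message object
    with open(path, 'r') as file:
        msg = message_from_file(file)
    return msg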
Security Correlation: This simple function is the foundation of email forensics. Much as network packet captures record traffic, email headers record an email’s journey through the internet, though headers added before the message reached a server you trust can be forged and should be read with care.
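For experimenting without a file on disk, the standard library parser also accepts a string. The following is a minimal sketch, not part of the original tool: the sample message, its addresses, and the `summarize_headers` helper are all invented for illustration.

```python
from email import message_from_string

# Hypothetical sample message, invented for illustration
RAW_SAMPLE = (
    "Received: from mail.example.net (mail.example.net [203.0.113.7])\n"
    "\tby mx.example.com; Mon, 01 Jan 2024 10:00:05 +0000\n"
    "From: Alice <alice@example.net>\n"
    "Return-Path: <bounce@example.net>\n"
    "Subject: Quarterly report\n"
    "\n"
    "Body text is ignored by the header analysis.\n"
)

def summarize_headers(raw_text, names=("From", "Return-Path", "Reply-To", "Subject")):
    """Return a dict of selected header values (None when a header is absent)."""
    msg = message_from_string(raw_text)
    return {name: msg.get(name) for name in names}

summary = summarize_headers(RAW_SAMPLE)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Header lookups on the parsed message are case-insensitive, and a missing header simply comes back as `None`, which is why the later checks need to guard against empty values.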
Threat Detection: The Security Engineer’s Toolkit
Authentication Analysis
def analyze_threat_indicators(msg):
    # SPF/DKIM/DMARC checks
    auth_results = msg.get("Authentication-Results")
    received_spf = msg.get("Received-SPF")
    if auth_results:
        # Each ';'-separated clause carries one verdict, e.g. "spf=fail ..."
        results = auth_results.lower().split(";")
        for result in results:
            if any(proto in result for proto in ["spf=", "dkim=", "dmarc="]):
                verdict = result.strip().split()[0]
                # "fail" also matches "softfail", so one test covers both
                if "fail" in verdict:
                    print(" ⚠️ Possible spoofing or misconfigured authentication")
Security Correlation: This implements email authentication verification, a critical security control. SPF, DKIM, and DMARC are the three pillars of email security. As a security engineer, I use these protocols to prevent email spoofing and phishing attacks. When these checks fail, it’s often the first indicator of a malicious email.
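If the verdicts need to be reused rather than just printed, they can be collected into a dictionary first. This is a rough sketch under a few assumptions: the helper names `parse_auth_results` and `failed_methods` are hypothetical, the sample header is invented, and the simple split on `;` will misread headers whose comments contain semicolons.

```python
import re

# Matches "spf=pass", "dkim=fail", "dmarc=none", etc. at the start of a clause
_METHOD_RESULT = re.compile(r"^\s*(spf|dkim|dmarc)\s*=\s*([a-z]+)", re.IGNORECASE)

def parse_auth_results(header_value):
    """Map each authentication method to its verdict, e.g. {"spf": "pass"}."""
    verdicts = {}
    for clause in header_value.split(";"):
        match = _METHOD_RESULT.match(clause)
        if match:
            method, result = match.group(1).lower(), match.group(2).lower()
            # Keep the first verdict per method; a message can carry several DKIM results
            verdicts.setdefault(method, result)
    return verdicts

def failed_methods(verdicts):
    """Return, in sorted order, the methods whose verdict is fail or softfail."""
    return sorted(m for m, r in verdicts.items() if r in ("fail", "softfail"))

# Invented Authentication-Results value for demonstration
sample = (
    "mx.example.com; spf=softfail smtp.mailfrom=example.net; "
    "dkim=pass header.d=example.net; dmarc=fail header.from=example.net"
)
verdicts = parse_auth_results(sample)
print(verdicts, failed_methods(verdicts))
```

Separating parsing from reporting makes it easier to count failures across many messages or to feed the verdicts into other checks.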
Header Mismatch Detection
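# Header values, assumed here to come from the parsed message (msg)
from_addr = msg.get("From")
return_path = msg.get("Return-Path")
reply_to = msg.get("Reply-To")
# Mismatch check: From vs Return-Path
if return_path and from_addr and return_path not in from_addr:
    print(f" ⚠️ Mismatch between 'From' and 'Return-Path': {from_addr} vs {return_path}")
# Suspicious Reply-To (guard against a missing From header as well)
if reply_to and from_addr and reply_to not in from_addr:
    print(f" ⚠️ 'Reply-To' is different from 'From': {reply_to} vs {from_addr}")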
Security Correlation: This is header consistency analysis - a fundamental technique in email forensics. Attackers often manipulate headers to hide their true origin. When the “From” address doesn’t match the “Return-Path” or “Reply-To”, it can indicate spoofing or phishing, although mailing lists and bulk-mail services legitimately produce such mismatches too, so I treat it as a signal to investigate rather than proof.
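The substring checks above compare raw header strings, so a display name or angle brackets can affect the result. One alternative is to compare only the domains, using `email.utils.parseaddr` from the standard library. The sketch below is an illustration, not the tool's actual logic; `address_domain` and `find_domain_mismatches` are hypothetical helper names.

```python
from email.utils import parseaddr

def address_domain(header_value):
    """Extract the lowercased domain from a header such as 'Name <user@host>'."""
    _, address = parseaddr(header_value or "")
    return address.rpartition("@")[2].lower() if "@" in address else ""

def find_domain_mismatches(from_value, return_path, reply_to):
    """List which of Return-Path / Reply-To use a different domain than From."""
    from_domain = address_domain(from_value)
    mismatches = []
    for name, value in (("Return-Path", return_path), ("Reply-To", reply_to)):
        domain = address_domain(value)
        # Only compare when both sides actually yielded a domain
        if from_domain and domain and domain != from_domain:
            mismatches.append(name)
    return mismatches

result = find_domain_mismatches(
    "Alice <alice@example.com>", "<bounce@example.com>", "help@evil.example"
)
print(result)
```

An exact domain comparison will also flag senders that bounce through a subdomain or a third-party mail service, so the output is best treated as a prompt for a closer look.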
IP Intelligence: Connecting the Dots
IP Address Extraction
import re

def extract_ip_addresses(headers):
    received_headers = headers.get_all('Received', [])
    # Candidate dotted quads; the range check below rejects values like 999.1.1.1
    ip_pattern = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')
    ip_addresses = []
    for header in received_headers:
        matches = ip_pattern.findall(header)
        for ip in matches:
            octets = ip.split('.')
            # Filter: all 4 parts must be 0-255, no leading zeroes
            if len(octets) == 4 and all(
                octet.isdigit() and
                0 <= int(octet) <= 255 and
                (octet == "0" or not octet.startswith("0"))
                for octet in octets
            ):
                ip_addresses.append(ip)
    return ip_addresses
Security Correlation: This is IP address validation and extraction - a crucial step in threat intelligence. The regex pattern and validation logic ensure we only extract syntactically valid IPv4 addresses; IPv6 addresses and the distinction between public and private ranges are not handled here. In security engineering, this is how I build threat intelligence feeds and correlate attacks across different sources.
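Internal hops often expose private addresses such as 10.0.0.50, which a public reputation service cannot say anything useful about. A possible follow-up step, sketched here with the standard library `ipaddress` module, keeps only globally routable addresses and drops duplicates. The `public_ips` helper is a hypothetical addition, not part of the original tool.

```python
import ipaddress

def public_ips(candidates):
    """Keep only globally routable addresses, preserving order and dropping duplicates."""
    seen = set()
    result = []
    for text in candidates:
        try:
            ip = ipaddress.ip_address(text)
        except ValueError:
            continue  # not a valid IPv4/IPv6 literal
        if ip.is_global and text not in seen:
            seen.add(text)
            result.append(text)
    return result

filtered = public_ips(
    ["10.0.0.50", "8.8.8.8", "999.1.1.1", "8.8.8.8", "192.168.1.100", "74.125.224.72"]
)
print(filtered)
```

Filtering before the reputation lookup also saves API calls, since private and loopback addresses never need to be queried.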
Reputation Analysis
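import os
import requests

# Assumption: the API key is supplied through an environment variable
ABUSEIPDB_API_KEY = os.environ.get("ABUSEIPDB_API_KEY")

def check_ip_reputation(ip):
    url = "https://api.abuseipdb.com/api/v2/check"
    querystring = {
        "ipAddress": ip,
        "maxAgeInDays": "90"
    }
    headers = {
        "Accept": "application/json",
        "Key": ABUSEIPDB_API_KEY
    }
    # Time out rather than hang, and raise on HTTP errors (bad key, rate limit)
    response = requests.get(url, headers=headers, params=querystring, timeout=10)
    response.raise_for_status()
    data = response.json()["data"]
    abuse_score = data["abuseConfidenceScore"]
    total_reports = data["totalReports"]
    country = data["countryCode"]
    return abuse_score, total_reports, country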
Security Correlation: This implements threat intelligence integration - a key component of modern security operations. By checking IP addresses against reputation databases, I can quickly identify known malicious infrastructure. This is the same approach used in enterprise security tools to block threats in real time.
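Because reputation APIs are rate limited, it can help to query each unique address only once and apply a cut-off to the returned score. The sketch below shows one way to do that. The `triage_ips` helper, the threshold of 50, and the stand-in scores are assumptions made for illustration; `lookup` can be any callable that returns a 0-100 score, for example a wrapper around the AbuseIPDB request that returns `abuseConfidenceScore`.

```python
def triage_ips(ip_addresses, lookup, threshold=50):
    """Query each unique IP once and flag those at or above the threshold.

    `lookup` is any callable returning an abuse confidence score (0-100).
    The default threshold is an arbitrary illustrative value.
    """
    scores = {}
    for ip in ip_addresses:
        if ip not in scores:  # avoid spending API quota on repeats
            scores[ip] = lookup(ip)
    flagged = [ip for ip, score in scores.items() if score >= threshold]
    return scores, flagged

# Stand-in lookup with invented scores, so the sketch runs offline
fake_scores = {"198.51.100.23": 87, "8.8.8.8": 0}
calls = []

def fake_lookup(ip):
    calls.append(ip)
    return fake_scores.get(ip, 0)

scores, flagged = triage_ips(["198.51.100.23", "8.8.8.8", "198.51.100.23"], fake_lookup)
print(scores, flagged)
```

Passing the lookup in as a parameter keeps the triage logic testable without network access or an API key.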
Digital Forensics: The Security Engineer’s Methodology
Header Chain Analysis
The tool analyzes the complete chain of “Received” headers, which tells the story of an email’s journey:
Received: from mail-server.example.com (192.168.1.100)
Received: from smtp.gmail.com (74.125.224.72)
Received: from client.example.com (10.0.0.50)
Security Correlation: This is email routing analysis - a fundamental forensics technique. By analyzing the header chain, I can:
- Identify the true origin of the email
- Detect unauthorized mail servers
- Spot anomalies in routing patterns
- Correlate with known threat actor infrastructure
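To read the chain in travel order, it helps to remember that each server prepends its own "Received" header, so the last one in the message is the oldest. A minimal sketch of that reordering follows, using the three sample headers shown earlier. The `received_chain` helper and its regular expression are illustrative assumptions and will not cover every "Received" format seen in the wild.

```python
import re
from email import message_from_string

# Captures the "from" host and a bracketed or parenthesised IPv4 address, if present
_HOP = re.compile(r"from\s+(\S+).*?[\[(](\d{1,3}(?:\.\d{1,3}){3})[\])]", re.DOTALL)

def received_chain(msg):
    """Return (host, ip) hops ordered from origin to final recipient."""
    hops = []
    for header in msg.get_all("Received", []):
        match = _HOP.search(header)
        if match:
            hops.append((match.group(1), match.group(2)))
    # Each server prepends its own Received header, so the last one is the oldest
    hops.reverse()
    return hops

raw = (
    "Received: from mail-server.example.com (192.168.1.100)\n"
    "Received: from smtp.gmail.com (74.125.224.72)\n"
    "Received: from client.example.com (10.0.0.50)\n"
    "\n"
)
chain = received_chain(message_from_string(raw))
for host, ip in chain:
    print(host, ip)
```

Hops recorded before the first server you control are supplied by the sender's side and can be fabricated, so the earliest entries deserve the most scrutiny.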
Timeline Analysis
print("Date:", msg.get("Date"))
Security Correlation: Timeline analysis is crucial in security investigations. Email timestamps help me:
- Correlate attacks with other security events
- Identify patterns in attack timing
- Determine the speed of attack propagation
- Build chronological attack narratives
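Printing the "Date" header is only a starting point. A timeline can also be built from the timestamps that each "Received" header carries after its final semicolon. The sketch below parses them with `email.utils.parsedate_to_datetime` and computes the delay between hops. The helper names and the sample message are invented for illustration, and the sketch assumes every timestamp carries an explicit UTC offset.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

def hop_timestamps(msg):
    """Return the Received timestamps as datetimes, oldest first."""
    stamps = []
    for header in msg.get_all("Received", []):
        # By convention the timestamp follows the last ';' in a Received header
        _, sep, date_part = header.rpartition(";")
        if not sep:
            continue
        try:
            stamps.append(parsedate_to_datetime(date_part.strip()))
        except (TypeError, ValueError):
            continue  # unparseable date; skip this hop
    stamps.reverse()
    return stamps

def hop_delays(stamps):
    """Seconds elapsed between consecutive hops."""
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]

# Invented sample message for demonstration
raw = (
    "Received: from b.example.org by c.example.org; Mon, 01 Jan 2024 10:00:09 +0000\n"
    "Received: from a.example.org by b.example.org; Mon, 01 Jan 2024 10:00:02 +0000\n"
    "Date: Mon, 01 Jan 2024 10:00:00 +0000\n"
    "\n"
)
stamps = hop_timestamps(message_from_string(raw))
delays = hop_delays(stamps)
print(delays)
```

Unusually long or negative delays between hops can point to queued mail, clock skew, or a forged header, each of which is worth noting in the investigation timeline.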
Technical Skills Demonstrated
Programming & Security
- Python - Core development language
- Regular Expressions - Pattern matching for IP extraction
- API Integration - Threat intelligence services
- Data Validation - Input sanitization and verification
Security Engineering Skills
- Email Forensics - Header analysis and interpretation
- Threat Intelligence - IP reputation checking
- Digital Forensics - Evidence collection and analysis
- Authentication Analysis - SPF/DKIM/DMARC verification
- Pattern Recognition - Identifying suspicious header patterns
- Incident Response - Rapid threat assessment
Cybersecurity Knowledge
- Email Security Protocols - SPF, DKIM, DMARC
- Phishing Detection - Header manipulation identification
- Threat Hunting - Proactive threat identification
- Security Automation - Automated threat analysis
Why This Matters for Security Engineering
Building this email header analyzer reinforced several critical security principles:
- Evidence Preservation - Every piece of data matters in investigations
- Automation is Key - Manual analysis doesn’t scale in security operations
- Threat Intelligence Integration - External data sources enhance detection
- Pattern Recognition - Consistent analysis reveals attack patterns
- Documentation is Crucial - Clear output helps in incident response
Future Enhancements
This project has opened my eyes to the potential of automated email forensics. I’m planning to add:
- Machine Learning Integration - Automatic threat classification
- Bulk Analysis - Process multiple emails simultaneously
- Visualization - Email routing diagrams and threat maps
- Integration with SIEM - Real-time email threat detection
- Advanced Pattern Recognition - Detect sophisticated spoofing techniques
- Geolocation Analysis - Map email origins geographically
Security Lessons Learned
1. The Devil is in the Details
Email headers contain a wealth of information that most people ignore. As a security engineer, I’ve learned that the smallest details often reveal the biggest threats.
2. Automation Enables Scale
Manual email analysis does not scale in enterprise environments. This tool demonstrates how automation can make security analysts more effective.
3. Threat Intelligence is Essential
No security tool operates in isolation. Integration with external threat intelligence sources provides context that local analysis cannot.
4. Documentation Drives Response
Clear, structured output is crucial for incident response teams. This tool provides actionable intelligence, not just raw data.
Real-World Applications
This tool has practical applications in:
- Incident Response - Rapid email threat assessment
- Security Operations - Proactive threat hunting
- Compliance - Email security audit trails
- Forensics - Evidence collection and analysis
- Threat Intelligence - Attack pattern analysis
Final Thoughts
Building this email header analyzer taught me that email security is more than just spam filters. It’s about understanding the digital DNA of every email that enters your organization. The same analytical skills I use to investigate network intrusions apply to email forensics.
The most valuable lesson? Every email tells a story - you just need to know how to read it. As security threats become more sophisticated, tools like this become essential for protecting organizations from email-based attacks.
This project demonstrates that security engineering is about building tools that make complex analysis accessible and actionable. Whether you’re analyzing network traffic, investigating incidents, or building security tools, the fundamental principles remain the same: collect, analyze, correlate, and respond.
GitHub: Project Repository
Skills: Python, Email Forensics, Threat Intelligence, Digital Forensics, Security Automation
What security tools have you built that started as simple scripts but evolved into essential security infrastructure? I’d love to hear your experiences!