Email Header Analysis with Python

A hands-on project to analyze email headers, extract IP addresses, and identify suspicious patterns.

Posted Jul 27, 2025

By Sujay Sundar Raj

Email Header Analyzer: A Security Engineer’s Deep Dive into Email Forensics

How I built a tool that reveals the hidden stories behind every email

The Mission

In the world of cybersecurity, emails are often the first line of attack. As a security engineer, I’ve seen how a single malicious email can compromise entire organizations. But what fascinates me most isn’t just the attacks, it’s the digital fingerprints that every email leaves behind.

Enter my Email Header Analyzer: a Python tool that peels back the layers of email headers to reveal the truth about where emails really came from, who sent them, and whether they’re legitimate or malicious. This project represents my passion for digital forensics and threat intelligence - two cornerstones of modern security engineering.

As someone deeply passionate about AI and its potential to revolutionize security practices, I’m excited to explore how machine learning could enhance this tool to automatically detect sophisticated email threats and predict attack patterns.

The Anatomy of Email Headers: A Security Engineer’s Perspective

Why Email Headers Matter

Every email is like a digital passport - it contains stamps from every server it visited on its journey. As a security engineer, these headers are my primary evidence when investigating phishing attacks, business email compromise (BEC), and other email-based threats.

The Tool Architecture

        
      
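from email import message_from_file

def load_email_headers(path):
    # Parse the raw message file into an email.message.Message object
    with open(path, 'r', encoding='utf-8', errors='replace') as file:
        msg = message_from_file(file)
    return msg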

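Security Correlation: This simple function is the foundation of email forensics. Much as a packet capture records a network conversation, the headers record each hop of an email’s journey through the internet. Entries added before the first server you trust can be forged, so I treat them as evidence to verify rather than as ground truth.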
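
To make the parsing step concrete, here is a minimal sketch of what the parsed message object exposes. The sample headers and the summarize_headers helper are hypothetical and invented for illustration; they are not part of the tool itself.

```python
from email import message_from_string

# Hypothetical sample headers, invented for illustration only
RAW_SAMPLE = (
    "Received: from mail.example.net (mail.example.net [74.125.224.72])\n"
    "\tby mx.example.org; Mon, 21 Jul 2025 10:15:03 +0000\n"
    "From: Alice <alice@example.com>\n"
    "To: bob@example.org\n"
    "Subject: Quarterly report\n"
    "\n"
    "Body text is ignored by header analysis.\n"
)

def summarize_headers(msg):
    """Return a small dict of the fields an analyst usually checks first."""
    return {
        "from": msg.get("From"),
        "to": msg.get("To"),
        "subject": msg.get("Subject"),
        # get_all returns every occurrence of a repeated header
        "hops": len(msg.get_all("Received", [])),
    }

sample_msg = message_from_string(RAW_SAMPLE)
summary = summarize_headers(sample_msg)
print(summary)
```

Note that msg.get returns only the first occurrence of a header, while get_all returns every occurrence, which matters for repeated headers such as “Received”.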

Threat Detection: The Security Engineer’s Toolkit

Authentication Analysis

        
      
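def analyze_threat_indicators(msg):
    # SPF/DKIM/DMARC checks
    auth_results = msg.get("Authentication-Results")
    received_spf = msg.get("Received-SPF")

    if auth_results:
        # Each ";"-separated clause looks like "spf=fail (...) smtp.mailfrom=..."
        results = auth_results.lower().split(";")
        for result in results:
            if any(proto in result for proto in ["spf=", "dkim=", "dmarc="]):
                verdict = result.strip().split()[0]
                # "fail" also matches "softfail"
                if "fail" in verdict:
                    print("      ⚠️ Possible spoofing or misconfigured authentication")

    # Received-SPF begins with the result keyword, e.g. "fail (...)"
    if received_spf and received_spf.strip().lower().startswith(("fail", "softfail")):
        print("      ⚠️ Possible spoofing or misconfigured authentication")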

Security Correlation: This reads the SPF, DKIM, and DMARC verdicts that the receiving mail server recorded in the headers, rather than re-running the checks itself. SPF, DKIM, and DMARC are the three pillars of email authentication. As a security engineer, I rely on these protocols to detect email spoofing and phishing attacks. A failed check is often the first indicator of a malicious email, though it can also point to a misconfigured legitimate sender.
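
For reporting, it can help to collect the verdicts into a dictionary rather than only printing a warning. The following is a minimal sketch under my own assumptions: parse_auth_verdicts and the sample header value are hypothetical, and the regex covers only the simple method=result form.

```python
import re

# Matches "spf=pass", "dkim=fail", "dmarc=none" and similar method=result pairs
_VERDICT_RE = re.compile(r"\b(spf|dkim|dmarc)\s*=\s*([a-z]+)", re.IGNORECASE)

def parse_auth_verdicts(auth_results):
    """Map each method found in an Authentication-Results value to its result."""
    verdicts = {}
    for method, result in _VERDICT_RE.findall(auth_results or ""):
        # Keep the first verdict per method; later duplicates are ignored here
        verdicts.setdefault(method.lower(), result.lower())
    return verdicts

# Hypothetical header value, invented for illustration
sample = (
    "mx.example.org; spf=softfail (sender not permitted) "
    "smtp.mailfrom=bounce@example.net; dkim=pass header.d=example.net; "
    "dmarc=fail header.from=example.com"
)
verdicts = parse_auth_verdicts(sample)
failed = sorted(m for m, r in verdicts.items() if r in {"fail", "softfail"})
print(verdicts, failed)
```

A message can carry several Authentication-Results headers, and only the ones added by servers you trust are meaningful, so a fuller version would need to pick the right one first.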

Header Mismatch Detection

        
      
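from email.utils import parseaddr

# Compare bare addresses (msg is the parsed message), ignoring display names
from_addr = parseaddr(msg.get("From", ""))[1].lower()
return_path = parseaddr(msg.get("Return-Path", ""))[1].lower()
reply_to = parseaddr(msg.get("Reply-To", ""))[1].lower()

# Mismatch check: From vs Return-Path
if return_path and from_addr and return_path != from_addr:
    print(f"  ⚠️ Mismatch between 'From' and 'Return-Path': {from_addr} vs {return_path}")

# Suspicious Reply-To
if reply_to and reply_to != from_addr:
    print(f"  ⚠️ 'Reply-To' is different from 'From': {reply_to} vs {from_addr}")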

Security Correlation: This is header consistency analysis - a fundamental technique in email forensics. Attackers often manipulate headers to hide their true origin. When the “From” address doesn’t match the “Return-Path” or “Reply-To”, it can indicate email spoofing or phishing. Legitimate bulk senders and mailing lists also use differing addresses, so I treat a mismatch as a signal to investigate rather than as proof.
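
One way to cut down on noise from legitimate senders is to compare domains instead of full addresses. This is a minimal sketch, not the tool’s actual logic: address_domain and find_domain_mismatches are hypothetical helpers, and the addresses are made up.

```python
from email.utils import parseaddr

def address_domain(header_value):
    """Return the lowercased domain of the address in a header value, or ''."""
    address = parseaddr(header_value or "")[1]
    return address.rpartition("@")[2].lower() if "@" in address else ""

def find_domain_mismatches(from_value, return_path, reply_to):
    """List the headers whose domain differs from the From domain."""
    from_domain = address_domain(from_value)
    mismatches = []
    for name, value in (("Return-Path", return_path), ("Reply-To", reply_to)):
        domain = address_domain(value)
        # Only compare when both sides actually contain an address
        if domain and from_domain and domain != from_domain:
            mismatches.append((name, domain))
    return mismatches

mismatches = find_domain_mismatches(
    "Support <support@example.com>",
    "<bounce@mailer.example.net>",
    "helpdesk@example.com",
)
print(mismatches)
```

This compares exact domains, so a subdomain such as mail.example.com would still be reported as different from example.com; relaxing that would need a notion of the organizational domain.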

IP Intelligence: Connecting the Dots

IP Address Extraction

        
      
import re

def extract_ip_addresses(headers):
    received_headers = headers.get_all('Received', [])
    ip_pattern = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')
    ip_addresses = []
    
    for header in received_headers:
        matches = ip_pattern.findall(header)
        for ip in matches:
            octets = ip.split('.')
            # Filter: all 4 parts must be 0-255, no leading zeroes
            if len(octets) == 4 and all(
                octet.isdigit() and
                0 <= int(octet) <= 255 and
                (octet == "0" or not octet.startswith("0"))
                for octet in octets
            ):
                ip_addresses.append(ip)
    
    return ip_addresses

Security Correlation: This is IP address extraction and validation - a crucial step in threat intelligence. The regex pattern finds IPv4 candidates, and the validation logic keeps only well-formed ones (four octets in the range 0-255, no leading zeroes). It does not handle IPv6, and it keeps private addresses such as 10.0.0.50. In security engineering, this is how I build threat intelligence feeds and correlate attacks across different sources.
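
Private and loopback addresses are not worth sending to a reputation service. As a minimal sketch, the standard-library ipaddress module can filter the extracted list down to globally routable addresses; public_ips is a hypothetical helper, not part of the tool as shown.

```python
import ipaddress

def public_ips(candidates):
    """Keep only syntactically valid, globally routable addresses, in order."""
    kept = []
    for text in candidates:
        try:
            ip = ipaddress.ip_address(text)
        except ValueError:
            continue  # not a valid IPv4 or IPv6 literal
        # is_global is False for private, loopback, and reserved ranges
        if ip.is_global and text not in kept:
            kept.append(text)
    return kept

candidates = [
    "192.168.1.100", "74.125.224.72", "10.0.0.50",
    "999.1.1.1", "74.125.224.72", "127.0.0.1",
]
routable = public_ips(candidates)
print(routable)
```

The same call also accepts IPv6 literals, so it could back a future IPv6-aware extractor.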

Reputation Analysis

        
      
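import os
import requests

# Assumption: the API key is supplied through an environment variable
ABUSEIPDB_API_KEY = os.environ.get("ABUSEIPDB_API_KEY", "")

def check_ip_reputation(ip):
    url = "https://api.abuseipdb.com/api/v2/check"
    querystring = {
        "ipAddress": ip,
        "maxAgeInDays": "90"
    }
    headers = {
        "Accept": "application/json",
        "Key": ABUSEIPDB_API_KEY
    }

    # Time out rather than hang, and fail loudly on HTTP errors
    response = requests.get(url, headers=headers, params=querystring, timeout=10)
    response.raise_for_status()
    data = response.json()["data"]
    abuse_score = data["abuseConfidenceScore"]
    total_reports = data["totalReports"]
    country = data["countryCode"]
    # Return the fields so the caller can report or score them
    return abuse_score, total_reports, country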

Security Correlation: This implements threat intelligence integration - a key component of modern security operations. By checking IP addresses against reputation databases, I can quickly identify known malicious infrastructure. Enterprise security tools take the same approach to block threats in real time.
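
The raw score still has to be turned into a decision. The sketch below shows one possible mapping from the response fields read above to a coarse label. The thresholds and the classify_reputation helper are my own assumptions for illustration, and the sample values are made up, so no network call is needed to run it.

```python
# Hypothetical cut-offs chosen for illustration; tune them to your environment
SUSPICIOUS_SCORE = 25
MALICIOUS_SCORE = 75

def classify_reputation(data):
    """Turn the 'data' object of a check response into a coarse label."""
    score = data.get("abuseConfidenceScore", 0)
    if score >= MALICIOUS_SCORE:
        return "malicious"
    if score >= SUSPICIOUS_SCORE:
        return "suspicious"
    # Below both cut-offs: distinguish never-reported from rarely-reported
    return "clean" if data.get("totalReports", 0) == 0 else "low-risk"

# Canned example shaped like the fields read above (values are made up)
sample_data = {"abuseConfidenceScore": 82, "totalReports": 14, "countryCode": "NL"}
label = classify_reputation(sample_data)
print(label)
```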

Digital Forensics: The Security Engineer’s Methodology

Header Chain Analysis

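The tool analyzes the complete chain of “Received” headers, which tells the story of an email’s journey. Each server prepends its own header, so the chain reads from the most recent hop at the top to the origin at the bottom: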

Received: from mail-server.example.com (192.168.1.100)
Received: from smtp.gmail.com (74.125.224.72)
Received: from client.example.com (10.0.0.50)

Security Correlation: This is email routing analysis - a fundamental forensics technique. By analyzing the header chain, I can:

Identify the true origin of the email
Detect unauthorized mail servers
Spot anomalies in routing patterns
Correlate with known threat actor infrastructure
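
Routing analysis of this kind can be sketched as a walk over the “Received” headers in reverse order. The received_hops helper, its loose regexes, and the sample headers are hypothetical; real-world headers vary a lot between mail servers, so treat this as a starting point only.

```python
import re
from email import message_from_string

# Loose patterns for the "from <host>" and "by <host>" clauses of a Received header
_FROM_RE = re.compile(r"\bfrom\s+([^\s;()]+)", re.IGNORECASE)
_BY_RE = re.compile(r"\bby\s+([^\s;()]+)", re.IGNORECASE)

def received_hops(msg):
    """Return (from_host, by_host) pairs ordered from origin to destination."""
    hops = []
    # Servers prepend Received headers, so reverse to get chronological order
    for value in reversed(msg.get_all("Received", [])):
        flat = " ".join(value.split())  # unfold continuation lines
        sender = _FROM_RE.search(flat)
        receiver = _BY_RE.search(flat)
        hops.append((
            sender.group(1) if sender else None,
            receiver.group(1) if receiver else None,
        ))
    return hops

# Hypothetical two-hop message, invented for illustration
raw = (
    "Received: from relay.example.net (relay.example.net [74.125.224.72])\n"
    "\tby mx.example.org with ESMTPS; Mon, 21 Jul 2025 10:15:05 +0000\n"
    "Received: from client.example.com (client.example.com [10.0.0.50])\n"
    "\tby relay.example.net with ESMTP; Mon, 21 Jul 2025 10:15:03 +0000\n"
    "Subject: test\n\n"
)
hops = received_hops(message_from_string(raw))
for number, (src, dst) in enumerate(hops, start=1):
    print(f"hop {number}: {src} -> {dst}")
```

A simple consistency check is that the “by” host of one hop should match the “from” host of the next; a break in that chain is worth a closer look.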

Timeline Analysis

        
      
print("Date:", msg.get("Date"))

Security Correlation: Timeline analysis is crucial in security investigations. Email timestamps help me:

Correlate attacks with other security events
Identify patterns in attack timing
Determine the speed of attack propagation
Build chronological attack narratives
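
Printing the “Date” header is only the first step; the timestamps at the end of each “Received” header can be parsed and compared as well. This is a minimal sketch with hypothetical helpers (hop_timestamps, hop_delays) and made-up header values, assuming each timestamp follows the last semicolon and carries a numeric UTC offset.

```python
from email.utils import parsedate_to_datetime

def hop_timestamps(received_values):
    """Parse the timestamp after the last ';' of each Received value, oldest first."""
    stamps = []
    for value in reversed(received_values):
        _, _, date_part = value.rpartition(";")
        try:
            stamps.append(parsedate_to_datetime(date_part.strip()))
        except (TypeError, ValueError):
            continue  # skip hops whose timestamp cannot be parsed
    return stamps

def hop_delays(stamps):
    """Seconds elapsed between consecutive hops."""
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(stamps, stamps[1:])
    ]

# Hypothetical header values, most recent hop first as they appear in a message
received = [
    "from relay.example.net by mx.example.org; Mon, 21 Jul 2025 10:15:05 +0000",
    "from client.example.com by relay.example.net; Mon, 21 Jul 2025 12:15:03 +0200",
]
delays = hop_delays(hop_timestamps(received))
print(delays)
```

Negative or unusually large delays can point to clock skew, queued mail, or forged entries, so they are a prompt for a closer look rather than a verdict.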

Technical Skills Demonstrated

Programming & Security

Python - Core development language
Regular Expressions - Pattern matching for IP extraction
API Integration - Threat intelligence services
Data Validation - Input sanitization and verification

Security Engineering Skills

Email Forensics - Header analysis and interpretation
Threat Intelligence - IP reputation checking
Digital Forensics - Evidence collection and analysis
Authentication Analysis - SPF/DKIM/DMARC verification
Pattern Recognition - Identifying suspicious header patterns
Incident Response - Rapid threat assessment

Cybersecurity Knowledge

Email Security Protocols - SPF, DKIM, DMARC
Phishing Detection - Header manipulation identification
Threat Hunting - Proactive threat identification
Security Automation - Automated threat analysis

Why This Matters for Security Engineering

Building this email header analyzer reinforced several critical security principles:

Evidence Preservation - Every piece of data matters in investigations
Automation is Key - Manual analysis doesn’t scale in security operations
Threat Intelligence Integration - External data sources enhance detection
Pattern Recognition - Consistent analysis reveals attack patterns
Documentation is Crucial - Clear output helps in incident response

Future Enhancements

This project has opened my eyes to the potential of automated email forensics. I’m planning to add:

Machine Learning Integration - Automatic threat classification
Bulk Analysis - Process multiple emails simultaneously
Visualization - Email routing diagrams and threat maps
Integration with SIEM - Real-time email threat detection
Advanced Pattern Recognition - Detect sophisticated spoofing techniques
Geolocation Analysis - Map email origins geographically

Security Lessons Learned

1. The Devil is in the Details

Email headers contain a wealth of information that most people ignore. As a security engineer, I’ve learned that the smallest details often reveal the biggest threats.

2. Automation Enables Scale

Manual email analysis doesn’t scale to the mail volumes of enterprise environments. This tool demonstrates how automation can make security analysts more effective.

3. Threat Intelligence is Essential

No security tool operates in isolation. Integration with external threat intelligence sources provides context that local analysis cannot.

4. Documentation Drives Response

Clear, structured output is crucial for incident response teams. This tool provides actionable intelligence, not just raw data.

Real-World Applications

This tool has practical applications in:

Incident Response - Rapid email threat assessment
Security Operations - Proactive threat hunting
Compliance - Email security audit trails
Forensics - Evidence collection and analysis
Threat Intelligence - Attack pattern analysis

Final Thoughts

Building this email header analyzer taught me that email security is more than just spam filters. It’s about understanding the digital DNA of every email that enters your organization. The same analytical skills I use to investigate network intrusions apply to email forensics.

The most valuable lesson? Every email tells a story - you just need to know how to read it. As security threats become more sophisticated, tools like this become essential for protecting organizations from email-based attacks.

This project demonstrates that security engineering is about building tools that make complex analysis accessible and actionable. Whether you’re analyzing network traffic, investigating incidents, or building security tools, the fundamental principles remain the same: collect, analyze, correlate, and respond.

GitHub: Project Repository
Skills: Python, Email Forensics, Threat Intelligence, Digital Forensics, Security Automation

What security tools have you built that started as simple scripts but evolved into essential security infrastructure? I’d love to hear your experiences!

This post is licensed under CC BY 4.0 by the author.