When Something Doesn't Look Right in Logs: A Practical Guide to Log Analysis
The Question Everyone Asks at 3 AM
“Why does this IP talk to that IP?”
It’s 3 AM. You’re on-call. A log entry pops up that doesn’t match anything in your baseline. The server that usually talks to three internal services is suddenly making outbound connections to an external IP you have never seen. Your gut says something is wrong. Your question is simple: is it, or is it just noise?
This is the heart of log analysis. Not the tools. Not the dashboards. The question itself.
Let me walk you through how to answer it.
Step 1: Establish What “Normal” Looks Like
You can’t detect anomalies without a baseline. This isn’t optional.
What to Track
For every system, know:
- Typical outbound connections — which IPs, which ports, which protocols
- Typical connection frequency — how many connections per hour/day
- Typical data volume — how much data flows in/out
- Typical timing — when connections happen (business hours vs. off-hours)
- Typical users/services — who is connecting to what
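One way to make the list above concrete is a small per-host profile. The sketch below is illustrative only: the `HostBaseline` class, the `build_baselines` function, and the record field names (`src_ip`, `bytes_out`, and so on) are assumptions made for this example, not the output of any particular tool, and it leaves out the users/services dimension.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class HostBaseline:
    """What 'normal' looks like for one source host (hypothetical structure)."""
    destinations: set = field(default_factory=set)  # (dest_ip, dest_port, protocol)
    connections_per_hour: Counter = field(default_factory=Counter)  # hour of day -> count
    bytes_out: int = 0  # total bytes sent during the baseline window


def build_baselines(records):
    """Fold connection records into one HostBaseline per source IP."""
    baselines = {}
    for rec in records:
        profile = baselines.setdefault(rec["src_ip"], HostBaseline())
        profile.destinations.add((rec["dest_ip"], rec["dest_port"], rec["protocol"]))
        hour = datetime.strptime(rec["timestamp"], "%Y-%m-%d %H:%M:%S").hour
        profile.connections_per_hour[hour] += 1
        profile.bytes_out += rec["bytes_out"]
    return baselines


# Synthetic records, made up for this example
sample = [
    {"timestamp": "2026-06-01 10:00:05", "src_ip": "10.0.1.50", "dest_ip": "10.0.1.100",
     "dest_port": 5432, "protocol": "tcp", "bytes_out": 2048},
    {"timestamp": "2026-06-01 10:05:09", "src_ip": "10.0.1.50", "dest_ip": "10.0.1.100",
     "dest_port": 5432, "protocol": "tcp", "bytes_out": 1024},
]
baselines = build_baselines(sample)
print(baselines["10.0.1.50"])
```

Keeping the hour-of-day counts separate from the destination set means a known destination contacted at an unusual hour can still stand out.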
Quick Baseline Script
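#!/bin/bash
# baseline_connections.sh: sample active TCP/UDP connections until stopped
# Local address is recorded as Source, peer as Dest; byte counts are not captured.
echo "Timestamp,Source IP,Source Port,Dest IP,Dest Port,Protocol" > baseline.csv
# For Linux (requires ss from iproute2)
while true; do
  ts=$(date '+%Y-%m-%d %H:%M:%S')
  # ss -H -tun fields: Netid State Recv-Q Send-Q Local:Port Peer:Port
  ss -H -tun | awk -v ts="$ts" -v OFS=',' '{
    n = split($5, l, ":"); m = split($6, p, ":")  # the port is the last ":"-separated piece
    src = substr($5, 1, length($5) - length(l[n]) - 1)
    dst = substr($6, 1, length($6) - length(p[m]) - 1)
    print ts, src, l[n], dst, p[m], $1
  }' >> baseline.csv
  sleep 300 # Every 5 minutes
done &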
The loop runs until you stop it, so let it collect for at least a week to cover both weekdays and weekends. You now have a baseline. Anything outside it deserves a closer look.
Step 2: The “Why Does This IP Talk to That IP?” Investigation
Let’s say you see this in your logs:
2026-06-07 03:14:22 web-server-01 10.0.1.50:44832 -> 185.220.101.42:443 HTTPS
2026-06-07 03:14:23 web-server-01 10.0.1.50:44833 -> 185.220.101.42:443 HTTPS
2026-06-07 03:14:24 web-server-01 10.0.1.50:44834 -> 185.220.101.42:443 HTTPS
First Questions to Ask
- Is this IP known? Run it through threat intelligence:
# Use VirusTotal (free tier)
curl -s "https://www.virustotal.com/api/v3/ip_addresses/185.220.101.42" \
  -H "x-apikey: YOUR_KEY"
# Or use Shodan (free tier)
curl -s "https://api.shodan.io/shodan/host/185.220.101.42?key=YOUR_KEY"
- Is the timing normal? 3 AM is not a normal time for a web server to make outbound HTTPS connections.
- Is the frequency normal? Three connections in three seconds to the same IP is unusual for a web server.
- What service is making the connection?
# Find the process that owns a socket to the suspicious IP (run as root to see all processes)
lsof -nP -i @185.220.101.42:443
# Or use ss with process info
ss -tunapo | grep 185.220.101.42
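The timing and frequency questions above can be sketched in a few lines of Python. This assumes the log lines look exactly like the three sample entries in this step. The business-hours window (08:00 to 18:00) and the burst rule (three connections within ten seconds) are illustrative thresholds, not recommendations, and the function names are made up for this example.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as:
# 2026-06-07 03:14:22 web-server-01 10.0.1.50:44832 -> 185.220.101.42:443 HTTPS
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<src>[\d.]+):(?P<sport>\d+) -> (?P<dst>[\d.]+):(?P<dport>\d+) (?P<proto>\S+)"
)


def parse_line(line):
    """Return a dict for a matching log line, or None if the format differs."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    rec = match.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S")
    return rec


def triage(lines, business_hours=(8, 18), burst_count=3, burst_window_s=10):
    """Flag off-hours activity and short bursts per (source, destination) pair."""
    by_pair = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec:
            by_pair[(rec["src"], rec["dst"])].append(rec["ts"])
    findings = []
    for (src, dst), stamps in by_pair.items():
        stamps.sort()
        start, end = business_hours
        off_hours = any(not start <= t.hour < end for t in stamps)
        # Burst: any run of burst_count connections inside burst_window_s seconds
        burst = any(
            (stamps[i + burst_count - 1] - stamps[i]).total_seconds() <= burst_window_s
            for i in range(len(stamps) - burst_count + 1)
        )
        findings.append({"src": src, "dst": dst, "off_hours": off_hours, "burst": burst})
    return findings


log_lines = [
    "2026-06-07 03:14:22 web-server-01 10.0.1.50:44832 -> 185.220.101.42:443 HTTPS",
    "2026-06-07 03:14:23 web-server-01 10.0.1.50:44833 -> 185.220.101.42:443 HTTPS",
    "2026-06-07 03:14:24 web-server-01 10.0.1.50:44834 -> 185.220.101.42:443 HTTPS",
]
findings = triage(log_lines)
print(findings)
```

Neither flag proves compromise on its own; they only tell you which pair to look at first.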
Step 3: Common Anomaly Patterns
Here are the patterns I’ve seen in real incidents. Learn them.
Pattern 1: Data Exfiltration
Normal: 10.0.1.50 -> 93.184.216.34:443 (legitimate CDN)
Anomaly: 10.0.1.50 -> 185.220.101.42:443 (known Tor exit node)
What to look for:
- Connections to known Tor exit nodes
- Unusual data volumes (large uploads at odd hours)
- Connections to IPs in unexpected geolocations
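A rough sketch of the first two checks above, assuming you already have a set of Tor exit addresses (the Tor Project publishes a bulk exit list you could load into it) and per-connection upload byte counts. The record field names, the `flag_exfil_candidates` function, and the 50 MB threshold are assumptions for illustration. Geolocation is not covered here.

```python
from collections import defaultdict


def flag_exfil_candidates(records, tor_exits, upload_limit_bytes=50_000_000):
    """Return (src, dst, total_bytes, reasons) for pairs worth a closer look.

    records: iterable of dicts with src_ip, dest_ip, bytes_out (hypothetical names).
    tor_exits: set of IP strings loaded from an exit-node list.
    """
    uploaded = defaultdict(int)
    for rec in records:
        uploaded[(rec["src_ip"], rec["dest_ip"])] += rec["bytes_out"]

    flagged = []
    for (src, dst), total in sorted(uploaded.items()):
        reasons = []
        if dst in tor_exits:
            reasons.append("tor_exit")
        if total > upload_limit_bytes:
            reasons.append("large_upload")
        if reasons:
            flagged.append((src, dst, total, reasons))
    return flagged


# Synthetic data for illustration only
tor_exits = {"185.220.101.42"}
records = [
    {"src_ip": "10.0.1.50", "dest_ip": "93.184.216.34", "bytes_out": 12_000},
    {"src_ip": "10.0.1.50", "dest_ip": "185.220.101.42", "bytes_out": 40_000_000},
    {"src_ip": "10.0.1.50", "dest_ip": "185.220.101.42", "bytes_out": 30_000_000},
]
flagged = flag_exfil_candidates(records, tor_exits)
print(flagged)
```

Summing bytes per pair before applying the threshold catches an upload that is split across many small connections.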
Pattern 2: C2 Communication
Normal: 10.0.1.50 -> 13.107.42.14:443 (Microsoft update)
Anomaly: 10.0.1.50 -> 45.33.32.156:8443 (unusual port)
What to look for:
- Connections to unusual ports (not 80, 443, 53)
- Beaconing behavior (regular intervals to the same IP)
- DNS queries to unusual domains
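Beaconing is the easiest of these to check numerically: if the gaps between connections to one destination are nearly constant, a timer is probably driving them. The sketch below scores regularity with the coefficient of variation (standard deviation divided by mean) of the gaps. The function names and the 0.1 cutoff are assumptions, and real C2 frameworks often add jitter, so treat the threshold as a starting point.

```python
from statistics import mean, pstdev


def beacon_score(timestamps):
    """Coefficient of variation of inter-arrival gaps; lower means more regular.

    timestamps: connection times in seconds (for example, epoch seconds).
    Returns None when there are too few connections to judge.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 3 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)


def looks_like_beacon(timestamps, max_cv=0.1):
    """True when the gaps are regular enough to suggest a timer."""
    score = beacon_score(timestamps)
    return score is not None and score <= max_cv


# Synthetic examples: a roughly 60-second beacon versus irregular traffic
beacon = [0, 60, 121, 180, 241, 300]
irregular = [0, 5, 7, 200, 210, 900]
print(looks_like_beacon(beacon), looks_like_beacon(irregular))
```

Run it per (source, destination) pair rather than per host, otherwise unrelated traffic hides the rhythm.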
Pattern 3: Lateral Movement
Normal: 10.0.1.50 -> 10.0.1.100:22 (SSH to internal server)
Anomaly: 10.0.1.50 -> 10.0.2.100:22 (SSH to unexpected subnet)
What to look for:
- SSH/RDP to unexpected internal hosts
- Connections between subnets that don’t normally communicate
- New services appearing on internal hosts
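The subnet check above can be expressed with Python's standard `ipaddress` module. This sketch assumes /24 subnets and a hand-maintained allow-list of subnet pairs. Both are assumptions for illustration, as are the names `ALLOWED_FLOWS`, `ADMIN_PORTS`, and `is_unexpected_internal`.

```python
import ipaddress

# Subnet pairs that are expected to talk over admin protocols (hypothetical allow-list)
ALLOWED_FLOWS = {
    (ipaddress.ip_network("10.0.1.0/24"), ipaddress.ip_network("10.0.1.0/24")),
}
ADMIN_PORTS = {22, 3389}  # SSH and RDP


def is_unexpected_internal(src, dst, dport, allowed=ALLOWED_FLOWS):
    """True for an internal-to-internal admin connection outside the allow-list."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if not (src_ip.is_private and dst_ip.is_private):
        return False  # not lateral movement; handled by the outbound checks
    if dport not in ADMIN_PORTS:
        return False
    return not any(src_ip in s and dst_ip in d for s, d in allowed)


print(is_unexpected_internal("10.0.1.50", "10.0.1.100", 22))  # same subnet, allowed
print(is_unexpected_internal("10.0.1.50", "10.0.2.100", 22))  # crosses into 10.0.2.0/24
```

An explicit allow-list is more work to maintain than a learned baseline, but it makes every exception a deliberate decision.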
Pattern 4: DNS Tunneling
Normal: DNS queries sent to 8.8.8.8 (dns.google)
Anomaly: DNS queries for long, random-looking subdomains of suspicious-domain.xyz, or sent to an unknown resolver
What to look for:
- DNS queries to unexpected resolvers
- Unusually long DNS queries (data in the domain name)
- High volume of DNS queries from a single host
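The "data in the domain name" signal can be approximated by looking at the longest label in each query: tunneled data tends to produce labels that are both long and high in entropy. The thresholds below (40 characters, 3.5 bits per character, and a minimum length of 16 before entropy is considered) are illustrative assumptions and will need tuning against your own traffic. Query volume per host is not covered here.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Bits per character of the character distribution in text."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def suspicious_query(qname, max_label_len=40, entropy_threshold=3.5):
    """True if the longest label is very long, or moderately long and random-looking."""
    labels = qname.rstrip(".").split(".")
    longest = max(labels, key=len)
    if len(longest) > max_label_len:
        return True
    return len(longest) >= 16 and shannon_entropy(longest) > entropy_threshold


# Synthetic example: a hex-encoded chunk smuggled in a subdomain label
payload = "secret-data-chunk-0001-of-0042".encode().hex()
tunneled = f"{payload}.suspicious-domain.xyz"
print(suspicious_query("www.example.com"), suspicious_query(tunneled))
```

Some CDNs and telemetry services also use long generated hostnames, so expect to pair this with an allow-list.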
Step 4: Tools for Log Analysis (All Free)
Splunk (Free Up to 500MB/day)
index=* "185.220.101.42" | stats count by src_ip, dest_ip, dest_port
Elastic Stack (Free, Open Source)
{
  "query": {
    "match": {
      "dest_ip": "185.220.101.42"
    }
  }
}
Zeek (Open Source Network Analysis)
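# Replay a capture; Zeek writes conn.log, http.log, dns.log, etc. to the current directory
zeek -r capture.pcap
# Look for the suspicious IP in the connection log
zeek-cut id.orig_h id.resp_h id.resp_p proto service < conn.log | grep "185.220.101.42"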
Suricata (Open Source IDS)
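# Run Suricata against a packet capture
suricata -c /etc/suricata/suricata.yaml -r capture.pcap
# Check alerts that involve the suspicious IP (requires jq)
jq -c 'select(.event_type == "alert" and (.src_ip == "185.220.101.42" or .dest_ip == "185.220.101.42"))' /var/log/suricata/eve.json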
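If you would rather post-process Suricata output in Python, eve.json is one JSON object per line, so it can be streamed. The sketch below relies on the standard EVE fields `event_type`, `src_ip`, `dest_ip`, and `alert.signature`. The function name `alerts_for_ip` and the sample events (including the signature text) are made up for illustration.

```python
import json


def alerts_for_ip(eve_lines, ip):
    """Yield-style filter: return alert events where ip is the source or destination.

    eve_lines: any iterable of JSON lines, for example an open eve.json file.
    """
    hits = []
    for line in eve_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written lines
        if event.get("event_type") != "alert":
            continue
        if ip in (event.get("src_ip"), event.get("dest_ip")):
            hits.append((
                event.get("timestamp"),
                event.get("src_ip"),
                event.get("dest_ip"),
                event.get("alert", {}).get("signature"),
            ))
    return hits


# Synthetic events for illustration; real files come from /var/log/suricata/eve.json
sample_lines = [
    '{"timestamp": "2026-06-07T03:14:22", "event_type": "flow", '
    '"src_ip": "10.0.1.50", "dest_ip": "185.220.101.42"}',
    '{"timestamp": "2026-06-07T03:14:23", "event_type": "alert", '
    '"src_ip": "10.0.1.50", "dest_ip": "185.220.101.42", '
    '"alert": {"signature": "EXAMPLE made-up signature name"}}',
    "not json",
]
hits = alerts_for_ip(sample_lines, "185.220.101.42")
print(hits)
```

To run it on a live file, pass the open file object directly, since iterating a file yields one line at a time.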
Step 5: Automate Detection
Don’t rely on manual investigation. Automate the patterns above.
Simple Anomaly Detection Script
#!/usr/bin/env python3
# anomaly_detector.py: simple log anomaly detector
import csv
from collections import defaultdict


def load_connections(path):
    """Return {src_ip: {(dest_ip, dest_port), ...}} from a CSV with the baseline.csv headers."""
    connections = defaultdict(set)
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            connections[row['Source IP']].add((row['Dest IP'], row['Dest Port']))
    return connections


# Load baseline (from baseline.csv) and the current logs
baseline = load_connections('baseline.csv')
current_connections = load_connections('current_logs.csv')

# Find anomalies: destinations seen now that never appeared in the baseline
for src_ip, destinations in current_connections.items():
    new_connections = destinations - baseline.get(src_ip, set())
    if new_connections:
        print(f"ANOMALY: {src_ip} connecting to new destinations:")
        for dest_ip, dest_port in sorted(new_connections):
            print(f"  {src_ip} -> {dest_ip}:{dest_port}")
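Run this every hour (for example from cron) against the latest export in current_logs.csv. It flags any source, destination, and port combination that is not in your baseline. Timing, frequency, and volume changes need separate checks.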
The Key Insight
Log analysis isn’t about tools. It’s about knowing what normal looks like and recognizing when something breaks that pattern.
Your baseline is your most important security asset. Build it. Maintain it. Respect it.
What’s Next
In the next post, I’ll cover building a home lab for AI security testing — how to set up a safe environment for learning, practicing, and testing AI security tools without touching production.
This post is part of the Cyber-AI initiative — free, open-source cybersecurity and AI education for everyone.
Found this useful? Share it with someone who’s on-call. They’ll thank you at 3 AM.