There is no way to verify an email’s contents except through cryptography. Until every email client includes encryption and reliable authentication, we should always doubt an email’s source.
We can increase our confidence in an email a little, though, by tracing its path through the mail system. I use this technique more-or-less daily to look at potential phishing emails. If the final Received header didn’t come from my bank, then I know it’s fake.
When we send an email, we connect to a server to deliver it for us. These servers are generally called mail transfer agents (MTAs) and they use the Simple Mail Transfer Protocol (SMTP).
I used the following technique to assess whether the #PodestaEmails leaked during the 2016 presidential elections were “obvious fakes,” as claimed by one observer. While there’s no way to prove they were legitimate, the headers looked believable.
Each time an MTA processes an email message it adds a header (the “Received:” header) to the front of the email message. The recipient can see the email’s whole path by tracing the Received headers back to the sender. Here is a typical Received header:
Received: from mail-lf0-f50.google.com (mail-lf0-f50.google.com [18.104.22.168]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mx5.messagingengine.com (Postfix) with ESMTPS for <firstname.lastname@example.org>; Wed, 25 Apr 2018 17:43:40 -0400 (EDT)
As email has evolved, a lot of headers have appeared, many of whose names contain the word “Received.” For this exercise, we only pay attention to “Received:” headers. We ignore the ones that start with “X-” or end with “-SPF” or just embed the word “Received” somewhere in the middle.
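As a minimal sketch of this filtering step, assuming the message is available as raw text, Python's standard email module can pull out only the headers named exactly "Received." The sample message and the received_headers helper below are hypothetical, made up for illustration.

```python
from email import message_from_string

# Hypothetical sample message; the header values are made up for illustration.
RAW_MESSAGE = """\
Received: from MTA2.net by MTA3.net
X-Received: by 10.0.0.1 with SMTP id abc123
Received-SPF: pass (example only)
Received: from MTA1.net by MTA2.net
Received: from 124567.client.someisp.net by MTA1.net
From: Bob
To: Alice

Hello Alice.
"""


def received_headers(raw_message):
    """Return the values of headers named exactly "Received", in message order."""
    msg = message_from_string(raw_message)
    # get_all() compares the whole header name (case-insensitively), so
    # X-Received and Received-SPF are not returned.
    return msg.get_all("Received") or []


for value in received_headers(RAW_MESSAGE):
    print(value)
```

Because the lookup matches the full header name, the look-alike headers drop out without any extra filtering.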
The Received header contains several clauses, usually started with a preposition. For email tracing, we care about the “from” clause and the “by” clause. Ignore the “with” and “for” clauses, or any others. Here are the relevant clauses from the above example:
from mail-lf0-f50.google.com (mail-lf0-f50.google.com [22.214.171.124]) by mx5.messagingengine.com (Postfix)
Each of these clauses contains a host address and possibly other information in parentheses. Here the host addresses are mail-lf0-f50.google.com in the “from” clause and mx5.messagingengine.com in the “by” clause. The host address is usually in domain name format, though the IP address may also be provided, usually in square brackets as shown in the “from” clause.
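The extraction of the "from" and "by" clauses can be sketched with a few regular expressions. This is a rough illustration, not a full parser for the Received syntax: the parse_received helper is hypothetical, it assumes the "from" clause comes first, and it could be fooled by a "by" that appears inside a parenthesized comment. The sample value is the typical header shown earlier.

```python
import re

# Header value copied from the typical Received header shown earlier.
SAMPLE = (
    "from mail-lf0-f50.google.com (mail-lf0-f50.google.com [18.104.22.168]) "
    "(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) "
    "(No client certificate requested) by mx5.messagingengine.com (Postfix) "
    "with ESMTPS for <firstname.lastname@example.org>; "
    "Wed, 25 Apr 2018 17:43:40 -0400 (EDT)"
)

FROM_RE = re.compile(r"^\s*from\s+(\S+)", re.IGNORECASE)
BY_RE = re.compile(r"\bby\s+(\S+)", re.IGNORECASE)
IP_RE = re.compile(r"\[([0-9A-Fa-f.:]+)\]")  # address in square brackets


def parse_received(value):
    """Return the "from" host, its bracketed IP (if any), and the "by" host."""
    from_match = FROM_RE.search(value)
    by_match = BY_RE.search(value)
    # Only look for the bracketed IP in the text before the "by" clause.
    from_part = value[: by_match.start()] if by_match else value
    ip_match = IP_RE.search(from_part)
    return {
        "from": from_match.group(1) if from_match else None,
        "from_ip": ip_match.group(1) if ip_match else None,
        "by": by_match.group(1) if by_match else None,
    }


print(parse_received(SAMPLE))
```

Some Received headers have no "from" clause at all; in that case this sketch returns None for the "from" fields rather than guessing.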
A Sample Trace
Bob sends an email to Alice. Bob’s client carries the name “124567.client.someisp.net” in the mail system. His email software always sends email via the MTA host named “MTA1.net.” Alice always retrieves her email from the host “MTA3.net.” Bob’s email follows this path:
124567.client.someisp.net → MTA1.net → MTA2.net → MTA3.net
Here are the resulting email headers, grossly simplified for this example:
Received from MTA2.net by MTA3.net
Received from MTA1.net by MTA2.net
Received from 124567.client.someisp.net by MTA1.net
From: Bob
To: Alice
When we trace the email’s route from Alice back to Bob, the hosts appear in this order:
MTA3.net, MTA2.net, MTA1.net, 124567.client.someisp.net
Simplifying the Header Mess
A modern email header may contain dozens of lines of anti-spam and authentication header data in addition to the Received headers. Before starting to trace an email, copy the headers into a text editor and remove all headers except the Received headers.
Rearranging each Received Header
The trace order from Alice back to Bob is almost, but not quite, the order in which host names appear in the Received headers. To make the two match, we rearrange each Received header so that the “from” clause appears after the “by” clause.
This might seem like unnecessary clerical work, but it greatly simplifies tracing when there are a dozen or more Received headers.
Here are the rearranged headers:
Received by MTA3.net from MTA2.net
Received by MTA2.net from MTA1.net
Received by MTA1.net from 124567.client.someisp.net
From: Bob
To: Alice
The host names are now in a consistent order. The first host name indicates the final MTA visited; the next host name is the final but one, and so on. If the same host name appears in adjacent Received headers, then the reordering makes them adjacent in the listing.
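The rearrangement can also be done mechanically. The sketch below works on the simplified sample headers from this example and assumes each one has a "from" clause followed by a "by" clause. The rearrange and trace helpers are hypothetical names, and real headers would need more careful parsing.

```python
import re

# Simplified Received header values from the sample trace, newest first.
HEADERS = [
    "from MTA2.net by MTA3.net",
    "from MTA1.net by MTA2.net",
    "from 124567.client.someisp.net by MTA1.net",
]

CLAUSES_RE = re.compile(
    r"^\s*from\s+(\S+).*?\bby\s+(\S+)", re.IGNORECASE | re.DOTALL
)


def rearrange(value):
    """Return (by_host, from_host), or None if either clause is missing."""
    match = CLAUSES_RE.search(value)
    if match is None:
        return None
    return match.group(2), match.group(1)


def trace(values):
    """List hosts from the final MTA back to the sender, merging adjacent repeats."""
    hosts = []
    for value in values:
        pair = rearrange(value)
        if pair is None:
            continue  # skip headers this sketch cannot parse
        for host in pair:
            if not hosts or hosts[-1].lower() != host.lower():
                hosts.append(host)
    return hosts


for value in HEADERS:
    by_host, from_host = rearrange(value)
    print(f"Received by {by_host} from {from_host}")
print(trace(HEADERS))
```

When adjacent headers name the same host, the trace collapses them into one entry. When they do not, both names stay in the list, which marks a hop worth a closer look.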
Tracking Email Geographically
We can also track emails geographically. We find the IP address for each MTA and look it up with an “IP geolocation” service. Many such services exist on the Web, and a search engine will find them. They typically accept an IPv4 or IPv6 address, or a domain name, and return the ISP’s name and the estimated physical location of that host on the network.
Here are some results returned for the genuine host addresses listed above:
- Google ISP at Mountain View, CA
- Google ISP at Ashburn, VA
- Google ISP at Buffalo, NY
- NYI ISP at New York, NY
- NYI ISP at Flushing, NY
The vendors of geolocation data recognize that locations are approximate. One vendor says its data is intended to be accurate to the ZIP code level, more or less. In practice, however, the services often provide a precise latitude and longitude. The data may be precise, but it isn’t accurate. This has led to unfortunate results. When checking the above addresses, one service located the Google address in a lake outside Wichita (lat 37.750999450684, lon -97.821998596191). Services often pick an arbitrary location in the middle of the continent to serve as the “default” address for “on this continent.”