Danger Zones: 4 Things You Need to Know When Testing Emails

Posted by ahpromes

Remember back in January, when we asked you to help us run an experiment with the Marketing Experiments Blog testing the effectiveness of different email subject lines? The results are in, and we have a subject line winner! We’ll talk about the test methodology and the winning submission, but before getting to that, I want to go over some of the common pitfalls and danger zones in email subject line testing (and, really, testing in general). Think of it like this:

(Image licensed from Getty Images)

Boundary #1: Make sure you’re measuring the right thing

Generally speaking, the impact that email subject lines have on the performance of an email campaign is concentrated on open rate; more effective and intriguing subject lines drive more opens. This is because the subject line is the primary thing you see when an email reaches your inbox, and how much of that subject line a reader will or won’t see is heavily influenced by how that individual has set up their browser and reading panes.

Using my own email accounts as a visual example, you can see that the Gmail inbox can be generous; here it’s showing up to 63 characters of the subject line and body text:

My Outlook web interface cuts subject lines off at 52 characters, although this depends heavily on my setup: because my reading pane is set to “right” (rather than “bottom” or “off,” Outlook’s other two options), less of my screen is devoted to email previews, so I can see fewer subject line characters.

My Yahoo! Mail setup is the least generous, cutting subject lines at 49 characters (but let’s be real: it’s unlikely that many of your potential customers are still using Yahoo! Mail).
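To check a candidate subject line against cutoffs like these before sending, a few lines of Python are enough. Treat this as a sketch, not a rule: the cutoffs are simply the ones observed in the three setups described above (the Gmail figure counts subject plus body preview), they change with window size and layout, and the names `OBSERVED_CUTOFFS`, `visible_preview`, and `truncation_report` are hypothetical.

```python
# Character cutoffs observed in the author's own inbox setups (examples only;
# real cutoffs vary by client, window size, and reading-pane layout).
OBSERVED_CUTOFFS = {
    "Gmail": 63,  # subject line plus body preview
    "Outlook web (reading pane right)": 52,
    "Yahoo! Mail": 49,
}


def visible_preview(subject: str, cutoff: int) -> str:
    """Return the part of the subject line a reader would see before truncation."""
    return subject if len(subject) <= cutoff else subject[:cutoff]


def truncation_report(subject: str, cutoffs: dict[str, int] = OBSERVED_CUTOFFS) -> dict[str, bool]:
    """Map each client name to True if the subject line would be cut off there."""
    return {client: len(subject) > limit for client, limit in cutoffs.items()}
```

A 50-character subject line, for instance, fits in the Gmail and Outlook setups above but would be clipped in the Yahoo! Mail one.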

If this is giving you the sneaking suspicion that subject line length also influences subject line effectiveness, you’re right. In our test, subject lines ranged from 38 to 94 characters long. The best performing subject line, in terms of driving the highest open rate? Smack in the middle, at 51 characters.

Does this mean 51 characters is the ideal maximum subject line length? Not necessarily. A subject line that is too short can be a problem as well, because fewer characters means fewer words at your disposal to entice an open and convey meaning. The three best performing subject lines in this test (average of 17.5% opened) averaged 51 characters; the three with the lowest open rates (average of 15.9% opened) averaged 71 characters. The two control group subject lines (average of 16.4% opened), at our shortest length of 38 characters, landed squarely in the middle in terms of open rate.
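A top-three versus bottom-three comparison like this one can be computed directly from per-cell counts. This is a sketch under assumptions: `Cell` and `compare_extremes` are hypothetical names, and the code pools opens and deliveries within each group rather than averaging the rates (with roughly equal cell sizes, as in this test, the two agree closely).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One test or control cell: its subject line and raw counts."""
    subject: str
    delivered: int
    opens: int

    @property
    def open_rate(self) -> float:
        return self.opens / self.delivered


def compare_extremes(cells: list[Cell], k: int = 3) -> dict[str, tuple[float, float]]:
    """Return (average subject length, pooled open rate) for the k best and k worst cells by open rate."""
    ranked = sorted(cells, key=lambda c: c.open_rate, reverse=True)

    def summarize(group: list[Cell]) -> tuple[float, float]:
        avg_len = sum(len(c.subject) for c in group) / len(group)
        pooled = sum(c.opens for c in group) / sum(c.delivered for c in group)
        return avg_len, pooled

    return {"top": summarize(ranked[:k]), "bottom": summarize(ranked[-k:])}
```

Ranking cells and comparing the extremes is descriptive only; whether a gap between the groups is real is a question for a significance test.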

Boundary #2: If email subject lines only influence open rates, why should I track clicks?

An email subject line can also affect an email’s overall click-to-open rate. This, by the way, is a better measure of performance than click-through rate alone: a high click-to-open rate paired with a lower click-through rate means that your body copy is strong, but that you have an opportunity to drive even more traffic by modifying your subject line to improve open rates, increasing the size of the audience exposed to your awesome body copy.
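Because every click comes from an opened email, click-through rate factors exactly into open rate times click-to-open rate:

```latex
\text{CTR} = \frac{\text{clicks}}{\text{delivered}}
           = \frac{\text{opens}}{\text{delivered}} \times \frac{\text{clicks}}{\text{opens}}
           = \text{open rate} \times \text{CTOR}
```

So two emails with the same click-through rate can get there very differently: one by drawing many opens with modest follow-through, the other by converting a smaller group of openers well. Click-to-open rate isolates the second factor.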

A subject line sets up an expectation in the reader’s mind of what is to come; how well the actual content of the email delivers on that expectation leads to either reader satisfaction or disappointment. Strong alignment between subject line and content generally leads to more clicks than a subject line that poorly represents the body of the email.

I can illustrate this with an example of an email test that I ran years ago while working at an online travel company (without all of the specific numbers, which are proprietary), where we tested different subject lines offering varying percent discounts on the purchase of our products. Our test went something like this, but with a dozen or so different test cells sent to millions of customers:

  • Subject Line 1: Get 15% off vacation packages!
  • Body of Email 1: Blah, blah, blah, Get 15% off vacation packages!
  • Subject Line 2: Open to discover your vacation package discount!
  • Body of Email 2: Blah, blah, blah, Get 15% off vacation packages!
  • Subject Line 3: [etc.]

What we learned was that emails with strong subject-body agreement, as in example 1, had better click-to-open rates. Vague subject lines could drive a lot of interest (read: opens), but the body content seemed to disappoint, because click-to-open rates were lower than in our matchy-matchy test cells.

For this VolunteerMatch email test, the body copy of all emails was identical except for one sentence; that one sentence had four different variations that were written to map to the six test (and one control) subject lines.

The subject line with our highest click-to-open rate (6.3%) in this email test, “Volunteering matters: We have the proof.”, also delivered the highest click-through rate (1.08%), even though it placed only second in overall opens (17.3%). This indicates that the body copy of the email delivered on the promise of the subject line pretty well, and that the area of opportunity here would be increasing overall opens (i.e., more people with a chance to click).

The subject line with our highest open rate (18.2%), “The volunteer app your coworkers will talk about,” did not win on either click-through rate (0.98%) or click-to-open rate (5.4%). This tells me two things:

  • The email body copy did not do a strong job of delivering on the expectations set by the subject line, and
  • The more I can refine that body copy to closely match the expectations set by the subject line, the more likely I am to drive total clicks.
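Because click-through rate equals open rate times click-to-open rate, the reported figures for these two subject lines can be cross-checked, up to rounding. A short sketch; the `implied_ctr` helper is hypothetical, and the rates are the rounded values quoted above.

```python
def implied_ctr(open_rate: float, click_to_open_rate: float) -> float:
    """Click-through rate implied by open rate x click-to-open rate (all as fractions)."""
    return open_rate * click_to_open_rate


# Rounded rates as reported in this post for the two subject lines discussed above.
reported = {
    "Volunteering matters: We have the proof.": {"open": 0.173, "ctor": 0.063, "ctr": 0.0108},
    "The volunteer app your coworkers will talk about": {"open": 0.182, "ctor": 0.054, "ctr": 0.0098},
}

for subject, r in reported.items():
    implied = implied_ctr(r["open"], r["ctor"])
    print(f"{subject!r}: implied CTR {implied:.2%}, reported CTR {r['ctr']:.2%}")
```

Both implied values land within rounding of the reported click-through rates, which makes the trade-off concrete: the first subject line earned its clicks through stronger follow-through from openers, the second through more opens.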

Boundary #3: Are you measuring or categorizing tangible things?

I call this the “specious lens” test. When you’re looking at test results, be wary about what you use to classify or categorize your results. The subject line character length category is a tangible thing, perceivable by both testers and email recipients. Let’s look at some other subject line classifications for this email test to see if anything else has a real impact on open rates:

  • Use of special characters (e.g., punctuation marks)
  • Use of title case vs. sentence case

Both the use of special characters and the use of case are tangible to customers. But from the chart above, you can see that there really isn’t any correlation between either of these classifications and higher (or lower) open rates. The best performing subject line and one of the test’s bottom three both excluded any kind of punctuation. The same goes for case: both the best and worst performing subject lines used sentence case. In this example, neither classification appears to have any real, measurable impact on customer email open rates.
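Tagging subject lines only with features a recipient can actually see keeps this kind of analysis grounded. The sketch below derives length, punctuation, and case for a subject line; the function names are hypothetical, and the title-case check is a rough heuristic that ignores words shorter than four letters, since title case usually leaves short words like “the” lowercase. Grouping open rates by these tags is then a simple aggregation.

```python
import string

# ASCII punctuation plus typographic quotes and apostrophes common in subject lines.
PUNCTUATION = set(string.punctuation) | set("“”‘’")


def has_special_characters(subject: str) -> bool:
    """True if the subject line contains any punctuation mark."""
    return any(ch in PUNCTUATION for ch in subject)


def is_title_case(subject: str, min_word_len: int = 4) -> bool:
    """Heuristic: title case if every word of at least min_word_len letters starts uppercase."""
    words = [w.strip("".join(PUNCTUATION)) for w in subject.split()]
    long_words = [w for w in words if len(w) >= min_word_len]
    return bool(long_words) and all(w[0].isupper() for w in long_words)


def tangible_features(subject: str) -> dict[str, object]:
    """Features a recipient can perceive in the inbox."""
    return {
        "length": len(subject),
        "special_characters": has_special_characters(subject),
        "case": "title" if is_title_case(subject) else "sentence",
    }
```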

If you are applying value categorizations to your test results, however, you need to be especially wary when drawing conclusions, because the value categories you create are less likely to be tangibly perceptible to your customers. If I group the tested subject lines by the value or sentiment they primarily convey, I get the following four buckets:

  • Focuses on Caring as a sentiment
  • Focuses on Mobile App
  • Focuses on Quantifiable results
  • Focuses on Values (Good/Bad)

If you are classifying your test results based on your own or your team’s value judgments, as I did here, and you can’t see any performance difference between your classifications, as is true here, ask yourself: “Are these classifications tangible to the customer? Do they fail to have a real impact on outcomes, or are they simply not real?”

In this case, my answer is, “It’s not that value or sentiment don’t have an impact on outcomes; it’s that these sentiment classifications are likely not perceptible to the customer and thus aren’t a valid way to categorize this test.” It’s also risky to classify results after you already know the test outcomes; this can lead you to fit a hypothesis to the test results rather than letting the results prove or disprove your hypothesis.

Boundary #4: Statistics is your friend (i.e., math is important)

The last boundary to be aware of is statistics. Run all of your results data through some kind of statistical tool to make sure that the differences you’re seeing between test segments are more than just random background noise. Many factors go into determining statistical significance, such as overall sample sizes, overall “action” rates, the difference between action rates, and how confident you’d like to be in your results (e.g., it’s generally easier to detect the one-point difference between 1.1% and 0.1% than the same one-point difference between 51% and 50%).
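One way to see how sample size, baseline rate, and the size of a difference interact is the standard normal-approximation formula for the per-group sample size of a two-sided two-proportion test. A sketch, assuming 95% confidence and 80% power by default; `sample_size_per_cell` is a hypothetical name, and dedicated calculators may use slightly different approximations.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_cell(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate recipients needed in EACH cell to detect p1 vs p2 with a
    two-sided two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)
```

For example, detecting a one-point gap between 0.1% and 1.1% needs fewer than 1,000 recipients per cell under these assumptions, while the same one-point gap between 50% and 51% needs roughly 39,000 per cell.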

For this test, I’ve mentioned several times that two control emails were used. Both went to approximately the same number of people (36,000) and had identical subject lines and identical body copy. These two segments had similar, but not identical, open rates of 16.4% and 16.5%. To make sure that the overall results are valid and that there was no unintentional selection skew when creating (what should be random) segments, it’s imperative to confirm that the variation between these two control segments is consistent with random noise.

In the chart below, you can see that these slight variations in open rate between the two test cells are not statistically significant; a very important signal that the total data set from the test is valid, too.
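For readers who want to run this kind of A/A check themselves, a two-proportion z-test is one common choice (not necessarily what the worksheet used here does). This sketch reconstructs approximate open counts from the rounded rates in the post, since the exact counts weren’t published; `two_proportion_z_test` is a hypothetical helper built on Python’s standard library.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for the difference between two proportions, using a
    pooled standard error. Returns (z statistic, p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Approximate control-cell counts rebuilt from the rounded figures in the post
# (about 36,000 delivered each, 16.4% vs 16.5% opened).
n = 36_000
z, p = two_proportion_z_test(round(0.164 * n), n, round(0.165 * n), n)
print(f"z = {z:.2f}, p = {p:.2f}")  # p is far above 0.05: the gap looks like noise
```

A small p-value here (say, below 0.05) would have been a warning sign that the segments were not randomly split.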

If you don’t have your own stats or analytical resources to help you with this last step, there are a lot of great tools and worksheets online to get you started, including the one that I’ve used here, from

And now to the contest results!

The methodology

First things first, let’s go over what was actually tested:

  • 6 subject line “test cells” that each received a different email subject line
  • 2 subject line “control cells” that received the same email subject line
  • Just under 36,000 emails delivered to each test and control cell
  • 287,117 emails delivered, overall
  • Email body copy differed by one sentence in each test cell; otherwise was identical

Metrics recorded included:

  • Emails delivered
  • Email opens
  • Email clicks

These three metrics were then used to calculate:

  • Open rate (opens / delivered)
  • Click-through rate (clicks / delivered)
  • Click-to-open rate (clicks / opens)
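The three calculated metrics are plain ratios of the recorded counts. A minimal sketch; the class name `EmailCellMetrics` is invented for illustration, and the counts in the usage lines are hypothetical, not from this test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailCellMetrics:
    """Raw counts recorded for one test or control cell."""
    delivered: int
    opens: int
    clicks: int

    @property
    def open_rate(self) -> float:
        return self.opens / self.delivered

    @property
    def click_through_rate(self) -> float:
        return self.clicks / self.delivered

    @property
    def click_to_open_rate(self) -> float:
        # Guard against a cell with zero opens.
        return self.clicks / self.opens if self.opens else 0.0


cell = EmailCellMetrics(delivered=1000, opens=200, clicks=10)  # hypothetical counts
print(f"{cell.open_rate:.1%} {cell.click_through_rate:.1%} {cell.click_to_open_rate:.1%}")
```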

The actual subject lines that were used in the test, along with all of the corresponding metrics:

The subject line Spread the Only “Good” Office Virus was used for the two control cells. (Why use two control cells? The Marketing Experiments Blog wrote up its takeaways from the experiment a few weeks ago, and you can read the details and rationale there.)

The winning, reader-submitted subject line (that drove the highest rate of clicks) was submitted by Moz Blog reader Jeff Purdon, an In-House Web Marketing Specialist for a manufacturing company. Jeff wins a ticket to the MarketingSherpa Email Summit 2015 and a stay at the ARIA Resort in Las Vegas. Congratulations, Jeff!


Moz Blog

February 19, 2015  Posted in: SEO / Traffic / Marketing
