Tell me, do you read long articles? Let’s learn about content testing
Do you watch long YouTube videos? Do you listen to podcast episodes over 30 or even 60 minutes?
People often ask whether anyone reads my articles at all, because they are so long, or whether I'm wasting a lot of time working on them.
Replace the word “long” with comprehensive, profound, or relevant. Not because these are synonyms, but because we’re talking about quality instead of just quantity. And then the question sounds completely different, doesn’t it? I also like to differentiate between having read and understood. The latter is even more difficult to achieve.
How long a blog article or text should be, or how short a video has to be so that people consume this content, is not the decisive question, in my opinion. Much more important is how much "scope" is necessary to achieve the goal – please think of your "business goal" and that of your target group – and how you manage to design your content in such a way that people want to consume it.
How much “scope”?
The answer to this is so individual that it cannot be given across the board. Even if many supposed experts like to do just that, it doesn't work. Of course, there are recommendations from platforms and best-practice experiences from competitors, but these are anything but universal. It is better to find out what works and what doesn't: through systematic testing based on specific hypotheses.
And not just with a view to the scope, but also the effectiveness of your content in every respect.
I am writing this article about “Content Testing” because I am convinced that many questions in content marketing can be answered much better through targeted experiments – because they are more individual and data-based. We just have to get into the habit of actively asking these questions and always questioning the answers.
In this article, you will find a total of 11 examples, including the following questions:
- Does the visual design of texts impact the achievement of goals, for example, the newsletter subscription rate?
- Is the length of a text decisive?
- How can I design long texts to motivate users to read them in full?
- Which headline is the best?
- Which cover picture works better?
Experiments in content marketing, why not?
Experimentation is a great way to measure the effectiveness of your content. The findings will help you with the production, distribution, and, above all, the continuous optimization of your content portfolio. A/B tests, in particular, are important to find out which content works, which doesn't, and, above all, why – that is, how different elements such as headings, formatting, visual design, or content types affect traffic, user behavior, and the conversion rate.
Testing is already commonplace in e-commerce, especially in usability and other user experience (UX) tests. Amazon, Shopify, Booking, Netflix – they all benefit from testing. But content (in marketing) can also be tested excellently, for example, to validate assumptions about your target group, to formulate texts that resonate positively (brand promises, unique selling points, purchase arguments, etc.), or to determine the headlines with the highest click rates for blog articles and ad copy.
“When it comes to testing, people love concentrating on UX, design, layout, and call-to-actions, yet they often ignore copy. In my opinion, you should concentrate on nailing your value proposition; everything else is just window dressing.”
Sandra Wu, Director of Growth, Himalaya
The strengths of content testing
In two cases, in particular, experiments help us develop effective solutions: first, when we do not know what to do, for example, because we lack information; and second, when we cannot estimate what effect an intended change will have on a goal. Or, to put it briefly: through testing, we can answer the what, the how, and, above all, the why.
Whether blog articles, email marketing, social media, copywriting, visual design, landing pages, or search engine optimization – quantitative and/or qualitative tests can be integrated practically anywhere.
We can test many variables:
- Title and meta title
- The description of structured data
- The article length
- The introductory paragraph
- The integration of image and video content
- The primary distribution channel
According to Edward Wood, Chief Marketing Officer at CareerFoundry and previously Head of Content Marketing at Babbel, the key factor is the primary distribution channel. For example, if that's organic search, we should probably test the meta title, the inclusion of rich media, and the addition of FAQ schema. If, on the other hand, it is paid distribution via content discovery networks like Outbrain and Taboola, we are better off focusing on iterating titles/headings, article length, and the template.
Content testing is particularly exciting on "value pages," i.e., those pages through which you drive your business – be it traffic generation via blog articles, product communication via product pages, or acquisition via contact pages. Just take a look at your web analytics tool to see which pages are accessed particularly often or have a particularly high conversion rate.
At Babbel, we divided blogs into three broad categories: Performance, Organic, and Social. While many companies and agencies will probably concentrate on the latter two, the performance blogs brought in the leads, the traffic, and the sales. They were the focus of our optimization efforts. At CareerFoundry, organic – and particularly organic search – content is integral to the company's success. Therefore, we spend more time testing there than in content discovery.
Edward Wood, Chief Marketing Officer, CareerFoundry
The different types of testing
Many of the following examples are based on A/B/n tests in which different variants are compared across randomized target-group segments. This happens either dynamically on the same page with the help of appropriate testing tools, or through the targeted distribution of traffic to different pages – in the latter case, we speak of split testing.
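How visitors end up in one variant or the other is something the testing tools handle for you. Purely for illustration, here is a minimal sketch of deterministic, hash-based bucketing in Python – a common approach that ensures the same visitor always sees the same variant. All names and IDs are hypothetical.

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "exp_headline",
                   variants=("control", "variant_b")) -> str:
    """Deterministically bucket a visitor into a test variant.

    Hashing the experiment name together with the user ID gives a stable,
    evenly distributed assignment without storing any state."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(assign_variant("reader-1001"))  # same user, same variant, every time
```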
But there are also other types of testing. For this article, for example, I did what is known as "preliminary testing." This form of testing is typical for TV advertising, which is shown to a small audience before it is broadcast, to avoid blunders and to ensure that the advertising message gets across.
What exactly did I do?
I sent this article to a few volunteer "beta readers" in advance (which is not ideal in terms of sample selection but completely sufficient in my case) and asked them for feedback. I was particularly interested in the following:
- Which terms were unclear; I then explained them.
- Which statements/passages/images caused aha moments; I highlighted these visually and used them as teasers for the promotion.
- Whether any questions on the topic remained open at the end (which led to my decision to write a second article about content testing).
- How motivating the article is, on the one hand, to read in its entirety and, on the other hand, to start testing yourself; based on this, I adjusted the structure of the article slightly.
More about the different types of (content) testing follows in another article.
Before we look at specific examples, I would like to briefly outline the proven process for content testing.
Content testing: step by step to the first experiment
Once you have identified promising pages or perhaps entire page types (for example, “blog articles,” “pillar pages,” or “service pages”), you can start.
- Specify your problem: Which observable, i.e., measurable, symptoms suggest potential for optimization? This could be, for example, an increasing bounce rate, as in the screenshot below, or a low or falling newsletter subscription rate.
We should take a closer look at above-average and, above all, increasing bounce rates (screenshot: Google Analytics)
- Identify test fields through user research, such as jobs-to-be-done interviews, copy testing or usability tests. Such qualitative research methods are important because only they provide information about why, for example, the bounce rate is increasing.
While analytics provides metrics on how a site or its elements are performing, it doesn't give you any input on what resonates with your readers or what creates friction. Based on this open-ended feedback, we can work out a hypothesis for A/B testing. You can't customize copy without qualitative user research.
Viljo Vabrit, Founder & Partner at Speero
- Formulate a hypothesis based on your findings: Which changes to your content (textual/linguistic, visual, structural, etc.) will positively influence the target metrics? What causes this change in your users' behavior; which cognitive bias or psychological behavior pattern underlies it?
In the long-form test for The Motley Fool (more on this below), the assumption was that the sheer length of the content might even trigger fear and uncertainty, which is why users bounced. The hypothesis was then:
- If we shorten the article without changing the story, we can increase the likelihood that users will read to the end and convert.
- Since that assumption didn't prove true, they turned the experiment around 180 degrees, so to speak, and doubled the content and page length to increase engagement. The obvious next step was to test images, quotes, headlines, etc.: adding more images to make the page longer and more engaging should increase engagement and drive more conversions.
- A simple scheme for good hypotheses is "If …, then …, because …" – the last part, in particular, is important for generating insights, and thus impulses for further tests, from every test, regardless of the result.
A good hypothesis also describes the reason for the expected result.
- Create the concept for your variant(s), carry out the test, and evaluate it. I am simply summarizing these three steps here because a detailed explanation would go beyond the scope of this article. What we need is a high-contrast variant in which we ideally test only one specific change, so that we can prove not just a correlation but, at best, causality. This refers primarily to "principles" (behavior patterns) rather than individual text or visual elements. When testing The Motley Fool, it was less about the length itself than about the emotions it generated.
- I also recommend these basics about A/B testing statistics and these tips by Dr. Julia Engelmann, Head of Data Analytics at konversionsKRAFT.
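To give you a feel for those statistics: the significance of a conversion-rate test is typically checked with a two-proportion z-test. Here is a minimal sketch in Python – the visitor and conversion numbers are invented, not taken from any of the experiments cited in this article.

```python
import math

def ab_test_result(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for an A/B test on conversion rates.

    Returns the relative uplift of B over A and the two-sided p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return (p_b - p_a) / p_a, p_value

# Hypothetical: 10,000 visitors per variant, 500 vs. 583 conversions.
uplift, p = ab_test_result(500, 10_000, 583, 10_000)
print(f"Uplift: {uplift:.1%}, p-value: {p:.3f}")  # ~16.6% uplift, p < 0.01
```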
An often neglected step
Undo the changes (optional). If you want to know for sure, you should validate – by restoring the control variant after the test – whether the change in the target metric was actually caused by the changes to the content/design. If so, the numbers should return to their pre-test level. That would validate the hypothesis, and you can roll out the new variant again without hesitation.
When experimenting, always be open to unexpected results; often, the reasons for a change are not what you anticipated!
We can solve an issue in different ways. It is therefore important that you test individual solutions one after the other. And it is best to prioritize your hypotheses in advance based on the expected business impact (e.g., the percentage increase in the conversion rate) and the associated implementation effort. This ensures that you optimize the most important aspects and benefit from content testing right from the start.
Important: prioritize your ideas!
I don't want to open a can of worms here because this task is not as trivial as it sounds, and I would rather dedicate a separate article to the topic in the future. But let me at least create awareness that the potential of individual ideas varies and that we should invest our resources very deliberately in selected optimization projects. You may be familiar with frameworks such as ICE, PIE, or the MoSCoW method, but I also find the "PXL framework" from Speero exciting. Viljo Vabrit explains it this way:
This structure helps us measure signal strength based on how many data points we have to support a hypothesis, combined with an "ease of implementation" metric. With objectivity at its core, this model requires data to impact the scoring. It also breaks down what it means to be 'easily implemented.'
I find it exciting because it can be adapted for different content use cases, such as SEO or lead generation. You can find details and a template on the Speero blog.
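To make prioritization tangible, here is a minimal sketch of ICE-style scoring (one of the frameworks mentioned above). The ideas and scores are invented; the point is simply that a multiplicative score forces you to rank your hypotheses before you test.

```python
# Hypothetical test ideas, scored on Impact, Confidence, and Ease (1-10 each).
ideas = [
    {"idea": "Shorten the introductory paragraph", "impact": 6, "confidence": 7, "ease": 9},
    {"idea": "Add a table of contents",            "impact": 7, "confidence": 5, "ease": 8},
    {"idea": "Rewrite the value proposition",      "impact": 9, "confidence": 6, "ease": 4},
]

for i in ideas:
    i["score"] = i["impact"] * i["confidence"] * i["ease"]  # ICE score

# Highest score first = first hypothesis to test.
for i in sorted(ideas, key=lambda i: i["score"], reverse=True):
    print(f'{i["score"]:>4}  {i["idea"]}')
```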
Now let's look at a few examples and empirical values. They will surely inspire your own experiments.
Examples and inspiration for goal-oriented and data-driven content strategy
In our book Content Design, Ben Harmanus and I describe many ways to visually design texts through headings, paragraphs, lists, line spacing, etc. These tried-and-tested design elements make a text appear more harmonious – but whether they also make it more effective is something you can only answer objectively through appropriate tests!
Content Design (2nd edition)
Robert Weller, Ben Harmanus
With this book, you will learn to consider the conception and audiovisual design of content holistically and implement it in a targeted manner.
Some of my favorite examples of content testing come from Blinkist. The company tests an incredible amount, mainly to generate leads through its magazine, which forms the foundation for user acquisition. Every additional percentage point in the conversion rate means real money. The same applies to The Motley Fool, which markets subscription products and premium reports through its magazine.
You can find a summary of the most important learnings at the end of this article; feel free to jump right there if that's what you're looking for.
Does the visual design of texts impact the achievement of goals, for example, the newsletter subscription rate?
Yes, the look can have a positive as well as a negative influence on the conversion rate. With an increase – we call this an uplift – of 16.5%, the test at Blinkist, in which they primarily experimented with subheadings, was extremely positive (see screenshot below). The Motley Fool was also able to measure a significant uplift in experiments (see point 4).
Hardly anyone reads everything, so focal points and a meaningful information hierarchy (when in doubt, the most important things first) are important (source: Sandra Wu, Blinkist)
Is it worth revising texts for the sake of good language?
Not in 75% of the cases tested (!) by Blinkist. As long as the quality of the writing is acceptable, there is often no reason to embellish the language because it does not create any additional value for the user.
Say what you have to say in a way that everyone will understand. A good text doesn't necessarily need more (screenshot: Sandra Wu, Blinkist)
Is the length of a text decisive?
This is the question every content marketer asks sooner or later; Blinkist and The Motley Fool are no exceptions. We argued for short texts in the previous example: hardly anyone reads everything; most scan extensive texts in search of relevant key terms. But the counter-argument is just as simple: What if users want more information or simply need more information to understand a product?
The answer is, in a certain way, just as simple, and it can even be objectified through appropriate tests:
As long as more text means more value for the user (ergo, is relevant), the length does not matter.
In over two dozen experiments, Blinkist has expanded the scope of texts with mostly positive effects on the conversion rate (screenshot: Sandra Wu, Blinkist)
In the test at The Motley Fool, long-form content won over the short version (screenshot: WiderFunnel)
How can I design long texts to motivate users to read them completely?
As the tests from the previous example show, relevance is crucial for users' basic interest. For James Flory, Director of Experimentation Strategy, and his colleagues at WiderFunnel, this first test was the starting point for further experiments: with graphics within the articles, visually high-contrast quotes, infoboxes, and other content elements that add value for users while simultaneously loosening up the text and making the "content experience" more varied. In total, they have generated a sales uplift of 26% to date.
Iteratively, long-form content can become more and more effective – and with the help of targeted experiments, we also learn why (screenshot: WiderFunnel)
Which headline is the best?
The question is legitimate, but at the same time, we should also ask: When is the title the only thing that has to motivate a person to click – for example, in a search or on an overview page – and when can you provide more context with a supplementary teaser text?
It is no longer a secret that the New York Times, for example, tests the titles of its articles. However, it is exciting to see how many articles it tests, how many variants it tries, and how successful it is with them. Tom Cleveland, a software engineer at Stripe, analyzes exactly that with his NYT tracker. Unfortunately, this analysis already reveals a pattern: headlines that become more and more dramatic over time.
Development of an NYT headline over time.
Overall, the data shows that tested articles are 80% more likely to land on the most-popular list. In addition, the number of tests correlates with the engagement rate (e.g., reactions in the form of comments or social media shares). Nevertheless, the proportion of tested headlines is quite low at 29%, and most tests (79%) only include two alternatives. Cleveland suspects one reason for this rather rudimentary testing is that the NYT earns less than a third of its revenue from advertising (and the trend is downward). A front page full of clickbait headlines would likely deter potential subscribers, with whom the company generates almost two-thirds of its revenue.
Wait, do headline tests even work?
As Alex Birkett (former Growth Manager at HubSpot) rightly notes in his article on cxl.com, a headline usually has a very limited lifespan – especially in media that publish at high frequency, such as the NYT. An A/B comparison that, for example, needs to run for four weeks to reach a reasonable confidence level is not effective. So, strictly speaking, the NYT example is not an A/B test. Rather, it is about finding out as quickly as possible which title generates the strongest response. In doing so, we risk succumbing to so-called "confirmation bias" because, on the one hand, we cannot control external variables (especially segmented target groups) and, on the other hand, we consider more than one metric (clicks, shares, etc.).
Confirmation bias describes the tendency to select and interpret information in a way that confirms one's own expectations.
Testing headlines using advertisements is the better alternative, but it does involve additional costs and time.
The gold standard is certainly so-called "multi-armed bandit tests," but I'll point you to this article so as not to go beyond the scope here.
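The core idea of a bandit: instead of splitting traffic evenly for the whole test period, you shift impressions toward the current front-runner while you are still learning. Purely for illustration, a minimal epsilon-greedy sketch in Python – the headlines and click rates are invented, and real implementations are considerably more sophisticated.

```python
import random

def epsilon_greedy(headlines, get_click, rounds=10_000, epsilon=0.1):
    """Serve the best-performing headline most of the time,
    but explore a random one in 10% of impressions."""
    shows = [0] * len(headlines)
    clicks = [0] * len(headlines)
    for _ in range(rounds):
        if random.random() < epsilon:
            i = random.randrange(len(headlines))          # explore
        else:
            rates = [c / s if s else 0.0 for c, s in zip(clicks, shows)]
            i = rates.index(max(rates))                   # exploit
        shows[i] += 1
        clicks[i] += get_click(headlines[i])
    return shows, clicks

# Simulated readers with hypothetical "true" click rates per headline.
true_ctr = {"Headline A": 0.030, "Headline B": 0.045, "Headline C": 0.025}
shows, clicks = epsilon_greedy(list(true_ctr), lambda h: random.random() < true_ctr[h])
for h, s, c in zip(true_ctr, shows, clicks):
    print(f"{h}: {s} impressions, observed CTR {c / max(s, 1):.3f}")
```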
The bottom line is the following:
Headline tests are possible, but the time window is extremely small, and it requires an enormous amount of traffic to the individual (!) articles/pages to achieve meaningful results.
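"Enormous" can be quantified with the standard sample-size formula for comparing two proportions. A short sketch in Python – the 3% click rate and the 10% uplift to detect are hypothetical values, chosen only to illustrate the order of magnitude.

```python
from math import ceil
from statistics import NormalDist

def visitors_per_variant(base_rate, rel_mde, alpha=0.05, power=0.80):
    """Approximate impressions needed per variant to detect a relative
    change of rel_mde in a click rate (two-sided z-test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = base_rate * rel_mde  # absolute detectable difference
    n = ((z_alpha + z_beta) ** 2 * 2 * base_rate * (1 - base_rate)) / delta**2
    return ceil(n)

# Hypothetical: 3% CTR, detect a 10% relative uplift at 80% power.
print(visitors_per_variant(0.03, 0.10))  # ≈ 50,000+ impressions per headline
```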
How aggressively can I advertise products within my articles? Will advertising be tolerated at all?
Many companies want to boost their sales through content – in the long term and perhaps more indirectly, but often also directly and in the short term. Depending on what goal you are pursuing with a blog article, for example, your "advertising" will look different: If your primary aim is branding, then your focus is more on a sympathetic story. If, on the other hand, you want to generate leads or convert visitors in another way, then you need effective calls-to-action.
Blinkist has also looked into how aggressively – or, let's better call it, how self-confidently – they can advertise their product, and when to first mention it within an article. Their conclusion:
Don't be afraid to talk about your product. Be proud of it!
Sandra Wu
CXL
Another example of this type of "copy testing" is CXL, which last year optimized the pitch on its homepage based on qualitative user feedback. What previously came across as elitist and arrogant (on the left in the picture below) was reworked into helpful information about the required work effort (center). And a look at today's variant (far right in the picture below) shows that it now relies fully on social proof in the form of a testimonial.
Testing pitches, introductory texts, and offer descriptions can reveal valuable information. (Source: cxl.com)
How can I encourage more interaction with my content?
Is there anyone who doesn't ask this question regularly? But how do we define "interaction"? For many, scrolling is an interaction, while others understand it to mean commenting on a post or sharing on social media. The latter was the case for a client of WiderFunnel, who achieved positive results in a test focusing on the offer and the design (including positioning) of share buttons:
Preformulated tweets, sticky social sharing icons, and mouse-over effects – if you want to generate clicks, you should try out what triggers users the most (source: WiderFunnel)
What is the best email subject line?
Besides the preview text, the subject line is the first and often the only thing users see of an email. It is, therefore, the most important basis for deciding whether or not to open it. Does a personal address via name tokens make sense? How do emojis work? Are questions effective? How long should/may a subject line be?
In his role as content editor at Wishpond at the time, James Scherer asked himself exactly these questions with his team and compiled their experiences in an article: The Highest-Impact Email A/B Tests We've Ever Run. The results are sometimes surprising, and I find the following test on capitalization particularly exciting! The hypothesis: If we remove capital letters from the subject lines, the email will look more personal and seem as if a real person wrote it and clicked "send" a little too quickly.
A simple test, but in Wishpond's case one with an extremely positive effect. Emails were originally something very personal. What if we imitate that quality? (source: Wishpond)
Newsletter tools have long offered such testing features. And I would always test the subject lines of newsletters to understand what influences the open rate in the long term.
Which cover picture works better?
Better in what way? Attract attention? Trigger emotions? Pictures, too, should always have a function, a specific goal.
As you may have noticed, I no longer use cover pictures in my articles. Images for link previews in social media posts? Yes – but no longer as the "standard" at the beginning of an article. Why? Because they simply didn't add any concrete value at that point and kept users away from the actual content (recall the long-form content tests above).
In this case, no cover picture means that readers get to the content faster – or see more content at first glance – and can benefit from my articles more quickly.
One possible metric in this case would be "time to value." I didn't test it (unfortunately). But since I haven't seen a downturn in the numbers since then, I rate this change as a success. *ducks and runs*
How quickly can users derive added value from our (text) content?
(When) do users need a summary, a table of contents, and/or jump labels?
The idea behind them: Do such elements help users navigate the content, understand it more easily, and get to the information they are looking for more quickly? Such content elements can increase the usability of your articles and ensure that users find the information they are looking for more easily and quickly. WiderFunnel also went in a similar direction with their tests for The Motley Fool. And with the publication of this article, a corresponding test is also running here.
How can I improve the click rate on search results pages?
Crazy that we're not just talking about rankings, right? Often, "snippet optimization" (keywords: title tag, meta description, and schema markup) is a real quick win to increase the value of existing rankings. Because what we ultimately want – besides pure visibility in search engines – are clicks.
The SearchPilot team (an "SEO A/B testing software") experimented with this in a local context and integrated the name of a brand – admittedly a very well-known one, but anonymized in the example – into the page title, with a positive effect on the click rate: more precisely, a 15% uplift.
If your brand is strong, putting it earlier in the title tag where users can see it could influence more of them to click through to the page. We believe this test performed as well as it did because displaying a powerful brand in the search results made these pages more noticeable to users, resulting in improved click-through rates.
Rida Abidi, Web Analyst at SearchPilot
Edward and his team also test the CTR in the search results, looking at the effect of emojis in meta titles, the FAQ dropdown schema, the use of "power words," the alignment of terms with insights from the Search Console, and the use of brackets – all aspects from a HubSpot study on data-driven headline writing.
The SearchPilot example also makes clear how important a good hypothesis is – and how difficult it can be to explain the results of a test without carrying out additional tests to validate the findings. If we want to be sure, we must never stop testing content.
You can already see that we could add countless questions to this list. That's what I meant at the beginning: as content marketers, we have so many questions, but we rarely bother to find real (objective) answers. The potential is enormous! Just imagine if you could cumulate the uplifts from all these examples.
Findings from content testing
- Structure your texts so that users can scan them and understand them easily. As long as the content adds value, its length does not matter.
- Don't be afraid to promote your offering. Especially in corporate blogs, every reader expects it. The decisive factor is how you package your "pitch": native, that is, "editorial-looking," or visually contrasting, "in your face" – both can work.
- Build new tests on the insights of previous tests. Regardless of the result of a "long-form vs. short-form content" test, it can be the starting point for further tests in the corresponding direction.
- Use the findings from your tests to define quality criteria for future content and to refine your content design, for example with a view to article templates. This enables you to brief (external) creatives and authors better and achieve better results over time.
My thanks for the many examples and insights go in particular to James Flory, Director of Experimentation Strategy at WiderFunnel, who shared his experience from a good five years of experimenting for The Motley Fool, and to Sandra Wu, formerly Paid Content Marketing Lead at Blinkist and today Director of Growth at Himalaya. She was responsible for many of the experiments described from Blinkist Magazine and presented them at CopyCon 2020.
By the way: at the growth marketing SUMMIT, Sandra will share further insights from the experiments through which she established content marketing as an acquisition channel for apps like Blinkist and 8fit.
An outlook: will we see even more content testing in the future?
I think the most important learning I've found in content testing is that UX/UI almost always comes second to content. Best practices, stylistic design, and modern layouts won't trump content testing. And in many cases, they can detract from the efficacy of good content.
James Flory, Director Experimentation Strategy, WiderFunnel
I agree with James Flory: design doesn't work without content. But without design, content doesn't work either. Unfortunately, there are no blueprints for this; rather, design is an iterative, experimental process. There is nothing wrong with orienting yourself toward design standards and following conventions, but, as the name suggests, the user experience is decided by the users. And since they differ from case to case, effective content has to be designed individually.
So test your content and your design whenever you can.