Zappi Amplify Digital
Solution Summary
Zappi Amplify Digital, part of the Zappi Amplify Ad System, predicts how well your ads will deliver ROI, via both short-term sales and long-term brand equity, while providing deep diagnostics to understand the ‘why’ and how to optimize. Zappi has taken the very best of existing legacy approaches and worked with leading global brands to develop a game-changing insight tool for developing advertising.
Underpinned by the Zappi platform, the insights can be used time and time again to learn what works and doesn’t for your brand. Insights are framed and communicated in a way that ensures they are easily understood by stakeholders at all levels of the business.
Solution Basics
Consumer engagement is captured & diagnosed across a range of approaches, including System 1 measurement, direct questioning and behavioral measurement, and insights are delivered in an intuitive and interactive platform that allows you to get a quick view of success & potential or dive deeper on the fly.
In-Context Availability: Amplify Digital is currently available with in-context environments for Facebook feed, YouTube video, TikTok and Instagram Feed.
Stimuli: 1-15 video ads per test
Evaluation: Monadic
Sample default: YouTube and Facebook: 400 category consumers (with the option to set the sample size anywhere between 200 and 800 respondents). TikTok and Instagram Feed: 200 category consumers (with the option to set the sample size anywhere between 200 and 400 respondents). We recommend the lower sample size for TikTok and Instagram Feed because we recruit a younger audience in line with usage of those platforms, which narrows the available audience further; increasing the sample size may cause issues with completing the project.
Norms: Yes; market-wide norms are activated when the total number of stimuli researched reaches N=20, and your own customer norm per market activates when you've researched 20 stimuli in that market. Norms are not shared across all the platforms: you will have Facebook norms for Facebook ads and YouTube norms for YouTube ads, while TikTok and Instagram Feed, with their matching younger audiences, share one set of norms.
Getting Started with Amplify Digital
- Understanding key measures
- Research process
- Configuration checklist & guide
- Best practice recommendations
- The in-context environment
- Audience & Sample
- In-context demo video
Research Analysis | Understanding & Interpreting Results
- Understanding the metrics that matter
- Norms and interpretation
- Interpreting data and taking decisions
- Interpreting cultural sensitivity
- Using AI Quick Reports (Zappi’s automated, AI-generated reports)
Key Measures
Note: this includes both current and legacy solutions, starting with the legacy version (Amplify 1.0). To jump to the most up-to-date version, click here.
Amplify 1.0: the 5 Rs framework
The Zappi Amplify framework comprises measures grouped into five key areas of ad effectiveness:
- Reach: cut through the clutter, link to the brand, and build distinct memory structures
- Resonance: engage and trigger an emotional response, and communicate key messages and category drivers
- Response: increase brand purchase and appeal
- Risk: avoid insensitivity and damaging virality
- Return: deliver sales uplift and build brand equity
The reporting output focuses on a blend of behavioral and survey data, “Metrics that matter”, which ladder into two important KPIs:
- Creative Sales Impact (CSI): how likely is this ad to deliver ROI via short-term sales uplift?
- Creative Brand Impact (CBI): how likely is this ad to deliver ROI via long-term brand equity building?
Amplify 2.0: the 3 Rs framework
The Zappi Amplify framework comprises measures grouped into three key areas of ad effectiveness:
- Reach: cut through the clutter, link to the brand, and build distinct memory structures
- Resonance: engage and trigger an emotional response, and communicate key messages and category drivers (includes metrics from the previous Risk chapter)
- Response: increase brand purchase and appeal, and strengthen the brand’s association with category entry points
The Creative effectiveness summary chapter focuses on a blend of behavioral and survey data, which ladders into two comprehensive indicators of likely in-market success:
- Sales Impact Score: measures the potential of the creative to drive short-term sales.
- Brand Impact Score: measures the potential of the creative to build the brand and drive sales into the future.
How are Sales and Brand Impact scores calculated?
Sales and Brand Impact are composite scores that measure the potential of the creative to drive short-term sales (Sales Impact) and longer-term brand equity (Brand Impact). They are available as absolute scores, which can be sig tested against the norm (using cross tabs), and are displayed in the platform as percentile scores calculated using the absolute score and the norms scope you have selected in the platform.
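To make the shape of the calculation concrete, here is a minimal Python sketch. The measure names and weights are hypothetical, not Zappi's actual (proprietary) components; only the general pattern is taken from the text above: a weighted blend of measures, stored as an absolute score. The conversion of the absolute score into a percentile is sketched under “What is a percentile?” later in this article.

```python
# Illustrative sketch only: Zappi's actual component measures and their
# weights are proprietary, so the names and numbers below are made up.

def composite_score(measures: dict[str, float], weights: dict[str, float]) -> float:
    """Absolute composite score: a weighted blend of component measures."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights should sum to 1"
    return sum(measures[name] * w for name, w in weights.items())

# Hypothetical behavioral and survey measures, each on a 0-100 scale.
sales_impact_abs = composite_score(
    measures={"watched_threshold": 62.0, "brand_recall": 41.0, "purchase_uplift": 18.0},
    weights={"watched_threshold": 0.05, "brand_recall": 0.55, "purchase_uplift": 0.40},
)
print(sales_impact_abs)  # the absolute score later ranked against the norm
```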
Sales Impact includes the following measures:
Research process
- Upload between 1-15 video ads per project
- A default of 400 category-consumer respondents are exposed to each ad within the YouTube/Facebook in-context environment, and 200 within the TikTok/Instagram Feed environment, with the flexibility to increase or decrease the sample size to between 200 and 800 respondents (400 max for TikTok and Instagram Feed) depending on the specification of the project.
- Results are provided in the context of the available norm (e.g. country level or social media platform level, with Instagram Feed and TikTok sharing a norm)
Configuration checklist & guide
- Videos: you can upload up to 15 videos in order, within the same country and category, each of which will generate a separate survey.
- Facebook feed guidelines (Source: Meta)
- YouTube skippable ad guidelines (Source: Google Support)
- Instagram feed guidelines (Source: Meta)
- TikTok guidelines (Source: TikTok)
- Ensure you add the following information about your stimuli:
- Ad name
- Platform choice
- For Facebook, Instagram Feed and TikTok only:
- Brand name to be used for in-context social media post
- Brand logo for in-context social media post
- Accompanying text for social media post (description that is shown alongside the advert in the in-context social media environment)
- For TikTok only:
- Song title to be used for in-context social media post. This is an optional field and can be added if your ad includes music
- For Facebook, Instagram Feed and TikTok only:
- Brand competitive set images
- Brand information
- Target audience profile
- 2-20 brand/category attributes that reflect category drivers (category entry points)
- 255 character limit per attribute
- 1-4 key messages for your stimuli
- 255 character limit per message
- Presence of music and celebrities
- Select a thumbnail
- Tags
- Tagging your stimuli allows you to categorize your content efficiently, as well as unlocking additional analytic capabilities
- Standardized tags - taxonomy defined by your organization
- Custom and smart tags
Step-by-Step Configuration Process Guide
Best Practice Recommendations
Researching animatics
We do not recommend researching animatic digital ads with Amplify Digital. Since it is a simulated in-context environment, animatic ads are likely to stand out or look odd enough to respondents to impact your results in unpredictable ways. Instead, we recommend researching your digital animatics out of context using the Amplify Storyboard (if you have earlier-stage, static stimuli) or Boardomatic tools, iterating until your story is strengthened, and then researching finished-cut-quality digital ads in Amplify Digital.
Researching short ads
One of our measures of 'reach' looks at the % of people who watch a certain amount of the ad in context: 9 seconds for a YouTube ad, 6 seconds for a Facebook ad, and 5 seconds for a TikTok or Instagram Feed ad. This measure usually accounts for 5% of the Sales Impact score. Where your ad is shorter than the specified length for its platform, 0% of people would watch 9/6/5 seconds of the ad, as it doesn't run for that long. This would be unhelpful as a data point and misleading on ad effectiveness within the Sales Impact calculation. So when you research ads that are shorter than the 9/6/5-second threshold, we ignore this measure and calculate the Sales Impact score using just the remaining measures.
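As a rough illustration, here is a minimal Python sketch of that exclusion. The article only says the measure is ignored and the score is calculated from the remaining measures, so the renormalization of the remaining weights, and all measure names and numbers, are assumptions for illustration rather than Zappi's published formula.

```python
# Minimal sketch, not Zappi's published formula: when an ad is shorter
# than its platform's watch-time threshold, the "% watched N seconds"
# measure is dropped and the remaining weights are renormalized
# (an assumption) so the score uses just the remaining measures.

PLATFORM_THRESHOLD_S = {"youtube": 9.0, "facebook": 6.0, "tiktok": 5.0, "instagram": 5.0}

def sales_impact(measures: dict[str, float], weights: dict[str, float],
                 platform: str, ad_length_s: float) -> float:
    if ad_length_s < PLATFORM_THRESHOLD_S[platform]:
        # 0% would watch past the threshold by construction, so exclude it.
        measures = {k: v for k, v in measures.items() if k != "watched_threshold"}
    usable = {k: w for k, w in weights.items() if k in measures}
    return sum(measures[k] * w for k, w in usable.items()) / sum(usable.values())

# A hypothetical 5-second Facebook ad: the watch-time measure is ignored.
print(sales_impact({"watched_threshold": 0.0, "brand_recall": 41.0, "purchase_uplift": 18.0},
                   {"watched_threshold": 0.05, "brand_recall": 0.55, "purchase_uplift": 0.40},
                   platform="facebook", ad_length_s=5.0))
```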
The in-context environment | Understand the basics & FAQs
What is the in-context environment?
The environment is a mocked-up social media platform which looks and behaves like the real one. The ad is placed in this context and we are able to monitor respondents’ behavioral engagement with the ad.
What platforms and formats do we support?
We currently support Facebook feed video ads, YouTube pre-roll ads, TikTok and Instagram Feed video ads.
What is the survey experience for the respondents like?
We tell respondents that they will be redirected to the respective social media platforms and ask them to either scroll/browse as they normally would (Facebook/Instagram Feed/TikTok) or choose a video to watch (YouTube). Importantly, we do not tell them that they will be exposed to an ad, nor do we tell them to look out for one.
How long are they on each platform?
Respondents are in Facebook, TikTok and Instagram Feed for 60 seconds. For YouTube, they will be on the platform for 20 seconds plus the length of the ad, up to a maximum of 3 minutes. However, best practice is to ensure YouTube ads aren’t longer than 90 seconds.
What is the content on the platform, who manages it and how often does it get changed?
We have tech partners who own the in-context technology; they are responsible for curating content which is market specific. They refresh the content every 1-2 months to ensure it’s as up to date as possible.
What do we ask after the in-context exposure?
Following the in-context exposure we ask 3 questions: unaided brand recall, post-purchase uplift, and unaided key message recall (at which point we reveal the brand the survey relates to).
What if someone scrolls away or skips?
If someone scrolls away or skips the ad, they are still asked the memorability questions and are then played the ad in full again in a ‘forced exposure’ to get more qualitative/System 2 responses to the ad.
Looking at the Viewer Retention chart on the platform, the percentage doesn’t start at 100%, how can this be?
While we prompt respondents to scroll on Facebook/Instagram Feed/TikTok and ask them to choose a video on YouTube, not all respondents follow these instructions, so it is possible for a small subset of respondents not to see the ads. We wouldn’t want to force-expose them to the ads, as this would defeat the point of a “natural” in-context exposure.
Why do we use % Watched 6 seconds (Facebook), 5 seconds (Instagram Feed/TikTok) and 9 seconds (YouTube) respectively instead of Watched Full Ad?
We conducted some research-on-research to look at different thresholds for time spent watching an ad on each platform and found that 5, 6 and 9 seconds were the most predictive of performance on each platform, with a good distribution and deviation in scores across ads. These lengths additionally reflect the skip time on YouTube and the length of ads which tend to be used on each platform (a longer threshold on e.g. Instagram Feed/TikTok would mean we couldn’t use this measure for many of the ads).
What is the Unaided Brand Recall score based on? And how do I configure brands and sub brands to ensure my unaided brand recall score is as accurate as possible?
The percentage for Unaided Brand Recall is based on auto-coding for Parent brand and Sub brand. This means that some ads will only have a Parent brand present but others will have their score based on their Sub brand too.
- For auto-coding to pick up your brand as accurately as possible, be sure to also submit alternative brand names and slang terms. The auto-coding accounts for fuzzy matching; e.g. simple spelling errors will be picked up.
A few notes on spelling (illustrated in the sketch after this list):
- CAPITAL LETTERS: Don Simón vs don Simon = NO NEED TO ADD. The exception is Cyrillic-script languages!
- ACCENTS: Don simón vs Don simon = NEED TO ADD (ó is a different character to o from a computer’s perspective)
- SPACES: Don Simon vs DonSimon = NO NEED TO ADD
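The following minimal Python sketch illustrates those three rules. It is not Zappi's actual auto-coder, and it omits both the fuzzy matching of simple spelling errors and the Cyrillic capitalization exception: case and spacing differences match automatically, while accented characters are treated as distinct, so accentless variants must be submitted explicitly.

```python
# Minimal sketch of the three spelling rules above (not the real auto-coder).

def normalize(name: str) -> str:
    # Fold case and drop spaces; deliberately do NOT strip accents,
    # because "ó" and "o" are different characters to the coder.
    return "".join(name.split()).casefold()

def matches(response: str, brand_variants: list[str]) -> bool:
    return normalize(response) in {normalize(v) for v in brand_variants}

variants = ["Don Simón", "Don Simon"]   # submit the accentless form explicitly
print(matches("don simón", variants))   # True: capitals don't matter
print(matches("DonSimon", variants))    # True: spaces don't matter
print(matches("Don Simòn", variants))   # False: an unlisted accent variant
```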
Brand recall only shows coded brand recall for my tested brand, but can I see misattribution?
While on the unaided brand recall chart, you can export the raw data from the export tab and code this up manually to get a sense of misattribution; however, there is currently no quick way to check this.
Should the respondents that don’t recall any of the brands be removed?
If a respondent doesn’t submit an answer for brand recall, this in itself isn’t necessarily a red flag; it may be that none of the ads were memorable. Respondents are then exposed to the ad in full before completing the rest of the survey.
Audience and sample
What is our approach to sample? Who are we interviewing?
We interview a broad, category-relevant sample (400 respondents for YouTube and Facebook, and 200 for TikTok and Instagram), driven by the conclusions of Byron Sharp in How Brands Grow (that we should continuously reach all buyers of the category rather than just drive frequency among a small group of consumers). Having Profiles (subgroup norm) analysis available to us means that we are able to check resonance with narrower audiences, while defaulting reporting to category consumers/users ensures that we are both benchmarking consistently and setting a high threshold for creating great advertising.
While other testing approaches allow a user to configure a bespoke sample each time and database everything together, Zappi Amplify Digital applies a standard approach to sample, based upon consistent quotas and broad category relevant sampling. This consistent sample means:
- Any category will find its target audience in the dataset while also being able to understand a broad consumer reaction
- Any test can be compared, confidently, to any other test at both a total population and sub-group level.
How do we ensure that the sample composition of each test is comparable?
We apply several different weights to our data to ensure consistency across studies. These weights were defined as a result of the analysis we did during the development of Amplify to ensure consistency in data:
Sampling:
Data for Amplify Digital is collected with sample targets set for age nested within gender, and SEC.
For age and gender, we ensure that within each age grouping there is a 50/50 male/female split, based upon census data. Note that for TikTok and Instagram we recruit a younger audience in line with usage of the platforms (18-24: 36%, 25-34: 33%, 35+: 31%).
Weighting:
Weighting is applied on four axes: age nested within gender, SEC, brand usage, and category usage.
The targets for category and brand usage are calculated dynamically based on a norm:
- Category usage targets are calculated independently and across different customers for each category within a country. A norm is created for each usage frequency response option for all cells in the database.
- Brand usage targets are also calculated independently, in this case for each brand within a country > category combination. A norm is then created for each usage frequency response option for all cells in the database.
Our sampling and weighting approach means that every comparable project has the exact same weighted distribution across each of our 5 variables: age, gender, SEC, category usage, and brand usage.
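As a rough illustration of how such weights are derived, here is a minimal Python sketch of cell weighting on a single axis. This is the generic technique, not Zappi's exact procedure (which balances age-within-gender, SEC, brand usage and category usage); the sample counts are hypothetical, while the age targets are the TikTok/Instagram figures quoted above.

```python
# Minimal sketch of cell weighting, the general technique: a respondent's
# weight is the target share of their cell divided by the share of that
# cell observed in the achieved sample.
from collections import Counter

def cell_weights(cells: list[str], targets: dict[str, float]) -> list[float]:
    """cells[i] is respondent i's cell; targets are shares summing to 1."""
    observed = Counter(cells)
    n = len(cells)
    return [targets[c] / (observed[c] / n) for c in cells]

# One axis only, with hypothetical counts for a 200-respondent sample.
weights = cell_weights(
    cells=["18-24"] * 80 + ["25-34"] * 70 + ["35+"] * 50,
    targets={"18-24": 0.36, "25-34": 0.33, "35+": 0.31},
)
print(round(weights[0], 2), round(weights[80], 2), round(weights[150], 2))  # 0.9 0.94 1.24
```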
Understanding metrics that matter
Reporting deliverables focus on the “Metrics that matter” that drive short-term “Sales Impact” and long-term “Brand Impact”. Based on performance on these metrics, users are guided to the linked diagnostic sections of the report to better understand how they can improve their ads.
Norms and interpretation
What is a percentile?
A percentile score is a method of ranking that takes the score for an ad and reports back where the result sits within the total distribution of the norm it is being compared to. For example, a score in the 70th percentile means that the ad has performed in the top 30% of ad scores.
The norm being used is an important part of this calculation, and changing the norm will change the percentile scores. An ad may be in the 50th percentile when compared to all ads in the market but in the 99th percentile when compared to ads in a specific category.
At Zappi we perform a specific type of norms calculation that smooths out the database and removes any skews. This takes the mean and the standard deviation of the norm that is being used, and uses them to create a normative database that follows a normal distribution. Each ad’s performance is then plotted against this distribution. This is known as a cumulative distribution function.
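Expressed in code, the calculation looks like the following minimal Python sketch. The approach (a normal cumulative distribution function built from the norm's mean and standard deviation) is as described above; the numbers themselves are hypothetical.

```python
# Sketch of the percentile calculation described above: fit a normal
# distribution to the selected norm's mean and standard deviation, then
# read the ad's absolute score off the cumulative distribution function.
import math

def percentile(score: float, norm_mean: float, norm_sd: float) -> float:
    z = (score - norm_mean) / norm_sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return 100.0 * cdf

# Hypothetical numbers: an absolute score of 32.9 against a norm with
# mean 30 and SD 5 lands at roughly the 72nd percentile.
print(round(percentile(32.9, norm_mean=30.0, norm_sd=5.0)))
```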
Norms
Definitions for norms:
- Country level - ads can be compared to the norm for the country in which they were tested
- Social media channel - the social media channel used for the ad’s in-context placement. YouTube and Facebook have separate norms; Instagram Feed and TikTok share a norm.
- Language - The specific language that the fieldwork for the ad was done in.
- Parent Category - the vertical or industry the brand is from (e.g. a fast food company would go in the restaurant parent category; other parent categories would be ‘beverages’ or ‘financial services’). This norm includes the data from all other Zappi customers in the category.
- Child Category - this norm encompasses all the ads tested within the narrower category. For example, within beverages the child categories would be CSDs, sports drinks, bottled water, etc.; within financial services, they would be credit cards, insurance, personal banking, etc. The category will include all the ads tested within the category for both the client and competitors.
- Brand - this uses only the client’s own ads as the normative comparison.
- User-defined norms - you are able to create custom norms based on a number of criteria or on tags you have applied to your ads in the platform.
You can choose to include only ads in your domain in the norm, rather than ads from across customers. This is set up once and becomes your default.
FAQs
Why has my Sales and Brand Impact percentile or colour coding changed?
While your absolute score for sales and brand impact won’t change, how these scores compare to the database will change as the database changes. The database is dynamic in nature and monthly norms updates will result in changes. If, for example, lots of great ads are researched and added to the database, your ad may achieve a lower percentile (or different colour code) as a result. In order to see the same percentile score/colour code you need to ensure you have selected:
- Norm from the time of testing (1st of the month)
- The same norms scope each time
Key messages can be unique per project. How is the norm calculated?
The key message norm is the average across all key messages asked across all projects, i.e. average(Study1(KM1), Study2(KM1), ...). The KM norm does not take into account the specific key message, nor whether it was message 1, 2, 3 or 4, because advertisers only include messages they are aiming to communicate (and hence sometimes include only 1 message, sometimes the full 4).
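A minimal Python sketch of that averaging, with hypothetical scores:

```python
# Sketch of the key message norm as described: one average over every
# key message score from every project, ignoring which slot (1-4) the
# message occupied. All scores are hypothetical.
def key_message_norm(scores_by_study: list[list[float]]) -> float:
    all_scores = [score for study in scores_by_study for score in study]
    return sum(all_scores) / len(all_scores)

# Study 1 asked 2 key messages, Study 2 asked 4, Study 3 asked only 1.
print(key_message_norm([[54.0, 61.0], [47.0, 52.0, 58.0, 40.0], [66.0]]))  # 54.0
```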
What is a profile and how does it differ from a filter?
Profiles represent the different data cuts that you use to look at your data (for example: men only, or 18-34s only). Each profile that is created leverages all available data to create a CUSTOM PROFILE NORM for that profile. A study contributes to a profile if it has a minimum of 30 respondents that match the description. There are two different types of Profiles available:
- Universal Profiles - these are profiles based on questions that are asked on ALL ads, such as age and gender. This profile uses all the available data across all ads tested to generate a norm for that subsegment of the data.
- Client Specific Profiles - these are profiles based on questions that are asked about only the client’s ads and in these cases the norm is only based on this group of people within the client’s own projects.
When you filter data and look at a norm, you are comparing the subgroup in the filter to a total sample norm.
When you use profiles and look at a norm, you are comparing the subgroup in the profile to a norm made up of only that subgroup. This is more meaningful, as it accounts for the fact that a specific subgroup may always be more positive or negative in their responses.
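A small numeric illustration of the difference, with made-up scores:

```python
# Illustrative numbers only: the same 18-34 subgroup score reads
# differently depending on what it is benchmarked against.
ad_score_18_34 = 58.0
total_sample_norm = 50.0    # what a FILTER compares against
profile_norm_18_34 = 56.0   # what a PROFILE compares against (18-34s only)

print(ad_score_18_34 - total_sample_norm)    # 8.0: flattering if 18-34s always skew positive
print(ad_score_18_34 - profile_norm_18_34)   # 2.0: the like-for-like read
```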
Interpreting data and taking decisions
There are a number of different comparisons and decisions people want to make when pre-testing ads. Most brands need at least some media support, so ‘using nothing’ is often not an option. Therefore, common decisions people need to make are:
- Is this ad/are any of my ads strong enough for good ROI?
- Is/are the ad(s) good or great?
- Which execution(s)/creative route(s) is/are strongest?
- Is this new ad stronger than the most recent advertising I’ve invested in? Is this new ad stronger than my competitor’s ad?
- Which iteration of the ad is strongest? (recommended for meaningful differences between creative, not small iterations)
Within the platform there are two different analysis routes that enable you to do all of the above, and there is a simple toggle on the right-hand side of the page called ‘significance testing’ so you can switch between them. Toggle to ‘Norm’ for questions 1 and 2 (comparing to a norm); toggle to ‘Stimuli’ to sig test between the chosen ads and answer questions 3-5.
How does the ad to ad comparison work?
The ad-to-ad comparison sig tests between all the stimuli on each measure (a worked sketch follows this list).
- Where a measure for a specific ad is significantly above another ad, the colour will be bolded to draw attention to this strength
- And then, in place of showing the norm under the achieved score, there is a letter which denotes which column (ad) this ad is stronger than (if, for example, it says B, it means the ad in this column is stronger than the ad in column B).
- For sales/brand impact, you will still see the percentile score for each ad but we are using the absolute sales/brand impact score (from which the percentile is calculated) to inform whether one ad is significantly above another on the summary metric
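As an illustration of stimulus-vs-stimulus sig testing, here is a minimal Python sketch using a standard two-proportion z-test at the 95% confidence level. This is one common way such comparisons are made; the article does not specify the exact test Zappi applies, and the numbers are hypothetical.

```python
# Two-proportion z-test between two ads on a survey measure
# (one common approach; not necessarily Zappi's exact test).
import math

def two_prop_z(p1: float, n1: int, p2: float, n2: int) -> float:
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Ad A: 46% brand recall among 400 respondents; Ad B: 38% among 400.
z = two_prop_z(0.46, 400, 0.38, 400)
print(abs(z) > 1.96)  # True if the difference is significant at 95% confidence
```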
To learn more about taking decisions with Amplify, read here.
Interpreting cultural sensitivity
Why measure it
It's crucial to approach the interpretation of sensitive or potentially offensive content with empathy and cultural awareness. This type of feedback is largely subjective, but by actively engaging with the feedback and striving to stay attuned to cultural nuances, you can refine your advertising to better align with the values and expectations of your audience, thus building trust and equity for your brand. Read more in our full best practice advice.
AI-Generated Reports
AI Quick Report: automated reporting is available upon the completion of a survey, providing a concise summary of results and a diagnosis of why. It is produced by combining AI with human expertise in how to interpret and diagnose consumer responses from your survey, and provides a great starting point for your analysis and recommendations.
For Zappi Customers Only
For additional information please reach out to Zappi Support or your CSM for access.