Amplify Digital | Reporting and Analysis

Overview

There are a number of different comparisons and decisions people want to make when pre-testing ads. Most brands need at least some media support, so ‘using nothing’ is often not an option.

There are two ways to analyze your static ad. You can compare against a norm, which is the default view for any chart, or you can compare to another ad.

Compare to the norm:

Is this ad strong enough for good ROI?
Is the ad good or great?

Compare to other ads:

Which executions or creative routes are strongest?
Is this new ad stronger than the most recent advertising I’ve invested in?
Is this new ad stronger than my competitor’s ad?
Which iteration of the ad is strongest? (Recommended for meaningful differences between creatives, not small iterations.)

Creative potential

The Creative Potential Score is a summary measure which looks holistically at the potential of the idea to be translated into a successful finished execution. It summarises whether the idea has potential to:

Reach people by breaking through the clutter and bringing the brand to mind
Resonate to hold attention, be relevant, relatable and make people feel good
Create a response by making people more likely to consider and feel good about the brand.

Learn more about the Creative Potential score and how it’s calculated.

Understanding metrics that matter

Reporting deliverables focus on the “Metrics that matter” to drive short-term “Sales Impact” and long-term “Brand Impact”. Based on performance on these metrics, users are guided to the linked diagnostic sections of the report to better understand how they can improve their ads.

Learn more about Brand and Sales Impact.

Emotional Response

Amplify Digital tracks the respondent's emotional engagement throughout the ad. The intuitive interface allows respondents to record their emotions as the ad plays.

Learn more about Second-By-Second emotion.

Cultural Sensitivity Question

You can choose to add the Cultural Sensitivity question during configuration. This can give you some insight into how consumers view your product or idea, and allows you to catch any issues before you go to market.

Learn more about the Cultural Sensitivity Question and how to interpret the results.

Audience and sample

What is our approach to sample? Who are we interviewing?

We interview a Representative Audience that reflects the real market for your category. Profiles (subgroup norm) analysis lets us check resonance with narrower audiences, while defaulting reporting to category consumers/users ensures that we benchmark consistently and set a high threshold for creating great advertising.

While other testing approaches allow a user to configure a sample each time and database everything together, Zappi Amplify applies a standard approach to sample, based upon consistent quotas and broad, category-relevant sampling. This consistent sample means:

Any category will find its target audience in a dataset and understand a broad consumer reaction.
Any test can be compared, confidently, to any other test at a total population and sub-group level.

How do we ensure the sample composition of each test is comparable?

We apply several different weights to our data to ensure consistency across studies. The weighting reflects the actual makeup of the relevant country so the sample remains consistent.

Sampling:

Data for Amplify Static is collected with sample targets set for age nested within gender, and socio-economic class (SEC).

For age and gender we ensure that within each age group there is a 50/50 M/F split based upon census data.

Sample weighting:

Sample weighting is applied on four axes: Age nested within gender, socio-economic class, brand usage, and category usage.

The targets for category and brand usage are calculated dynamically based on a norm:

Category usage targets are calculated independently, and across different customers, for each category within a country. A norm is created for each usage frequency response option for all cells in the database.
Brand usage targets are also calculated independently, in this case for each brand within a country/category combination. A norm is then created for each usage frequency response option for all cells in the database.

Our sampling and weighting approach means that every comparable project has the exact same weighted fallout of each of our 5 variables: Age, gender, SEC, Category Usage, Brand Usage.
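As an illustration of how weighting on several axes at once can work, here is a minimal sketch of raking (iterative proportional fitting) on two axes. This is an assumption for illustration only: the text above does not say which weighting algorithm is used, and the function name `rake`, the field names, and the target shares below are all hypothetical.

```python
from collections import defaultdict

def rake(respondents, targets, iterations=50):
    """Rescale weights one axis at a time until weighted shares match the targets."""
    weights = [1.0] * len(respondents)
    for _ in range(iterations):
        for axis, axis_targets in targets.items():
            # Current weighted total for each level of this axis
            totals = defaultdict(float)
            for person, weight in zip(respondents, weights):
                totals[person[axis]] += weight
            grand_total = sum(weights)
            # Scale each weight by target share / current share for its level
            weights = [
                weight * axis_targets[person[axis]] * grand_total / totals[person[axis]]
                for person, weight in zip(respondents, weights)
            ]
    return weights

def weighted_share(respondents, weights, axis, level):
    """Weighted proportion of respondents at the given level of an axis."""
    matched = sum(w for p, w in zip(respondents, weights) if p[axis] == level)
    return matched / sum(weights)

# Hypothetical sample of 10 respondents and hypothetical target shares
respondents = (
    [{"sec": "high", "category_usage": "weekly"}] * 3
    + [{"sec": "high", "category_usage": "monthly"}] * 1
    + [{"sec": "low", "category_usage": "weekly"}] * 2
    + [{"sec": "low", "category_usage": "monthly"}] * 4
)
targets = {
    "sec": {"high": 0.5, "low": 0.5},
    "category_usage": {"weekly": 0.6, "monthly": 0.4},
}
weights = rake(respondents, targets)
```

After raking, the weighted share of each level sits on its target for both axes, which is the sense in which two projects weighted to the same targets have the same weighted fallout.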

Norms and interpretation

What is a percentile?

A percentile score is a ranking method that takes the score for an ad and reports where it sits within the total distribution of the norm it is being compared to. For example, a score in the 70th percentile means the ad outperformed 70% of the ads in that norm, placing it in the top 30% of ad scores.

The norm that is being used is an important part of this calculation, and changing the norm will change the percentile scores. An ad may be in the 50th percentile when looking at all ads in the market but in the 99th percentile when looking at ads for a specific category.

At Zappi, we perform a specific type of norms calculation that smooths out the database and removes any skews. We take the mean and the standard deviation of the norm, and use them to create a normative database that follows a normal distribution. Each ad’s performance is then plotted against this distribution, and its percentile is read from the distribution's cumulative distribution function.
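The calculation described above can be sketched as follows. This is a minimal illustration, assuming the percentile is simply the normal cumulative distribution function evaluated at the ad's score. The function name `percentile_from_norm` and all the numbers are hypothetical.

```python
import math

def percentile_from_norm(score: float, norm_mean: float, norm_sd: float) -> float:
    """Percentile (0-100) of a score under a normal distribution fitted to the norm."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (score - norm_mean) / norm_sd
    # Standard normal CDF expressed with the error function
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The same ad score compared against two hypothetical norms
all_ads = percentile_from_norm(62.0, norm_mean=62.0, norm_sd=10.0)
category = percentile_from_norm(62.0, norm_mean=50.0, norm_sd=5.0)
print(round(all_ads), round(category))
```

With these made-up norms, the same score lands at the 50th percentile against all ads and at roughly the 99th percentile against the category, which is why the choice of norm matters.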

Norms

Definitions for norms:

Country level - An ad can be compared to the norm for the country in which it was tested.
Language - The language that the fieldwork for the ad was done in. This can be different from the language of the ad itself, since you can test an English ad with Spanish-speaking respondents.
Parent Category - The vertical or industry the brand is from (i.e. a fast food company would go in the restaurant parent category). This norm includes the data from all our other Zappi customers in the category.
Child Category - This norm will encompass all the ads tested within the narrower category. For example, within beverages the children categories would be.

To compare two or more concepts

From the Analysis section, select two or more concepts to compare. You can only compare concepts tested in the same market.

Click ‘view analysis’. Select the Expanded Metrics at a Glance chart.

Charts default to comparing to the norm. To compare the concepts to each other, go to the configuration settings on the right and, under ‘Significance Testing’, select ‘Stimuli’.

The concept-to-concept comparison significance test shows differences between all the stimuli on each measure.

Where a measure for a specific concept is significantly above another concept, the color is bolded to draw attention to this strength.
In place of the norm under the achieved score, there is a letter denoting which column this concept is stronger than (for example, if it says B, the concept in this column is stronger than the concept in column B).
For sales/brand impact, you will still see the percentile score for each concept but we are using the absolute sales/brand impact score (from which the percentile is calculated) to inform whether one concept is significantly above another on the summary metric.
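To make the letter notation concrete, here is a minimal sketch that assigns letters from pairwise tests. The statistical test behind the platform's comparison is not described here, so the pooled two-proportion z-test, the 1.96 threshold, and the scores and sample sizes below are all assumptions for illustration.

```python
import math
from itertools import permutations

def significantly_above(p_a, n_a, p_b, n_b, z_crit=1.96):
    """Pooled two-proportion z-test: is proportion p_a significantly above p_b?"""
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    return (p_a - p_b) / se > z_crit

def comparison_letters(scores):
    """Map each column letter to the columns it is significantly above."""
    letters = {column: [] for column in scores}
    for a, b in permutations(scores, 2):
        if significantly_above(*scores[a], *scores[b]):
            letters[a].append(b)
    return letters

# Hypothetical (proportion, sample size) for one measure across three concepts
scores = {"A": (0.62, 400), "B": (0.50, 400), "C": (0.60, 400)}
result = comparison_letters(scores)
print(result)  # {'A': ['B'], 'B': [], 'C': ['B']}
```

In this made-up example, columns A and C would each show the letter B, while A and C are not significantly different from each other.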

Learn more about making decisions with Amplify.

Quick Reports^AI

Amplify TV supports Quick Reports^AI. When a survey completes, an automated report is generated that provides a concise summary of the results and the reasons behind them. It is produced by combining AI with human expertise in interpreting and diagnosing the consumer responses from your survey.

Select one or more projects, then click ‘Generate Report’ to get a clean, editable analysis of your data. Learn more about Quick Reports^AI.

Important Note

The Quick Report^AI is a snapshot using the norm available when your survey was completed. If you re-analyze your data later, numbers may differ because you selected a different norm or because the original norm has been updated with new survey data.

Profiles

Profiles let you filter your survey results with a single click by pre-grouping segments. For example, a Profile might be “All adults under 55”.

When you select a Profile, it filters the entire dataset using the chosen demographics. So if you choose a profile for males, Reporting shows a norm based on all male respondents.

Profiles and the norms they reference are tied to the category. A profile created for "Beer" could not be applied to "Spirits". If you want to analyze multiple categories, you will need to test consistently in each category to create a separate profile for each.

Learn more about Profiles.

FAQ

Why doesn't the Viewer Retention chart start at 100%?

Respondents are prompted to scroll naturally on Feed platforms or choose a video on YouTube, so a small subset may not see the ads. We avoid forced exposure to maintain a natural, in-context experience.

Why use % watched 6s (Facebook), 5s (Instagram/TikTok), and 9s (YouTube) instead of Full Ad Watched?

Research-on-research identified these thresholds as most predictive of performance on each platform, with good score distribution. They also align with YouTube's skip time and typical ad lengths per platform.

What is Unaided Brand Recall based on, and how do I configure brands accurately?

Scores are based on auto-coding for Parent and Sub brand. To improve accuracy, submit alternative brand names and slang terms. The system already accounts for fuzzy matching (e.g. minor spelling errors). Notes on special cases:

Capital letters (Don Simón vs don simon): no need to add — except for Cyrillic
Accents (Don Simón vs Don Simon): add these, as accented characters are treated differently
Spaces (Don Simon vs DonSimon): no need to add
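The special cases above can be read as a normalisation rule: case and spaces are ignored, accents are not. The sketch below uses that reading to decide which spellings would still need to be submitted as alternatives. It is an assumption-laden illustration, not the platform's matching logic: the function names are hypothetical, and neither fuzzy matching nor the Cyrillic exception is modelled.

```python
def normalise_brand(text: str) -> str:
    """Lower-case and drop whitespace; accented characters are left untouched."""
    return "".join(text.split()).lower()

def variants_to_submit(canonical: str, candidates: list[str]) -> list[str]:
    """Return the candidates that do not collapse to the canonical name."""
    key = normalise_brand(canonical)
    return [name for name in candidates if normalise_brand(name) != key]

needed = variants_to_submit(
    "Don Simón",
    ["don simón", "DonSimón", "Don Simon"],
)
print(needed)  # ['Don Simon']
```

Only the unaccented spelling survives the filter, matching the guidance that accents should be added while capitals and spaces need not be.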

Can I see misattribution, not just my brand's recall?

Not directly in the platform. You can export raw data from the Export tab and manually code it, but there's no quick built-in way to check misattribution.

Should respondents who don't recall any brand be removed?

No, a blank brand recall response isn't a red flag; the ad may simply not have been memorable. These respondents still view the full ad before completing the rest of the survey.

Why has my Sales and Brand Impact percentile or colour coding changed?

Absolute scores don't change, but percentiles and colour codes shift as the database updates monthly. If strong ads are added, your ad may rank lower as a result. To get consistent comparisons, always select the same norm date (1st of the month at time of testing) and the same norm scope.

How is the Key Message norm calculated?

It's the average score across all key messages in all projects, regardless of the specific message or how many were included (1–4). Since advertisers only test messages they intend to communicate, the norm treats all key messages equally.
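One way to read "treats all key messages equally" is that every message score is pooled into a single average, rather than averaging each project first. The sketch below shows that reading with made-up numbers. The function name and the scores are hypothetical, and the pooling interpretation itself is an assumption.

```python
from statistics import mean

def key_message_norm(projects: dict[str, list[float]]) -> float:
    """Average over every key message score, pooled across all projects."""
    return mean(score for scores in projects.values() for score in scores)

# Hypothetical projects with 1 and 4 key messages
projects = {
    "project_1": [70.0],
    "project_2": [50.0, 60.0, 40.0, 60.0],
}
pooled = key_message_norm(projects)
per_project = mean(mean(scores) for scores in projects.values())
print(pooled, per_project)  # 56.0 61.25
```

Under pooling, a project with four messages contributes four data points, so the result differs from an average of project-level averages.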

What is a Profile and how does it differ from a Filter?

Both let you view data for a subgroup, but they use different norms:

Filter: Compares the subgroup against the total sample norm
Profile: Compares the subgroup against a norm built from that same subgroup only — more meaningful, as it accounts for groups that may consistently skew positive or negative

There are two profile types:

Global Profiles: based on questions asked for all ads (e.g. age, gender); norm draws from all available data
Custom Profiles: based on custom questions; norm draws only from that client's projects

A study contributes to a profile if it has at least 30 matching respondents.
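The 30-respondent rule can be sketched as below. The threshold comes from the text above; everything else is an assumption for illustration, including the function names, the data layout, and the choice to average subgroup scores across qualifying studies.

```python
from statistics import mean

MIN_MATCHING = 30  # threshold stated in the text above

def subgroup_scores(studies, matches):
    """Subgroup mean per study, keeping only studies with enough matching respondents."""
    kept = {}
    for name, respondents in studies.items():
        matched = [r["score"] for r in respondents if matches(r)]
        if len(matched) >= MIN_MATCHING:
            kept[name] = mean(matched)
    return kept

def profile_norm(studies, matches):
    """Norm built only from the matching subgroup in qualifying studies."""
    scores = list(subgroup_scores(studies, matches).values())
    return mean(scores) if scores else None

# Hypothetical studies: study_2 has only 10 male respondents
studies = {
    "study_1": [{"gender": "male", "score": 6}] * 40
    + [{"gender": "female", "score": 8}] * 40,
    "study_2": [{"gender": "male", "score": 2}] * 10
    + [{"gender": "female", "score": 8}] * 70,
}
male_norm = profile_norm(studies, lambda r: r["gender"] == "male")
print(male_norm)  # 6
```

Here study_2 is excluded from the male profile because it has fewer than 30 matching respondents, so the profile norm is driven by study_1 alone.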

Why is Emotional Intensity not included in Amplify Digital scoring?

Digital measurement focuses on overall emotional response rather than moment-by-moment intensity. What matters is how the ad makes viewers feel overall, not emotional peaks during specific moments.

Why is overall emotion more important for digital than TV?

Viewing context: TV is a lean-back, high-attention environment where sustained emotional connection matters. Digital ads appear in cluttered, scrollable feeds where viewers may only catch a few seconds — so immediate, overall positive emotion is what counts.

Ad length: TV ads (15–60s) allow time to build an emotional narrative. Digital ads (6–15s) are shorter and often non-linear, requiring immediate impact rather than emotional build-up.
