Product Perspectives: What We’re Doing About Quality Right Now
As mentioned in our previous blog post on Addressing Quality, there are numerous categories of invalid activity in the industry today. Each category requires significant attention and investment on our part to keep invalid activity off our platform.
We are committed to those long-term investments in order to execute against our zero tolerance policy. This post focuses on what we are doing right now to improve quality on our platform.
Across all the possible categories of invalid activity, there are two main components needed for any effective technical solution:
I. Know the true source of the underlying domain
II. Accurately identify invalid activity
I. Using Data To See Through Obfuscated Domains
Finding the true source of the underlying domain is a challenge in digital advertising because ad servers are in common use and because some sellers intentionally obfuscate domains to protect against channel conflict on exchanges. The majority of obfuscation is likely due to those two legitimate reasons. In some cases, however, sellers use domain obfuscation to hide sites that would normally not be allowed on most exchanges due to their content.
To protect buyers, it is important for our platform to know the true underlying domain name, even when the obfuscation needs to remain in order to protect the seller in a channel conflict situation.
To solve this problem, we are working with various data providers to see where our ads actually show up from real sampled user data. This pursuit has led us to identify situations where obfuscated domains are either already blacklisted or should be blacklisted based on their content.
We blocked 10.3 billion impressions of traffic due to domain misrepresentation this past month alone. Additionally, we are looking at enhancing our ad tags to be able to find the underlying domain a higher percentage of the time when it is being obfuscated.
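As a rough illustration of the comparison described above, the sketch below checks the domain a seller declares against the domain observed in sampled user data and notes whether the observed domain is already blacklisted. It is a minimal sketch under stated assumptions, not our production logic: the function names, the input shape (pairs of declared and observed values), and the example domains are all hypothetical.

```python
from urllib.parse import urlsplit


def normalize_domain(url_or_host):
    """Lower-case a URL or bare host; drop scheme, port, path and a leading 'www.'."""
    text = url_or_host.strip().lower()
    if "://" not in text:
        # Prefix '//' so urlsplit treats a bare host as the network location.
        text = "//" + text
    host = urlsplit(text).hostname or ""
    return host[4:] if host.startswith("www.") else host


def find_misrepresented(samples, blacklist):
    """Return (declared, observed, observed_is_blacklisted) for every mismatch.

    samples   -- hypothetical iterable of (declared_domain, observed_url) pairs
    blacklist -- hypothetical iterable of blacklisted domains
    """
    blocked = {normalize_domain(d) for d in blacklist}
    findings = []
    for declared, observed in samples:
        d, o = normalize_domain(declared), normalize_domain(observed)
        if d != o:
            findings.append((d, o, o in blocked))
    return findings


samples = [
    ("news.example", "http://www.news.example/story"),  # matches after normalizing
    ("ads.example", "https://pirate.example/watch"),    # mismatch, blacklisted
    ("masked.example", "blog.example"),                 # mismatch, needs review
]
result = find_misrepresented(samples, blacklist=["pirate.example"])
```

A mismatch alone does not imply bad intent, since obfuscation can be legitimate. In this sketch only the blacklist flag separates traffic to block from domains that simply need a content review.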
II. Invalid Traffic Detection:
Getting accurate referring domains is a big step towards making sure ads are showing up where they are supposed to for real users. With improved coverage here, we can also focus on accurately identifying invalid activity.
II.a. Invalid Site Content and Page Quality:
AppNexus has taken a leadership position in fighting against piracy, porn and other very low quality page content like excessive ads on the page. We are continuing to audit for such sites and are also making significant investments for long-term improvements. Here are some key stats on the progress we are making on this front:
We have a team of 12 human auditors dedicated to reviewing quality
25% of sites reviewed are rejected for not meeting our quality standards
Piracy: 47k suspicious sites reviewed in total and 13k sites blacklisted for promoting pirated content
Porn: 490k suspicious sites reviewed in total and we’ve blacklisted 113k sites for adult content
While those stats are impressive, we aim to increase our coverage even further. We are currently in the final stages of negotiation with an outside vendor to provide additional page and site level categorization capabilities that will help us to:
Rapidly expand the amount of platform-audited URLs
Improve accuracy of categorization
Flag and blacklist piracy, porn, and other low quality sites
Detect non-human traffic and fraudulent URLs
II.b. Non-Human Traffic:
Categorizing sites based on content alone is not enough. A legitimate-looking site can be creating non-human traffic using bots in order to generate revenue. We’ve increased our engineering efforts, using various detection techniques and algorithms to identify the dimensions associated with fake users and fake ad impressions, and we provide these to our inventory audit team so the sources can be blacklisted.
Our recent efforts here have detected and blocked 30 billion impressions/month of non-human traffic.
II.c. Hidden Ads and Click Fraud Detection:
In order to find hidden ads and click fraud, we use a combination of automation and manual auditing to look at multiple variables including:
Volume of impressions and clicks, click-through rate, and conversion rate
Alexa rating to compare against the amount of traffic we are seeing
Site content review: Sites committing this activity will often have very little, or very generic content. They often give the initial impression of a legitimate site, but looking closely you can often see patterns or tell the content quality doesn’t fit with the amount of traffic being generated.
Site layout: These sites are often made en masse and tend to have a very poor site layout
Site maintenance: Once one of these sites has been created, very little maintenance is done on it, because the creators have often moved on to newer sites
Domain whois: These sites are often, but not always, privately registered in order to obfuscate who is running them.
Correlation plots to reveal patterns: click-to-IP ratio, clicks per user, clicks per domain, etc.
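To make a couple of the variables above concrete, here is a minimal sketch that computes click-through rate and clicks per distinct IP for each domain, then flags domains that exceed a threshold. The event format, function names, and threshold values are assumptions chosen for illustration, not the values or code we use in production.

```python
from collections import defaultdict


def domain_click_stats(events):
    """Compute CTR and clicks per distinct clicking IP for each domain.

    events -- hypothetical iterable of (domain, ip, is_click), one per impression
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    click_ips = defaultdict(set)
    for domain, ip, is_click in events:
        impressions[domain] += 1
        if is_click:
            clicks[domain] += 1
            click_ips[domain].add(ip)

    stats = {}
    for domain, imps in impressions.items():
        n_clicks = clicks[domain]
        n_ips = len(click_ips[domain])
        stats[domain] = {
            "ctr": n_clicks / imps,
            # Many clicks from few IPs suggests automated clicking.
            "clicks_per_ip": n_clicks / n_ips if n_ips else 0.0,
        }
    return stats


def flag_suspicious(stats, max_ctr=0.05, max_clicks_per_ip=5.0):
    """Return domains over either threshold (illustrative values only)."""
    return sorted(
        domain
        for domain, s in stats.items()
        if s["ctr"] > max_ctr or s["clicks_per_ip"] > max_clicks_per_ip
    )


# 100 impressions with one click, versus 20 impressions with 10 clicks from one IP.
events = [("a.example", f"10.0.0.{i}", i == 0) for i in range(100)]
events += [("b.example", "10.9.9.9", i < 10) for i in range(20)]
stats = domain_click_stats(events)
flagged = flag_suspicious(stats)
```

Fixed thresholds like these are only a starting point. A flagged domain would still go through the manual checks listed above before any blacklisting decision.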
When we ramped up our detection of this activity, we saw a decreasing number of sites found on the platform each week. While this shows progress toward eradicating these problems, we are aware that rogue players are going to continue to try to circumvent detection techniques, and in many instances they may initially prevail. So we need to remain vigilant together as a company, and as an industry.
II.d. Automatic Detection of Undeclared Toolbar Tags:
We have strong policies that require sellers to declare any tags used in toolbars, including useful toolbars that aren’t malicious in any way. There are very legitimate browser plugins that enable users to secure their web traffic, don't manipulate publisher pages, and are great inventory sources for advertising content. AppNexus has zero tolerance for toolbars that manipulate the page, are not removable or installed without clear consent, or harm “grandma’s computer” in any way. To augment the self-audit process that occurs for these toolbars, we are also automatically detecting undeclared toolbar tags.
Undeclared tags aren’t always malicious and can often be a tagging mistake or oversight, but we’ve detected and flagged 10 billion undeclared toolbar impressions per month.
IAB Traffic of Good Intent Initiative:
The sites and companies that use deceptive tactics give the online advertising industry a bad reputation. It’s clear that many companies are dealing with these issues, so to help fight this ecosystem problem the IAB created a Traffic of Good Intent (TOGI) task force in which we are a participating member.
One of the recent activities of the task force, which also includes Google, AOL, OpenX, Magnetic, DoubleVerify and RocketFuel, has been to provide definitions for activities deemed to not be ‘traffic of good intent’. There has also been cooperative sharing of algorithms and techniques used to identify invalid activity.
We will continue to actively lead in these discussions to bring the industry together to establish a cooperative and self-regulated approach to improving quality within the display advertising ecosystem.
Client Escalation Process Improvements
Our services, product, and engineering teams have been working together over the past couple of weeks to clarify and improve our client escalation process for when customers identify potentially invalid activity. This should bring faster clarity and resolution to these issues.
Client Advisory Board For Quality
We are creating a Client Advisory Board for customers who want to communicate with our product team on a regular basis about issues relating to quality and jointly work on solutions to the long-term challenge our industry faces. If you are interested in participating, please email firstname.lastname@example.org or let your account manager know.
Solving the Problems in the Long Term
These immediate activities are making a large impact in identifying and enforcing our policies against invalid activity. However, this is an arms race that will require long-term investments in order to be truly successful.
In our next post we’ll dive into some of the long-term investments we are making to help improve Internet quality and advance our customers’ best interests by maintaining a quality ecosystem free of malware, invalid activity, piracy, and content that leads to undesirable outcomes.