Tech

To show how easy it is for plagiarized news sites to get ad revenue, I made my own

Key Points
  • In recent years, it's become common for fraudsters to make ad-supported "news" sites with content scraped from legitimate publishers. 
  • After realizing how common this is, I made my own site with content from CNBC to see if it would be approved by ad tech partners. 
  • Within days, I had the ability to monetize my site with legitimate advertisers.
The homepage of the "Tribune Times Today." All the news that's fit to copy.
Megan Graham

Last month, a story I'd written had just gone live. I punched a few keywords into Google search to pull it up so I could grab the link.

That was when I noticed a publication called the "New York Times Post" had also just published a story with the exact same headline. 

When I clicked the link, I noticed that it was my story in its entirety. And it had ads all over it. 

This website, the "New York Times Post," was running ads on a story they stole from CNBC.

These phony "news" sites with realistic names and stolen stories aren't new — they've been ripping off publishers and taking advertiser dollars for years.

But as the pandemic hits the publishing industry and news sites like Conde Nast, Vice and Vox cut pay and lay off more employees, the issue feels more pressing than ever.

Many advertisers don't want to advertise on publishers' coronavirus stories out of fear they'll face negative brand connotation for being alongside that content. Yet, through the muddy supply chain of digital media, many are ending up on that content anyway. Only here, it's stolen.

A two-year study by the Incorporated Society of British Advertisers and PwC articulated with new clarity how the digital media ecosystem hemorrhages cash on its way to publishers. It tracked 15 UK advertisers, including Disney and Unilever, and found that half a brand's digital marketing spend is absorbed by middlemen before reaching a publisher. Worse, it found that about one-third of the supply chain fees advertisers pay cannot be traced, meaning that it's impossible for advertisers to know exactly where their money is going.

It all underscores the fact that the ad tech space is so convoluted, it's easy to make money from legitimate advertisers just by setting up a web page. That means there's significant incentive to create sites not just with low-quality clickbait or A.I.-generated nonsense, but with outright plagiarized content.

I was curious how bad the problem was. So I did an experiment to see if I could make a site using stories from CNBC and get ad tech partners to agree to show ads on it. 

It was shockingly easy.

Setting up a website 

I'm by no means a coding whiz, but this part was straightforward.

I bought a domain through GoDaddy and set up a managed WordPress site. Then I set up an SSL certificate so I would have a secure website, which would prevent the site from triggering security warnings in browsers like Google's Chrome. I downloaded a theme that made my site look somewhat like a news website, made a favicon (the little image that shows up in Google search and in your browser tab) and gave the site a name: The "Tribune Times Today."


To populate my site with content, I first copied and pasted text from CNBC stories manually. Then I learned how to speed up the process with scrapers: simple software plug-ins that you can download on WordPress and that scrape stories using RSS feeds or individual links. A lot of fraudulent news sites will also scrape images from stories, but I avoided that for legal reasons. Instead, I stuck with stock images I was allowed to use on the site, or my own images from industry events I had saved on my phone.

I spent a couple hours on a Sunday afternoon tweaking the site, setting up fancy-looking widgets to show my "top stories" or a carousel display of stories and pulling stories until I had more than 50 posts.

Then I was ready to find some advertisers.

Finding advertisers 

Websites often work with ad tech partners to get ads placed on their site.

To start, publishers usually go through a fairly simple process of sharing their website URL, contact info and sometimes traffic figures or revenue. From there, the company will often give the publisher a piece of code, which the publisher sticks on their website. This lets the ad partner make sure the person trying to sell ads actually has access to the site, and isn't trying to sell ads on a site that isn't theirs.
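The ownership check described above can be pictured with a small sketch. The article doesn't say what form the code takes, so everything here is an assumption for illustration: the partner issues a token, the publisher pastes it into a meta tag, and the partner confirms the tag is present on the page. The tag name `ad-partner-verification` and the token values are hypothetical.

```python
from html.parser import HTMLParser


class _MetaTokenFinder(HTMLParser):
    """Collects the content of <meta> tags with a given name."""

    def __init__(self, meta_name):
        super().__init__()
        self.meta_name = meta_name
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if attr_map.get("name") == self.meta_name:
            self.tokens.append(attr_map.get("content") or "")


def site_has_token(html, expected_token, meta_name="ad-partner-verification"):
    """Return True if the page carries the token issued to the applicant."""
    finder = _MetaTokenFinder(meta_name)
    finder.feed(html)
    return expected_token in finder.tokens


# Hypothetical page served by the applicant's site after pasting the snippet.
sample_page = """
<html><head>
  <title>Tribune Times Today</title>
  <meta name="ad-partner-verification" content="abc123">
</head><body>...</body></html>
"""

print(site_has_token(sample_page, "abc123"))  # True
print(site_has_token(sample_page, "zzz999"))  # False
```

A check of this kind only shows that the applicant controls the site. It does not look at where the site's content came from.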

I applied to nearly three dozen of these companies, and some approved me right away. These firms mostly sold "popunder" ads, which pop up a new link in a browser tab when you click something. They're one of the worst forms of online advertising, not to mention annoying and intrusive for the user. 

Others seemed eager to work with me but wanted to see how much traffic I had, or said I didn't have enough traffic or existing revenue to meet their thresholds. Some said I didn't meet their requirements for content. Conversant, for instance, didn't approve me because I applied using my Gmail address and because I didn't have enough traffic.

Ad tech partners Media.net and Infolinks took a bit longer to approve me, but they both did.

My denial email from Sovrn.
Megan Graham

One firm, Sovrn, initially declined the site because it didn't meet its standard for original content. But within 24 hours they sent another email saying I was approved.

My approval email from Sovrn.
Megan Graham

Google took days to give me an answer, but eventually said that since I had "scraped content" on my site, I wasn't eligible for AdSense.

I asked the three companies that approved me how they vet sites. 

Sovrn said it is "the first, and remains one of the few exchanges to achieve a TAG Platinum certification," and said its site approval process is "stricter than most." The company said each site that applies to its platform goes through a four-step review process involving "proprietary checks, third-party tools such as IAS and buyer-level settings and filters."

Despite that, the Tribune Times Today — populated entirely with "stolen" news articles — got through those steps. 

"Even with what we believe is the strongest site approval process in the industry, it is still possible that some bad actors can slip through," Sovrn acknowledged. "That's why we continuously monitor our exchange, and perform weekly audits—and removals—of sites that violate our controls."

Infolinks CEO Bob Regular said a domain goes through human review to ensure some basic criteria, like making sure it's not violent, pornographic, dangerous or pertaining to other explicit adult content. If it passes that level, there's an automated process that submits the site to other advertising companies to see if they want to advertise on my site, and it's up to them to approve it one by one. He said the company also submits each publisher to third-party fraud providers for review. 

Media.net said its compliance team immediately assesses sites for "clear terms of services violations" like pornography, hate speech or violence, and that such sites are instantly blocked from its network. Sites without those violations can "go live on a provisional basis." That's when the company typically discovers less obvious violations, including copyright infringement, and flags bad actors. It said this typically occurs within 30 to 60 days.

The company said it doesn't immediately ban bad actors because it found that they simply try to get around it by submitting a ton of slightly different sites that also violate Media.net's terms of service. By letting sites slip by at first, then banning them before they get a payout, Media.net "disincentivizes bad actors from reattempting to join our network."

But these three media partners aren't the end of it. They work with other partners as "resellers." 

By looking at some technical information the partners sent me to add to my "ads.txt" file, I saw I was authorizing the ad space on Tribune Times Today to be sold not just by the three ad tech companies that approved me but also by their partners, such as AppNexus, GumGum, OpenX, Rubicon Project and Google. That doesn't mean they had approved the site; they would have had to approve the domain based on their own criteria, and I didn't run the experiment long enough to see if they would do so.
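To make the ads.txt mechanics concrete, here is a minimal sketch of how such a file can be read. Each record in an ads.txt file is a comma-separated line: the seller's domain, the publisher's account ID with that seller, the relationship (DIRECT or RESELLER) and an optional certification authority ID. The domains and IDs below are hypothetical placeholders, not entries from the Tribune Times Today file.

```python
from collections import defaultdict


def parse_ads_txt(text):
    """Group authorized seller domains by relationship (DIRECT or RESELLER)."""
    sellers = defaultdict(set)
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or "=" in line.split(",", 1)[0]:
            continue  # skip blank lines and variable lines such as contact=...
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # not a complete record
        domain, _account_id, relationship = fields[:3]
        sellers[relationship.upper()].add(domain.lower())
    return dict(sellers)


# Hypothetical file contents, for illustration only.
sample = """\
# ads.txt for a hypothetical site
partner-one.example, 1001, DIRECT
partner-two.example, 2002, DIRECT
big-exchange.example, 3003, RESELLER, abc123
other-exchange.example, 4004, reseller
contact=ads@site.example
"""

result = parse_ads_txt(sample)
print(sorted(result["DIRECT"]))
print(sorted(result["RESELLER"]))
```

A listing of this kind is the publisher's declaration of who may sell its ad space. On its own it does not show that a listed reseller has approved the site.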

Rubicon Project, for instance, said once a partner had approved me, that partner would send domains to Rubicon, which would then take a number of steps, including looking at industry associations like TAG to see if there had been reported plagiarism on the site, working with anti-fraud partners to make sure it's not fraudulent or spot-checking inventory itself.

Google said that just because a particular exchange works with Google in general does not mean it will send ad requests for every single publisher that is on its platform as a reseller; it also said that just because Google is listed on ads.txt doesn't mean it's monetizing a certain site. (Google said it had no evidence of any ads placed via its platforms on the website I created, including through Ad Manager, and I didn't see any Google ads when I briefly switched on advertising.)

"We have strict policies that prohibit bad actors from monetizing content that is stolen from other sites," a Google spokeswoman said. "Our ad tech partners must abide by these policies as well. Our enforcement systems and teams work to detect and block these illicit web pages before they can sell ad space. If we find a site or partner violates our policies, we take immediate action."

But I'd slipped through the cracks once, and I wondered which cracks I'd slip through again when it came to the resellers verifying the "Tribune Times Today" domain. 

The Tribune Times Today's ads.txt file.
Megan Graham

I didn't want to be taking ad revenue from legitimate advertisers, so I only briefly activated advertisements from the partners to see what surfaced and to take a few screenshots. I saw ads come through for companies including Kohl's, Wayfair, Overstock and Chewy.

In a statement, Overstock said that as an advertiser it is negatively impacted by this fraud and does "everything in [its] power to prevent it."  

"To combat these kinds of fraudulent efforts, we partner with reputable ad-tech providers and are constantly auditing our ad placements to ensure they are appearing on legitimate sites," Overstock said. "However, even with those precautions, a fraudster occasionally slips through the cracks. In the rare event that this happens, we work with our partners to swiftly investigate and resolve the incident."

The other companies declined to comment or didn't respond to a request for comment. 

Chewy ad on "Tribune Times Today."
Megan Graham

If I were a bad guy...

I only put a few hours of work into this site, but I don't do this for a living. 

Real bad actors can get a lot further than this with only a little more work. For instance, they can set up a site with actual original content, get approved, and only then start scraping content. Or, they can easily buy an existing website that's already monetizing with ad tech partners, and just flood it with plagiarized content. They can buy fake traffic to conduct traffic arbitrage, a fancy way of saying that they pay less for traffic than they gain from the ad impressions. They can set up more automated means of scraping huge amounts of content to keep the website looking fresh.
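The arithmetic behind traffic arbitrage can be sketched in a few lines. Every number below is made up for illustration; the reporting here gives no actual traffic prices or ad rates. The function name and parameters are likewise hypothetical.

```python
def arbitrage_profit(visits, cost_per_thousand_visits, ads_per_visit, revenue_per_thousand_ads):
    """Return (traffic cost, ad revenue, profit) for a batch of bought visits."""
    traffic_cost = visits / 1000 * cost_per_thousand_visits
    impressions = visits * ads_per_visit
    ad_revenue = impressions / 1000 * revenue_per_thousand_ads
    return traffic_cost, ad_revenue, ad_revenue - traffic_cost


# Hypothetical inputs: 100,000 bought visits at $1.00 per thousand,
# 4 ads shown per visit, paying $0.50 per thousand impressions.
cost, revenue, profit = arbitrage_profit(
    visits=100_000,
    cost_per_thousand_visits=1.00,
    ads_per_visit=4,
    revenue_per_thousand_ads=0.50,
)
print(f"cost=${cost:.2f} revenue=${revenue:.2f} profit=${profit:.2f}")
```

With these assumed inputs, the traffic costs $100 and the ads bring in $200. The scheme pays as long as the revenue per visit stays above the price per visit.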

Joshua Lowcock, who's global brand safety officer at UM, a media agency that's part of Interpublic, said he's run a similar experiment and found that a number of ad tech partners were similarly lax about their approvals.             

Like me, he didn't make too much of an effort to appear super sophisticated. 

"We weren't acting like a motivated bad actor," he said. "If anyone had done basic due diligence, we would have been caught out."

He added that sites can act as legitimate news publishers for months, gain social media followers and then start publishing completely fake stories.

Andreas Ramos, who teaches digital marketing at INSEEC and California Science and Technology University, says the size of fraud like this is "staggering."

He said some scammers set up tens of thousands of websites at once with a few keystrokes. 

"It's a money machine," he said.

It's easy to find examples. One afternoon, I spent a few minutes trying to find other sites that had copied CNBC stories in full without credit. In a matter of minutes I found "The Washington Time," "FR24News.com," "Bioreports.net," "AfricaZilla," and "USA News Hub." All of them were showing ads through various partners, including Google and Criteo. There are so many more of these sites that I don't have enough time in the day to report them, as much as I would like to. 

Criteo, which had also been showing ads on the "New York Times Post" (my very first example), said it had seen my tweets about the site, discovered the inventory had come through another platform and requested that those sites be added to a blacklist.

"We constantly monitor our supply network to prevent such infractions as the ones found by you. In the event we find a partner is not adhering to our policies, we will terminate the relationship immediately," said a company spokesperson.

As of Thursday, FR24News.com, USA News Hub and AfricaZilla were each showing Google ads on stolen content from CNBC and other publications, and each listed Google as a direct seller on their ads.txt files.

Google said Friday it had demonetized USA News Hub and said it's investigating the other two sites.

Google recently announced it would be requiring all advertisers to go through an identity verification process to ensure they are who they say they are. Some argue they should be doing the same for publishers.

"There should be that same requirement on the publisher side, a proof of identity and demonstration that you're legitimate," Lowcock said. "And then up and down the digital supply chain, people should do spot checks to make sure that work is being done." 

Google ads appear on a FR24News story copied from CNBC.
Megan Graham

Bob Hoffman, a former advertising executive who has written numerous books about the industry, said the advertising ecosystem has never been 100% pure, but what's being seen now is a new level. 

"The extent to which it's happening now is way beyond anything I think we've ever seen before, where tens of billions of dollars are being stolen from advertisers," he said. "If you're a crook, this is like Christmas Day. And there are no consequences... If somebody finds you out, so what? You put up another phony site, or you put up a thousand other phony sites. The so-called ad tech fraud detection systems seem to be extremely ineffective." 

He said one solution is for advertisers to buy directly from publishers.

"So much of the fraud would disappear," he said. "All the middlemen would evaporate. Yes, you'd pay a little more, but you'd know what you're getting, if you bought directly from quality publishers." 
