Armies of bots are still gathering social media info even as platforms tighten privacy rules

Key Points
  • To get around Instagram's limits on developer data collection, marketing and media companies are using automated "bots" to collect public profile data and build their own databases of social media users.
  • The method, which is completely legal, shows that even if platforms block data collection, it will be difficult to prevent access to user information.

Facebook-owned Instagram is limiting the amount of data that developers can pull on users, but some businesses have found a simple workaround: Get a bot army to "read" public social media data just as any human would do.

Facebook and other social media companies are cracking down on how data can be used and shared after revelations that data analytics firm Cambridge Analytica used improperly acquired information about Facebook users to target political ads, and ahead of a new European law called the General Data Protection Regulation.

So some marketing and media companies are increasingly relying on automated programs, known as bots or spiders, to get information they want about users without explicit permission. They don't need to consult the users, who posted the info publicly, or the platforms where the information is posted.

It's easy. It's perfectly legal. And although the platforms try to discourage it, it's hard to stop.

The information collected by these bots is often not as detailed or as useful as the information that could be obtained directly from the platforms. Companies typically use it for specific and limited purposes, such as finding new talent to work with or tracking the effectiveness of advertising campaigns. But with the right analysis and tools, it could be used for targeting ads.

Bypassing Instagram's crackdown

Instagram recently lowered the rate limit for its Platform API, meaning developers can no longer pull as much user information as quickly as they used to.

Companies could use this API to gather the same basic stats anybody could read from an Instagram profile, such as comments, likes, followers and who is tagged in a photo, but at a much faster rate. They could also get information on when and where the photos were taken. The API also let accounts connect to third-party services, which allowed businesses to do things such as streaming Instagram posts at events or printing Instagram photos on products.

Things changed around the beginning of April. According to Recode, developers could previously make 5,000 calls per hour, but the limit dropped to 200 on April 2. Some developers told Recode they couldn't access data at all. An advertising executive also told CNBC of the changes, and Instagram confirmed to CNBC that it had lowered rate limits on its older API platform as part of a switch to a new one.
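For a sense of scale, a drop from 5,000 to 200 calls per hour is a 96 percent cut: a client spreading its calls evenly goes from roughly one call every 0.72 seconds to one every 18 seconds. The Python sketch below shows one common way an API can enforce such a cap, a sliding-window limiter. It is a generic illustration only; Instagram has not said how it implements its limits, and the class and function names here are hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical limiter: allow at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit: int, window: float = 3600.0) -> None:
        self.limit = limit
        self.window = window
        self._calls: deque = deque()  # timestamps of calls that were accepted

    def allow(self, now: float) -> bool:
        # Discard accepted calls that have aged out of the rolling window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) < self.limit:
            self._calls.append(now)
            return True
        return False  # over the cap: the caller must wait


def accepted_in_burst(limit: int, attempts: int) -> int:
    """Count how many of `attempts` simultaneous calls get through under `limit`."""
    limiter = SlidingWindowLimiter(limit)
    return sum(limiter.allow(0.0) for _ in range(attempts))


if __name__ == "__main__":
    for limit in (5000, 200):
        print(limit, accepted_in_burst(limit, 1000))
```

Under the old 5,000-call cap, a burst of 1,000 requests goes through in full; under a 200-call cap, only the first 200 do, and the rest have to wait until earlier calls age out of the hour-long window.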

In response, several companies said they have increased their reliance on bots to track the effectiveness of branded content campaigns and to find other business opportunities.

One media company now uses a browser extension it created and installed on its employees' computers to identify rising social media stars it might want to work with and to track their metrics. The extension powers a bot-led search through social media profiles, then records likes, comments, followers and other publicly available information in an internal database. The company has used this tactic on Instagram and YouTube.
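The storage side of such a setup can be sketched with Python's built-in sqlite3 module. This is a hypothetical illustration, not the company's system: the table layout, the handles and the follower-growth ranking used to spot "rising stars" are all assumptions, and the collection step performed by the extension is left out.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create a hypothetical in-memory table of public profile snapshots."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE snapshots (
               platform TEXT, handle TEXT, captured_at TEXT,
               followers INTEGER, likes INTEGER, comments INTEGER)"""
    )
    return conn


def record(conn, platform, handle, captured_at, followers, likes, comments):
    """Store one snapshot of publicly visible counts (captured_at as ISO date)."""
    conn.execute(
        "INSERT INTO snapshots VALUES (?, ?, ?, ?, ?, ?)",
        (platform, handle, captured_at, followers, likes, comments),
    )


def follower_growth(conn, platform):
    """Rank handles by (latest - earliest) follower count, fastest-growing first."""
    first_last = {}
    rows = conn.execute(
        "SELECT handle, followers FROM snapshots WHERE platform = ? ORDER BY captured_at",
        (platform,),
    )
    for handle, followers in rows:
        first, _ = first_last.get(handle, (followers, followers))
        first_last[handle] = (first, followers)
    growth = [(h, last - first) for h, (first, last) in first_last.items()]
    return sorted(growth, key=lambda item: item[1], reverse=True)
```

Ranking by absolute follower growth is only one possible signal. A smaller account that gains 500 followers in a week can outrank a large one that gains 100, and that is the kind of "rising star" such a tool would be looking for.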

One marketing firm said it uses bots to skim social media platforms for public information to identify new trends and influencers, and also works with third-party companies that use the same tactics. It uses this information alongside data from users it works with who give the firm direct permission to mine their data.

Another marketing firm explained that the new Graph API will only share data from official Instagram business accounts, and 35 percent of the accounts it works with don't have that designation yet. Until they all do, it is using bots to record publicly available data on the influencers it's working with.

Easy and legal

"It's not like this is some big secret that you can scrape websites," said Mark Douglas, CEO of ad tech firm SteelHouse. The code needed to create something like this is so simple "a college student could do it," he said, noting that his company does not scrape data itself.

It's also perfectly legal. A federal judge in San Francisco ruled in August that hiQ Labs, which uses bots to collect public data from LinkedIn profiles to help companies find "skill gaps and turnover risks," was within its rights to do so.

The practice of using bots to collect massive amounts of information is common, said Rich Kahn, CEO of online marketing and ad-fraud detection firm eZanga.


Discount airline travel sites create their own bots to scan the internet for cheap tickets and then show those deals on an easy-to-navigate platform, Kahn said.

Even Google employs a similar method to crawl the internet and help it index sites for its search engine, Douglas pointed out. If bots were shut off, Google wouldn't be able to provide the quick and expansive results it does today.
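At its core, a crawler of this kind is a breadth-first walk over links: fetch a page, pull out its links, and queue any it hasn't seen before. The minimal Python sketch below assumes a pluggable `fetch` function and a tiny hypothetical in-memory set of pages so it runs offline; Google's actual crawler is far more elaborate and is not described in the article.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Set


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags on one page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start: str, fetch: Callable[[str], str], max_pages: int = 100) -> List[str]:
    """Breadth-first crawl from `start`; returns URLs in the order visited."""
    seen: Set[str] = {start}
    queue = deque([start])
    order: List[str] = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        parser = LinkExtractor()
        parser.feed(fetch(url))
        for link in parser.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES: Dict[str, str] = {
    "/": '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/c">C</a>',
    "/b": '<a href="/">home</a> <a href="/c">C</a>',
    "/c": "no links here",
}

if __name__ == "__main__":
    print(crawl("/", lambda url: PAGES.get(url, "")))
```

A production crawler would also resolve relative links (for example with `urllib.parse.urljoin`), check each site's robots.txt (Python ships `urllib.robotparser` for this) and pace its requests.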

Social platforms can easily detect large amounts of bot traffic, Douglas said. If they wanted to, they could block a company's extension from searching their sites on the grounds that it violates their terms of service.
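One simple way a platform might spot heavy automated traffic is to count requests per client over a rolling time window and flag anything far above human browsing rates. The sketch below is a hypothetical heuristic, not a description of any platform's actual defenses; the thresholds and client IDs are made up for illustration.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def flag_heavy_clients(
    requests: Iterable[Tuple[float, str]], max_requests: int, window: float
) -> Set[str]:
    """Flag clients that exceed `max_requests` within any `window`-second span.

    `requests` must be (timestamp_seconds, client_id) pairs sorted by time.
    """
    recent: Dict[str, deque] = defaultdict(deque)
    flagged: Set[str] = set()
    for ts, client in requests:
        q = recent[client]
        q.append(ts)
        # Forget this client's requests that fall outside the window.
        while ts - q[0] >= window:
            q.popleft()
        if len(q) > max_requests:
            flagged.add(client)
    return flagged


if __name__ == "__main__":
    human = [(i * 60.0, "10.0.0.1") for i in range(5)]  # one page a minute
    bot = [(i * 0.1, "10.0.0.2") for i in range(100)]   # ten pages a second
    print(flag_heavy_clients(sorted(human + bot), max_requests=30, window=60.0))
```

Because this heuristic counts requests per identifier, traffic spread across many IP addresses would look lighter to it. That is why rotating IP addresses makes detection harder.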

Instagram told CNBC it would take action against companies that use bots on its platform.

"We do not allow bots on the platform," an Instagram spokesperson said. "Instagram is committed to keeping activity on Instagram authentic. We work to detect and remove spam, and identify and eliminate fake accounts. In addition to technical measures, we pursue legal enforcement against services that violate our terms of use."

YouTube also said it bans the practice.

"YouTube Terms of Service and YouTube Developer Policies prohibit scraping of YouTube," a YouTube spokesperson told CNBC. "Once notified of an infringing tool or service we take appropriate action."

But an executive at a marketing firm that does not use the bot tactic, because it requires a lot of maintenance to keep up with constant policy changes, noted that there are ways around the platforms' policing.

For instance, using browser extensions lets companies rotate the IP addresses that identify computers online, making it harder to detect fake traffic.

Douglas also said that networks have typically been lax about cracking down on this tactic because there are benefits to keeping content widely searchable, including making it more visible on third-party sites, which can help it go viral. It can also create more business opportunities for a platform's users, which in turn can make them more loyal to the company.

The only safeguard against these programs is to make a profile private, so that its contents are not publicly visible.

"Some people don't want to be found on every network they've been on, and they have a right to that," Kahn said. "But the average person, I would think they like the ability to have their profile name and location and a few basic pieces of information. It's a good thing because it does give them the ability to be found."