Social Media Aggregators & Web Scrapers
Your public posts aren't as public as you think
The Social Data Industry
A growing ecosystem of companies scrapes, indexes, and analyzes public social media posts, forum discussions, blog comments, and other online content at massive scale. Some serve corporate marketing teams. Others sell directly to law enforcement and intelligence agencies. The line between “social listening” and surveillance is increasingly blurred.
These companies argue that publicly available data is fair game. But when billions of profiles are aggregated into searchable databases and sold for employment screening, government surveillance, or political intelligence, the practical effect is mass surveillance of online speech.
Social Media Intelligence Platforms
Brandwatch (Cision)
What they are: One of the largest social media intelligence and analytics platforms in the world. Merged with Crimson Hexagon on October 4, 2018, combining two major social data aggregators into a roughly $100 million annual recurring revenue business.[1] In February 2021, Cision acquired Brandwatch for $450 million.[2]
What data they have: Crimson Hexagon’s data library contained over 1 trillion publicly available social media posts sourced from Twitter/X, Instagram, Facebook, Reddit, blogs, forums, and news sites.[1] The combined platform provides analysis of topics, trends, reactions, and the demographics, interests, and geography of social media audiences for over 2,000 brands and agencies.
Controversy: In 2013, Crimson Hexagon contracted with the Civil Society Development Fund, a Russian nonprofit with ties to the Kremlin, to monitor social media trends on LiveJournal, Twitter, and Russian social media platforms — the data was used to study Russian public opinion about President Putin’s government.[3] Crimson Hexagon was founded by Harvard professor Gary King, who also co-founded Social Science One, a Facebook data-sharing initiative. In July 2018, Facebook suspended Crimson Hexagon while investigating whether the company had violated its policies on surveillance, citing concerns about the Russian contracts and pitches to U.S. government agencies.[4]
Dataminr
What they are: AI-powered real-time event detection platform. One of the few companies with full access to the X (formerly Twitter) “firehose,” a direct, unfiltered feed of every public post the moment it is sent.[5] Valued at approximately $4.1 billion, with $1.24 billion in total funding raised and approximately $200 million in annual recurring revenue.[6]
Government contracts: Dataminr holds a five-year, $282 million contract with the U.S. Department of Defense.[6] Law enforcement clients include the NYPD, LAPD, Chicago Police Department, Louisiana State Police, and D.C. Metropolitan Police. In 2018, D.C. police purchased seven Dataminr licenses for $47,950; by 2020, the D.C. Office of the Chief Technology Officer purchased 50 licenses for $200,000.[7]
Protest surveillance: In July 2020, The Intercept revealed that Dataminr helped law enforcement monitor George Floyd protests in real time, tipping off police to social media posts with the latest whereabouts and actions of demonstrators. This happened even though Twitter’s terms of service explicitly barred developers from “tracking, alerting, or monitoring sensitive events (such as protests, rallies, or community organizing meetings).”[5] A subsequent investigation found that Dataminr’s anti-gang monitoring was performed by, as one of those involved put it, “white people, tasked with interpreting language from communities that we were not familiar with.” They were coached by predominantly white former law enforcement officials and worked with no institutional definition of a “potential gang member.”[8]
Sprinklr
What they are: Social media monitoring, analytics, and customer experience management platform. Went public on the NYSE (ticker CXM) in June 2021, raising $266 million at an implied valuation of $4 billion.[9] Annual revenue now exceeds $700 million.
Law enforcement use: The Brennan Center for Justice obtained documents through a two-year FOIA lawsuit, which yielded 700,000 pages of records and a $400,000 settlement of legal fees from D.C. The documents revealed that D.C. Metropolitan Police used Sprinklr during President Trump’s 2017 inauguration “to monitor key terms that could have an impact on” the district, including hashtags such as #DisruptJ20, #RefuseFascism, #ResistTrump, #Anticapitalist, and #Antifa.[7]
Kerala controversy: In April 2020, the Indian state of Kerala hired Sprinklr to analyze COVID-19 patient data. On April 24, 2020, the Kerala High Court intervened. It ordered the state to anonymize all sensitive data before sharing it with Sprinklr and to obtain informed consent, and it barred Sprinklr from advertising that it had access to patient information.[10]
Meltwater
What they are: Media monitoring, social media analytics, and media intelligence platform serving approximately 27,000 clients globally with revenue of roughly $462 million and about 2,600 employees across 69 offices on six continents.[11] Acquired DataSift (privacy-focused social data platform) in March 2018 and Sysomos (social analytics) in April 2018 to strengthen AI-driven analytics.[12]
What data they have: Accesses the full X (Twitter) Firehose, processing and storing approximately 500 million new posts every day.[13] Retains social media data for 15 months and news content searchable back to 2009.
Copyright ruling: In March 2013, Judge Denise Cote of the Southern District of New York ruled in a 90-page opinion that Meltwater’s scraping and redistribution of Associated Press news content constituted copyright infringement, rejecting Meltwater’s fair use defense. The court held that Meltwater was not a search engine but rather an “expensive subscription service” that provided subscribers with significant portions of AP articles and “consciously markets itself as a substitute for news sites operated or licensed by AP.”[14] The parties settled in July 2013 and dismissed all claims.
Surveillance & Government Social Media Tools
Babel Street (Locate X)
What they are: Government contractor based in Reston, Virginia. Sells Babel X (social media analysis across 200+ languages) and Locate X (precise smartphone location tracking using Mobile Advertising IDs).
Who buys it: CBP (approximately $4.8 million in contracts — $981,000 in 2017 and $3.8 million in 2021),[15] IRS ($150,000), Treasury Department’s OFAC ($155,000),[16] Secret Service, Defense Information Systems Agency, U.S. Coast Guard, Navy, Air Force, Army, and Special Operations Command.
Scandals: In October 2024, an investigator working with Atlas Privacy, a data removal company, gained access to a free Locate X trial by simply asserting they were a private investigator who “may work with the government in the future” — no verification was required.[17] Atlas demonstrated that Locate X could track a person traveling from their home in Alabama to an abortion clinic in Tallahassee, Florida, and back, and identified more than 700 unique devices at one north Florida abortion clinic over three years using the platform.[18] The Secret Service internally confirmed that “a warrant isn’t needed” to use the purchased location data.[19] Atlas Privacy filed a lawsuit against Babel Street on November 19, 2024, in the New Jersey District Court for alleged violations of New Jersey data privacy law.[20]
ShadowDragon
What they are: Wyoming-based OSINT (open-source intelligence) company whose software allows police to ingest data from social media and other internet sources to identify persons of interest and map their networks. Its SocialNet product collects data from over 200 sources, including Amazon, dating apps (Tinder, FetLife), gaming platforms (Fortnite), parenting sites (BabyCenter), Telegram, and PornHub.[21]
Who buys it: U.S. Immigration and Customs Enforcement (ICE) purchased ShadowDragon at least twice, and the Massachusetts State Police and New York State Police acquired it as well.[21]
Concerns: The Mozilla Foundation called on major companies — including Amazon, Apple, Discord, Facebook, Google, Nextdoor, OnlyFans, and YouTube — to block ShadowDragon’s scraper; none have apparently done so.[22] Civil liberties organizations have warned that such tools disproportionately impact Black and marginalized communities and people seeking abortion access.[21]
Banjo Inc.
What they were: AI-powered real-time surveillance technology that aggregated data from video surveillance cameras, social media, and other public sources. Held a five-year, $20.7 million contract with the Utah Department of Public Safety and a $750,000 contract with the Utah Attorney General.[23]
What happened: In April 2020, reporting revealed that founder/CEO Damien Patton had past connections with the Ku Klux Klan and participated in a drive-by shooting of a Nashville synagogue on June 9, 1990, at age 17.[24] Utah Attorney General Sean Reyes immediately suspended the contract. Patton resigned as CEO on May 8, 2020.[25]
The audit: A state audit completed in April 2021 found that Banjo was never capable of performing the services it had promised. The technology for which Utah had contracted to pay $20.7 million “does not use techniques that meet the industry definition of artificial intelligence.”[23] The company’s LLC subsequently expired and is no longer active, according to Utah Division of Corporations records. A spokesman for the Utah Attorney General stated that the company “formerly known as Banjo no longer exists.”[26]
Social Media Employment Screening
Ferretly
What they are: AI-powered social media background screening for employment decisions. Uses visual AI to identify nudity, drugs, extremist symbols, violent imagery, memes, and gestures across 50+ social media platforms. Uses avatar recognition, name correlation, and a billion-profile OSINT database to locate individuals even when they mask their identity.[27]
The problem: A business model built on AI judgments of candidates’ social media raises fundamental concerns about bias, misread context, and chilling effects on free speech. Ferretly claims FCRA, GDPR, and EEOC compliance. It also offers “Continuous Screening,” which monitors employees’ public social media posts indefinitely for “potential risks.” This extends employer surveillance beyond the hiring decision.[27]
Fama Technologies
What they are: AI-powered online screening for employment decisions. Searches 10,000+ online public sources to identify workplace misconduct risks. In April 2023, Fama acquired Social Intelligence Corp, combining AI screening with FCRA-trained human analysts to create what it calls the “most comprehensive online screening solution.”[28]
Oversight: Listed by the Consumer Financial Protection Bureau (CFPB) as a Consumer Reporting Agency, placing it under federal oversight.[29] The combination of AI screening and human analysis for employment decisions raises ongoing questions about accuracy, bias, and whether employers should judge candidates based on their social media activity at all.
hireEZ (formerly Hiretual)
What they are: AI-powered recruiting platform that scrapes publicly available professional data to build candidate profiles. Sources data from LinkedIn, GitHub, and other public platforms. Claims access to 750+ million professional profiles from over 45 open web platforms.[30]
Complaints: Multiple users on review platforms have reported their personal data was scraped without permission and made available to recruiters. People received unsolicited emails stating hireEZ had “harvested” their data and would include it in their database — with one reviewer characterizing the notification as “Hi, we are going to sell your personal data to recruiters.”[31] Despite complaints, hireEZ claims SOC 2 Type II, ISO 27001/27701, GDPR, and CCPA compliance.
Web Scraping Infrastructure
Bright Data (formerly Luminati Networks)
What they are: Founded in 2014 in Netanya, Israel, Bright Data operates what it bills as the world’s largest proxy network. Its advertised figures have ranged from over 72 million IP addresses to, more recently, more than 150 million residential proxy IPs across 195 countries. The company processes over 2 billion web requests daily, serving over 20,000 enterprises in AI, eCommerce, finance, and market research.[32]
Origin: Bright Data originated from Hola VPN’s peer-to-peer network, which routed traffic through users’ devices — meaning consumers who installed the free Hola VPN unknowingly donated their bandwidth and IP addresses to Luminati’s commercial proxy network. EMK Capital acquired a majority stake in 2017 at an enterprise value of approximately $200 million.[32]
Meta lawsuit: In January 2023, Meta Platforms sued Bright Data for scraping data from Facebook and Instagram, alleging breach of contract and violations of the Computer Fraud and Abuse Act. In January 2024, U.S. District Judge Edward Chen ruled in Bright Data’s favor. He found that Meta’s terms of service do not apply to scraping of public data while logged out of an account: “The Facebook and Instagram Terms do not bar logged-off scraping of public data.”[33] Meta dismissed its remaining claims in February 2024 without the right to appeal.[34] That left in place a district court ruling that is significant for the web scraping industry, though not binding on other courts.
