Data Brokers and WiFi Data Collection

What Your Phone Broadcasts

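Every smartphone with WiFi enabled is a small radio transmitter. Even when not connected to a network, your phone sends out probe requests - short radio frames that ask which networks are nearby. Some are broadcast probes addressed to any network; others are directed probes that name a specific saved network ("Is [network name] nearby?"), a behavior common on older devices and still used for hidden networks. Each probe request carries a source MAC address, a 48-bit identifier that, when it is the hardware address assigned at manufacture, is as distinctive as a serial number.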
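To make the frame layout concrete, here is a minimal sketch of pulling the source MAC and requested network name out of a raw probe request. It assumes the capture has already had any radiotap header removed and starts at the 802.11 frame control field; the function name and the sample frame are hypothetical.

```python
import struct

def parse_probe_request(frame: bytes) -> dict:
    """Extract the source MAC and requested SSID from a raw 802.11 probe request."""
    # Frame control is 2 bytes, little-endian: type is bits 2-3, subtype is bits 4-7.
    frame_control = struct.unpack_from("<H", frame, 0)[0]
    frame_type = (frame_control >> 2) & 0b11
    subtype = (frame_control >> 4) & 0b1111
    if frame_type != 0 or subtype != 4:  # management frame, probe request
        raise ValueError("not a probe request")

    # Address 2 (bytes 10-15) is the transmitter, i.e. the phone.
    source_mac = ":".join(f"{b:02x}" for b in frame[10:16])

    # Tagged parameters start right after the 24-byte management header.
    ssid = None
    offset = 24
    while offset + 2 <= len(frame):
        tag_id, tag_len = frame[offset], frame[offset + 1]
        value = frame[offset + 2 : offset + 2 + tag_len]
        if tag_id == 0:  # SSID element; an empty value is a broadcast probe
            ssid = value.decode("utf-8", errors="replace")
            break
        offset += 2 + tag_len
    return {"source_mac": source_mac, "ssid": ssid}

# Hand-built sample frame (hypothetical values).
_ssid = b"ExampleCafe"
sample_frame = (
    bytes.fromhex("4000")            # frame control: management / probe request
    + bytes.fromhex("0000")          # duration
    + bytes.fromhex("ffffffffffff")  # address 1: broadcast destination
    + bytes.fromhex("daa119000001")  # address 2: source (the phone)
    + bytes.fromhex("ffffffffffff")  # address 3: BSSID (wildcard)
    + bytes.fromhex("0000")          # sequence control
    + bytes([0, len(_ssid)]) + _ssid            # SSID element
    + bytes([1, 4, 0x02, 0x04, 0x0B, 0x16])     # supported rates element
)

print(parse_probe_request(sample_frame))
```

Anything within radio range that can capture the frame can run this kind of parsing; no association with the phone is needed.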

Modern operating systems randomize the MAC address used in probe requests, and since iOS 14 and Android 10 they also default to a randomized, per-network MAC address when connecting. This reduces but does not eliminate the tracking problem. The randomization is imperfect - research has shown that some implementations are predictable or leak the hardware address, and the per-network address usually stays the same each time the device rejoins that network, so a venue's own infrastructure can still recognize a returning device. Older devices, and devices where the feature has been turned off, connect with their real MAC address. Information elements within probe requests, such as supported data rates and capability information, can also be used to fingerprint devices even when the MAC is randomized.
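Both ideas can be sketched in a few lines. Randomized addresses set the "locally administered" bit (the second-least-significant bit of the first octet), so an observer can tell them apart from manufacturer-assigned ones. The fingerprint below simply hashes every information element except the SSID; that scheme and the function names are illustrative assumptions, much cruder than the techniques in published research.

```python
import hashlib

def is_locally_administered(mac: str) -> bool:
    """True if the MAC has the locally administered bit set (typical of randomized MACs)."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0b10)

def ie_fingerprint(tagged_params: bytes) -> str:
    """Hash the non-SSID information elements of a probe request (simplified sketch)."""
    parts = []
    offset = 0
    while offset + 2 <= len(tagged_params):
        tag_id, tag_len = tagged_params[offset], tagged_params[offset + 1]
        value = tagged_params[offset + 2 : offset + 2 + tag_len]
        if tag_id != 0:  # skip the SSID element, which changes from probe to probe
            parts.append(bytes([tag_id]) + value)
        offset += 2 + tag_len
    return hashlib.sha256(b"|".join(parts)).hexdigest()[:16]
```

Two probes sent from different randomized MACs but carrying the same supported rates and capabilities would hash to the same value here, which is the core of why randomizing the address alone is not enough.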

Beyond probe requests, connected devices transmit data that includes their MAC address, the access point's MAC address (BSSID), signal strength, and data volume. For an observer with access to the WiFi infrastructure, this creates a detailed record of when specific devices were present, how long they stayed, and approximately where they were within the coverage area.

How WiFi Data Gets Collected

WiFi data collection happens through several channels, some visible and some invisible to the end user.

Passive sniffing infrastructure. Dedicated sensors deployed in shopping malls, airports, stadiums, and city streets listen for WiFi probe requests and connected device traffic. Companies like Euclid Analytics (acquired by WeWork in 2019), RetailNext, and Purple WiFi have built businesses around deploying these sensors and selling the aggregated data. A sensor does not need to provide WiFi service - it just needs to listen.

Free WiFi networks. Every "free" WiFi network at a coffee shop, hotel, or airport is a data collection point. The terms of service (which nobody reads) typically grant the operator permission to collect device identifiers, browsing data, and location information. Captive portal pages that require an email address or social media login before granting access create an explicit link between a device MAC address and a real identity.

Mobile apps with location permissions. On Android, apps with location permission can enumerate nearby access points and their signal strengths. On iOS, apps generally cannot scan for access points directly, but an app with precise location permission receives a position that the operating system derives partly from nearby WiFi. This data, combined with databases mapping access point BSSIDs to geographic locations (maintained by Google, Apple, and projects like WiGLE), enables indoor positioning without GPS. The app developer can then sell or share this location data.
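A rough sketch of how a scan turns into a position: look up each visible BSSID in a location database and take a centroid weighted by signal strength. Real positioning services use far more elaborate models; the weighted centroid, the function name, and the input shapes here are assumptions chosen for illustration.

```python
def estimate_position(scan: dict, ap_locations: dict):
    """Estimate (lat, lon) from a WiFi scan.

    scan:         {bssid: rssi_dbm} as reported by the device
    ap_locations: {bssid: (lat, lon)} from a BSSID location database (hypothetical)
    """
    total_weight = 0.0
    lat_sum = 0.0
    lon_sum = 0.0
    for bssid, rssi_dbm in scan.items():
        if bssid not in ap_locations:
            continue  # unknown access point, contributes nothing
        # Convert dBm to linear power so stronger (closer) APs dominate.
        weight = 10 ** (rssi_dbm / 10)
        lat, lon = ap_locations[bssid]
        lat_sum += weight * lat
        lon_sum += weight * lon
        total_weight += weight
    if total_weight == 0:
        return None  # no known access points in the scan
    return lat_sum / total_weight, lon_sum / total_weight
```

Note that nothing in this calculation needs GPS or even a connection to any of the access points; seeing their beacons is enough.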

graph TD
    subgraph Collection_Points["WiFi Data Collection Points"]
        A[Passive Sensors in Malls]
        B[Free WiFi Networks]
        C[Mobile App SDKs]
        D[ISP Infrastructure]
    end
    
    subgraph Raw_Data["Raw Data Collected"]
        E[MAC Addresses]
        F[Probe Requests / SSIDs]
        G[Signal Strength / Timing]
        H[Connected Session Data]
    end
    
    subgraph Processing["Data Processing"]
        I[Device Fingerprinting]
        J[Location Triangulation]
        K[Identity Resolution]
        L[Behavioral Profiling]
    end
    
    subgraph Buyers["Who Buys This"]
        M[Advertisers]
        N[Retailers]
        O[Hedge Funds]
        P[Government Agencies]
    end
    
    A --> E
    A --> F
    A --> G
    B --> E
    B --> H
    C --> F
    C --> G
    D --> H
    E --> I
    F --> J
    G --> J
    H --> K
    I --> L
    J --> L
    K --> L
    L --> M
    L --> N
    L --> O
    L --> P

The chain from WiFi signal collection to data broker sales involves multiple collection points, processing layers, and buyer categories

The Data Broker Supply Chain

The path from a WiFi signal to a marketed data product involves a supply chain of companies, each adding a layer of processing and enrichment.

At the bottom of the chain are collection companies that operate sensors or SDKs embedded in mobile apps. These companies gather raw device identifiers and location signals. They typically aggregate data from multiple sources to increase coverage and accuracy.

Middle-tier companies specialize in identity resolution - matching device identifiers to real people. A single person might have a phone MAC address, a laptop MAC address, a tablet MAC address, a mobile advertising ID (IDFA on iOS, GAID on Android), several browser cookies, and login credentials for dozens of services. Identity resolution companies like LiveRamp, Oracle Data Cloud, and Acxiom link these identifiers into unified profiles in two ways. Deterministic matching relies on a hard link, such as the same login used on several devices. Probabilistic matching relies on inference: devices that consistently appear at the same location at the same time are likely owned by the same person.
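Deterministic matching is essentially a graph problem: every record that shows two identifiers together is an edge, and each connected component becomes one profile. The sketch below uses a union-find structure for that; it is an illustration of the general technique, not any vendor's actual pipeline, and the identifier strings are made up.

```python
from collections import defaultdict

def resolve_identities(records):
    """Group identifiers into profiles.

    records: iterable of tuples of identifiers observed together
             (for example, everything tied to one login event).
    Returns a list of sets, one per resolved profile.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for identifiers in records:
        identifiers = list(identifiers)
        if not identifiers:
            continue
        find(identifiers[0])  # register singletons too
        for other in identifiers[1:]:
            union(identifiers[0], other)

    profiles = defaultdict(set)
    for identifier in parent:
        profiles[find(identifier)].add(identifier)
    return list(profiles.values())
```

A single shared identifier is enough to merge two clusters, which is why one login on a new device can attach that device's entire history to an existing profile.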

At the top of the chain are data brokers who package and sell enriched profiles. Companies like Babel Street, Gravy Analytics, and Near Intelligence sell location data products to advertisers, retailers, financial firms, and government agencies. In a lawsuit filed in 2022, the FTC alleged that one broker, Kochava, sold location data precise enough to track individual visits to sensitive locations including reproductive health clinics, addiction recovery centers, and places of worship.

From WiFi Probe to Location Profile

A single WiFi probe request tells an observer very little. But WiFi data collection does not work with single observations. It works at scale, across time, combining millions of data points into behavioral profiles.

When a sensor detects your phone's MAC address at a shopping mall at 2:15 PM on a Tuesday, that is one data point. When sensors at that mall detect the same MAC address every Tuesday afternoon for three months, that is a routine. When sensors at a nearby gym detect the same MAC on Monday and Wednesday mornings, and sensors at a specific office building detect it Monday through Friday from 9 to 6, the profile becomes remarkably detailed: this device's owner works at this office, exercises on these days, and shops weekly at this mall.
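The step from "data point" to "routine" is little more than counting. As a hedged sketch, the function below flags any (device, place, weekday) combination seen in at least a minimum number of distinct weeks. The threshold, the input format, and the function name are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def weekly_routines(sightings, min_weeks=3):
    """Find recurring weekly visits.

    sightings: iterable of (mac, place, datetime) tuples from sensors.
    Returns {mac: [(place, weekday_name), ...]} for patterns seen in
    at least `min_weeks` distinct calendar weeks.
    """
    weeks_seen = defaultdict(set)
    for mac, place, ts in sightings:
        iso_year, iso_week, _ = ts.isocalendar()
        key = (mac, place, WEEKDAYS[ts.weekday()])
        weeks_seen[key].add((iso_year, iso_week))

    routines = defaultdict(list)
    for (mac, place, weekday), weeks in weeks_seen.items():
        if len(weeks) >= min_weeks:
            routines[mac].append((place, weekday))
    return dict(routines)

# Hypothetical sightings: three Tuesday mall visits and one gym visit.
sightings = [
    ("aa:bb:cc:00:11:22", "mall", datetime(2024, 1, 2, 14, 15)),
    ("aa:bb:cc:00:11:22", "mall", datetime(2024, 1, 9, 14, 20)),
    ("aa:bb:cc:00:11:22", "mall", datetime(2024, 1, 16, 14, 5)),
    ("aa:bb:cc:00:11:22", "gym", datetime(2024, 1, 3, 7, 30)),
]
print(weekly_routines(sightings))
```

The single gym visit is ignored, while the repeated Tuesday mall visits are reported as a routine.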

Cross-referencing this location profile with data from mobile app SDKs, which can include your real name from your account information, creates a named profile tied to physical behavior patterns. This is not speculation about capabilities - this is the product that data brokers sell.

WiFi Data in Ad Targeting

The advertising industry is the largest commercial consumer of location data derived from WiFi and other sources. The mechanism works through what the industry calls "geofencing" and "geotargeting."

A retailer creates a virtual geographic boundary (geofence) around a competitor's store. When a device is observed within that boundary (through GPS or WiFi-derived location reported by mobile apps, or through WiFi sensors whose sightings have been linked to an advertising ID), its mobile advertising ID is added to a targeting list. The retailer can then serve ads to that specific person across the web, in apps, and on connected TV, with messaging designed to pull them away from the competitor.
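In its simplest form, a circular geofence is a distance check. The sketch below uses the haversine formula to decide which advertising IDs fall inside a given radius; the coordinates, radius, and ID strings are made-up examples, and production systems typically use polygons and spatial indexes rather than a linear scan.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def build_audience(observations, fence_center, radius_m):
    """Return the set of ad IDs observed inside a circular geofence.

    observations: iterable of (ad_id, lat, lon)
    fence_center: (lat, lon) of the fenced location
    """
    center_lat, center_lon = fence_center
    return {
        ad_id
        for ad_id, lat, lon in observations
        if haversine_m(lat, lon, center_lat, center_lon) <= radius_m
    }
```

The resulting set of IDs is the "targeting list" described above; everything after that point happens in ad-buying platforms rather than at the physical location.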

This extends beyond retail. Political campaigns use location data to target people who attended specific events, rallies, or protests. Insurance companies have explored using location data to assess risk (frequent visits to fast food restaurants, bars, or hospitals). Financial firms use foot traffic data to predict quarterly earnings for publicly traded retailers before the earnings are announced.

The precision is often overstated in marketing materials, but the aggregate capability is real. A 2023 report from The Markup found that the data broker Near Intelligence sold location data to the U.S. Department of Defense, with data derived in part from advertising SDKs embedded in popular prayer and weather apps.

Retail Analytics and Foot Traffic

Brick-and-mortar retailers use WiFi analytics to compete with the detailed user tracking that e-commerce platforms take for granted. Online stores see every page view, click, and cart addition. Physical stores have historically been blind to customer behavior beyond the point of sale.

WiFi analytics platforms close this gap. Sensors deployed throughout a store detect devices and estimate their position through signal strength trilateration or time-of-arrival measurements. This produces heat maps showing which areas of the store attract the most foot traffic, dwell time measurements for specific departments or displays, path analysis showing how customers move through the store, and visit frequency and loyalty metrics.
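Signal-strength positioning starts by converting a received signal strength (RSSI) into an approximate distance. A common textbook approach is the log-distance path loss model, sketched below. The reference power at 1 meter and the path loss exponent are assumed calibration values that vary widely between buildings, which is one reason indoor position estimates are often only accurate to a few meters.

```python
def rssi_to_distance_m(rssi_dbm: float,
                       rssi_at_1m_dbm: float = -40.0,
                       path_loss_exponent: float = 2.0) -> float:
    """Estimate distance from RSSI with the log-distance path loss model.

    rssi_at_1m_dbm:     assumed signal strength measured 1 m from the device
    path_loss_exponent: about 2 in free space, typically higher indoors
    """
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))
```

With distance estimates from three or more sensors at known positions, the device's location can be solved for, which is the trilateration step.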

Some platforms also offer customer segmentation by analyzing device movement patterns. A device that enters through the main entrance, moves directly to electronics, spends 12 minutes there, and exits without visiting other departments represents a different customer profile than one that browses clothing for 45 minutes.
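Dwell time, such as the 12 minutes in electronics in the example above, can be derived from nothing more than the timestamps at which a device was sighted in a zone. This is a minimal sketch assuming sightings are epoch seconds and that a gap longer than five minutes means the device left; both the threshold and the function name are hypothetical.

```python
def dwell_minutes(timestamps, max_gap_s=300):
    """Split one device's sightings in one zone into visits.

    timestamps: epoch seconds at which the device was sighted
    max_gap_s:  a silence longer than this ends the current visit
    Returns a list of visit durations in minutes.
    """
    visits = []
    start = previous = None
    for t in sorted(timestamps):
        if start is None:
            start = previous = t
        elif t - previous > max_gap_s:
            visits.append((previous - start) / 60)  # close the finished visit
            start = previous = t
        else:
            previous = t
    if start is not None:
        visits.append((previous - start) / 60)
    return visits
```

A device seen once a minute for twelve minutes yields a single 12-minute visit; a device seen once and never again yields a visit of zero minutes, which platforms might treat as a passer-by.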

The privacy implications depend on whether the data is truly anonymized (aggregated and non-identifiable) or merely pseudonymized (tied to a device identifier that could be linked back to a person). Many retail analytics vendors claim anonymization but implement pseudonymization, which is a weaker guarantee.
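The difference is easy to see in code. Hashing a MAC address, even with a salt, produces a stable token that still identifies the same device on every visit, so it is pseudonymization. Aggregation discards the per-device token and keeps only counts. The salt value and function names below are illustrative assumptions.

```python
import hashlib
from collections import defaultdict

def pseudonymize(mac: str, salt: bytes = b"store-42") -> str:
    """Replace a MAC with a salted hash. Still a stable per-device identifier."""
    return hashlib.sha256(salt + mac.lower().encode("ascii")).hexdigest()[:16]

def aggregate_hourly(sightings):
    """Reduce (mac, hour) sightings to distinct-device counts per hour.

    The output contains no per-device value, hashed or otherwise.
    """
    devices_per_hour = defaultdict(set)
    for mac, hour in sightings:
        devices_per_hour[hour].add(mac.lower())
    return {hour: len(macs) for hour, macs in devices_per_hour.items()}
```

Anyone who holds the salt and learns a target's MAC address can recompute the token and pull that device's full visit history from the "anonymized" data set. The hourly counts offer no such handle, although very small counts can still be revealing.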

SDK-Based Collection via Mobile Apps

The most pervasive WiFi data collection mechanism is not sensor-based - it is software-based. Mobile advertising SDKs embedded in apps collect WiFi scan data as part of their location determination process.

When you install an app that requests location permissions, you may be granting access not just to the app developer but to every SDK embedded within the app. A single app might contain SDKs from a dozen different companies, each collecting and transmitting data independently. The app developer often does not know or control what data these SDKs collect.

X-Mode Social (now Outlogic) was a prominent example. Their SDK was embedded in hundreds of apps, including prayer apps and dating apps. It collected precise location data (including from WiFi scans) and sold it to military and government contractors. After media reporting exposed the practice in 2020, Apple and Google told developers to remove the X-Mode SDK from their apps or risk removal from the App Store and Google Play, though the company rebranded and continued operating.

Checking which SDKs are embedded in an app is possible through tools like Exodus Privacy (for Android) and class-dump analysis for iOS, but this requires technical skill and is not accessible to average users. The practical reality is that most people have no idea which companies are receiving their location data through the apps on their phone.

Regulatory Changes

Regulatory responses to location data collection have been slow but are accelerating. Several frameworks now directly affect WiFi-based data collection.

FTC enforcement. The U.S. Federal Trade Commission has taken enforcement actions against data brokers. It sued Kochava in 2022 over the sale of precise geolocation data, and in 2024 it announced orders against X-Mode Social/Outlogic and InMarket restricting the sale and use of sensitive location data. Some proposals would also require data brokers to maintain do-not-track lists. The FTC has also targeted individual apps for undisclosed data sharing with embedded SDKs.

State-level privacy laws. California's CCPA/CPRA gives residents the right to opt out of the sale of personal information, including location data. Similar laws in Virginia, Colorado, Connecticut, and other states create a patchwork of obligations for data brokers. The practical impact varies - most consumers do not exercise these rights, and enforcement is inconsistent.

GDPR. In the European Union, GDPR treats MAC addresses as personal data when they can be linked to an identifiable person. This requires a legal basis for collection (typically consent), purpose limitation, and data minimization. Several EU data protection authorities have specifically addressed WiFi tracking in retail environments, generally concluding that passive WiFi tracking without meaningful consent is not GDPR-compliant.

Technical responses. Apple and Google have implemented MAC address randomization and restricted background location access for apps. Apple's App Tracking Transparency (ATT) framework requires apps to ask permission before tracking across other apps and websites, which has significantly reduced the data available to advertising SDKs on iOS.

Protecting Yourself

Complete protection against WiFi-based tracking is difficult without simply turning WiFi off. But several measures reduce exposure significantly.

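Disable WiFi when you do not need it. If you are walking through a shopping mall and do not need internet access, turn WiFi off. This stops most probe requests, with two caveats: the quick-settings toggle on some phones (including the iOS Control Center) only disconnects from the current network rather than powering down the radio, and Android can keep scanning for location purposes unless WiFi scanning is also disabled in the location settings. Turn WiFi off from the full settings menu to be sure.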

Forget networks you no longer use. Depending on the device and OS version, your phone may probe for saved networks by name (hidden networks in particular), broadcasting those network names to any listener. A phone that probes for "HiltonHonorsWiFi" tells an observer you have stayed at a Hilton. Periodically clean your saved network list.

Audit app permissions. On iOS, review which apps have "Precise Location" access and whether any have "Always" versus "While Using" access. On Android, check WiFi scanning permissions and location permissions. Remove permissions from apps that do not need them.

Use a VPN on public WiFi. While a VPN does not prevent MAC-level tracking, it prevents the WiFi operator from monitoring your browsing traffic and linking it to your device identifier.

For security researchers who want to understand what their own devices broadcast, tools like the BLEShark Nano can passively monitor WiFi probe requests and BLE advertisements in your environment, showing you exactly what data your devices (and the devices around you) are leaking.
