What Is OSINT? Open Source Intelligence for Security Researchers
OSINT stands for Open Source Intelligence: intelligence gathered from publicly available sources - no hacking, no access to private systems, no intercepted communications. Typical sources include publicly accessible websites, DNS records, WHOIS data, social media, government databases, job postings, satellite imagery, and court records. If it's accessible to anyone without special authorization, it's a potential OSINT source.
What Counts as OSINT
graph TD
subgraph "OSINT Collection Process"
REQ["1. Define Requirements
(what do you need to know?)"]
SRC["2. Identify Sources
(where could this info be?)"]
COLL["3. Collect Data
(scrape, query, observe)"]
PROC["4. Process Raw Data
(clean, normalize, deduplicate)"]
ANAL["5. Analyze
(correlate, contextualize)"]
DISS["6. Disseminate
(report, brief, alert)"]
end
REQ --> SRC --> COLL --> PROC --> ANAL --> DISS
DISS -.->|"New questions
arise"| REQ
subgraph "Source Categories"
PUB["Public Records
(WHOIS, DNS,
cert transparency)"]
SOC["Social Media
(LinkedIn, Twitter,
GitHub)"]
TECH["Technical
(Shodan, Censys,
WiGLE)"]
DOCS["Documents
(SEC filings,
job postings)"]
end
SRC --- PUB & SOC & TECH & DOCS
The OSINT intelligence cycle - a structured methodology from requirements to analysis and dissemination
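Step 4 of the cycle (process raw data) is where results from different sources get cleaned up before analysis. The sketch below is one minimal way to do that for hostnames, assuming findings arrive as `(source, hostname)` pairs. The function names and the choice to keep per-host provenance are illustrative, not a standard format.

```python
from collections import defaultdict


def normalize_hostname(name: str) -> str:
    """Lowercase, trim whitespace, and drop a trailing dot and a leading '*.' wildcard."""
    name = name.strip().lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]
    return name


def merge_findings(findings):
    """Merge (source, hostname) pairs into {hostname: sorted list of sources}.

    Hostnames are deduplicated after normalization, but the sources are kept,
    so the analysis step can see which sources corroborate each host.
    """
    merged = defaultdict(set)
    for source, raw in findings:
        host = normalize_hostname(raw)
        if host:  # skip entries that normalize to an empty string
            merged[host].add(source)
    return {host: sorted(srcs) for host, srcs in sorted(merged.items())}


findings = [
    ("crt.sh", "*.Example.com"),
    ("theHarvester", "example.com."),
    ("crt.sh", "vpn.example.com"),
    ("dns", "VPN.example.com"),
]
print(merge_findings(findings))
# {'example.com': ['crt.sh', 'theHarvester'], 'vpn.example.com': ['crt.sh', 'dns']}
```

Keeping the source list rather than a bare set of names matters in the analysis step: a host seen in both CT logs and live DNS is stronger evidence than one seen in a single source.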
Internet Infrastructure Data
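DNS records are publicly queryable - A, MX, NS, and TXT records reveal IP addresses, mail infrastructure, name servers, and security configuration such as SPF and DMARC policies. Certificate Transparency logs (searchable through crt.sh and censys.io) record the publicly trusted TLS certificates issued for a domain, which often exposes subdomains that appear nowhere else. WHOIS records show domain registration information, although many registrars now redact registrant details for privacy. Shodan is a search engine for internet-connected devices that indexes exposed services and the software versions they report.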
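Subdomain discovery from Certificate Transparency can be sketched against crt.sh's JSON output. This assumes the `output=json` query format and a `name_value` field holding newline-separated names per record. That is what crt.sh has returned in practice, but the format is not a formal API and may change. The service is also slow and rate-limited, so `fetch_crtsh` (network required) is kept separate from the pure parsing logic.

```python
import json
import urllib.parse
import urllib.request


def crtsh_url(domain: str) -> str:
    # '%' is crt.sh's wildcard: "%.example.com" matches names under the domain.
    query = urllib.parse.quote(f"%.{domain}")
    return f"https://crt.sh/?q={query}&output=json"


def subdomains_from_crtsh(records, domain: str) -> set[str]:
    """Extract unique hostnames under `domain` from crt.sh JSON records.

    Assumes each record's "name_value" may hold several names separated by
    newlines (one per certificate SAN entry); wildcard prefixes are stripped.
    """
    domain = domain.lower()
    found = set()
    for rec in records:
        for name in rec.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]
            # Keep the apex and true subdomains; reject look-alikes such as "notexample.com".
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return found


def fetch_crtsh(domain: str, timeout: float = 30.0):
    """Query crt.sh and return the decoded JSON list (network access required)."""
    with urllib.request.urlopen(crtsh_url(domain), timeout=timeout) as resp:
        return json.load(resp)


sample = [
    {"name_value": "example.com\nwww.example.com"},
    {"name_value": "*.dev.example.com"},
    {"name_value": "notexample.com"},
]
print(sorted(subdomains_from_crtsh(sample, "example.com")))
# ['dev.example.com', 'example.com', 'www.example.com']
```

Querying crt.sh is passive with respect to the target: the request goes to the log search service, not to the target's infrastructure.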
Social Media and Professional Networks
LinkedIn is often the most useful OSINT source for corporate targets: it reveals organizational structure, employee names and roles, and technology stacks (through job postings and skill endorsements). A posting for "Senior Cloud Engineer - AWS, Terraform, Kubernetes" reveals the company's cloud provider, infrastructure-as-code tooling, and container platform.
Document and File Metadata
Documents published on company websites often contain metadata: PDF author fields, Office file paths that reveal internal naming conventions, and EXIF data in images that can include GPS coordinates. Tools like FOCA automate metadata extraction.
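For Office files, the metadata is easy to read without special tools. Modern .docx, .xlsx, and .pptx files are ZIP archives, and the `docProps/core.xml` and `docProps/app.xml` parts hold fields like author, last-modified-by, application, and company. The sketch below uses only the standard library and extracts a handful of those fields. The chosen field list is an illustrative assumption. PDF and EXIF metadata need other parsers and are not covered here.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Which elements to read from which package part (illustrative selection).
FIELDS = {
    "docProps/core.xml": {
        "creator": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
    },
    "docProps/app.xml": {
        "application": "ep:Application",
        "company": "ep:Company",
    },
}


def office_metadata(data: bytes) -> dict[str, str]:
    """Return selected metadata fields from an OOXML file's raw bytes."""
    meta = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        for part, fields in FIELDS.items():
            if part not in names:
                continue  # metadata parts are optional in the package
            root = ET.fromstring(zf.read(part))
            for key, path in fields.items():
                el = root.find(path, NS)  # direct child of the root element
                if el is not None and el.text:
                    meta[key] = el.text.strip()
    return meta
```

Usage: `office_metadata(open("report.docx", "rb").read())`, where `report.docx` is a hypothetical file downloaded from the target's public site. Usernames in `creator` and `last_modified_by` often reveal the internal account naming scheme.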
Job Postings
Job postings reveal technology choices, vendor relationships, the problems the company is trying to solve, and sometimes specific security gaps. A security engineer posting mentioning "experience responding to incidents involving CrowdStrike" tells an attacker exactly which EDR product to research for evasion.
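Pulling technology names out of postings can be partly automated with a keyword watchlist. The sketch below does whole-word, case-insensitive matching against a small, hypothetical `TECH_TERMS` dictionary. In practice the list would be tailored to the engagement, and simple keyword matching will miss synonyms and abbreviations.

```python
import re

# Hypothetical watchlist; extend per engagement.
TECH_TERMS = {
    "cloud": ["AWS", "Azure", "GCP"],
    "iac": ["Terraform", "Ansible"],
    "containers": ["Kubernetes", "Docker"],
    "edr": ["CrowdStrike", "SentinelOne", "Carbon Black"],
}


def extract_tech(posting: str, terms=TECH_TERMS) -> dict[str, list[str]]:
    """Return {category: [matched terms]} for terms found as whole words in the posting."""
    found = {}
    for category, names in terms.items():
        hits = [
            n for n in names
            if re.search(r"\b" + re.escape(n) + r"\b", posting, re.IGNORECASE)
        ]
        if hits:
            found[category] = hits
    return found


print(extract_tech("Senior Cloud Engineer - AWS, Terraform, Kubernetes"))
# {'cloud': ['AWS'], 'iac': ['Terraform'], 'containers': ['Kubernetes']}
print(extract_tech("experience responding to incidents involving CrowdStrike"))
# {'edr': ['CrowdStrike']}
```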
Passive vs Active OSINT
graph LR
subgraph "Passive OSINT (No Target Interaction)"
P1["WHOIS Lookups"]
P2["DNS Records"]
P3["Certificate Transparency"]
P4["Social Media Profiles"]
P5["Cached Web Pages"]
P6["Job Postings Analysis"]
P7["WiFi Network Mapping
(WiGLE database)"]
end
subgraph "Active OSINT (Target May Detect)"
A1["Port Scanning
(Nmap, Masscan)"]
A2["Web Crawling
(target website)"]
A3["WiFi Scanning
(BLEShark Nano)"]
A4["BLE Device Scanning
(BLEShark Nano)"]
A5["Email Verification
(SMTP queries)"]
A6["Social Engineering
(direct contact)"]
end
subgraph "Detection Risk"
LOW["Low Risk
(no logs generated)"]
HIGH["Higher Risk
(may trigger IDS/alerts)"]
end
P1 & P2 & P3 & P4 & P5 & P6 & P7 --- LOW
A1 & A2 & A3 & A4 & A5 & A6 --- HIGH
Passive OSINT leaves no trace on the target; active OSINT involves direct interaction that may be detected
Passive OSINT involves no direct interaction with the target's infrastructure: querying DNS through public resolvers, searching Shodan, reading LinkedIn. The target generally has no way of knowing this is happening.
Active OSINT involves direct interaction: making HTTP requests to the target's web server, running port scans, sending tracking emails. The target can potentially detect this in its logs, and the legal line gets more complex here.
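A TCP connect check shows concretely why port scanning counts as active. Every probe completes (or attempts) a handshake with the target, which firewalls, IDS sensors, and service logs can record. The sketch below is a minimal sequential connect scan using only the `socket` module. It is IPv4-only, and a hostname that fails to resolve raises `socket.gaierror`. Tools like Nmap and Masscan are far faster and offer stealthier scan types. Run this only against hosts you are authorized to test.

```python
import socket


def tcp_connect_scan(host: str, ports, timeout: float = 1.0) -> dict[int, bool]:
    """Return {port: is_open} by attempting a full TCP connect to each port.

    A successful connect_ex() (return value 0) means the three-way handshake
    completed, which is exactly the kind of interaction a target can log.
    """
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            results[port] = s.connect_ex((host, port)) == 0
    return results
```

Usage: `tcp_connect_scan("127.0.0.1", [22, 80, 443])` scans your own machine, which is a safe way to try it.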
OSINT for Wireless Security Work
graph TD
subgraph "Wireless OSINT Workflow"
SCAN["WiFi Scan
(BLEShark Nano)"]
BLESCAN["BLE Scan
(BLEShark Nano)"]
SCAN --> SSID["Collect SSIDs
and BSSIDs"]
SCAN --> ENC["Note Encryption
(WPA2/WPA3/Open)"]
SCAN --> CHAN["Map Channels
and Signal Strength"]
BLESCAN --> DEVS["Enumerate
BLE Devices"]
BLESCAN --> MFGR["Identify
Manufacturers"]
BLESCAN --> ADVD["Capture
Advertisement Data"]
SSID --> WIGLE["Cross-reference
WiGLE Database"]
SSID --> VENDOR["MAC Vendor
Lookup"]
ENC --> VULN["Identify Weak
Configurations"]
WIGLE --> GEO["Geolocate
Networks"]
VENDOR --> ORG["Map to
Organization"]
DEVS --> ASSET["Build Asset
Inventory"]
GEO & ORG & ASSET --> REPORT["OSINT Report"]
end
Wireless OSINT workflow using the BLEShark Nano to enumerate WiFi and BLE targets, then correlate with public databases
Wireless security assessments have their own OSINT angle. Passive wireless scanning is itself a form of OSINT - WiFi beacon frames are broadcast to everyone within range. Recording SSIDs, BSSIDs, encryption types, and signal strengths doesn't require connecting to anything.
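To make "recording encryption types" concrete, the sketch below decodes a raw 802.11 beacon frame body. That is everything after the 24-byte MAC header, whose address fields carry the BSSID. It reads the SSID element (ID 0), the RSN element (ID 48), and the WPA1 vendor element (ID 221 with OUI 00:50:F2, type 1), then falls back to the Privacy bit in the capability field to flag WEP. It assumes well-formed frames from any capture source and does not describe how the BLEShark Nano itself parses beacons. The classification is simplified: for example, it does not recognize OWE or Suite-B-specific labels.

```python
import struct

PRIVACY_BIT = 0x0010                 # Capability Information: Privacy
SSID_IE, RSN_IE, VENDOR_IE = 0, 48, 221
WPA1_OUI_TYPE = b"\x00\x50\xf2\x01"  # Microsoft OUI + type 1 = WPA (v1)
AKM_NAMES = {1: "802.1X", 2: "PSK", 8: "SAE"}  # suite types under OUI 00-0F-AC


def parse_ies(data: bytes):
    """Yield (element_id, value) pairs from a run of 802.11 information elements."""
    i = 0
    while i + 2 <= len(data):
        tag, length = data[i], data[i + 1]
        value = data[i + 2:i + 2 + length]
        if len(value) < length:
            break  # truncated capture
        yield tag, value
        i += 2 + length


def rsn_akms(rsn: bytes) -> list[str]:
    """Return AKM suite names from a well-formed RSN element body."""
    off = 2 + 4  # skip version and group cipher suite
    (n_pair,) = struct.unpack_from("<H", rsn, off)
    off += 2 + 4 * n_pair  # skip pairwise cipher list
    (n_akm,) = struct.unpack_from("<H", rsn, off)
    off += 2
    akms = []
    for k in range(n_akm):
        suite = rsn[off + 4 * k:off + 4 * k + 4]
        if suite[:3] == b"\x00\x0f\xac":
            akms.append(AKM_NAMES.get(suite[3], f"type {suite[3]}"))
        else:
            akms.append("vendor " + suite.hex())
    return akms


def summarize_beacon(body: bytes) -> dict:
    """Summarize SSID, beacon interval, and security type from a beacon frame body."""
    # Fixed fields: timestamp (8 bytes), beacon interval (2), capability info (2), little-endian.
    _ts, interval, cap = struct.unpack_from("<QHH", body, 0)
    ssid, rsn, wpa1 = None, None, False
    for tag, value in parse_ies(body[12:]):
        if tag == SSID_IE:
            ssid = value.decode("utf-8", errors="replace")  # "" for a hidden SSID
        elif tag == RSN_IE:
            rsn = value
        elif tag == VENDOR_IE and value[:4] == WPA1_OUI_TYPE:
            wpa1 = True
    akms = []
    if rsn is not None:
        akms = rsn_akms(rsn)
        if "SAE" in akms and "PSK" in akms:
            security = "WPA2/WPA3 transition"
        elif "SAE" in akms:
            security = "WPA3"
        else:
            security = "WPA2"
    elif wpa1:
        security = "WPA"
    elif cap & PRIVACY_BIT:
        security = "WEP"
    else:
        security = "Open"
    return {"ssid": ssid, "beacon_interval": interval, "security": security, "akms": akms}
```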
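The BLEShark Nano's WiFi scanner reads beacon frames broadcast by nearby APs, and BLE scanning picks up advertisement packets broadcast by Bluetooth devices. Both are passive information gathering: the device is just recording what's being publicly broadcast. Listening is what keeps it passive. 802.11 probe requests and BLE scan requests do transmit, and in every case the scanner has to be within radio range of the target's premises.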
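A BLE advertising payload is a sequence of AD structures, each laid out as one length byte (covering type plus data), one AD type byte, and the data. The sketch below decodes the fields most useful for an asset inventory: flags (0x01), device name (0x08/0x09), TX power (0x0A), and manufacturer-specific data (0xFF), whose first two bytes are a little-endian company identifier from the Bluetooth SIG's assigned numbers. The two-entry `KNOWN_COMPANIES` table is a deliberately tiny illustration, not a full registry. This describes the over-the-air format, not the BLEShark Nano's internal parser.

```python
AD_FLAGS, AD_SHORT_NAME, AD_COMPLETE_NAME = 0x01, 0x08, 0x09
AD_TX_POWER, AD_MANUFACTURER = 0x0A, 0xFF
# Tiny illustrative subset of Bluetooth SIG company identifiers.
KNOWN_COMPANIES = {0x004C: "Apple, Inc.", 0x0006: "Microsoft"}


def parse_adv(payload: bytes) -> dict:
    """Decode AD structures from a BLE advertising payload into a dict."""
    info = {}
    i = 0
    while i < len(payload):
        length = payload[i]
        if length == 0 or i + 1 >= len(payload):
            break  # zero length ends significant data; also guards truncation
        ad_type = payload[i + 1]
        data = payload[i + 2:i + 1 + length]  # length counts the type byte too
        if ad_type in (AD_SHORT_NAME, AD_COMPLETE_NAME):
            info["name"] = data.decode("utf-8", errors="replace")
        elif ad_type == AD_TX_POWER and data:
            info["tx_power_dbm"] = int.from_bytes(data[:1], "little", signed=True)
        elif ad_type == AD_MANUFACTURER and len(data) >= 2:
            company = int.from_bytes(data[:2], "little")
            info["company_id"] = company
            info["manufacturer"] = KNOWN_COMPANIES.get(company, f"unknown (0x{company:04X})")
            info["manufacturer_data"] = data[2:].hex()
        elif ad_type == AD_FLAGS and data:
            info["flags"] = data[0]
        i += 1 + length
    return info


payload = b"\x02\x01\x06" + b"\x05\x09Tag1" + b"\x05\xff\x4c\x00\x02\x15"
print(parse_adv(payload))
# {'flags': 6, 'name': 'Tag1', 'company_id': 76, 'manufacturer': 'Apple, Inc.', 'manufacturer_data': '0215'}
```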
OSINT Tools
theHarvester: Automated collection of email addresses, names, subdomains, and IPs from public sources.
Shodan: Search engine for internet-connected devices and the services they expose.
Maltego: Graphical OSINT analysis tool that builds link graphs between entities such as people, domains, IPs, and organizations.
Recon-ng: Modular OSINT framework with modules for DNS enumeration and social media scraping.
Google Dorking: Using advanced search operators (site:, filetype:, intitle:, inurl:) to find specific exposed information.
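Dorks are just query strings built from search operators, so composing them programmatically helps keep a set of searches consistent across targets. The sketch below assembles a query from the `site:`, `filetype:`, `intitle:`, and `inurl:` operators plus exact-match phrases. It only builds the string. Submitting queries in bulk is a separate matter, and automated querying may conflict with a search engine's terms of service.

```python
def build_dork(site=None, filetype=None, intitle=None, inurl=None, phrases=()):
    """Compose a search query string from common advanced operators."""
    def q(term: str) -> str:
        # Quote multi-word values so the operator applies to the whole phrase.
        return f'"{term}"' if " " in term else term

    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        parts.append(f"intitle:{q(intitle)}")
    if inurl:
        parts.append(f"inurl:{q(inurl)}")
    parts.extend(f'"{p}"' for p in phrases)  # exact-match phrases
    return " ".join(parts)


print(build_dork(site="example.com", filetype="pdf", phrases=["internal use only"]))
# site:example.com filetype:pdf "internal use only"
print(build_dork(site="example.com", intitle="index of"))
# site:example.com intitle:"index of"
```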
Legal and Ethical Considerations
Passive OSINT from public sources is generally legal in most jurisdictions, but a few cautions apply. In the EU, GDPR governs the processing of personal data even when that data was publicly available. Data obtained through access that requires credentials is not OSINT. In a professional testing context, the authorization document should explicitly cover OSINT as part of the reconnaissance phase.