How WebSockets Work
HTTP is built around a simple model: the client asks, the server answers. That works perfectly for loading web pages, but it falls apart when you need real-time, bidirectional communication. WebSockets solve this by upgrading an HTTP connection into a persistent, full-duplex channel where both sides can send data at any time.
The Problem with HTTP for Real-Time
Consider a live BLE scan running on the BLEShark Nano's web interface. The Nano is discovering BLE devices every few seconds, and you want to see results appear in your browser as they arrive. With plain HTTP, the browser has no way to receive data unless it asks for it first.
The naive solution is polling - sending a GET request every second to check for new results. This works, but it is wasteful. Most responses come back empty ("no new data yet"), and each request carries the full overhead of HTTP headers. At one request per second, that is 86,400 round trips per day, most of which return nothing.
Long polling improves this slightly. The client sends a request, and the server holds it open until new data is available (or a timeout expires). When data arrives, the server responds, and the client immediately sends a new request. This reduces unnecessary responses but still creates a new HTTP request for every piece of data.
WebSockets remove most of this overhead. After a single HTTP handshake, the connection upgrades to a persistent channel. Both sides can send messages whenever they want, with minimal framing overhead and no repeated headers.
The Upgrade Handshake
A WebSocket connection starts as a regular HTTP request. The client sends a GET request with special headers requesting an upgrade:
GET /scan/live HTTP/1.1
Host: 192.168.4.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
The key headers are:
- Upgrade: websocket - requests the switch to the WebSocket protocol
- Connection: Upgrade - tells the server (and any intermediaries) that this request asks to change protocols on the current connection
- Sec-WebSocket-Key - a base64-encoded 16-byte random nonce used for handshake verification
- Sec-WebSocket-Version: 13 - the WebSocket protocol version defined by RFC 6455
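To make the request side concrete, here is a minimal Python sketch of how a client might build the upgrade request. The function name `make_upgrade_request` is hypothetical, and the host and path simply mirror the example above. Browsers do all of this internally when a page calls `new WebSocket(...)`, so this only illustrates what goes on the wire.

```python
import base64
import os
from typing import Tuple

def make_upgrade_request(host: str, path: str) -> Tuple[str, str]:
    """Build a WebSocket upgrade request; return (request text, key sent)."""
    # Sec-WebSocket-Key: 16 random bytes, base64-encoded (24 characters)
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"  # blank line ends the HTTP headers
    )
    return request, key

request, key = make_upgrade_request("192.168.4.1", "/scan/live")
```

The client keeps `key` so it can check the server's Sec-WebSocket-Accept value in the 101 response.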
If the server supports WebSockets, it responds with HTTP 101 Switching Protocols:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
The Sec-WebSocket-Accept value is computed by concatenating the client's Sec-WebSocket-Key with a fixed GUID (258EAFA5-E914-47DA-95CA-C5AB0DC85B11), hashing the result with SHA-1, and base64-encoding the digest. This proves the server actually understood the WebSocket request and did not just blindly echo headers.
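The accept computation is short enough to sketch in full. This Python version uses only the standard library; `compute_accept` and `key_looks_valid` are illustrative names, not part of any particular server library. Note that the key is hashed as the base64 string it arrives as, not after decoding. Feeding in the key from the request above reproduces the Sec-WebSocket-Accept value in the response.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the handshake
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def key_looks_valid(sec_websocket_key: str) -> bool:
    """A well-formed key is base64 that decodes to exactly 16 bytes."""
    try:
        return len(base64.b64decode(sec_websocket_key, validate=True)) == 16
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False

def compute_accept(sec_websocket_key: str) -> str:
    """SHA-1 of (key + GUID), base64-encoded."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

accept = compute_accept("dGhlIHNhbXBsZSBub25jZQ==")
```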
After this exchange, the connection stops speaking HTTP. The underlying TCP connection stays open, but the protocol running over it is now WebSocket.
sequenceDiagram
participant C as Client - Browser
participant S as Server - BLEShark Nano
Note over C,S: Standard TCP connection exists
C->>S: HTTP GET /scan/live with Upgrade headers
S->>S: Validate Sec-WebSocket-Key
S->>C: HTTP 101 Switching Protocols
Note over C,S: Protocol is now WebSocket
C->>S: WebSocket frame - start scan
S->>C: WebSocket frame - device found BLE-001
S->>C: WebSocket frame - device found BLE-002
C->>S: WebSocket frame - update filter
S->>C: WebSocket frame - device found BLE-003
S->>C: WebSocket frame - scan complete
C->>S: Close frame
S->>C: Close frame
Figure 1 - The HTTP upgrade handshake followed by bidirectional WebSocket communication
Frame Format
WebSocket messages are transmitted as frames. Each frame has a compact binary header followed by the payload data. The header contains:
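FIN bit (1 bit) - indicates whether this is the final frame of a message. Large messages can be split across multiple frames, with FIN=0 on all but the last.
RSV1-3 (3 bits) - reserved for extensions such as compression; must be 0 unless an extension that defines them has been negotiated.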
Opcode (4 bits) - defines the frame type:
- 0x1 - text frame (UTF-8 encoded)
- 0x2 - binary frame
- 0x8 - connection close
- 0x9 - ping (keepalive)
- 0xA - pong (response to ping)
- 0x0 - continuation frame (part of a fragmented message)
Mask bit (1 bit) - must be 1 for all frames sent from client to server and 0 for all frames sent from server to client. Masking prevents proxy cache poisoning attacks by making the bytes of client frames unpredictable to intermediaries.
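Payload length (7, 7+16, or 7+64 bits) - if the 7-bit value is 0-125, that is the actual length. If it is 126, the next 2 bytes hold the length as an unsigned 16-bit integer. If it is 127, the next 8 bytes hold the length as a 64-bit integer whose most significant bit must be 0. Extended lengths are in network byte order (big-endian). This variable-length encoding keeps small messages compact while supporting payloads up to 2^63 - 1 bytes.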
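Masking key (32 bits, client frames only) - a random value chosen fresh for each frame. Byte i of the payload is XORed with byte i mod 4 of the key; the server unmasks by applying the same XOR again.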
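Putting the header fields together, the following is a minimal Python sketch of a parser for one complete frame at the start of a buffer. It works under simplifying assumptions: the whole frame is already in memory, no extensions are negotiated (so the RSV bits must be 0), and reassembly of fragmented messages is left out. `Frame` and `parse_frame` are hypothetical names.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Frame:
    fin: bool       # is this the final fragment of a message?
    opcode: int     # e.g. 0x1 text, 0x2 binary, 0x8 close
    payload: bytes  # application data, already unmasked

def parse_frame(buf: bytes) -> Tuple[Frame, int]:
    """Parse one complete frame at the start of buf; return (frame, bytes consumed)."""
    if len(buf) < 2:
        raise ValueError("need at least 2 header bytes")
    b0, b1 = buf[0], buf[1]
    fin = bool(b0 & 0x80)      # FIN: top bit of byte 0
    if b0 & 0x70:              # RSV1-3: next three bits, must be 0 here
        raise ValueError("RSV bits set but no extension negotiated")
    opcode = b0 & 0x0F         # opcode: low 4 bits of byte 0
    masked = bool(b1 & 0x80)   # MASK: top bit of byte 1
    length = b1 & 0x7F         # 7-bit payload length
    offset = 2
    if length == 126:          # next 2 bytes: 16-bit big-endian length
        (length,) = struct.unpack_from("!H", buf, offset)
        offset += 2
    elif length == 127:        # next 8 bytes: 64-bit big-endian length
        (length,) = struct.unpack_from("!Q", buf, offset)
        if length >> 63:
            raise ValueError("most significant bit of 64-bit length must be 0")
        offset += 8
    key = b""
    if masked:                 # 4-byte masking key precedes the payload
        key = buf[offset:offset + 4]
        if len(key) < 4:
            raise ValueError("incomplete masking key")
        offset += 4
    payload = buf[offset:offset + length]
    if len(payload) < length:
        raise ValueError("incomplete payload")
    if masked:                 # unmask: byte i XOR key byte i % 4
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return Frame(fin, opcode, payload), offset + length
```

A production parser would also have to cope with frames that arrive split across several TCP reads, which is why it returns the number of bytes consumed.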
A small text message from the server (a payload of 125 bytes or fewer) has only 2 bytes of overhead - the FIN/opcode byte and the mask/payload-length byte. Compare this to HTTP, where headers alone can be several hundred bytes.
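Going the other way, a server-side encoder only has to pick the right length form. This sketch (hypothetical name `build_frame`) writes unmasked frames, as a server must. The JSON message in the usage lines is an assumed format for illustration, not the Nano's actual protocol.

```python
import struct

def build_frame(payload: bytes, opcode: int = 0x1, fin: bool = True) -> bytes:
    """Encode an unmasked (server-to-client) WebSocket frame."""
    first = (0x80 if fin else 0x00) | (opcode & 0x0F)  # FIN bit + opcode
    n = len(payload)
    if n <= 125:
        header = struct.pack("!BB", first, n)          # length fits in 7 bits
    elif n <= 0xFFFF:
        header = struct.pack("!BBH", first, 126, n)    # 126 + 16-bit length
    else:
        header = struct.pack("!BBQ", first, 127, n)    # 127 + 64-bit length
    return header + payload  # mask bit stays 0: servers never mask

# Usage: a small scan result (hypothetical JSON format)
message = b'{"device": "BLE-001"}'
frame = build_frame(message)
overhead = len(frame) - len(message)  # 2 bytes for any payload of 125 bytes or fewer
```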
graph TD
subgraph "WebSocket Frame Structure"
A[FIN - 1 bit] --> B[RSV1-3 - 3 bits reserved]
B --> C[Opcode - 4 bits]
C --> D[Mask bit - 1 bit]
D --> E[Payload length - 7 bits]
E --> F{Length value?}
F -->|0-125| G[Actual length]
F -->|126| H[Extended 16-bit length]
F -->|127| I[Extended 64-bit length]
G --> J[Masking key - 32 bits if masked]
H --> J
I --> J
J --> K[Payload data]
end
subgraph "Opcodes"
O1[0x1 Text] --> O2[0x2 Binary]
O2 --> O3[0x8 Close]
O3 --> O4[0x9 Ping]
O4 --> O5[0xA Pong]
end
Figure 2 - WebSocket frame structure showing the compact binary header format
Full-Duplex Communication
The defining feature of WebSockets is full-duplex communication. Both the client and server can send frames at any time, independently of each other. There is no request-response pairing. The server does not need to wait for a client request before sending data, and the client does not need to wait for the server to finish before sending more.
This is fundamentally different from HTTP, even HTTP/2 with multiplexing. In HTTP/2, every response is tied to a request. Server push exists but is limited to pushing resources the client would have requested anyway. In WebSockets, the server can send any message at any time - a new scan result, a status update, an alert - without any client request triggering it.
The connection stays open until either side sends a close frame (opcode 0x8). The close frame's body may contain a 2-byte status code (for example, 1000 for a normal closure) followed by an optional UTF-8 reason. The other side responds with its own close frame, and then the TCP connection is closed, normally by the server first. This orderly shutdown lets both sides know the connection ended intentionally.
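Building and reading a close frame body takes only a few lines. This Python sketch assumes the server side (so outgoing frames are unmasked); the helper names are hypothetical, while 1000 (normal closure) and 1001 (going away) are status codes defined in RFC 6455.

```python
import struct
from typing import Optional, Tuple

def build_close_frame(code: int = 1000, reason: str = "") -> bytes:
    """Unmasked close frame (opcode 0x8) carrying a status code and optional reason."""
    body = struct.pack("!H", code) + reason.encode("utf-8")
    if len(body) > 125:  # control frames cannot carry more than 125 bytes
        raise ValueError("close payload too long")
    return bytes([0x80 | 0x8, len(body)]) + body

def parse_close_body(body: bytes) -> Tuple[Optional[int], str]:
    """Return (status code, reason); an empty body carries no status code."""
    if not body:
        return None, ""
    if len(body) == 1:
        raise ValueError("a close body must be empty or at least 2 bytes")
    (code,) = struct.unpack("!H", body[:2])
    return code, body[2:].decode("utf-8")
```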
Ping and pong frames handle keepalive. Either side can send a ping frame, and the other must respond with a pong frame containing the same payload. This detects dead connections without sending application data.
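The pong side of keepalive is mechanical. In this hypothetical Python helper, the server takes one complete ping frame from a client, unmasks the payload if the mask bit is set, and echoes it back in an unmasked pong (opcode 0xA). It assumes the whole frame is already buffered.

```python
def reply_to_ping(frame: bytes) -> bytes:
    """Given one complete ping frame, return the server's unmasked pong."""
    if frame[0] & 0x0F != 0x9:
        raise ValueError("not a ping frame")
    masked = bool(frame[1] & 0x80)
    length = frame[1] & 0x7F
    if length > 125:  # control frames never use extended lengths
        raise ValueError("control frame payload too long")
    if masked:
        key = frame[2:6]
        data = frame[6:6 + length]
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(data))
    else:
        payload = frame[2:2 + length]
    if len(payload) < length:
        raise ValueError("incomplete ping frame")
    # Pong: FIN=1, opcode 0xA, mask bit 0, same payload as the ping
    return bytes([0x80 | 0xA, length]) + payload
```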
Real-World Use Cases
The BLEShark Nano uses WebSockets for its web-based interface. When you start a BLE scan from the browser, the command goes to the Nano over a WebSocket. As the Nano discovers devices, each result is pushed to the browser as a WebSocket message in real time. You see devices appear in the scan list immediately, without refreshing or polling.
This is the natural pattern for any live data interface: the initial command goes from client to server, and then results stream back as they become available. Chat applications use the same pattern - messages from other users arrive as server-to-client frames without the recipient doing anything.
Other common uses include:
- Live dashboards - stock tickers, monitoring systems, sports scores
- Collaborative editing - Google Docs-style real-time collaboration
- Gaming - multiplayer game state synchronization
- Notifications - push notifications in web applications
WebSocket Security
WSS (WebSocket Secure) is the encrypted variant, running WebSocket over TLS - the same TLS used by HTTPS. The upgrade handshake happens after the TLS handshake completes, so the entire WebSocket connection is encrypted. Always use wss:// instead of ws:// on public networks.
Origin checking is the primary defense against Cross-Site WebSocket Hijacking (CSWSH). Unlike HTTP requests, WebSocket connections are not subject to the same-origin policy by default. A malicious page at evil.com could open a WebSocket to your-app.com and the browser would send cookies along with it. The server must check the Origin header in the upgrade request and reject connections from unexpected origins.
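A minimal origin check might look like the following Python sketch. The allow-list entry assumes the device's web UI is served from http://192.168.4.1 (the host used in the handshake example); `ALLOWED_ORIGINS` and `origin_allowed` are hypothetical names, and a real deployment would list whichever origins actually serve its pages.

```python
ALLOWED_ORIGINS = {"http://192.168.4.1"}  # hypothetical allow-list

def origin_allowed(headers: dict, allowed: set = ALLOWED_ORIGINS) -> bool:
    """Return True only if the handshake's Origin header is on the allow-list."""
    # HTTP header names are case-insensitive, so match the name without regard to case
    origin = next((v for k, v in headers.items() if k.lower() == "origin"), None)
    if origin is None:
        return False  # policy choice: no Origin header, no connection
    # Exact comparison; substring checks would accept e.g. your-app.com.evil.com
    return origin.strip().lower() in allowed
```

Rejecting handshakes without an Origin header is a policy choice. Browsers always send it on WebSocket handshakes, so a missing header usually means a non-browser client, which some deployments may want to handle differently.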
Authentication is tricky with WebSockets. Since the connection is upgraded from HTTP, cookies are sent with the initial handshake request. But after the upgrade, there is no built-in way to send new credentials. Most implementations authenticate during the handshake (via cookies or a token in the query string) and then trust the connection for its lifetime.
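One common handshake-time pattern is sketched below in Python, assuming the client appends a `token` query parameter (a hypothetical name) to the WebSocket URL and the server knows the expected value. The comparison uses `hmac.compare_digest` so response timing does not reveal how much of the token matched.

```python
import hmac
from urllib.parse import parse_qs, urlsplit

def handshake_token_ok(request_target: str, expected_token: str) -> bool:
    """Check a hypothetical ?token=... parameter on the upgrade request's target."""
    query = parse_qs(urlsplit(request_target).query)
    supplied = query.get("token", [""])[0]
    if not supplied:
        return False
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

Query strings can end up in server and proxy logs, so tokens passed this way are best kept short-lived.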
The client-to-server masking requirement exists to prevent a subtle proxy cache poisoning attack. Without masking, an attacker could craft WebSocket frames that, if interpreted as HTTP by an intermediary proxy, would inject malicious content into the proxy's cache. The random masking key ensures frames are unpredictable to intermediaries.
Alternatives - SSE and Long Polling
Server-Sent Events (SSE) provide a simpler alternative when you only need server-to-client streaming. SSE uses a regular HTTP connection with Content-Type: text/event-stream. The server keeps the connection open and sends events as text lines. SSE is simpler to implement, works naturally with HTTP/2 multiplexing, and handles reconnection automatically. The tradeoff: it is one-directional. The client cannot send messages over the SSE connection.
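For comparison with WebSocket framing, an SSE event is plain text. This Python sketch formats one event in the text/event-stream format; `format_sse_event` is a hypothetical helper, and the event name and JSON body are illustrative.

```python
from typing import Optional

def format_sse_event(data: str, event: Optional[str] = None,
                     event_id: Optional[str] = None) -> str:
    """Format one text/event-stream event; a blank line terminates it."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")   # lets the browser resume via Last-Event-ID
    if event is not None:
        lines.append(f"event: {event}")   # named event type for the client listener
    for line in data.split("\n"):
        lines.append(f"data: {line}")     # each data line gets its own prefix
    return "\n".join(lines) + "\n\n"

chunk = format_sse_event('{"device": "BLE-001"}', event="device")
```

The browser's EventSource joins multiple `data:` lines back together with newlines, so multi-line payloads survive the trip.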
Long polling is the oldest approach. The client sends an HTTP request, the server holds it until data is available, responds, and the client immediately sends another request. It works through virtually any proxy or firewall that allows ordinary HTTP traffic, but it has higher latency and overhead than both WebSockets and SSE.
Choose WebSockets when you need bidirectional real-time communication. Choose SSE when you only need server-to-client events. Choose long polling when compatibility with restrictive network environments matters more than efficiency.
For the BLEShark Nano, WebSockets are the right choice. The browser needs to send commands (start scan, stop scan, change parameters) and receive results (discovered devices, signal strengths, connection status) - a bidirectional flow that maps naturally to the WebSocket model.