
IR Code Databases: How Universal Remotes Know So Many Devices

A universal remote that works with your 15-year-old Panasonic, your new LG, your Sony soundbar, and a projector from a brand you've never heard of - how does it know all those codes? The answer is decades of community-maintained IR code databases, structured formats, and a lot of reverse engineering. These databases power everything from $10 budget universal remotes to professional AV control systems to the BLEShark Nano's IR feature set.

[Diagram: IR Code Database -> Device Profiles -> IR Codes -> Commands]


What Is an IR Code, Exactly?

An IR code is the combination of information needed to reproduce a specific button press from a specific remote control. At minimum, it contains:

  • Protocol (NEC, Sony SIRC, Philips RC-5, Samsung, Panasonic, etc.)
  • Device address (which device class within the protocol)
  • Command byte (which function)
  • Carrier frequency (typically 38 kHz; Sony SIRC uses 40 kHz)

From these four values, any compatible IR transmitter can synthesize the correct timing to produce a signal that the target device will recognize and respond to.

Some database formats also store the raw timing directly - the actual microsecond durations of each pulse and space - rather than the decoded protocol values. Raw timing is protocol-agnostic and can represent codes that don't fit cleanly into any documented protocol. It's bulkier (each code might require 50-200 timing values instead of 3-4 decoded values) but more universal.

IRDB: The Open-Source IR Code Library

IRDB (irdb.globalcache.com, also maintained at github.com/probonopd/irdb) is the largest public domain IR code database. As of recent counts, it contains over 500,000 individual codes across more than 10,000 brands and device types.

The original IRDB was maintained by Global Cache, a company that makes IP-to-IR bridge hardware for professional AV control. They released their code database as a public resource, which was then expanded significantly by community contributions. The project is now largely community-maintained on GitHub.

IRDB is not the only database. WinLIRC, lirc.org (the Linux Infrared Remote Control project), and Pronto databases all maintain their own collections in varying formats. The LIRC database focuses on formats compatible with the LIRC daemon used on Linux systems to receive and transmit IR. Many of these databases overlap but none is comprehensive for every device ever made.

How IR Code Databases Are Structured

IRDB stores codes in CSV format, one file per brand/device combination. Each row contains:

functionname,protocol,device,subdevice,function
Power,NEC1,4,-1,8
VolumeUp,NEC1,4,-1,2
VolumeDown,NEC1,4,-1,3
Mute,NEC1,4,-1,9

The fields are:

  • functionname: Human-readable button label
  • protocol: The IR protocol name (NEC1, NEC2, RC5, RC6, Sony12, Sony15, Sony20, Samsung36, etc.)
  • device: The device address (equivalent to the address byte in NEC)
  • subdevice: Sub-device address (-1 if not used)
  • function: The command code
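As a rough illustration of how little machinery this format needs, here is a minimal Python sketch that loads the sample rows above into a lookup table keyed by button label. The `IrCode` class and `load_codes` function are hypothetical names made up for this example, not part of IRDB or any library, and the sample text is inlined rather than read from a real IRDB file.

```python
import csv
import io
from dataclasses import dataclass

# Sample rows copied from the example above; a real file would be read from disk.
SAMPLE = """functionname,protocol,device,subdevice,function
Power,NEC1,4,-1,8
VolumeUp,NEC1,4,-1,2
VolumeDown,NEC1,4,-1,3
Mute,NEC1,4,-1,9
"""


@dataclass(frozen=True)
class IrCode:
    """One decoded database row (hypothetical helper type)."""
    functionname: str
    protocol: str
    device: int
    subdevice: int  # -1 means "not used"
    function: int


def load_codes(text):
    """Parse IRDB-style CSV text into a dict keyed by button label."""
    codes = {}
    for row in csv.DictReader(io.StringIO(text)):
        codes[row["functionname"]] = IrCode(
            functionname=row["functionname"],
            protocol=row["protocol"],
            device=int(row["device"]),
            subdevice=int(row["subdevice"]),
            function=int(row["function"]),
        )
    return codes


codes = load_codes(SAMPLE)
print(codes["Power"])
```

Keying by `functionname` is just one convenient choice; a remote that handles several devices would more likely key by brand, device type, and button together.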

A database entry like "NEC1, device 4, function 8" means: use NEC protocol variant 1, address byte 4, command byte 8. A transmitter that understands NEC1 encoding can generate the 9 ms mark and 4.5 ms space of the preamble, then transmit the 32-bit payload (address + complement + command + complement, each byte sent least-significant bit first), and the device will respond.
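To make that concrete, here is a hedged Python sketch that turns a decoded NEC1 entry into a list of mark/space durations. The function name `nec1_timings` is hypothetical, and the timing constants (a 562 µs base unit, a 562 µs space for a 0, a 1687 µs space for a 1) are the commonly documented nominal NEC values rather than anything taken from IRDB itself. It covers a single frame only and leaves out repeat frames and carrier modulation.

```python
def nec1_timings(device, function, subdevice=-1):
    """Return alternating mark/space durations in microseconds, mark first.

    Assumes nominal NEC timing. When subdevice is -1 the second byte
    is the bitwise complement of the device byte.
    """
    for value in (device, function):
        if not 0 <= value <= 0xFF:
            raise ValueError("device and function must fit in one byte")
    if subdevice < 0:
        subdevice = device ^ 0xFF

    payload = [device, subdevice, function, function ^ 0xFF]

    timings = [9000, 4500]  # preamble: 9 ms mark, 4.5 ms space
    for byte in payload:
        for bit in range(8):  # least-significant bit first
            timings.append(562)  # every bit starts with a short mark
            timings.append(1687 if (byte >> bit) & 1 else 562)
    timings.append(562)  # trailing mark that terminates the last space
    return timings


power = nec1_timings(device=4, function=8)
print(len(power), power[:8])
```

The result is exactly the kind of raw timing list that the bulkier database formats store directly: 67 durations here, generated from three small numbers.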

The LIRC format is different - it uses a configuration file per remote that includes the protocol name, raw timing values for the preamble, and timing values for each bit state. LIRC can handle protocols that IRDB's simplified format can't represent, at the cost of more verbose configuration.

How Codes Get Added

New codes enter the database through a few pathways:

Manual capture with IR receiver hardware. Connect a demodulating IR receiver (TSOP4838 or similar) to a microcontroller or USB oscilloscope, press the button on the physical remote, and decode the timing. Tools like IrScrutinizer (an open-source Java tool) can capture from hardware and decode the protocol automatically, producing a database-ready entry. This is the most common approach for adding codes from physical remotes you have access to.

Community submissions. GitHub pull requests against the IRDB repository. Contributors who've done captures submit new .csv files for brands/models not yet in the database, or add missing functions to existing entries. Review is largely automated - PRs that pass format validation are typically merged.

Manufacturer documentation. Some manufacturers publish their IR code specifications. Sony has historically published SIRC protocol documentation. Denon and Marantz have published their IR code tables. These go directly into the database as authoritative entries. Most consumer electronics brands do not publish this documentation, which is why capture-based methods dominate.

Cross-referencing OEM databases. Many "new" brands are actually rebadged OEM products using the same IR codes as the original manufacturer. A Hisense TV sold under a house brand in one country might use the exact same IR codes. OEM cross-referencing is a way to extend coverage without additional captures.

The Long-Tail Problem

The most popular brands - Samsung, LG, Sony, Panasonic, Vizio, Philips - are extremely well-documented. If you have one of those brands, the code database almost certainly has a complete set of codes for your specific remote.

The long tail - the hundreds of smaller and regional brands - is where coverage gets spotty. A Chinese brand sold only in specific markets, a projector from a niche manufacturer, a smart home hub from a startup - these often have no database entries. The community can't capture what it doesn't own, and manufacturers of low-margin consumer electronics have no incentive to publish their IR documentation.

Regional brands are particularly challenging. TV brands popular in Eastern Europe, Latin America, or Southeast Asia that have little presence in the US or UK have poor coverage in the major databases. Because contributors can only capture what they own, IRDB disproportionately reflects the consumer electronics brands popular in English-speaking countries.

Estimated coverage: roughly 80-85% of TVs currently in use worldwide have IR codes in major databases. That sounds good, but the remaining 15-20% is hundreds of millions of devices - and they're disproportionately in markets with less access to open-source hardware communities.

The solution for uncovered devices is direct capture: get the physical remote, capture the codes with IR receiver hardware, add them to the database and to your own device. This is exactly what the BLEShark Nano's Receive app is designed to support.

Pronto Hex and Raw Timing Formats

The Pronto Hex format is a raw timing format developed by Philips for their Pronto programmable remote line. It encodes IR signals as a series of carrier frequency and timing pairs in hexadecimal. Every IR signal can be represented in Pronto Hex regardless of protocol, making it a useful universal storage format.

A Pronto Hex string looks like:

0000 006C 0022 0002 015B 00AD 0016 0016 0016 0016 0016 0041 ...

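The fields are: a format word (0000 for raw learned codes), the carrier frequency divisor, the number of mark/space pairs in the once sequence, the number of mark/space pairs in the repeat sequence, then the mark/space pairs themselves, measured in carrier cycles.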
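A short Python sketch shows how a raw Pronto Hex string unpacks into a carrier frequency and microsecond durations. Treat it as an illustration under stated assumptions: `parse_pronto` is a hypothetical name, the 0.241246 µs time base is the commonly cited Pronto clock constant, and the sample string is a shortened made-up code built for this example, not a real device code.

```python
PRONTO_CLOCK_US = 0.241246  # commonly cited Pronto time base, in microseconds


def parse_pronto(hex_string):
    """Split a raw (0000) Pronto Hex string into frequency and durations.

    Returns (frequency_hz, once_us, repeat_us), where both lists hold
    alternating mark/space durations in microseconds.
    """
    words = [int(word, 16) for word in hex_string.split()]
    if len(words) < 4 or words[0] != 0x0000:
        raise ValueError("only raw (0000) Pronto codes are handled here")

    divisor, n_once, n_repeat = words[1], words[2], words[3]
    body = words[4:]
    if len(body) != 2 * (n_once + n_repeat):
        raise ValueError("pair counts do not match the number of values")

    period_us = divisor * PRONTO_CLOCK_US  # length of one carrier cycle
    frequency_hz = 1_000_000 / period_us
    durations = [round(cycles * period_us) for cycles in body]
    return frequency_hz, durations[: 2 * n_once], durations[2 * n_once:]


# Shortened illustrative string: 2 once pairs and 1 repeat pair.
SAMPLE = "0000 006C 0002 0001 015B 00AD 0016 0016 0016 05F7"
frequency, once, repeat = parse_pronto(SAMPLE)
print(round(frequency), once, repeat)
```

With a divisor of 006C the carrier works out to roughly 38.4 kHz, and the first pair (015B 00AD) comes out close to the 9 ms mark and 4.5 ms space of an NEC-style preamble.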

Pronto Hex is the format used by most professional AV control systems (Crestron, AMX, Control4, RTI) because it's unambiguous and protocol-agnostic. The databases for these commercial systems are in Pronto Hex, which is why they can control anything that has an IR receiver regardless of protocol.

The main downside of Pronto Hex is size: a single code usually takes several dozen four-digit hex words. The NEC-style example above declares 34 once pairs and 2 repeat pairs, which comes to 76 words including the header. For a universal remote with 500 device codes, that's significant storage. For devices with limited flash (original Arduino, small microcontrollers), protocol-decoded formats that only need 3-4 values per code are more practical.

Commercial IR Databases

Alongside the open-source databases, there are commercial IR code libraries that power professional AV control systems. Crestron, Control4, and similar companies maintain private databases that are significantly more comprehensive than open-source alternatives. These databases cover obscure commercial displays, AV processors, and industrial equipment that never appears in consumer databases.

Access to commercial databases typically requires purchasing a professional AV control system - they're not publicly available. The codes themselves are the same IR signals, but the coverage and curation are substantially better for professional AV equipment categories.

For consumer device coverage, the open-source databases (IRDB, LIRC) are actually comparable to commercial offerings. The commercial databases have more coverage in the professional AV category but don't have significant advantages for TVs, Blu-ray players, and consumer remotes.

BLEShark Nano and IR Code Capture

The BLEShark Nano approaches IR in two ways: database-driven (TV-B-Gone uses stored NEC/SIRC/RC-5 entries for power codes) and capture-based (the Receive and Clone apps capture any IR signal as raw timing).

The capture-based approach is the solution to the long-tail problem. If your device isn't in any database, point its original remote at the BLEShark's IR receiver and capture the signals you need. The BLEShark stores them as raw timing files accessible through the file portal. You can download these captures, convert them to IRDB format, and contribute to the open-source database.
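The conversion step could look something like the Python sketch below, which decodes a raw capture into an IRDB-style row. Everything here is an assumption for illustration: the BLEShark's actual capture file format isn't described in this article, so the capture is modeled as a plain list of microsecond durations starting with a mark, `decode_nec` is a hypothetical helper, and only the NEC protocol with nominal timings is handled.

```python
def decode_nec(timings, tolerance=0.25):
    """Decode mark/space durations (microseconds) as one NEC frame.

    Returns (device, subdevice, function), or None if the capture does
    not look like NEC. subdevice is -1 when the second byte is just the
    complement of the device byte, matching the IRDB convention.
    """
    def near(value, target):
        return abs(value - target) <= target * tolerance

    if len(timings) < 66:
        return None
    if not (near(timings[0], 9000) and near(timings[1], 4500)):
        return None

    bits = []
    for i in range(2, 66, 2):  # 32 mark/space pairs after the preamble
        mark, space = timings[i], timings[i + 1]
        if not near(mark, 562):
            return None
        if near(space, 1687):
            bits.append(1)
        elif near(space, 562):
            bits.append(0)
        else:
            return None

    # Reassemble four bytes, least-significant bit first.
    data = [sum(bit << n for n, bit in enumerate(bits[k:k + 8]))
            for k in range(0, 32, 8)]
    device, second, function, check = data
    if function ^ 0xFF != check:
        return None
    subdevice = -1 if second == device ^ 0xFF else second
    return device, subdevice, function


# Simulated capture with slightly off-nominal timings (not real hardware data).
BITS = "00100000" "11011111" "00010000" "11101111"
capture = [9024, 4512]
for bit in BITS:
    capture += [580, 1670 if bit == "1" else 540]
capture.append(580)

decoded = decode_nec(capture)
if decoded is not None:
    print("Power,NEC1,{},{},{}".format(*decoded))
```

Captures that return `None` are the ones worth keeping as raw timing, since they may belong to a protocol this sketch doesn't know about.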

The practical workflow for building a custom IR library on the BLEShark:

  1. Navigate to IR > Receive
  2. Point each remote at the BLEShark's IR receiver from about 20-30cm
  3. Press and hold each button you want to capture
  4. BLEShark stores the raw timing as a named entry
  5. Use the file portal to download captures or upload a custom code library
  6. The Transmit app can then replay any captured code

You can upload a custom code file through the BLEShark's file portal - this supports loading pre-built code sets from IRDB or custom libraries rather than having to capture every code manually. If you're setting up a BLEShark for a specific AV installation, you can prep the code library on a computer and upload it directly.

With Shiver mesh, a pack of BLEShark nodes can each hold a different portion of a code library, or all hold the same library for distributed coverage. You could have Node A cover the living room AV equipment and Node B cover the bedroom setup, controlling both from a single coordinator device. Each node in a Shiver pack handles IR independently, so commands are sent specifically to whichever node is in range of the target device.

IR code databases are community-maintained resources. IRDB is available under a public domain / CC0 license. Contributing back to open databases helps everyone with the same devices.
