Steganography, Watermarking, and Onion Routing

Steganography vs. Watermarking

Steganography hides the existence of a message within a cover object. The goal is undetectability--an observer should not realize hidden data is present. It is primarily used for one-to-one communication.

Three characteristics define steganography:

The cover object appears untouched (modifications are invisible and statistically difficult to detect)
The hidden data is a payload (arbitrary messages transmitted secretly)
The adversary model is detection (if an analyst suspects hidden data exists, the mission fails)

Watermarking embeds identifying information into content so it persists even if someone tries to remove it. The goal is persistence, not secrecy. It is primarily used for one-to-many communication.

Three characteristics define watermarking:

The watermark does not need to be hidden (it may be visible, like a logo overlay, or imperceptible)
The watermark is tied to the object (ownership, license, or tracking data), not a message payload
The adversary model is removal (the watermark should survive cropping, compression, and other transformations)

Fingerprinting is a variant of watermarking that embeds unique identifying information into each distributed copy, allowing leaked copies to be traced to specific recipients.

Classic Steganography Techniques

Null ciphers (concealment ciphers) hide messages within ordinary text using predefined patterns, such as taking specific letters after punctuation marks or the nth letter of each word.
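The "nth letter of each word" pattern can be shown with a minimal sketch of the receiver side. It assumes words are separated by whitespace and that punctuation is ignored. The function name and the cover sentence are made up for illustration.

```python
def extract_nth_letters(cover_text: str, n: int = 1) -> str:
    """Recover a null-cipher message by taking the nth letter (1-based) of each word.

    Punctuation is ignored, and words with fewer than n letters contribute nothing.
    """
    hidden = []
    for word in cover_text.split():
        letters = [ch for ch in word if ch.isalpha()]
        if len(letters) >= n:
            hidden.append(letters[n - 1])
    return "".join(hidden).upper()


# Made-up cover text: the first letter of each word spells the hidden message.
print(extract_nth_letters("Have everyone leave plates."))  # HELP
```

The sender's real work is composing natural-sounding text that fits the pattern. Extraction is trivial once the pattern (here, the value of n) is known.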

Invisible ink conceals messages that become visible only under certain conditions (heat, UV light, or chemicals).

Microdots shrink messages to the size of a period, embedded in documents or images.

Chaffing and winnowing (Ron Rivest, 1998) mixes authentic packets ("wheat"), each carrying a valid MAC, with bogus packets ("chaff") that carry invalid MACs. Only a recipient who knows the MAC key can tell which packets are authentic and discard the rest. This achieves confidentiality without encryption.
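The sketch below follows the simplest bit-by-bit variant of chaffing and winnowing. It assumes HMAC-SHA256 as the MAC and a hypothetical (serial, bit, MAC) packet layout; more efficient variants exist. No encryption is used anywhere, only authentication.

```python
import hashlib
import hmac
import secrets

MAC_LEN = 32  # bytes in an HMAC-SHA256 tag


def packet_mac(key: bytes, serial: int, bit: int) -> bytes:
    """MAC over a packet's serial number and its one-bit payload."""
    return hmac.new(key, f"{serial}:{bit}".encode(), hashlib.sha256).digest()


def chaff(key: bytes, bits: list[int]) -> list[tuple[int, int, bytes]]:
    """Sender: for each bit emit a wheat packet (valid MAC) and a chaff packet
    carrying the opposite bit with a random, invalid MAC, in random order."""
    packets = []
    for serial, bit in enumerate(bits):
        wheat = (serial, bit, packet_mac(key, serial, bit))
        bogus = (serial, 1 - bit, secrets.token_bytes(MAC_LEN))
        pair = [wheat, bogus]
        if secrets.randbelow(2):
            pair.reverse()
        packets.extend(pair)
    return packets


def winnow(key: bytes, packets: list[tuple[int, int, bytes]]) -> list[int]:
    """Receiver: discard packets whose MAC fails, then reorder by serial number."""
    wheat = [(serial, bit) for serial, bit, tag in packets
             if hmac.compare_digest(tag, packet_mac(key, serial, bit))]
    return [bit for _, bit in sorted(wheat)]


key = secrets.token_bytes(32)
packets = chaff(key, [1, 0, 1, 1])
print(winnow(key, packets))  # [1, 0, 1, 1]
```

Every serial number appears once with each bit value. Without the key, an eavesdropper sees both a 0 and a 1 for every position and learns nothing from the bit fields alone.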

Digital Steganography

Image Steganography

Three common methods hide messages in images:

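Metadata fields: Data is stored in EXIF, PNG text chunks, or XMP fields. This is not true steganography, since these fields are well known and easy to inspect, but it can bypass content filters.
LSB steganography: Replaces the least significant bits of pixel color values with message bits. Each 8-bit channel value changes by at most 1, which is imperceptible to humans.
Frequency domain steganography: Embeds data in the image's frequency (transform) coefficients, such as DCT coefficients. It favors the high-frequency components that correspond to noisy regions like leaves, grass, and edges, because humans don't notice changes in these regions. However, lossy compression such as JPEG quantizes those same high-frequency components most aggressively, so hidden data may not survive re-encoding.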
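LSB embedding and extraction can be sketched over raw 8-bit channel values. This assumes the image has already been decoded into a flat byte sequence (for example by an imaging library, not shown) and will be saved losslessly. The 4-byte length header is a hypothetical framing choice, not a standard format.

```python
def embed_lsb(pixels: bytes, message: bytes) -> bytearray:
    """Hide message bits in the least significant bit of each channel value.

    A 4-byte big-endian length header is stored first so extraction knows when to stop.
    """
    payload = len(message).to_bytes(4, "big") + message
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for message")
    stego = bytearray(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the message bit
    return stego


def extract_lsb(pixels: bytes) -> bytes:
    """Read the length header, then that many bytes, from the channel LSBs."""
    def read_bytes(start_bit: int, count: int) -> bytes:
        out = bytearray()
        for b in range(count):
            value = 0
            for i in range(8):
                value = (value << 1) | (pixels[start_bit + b * 8 + i] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)


cover = bytes((i * 37) % 256 for i in range(1000))  # stand-in for raw RGB channel values
stego = embed_lsb(cover, b"attack at dawn")
print(extract_lsb(stego))  # b'attack at dawn'
```

Only the first 144 values (18 payload bytes × 8 bits) are touched, and none changes by more than 1. Re-saving the result as JPEG would destroy the message, which is why LSB methods need lossless formats.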

The choice of cover medium matters. High-noise content (photographs, music) hides changes better than simple graphics with solid colors.

Audio Steganography

Audio steganography exploits psychoacoustic principles to hide data where humans won't notice distortion. Techniques include LSB encoding, echo hiding (adding imperceptible echoes), phase coding, and spread spectrum methods.
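Spread spectrum hiding can be sketched as follows. The sketch assumes a pseudo-noise (PN) sequence derived from a shared seed. Each bit is spread across many samples by adding a small plus-or-minus PN signal, and the receiver recovers it by correlating with the same sequence, without needing the original audio. The strength alpha = 0.1 is exaggerated so the demo decodes reliably; a practical scheme would shape the added signal with a psychoacoustic model to keep it inaudible. The sine-wave cover and all names are illustrative.

```python
import math
import random


def pn_sequence(seed: int, length: int) -> list[int]:
    """Pseudo-noise chips of +1/-1; the seed plays the role of a shared secret key."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def ss_embed(samples, bits, seed, chips_per_bit=2000, alpha=0.1):
    """Spread each bit over chips_per_bit samples by adding +/- alpha * PN."""
    if len(bits) * chips_per_bit > len(samples):
        raise ValueError("audio too short for this many bits")
    pn = pn_sequence(seed, chips_per_bit)
    out = list(samples)
    for k, bit in enumerate(bits):
        sign = 1 if bit else -1
        for i, chip in enumerate(pn):
            out[k * chips_per_bit + i] += alpha * sign * chip
    return out


def ss_extract(samples, n_bits, seed, chips_per_bit=2000):
    """Blind detection: correlate each segment with the PN sequence; the sign is the bit."""
    pn = pn_sequence(seed, chips_per_bit)
    bits = []
    for k in range(n_bits):
        segment = samples[k * chips_per_bit:(k + 1) * chips_per_bit]
        corr = sum(s * chip for s, chip in zip(segment, pn))
        bits.append(1 if corr > 0 else 0)
    return bits


def sine_cover(n, freq=440.0, rate=8000, amplitude=0.5):
    """Synthetic stand-in for real audio samples in [-1, 1]."""
    return [amplitude * math.sin(2 * math.pi * freq * t / rate) for t in range(n)]


message = [1, 0, 1, 1, 0]
audio = sine_cover(len(message) * 2000)
marked = ss_embed(audio, message, seed=42)
print(ss_extract(marked, len(message), seed=42))  # [1, 0, 1, 1, 0]
```

Correlation works because the PN chips are uncorrelated with the music. The cover contributes only a small random term to each sum, while the embedded signal contributes alpha × chips_per_bit.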

Network Steganography

Data can be hidden in network communication through packet headers, timing intervals between packets, TCP initial sequence numbers, or DNS queries and responses. DNS-based steganography is commonly used for malware command-and-control communication.
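The sketch below shows how data could be packed into DNS query names. It assumes a hypothetical receiver that runs the authoritative name server for the domain and logs the names it is asked about. It only builds and parses names and sends no traffic; example.com stands in for an attacker-controlled domain.

```python
import base64

LABEL_LIMIT = 63  # maximum length of a single DNS label


def to_queries(data: bytes, domain: str) -> list[str]:
    """Encode data as DNS query names under a domain the receiver controls.

    Base32 is used because DNS names are case-insensitive and allow only letters,
    digits and hyphens; '=' padding is stripped because it is not valid in hostnames.
    """
    encoded = base64.b32encode(data).decode("ascii").rstrip("=").lower()
    chunks = [encoded[i:i + LABEL_LIMIT] for i in range(0, len(encoded), LABEL_LIMIT)]
    # Prefix a sequence number so the receiver can reassemble out-of-order queries.
    return [f"{seq}.{chunk}.{domain}" for seq, chunk in enumerate(chunks)]


def from_queries(names: list[str], domain: str) -> bytes:
    """Receiver side (e.g. the authoritative name server): reassemble and decode."""
    suffix = "." + domain
    parts = []
    for name in names:
        seq, chunk = name[: -len(suffix)].split(".", 1)
        parts.append((int(seq), chunk))
    encoded = "".join(chunk for _, chunk in sorted(parts)).upper()
    encoded += "=" * (-len(encoded) % 8)  # restore base32 padding
    return base64.b32decode(encoded)


data = b"user=alice;pass=hunter2;" * 4
queries = to_queries(data, "example.com")
print(len(queries))                                             # 3
print(from_queries(list(reversed(queries)), "example.com") == data)  # True
```

Because each lookup looks like an ordinary name resolution, this kind of channel often passes firewalls that allow outbound DNS. Unusually long, high-entropy labels are what detectors typically look for.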

Watermarking Applications

Printer Tracking Dots

Many color laser printers embed nearly invisible yellow dots encoding the printer's serial number and timestamp on every page printed. This enables tracing documents to their source printer.

Fragile vs. Robust Watermarks

Fragile watermarks break if content is modified. Used for authentication and tamper detection (currency, passports, tickets).

Robust watermarks survive transformation. Used for tracking ownership and authorship (photos, videos, documents).
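A fragile, keyed watermark can be sketched over raw 8-bit channel values. An HMAC-SHA256 tag is computed over the image with the tag's own bit positions zeroed, then written into the LSBs of the first 256 values. Any later change, even a single bit anywhere, makes verification fail. Real fragile schemes often tag individual blocks so tampering can be localized; this whole-image version is a simplification, and the key is a placeholder.

```python
import hashlib
import hmac

SIG_BITS = 256  # HMAC-SHA256 tag length in bits


def _tag(key: bytes, pixels: bytes) -> bytes:
    # Authenticate every value, with the LSBs of the first SIG_BITS values cleared
    # because those bits will hold the tag itself.
    masked = bytearray(pixels)
    for i in range(SIG_BITS):
        masked[i] &= 0xFE
    return hmac.new(key, bytes(masked), hashlib.sha256).digest()


def embed_fragile(key: bytes, pixels: bytes) -> bytearray:
    if len(pixels) < SIG_BITS:
        raise ValueError("need at least 256 channel values")
    tag = _tag(key, pixels)
    out = bytearray(pixels)
    for i in range(SIG_BITS):
        bit = (tag[i // 8] >> (7 - i % 8)) & 1
        out[i] = (out[i] & 0xFE) | bit
    return out


def verify_fragile(key: bytes, pixels: bytes) -> bool:
    if len(pixels) < SIG_BITS:
        return False
    stored = bytearray(32)
    for i in range(SIG_BITS):
        stored[i // 8] |= (pixels[i] & 1) << (7 - i % 8)
    return hmac.compare_digest(bytes(stored), _tag(key, pixels))


key = b"demo watermark key"
original = bytes((i * 37) % 256 for i in range(1000))
marked = embed_fragile(key, original)
print(verify_fragile(key, marked))    # True
tampered = bytearray(marked)
tampered[500] ^= 0x04                 # change one pixel value slightly
print(verify_fragile(key, tampered))  # False
```

A robust watermark needs the opposite property: it should survive such edits, so it cannot rely on exact bit values like this.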

Steganography for Malware

Attackers use steganography to deliver malware past content-inspecting firewalls and to exfiltrate data through innocent-looking uploads. Malicious payloads can be hidden in images, audio, or other media files that appear harmless to security tools.

Steganalysis

Steganalysis is the practice of detecting hidden content in files. The more data hidden, the greater the risk of detectable artifacts.
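One classic detector for LSB replacement is the chi-square or "pairs of values" test. Replacing LSBs with random message bits tends to equalize the counts of each value pair (2k, 2k+1). The sketch computes only that statistic, not a p-value. Its cover, which uses only even values, is a synthetic extreme chosen to make the effect obvious.

```python
import random
from collections import Counter


def pov_statistic(values) -> float:
    """Chi-square-style statistic over pairs of 8-bit values (2k, 2k+1).

    Each even count is compared with the pair's average. A value near zero means
    the pairs are balanced, which is what full LSB embedding tends to produce.
    """
    hist = Counter(values)
    stat = 0.0
    for k in range(128):
        even, odd = hist[2 * k], hist[2 * k + 1]
        expected = (even + odd) / 2
        if expected > 0:
            stat += (even - expected) ** 2 / expected
    return stat


rng = random.Random(1)
cover = [2 * rng.randrange(128) for _ in range(5000)]    # synthetic cover: even values only
stego = [(v & ~1) | rng.getrandbits(1) for v in cover]  # every LSB replaced with a random bit
print(pov_statistic(cover))  # 2500.0
print(pov_statistic(stego))  # far smaller
```

Embedding less data leaves more pairs unbalanced, which is one concrete reason that smaller payloads are harder to detect.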

Anonymous Communication

Limits of Private Browsing

Browser private modes (Incognito, InPrivate) don't send cookies stored during normal browsing, don't save history, and discard cookies and cached pages at session end. However, they do not provide true privacy:

Web servers see your IP address
ISPs know what domains you access
DNS servers log your queries
Proxies and firewalls see traffic

Commercial VPNs hide your IP address from destinations (and your destinations from your ISP), but the VPN provider can see your activity.

The Dark Web

The surface web is content indexed by search engines. The deep web is unindexed content (database query results, private pages). The dark web is intentionally hidden content that requires special software, such as Tor, to access.

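Dark web services use .onion addresses derived from the service's public key. Older v2 addresses were a base32-encoded, truncated hash of the key. Current v3 addresses encode the full Ed25519 public key plus a checksum, so the address itself authenticates the service. Both legitimate services (news outlets, SecureDrop, search engines) and illicit services operate on the dark web.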
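The v3 address construction can be sketched with the standard library alone, following the encoding in Tor's v3 rendezvous specification. The address is base32(PUBKEY | CHECKSUM | VERSION) followed by ".onion". CHECKSUM is the first two bytes of SHA3-256(".onion checksum" | PUBKEY | VERSION), and VERSION is the byte 0x03. The 32-byte input here is a placeholder, not a real Ed25519 key; generating one would need a cryptography library.

```python
import base64
import hashlib

VERSION = b"\x03"


def _checksum(pubkey: bytes) -> bytes:
    return hashlib.sha3_256(b".onion checksum" + pubkey + VERSION).digest()[:2]


def onion_address_v3(pubkey: bytes) -> str:
    """Encode a 32-byte Ed25519 public key as a v3 .onion address."""
    if len(pubkey) != 32:
        raise ValueError("Ed25519 public keys are 32 bytes")
    raw = pubkey + _checksum(pubkey) + VERSION  # 35 bytes -> 56 base32 characters
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


def check_onion_address(address: str) -> bool:
    """Decode an address and recompute its checksum to catch typos."""
    if not address.endswith(".onion"):
        return False
    try:
        raw = base64.b32decode(address[: -len(".onion")].upper())
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    if len(raw) != 35 or raw[34:] != VERSION:
        return False
    return _checksum(raw[:32]) == raw[32:34]


dummy_key = bytes(range(32))  # placeholder bytes, not a real Ed25519 key
address = onion_address_v3(dummy_key)
print(len(address), address.endswith("d.onion"))  # 62 True
print(check_onion_address(address))               # True
```

The last base32 character carries the low five bits of the version byte 0x03, which is why every v3 address ends in "d". A client that holds the address therefore already holds the service's public key and can verify it without trusting any directory.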

Tor (The Onion Router)

How Tor Works

Tor provides anonymous browsing through volunteer-operated relays. Users download the Tor Consensus Document, which describes the entire network: all valid relays, their IP addresses, ports, bandwidth, and (through the relay descriptors it references) their public keys. This document is signed by trusted directory authorities and updated hourly.

Tor provides two forms of anonymity:

Unobservability: Observers cannot link participants to actions
Unlinkability: Multiple actions cannot be associated as related

Circuits

Users build circuits through three relays (entry, middle, exit). Each relay only knows its immediate neighbors in the chain. The entry relay knows the user's IP address but not the destination. The exit relay knows the destination but not the user.

Circuit setup uses the relay public keys from the consensus document to establish session keys:

User establishes TLS link to Relay1, uses Relay1's public key to negotiate symmetric session key S1
User extends circuit to Relay2 through Relay1, uses Relay2's public key to negotiate session key S2
User extends circuit to Relay3 through Relay1 and Relay2, uses Relay3's public key to negotiate session key S3
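The per-hop key negotiation above can be sketched as a simplified one-round Diffie-Hellman exchange. The client combines a fresh ephemeral key with the relay's published key. This is not Tor's ntor handshake, which uses Curve25519, also mixes in a relay ephemeral key, and authenticates the relay's identity. The group parameters below are toy values chosen only so the code runs with the standard library.

```python
import hashlib
import secrets

# Toy Diffie-Hellman group: the Mersenne prime 2**127 - 1 with base 3.
# Insecure and for illustration only.
P = 2**127 - 1
G = 3


def keypair() -> tuple[int, int]:
    """Return (private exponent, public value G**private mod P)."""
    private = secrets.randbelow(P - 3) + 2  # private exponent in [2, P - 2]
    return private, pow(G, private, P)


def kdf(shared: int) -> bytes:
    """Hash the shared group element into a 32-byte symmetric session key."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


class Relay:
    def __init__(self, name: str):
        self.name = name
        # Long-term "onion key"; the public half is what clients learn from the directory.
        self._private, self.public = keypair()

    def answer_handshake(self, client_public: int) -> bytes:
        # Relay side: combine the client's ephemeral value with the relay's private key.
        return kdf(pow(client_public, self._private, P))


def client_handshake(relay_public: int) -> tuple[int, bytes]:
    """Client side: pick a fresh ephemeral key pair and derive the session key."""
    eph_private, eph_public = keypair()
    return eph_public, kdf(pow(relay_public, eph_private, P))


circuit = [Relay("Relay1"), Relay("Relay2"), Relay("Relay3")]
session_keys = []
for relay in circuit:
    sent, client_key = client_handshake(relay.public)
    # In Tor, the handshakes with Relay2 and Relay3 are tunneled through the earlier hops.
    assert relay.answer_handshake(sent) == client_key
    session_keys.append(client_key)
print(len(session_keys))  # 3
```

Only a relay holding the private half of its published key can compute the same value as the client. Relays earlier in the path forward the handshake without learning the resulting key.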

Only the intended relay holds the private key needed to complete each handshake, so only that relay can derive the corresponding session key. Messages are then encrypted in layers (S3 first, then S2, then S1). Each relay strips one layer before forwarding.
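The layering and peeling order can be sketched with a toy XOR stream cipher built from SHA-256. The cipher is a stand-in chosen so the example needs no third-party library; Tor's relay layers actually use AES in counter mode. The session keys are placeholders for the S1, S2, and S3 values negotiated during circuit setup.

```python
import hashlib


def toy_stream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || block counter).

    Being XOR-based, the same call both adds and removes a layer. It provides no
    integrity protection, and the counter restarts on every call, so it is only
    suitable for a one-message demo.
    """
    out = bytearray()
    for counter, start in enumerate(range(0, len(data), 32)):
        pad = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


def wrap(message: bytes, keys: list[bytes]) -> bytes:
    """Client: add the innermost layer (S3) first and the outermost (S1) last."""
    cell = message
    for key in reversed(keys):
        cell = toy_stream_xor(key, cell)
    return cell


def forward_through_circuit(cell: bytes, keys: list[bytes]) -> list[bytes]:
    """Relay1, Relay2, Relay3 each strip one layer; return what each hop forwards."""
    forwarded = []
    for key in keys:
        cell = toy_stream_xor(key, cell)
        forwarded.append(cell)
    return forwarded


keys = [b"S1 (entry)", b"S2 (middle)", b"S3 (exit)"]  # placeholder session keys
request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
hops = forward_through_circuit(wrap(request, keys), keys)
print(hops[-1] == request)  # True: the exit relay's output is the plaintext request
```

The entry and middle relays forward data that is still encrypted under the remaining layers. Only the exit relay's output is the original request.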

Tor is not a VPN. It does not encapsulate IP packets; instead, it relays TCP data streams. End-to-end TLS can still be used on top of Tor.

Limitations

Correlation attacks: If an attacker observes both entry and exit traffic, they can correlate timing and message sizes to link users to destinations.
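The correlation step can be illustrated with a toy example. It assumes an observer who has per-second packet counts for a user's entry connection and for several candidate exit flows, already aligned for circuit latency. The counts are invented for illustration; real attacks use richer features and must search over time offsets.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def best_match(entry_counts: list[float], exit_flows: dict[str, list[float]]) -> str:
    """Return the exit flow whose per-interval packet counts best track the entry flow."""
    return max(exit_flows, key=lambda name: pearson(entry_counts, exit_flows[name]))


# Hypothetical packets-per-second counts, already aligned for circuit latency.
entry = [5, 0, 3, 8, 1, 0, 7, 2]
exits = {
    "flow-x": [5, 1, 3, 7, 1, 0, 7, 2],
    "flow-y": [2, 6, 1, 0, 5, 7, 0, 3],
}
print(best_match(entry, exits))  # flow-x
```

Encryption does not help here, because packet counts and timing survive every layer. This is why an adversary watching both ends is outside Tor's threat model.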

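Compromised exit nodes: Exit relays remove the final layer of Tor encryption and contact destinations directly. Any traffic not protected by end-to-end encryption (such as plain HTTP) is visible to, and can be modified by, the exit node.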

Sybil attacks: An adversary running many relays can break anonymity for any circuit whose entry and exit relays are both under its control, since it can then correlate the two ends.

Censorship: Governments can block the publicly listed Tor relays. Tor addresses this with bridges: relays that are not listed in the public consensus, typically used with pluggable transports such as obfs4 that disguise Tor traffic.

I2P (Invisible Internet Project)

I2P uses garlic routing, which bundles multiple messages ("cloves") into a single encrypted garlic message, making traffic analysis harder. Unlike Tor's bidirectional circuits, I2P uses separate unidirectional tunnels for inbound and outbound traffic.

Tor focuses on anonymous access to services. I2P focuses on anonymous hosting of services using a distributed hash table for routing.
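I2P's distributed hash table (its network database, or netDb) is Kademlia-derived and maintained by floodfill routers. The core Kademlia idea is to find the nodes whose identifiers are closest to a key under XOR distance, sketched below. The router names and the use of plain SHA-256 of a name as the identifier are assumptions for illustration, not I2P's exact lookup protocol.

```python
import hashlib


def key_id(name: str) -> int:
    """256-bit identifier derived by hashing a (hypothetical) router or key name."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def closest_nodes(key: str, nodes: list[str], k: int = 3) -> list[str]:
    """Rank nodes by XOR distance between their ID and the key's ID; keep the k closest."""
    target = key_id(key)
    return sorted(nodes, key=lambda node: key_id(node) ^ target)[:k]


routers = [f"router-{i}" for i in range(20)]      # hypothetical floodfill routers
print(closest_nodes("example-service", routers))  # the 3 routers responsible for this key
```

Every participant computes the same ranking from the same key, so a service's contact information can be stored on and looked up from the closest routers without any central directory.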