Steganography & Watermarking

Hiding data

Paul Krzyzanowski

Jan 11, 2024

Introduction

In the 17th century, Sir John Trevanion, an English politician during the time of the English Civil War, faced execution with little hope of escape. On the brink of death, a letter from his loyal servant arrived. To the guards, it seemed like an ordinary, rambling message offering sympathy and encouragement. But to Sir John, the letter carried a hidden meaning, one that only he could decipher.

Worthie Sir John, –Hope, that is ye beste comfort of ye afflicted, cannot much, I fear me, help you now. That I would saye to you, is this only: if ever I may be able to requite that I do owe you, stand not upon asking me. 'Tis not much that I can do: but what I can do, bee ye verie sure I wille. I knowe that, if dethe comes, if ordinary men fear it, it frights not you, accounting it for a high honour, to have such a rewarde of your loyalty. Pray yet that you may be spared this soe bitter, cup. I fear not that you will grudge any sufferings; only if bie submission you can turn them away, 'tis the part of a wise man. Tell me, an if you can, to do for you anythinge that you wolde have done. The general goes back on Wednesday. Restinge your servant to command. R.T.

Using a shared key known to both men, Sir John read the secret message embedded in the letter. Taking the third character after each punctuation mark spelled out the instructions: “Panel at east end of chapel slides.” With this knowledge, Sir John requested a final hour of reflection in the chapel. There, he followed the directions, found a hidden panel, and escaped through a concealed tunnel, evading his execution¹.

This clever method of embedding a hidden message within an innocuous letter is an example of a concealment cipher (also known as a null cipher), a type of cipher that embeds secret information within ordinary text. By instructing Sir John to look at the third character after each punctuation mark, the servant concealed the true message in plain sight, making the letter appear innocent to anyone else. This technique fits into the broader concept of steganography, which focuses on hiding the existence of a message rather than encrypting its content.
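As a rough illustration, the rule Sir John applied can be sketched in a few lines of Python. The function name `reveal`, the set of characters treated as punctuation, and the choice to count only letters (skipping spaces and other symbols) are assumptions made for this sketch; accounts of the story do not state the rule that precisely.

```python
def reveal(cover, offset=3, marks=",.;:!?"):
    """Return the offset-th letter that follows each punctuation mark.

    Assumptions of this sketch: only the characters in `marks` count as
    punctuation, and only letters are counted after a mark.
    """
    hidden = []
    remaining = 0  # letters still to count after the last mark; 0 means idle
    for ch in cover:
        if ch in marks:
            remaining = offset          # (re)start counting at every mark
        elif remaining and ch.isalpha():
            remaining -= 1
            if remaining == 0:
                hidden.append(ch.lower())
    return "".join(hidden)


# The opening sentence of the letter quoted above.
opening = ("Worthie Sir John, –Hope, that is ye beste comfort of ye afflicted, "
           "cannot much, I fear me, help you now.")
print(reveal(opening))  # panel
```

The sender's job is the hard part: composing natural-sounding text whose letters happen to fall in the right positions. Extraction, as shown, is trivial once the rule is known.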

Steganography is the practice of concealing information in plain sight, such that only the intended recipient knows how to uncover it. Unlike cryptography, which scrambles a message to obscure its content, steganography hides the very existence of the message, often within ordinary texts, images, or objects. Sir John’s escape illustrates the power and creativity of this technique.

Steganography has been used throughout history to transmit secret information without arousing suspicion. Some classic techniques include:

Concealment Ciphers (Null Ciphers):
Concealment ciphers, which we just discussed, hide secret messages within seemingly ordinary text using predefined patterns or rules. For example, the hidden message might be revealed by taking the first letter of every word, the third character after punctuation, or every nth letter in the text.
Invisible Ink:
Invisible ink has been used for centuries to conceal messages that only become visible under certain conditions, such as exposure to heat, ultraviolet light, or specific chemicals. During the American Revolution, both George Washington’s spies and British agents used invisible ink to transmit sensitive military plans. Lemon juice, for instance, was a common choice and would become visible when exposed to heat, allowing messages to remain hidden from prying eyes until intentionally revealed.
Microdots:
Microdots involve shrinking messages to the size of a dot, often smaller than a period, which can be embedded within images, letters, or documents. These were widely employed during World War II by spies and intelligence agencies. German agents used microdots to conceal detailed blueprints, maps, or secret instructions on documents that appeared harmless. Magnification tools were required to reveal the information, making it an effective tool for espionage.
Hidden Text in Artwork or Objects:
Artists and craftspeople have historically embedded secret messages in their works to communicate hidden meanings or instructions. For example, during the Reformation, Protestant dissenters in Catholic-controlled regions concealed religious or political messages in artwork. Similarly, spies in World War II encoded escape routes and instructions into maps disguised as paintings or decorative items.
Writing Messages on One’s Head and Covering Them with Hair:
This unusual but effective technique dates back to ancient Greece. The historian Herodotus recounts how a message was written on a servant’s shaved head, which was then allowed to grow hair to conceal it. The servant traveled to deliver the message, which was revealed by shaving their head again. This method ensured that the message remained completely hidden during transit.
Carefully-Clipped Newspaper Articles:
Spies and informants have often used this method to communicate messages by clipping and arranging words or phrases from newspapers. The resulting collage conveyed the secret message while appearing as an innocent collection of clippings. This technique was particularly common in the 20th century when newspapers were widely available and their contents provided plausible deniability.
Knitting Patterns:
Messages have been encoded into knitting patterns, where the arrangement of stitches—such as purls and knits—corresponds to letters or words in a secret code. This technique was reportedly used by spies during World War II. Women knitting in occupied territories would embed instructions, troop movements, or other intelligence into seemingly ordinary scarves or blankets, which could then be passed unnoticed.
Signatures, Like “XOXO”:
Even seemingly ordinary signatures can contain hidden meanings. For example, simple patterns like “XOXO” (commonly used to signify hugs and kisses) could be used as a steganographic code if the sender and recipient had a predefined understanding of what the pattern represented. Variations in spacing, capitalization, or repetition could further encode specific instructions or messages.
Word or Letter Substitution:
This technique hides messages by subtly altering text, such as changing capitalization, spacing, or fonts. A historical example comes from prisoners in the 16th century who marked specific letters in books to encode messages. Similarly, acrostic poetry, where the first letters of each line spell out a hidden message, was often used during the Renaissance to convey covert information.
Chaffing and Winnowing:
Chaffing and winnowing takes its name from the process of separating wheat (the valuable grain) from chaff (the worthless husks).

This is a modern steganographic technique introduced by Ron Rivest in 1998. It doesn’t encrypt the message but instead hides it by mixing the real message (the “wheat”) with irrelevant data (the “chaff”) and transmitting both together. Each message is accompanied by a MAC or digital signature using a key known only to trusted parties; the authenticators on the chaff are bogus, so a receiver who holds the key can discard it. Intruders can see all the messages but can’t separate the meaningful message from the noise. The data itself is sent in the clear, and its secrecy comes from being indistinguishable from meaningless information. It has practical applications in digital communication where authentication without encryption may be needed.
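A minimal sketch of the idea, sending one bit per packet, might look like the following. The packet layout `(serial, bit, tag)`, the use of HMAC-SHA256, and the function names `add_chaff` and `winnow` are choices made for this illustration and are not taken from any particular implementation.

```python
import hashlib
import hmac
import os
import random


def _tag(key, serial, bit):
    """MAC over the serial number and the bit value (format is hypothetical)."""
    return hmac.new(key, f"{serial}:{bit}".encode(), hashlib.sha256).digest()


def add_chaff(bits, key):
    """For every real bit, emit a wheat packet and a chaff packet."""
    packets = []
    for serial, bit in enumerate(bits):
        packets.append((serial, bit, _tag(key, serial, bit)))  # wheat: valid MAC
        packets.append((serial, 1 - bit, os.urandom(32)))      # chaff: random "MAC"
    random.SystemRandom().shuffle(packets)  # order must not reveal which is which
    return packets


def winnow(packets, key):
    """Keep only the packets whose MAC verifies, ordered by serial number."""
    wheat = {}
    for serial, bit, tag in packets:
        if hmac.compare_digest(tag, _tag(key, serial, bit)):
            wheat[serial] = bit
    return [wheat[s] for s in sorted(wheat)]


key = os.urandom(32)
packets = add_chaff([1, 0, 1, 1], key)
print(winnow(packets, key))  # [1, 0, 1, 1]
```

An observer sees both a 0 and a 1 for every serial number. Without the key, the two tags look equally random, so there is no way to tell which bit was intended.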

The purpose of all of these methods was the same: ensuring that secret messages could evade detection by blending into everyday objects and activities.

Image steganography

Messages can be embedded into images. This is arguably the most common way of using steganography.

There are three common ways of hiding a message in an image:

Most image formats have fields for storing textual metadata in addition to the image. PNG files, for instance, have text fields, and JPEG, TIFF, and raw image formats support Exif (Exchangeable image file format) data fields. This shouldn’t be considered steganography, however, since these fields are well known and hidden only in the sense that they are not part of the image. Still, they can be an effective way to transport data covertly and can be used to bypass content-filtering firewalls that may consider images to be harmless.
A straightforward method to hide a message in an image is to use the low-order bits (least significant bits) of the image data, where the user is unlikely to notice slight changes in color. This is known as LSB steganography. An image is a collection of RGB pixels. With 8-bit color channels, changing the least significant bit of a color value shifts it by at most one level out of 256, which viewers will not notice, so a message can be encoded simply by spreading its bits among the least significant bits of the image.
You can do a similar thing but apply a frequency domain transformation, like JPEG compression does, by using a Discrete Cosine Transform (DCT). The frequency domain maps the image as a collection ranging from high-frequency areas (e.g., “noisy” parts such as leaves, grass, and edges of things) through low-frequency areas (e.g., a clear blue sky). Changes to high-frequency areas will generally go unnoticed by humans; that’s why JPEG compression works. Because modifications in these regions go unnoticed, you can add the message into those areas and then transform the data back into the spatial (bitmap) domain. Now the message is spread throughout the higher-frequency parts of the image and can be extracted if you do the DCT again and know where to look for the message.
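The LSB approach can be sketched without any imaging library by treating the image as a flat sequence of 8-bit color values. The function names `embed` and `extract`, the most-significant-bit-first ordering, and the assumption that the receiver already knows the message length are all choices made for this illustration.

```python
def embed(pixels, message):
    """Hide `message` in the least significant bits of `pixels`.

    `pixels` is a bytes-like sequence of 8-bit color values (R, G, B, R, ...).
    One message bit is stored per color value, most significant bit first.
    """
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover image is too small for this message")
    stego = bytearray(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the low bit, then set it
    return stego


def extract(pixels, n_bytes):
    """Recover `n_bytes` of hidden data from the low bits of `pixels`."""
    message = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (pixels[b * 8 + i] & 1)
        message.append(value)
    return bytes(message)


cover = bytearray(range(256))   # stand-in for real pixel data
stego = embed(cover, b"hi")
print(extract(stego, 2))        # b'hi'
```

No color value changes by more than one level. A practical tool would also have to store the message length and use a lossless image format, since lossy compression such as JPEG would destroy the low-order bits.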

Audio Steganography

Similar to images, audio files can carry hidden data in the least significant bits of their samples. Also similar to images, audio steganography can take advantage of the same psychoacoustic analysis that audio compression algorithms use: place the bits in areas where human listeners simply won’t notice the distortion. Techniques like echo hiding, phase coding, and spread spectrum can embed data within audio signals without significantly altering the audio’s perceptible qualities.

In 2024, researchers at Meta created AudioSeal, a new technique to add and detect hidden watermarks in AI-generated speech. A specific goal of this is to make it possible to detect watermarks in snippets of audio to identify its use in deepfakes. It hides 32 bits of watermark data in one-second audio segments, ensuring that the watermark can be detected even if parts of the audio are cropped.

AudioSeal uses two neural networks: one to generate the watermark and another to detect it. It uses a training method to minimize the perceived distortion between the original and watermarked audio while maximizing the detection of the watermark. As part of the training, the audio is altered through various techniques (bandpass filter, boost audio, duck audio, echo, highpass filter, lowpass filter, pink noise, gaussian noise, slower, smooth, resample) to increase the likelihood that the watermark will survive the recoding or compression of the audio.

While AudioSeal produces the best results of any audio watermarking technology to date, it is still subject to adversarial attacks. Specifically, the more information about the algorithm is disclosed to attackers, the easier it is to mount an attack that will obscure the watermark. The authors propose keeping the training parameters secret. AudioSeal is freely available on GitHub.

Video and Network Steganography

Video files are largely a combination of audio and images (more commonly images and then motion vectors and changes to images). They provide a larger capacity for embedding data.

Network steganography

Data can also be hidden within network communication. The non-hidden communication can be an innocent data stream. Network steganography can embed additional data in packet headers or in the timing intervals between packets.
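As a sketch of the timing approach, a sender could signal each bit through the length of the gap between two consecutive packets, and the receiver could recover the bits by comparing each gap against a threshold. The gap lengths, the threshold, and the function names below are hypothetical values picked for illustration; no actual packets are sent.

```python
from itertools import accumulate

SHORT, LONG = 0.05, 0.15        # seconds; hypothetical gaps for a 0 and a 1
THRESHOLD = (SHORT + LONG) / 2  # receiver's decision point


def send_times(bits, start=0.0):
    """Return the times at which len(bits) + 1 packets should be sent."""
    gaps = [LONG if bit else SHORT for bit in bits]
    return list(accumulate(gaps, initial=start))


def decode(arrival_times):
    """Recover bits from the gaps between consecutive packet arrivals."""
    gaps = [later - earlier
            for earlier, later in zip(arrival_times, arrival_times[1:])]
    return [1 if gap > THRESHOLD else 0 for gap in gaps]


times = send_times([1, 0, 0, 1])
print(decode(times))  # [1, 0, 0, 1]
```

The packets themselves carry ordinary traffic, so inspecting their contents reveals nothing. The channel is slow and sensitive to network jitter, which is why the two gap lengths are kept well apart.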

Steganography for malware delivery and exfiltration

Steganography has become a useful mechanism for attackers to deliver malware because malicious data can be hidden in “innocent” content, such as an image, and neither detected nor blocked by content-inspecting firewalls or intrusion detection systems. Similarly, attackers can use steganography to exfiltrate data from an organization by uploading images, audio, or other non-suspicious data.

For example, in April 2024, a report about the SteganoAmor campaign came out. The hacking group TA558 has been using a sophisticated method of delivering malware through the use of steganography, specifically targeting the hospitality and tourism sectors, predominantly in Latin America. This method has been implicated in over 320 cyber attacks across various sectors and regions. The attacks exploit an old, known vulnerability in Microsoft Office’s Equation Editor, CVE-2017-11882, which has been patched since 2017 but still poses a threat to systems running outdated software versions.

The SteganoAmor campaign begins with phishing emails that appear benign but contain malicious document attachments, leveraging both Excel and Word formats. These documents exploit the CVE-2017-11882 vulnerability to download a Visual Basic Script from a legitimate online service upon being opened. When the script runs, it downloads a JPEG image from the internet. This image carries a hidden Base64-encoded payload. Subsequently, a PowerShell script embedded within the image downloads the final payload that is hidden in a text file, which then installs the malware. By using compromised SMTP servers, TA558 enhances the likelihood of their phishing emails bypassing standard email filters, as these messages are sent from legitimate domains. This campaign shows a blend of using old vulnerabilities and steganography to orchestrate targeted attacks.

Printers

An application of steganography is found in most modern color laser printers. This technique involves embedding a subtle pattern of nearly invisible yellow dots on every page printed. These dots, while typically undetectable to the naked eye, can be seen under specific lighting conditions or with the aid of magnifying equipment. The purpose of this steganographic method is to encode information directly onto the printed medium.

The encoded data within the yellow dot patterns typically includes the printer’s serial number and the date and time of the document’s printing. This allows each printed page to carry a unique identifier that can trace it back to its source printer, providing a valuable tool for tracking the origin of documents. For example, law enforcement agencies can use the information encoded in the dot patterns to trace counterfeit documents back to the printer that produced them, aiding in criminal investigations and the protection of sensitive information.

Watermarking

Steganography is closely related to watermarking, and the terms “steganography” and “watermarking” are often used interchangeably. Steganography can be thought of as invisible watermarking.

The primary goal of watermarking is to create an indelible imprint on a message such that an intruder cannot remove or replace the message. It is often used to assert ownership, authenticity, or encode DRM rules. The message may be, but does not have to be, invisible.

The goal of steganography is to allow primarily one-to-one communication while hiding the existence of a message. An intruder – someone who does not know what to look for – cannot even detect the message in the data.

¹ See this article in Cryptiana for a discussion. The validity of this story is in doubt. While it’s been presented by many authors, there don’t seem to be any primary sources for it and some details are questionable.