Taxonomy of Steganography

Bit plane slices 1

Aphex Twin's Windowlicker EP, 1999

A schizophrenic PDF (screen ~ printer)

Historical case #1

During WWII, a British prisoner of war held by the Nazis cross-stitched a seemingly innocent decorative pattern around the border of a piece of needlework. The Nazis never found out that the border was Morse code spelling out "God Save the King" and "Fuck Hitler".

Amy Suo Wu, A Media Archeology of Steganography, 2015

Research ideas

Research questions:

How would it be possible to set up a pirate library through steganography?
- Which technologies would be used?
- Where would the books be hidden? JPG, PDF, MP3, WAV, EXIF?
- Which books would be included?
- How do you link them?
- Which interface to use?
- Which cataloging system to use?
- How could layering (text, images, metadata) become the navigation element?
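WAV, one of the candidate containers listed above, is probably the easiest to start with, because its uncompressed PCM samples can be altered directly. The following is a minimal sketch rather than a tested tool. It assumes 16-bit PCM input and uses only Python's standard `wave` module plus NumPy. The 4-byte length prefix and the names `hide_in_wav` / `reveal_from_wav` are my own choices.

```python
import io
import wave

import numpy as np

HEADER_BITS = 32  # assumed: payload length stored as a 4-byte big-endian prefix


def hide_in_wav(wav_bytes: bytes, payload: bytes) -> bytes:
    """Write the payload into the least significant bit of each 16-bit sample."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        params = w.getparams()
        frames = w.readframes(w.getnframes())
    if params.sampwidth != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    samples = np.frombuffer(frames, dtype="<i2").copy()
    data = len(payload).to_bytes(4, "big") + payload
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))  # MSB first
    if bits.size > samples.size:
        raise ValueError("audio too short for this payload")
    # clear each sample's lowest bit, then write one payload bit into it
    samples[:bits.size] = (samples[:bits.size] & ~1) | bits
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setparams(params)
        w.writeframes(samples.tobytes())
    return out.getvalue()


def reveal_from_wav(wav_bytes: bytes) -> bytes:
    """Read the length prefix, then that many bytes, from the sample LSBs."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = w.readframes(w.getnframes())
    lsb = (np.frombuffer(frames, dtype="<i2") & 1).astype(np.uint8)
    length = int.from_bytes(np.packbits(lsb[:HEADER_BITS]).tobytes(), "big")
    return np.packbits(lsb[HEADER_BITS:HEADER_BITS + 8 * length]).tobytes()
```

Each 16-bit sample carries one payload bit. That means one minute of 44.1 kHz stereo audio holds roughly 660 kB, which is enough for several plain-text books. Changing the lowest bit alters a sample by 1 step out of 65,536, which should be inaudible. As with every LSB method, however, the payload does not survive conversion to MP3 or any other lossy format.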

How could pirated & steganographed books be brought back to the "official" library?
- How could the reader find those books in the library?

How would it be possible to translate the texts into audio so that, while the audio plays, images appear in its frequency spectrum (i.e. become visible in a spectrogram)?
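One common way to make pictures show up in a sound's spectrogram, the effect associated with the Windowlicker EP captioned above, is additive synthesis. Each image row becomes a sine oscillator at a fixed frequency, each column becomes a slice of time, and pixel brightness sets the oscillator's volume. The sketch below illustrates that general idea under assumed parameters (5 s, 1 to 8 kHz). It is not a reconstruction of how the Aphex Twin track was made, and `image_to_audio` and `write_wav` are hypothetical helper names.

```python
import wave

import numpy as np


def image_to_audio(img, duration=5.0, sample_rate=44100, f_min=1000.0, f_max=8000.0):
    """Turn a 2-D brightness array (values 0..1, row 0 = top) into a mono signal
    whose spectrogram shows the picture: one sine oscillator per row,
    one time slice per column."""
    img = np.asarray(img, dtype=float)
    n_rows, n_cols = img.shape
    freqs = np.linspace(f_max, f_min, n_rows)  # top row -> highest pitch
    samples_per_col = int(duration * sample_rate / n_cols)
    t = np.arange(samples_per_col * n_cols) / sample_rate
    signal = np.zeros_like(t)
    for row, freq in zip(img, freqs):
        envelope = np.repeat(row, samples_per_col)  # hold each pixel for its slice
        signal += envelope * np.sin(2 * np.pi * freq * t)
    peak = np.abs(signal).max()
    return signal / peak if peak > 0 else signal  # normalise to [-1, 1]


def write_wav(path, signal, sample_rate=44100):
    """Save a [-1, 1] float signal as 16-bit mono PCM."""
    pcm = (np.clip(signal, -1, 1) * 32767).astype("<i2")
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(pcm.tobytes())
```

A grayscale picture could be loaded with Pillow as `np.asarray(Image.open("cover.png").convert("L")) / 255.0` (the file name is hypothetical). Opening the resulting WAV in a spectrogram view, such as Audacity's, should then show the image. To carry a text this way, the text would first have to be rendered as an image. Hard jumps between columns cause clicks and faint vertical smears, and fading between slices would soften them.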

For my research I want to use several steganography tools in Python to explore abusing file formats, such as hiding books in other books, texts in images, images in audio, etc. These experiments would lay the foundation for programming the steganographed pirate library.
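A classic way of abusing file formats, and the kind of binary trick collected in Funky File Formats, is the polyglot: a single file that is valid in two formats at once. ZIP archives are read from their end, where the central directory sits, while most image viewers ignore bytes after the image data. An archive of books can therefore simply be appended to a picture. The following is a minimal sketch using Python's standard `zipfile` module; `make_polyglot` and `list_books` are hypothetical names.

```python
import io
import zipfile


def make_polyglot(image_bytes: bytes, books: dict) -> bytes:
    """Append a ZIP archive of `books` (name -> bytes) to an image file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in books.items():
            zf.writestr(name, data)
    # the image stays untouched at the front; the archive rides along at the end
    return image_bytes + buf.getvalue()


def list_books(polyglot: bytes) -> dict:
    """Open the same bytes as a ZIP archive and return its contents."""
    with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}
```

Python's `zipfile` copes with the leading image bytes because it locates entries relative to the central directory. Command-line `unzip` typically warns about extra bytes at the beginning but still extracts the files. Anyone who checks the file size or runs a tool like `binwalk` will notice the archive, so this is concealment by convention rather than steganography in the strict sense.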

Research thought:

Interest in hiding files in other files (imagine a PDF containing an audio file?). A sound file could hold a whole library inside it. A JPEG has free space inside it (metadata, EXIF data, ...). Steganography.
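The "free space" in a JPEG can be taken fairly literally. The file is a sequence of marker segments, and decoders skip segments they do not need, such as comment (COM, marker `FF FE`) segments. Each segment carries a 2-byte length field, so a long payload has to be split into chunks of at most 65,533 bytes. This is a minimal sketch that assumes the comments are placed right after the start-of-image (SOI) marker; the function names are hypothetical.

```python
SOI = b"\xff\xd8"       # JPEG start-of-image marker
COM = b"\xff\xfe"       # JPEG comment-segment marker
MAX_CHUNK = 65535 - 2   # the 2-byte length field counts itself


def hide_in_jpeg_comments(jpeg: bytes, payload: bytes) -> bytes:
    """Insert `payload` as one or more COM segments directly after SOI."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    segments = b""
    for i in range(0, len(payload), MAX_CHUNK):
        chunk = payload[i:i + MAX_CHUNK]
        segments += COM + (len(chunk) + 2).to_bytes(2, "big") + chunk
    return SOI + segments + jpeg[2:]


def read_jpeg_comments(jpeg: bytes) -> bytes:
    """Concatenate the data of the COM segments that directly follow SOI."""
    pos, out = 2, bytearray()
    while jpeg[pos:pos + 2] == COM:
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        out += jpeg[pos + 4:pos + 2 + length]
        pos += 2 + length  # marker (2 bytes) + length-counted part
    return bytes(out)
```

The JFIF convention expects the APP0 segment to come first, so a strict reader might object to this placement. Inserting the comments after the APPn segments would be the more conservative option. `read_jpeg_comments` would also pick up any comments that already sat right after SOI. Like EXIF fields, comments are visible to anyone who dumps the metadata (for example with `exiftool`), so this is hiding in plain sight rather than strong steganography.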

hiding books in other books

censorship

books on the blacklist

read&seek

pirate library is pirating its own files

hiding pirated books in “official” library

Brief history of Steganography

494 BC Head tattoo

480 BC Wooden tablets and beeswax

1558 Hidden messages in hard-boiled eggs

1585 Beer barrel

1680 Musical notes

1800 Newspaper code

1915 Invisible ink

1941 Microdots

1980s Thatcher's watermarking

1990 Digital steganography

2003 Network steganography

Current VoIP steganography

Research references

→ Funky File Formats
Binary tricks to evade identification and detection, and to exploit encryption and hash collisions.

→ Steganography

Digital steganography, a set of algorithmic techniques for hiding data in files, is often used to hide text messages (or other digital content) within the bits of an image. In contrast to cryptography, steganography allows you to hide the very fact that you are trying to hide something, which makes it particularly attractive for hidden communications or leaking classified information.
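"Within the bits of an image" can be made concrete by slicing an 8-bit image into its eight bit planes, as in the "Bit plane slices" figure. Plane 7 (the most significant bits) still looks like the picture, while plane 0 looks like noise, which is why replacing plane 0 is hard to see. Below is a minimal NumPy sketch on a random stand-in image; `bit_planes` and `from_bit_planes` are hypothetical helper names.

```python
import numpy as np


def bit_planes(gray):
    """Split an 8-bit grayscale image into 8 binary images.
    planes[0] is the least significant bit plane, planes[7] the most significant."""
    gray = np.asarray(gray, dtype=np.uint8)
    return [(gray >> k) & 1 for k in range(8)]


def from_bit_planes(planes):
    """Reassemble the 8 planes into an 8-bit image."""
    out = np.zeros(planes[0].shape, dtype=np.uint8)
    for k, plane in enumerate(planes):
        out |= plane.astype(np.uint8) << k
    return out


# toy example: hide a random 1-bit "message" in the LSB plane of a random image
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
secret = rng.integers(0, 2, size=(8, 8), dtype=np.uint8)

planes = bit_planes(cover)
planes[0] = secret  # overwrite only the least significant plane
stego = from_bit_planes(planes)

print(np.abs(stego.astype(int) - cover.astype(int)).max())  # never more than 1
print(np.array_equal(bit_planes(stego)[0], secret))         # True
```

To look at a single plane, Pillow's `Image.fromarray(planes[k] * 255)` turns it into a black-and-white image.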

→ Javier Lloret - On opacity (2016)

→ Amy Suo Wu - A research project on analog steganography* and alternative forms of communication in the age of pervasive digital surveillance

→ Hiding in Plain Sight. Amy Suo Wu's The Kandinsky Collective

→ Aphex Twin's hidden message

→ british pow uses morse code to stitch hidden message during wwii

→ Script

→ Introduction to Steganography

→ Discussion on Steganography

→ Using PIL → Hack This: Extract Image Metadata Using Python

→ ExifRead 2.1.2 Exif

→ LSB-Steganography

→ Steganography and Python 1

→ Steganography and Python 2

→ Wavelet compression

→ BMP PCM polyglot

→ PDF hide Wiki

→ Shangping Zhong, Xueqi Cheng, and Tierui Chen. Data hiding in a kind of PDF texts for secret communication. International Journal of Network Security, 4(1):17–26, 2007

→ Using Steganography to hide messages inside PDF files

Bibliography

Articles are saved in this Zotero library.

Python experiments

#1 experiment based on the script of "Steganography: The Art and Science of Hiding Things in Other Things, Part 1"

Outcome of the #1 experiment

Outcome of the #2 experiment

# let's get our message set up
message = list('Steganography')

# convert to binary representation
message = ['{:07b}'.format(ord(x)) for x in message]
print("Message as binary:")
print(message)

# split the binary into bits
message = [[bit for bit in x] for x in message]

# flatten it and convert to integers
message = [int(bit) for sublist in message for bit in sublist]
print("Message as list of bits:")
print(message)
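Going back from bits to text is the step that the extraction in experiment #2 eventually needs. The following sketch inverts the encoding above and assumes the same fixed width of 7 bits per character, which is enough for ASCII only. `bits_to_text` is a hypothetical name.

```python
def bits_to_text(bits, width=7):
    """Regroup a flat list of 0/1 ints into `width`-bit chunks and map each
    chunk back to a character (the inverse of the encoding above)."""
    if len(bits) % width:
        raise ValueError("bit count is not a multiple of the character width")
    return "".join(
        chr(int("".join(str(b) for b in bits[i:i + width]), 2))
        for i in range(0, len(bits), width)
    )


bits = [int(b) for ch in "Steganography" for b in "{:07b}".format(ord(ch))]
print(bits_to_text(bits))  # Steganography
```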

#2 experiment based on the script of "Steganography: The Art and Science of Hiding Things in Other Things, Part 2"

from PIL import Image
import numpy as np

message = 'Digital steganography, a set of algorithmic techniques for hiding data in files, is often used to hide text messages (or other digital content) within the bits of an image. In contrast to cryptography, steganography allows to hide the very fact that you are trying to hide something, an aspect that makes it really desirable for hidden communications or classified information leakage.'

# convert the text to a flat list of bits, 7 bits per (ASCII) character as in experiment #1
message = [int(bit) for ch in message for bit in '{:07b}'.format(ord(ch))]

# first, open the original image (converted to RGB so every pixel has exactly 3 values)
imgpath = 'steganography_test_1.bmp'
img = Image.open(imgpath).convert('RGB')
imgArray = np.asarray(img)

# we'll use simple repetition as a very rudimentary error-correcting code to try to maintain integrity:
# each bit of the message will be repeated 9 times - the three least significant bits of the R, G and B values of one pixel
if len(message) > imgArray.shape[0] * imgArray.shape[1]:
    raise ValueError("image too small: this scheme needs one pixel per message bit")

def set_bit(val, bitNo, bit):
    """Given a value, which bit in the value to set, and the actual bit (0 or 1)
    to set, return the new value with the proper bit set."""
    mask = 1 << bitNo
    val &= ~mask
    if bit:
        val |= mask
    return val

msgIndex = 0
newImg = []

# this part of the code sets the 3 least significant bits of the R, G and B values
# in each pixel to one bit from our message, so each bit from our message is repeated
# 9 times - 3 each in R, G and B. This is a waste, technically speaking, but it's needed
# in case we lose some data in transit. Using the last 3 bits instead of the last 2 means
# the image looks a little worse, visually, but each bit gets more redundant copies - a tradeoff.
# The more significant the bits are, the less likely they are to be changed by compression:
# we could theoretically hide data in the most significant bits of the pixel values, and they
# would probably never be changed by compression etc., but it would look terrible, which
# defeats the whole purpose.
for row in imgArray:
    newRow = []
    for pixel in row:
        # one message bit per pixel; if we've run out of message, just add zeros
        setTo = message[msgIndex] if msgIndex < len(message) else 0
        newPixel = []
        for val in pixel:  # iterate through the R, G and B values, one at a time
            val = int(val)  # plain Python int, so ~mask in set_bit cannot overflow a uint8
            # set the last 3 bits of this R, G or B value to the message bit
            val = set_bit(val, 0, setTo)
            val = set_bit(val, 1, setTo)
            val = set_bit(val, 2, setTo)
            newPixel.append(val)  # this adds an R, G or B value to the pixel
        msgIndex += 1  # start looking at the next bit in the message
        newRow.append(newPixel)  # this adds a pixel to the row
    newImg.append(newRow)  # this adds a row to our image (now with 100% more hidden message!)

arr = np.array(newImg, np.uint8)  # convert our new image to a numpy array
im = Image.fromarray(arr)
im.save("image_steg.bmp")  # BMP is lossless, so the hidden bits survive saving

# open the *modified* image and extract our least significant bits to see if the message made it through
img = Image.open("image_steg.bmp").convert('RGB')
imgArray = np.asarray(img)

origMessage = message[:20]  # take the first 20 bits of the original message
# we don't use the entire message here since we just want to make sure it made it through
print("Original message:")
print(origMessage)

extracted = []
for row in imgArray:
    for pixel in row:
        # we'll take a count of how many 0 or 1 values we see and then go with
        # the highest-voted result (hopefully we have enough repetition!)
        count = {0: 0, 1: 0}
        for val in pixel:  # iterate through the R, G and B values of the pixel
            for i in (0, 1, 2):  # each of the 3 least significant bits
                count[(int(val) >> i) & 1] += 1
        # hopefully, if compression (or anything) flipped some of these bits,
        # it flipped few enough that the majority is still accurate
        extracted.append(1 if count[1] > count[0] else 0)

# even though we extracted the full message, we only display the
# first 20 bits just to make sure they match what we expect
print("Extracted message:")
print(extracted[:20])
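The comments in experiment #2 describe a tradeoff: using more low bits per channel adds redundancy but makes the image look worse. One way to put a number on "worse" is PSNR (peak signal-to-noise ratio). PSNR is a common image-quality metric, but it is not part of the original script. The sketch below vectorises the same 9-copy repetition scheme with NumPy and compares 1, 2 and 3 bits on a random stand-in image. `psnr` and `embed_repeated` are hypothetical names.

```python
import numpy as np


def psnr(original, modified, max_value=255.0):
    """Peak signal-to-noise ratio in dB (higher = closer to the original)."""
    a = np.asarray(original, dtype=np.float64)
    b = np.asarray(modified, dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")
    return 10 * np.log10(max_value ** 2 / mse)


def embed_repeated(img, bits, n_bits):
    """One message bit per pixel, copied into the n_bits lowest bits of R, G and B
    (the repetition idea of experiment #2, vectorised)."""
    mask = (1 << n_bits) - 1
    b = np.asarray(bits, dtype=np.uint8)[..., None]  # broadcast over the 3 channels
    return (img & (0xFF ^ mask)) | (b * mask)


rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in for a real photo
bits = rng.integers(0, 2, size=(64, 64))
for n in (1, 2, 3):
    print(n, "bit(s):", round(psnr(cover, embed_repeated(cover, bits, n)), 1), "dB")
```

When the low bits of the cover are uniformly random, forcing them all to 0 or all to 1 gives a mean squared error of about 0.5, 3.5 and 17.5 for 1, 2 and 3 bits. That corresponds to roughly 51, 43 and 36 dB. A real photograph will give different numbers.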

Encoding a text message based on the script of ASCII

ASCII script outcome

for ch in "Digital steganography!":
    d = ord(ch)
    b = bin(d)
    print(ch, d, b)
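`ord()` returns Unicode code points, so the fixed-width encodings above only work for ASCII. For example, `'{:07b}'.format(233)` ("é") already gives 8 digits and `'{:07b}'.format(8364)` ("€") gives 14. In experiment #4, `binary_value` raises an exception for anything above 255. A sketch of one workaround, assuming UTF-8 is an acceptable byte encoding for the hidden texts: encode to bytes first, then emit 8 bits per byte. `utf8_to_bits` and `bits_to_utf8` are hypothetical names.

```python
def utf8_to_bits(text):
    """Encode text as UTF-8, then emit 8 bits per byte, most significant bit first."""
    return [int(bit) for byte in text.encode("utf-8") for bit in format(byte, "08b")]


def bits_to_utf8(bits):
    """Inverse: regroup into bytes and decode as UTF-8."""
    if len(bits) % 8:
        raise ValueError("bit count is not a multiple of 8")
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


for ch in "é€":
    print(ch, ord(ch), len(utf8_to_bits(ch)))  # é 233 16, then € 8364 24
```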

#4 experiment based on the script of LSB-Steganography (https://github.com/RobinDavid/LSB-Steganography)

Input image png

Input txt file

Command line command

Decode output

#!/usr/bin/env python
# coding:UTF-8
"""LSBSteg.py

Usage:
  LSBSteg.py encode -i <input> -o <output> -f <file>
  LSBSteg.py decode -i <input> -o <output>

Options:
  -h, --help                Show this help
  --version                 Show the version
  -f,--file=<file>          File to hide
  -i,--in=<input>           Input image (carrier)
  -o,--out=<output>         Output image (or extracted file)
"""

import cv2
import docopt
import numpy as np


class SteganographyException(Exception):
    pass


class LSBSteg():
    def __init__(self, im):
        self.image = im
        self.height, self.width, self.nbchannels = im.shape
        self.size = self.width * self.height
        
        self.maskONEValues = [1,2,4,8,16,32,64,128]
        #Mask used to put one ex:1->00000001, 2->00000010 .. associated with OR bitwise
        self.maskONE = self.maskONEValues.pop(0) #Will be used to do bitwise operations
        
        self.maskZEROValues = [254,253,251,247,239,223,191,127]
        #Mask used to put zero ex:254->11111110, 253->11111101 .. associated with AND bitwise
        self.maskZERO = self.maskZEROValues.pop(0)
        
        self.curwidth = 0  # Current width position
        self.curheight = 0 # Current height position
        self.curchan = 0   # Current channel position

    def put_binary_value(self, bits): #Put the bits in the image
        for c in bits:
            val = list(self.image[self.curheight,self.curwidth]) #Get the pixel value as a list
            if int(c) == 1:
                val[self.curchan] = int(val[self.curchan]) | self.maskONE #OR with maskONE
            else:
                val[self.curchan] = int(val[self.curchan]) & self.maskZERO #AND with maskZERO
                
            self.image[self.curheight,self.curwidth] = tuple(val)
            self.next_slot() #Move "cursor" to the next space
        
    def next_slot(self):#Move to the next slot where information can be taken or put
        if self.curchan == self.nbchannels-1: #Next Space is the following channel
            self.curchan = 0
            if self.curwidth == self.width-1: #Or the first channel of the next pixel of the same line
                self.curwidth = 0
                if self.curheight == self.height-1:#Or the first channel of the first pixel of the next line
                    self.curheight = 0
                    if self.maskONE == 128: #Mask 10000000, so the last mask
                        raise SteganographyException("No available slot remaining (image filled)")
                    else: #Or instead of using the first bit start using the second and so on..
                        self.maskONE = self.maskONEValues.pop(0)
                        self.maskZERO = self.maskZEROValues.pop(0)
                else:
                    self.curheight +=1
            else:
                self.curwidth +=1
        else:
            self.curchan +=1

    def read_bit(self): #Read a single bit in the image
        val = self.image[self.curheight,self.curwidth][self.curchan]
        val = int(val) & self.maskONE
        self.next_slot()
        if val > 0:
            return "1"
        else:
            return "0"
    
    def read_byte(self):
        return self.read_bits(8)
    
    def read_bits(self, nb): #Read the given number of bits
        bits = ""
        for i in range(nb):
            bits += self.read_bit()
        return bits

    def byteValue(self, val):
        return self.binary_value(val, 8)
        
    def binary_value(self, val, bitsize): #Return the binary value of an int as a byte
        binval = bin(val)[2:]
        if len(binval) > bitsize:
            raise SteganographyException("binary value larger than the expected size")
        while len(binval) < bitsize:
            binval = "0"+binval
        return binval

    def encode_text(self, txt):
        l = len(txt)
        binl = self.binary_value(l, 16) #Length coded on 2 bytes so the text size can be up to 65535 bytes long
        self.put_binary_value(binl) #Put text length coded on 2 bytes
        for char in txt: #And put all the chars
            c = ord(char)
            self.put_binary_value(self.byteValue(c))
        return self.image
       
    def decode_text(self):
        ls = self.read_bits(16) #Read the text size in bytes
        l = int(ls,2)
        i = 0
        unhideTxt = ""
        while i < l: #Read all bytes of the text
            tmp = self.read_byte() #So one byte
            i += 1
            unhideTxt += chr(int(tmp,2)) #Every chars concatenated to str
        return unhideTxt

    def encode_image(self, imtohide):
        h, w, channels = imtohide.shape #numpy arrays (as returned by cv2.imread) expose their size via .shape
        if self.width*self.height*self.nbchannels < w*h*channels:
            raise SteganographyException("Carrier image not big enough to hold all the datas to steganography")
        binw = self.binary_value(w, 16) #Width coded on two bytes so width up to 65535
        binh = self.binary_value(h, 16)
        self.put_binary_value(binw) #Put width
        self.put_binary_value(binh) #Put height
        for y in range(h): #Iterate over the whole image to put every pixel value
            for x in range(w):
                for chan in range(channels):
                    val = imtohide[y,x][chan]
                    self.put_binary_value(self.byteValue(int(val)))
        return self.image

                    
    def decode_image(self):
        width = int(self.read_bits(16),2) #Read 16bits and convert it in int
        height = int(self.read_bits(16),2)
        unhideimg = np.zeros((height, width, 3), np.uint8) #Create an image (rows = height, assumes 3 channels) in which we will put all the pixels read
        for h in range(height):
            for w in range(width):
                for chan in range(unhideimg.shape[2]):
                    unhideimg[h,w,chan] = int(self.read_byte(),2) #Read the value
        return unhideimg
    
    def encode_binary(self, data):
        l = len(data)
        if self.width*self.height*self.nbchannels < l+64:
            raise SteganographyException("Carrier image not big enough to hold all the datas to steganography")
        self.put_binary_value(self.binary_value(l, 64))
        for byte in data:
            byte = byte if isinstance(byte, int) else ord(byte) # Compat py2/py3
            self.put_binary_value(self.byteValue(byte))
        return self.image

    def decode_binary(self):
        l = int(self.read_bits(64), 2)
        output = bytearray()
        for i in range(l):
            output.append(int(self.read_byte(),2)) #Keep the raw byte value (chr().encode() would turn values >= 128 into 2 bytes)
        return bytes(output)


def main():
    args = docopt.docopt(__doc__, version="0.2")
    in_f = args["--in"]
    out_f = args["--out"]
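    in_img = cv2.imread(in_f)
    if in_img is None: #cv2.imread returns None instead of raising when it cannot read the file
        raise SteganographyException("Could not read image: %s" % in_f)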
    steg = LSBSteg(in_img)

    if args['encode']:
        with open(args["--file"], "rb") as f:
            data = f.read()
        res = steg.encode_binary(data)
        cv2.imwrite(out_f, res) #Output must be a lossless format (e.g. PNG); JPEG would destroy the hidden bits

    elif args["decode"]:
        raw = steg.decode_binary()
        with open(out_f, "wb") as f:
            f.write(raw)


if __name__=="__main__":
    main()
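`LSBSteg` moves one bit at a time through Python loops, which gets slow for large payloads such as whole books. Judging from `next_slot` and `binary_value`, `encode_binary` seems to lay out the first bit plane as follows: a 64-bit big-endian length, then the payload bytes with the most significant bit first. Channels vary fastest, then columns, then rows, which is the C-order flattening of the image array. The following is a vectorised NumPy sketch under that assumption. `lsb_encode_bytes` and `lsb_decode_bytes` are hypothetical names. Unlike `LSBSteg`, the sketch refuses payloads larger than the LSB plane instead of spilling into higher bit planes.

```python
import numpy as np


def lsb_encode_bytes(img, data):
    """Hide `data` in the least significant bit plane of a uint8 image array.
    Assumed layout (mirroring LSBSteg.encode_binary on its first bit plane):
    64-bit big-endian length, then the payload, most significant bit first,
    channels varying fastest, then columns, then rows."""
    flat = np.array(img, dtype=np.uint8).reshape(-1)  # work on a copy
    header = len(data).to_bytes(8, "big")
    bits = np.unpackbits(np.frombuffer(header + bytes(data), dtype=np.uint8))
    if bits.size > flat.size:
        raise ValueError("payload does not fit in the least significant bit plane")
    # clear each value's lowest bit, then write one payload bit into it
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(np.shape(img))


def lsb_decode_bytes(img):
    """Read back what lsb_encode_bytes wrote."""
    flat = np.asarray(img, dtype=np.uint8).reshape(-1)
    length = int.from_bytes(np.packbits(flat[:64] & 1).tobytes(), "big")
    end = 64 + 8 * length
    if end > flat.size:
        raise ValueError("length header exceeds the carrier: probably no payload here")
    return np.packbits(flat[64:end] & 1).tobytes()
```

`cv2.imread` returns a `uint8` array of shape (height, width, 3), so these functions should work on its output directly. If the layout described above is correct, an image written by one implementation should decode with the other, as long as the payload fits in the LSB plane. The result still has to be saved in a lossless format such as PNG.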