Monday, June 2, 2014

Who's on First? Understanding Bases.

There was a question about Base64, so let's talk about bases.
A "base" is the number of distinct symbols ("things") you have to communicate with.

In English you have 26 letters.
English is Base26 if you only use the lowercase letters "abcdefghijklmnopqrstuvwxyz".

If you include the uppercase letters "ABCDEFGHIJKLMNOPQRSTUVWXYZ", you have just added 26 more symbols, which makes it Base52.
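Here is a minimal Python 3 sketch of that idea. The helper `to_base` is a made-up name for illustration: it writes a number using whatever alphabet you hand it, and the length of that alphabet is the base.

```python
import string

def to_base(n, alphabet):
    """Write the non-negative integer n using the characters of alphabet as digits."""
    base = len(alphabet)  # the base is just how many symbols we have
    if n == 0:
        return alphabet[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(alphabet[remainder])
    return ''.join(reversed(digits))

lower = string.ascii_lowercase  # 'a'..'z'            -> Base26
both = string.ascii_letters     # 'a'..'z' + 'A'..'Z'  -> Base52

print(len(lower), len(both))  # 26 52
print(to_base(27, lower))     # bb
print(to_base(27, both))      # B
```

Notice that "a" plays the role of zero here, the same way "0" does in decimal.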

Many humans like English. (Base26)
Computers like to read Binary using 1 or 0. (Base2)

Some humans read Japanese.
It is said that you need to know at least 3000 Japanese characters (Kanji, Katakana, Hiragana) to read a Japanese newspaper.
That's Base3000, and that's not even all of the Japanese characters!

To communicate successfully, we sometimes have to change (convert, encode, translate, whatever) a message from one base to another.
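A rough sketch of "converting from one base to another": read the message as a single number in the first alphabet, then write that number back out in the second. `convert` is a hypothetical helper, not a library function. Real encodings like Base64 work on fixed-size groups of bytes instead of one big number, so they won't always give the same output as this.

```python
def convert(message, from_alphabet, to_alphabet):
    """Re-spell message (written in from_alphabet) using to_alphabet."""
    # Read the whole message as one number in the source base
    value = 0
    for ch in message:
        value = value * len(from_alphabet) + from_alphabet.index(ch)
    # Write that same number back out in the target base
    if value == 0:
        return to_alphabet[0]
    out = []
    while value > 0:
        value, remainder = divmod(value, len(to_alphabet))
        out.append(to_alphabet[remainder])
    return ''.join(reversed(out))

print(convert('255', '0123456789', '01'))                # 11111111
print(convert('255', '0123456789', '0123456789abcdef'))  # ff
print(convert('11111111', '01', '0123456789'))           # 255
```

One catch with treating a message as a number: leading zeros disappear, so '007' comes back as '7'.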

Sometimes, though, the person converting the message is hoping that whoever reads it will NOT be able to understand it.

For example, the APT malware had the callback: ''
The creator could have easily left the callback in English.

But instead, it was converted into Base64 to hide it from anyone who reads English easily.
So we need to recognize the encoding we are looking at and convert it into something we can read more easily.
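Recognizing what you are looking at is often a matter of checking which alphabet a string fits. Below is a rough heuristic sketch, not a reliable detector: `guess_encoding` is a hypothetical name, and a short string like "cafe" is valid in several bases at once, so treat the answer as a hint.

```python
import base64
import binascii
import re

def guess_encoding(s):
    """Return a rough guess at which base a string is written in."""
    # Only 0s and 1s, in whole bytes
    if re.fullmatch(r'[01]+', s) and len(s) % 8 == 0:
        return 'base2'
    # Only hex digits, two per byte
    if re.fullmatch(r'[0-9a-fA-F]+', s) and len(s) % 2 == 0:
        return 'base16'
    # Base64 alphabet, at most two '=' at the end, groups of 4 characters
    if re.fullmatch(r'[A-Za-z0-9+/]+={0,2}', s) and len(s) % 4 == 0:
        try:
            base64.b64decode(s, validate=True)
            return 'base64'
        except binascii.Error:
            pass
    return 'unknown'

print(guess_encoding('4d616c7761726556697a'))  # base16
print(guess_encoding('TWFsd2FyZVZpeg=='))      # base64
print(guess_encoding('MalwareViz'))            # unknown
```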

    Base2  (Binary)      = 1 or 0
    Base4  (DNA)         = ATCG
    Base10 (Decimal)     = 0123456789
    Base16 (Hexadecimal) = 0123456789abcdef
    Base64               = ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/  (plus = for padding)
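A quick Python 3 check that each alphabet listed above has as many symbols as its name says. The dictionary is just for illustration.

```python
import string

# The alphabets listed above. '=' is left out of Base64 on purpose:
# it only pads the end of the output, it is not a 65th digit.
alphabets = {
    'Base2': '01',
    'Base4': 'ATCG',
    'Base10': string.digits,
    'Base16': string.digits + 'abcdef',
    'Base64': string.ascii_uppercase + string.ascii_lowercase + string.digits + '+/',
}

for name, alphabet in alphabets.items():
    print(name, len(alphabet))  # the base is the size of the alphabet

# Python's int() can read Base2, Base10 and Base16 strings directly
print(int('11111111', 2), int('255', 10), int('ff', 16))  # 255 255 255
```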

Here we convert the word "MalwareViz" into other bases.

Base2 (Binary) = 1 or 0
    M        a        l        w        a        r        e        V        i        z
    01001101 01100001 01101100 01110111 01100001 01110010 01100101 01010110 01101001 01111010
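The binary row above can be reproduced in Python 3 by printing each character's byte value as eight binary digits; `format(byte, '08b')` pads each one to eight digits. This assumes plain ASCII text.

```python
word = 'MalwareViz'

# One byte per ASCII character, each written as exactly 8 binary digits
binary = ' '.join(format(byte, '08b') for byte in word.encode('ascii'))
print(binary)

# Going back: read each 8-digit group as a base-2 number
print(bytes(int(group, 2) for group in binary.split()).decode('ascii'))  # MalwareViz
```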

Base4 (DNA)  = ATCG
    [DNA encoding of M a l w a r e V i z not preserved]
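The post doesn't say which DNA letter stands for which pair of bits, so this sketch ASSUMES A=00, T=01, C=10, G=11, simply following the "ATCG" order listed above. With 2 bits per letter, each byte (8 bits) becomes four letters. `to_dna` and `from_dna` are hypothetical names.

```python
# ASSUMPTION: A=00, T=01, C=10, G=11 (the "ATCG" order), not a known standard
DNA = 'ATCG'

def to_dna(text):
    groups = []
    for byte in text.encode('ascii'):
        # Split the byte into four 2-bit pieces, most significant first
        groups.append(''.join(DNA[(byte >> shift) & 0b11] for shift in (6, 4, 2, 0)))
    return ' '.join(groups)

def from_dna(dna):
    data = bytearray()
    for group in dna.split():
        value = 0
        for letter in group:
            value = value * 4 + DNA.index(letter)  # read the group as a base-4 number
        data.append(value)
    return data.decode('ascii')

encoded = to_dna('MalwareViz')
print(encoded)
print(from_dna(encoded))  # MalwareViz
```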

Base16 (hex) = 0123456789abcdef
    M  a  l  w  a  r  e  V  i  z
    4d 61 6c 77 61 72 65 56 69 7a
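Hex fits bytes nicely because one hex digit holds exactly 4 bits, so every byte is exactly two hex digits. A small sketch of doing that split by hand (`to_hex` is just an illustrative name):

```python
HEX = '0123456789abcdef'

def to_hex(text):
    # Each byte splits into a high and a low 4-bit half; each half is one hex digit
    return ' '.join(HEX[b >> 4] + HEX[b & 0x0F] for b in text.encode('ascii'))

print(to_hex('MalwareViz'))  # 4d 61 6c 77 61 72 65 56 69 7a
```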

Base64 = ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ (= is padding)
    TWFsd2FyZVZpeg==
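Base64 does the same trick with 6 bits per character: every 3 bytes (24 bits) become 4 characters, and `=` pads out a final group that comes up short. The sketch below is a simplified, hypothetical `to_base64` for illustration, using the standard alphabet from RFC 4648; for real work use Python's `base64` module, which it is compared against here.

```python
import base64

B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

def to_base64(data):
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes (24 bits) into one integer, zero-filling a short chunk
        n = int.from_bytes(chunk + b'\x00' * (3 - len(chunk)), 'big')
        # Read the 24 bits as four 6-bit numbers (0..63)
        chars = [B64[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A 1-byte chunk gives 2 real characters, a 2-byte chunk gives 3
        keep = len(chunk) + 1
        out.append(''.join(chars[:keep]) + '=' * (4 - keep))
    return ''.join(out)

print(to_base64(b'MalwareViz'))  # TWFsd2FyZVZpeg==
print(to_base64(b'MalwareViz') == base64.b64encode(b'MalwareViz').decode('ascii'))  # True
```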


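Using Python 3:
  Base16 (hex)
    >>> b'MalwareViz'.hex()                      # Python 2: 'MalwareViz'.encode('hex')
    '4d616c7761726556697a'
    >>> bytes.fromhex('4d616c7761726556697a')    # Python 2: '...'.decode('hex')
    b'MalwareViz'

  Base64
    >>> import base64
    >>> base64.b64encode(b'MalwareViz')          # Python 2: 'MalwareViz'.encode('base64')
    b'TWFsd2FyZVZpeg=='
    >>> base64.b64decode(b'TWFsd2FyZVZpeg==')    # Python 2: '...'.decode('base64')
    b'MalwareViz'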

