Level : beginner
This is a small, and by no means complete, digestion of some crypto techniques.
Featuring :
- Transposition
- Substitution
- Frequency analysis
- Le Chiffre Indéchiffrable
- Charles Babbage and Vigenére
- Playfair
- ADFGVX crypto
Transposition
One method of crypting a text is transposition.
One simple traspose is to write out the message on two rows.
Letters 1,3, 5 and so on, you write on the first row.
Letters 2, 4, 6 etc. you write on the second row.
The message:
Your secret is your prisoner, let it go and you become its prisoner.
Row 1:
Yusceiyupioelyyonyueoeypioi
Row 2:
orertsorrsnreigadobcmisrsnr
Crypted text:
Yusceiyupioelyyonyueoeypioiorertsorrsnreigadobcmis rsnr
Substitution
Another method is substitution where you substitute one letter with another letter.
Code
====
A code substitute a word or group of letter.
Monoalphabetic substitution
=======================
One variant is the Caesar roll. This is based on crypto alphabet that's shifted a number
of steps relative the clear text alphabet. It's common in cryptography to write out the
clear text in lowercase letter and the crypted text in uppercase letter.
Clear alphabet:
abcdefghijklmnopqrstuvwxyz
Crypto alphabet:
DEFGHIJKLMNOPQRSTUVWXYZABC (here with a three letter right rotation)
Clear text:
veni, vidi vici
Crypto text:
YHQL, YLGL, YLFL
To crypt a text, you write out the crypto alphabet under the clear alphabet.
Then you take the letter from the text, here starting with "v", and go to the
column where v is in the clear alphabet.
Look in the same column in the crypto alphabet to get the letter to substitute with.
Here v is substituted with Y. Repeat this for all letters in the text.
As you may see, this is not a very strong protection. With a computer it's easy to
try all possible shifts, where each shift can be regarded as a key to the crypto.
Monoalphabetic substitution with key
====================================
If you choose a more general algorithm where the clear alphabet can be reordered in any possible way,
you get 400 000 000 000 000 000 000 000 000 possible keys to choose from. If the enemy gets hold of
the crypted text and he knows the algorithm, he still have to test all keys to decrypt the message.
If he test one key per second, he approximately would need the time equal the age of
universe multiplied with 10e9.
Clear alphabet:
abcdefghijklmnopqrstuvwxyz
Crypto alphabet:
DJKTUVCWNOLPAEGFHIQRXYMSZB ("random" order)
Clear text:
ettu, brute?
Crypto text:
URRX, JIXRU?
It can be shown with mathematical proof, that if you use a random key as long as the text you want to crypt,
the crypted text CAN NOT be decrypted if you don't know the key. In any crypto a weak spot is repetitions
that occur if you use a to short key. Another thing to remember is that you can't use the same key twice,
and you must use a really random key. If you use the key more then once, there's methods to break the crypto.
A variant to this, but with a less number of keys, is to use a key or key phrase. Instead of a random alphabet
you take a key or a key phrase, for example "Julius Ceasar". Remove all spaces and duplicate letters and use
the rest, JULISCAER, as beginning of the crypto alphabet. The rest of the alphabet is just a shift beginning
where the keyword ends and with the letters in the keyword removed from the alphabet.
Clear alphabet:
abcdefghijklmnopqrstuvwxyz
Crypto alphabet:
JULISCAERTVWXYZBDFGHKMNOPQ
This can also be rotated a number of steps to.
Crypto alphabet:
CAERTVWXYZBDFGHKMNOPQJULIS
A good thing about this is that it is easy to remember the key word or key phrase. The simplicity in combination
with the strength made this the common method to crypt a text the first thousand years a.c.
Anyone interested in cracking a monoalphabetic text can look at the mono crackme in the crackme's.
But there is ways to break this crypto, the Arabs was the first ones out.
Frequency analysis
The oldest known description, by an Arab, to break a monoalphabetic crypto is from 800 a.c.
The trick is that in any given language there's letters that's used more frequently.
If you know the letter frequency for a language, you take the letters frequency in the crypted
text and substitute it with the one that have about the same frequency for that language.
It's relatively easy to make a qualified guess what letter it is.
The same rules can be applied on one, two, three or more, letter words.
As you are a cracker, it's easy to write a small app to analyse some text files on your computer.
What kind of text you analyse of course affects the result of the letter frequency. If you for example
analyze a sorce code file, you get a funny letter and word distribution. If it's a *.asm file you get a
lot of three letter words like: eax,ebx,ecx...:-).
Le Chiffre Indéchiffrable
For many centuries the monoalphabetic substitution crypto was the prevailing method to write a secret text.
But after the Arabs invention of the frequency analyze, this method was no longer a safe one to use.
The original idea for this new crypto came from a man named Leon Battista Alberti, born 1404.
His idea was use two or more crypto alphabet and switch between them.
Clear alphabet:
abcdefghijklmnopqrstuvwxyz
Crypto alphabet 1:
FZBVKIXAYMEPLSDHJORGNQCUTW
Crypto alphabet 2:
GOXBFWTHQILAZPJDESYVCRKUHN
If we crypt the word "hello" we use the first alphabet for h, that become A.
For the second letter e we use the second alphabet, e becomes F, and so on.
With this method hello becomes AFPAD.
A man named Blaise de Vigenére, born 1523, developed this idea further.
Instead of using two or three alphabet, he used 26 (for a-z for English).
Although he based his work on ideas of previous thinkers, this method is known as the Vigenére crypto.
The use of more than one crypto alphabet gives this type of crypto the name polyalphabetic crypto.
The first step to use this is to write out a Vigenére table. This is done by writing down a clear alphabet
followed by 26 crypto alphabets, each one rotated on step in relation to the previous one.
abcdefghijklmnopqrstuvwxyz1 BCDEFGHIJKLMNOPQRSTUVWXYZA2 CDEFGHIJKLMNOPQRSTUVWXYZAB3 DEFGHIJKLMNOPQRSTUVWXYZABC4 EFGHIJKLMNOPQRSTUVWXYZABCD5 FGHIJKLMNOPQRSTUVWXYZABCDE6 GHIJKLMNOPQRSTUVWXYZABCDEF7 HIJKLMNOPQRSTUVWXYZABCDEFG8 IJKLMNOPQRSTUVWXYZABCDEFGH9 JKLMNOPQRSTUVWXYZABCDEFGHI10 KLMNOPQRSTUVWXYZABCDEFGHIJ11 LMNOPQRSTUVWXYZABCDEFGHIJK12 MNOPQRSTUVWXYZABCDEFGHIJKL13 NOPQRSTUVWXYZABCDEFGHIJKLM14 OPQRSTUVWXYZABCDEFGHIJKLMN15 PQRSTUVWXYZABCDEFGHIJKLMNO16 QRSTUVWXYZABCDEFGHIJKLMNOP17 RSTUVWXYZABCDEFGHIJKLMNOPQ18 STUVWXYZABCDEFGHIJKLMNOPQR19 TUVWXYZABCDEFGHIJKLMNOPQRS20 UVWXYZABCDEFGHIJKLMNOPQRST21 VWXYZABCDEFGHIJKLMNOPQRSTU22 WXYZABCDEFGHIJKLMNOPQRSTUV23 XYZABCDEFGHIJKLMNOPQRSTUVW24 YZABCDEFGHIJKLMNOPQRSTUVWX25 ZABCDEFGHIJKLMNOPQRSTUVWXY26 ABCDEFGHIJKLMNOPQRSTUVWXYZ
The first row is a crypto alphabet with a Caesar roll, a rotation of one step.
The second one have a two step rotation and so on.
To use it, you use a new row for each letter you crypt.
To decrypt it the user must know which row to use for which letter.
You maybe use row 5 for the first letter, row 14 for the second, row 21 for the third and so on.
To decrypt the text the receiver must know which row shifts to use.
One way to do this is to use a key word.
To crypt the message "Begin attack at sundown" and the key GREEN we do like this.
Write out the key word above the message over and over, so each letter in the message
is linked to a letter in the key word.
GREENGREENGREENGREENbeginattackatsundown
Then you create the crypto text like this:
Crypt the letter b by taking the letter in the key word above, here it is G.
This letter is the index to a row in the Vigenére table.
The row beginning with G, row 6, is the crypto alphabet to use for
the first letter b. In the row for the clear alphabet, go to the column
for the letter b. To find the substitution letter, follow this column down
to row 6, here you find the letter H to substitute for b.
Repeat this procedure for the letter e. The key letter above e is R which gives
us the row that start with R, row number 17. In the clear alphabet, go to the column
for the letter e in the clear alphabet, follow this column down to row 17 to get the substitution letter V.
Repeat this for each letter in the message.
Each letter in the keyword gives us a different crypto alphabet to use.
The number of letters in the key word gives us the number of rows to use in the
Vigenére table. In this example we use four rows, G, R, E and N (E is doubled here).
The message "beginattackatsundown" crypted with the keyword GREEN comes out as:
HVKMAGKXEPQRXWHTUSAA
With a longer keyword, or a key sentence, you add more rows from the table you make the crypto more
difficult to break. The frequency analysis, described above, has no effect on this kind of crypto.
If you have a word with at double letter, like attack above, the first t becomes K and the second becomes X.
If you do a frequency analysis of this, there's no way to tell that this is the letter t.
Another strength is the enormous number of keys you can use. Any word in the book can be used,
you even can make up you own words. It's just not possible to try all possible combinations.
The work by Vigenéres was published in Tracicté des Chiffres 1586 but did not come to common use
until 200 years later.
Charles Babbage and Vigenére
As mentioned earlier, the strength of the Vigenére crypto is that a letter can be crypted on a number of
different ways. If the keyword is KING is used, there's four different ways to crypt a given letter.
The same apply to words that's crypted. The word "the" can be crypted as, DQR, BUK, GNO or ZRM depending on
the words relation to the keyword KING. You get no clues from the frequency analysis.
This making the decrypting much harder, but not impossible.
A man named Charles Babbage, born 1791, was the first one to break this kind of crypto.
Among many things, he was devoted to statistics. He was one of the first one making statistic on born/death relations,
today commonly used by insurance companies.
He thought that if there's only four ways to crypt the word "the", and the word is used many times in the text,
the possibility that the same crypto variant is repeated on a number of places in the text is rather significant.
The longer the text, the greater the chance. It was this kind of repetition that lead Babbage to the decryption
of the Vigenére crypto.
Babbage stated some fairly simple steps to break the Vigenére crypto.
Look for letter sequences that repeat more that once in the text. There's a possibility that it's the same word
crypted with the same part of the keyword. The distance between the repetitions and the distance from the beginning
of the text, gives you some clues of the length of the keyword. When you know the length of the keyword, you know
how many crypto alphabet that was used to crypt the text. If you use just one alphabet, it's just a
monoalphabetic crypto, and that you know how to decrypt.
If you found out that the keyword is five letters, you use the first alphabet for letter 1, 6 ,11 and so on.
For letter 2, 7, 12... you use the second alphabet and so on. You only have to use as many alphabet as there
are letters in the keyword. But how do we know what alphabet to use?
You already know the answer: Frequency analysis.
When you know the length of the keyword, you just have to use frequency analysis.
Remember that the crypto alphabet is just a clear alphabet rotated one step in relation to the previous one.
Implementing Vigenére in ASM
============================
To implement Vigenére crypto in asm we need a lot of Vigenére tables in memory. To this we need a lot of
index and key char pointers. NOT
This was my first approach, but if we stop and think a bit we can reduce it to this.
Crypting: add al,ah ;al is clear char and ah is key char sub al,"A"+"A" cmp al,25 jng @F sub al,26 ;Overflow, wrap around@@: add al,"A" ;al is now crypted char
Decrypting:
sub al,ah ;al is crypt char and ah is key char
cmp al,0
jge @F
add al,26 ;Underflow, wrap around.
@@:
add al,"A" ;al is now clear char
To understand the code above, remember that the Vigenére table is just 26 alphabets where each one is
rotated one step in relation to the previous. If we simplify it a bit and just use one alphabet,
a-z, we can look at it with a new approach. In the above example with the word "the" and the key
"KING" we get "t" as the first letter to crypt with key char "K". As you remember,
we go to the "t" column in the clear alphabet, and from there down to the line starting with k.
In the point of intersection we find "D".
Another way to look at it is, go to the "K" column of the alphabet starting with "A", column 10 (or "A" - "K",
where the "A"-column is column 0). Add the number of the "T" column, column 19, to the "K" column and we get 29.
As the English alphabet only have 26 letters we get an overflow, or wrap around of 4 (remember "A" column is 0).
If we add 4 to 0 we end up in the "D" column, which is the substitution letter for "T" with the key char "K".
We apply the same rules for the decryption, but here we subtract the key character from the crypted character.
Instead of overflow we look for underflow and if this happen, we shift it up 26 letters.
@Detten add link vigenere.asm
The complete source code can be found here, if Detten is so kind and add a link to it ;-).
And last, an exercise in Vigenére:
BBLM RS VRJ XTYOETOSWP UNTYOJH XBLHCOQ DLVTSQX FHO T PRQMJLJ UJG?
QXJ CD FJDG YK JWTBTKM FHO BB DCXLYCHDS HYW WSBUDTOS NZ IUAA GNNS,
MQE QDMYC BB UUOI NZ VJRTI LLZVNRKOX.
QSTC IU DMY OBOFGBJHNX KEVGJYY XAOVSH UYW TIPUD?
YCHCIE SX ODBWG C PJUEANR....MSSEJ BB UUSSA EAN WJYQY NARCMOS.
Avoiding Vigenére
=================
The strength of the Vigenére crypto also makes this kind of crypto harder to use.
One candidate to a crypto stronger than the monoalphabetic crypto, but easier to use than the
Vigenére crypto is the homophonic substitution crypto.
In this crypto you substitute a letter with many letters, and the number of substitute characters is
proportional to the frequency of the letter. If for example the letter a have a frequency of 9%,
we assign 8 characters to substitute for a. For every occurence of the letter a in the clear text,
we substitute it with any of the 8 characters we assigned to a, not important which one.
After crypting the text, every character substituted for a then have about 1% frequency in the crypted text.
If b, or any letter, have 1% frequency, we only have to substitute it with 2 characters.
When all letters in the clear text is crypted, every character then have about 1% frequency in the crypted text.
An example of homophonic substitution crypto. Here number is used.a b c d e f g h i j k l m n o p q r s t u v w x y z09 48 13 01 14 10 06 23 32 15 04 26 22 18 00 38 94 29 11 17 08 34 60 28 21 0212 81 41 03 16 31 25 39 70 37 27 58 05 95 35 19 20 61 89 5233 62 45 24 50 73 51 59 07 40 36 30 63 47 79 44 56 83 84 66 54 42 76 4353 46 65 88 71 72 77 86 4967 55 68 93 91 90 80 96 6978 57 99 7592 64 85 74 97 82 87 98
With a frequency of 1% for every character, there's no way to use
frequency analysis on the crypted text. Is this unbreakable? NOT
There's a number of discrete clues for a clever decrypter.
Every letter in any language have its own quality and relation to other letters.
This can be distinguished even if homophonic substitution crypto is used.
In english the letter q is always followed by the letter u. No other letter can follow q.
If we want to decrypt an english text crypted with homophonic substitution we now that q is a rare
letter probably only substituted with one character (or number).
We also know that the letter u have e frequency of about 3% of all letters in an english text and therefore
probably substituted with 3 characters.
If we find a character in the crypted text that is always followed by the same three characters,
we have reason to believe that these characters stands for the letter u, and the first stand for q.
Other letter are harder to pick, but their internal relation reveals which one it is.
It's possible to break this kind of crypto, but it's a lot safer than a simple monoalphabetic crypto.
At first glance, the homophobic substitution crypto seems to be some kind of polyalphabetic crypto.
Every clear text letter can be substituted with a number of characters, but there's a significant difference.
In the example above, the letter a is represented by 8 different numbers.
These numbers represent the letter a and only the letter a. In a polyalphabetic crypto a clear text letter
can be substituted with different letters, but here the substitution letters can represent different clear
text letters. Because of this, the homophonic substitution crypto is said to be a monoalphabetic crypto.
By the time the crypto alphabet is made, it's used through out the whole crypto.
It makes no difference that there's a number of substitute for a letter.
If you use a polyalphabetic crypto, you constantly change between different crypto alphabets.
My way of implementing homophonic substitution crypto in asm can be found here.
And now a homophonic exercise:
HNE 0IQWtG OY98CKÂ5u YfTBÅ7| pA vÏÃ2ä] éJ 1W[UZÂjweh3 XÈ i
åÅçgÄvâ ìqmV-sSkboDÁÏI6 }dcaäYz xÉÆÊÇÎË ÍL åét2Wë ãSáÌèDíæT
2.2, 9u ï]HÂ0|Cà X13-5Ã ëZ7gycK. Ulî Ëpx8MEçeikÅÄI ÏtDQw1GB o
äJÁ æA 3éVAObfuch[ jqÇvsz| åWÃ2Â] ÈÆmV-ÎSád}xíïÉ 2.2 Êçg
vÅI2Ïë âãàA-îSHÌèDK0T ]EZì5t9Q GËäUé7u, årWc{ ÂB Å|xy1O3 vÏeÀ
kNäJ Dpën ÄV åéÃ2W].
Playfair
The playfair crypto was created by Lyon Playfair. This crypto substitutes every two letters in the clear
text with another letter pair. To crypt and decrypt a text, the sender and receiver must first agree on a keyword to use.
The crypto is used like this.
Write out the letters in an 5 x 5 square (a-z) and let I and J share the same place. Start with the keyword .
If we use the keyword CHARLES we get:
C H A R LE S B D FG I/J K M NO P Q T UV W X Y Z
Then you divide the message into letter pairs, called bigram. Every bigram must consist of two different letters.
Therefore you put in an x for double letters that otherwise would end up in the same bigram.
Also put an x at the end if the text end with a single letter.
Clear text:
We meet at hammersmith bridge at seven.
Text in bigram:
we-me-et-at-ha-mx-me-rs-mi-th-br-id-ge-at-se-ve-nx
Then the crypting start.
Every bigram fall into one of the following categories.
1 Both letters is on the same line
2 Both letter is in the same column
3 Neither 1 or 2.
1 If both letter is in the same row, substitute them with the letter to the right if each one.
MI becomes NK. If the one letter is the last in the row, substitute it with the first letter in the row.
2 If both letters is in the same column, substitute them with the letter in the column below each one. GE becomes OG. If one letter is in the last row, substitute it with the letter from the first row. YR becomes RD.
3 If neither 1 or 2 apply, we do like this. To crypt the first letter, follow the row it's on until you
reach the column where the second letter is. The letter where the two intersect is used as a substitute for
letter one. When you crypt the second letter, follow the row it's on until you reach the column of the first
letter. The letter where the two intersect is used as a substitute for letter 2. VI becomes WG and SV becomes EW.
If you look at the clear text letters as the corner of a diagonal, the substitute letters is in the opposite corners.
The bigram text:
we me et at ha mx me rs mi th br id ge at se ve nx
The crypted text:
VSDGODQRARKYDGDHNKRPADSMOGQRBSCGKZ
ADFGVX crypto
In ADFGVX crypto, both substitution and transposition is used. The crypto is used like this.
In a 6 x 6 square, 36 positions, write out the letters a-z and the numbers 0-9 in a random order.
Both rows and columns in the square is named with the letters A, D, F, G, V and X.
The order of the letters in the square becomes a part of the key and therefore the receiver must have a
copy of the square.
A D F G V XA 8 p 3 d 1 nD l t 4 o a hF 7 k b c 5 zG j u 6 w g mV x s v i r 2X 9 e y 0 f q
The first step is to look at in which row and column each letter in the text is.
Write down the two letters that refer to this position in the square.
In this example 8 is substituted with AA and p with AD.
Message:
Attack at 2230
Crypt step 1:A t t a c k a t 2 2 3 0DV DD DD DV FG FD DV DD VX VX AF XG
So far it is a simple monoalphabetic substitution crypto, breakable with frequency analysis.
The second step in the crypto is the traspose. The traspose is based on a keyword, and in this example we use MARK.
This keyword must also be know by the receiver.
The transposition is done like this:
Write out the letters of the keyword as the first row of a new square.
Then write out the text from the first step of the crypto below the keyword using the same row length as the keyword.
Then arrange the columns so that the letters in the keyword is sorted from a to z.
The other columns follows the keyword columns.
M A R K -> A K M RD V D D V D D DD D D V D V D DF G F D G D F FD V D D V D D DA V X G V G A XA D G X D X A G
Final crypto text:VDDDDVDDGDFFVDDDVGAXDXAG
The reason for using A, D, F, G, V and X is that they are the letters in the Morse alphabet that
differs the most from each other. This was done to reduce the number of errors when the text was sent.