Substitution Cipher

A substitution cipher is where the encoder replaces the alphabet with a different alphabet in order to write their message.

For example, the alphabet could be written as follows.

Substitution Cipher 1

So if the encoder wanted to write the word SUBSTITUTION they would instead write

camcqtqaqtwy
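If you'd like to try this on a computer, here is a minimal Python sketch of the encoding step. The full replacement alphabet from the picture isn't reproduced here, so the key below is only the part we can read off the SUBSTITUTION example; the names PARTIAL_KEY and encode are made up for illustration.

```python
# Partial key read off the example SUBSTITUTION -> camcqtqaqtwy.
# Assumption: only these seven letters are known; the rest of the
# replacement alphabet is not reproduced here.
PARTIAL_KEY = {"S": "c", "U": "a", "B": "m", "T": "q", "I": "t", "O": "w", "N": "y"}


def encode(message, key):
    """Swap each letter for its replacement; letters missing from the key become '?'."""
    return "".join(
        key.get(ch, "?") if ch.isalpha() else ch
        for ch in message.upper()
    )


print(encode("SUBSTITUTION", PARTIAL_KEY))
```

Spaces and punctuation are passed through unchanged, which is one common convention rather than a rule.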

The alphabet could also be replaced with numbers or symbols.

NB: The simplest type of substitution cipher is the Caesar cipher, where all the letters in the alphabet are shifted by a uniform number of places. You can read more about the Caesar cipher here:
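As a rough illustration of the Caesar cipher, the sketch below shifts every letter by the same number of places. The function name caesar and the shift of 3 used in the example are assumptions chosen for the demonstration, not something fixed by the cipher.

```python
import string


def caesar(message, shift):
    """Shift every letter the same number of places, wrapping round from Z to A."""
    upper = string.ascii_uppercase
    offset = shift % 26
    shifted = upper[offset:] + upper[:offset]
    return message.upper().translate(str.maketrans(upper, shifted))


secret = caesar("SUBSTITUTION", 3)
print(secret)
# Shifting back by the same amount recovers the message.
print(caesar(secret, -3))
```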

To decode a substitution cipher we need to work backwards. The best way to imagine this is by looking at an example.

Example 1

There are lots of useful pieces of information that can help us to do this.

Firstly, we can look to see if the text is written in one long string, or if there are spaces between letters indicating words.

For example, let's say we have the code:

OZ'L Q EQZ

This implies three words (although some codes put in spaces randomly to throw you off).

You can begin by guessing that any single letter words are either A or I.

Apostrophes are also helpful, because we know that single letters after apostrophes are most likely to be, in order: S, T, D or M (double letters are most likely to be LL, RE or VE).

The most likely options for the first word are: he's, it's, he'd, it'd and we'd.

This means it's likely that Z = E or T, L = S or D and Q = A or I.

Therefore the message could be any of the following 8 options:

  • ?E'S A ?AE
  • ?E'D A ?AE
  • ?T'S A ?AT
  • ?T'D A ?AT
  • ?E'S I ?IE
  • ?E'D I ?IE
  • ?T'S I ?IT
  • ?T'D I ?IT
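The eight options above can also be generated mechanically. This is only a sketch: the names CODE, CANDIDATES and expand are invented for the example, and the candidate letters are the guesses we made for Z, L and Q.

```python
from itertools import product

CODE = "OZ'L Q EQZ"
# Our guesses so far: each cipher letter and the plain letters it might stand for.
CANDIDATES = {"Q": "AI", "Z": "ET", "L": "SD"}


def expand(code, candidates):
    """Return every way of filling in the guessed letters; unknown letters become '?'."""
    letters = list(candidates)
    results = []
    for combo in product(*(candidates[c] for c in letters)):
        guess = dict(zip(letters, combo))
        results.append(
            "".join(guess.get(ch, "?") if ch.isalpha() else ch for ch in code)
        )
    return results


for option in expand(CODE, CANDIDATES):
    print(option)
```

Two choices for each of three letters gives 2 x 2 x 2 = 8 options, which is why the list has eight entries.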

Now it makes sense to assume that this phrase would make sense grammatically. Therefore it is unlikely that he's, it's, he'd, it'd or we'd would be followed by the word I. It also doesn't make sense that he'd, it'd or we'd would be followed by the word a.

Therefore the message must be one of these:

  • ?E'S A ?AE
  • ?T'S A ?AT

There aren't any common words which are three letters long and end with 'ae'. Therefore ?T'S A ?AT is the likely solution.

As we have seen from the list of words earlier, the first word must be it's.

There are 20 different three letter words which end with 'at'. We can rule some of them out. All the words which are verbs don't make grammatical sense (e.g. 'eat'), so we know it must be a noun. It can't begin with a vowel as the word before would be 'an' rather than 'a'. Nor can it begin with any letters we have already discovered. Therefore we know the final missing letter is 'b', 'c', 'f', 'h', 'm' or 'r'.

This brings us to the final part of decoding - as the message is so short there's nothing else to go on. Therefore we must take into account the context in which we found the code - did the person who wrote it have an interest in hats or bats? Did they own a cat or a rat? Were they researching fats or making mats?

Ultimately, when solving a substitution cipher - or any code - you need to combine these techniques with your own logic and powers of deduction.

Example 2

We're not always fortunate enough to have a code with spaces and apostrophes.

Let's look at this code:

UAGWAFLEDNYAWAFUAVTERKAGFRNRKAMNWRBNURACCBNNBA

The best way to start is by looking at the frequency of letters.

In the example, the letters appear this often:

Frequency Table 2
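Counting letters by hand is slow and easy to get wrong, so here is a small sketch that does it for us. The names CIPHERTEXT and letter_frequencies are just labels chosen for this example.

```python
from collections import Counter

CIPHERTEXT = "UAGWAFLEDNYAWAFUAVTERKAGFRNRKAMNWRBNURACCBNNBA"


def letter_frequencies(text):
    """Return (letter, count) pairs, most frequent first."""
    return Counter(ch for ch in text.upper() if ch.isalpha()).most_common()


for letter, count in letter_frequencies(CIPHERTEXT):
    print(letter, count)
```

For this code the count puts A first with 9 appearances, then N with 6 and R with 5.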

We know that the most common letters in the alphabet are, in order: E, T, A, O, I, N, S, H and R. Here are the percentages of all letters they make up in normal English:

Letter Frequency Table

So it's likely that the A in our cipher is the letter 'E'.

Tying into this, we should be aware of the most common words in English - this is particularly useful when you can see word lengths, but it is still helpful in long strings. Here are the 10 most common words, and the percentage of all words they make up:

Word Frequency Table

These 10 words make up more than 22% of all words used in written English, meaning that it's highly likely any phrase will contain one or more of these words.

Let's go back to our phrase.

We can start by substituting the most common letter in the phrase, A, with the most common letter in the alphabet, 'e'.

UeGWeFLEDNYeWeFUeVTERKeGFRNRKeMNWRBNUReCCBNNBe
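Since we will be swapping letters in and out a lot, it can help to let a short function do the rewriting. In this sketch (apply_key is a made-up name) guessed letters are shown in lower case and everything we haven't guessed yet stays in capitals, just as in the line above.

```python
CIPHERTEXT = "UAGWAFLEDNYAWAFUAVTERKAGFRNRKAMNWRBNURACCBNNBA"


def apply_key(ciphertext, key):
    """Replace guessed cipher letters with lower-case plain letters; leave the rest alone."""
    return "".join(key.get(ch, ch) for ch in ciphertext)


# Our first guess: the most common cipher letter, A, stands for 'e'.
print(apply_key(CIPHERTEXT, {"A": "e"}))
```

If a guess turns out to be wrong, we simply change the key and run it again.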

We also know from our list of words that 'e' is often preceded by 'th'. We can see that in our string R and K come before A twice. Also R appears 5 times, while K only appears in these two occurrences, which ties in with our earlier stats. Let's assume that R is 't' and K is 'h'.

UeGWeFLEDNYeWeFUeVTE the GFtN the MNWtBNUteCCBNNBe
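Spotting repeated groups such as RKA by eye gets harder as codes get longer. The sketch below, with the invented name repeated_ngrams, lists every group of a given length that appears more than once in the original code.

```python
from collections import Counter

CIPHERTEXT = "UAGWAFLEDNYAWAFUAVTERKAGFRNRKAMNWRBNURACCBNNBA"


def repeated_ngrams(text, n):
    """Return each run of n letters that occurs more than once, with its count."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return {gram: count for gram, count in grams.items() if count > 1}


print(repeated_ngrams(CIPHERTEXT, 3))
```

For this code it reports two repeated three-letter groups: RKA, which we have guessed is 'the', and WAF. A repeat is only a hint, not proof, so treat the result as something to test rather than an answer.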

Remember at this stage we're still guessing - we may need to go back and try different things later.

The next most popular letter in the phrase is N. It's likely that N is one of the other popular vowels - A, I or O. Towards the end of the phrase we see a double N. It's possible that one of those letters comes at the end of one word and the next comes at the beginning of another, however this would mean - assuming we're right about the 'e' - that the final word was either a?e, i?e or o?e - none of which seem likely. Or they could be 'a ?e', 'i ?e' or 'o ?e'. Both 'a ?e' and 'i ?e' are possible but rely on the prior word ending with either 'a' or 'i' respectively. Therefore it's highly likely the double N comes within the same word. If we assume that the N has to be one of the popular vowels, this would mean it was highly likely that N is 'o'.

UeGWeFLEDoYeWeFUeVTE the GF to the MoWtBoUteCCBooBe

This theory seems more likely as we can now see the word 'to' - English's third most popular word - has appeared.

We now have several options of what to do next - teCCBooBe looks like a good option to explore, however let's first deal with something that doesn't make sense.

The phrase 'the GF to the' is slightly jarring. The implication is that as GF follows 'the' it has to be a noun. However, there are very few two-letter nouns. The most common is 'ox', which could make sense if this is a cipher about farming (remember context is king), but we have already assigned the 'o'.

There is another possibility - 'the' also begins several 4 and 5 letter words: 'then', 'them', 'they', 'there', 'their', 'theft' and 'theme'. Given the surrounding words, it's unlikely that it's a four letter word followed by a one letter word, and we have already assigned other letters in 'there', 'theft' and 'theme'. The phrase could be 'their to the', but this doesn't seem to make sense either. Frustrating.

There is one further possibility before we throw out what we have already done - the first 'the' might not be 'the' at all. It could be a word that ends 't' or 'th' followed by a word that is 'e??' or 'he??'. This gives us lots of options, so knowing this let's look at teCCBooBe - if we can get that to make sense we can return to this later, if we can't then we have two reasons to discard our work so far.

To complete this phrase, we only need to identify two letters - C and B. C appears as a double letter in teCCBooBe. Again, it's possible that one of those letters comes at the end of one word and the next comes at the beginning of another. However, as it's a short phrase it won't take long to check.

The most common double letter is LL, followed by SS, EE, OO, and TT. We've already assigned E, O and T, so let's look at L and S.

  • tellBooBe
  • tessBooBe

'Tell' is a common verb and 'Tess' is a proper noun - meaning both could be standalone words. It's highly likely that the 'oo' is preceded by a consonant, and it's unlikely that 'tell' or 'tess' would be followed by one, so it's probable that they are complete words, although we should take into account that we don't know for sure that the word begins with 'te' - it could begin with one of the letters that precedes it. However, there is a good reason to assume it doesn't. By quickly looking at what letters could replace the B, we can see that there is one very strong candidate - 'n', which makes 'BooBe' become 'no one'. This means that the phrase is almost certainly 'tell no one'. It also shows that the double N spans two words after all, although our guess that N is 'o' still holds.

UeGWeFLEDoYeWeFUeVTEtheGF to the MoWtnoU tell no one

Now we're cooking with gas.

Let's look at 'to the MoWtnoU'.

We know there aren't many words where we see 'tn' aside from contractions like mustn't. 'Partner' might be one option, if we weren't already confident about the placement of the 'o'. Knowing this, let's split this phrase between the 't' and the 'n' and look at the second word: 'no?'. The obvious choice would be 'not', but we've already used the 't'. The other options are 'nod', 'nor' and 'now'.

In the wider context that gives us:

  • to the ?o?t nod tell no one
  • to the ?o?t nor tell no one
  • to the ?o?t now tell no one

The one that makes the most sense in that context seems to be 'now'. If we put that into the whole phrase that gives us:

weGWeFLEDoYeWeFweVTEtheGF to the MoWt now tell no one

We now have the word 'we' appearing twice in the phrase, including once at the beginning. This could be the start of longer words, but, given the wider phrase, it does make sense that the person writing the message might be using the first-person plural.

we GWeFLEDoYeWeF we VTEtheGF to the MoWt now tell no one

The word 'MoWt' has lots of different possibilities, but we can narrow them down by only considering common nouns - 'boat', 'fort', 'goat', 'host', 'loft', 'lout', 'port' and 'tout' are all possibilities, but once you discount words that use letters we have already assigned we are left with 'boat', 'fort', 'goat' and 'port', meaning that W is either 'a' or 'r'.
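Ruling words out like this is easy to automate. In the sketch below the word list is just the eight nouns mentioned above, and ASSIGNED holds the plain letters we have guessed up to this point; the names fits, ASSIGNED and CANDIDATE_WORDS are invented for the example.

```python
# Plain letters we have already matched to a cipher letter at this stage.
ASSIGNED = set("ethonlw")
CANDIDATE_WORDS = ["boat", "fort", "goat", "host", "loft", "lout", "port", "tout"]


def fits(word, pattern, assigned):
    """True if word matches the pattern and its '?' positions use only unassigned letters."""
    if len(word) != len(pattern):
        return False
    for letter, slot in zip(word, pattern):
        if slot == "?":
            if letter in assigned:
                return False  # that plain letter already belongs to another cipher letter
        elif letter != slot:
            return False
    return True


survivors = [w for w in CANDIDATE_WORDS if fits(w, "?o?t", ASSIGNED)]
print(survivors)
```

With a longer word list the same check would throw up more candidates, so the result is only as good as the list you feed in.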

The section eWeF comes at the end of a word (or words). It seems highly likely that of the two choices 'eaeF' and 'ereF', the latter is the correct one. Therefore, it seems likely that W is 'r'.

we GreFLEDoYereF we VTEtheGF to the Mort now tell no one

The first 'we' is most likely followed by a verb. The most common verbs are 'to be', 'to have' and 'to do', which in the context of 'we' means they'd be 'we are', 'we have' and 'we do'. It is therefore almost certain that the phrase begins 'we are'.

we are FLEDoYereF we VTEtheaF to the Mort now tell no one

Now we have just three sections with missing letters. Of the most common letters in English, the only ones we haven't assigned so far are 'i' and 's' although there are no guarantees they will occur in our phrase.

Let's return to 'VTEtheaF'.

As it's preceded by 'we' and followed by 'to the ?ort now', it has to contain a verb. The most likely option would be 'go', but we've already used the 'o'. We know it's likely the phrase splits either after the 't' or after the 'h'. This means that our options for verbs are plentiful if it's the first word of the phrase, but if it's the second then we only have the choices of 'eat', 'head', 'heal', 'heap', 'hear' and 'heat'. Only one of those really makes sense when followed by 'to the ?ort now' - 'head'. This means F is 'd', which also makes sense with the word that ends 'ereF'.

we are dLEDoYered we VTEt head to the Mort now tell no one

VTEt can only really be a modal verb, like 'can', 'should' or 'could'. There is only one modal verb that fits: 'must'.

we are dLsDoYered we must head to the Mort now tell no one

'dLsDoYered' could be one word or several words. What is for certain is that a vowel must come between the 'd' and the 's' - there are only two options left: 'i' and 'y'. It clearly has to be 'i'. This is looking increasingly like one word, and if it is one word then there is only one option: 'discovered'.

we are discovered we must head to the Mort now tell no one
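Before trusting a solution it is worth checking that no plain letter has been used for two different cipher letters, and seeing which cipher letters are still unsolved. The sketch below does both with the key we have built up; KEY, is_one_to_one, unresolved and decode are names made up for the example.

```python
CIPHERTEXT = "UAGWAFLEDNYAWAFUAVTERKAGFRNRKAMNWRBNURACCBNNBA"

# Cipher letter -> plain letter, as worked out step by step above.
KEY = {
    "U": "w", "A": "e", "G": "a", "W": "r", "F": "d", "L": "i",
    "E": "s", "D": "c", "N": "o", "Y": "v", "V": "m", "T": "u",
    "R": "t", "K": "h", "B": "n", "C": "l",
}


def is_one_to_one(key):
    """True if no plain letter has been given to two different cipher letters."""
    return len(set(key.values())) == len(key)


def unresolved(ciphertext, key):
    """Cipher letters that appear in the text but are not in the key yet."""
    return sorted(set(ciphertext) - set(key))


def decode(ciphertext, key):
    """Decode with the key, showing '?' for any letter still unknown."""
    return "".join(key.get(ch, "?") for ch in ciphertext)


print(is_one_to_one(KEY))
print(unresolved(CIPHERTEXT, KEY))
print(decode(CIPHERTEXT, KEY))
```

The only cipher letter left over is M, which matches the one gap in the message above.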

That certainly looks like a message. The question is: is the missing word 'fort' or 'port'? Once again we have to rely on context. Are the people involved near the sea? Or are they near a castle? Only your powers of logic and deduction will get you to the true answer.