Thing-a-day, Day 18: (incomplete) Subsititution Cypher Solver

Today I re-started work on a program to solve substitution cyphers. The substitution cypher is where you take every character in your message and replace it with a different character- they sell little puzzle books full of these to try and crack by hand. My (most likely naive) approach for doing this is to count the frequency of each letter in the message. Starting with a list of the letters that occur most frequently in the English language, I will then use a bunch of different substitution combinations to decrypt the message and compare against a dictionary to calculate which combination works best. It would be ugly to search all combinations because there are so many, so I will start off by masking out all of the letters I haven’t found yet. The very basic code I have so far is after the break. Note that it does not attempt to solve the cypher at this point.

#!/usr/bin/env python
import sys
from string import lower
from operator import itemgetter
#import re, collections
 
# Load in the file
cyphertext = file(sys.argv[1]).readlines()
cyphertext = lower(''.join(cyphertext))
 
bins = {}
for letter in range(ord('a'), ord('z')):
    bins[chr(letter)] = cyphertext.count(chr(letter))
 
# Show the letters with highest occurance
print sorted(bins.iteritems(), key=itemgetter(1), reverse=True)
 
# Load the dictionary file
dict = {}
words = file('words.txt').readlines()
for word in words:
    dict[lower(word.strip())] = 1
 
# This is how to search the dictionary
print dict.has_key("true")
This entry was posted in Journal, thingaday. Bookmark the permalink.

2 Responses to Thing-a-day, Day 18: (incomplete) Subsititution Cypher Solver

  1. David Hagler says:

    Hey, another idea for you for this particular exercise – I once attempted to do the same, but instead of pure letter frequency I built a table of 26×26 for letter pair frequency similar to some of the standard random name generators (not the phenome type), sort your pair frequencies and use that – pair frequency should allow you to decipher quite a few, especially if you shove in the replacement characters and use your mind to finish the message, Also, its really oddly hard to find a list of just dictionary words some times, so if you haven’t found a good one already most linux systems come with a spellcheck library aspell – http://aspell.net/ thats where I got my word list.. it was wierd for me the first time finding just a plain dictionrary file. P.S. Happy hunting!!

  2. mahto says:

    Cool, thanks for the advice. Somehow I totally misjudged the amount of time this would take, then lost it. I saw someone doing a spell checker that read in a bunch of English source materials to build a probability table for words, so something like that might work for letter pairs as well. I’ll have to give that a try!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>