PEX, the Plain English Cipher

How it works

PEX evolved from this thing which I made a few years ago. The idea was to create a tool to encipher information whose output would not be suspected of being a cipher; my bash at hitting that target was to create output that looked like a foreign language. But it doesn't take much staring to realise that the old cipher was a cipher: the output consists almost entirely of strings of the form consonant-vowel, and it is not recognisable as any particular language. PEX overcame this by producing (mostly) grammatically correct English sentences. Of course they're complete nonsense, but that's beside the point (for now).

There are three stages to PEX encryption:

  1. Convert the text to a string of numbers;
  2. Encode the string of numbers;
  3. Convert the encoded string of numbers to grammatically correct English sentences.

It doesn't really matter how we go about Steps 1 and 2 above, though if we wish to decode the output afterwards then we need to make sure the steps are reversible (so using hash codes isn't suitable, for example); in brief, PEX encodes the text numerically and then mashes the numbers about a bit using modular arithmetic. Step 3 is done using the theory of Markov chains... and even then, not very much of it! The idea is that, when we speak an English sentence, there are various levels of structure going on. On one level we have phrases, such as noun phrases and verb phrases. A noun phrase is something like 'the big green apple' and a verb phrase is something like 'slowly walks away'. Within each of these phrases there are the words themselves: nouns, verbs, adjectives and so on. The form of the phrases and of the words depends on the characteristics of the words being used. For example, are we talking about 'an apple' or 'the apple' (determinacy)? Are we talking about 'the postman, who ...' or 'the table, which ...' (animacy)? What tense is the verb in?
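The post doesn't spell out the numeric encoding PEX actually uses, but a minimal sketch of Steps 1 and 2 could look something like this in Python. The letter-to-number mapping and the affine mod-26 "mash" are placeholder choices for illustration, not PEX's own scheme; the point is just that each step is reversible.

```python
# A hypothetical sketch of Steps 1 and 2: not PEX's actual scheme,
# just an illustration of a reversible numeric encoding plus a
# modular-arithmetic "mash".

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def text_to_numbers(text):
    """Step 1: map each letter to a number 0-25, dropping anything else."""
    return [ALPHABET.index(c) for c in text.lower() if c in ALPHABET]

def mash(numbers, key=7, shift=3):
    """Step 2: an affine transform mod 26. Reversible because
    gcd(key, 26) == 1, so the step can be undone when decoding."""
    return [(key * n + shift) % 26 for n in numbers]

def unmash(numbers, key=7, shift=3):
    """Inverse of mash: multiply by the modular inverse of the key."""
    inv = pow(key, -1, 26)  # modular inverse (Python 3.8+)
    return [(inv * (n - shift)) % 26 for n in numbers]

def numbers_to_text(numbers):
    return "".join(ALPHABET[n] for n in numbers)

if __name__ == "__main__":
    plain = "attack at dawn"
    encoded = mash(text_to_numbers(plain))
    print(encoded)
    print(numbers_to_text(unmash(encoded)))  # -> "attackatdawn"
```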

A Markov chain, together with a bit of computer-generated randomness, can be used to model an English sentence. For PEX this is kept fairly simple: at any point in a sentence, we are either in a noun phrase or a verb phrase. Every sentence starts with a noun phrase and has a verb phrase; between the first noun and verb phrases there might be a subordinate clause (which contains a pronoun and a verb phrase), and if the verb is transitive there might be another noun phrase to finish with. Each possible move from one phrase to the next is assigned a probability, and those probabilities decide how the sentence unfolds. Within each phrase, we have mini-Markov chains doing their work as well; for instance, in a noun phrase, there is a set probability that a noun will be preceded by an adjective, that it will be pluralised, and so on. The conjugations, declensions and words used to fill in the gaps are chosen depending on the form of the words and how the sentence fits together.
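PEX's real word lists and transition probabilities aren't given here, but a toy version of the noun-phrase/verb-phrase chain from Step 3 might look like this. The vocabulary, the probabilities, and the lack of subordinate clauses are all simplifications invented for illustration; in PEX itself the encoded numbers would also steer the choices.

```python
# A toy illustration of Step 3: a tiny Markov-style walk over
# noun phrases and verb phrases. Vocabulary and probabilities
# are invented here, not taken from PEX.
import random

NOUNS      = ["architect", "grapefruit", "keyboard", "farmer", "melon"]
ADJECTIVES = ["high", "red", "harsh", "poised", "grey"]
VERBS      = ["teaches", "hurts", "hates", "designs", "finds"]
ADVERBS    = ["slowly", "loudly"]

P_ADJECTIVE  = 0.5   # chance a noun phrase carries an adjective
P_ADVERB     = 0.3   # chance a verb phrase carries an adverb
P_TRANSITIVE = 0.6   # chance the verb takes an object noun phrase

def noun_phrase(rng):
    words = ["the"]
    if rng.random() < P_ADJECTIVE:
        words.append(rng.choice(ADJECTIVES))
    words.append(rng.choice(NOUNS))
    return " ".join(words)

def verb_phrase(rng):
    words = []
    if rng.random() < P_ADVERB:
        words.append(rng.choice(ADVERBS))
    words.append(rng.choice(VERBS))
    return " ".join(words)

def sentence(rng):
    # Every sentence: noun phrase, then verb phrase, then (maybe)
    # an object noun phrase if the verb is treated as transitive.
    parts = [noun_phrase(rng), verb_phrase(rng)]
    if rng.random() < P_TRANSITIVE:
        parts.append(noun_phrase(rng))
    return " ".join(parts).capitalize() + "."

if __name__ == "__main__":
    rng = random.Random(42)
    print(" ".join(sentence(rng) for _ in range(3)))
```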

Ultimately, it works. Not especially well: it has trouble finishing sentences at the very end, for instance, and some of the subordinate clauses don't make much sense, but on the whole it takes in text and spits out sentences. Of course, it's only useful for short inputs, since the output is many times longer than the input; if I have more free time in later life, I might try to improve this.

The high architect teaches tomatos. Sinks won't hurt an architect, and the red grapefruits can eat a pen, however the sofas won't cough, and architects play. The harsh keyboard which I stand hates shopkeepers. A cute grapefruit which he executes can't stand, and the mouse won't design a husky fisherman. The poised farmers swim. Melons sleep, and the grey snails which I sleep wouldn't go. A loud lecturer listens. The colorful computers shouldn't give a lecturer, so the beautiful farmer shouldn't play. A fork coughs, so a caterpillar wouldn't make eagles, so a pomegranate sends grapes. The artists would leave, but a chair will fit, however a knife won't get a policeman. A donkey can't find blushing potatos.
