A Secure Communication System using Factorization

via

Elliptic Curve Methods






Abstract
A secure communication system using factorization can make possible the sending [low complexity] secure messages over difficult communication links with no loss in overall security. This Elliptic Curve coding system is designed with spies, spy ring supervisors and the diplomatic corps in mind. This is a general specification, not a fixed recommendation.

The output is Morse Code Compatible {"A"..."Z" + "0" ... "9"}, making it suitable for Shortwave (HF) transmission and Earth Moon Earth (EME) transmission modes.

This research is theoretical, as there are many kinks and weak points in this kind of information and telecom technology.

The core requirement is that all of the cryptography run on a 100 MHz to 500 MHz PC with a Java language application.
  • A outgoing plaintext message pre-processor stage will be needed to make message encoding optimal for the narrow bandwidth conditions of potential transmission systems.
  • An outgoing "initial" encrypted message pre-processor (to process the message for transmission) is needed too, so as to allow for better message propagation over bad conditions.
  • The agent must have a simple and memorable key or keys.
  • The application should be able to disguise itself as a ECM factoring application, so something obscure and terse.





Mechanical details, for illustration only...



Letter or Command
Suggested Initial Value
Symbol


A
11
01


B
12
02


C
13
03


D
14
04


E
15
05


F
22
06


G
23
07


H
24
08


I
25
09


J
35
10


K
36
11


L
37
12


M
38
13


N
39
14


O
41
15


P
42
16


Q
43
17


R
44
18


S
45
19


T
46
20


U
56
21


V
57
22


W
58
23


X
59
24


Y
60
25


Z
70
26


Macro / Terminate Macro
71 / 16 / 89 / 33
27


1
82
28


2
83
29


3
84
30


4
85
31


5
86
32


6
87
33


7
88
34


8
94
35


9
95
36


0
96
37


SPACE (a quasi null)
97 / 63 / 49
38


NULL (optional)
99 / 72 / 74
39



The above table is optimized for Roman based character sets. Roman typically uses 36 core coding symbols, including digits.

  • Cyrillic uses 32 core letters, and the usual 10 numbers (for a total of 42 total symbols) so it will need an expanded table.
  • Asian languages that have phonetic symbols will need a symbol custom set that may possibly including the Roman symbol.
  • Languages that used an expanded Roman character (Vietnamese, Czech, Slovak, Finnish, German, Swedish etc...) set should use a custom table.

Ideally, there should be 90 symbols used so as to decrease the quality of each intercept -- if the intercept is worked backward to a stream of 2 digit numbers embedded into a larger number.

Symbol representations should be uneven for cryptographic security reasons. You have to decrease the frequency of representations of the most common symbols, for cryptographic reasons.

The most common characters should have the most 3 representations (12 chars x 3 representations = 36 representations; 36 representations / 90 symbols = 40% symbol 'tax' so to speak). For messages under 2 KB this is very practical, but for messages reaching into the 20 KB range this will start to backfire.

It is okay for at least one symbol to have only one representation, providing it is not a space or a macro or a common character.

Ideally, this table should change automatically in the software about 9 times per year or about once per 40 days from 01 JAN.

Use of the "Null" symbol should be optional, but if used 5% to 10% of the message should be Nulls.

It is assumed that the message pre-processor should randomly allocate symbols, based on a random symbol representation picker. Macros are not affected by this requirement, as they are special irregular symbol groups not subject to this requirement.

What if you want to use 3 digit codes?
Three digit codes have a great deal of viability here, but to some extent one sees similar trade offs vs the recommended two digit codes here

Issue
3 Digit
2 Digit
Difference
Note
Numeric Range 100 ... 999
10 ... 99
899/89 = 10.1x
More
Symbol Density
3
2
3/2 = 1.5x
Less
Mix 3 & 2
NO, Macro
NO, Macro
Macro needed
More
Compressibility
More
Less
Method?

Symbol Code Density
10.1 better
= NA =
Symbols >> Alphabet

ECM Codebreaking
Easier
Harder
Padding insertion rule
Variable
ECM Coding
Same
Same
Lower Code Densities

Asian language utility
Excellent
Poorer
Symbol Code Space
Variable
Change Symbol Space
3x/year
40 days
Symbols >> Alphabet







The pre-processor should use the "Macro" and "Terminate Macro" commands to encode Upper Case (CAPS ON), Lower Case (CAPS OFF), Text Formatting Symbols, Cyrillic {ON / OFF}, etc ... using a structure like
  1. Macro Indicator
  2. Macro State = {"A" ... "Z"} + ({"A" ... "Z"} and {"0" ... "9"}) + [optional ({"A" ... "Z"} and {"0" ... "9"})]; preferred formatting states being $# or $#($ | #) or $$#. [$ : Alpha, # : Number]
  3. Macro Data = ({"A" ... "Z"} + {"A" ... "Z"}), allowing for 262  = 676 macro data groups.
  4. Terminate Macro (a syntactical necessity to allow macros to be used at all)
It is up to the HQ end user (not the agent) to determine how macros can or should be used, and how the pre-processor should function. Without the macros, message complexity is greatly reduced -- and phrases, equations and other important cannot be transmitted unless with great difficulty.

With macros, codebook access becomes possible -- so entire dictionaries (typically under 35,000 words) can be used to compress the message with long macros.
  • Short macros (with no data) could be used to implement limited scale trigram word compression.
  • Compression is important to lessen the quality of the intercept.
  • The master dictionaries should be updated every 3 to 6 months. as they can be among other things : an important database of place names.

This current specification allows for [{26} x {36}] + [{26} x {36} x {36}] macros or [936] + [33696] or 34632 macros -- this is more than enough for any possible user. Macros can be used with or without data, it is assumed that short macros can exist without data.





The encoding chain (the decoding chain works in reverse)
  1. Accept raw text (RTF, UTF8-HTML, ASCII) typically under 6 KB. This is to allow for non-Roman character set information to sent.
  2. If message is greater than 6 KB, initialize message fragmentation procedures.
  3. A Secure Header is developed from the inputs "To" + "Subject" + "Message Length" + "Words" + "Filler"
  4. Pre-processor outputs optimized output, with nominal header and version number.
  5. The crypto encoder takes over.
  6. Message compression via macros it initiated, with nominal header and version number.
  7. The message is converted into a stream of message fragments.
  8. Encode message fragments
Within each message fragment
  1. The sequence of raw numbers is broken into 80, 82, 84 ... 90 long number blocks. Odd number blocks are probably acceptable and feasible but ignored here. To keep the computational complexity low, under 100 digits should be used.
  2. Add a beginning and terminating random number, with beginning number not permitted to be "0" for mathematical reasons. Alternately, up to 5 'base-10' and 'base-11' check digits could be inserted at fixed points in the numeric message block to distort the 2 digit pattern.
  3. A final number for the message fragment is known, so save state.
  4. Factor the number using ECM.
  5. Take the ECM factors of the message fragment, and store in a bracketed or delimited format.
  6. Format the message fragments for transmission using a low grade cipher that hides the prime number message stream.





Message fragment mathematical encoding properties
  1. Assuming one is working in pure cipher mode : 84 digits = 82 groups = 41 symbols @ 4.55 char/word ~ 9 words.
  2. In pure cipher mode, one can assume : 84 digits / 9 words ~ 9.33 or about 10% error margin per word.





Sample message : "The Quick Gray Fox Jumped Over The Lazy Brown Dog"
  1. "The Quick" = 46/24/15/49/43/56/25/13/36
  2. Add One Time Pad to message (Optional)
  3. Add initial and terminating numbers :: 7/46/24/15/49/43/56/25/13/36/1; optionally one could add base-10 or base-11 check digits in fixed locations inside the message, in gobs of 3 or 5.
  4. Final form :: 74624154943562513361
  5. Factor :: 74 624 154 943 562 513 361 = 9 x 227 x 3121 x 6263 x 18371 x 101719
  6. If the (huge) number is prime, send it as a single group.
  7. Format for transmission :: AA {9  227  3121  6263  18371  101719} (a nominal message pre-processor will be required)
  8. Optional, use a low density One Time Pad to fuzzy the numbers : AA {9  229  3133  6271  18376  101722}
If the message is to be transmitted via high speed Morse Code, and there is a desire to hide the numeric origin of the message then use the Western Union "Numbers-to-Letters Substitution Table" that was used by nearly all the users of One Time Pad technology during World War II. The Western Union table would have to have added to it "Space" and "Null", and possibly "{" and "}".

Use of the Western Union 'Numbers to Letters' table could allow for a synthetic 3 rotor device to be used to encode each line. This would or could force the agent to remember yet another key parameter. No plugboard or reflector should be used with this rotor system, and the rotor should reset to the "Agent Key" on a line by line basis. It is really up to the designers of the coding system to mull over what to do here as these are just suggestions.

Subjecting the numeric 2 digit code groups of the final plain text encoding to a 'low density' One Time Pad (~80% of "0") is probably a good solution if you don't want to add checksums or initial or final numbers. Another separate low density One Time Pad could be applied to all the prime numbers in the output that are greater than 1 digit long. You don't need a high density One Time Pad to do a lot of cryptographic damage.





What does the agent have to remember?
  • Assuming that the Whirlpool Hash Function could be used to create a One Time Pad : the One Time Pad "initialization vector" could be something like like "Snowflake" (as the encoder could add "-Day_of_Year" to add entropy). If the decimal digits created from such a mnemonic are rationed sparsely, then this should generate more than enough entropy for a 2 KB message. As One Time Pads are optional, this should be taken as an advisory.
  • The Random Number Generator could use the same "initialization vector" as the One Time Pad, but it is a different codebase.
  • Any "Low Density" One Time Pads should not have to be remembered by the agent.
  • Low complexity rotor devices, in software could be used at some stages of message encoding. The default agent key would have to be remembered, but it is assumed that the key would change about 12 times per year.
  • An agent "Rotor Key" if a rotor system is used in the final encipherment stage before transmission.





Seeking workable solutions
The near inevitable output of this encryption format is a series of sequences of prime numbers. Given the automated signal processing technology available that allow signals and messages to be processed over many different domains of analysis -- this cannot be good.

Forbidden or Forced
  • You cannot do any digit re-grouping in the plaintext domain.
  • You must however scramble the ECM prime numbers in the ciphertext domain.
Maybe
  • You could insert non-prime numbers into the message stream, but the injection rate would have to be high -- something like 80%. The numbers would have to be filtered out. Sounds simple, but its probably a mess.
  • A well designed 8999 or 18999 element codebook could allow for some hefty message compression, but the codebook digits would be required not to collide with the alpha numeric substitution designators. Codebooks are best when implemented in the Macro section to avoid this possible conflict.

Things that must be done to lessen this possibility of message origin
  1. All bracketed output sequences must be scrambled. Sorting of the multipliers via value should be forbidden.
  2. All message sequence indicators are OK, but the message itself must have its segments sent out of sequence.
Encoding State Machine Issues
  • I don't believe in the encoding state machine changing 365 times per year, that is just overkill.
  • The encoding state machine should change with a frequency no greater than 90 days to no fewer than 40 days.
  • Agent telecom uplinks are not expected to be any more than 52 times per year, with 36 being more typical.
  • The encoding state machine should change adequately to allow for a network of 400 users to use this technology, based on a probable message frequency of 36 nominal + 8 emergency messages per year.


Conclusions

Although this ECM cryptography system can be powerful one, the day to day message transmission issues may betray the ECM origin of the message. There are many problems that may plague this transmission system and hinder its development and security.
  • Although it may be possible to hide the bracketed "multiplication groups" of primes in each message fragment, this does come at the cost of message length and increases encoder and decoder complexity.
  • This cryptography system does have error correction issues. Without error correction levels similar to the "Voyager Code" or "Cassini Code" messages may be damaged on arrival forcing extra work to recover all the codegroups.
  • There are a lot of extra decoder blocks (as well as encoder blocks) due to using ECM -- and this increase in complexity will cause multiple debugging and deployment problems.
  • The "baseband" transmission formats for this format are more limited due to its error correction properties. RTTY formats with error correction are preferred over Morse Code, not only for their higher transmission rates but their self correcting nature.
  • Bell 103 modem and Bell 202 modem are recommended as transmission formats as an alternate to self correcting RTTY transmission systems providing the message is transmitted twice.

Providing that the known issues can be overcome, this system can provide a viable and secure communications channel for messages of 2 KB to 4 KB in maximal length.


References


Created : 04 March 2010 //  Last Modified : 22  August 2010 // Created by : Max Power