|
A Secure Communication System
using
Factorization
via
Elliptic Curve Methods
|
|
|
|
|
|
Abstract
A secure communication system using factorization can make possible the
sending [low complexity] secure messages over difficult communication
links with no loss in overall security. This Elliptic Curve coding
system is designed with spies, spy ring supervisors
and the diplomatic corps in mind. This is a general specification, not
a fixed recommendation.
The output is Morse Code Compatible {"A"..."Z" + "0" ... "9"}, making
it suitable for Shortwave (HF) transmission and Earth Moon Earth (EME)
transmission modes.
This research is theoretical, as there are
many kinks and weak points in this kind of information and telecom
technology.
The core requirement is that all of the cryptography run on a 100 MHz
to 500 MHz PC with a Java language application.
- A outgoing plaintext message pre-processor stage will be
needed to make message encoding optimal for the narrow bandwidth
conditions
of potential transmission systems.
- An outgoing "initial" encrypted message pre-processor (to
process the message for
transmission) is needed too, so as to allow for better message
propagation over bad conditions.
- The agent must have a simple and memorable key or keys.
- The application should be able to disguise itself as a ECM
factoring application, so something obscure and terse.
|
|
|
|
|
|
Mechanical
details, for illustration only...
|
|
|
|
Letter
or Command
|
Suggested
Initial
Value
|
Symbol
|
|
|
A
|
11
|
01
|
|
|
B
|
12
|
02
|
|
|
C
|
13
|
03
|
|
|
D
|
14
|
04
|
|
|
E
|
15
|
05
|
|
|
F
|
22
|
06
|
|
|
G
|
23
|
07
|
|
|
H
|
24
|
08
|
|
|
I
|
25
|
09
|
|
|
J
|
35
|
10
|
|
|
K
|
36
|
11
|
|
|
L
|
37
|
12
|
|
|
M
|
38
|
13
|
|
|
N
|
39
|
14
|
|
|
O
|
41
|
15
|
|
|
P
|
42
|
16
|
|
|
Q
|
43
|
17
|
|
|
R
|
44
|
18
|
|
|
S
|
45
|
19
|
|
|
T
|
46
|
20
|
|
|
U
|
56
|
21
|
|
|
V
|
57
|
22
|
|
|
W
|
58
|
23
|
|
|
X
|
59
|
24
|
|
|
Y
|
60
|
25
|
|
|
Z
|
70
|
26
|
|
|
Macro
/ Terminate Macro
|
71 /
16
/ 89 / 33
|
27
|
|
|
1
|
82
|
28
|
|
|
2
|
83
|
29
|
|
|
3
|
84
|
30
|
|
|
4
|
85
|
31
|
|
|
5
|
86
|
32
|
|
|
6
|
87
|
33
|
|
|
7
|
88
|
34
|
|
|
8
|
94
|
35
|
|
|
9
|
95
|
36
|
|
|
0
|
96
|
37
|
|
|
SPACE
(a quasi null)
|
97 /
63 / 49
|
38
|
|
|
NULL
(optional)
|
99 /
72 / 74
|
39
|
|
|
|
|
The above table is optimized for
Roman based character sets. Roman typically uses 36 core coding
symbols, including digits.
- Cyrillic uses 32 core letters, and the usual 10 numbers
(for a total of 42 total symbols) so it will need an expanded table.
- Asian languages that have phonetic symbols will need a
symbol custom set that may possibly including the Roman symbol.
- Languages that
used an expanded Roman character (Vietnamese, Czech, Slovak, Finnish,
German, Swedish etc...) set should use a custom table.
Ideally, there should be 90
symbols used so as to decrease the quality of each intercept -- if the
intercept is worked backward to a stream of 2 digit numbers embedded
into a larger number.
Symbol representations should be uneven for cryptographic security
reasons. You have to decrease the frequency of representations of the
most common symbols, for cryptographic reasons.
The most common
characters should have the most 3 representations (12 chars x 3
representations = 36 representations; 36 representations / 90 symbols =
40% symbol 'tax' so to speak). For messages under 2 KB this is very
practical, but for messages reaching into the 20 KB range this will
start to backfire.
It is okay for at
least one symbol to have only one representation, providing it is not a
space or a macro or a common character.
Ideally, this table
should change automatically in the software about 9 times per year or
about once per 40 days from 01 JAN.
Use of the "Null" symbol should be optional, but if used 5% to 10% of
the message should be Nulls.
It is assumed that the message pre-processor should randomly allocate
symbols, based on a random symbol representation picker. Macros are not
affected by this requirement, as they are special irregular symbol
groups not subject to this requirement.
What if you want to use
3 digit codes?
Three digit codes have a great deal of viability here, but to some
extent one sees similar trade offs vs the recommended two digit codes
here
Issue
|
3 Digit
|
2 Digit
|
Difference
|
Note
|
| Numeric Range |
100 ... 999
|
10 ... 99
|
899/89 = 10.1x
|
More
|
Symbol Density
|
3
|
2
|
3/2 = 1.5x
|
Less
|
Mix 3 & 2
|
NO, Macro
|
NO, Macro
|
Macro needed
|
More
|
Compressibility
|
More
|
Less
|
Method?
|
|
Symbol Code Density
|
10.1 better
|
= NA =
|
Symbols >> Alphabet
|
|
ECM Codebreaking
|
Easier
|
Harder
|
Padding insertion rule
|
Variable
|
ECM Coding
|
Same
|
Same
|
Lower Code Densities
|
|
Asian language utility
|
Excellent
|
Poorer
|
Symbol Code Space
|
Variable
|
Change Symbol Space
|
3x/year
|
40 days
|
Symbols >> Alphabet |
|
|
|
|
|
|
|
The pre-processor should use the
"Macro"
and "Terminate Macro" commands to encode Upper Case (CAPS ON), Lower
Case (CAPS OFF), Text Formatting Symbols, Cyrillic {ON / OFF}, etc ...
using a structure like
- Macro Indicator
- Macro State = {"A" ... "Z"} + ({"A" ... "Z"} and {"0" ...
"9"}) + [optional ({"A" ... "Z"} and {"0" ... "9"})]; preferred
formatting states being $# or $#($ | #) or $$#. [$ : Alpha, # : Number]
- Macro Data = ({"A" ... "Z"} + {"A" ... "Z"}), allowing for
262 = 676 macro data groups.
- Terminate Macro (a syntactical necessity to allow macros to
be used at all)
It is up to the HQ end user (not the agent) to determine how macros can
or should be used,
and how the pre-processor should function. Without the macros, message
complexity is greatly reduced -- and phrases, equations and other
important cannot be transmitted unless with great difficulty.
With macros, codebook access becomes possible -- so entire dictionaries
(typically under 35,000 words) can be used to compress the message with
long macros.
- Short macros (with no data) could be used to implement
limited scale
trigram word compression.
- Compression is important to lessen the
quality of the intercept.
- The master dictionaries should be updated
every 3 to 6 months. as they can be among other things : an important
database of place names.
This current specification allows for [{26} x {36}] + [{26} x {36} x
{36}] macros or [936] + [33696] or 34632 macros -- this is more than
enough for any possible user. Macros can be used with or without data,
it is assumed that short macros can exist without data.
|
|
|
|
|
|
The encoding chain (the decoding
chain works in reverse)
- Accept raw text (RTF, UTF8-HTML, ASCII) typically under 6
KB. This is to allow for non-Roman character set information to sent.
- If message is greater than 6 KB, initialize message
fragmentation procedures.
- A Secure Header is developed from the inputs "To" +
"Subject" + "Message Length" + "Words" + "Filler"
- Pre-processor outputs optimized output, with nominal header
and version
number.
- The crypto encoder takes over.
- Message compression via macros it initiated, with nominal
header and version number.
- The message is converted into a stream of message fragments.
- Encode message fragments
Within each message fragment
- The sequence of raw numbers is broken into 80, 82, 84 ...
90 long number blocks. Odd number blocks are probably acceptable and
feasible but ignored here. To keep the computational complexity low,
under 100 digits should be used.
- Add a beginning and terminating random number, with
beginning number not permitted to be "0" for mathematical reasons.
Alternately, up to 5 'base-10' and 'base-11' check digits could be
inserted at fixed points in the numeric message block to distort the 2
digit pattern.
- A final number for the message fragment is known, so save
state.
- Factor the number using ECM.
- Take the ECM factors of the message fragment, and store in
a bracketed or delimited format.
- Format the message fragments for transmission using a low
grade cipher that hides the prime number message stream.
|
|
|
|
|
|
Message fragment mathematical
encoding properties
- Assuming one is working in pure cipher mode : 84 digits =
82 groups = 41 symbols @ 4.55 char/word ~ 9 words.
- In pure cipher mode, one can assume : 84 digits / 9 words ~
9.33 or about 10% error margin per word.
|
|
|
|
|
|
Sample message : "The Quick Gray
Fox Jumped Over The Lazy Brown Dog"
- "The Quick" = 46/24/15/49/43/56/25/13/36
- Add One Time Pad to message (Optional)
- Add initial and terminating numbers ::
7/46/24/15/49/43/56/25/13/36/1; optionally one could add base-10 or
base-11 check digits
in fixed locations inside the message, in gobs of 3 or 5.
- Final form :: 74624154943562513361
- Factor :: 74 624 154 943 562 513 361 = 9 x 227 x 3121 x
6263 x 18371 x 101719
- If the (huge) number is prime, send it as a single group.
- Format for transmission :: AA {9 227 3121
6263 18371 101719} (a nominal message pre-processor will be
required)
- Optional, use a low density One Time Pad to fuzzy the
numbers : AA {9 229 3133
6271 18376 101722}
If the message is to be transmitted via high speed Morse Code, and
there is a desire to hide the numeric origin of the message then use
the Western Union "Numbers-to-Letters Substitution Table" that was used
by nearly all the users of One Time Pad technology during World War II.
The Western Union table would have to have added to it "Space" and
"Null", and possibly "{" and "}".
Use of the Western Union 'Numbers to Letters' table could allow for a
synthetic 3 rotor device to be used to encode each line. This would or
could force the agent to remember yet another key parameter. No
plugboard or reflector should be used with this rotor system, and the
rotor should reset to the "Agent Key" on a line by line basis. It is
really up to the designers of the coding system to mull over what to do
here as these are just suggestions.
Subjecting the numeric 2 digit code groups of the final plain text
encoding to a 'low density' One Time Pad (~80% of "0") is probably a
good solution if you don't want to add checksums or initial or final
numbers. Another separate low density One Time Pad could be applied to
all the prime numbers in the output that are greater than 1 digit long.
You don't need a high density One Time Pad to do a lot of cryptographic
damage.
|
|
|
|
|
|
What does the agent have to
remember?
- Assuming that the Whirlpool Hash Function could be used to
create a One Time Pad : the One Time Pad "initialization vector" could
be something like like "Snowflake" (as the encoder could add
"-Day_of_Year" to add entropy). If the decimal digits created from such
a mnemonic are rationed sparsely, then this should generate more than
enough entropy for a 2 KB message. As One Time Pads are optional, this
should be taken as an advisory.
- The Random Number Generator could use the same
"initialization vector" as the One Time Pad, but it is a different
codebase.
- Any "Low Density" One Time Pads should not have to be
remembered by the agent.
- Low complexity rotor devices, in software could be used at
some stages of message encoding. The default agent key would have to be
remembered, but it is assumed that the key would change about 12 times
per year.
- An agent "Rotor Key" if a rotor system is used in the final
encipherment stage before transmission.
|
|
|
|
|
|
Seeking workable solutions
The near inevitable output of this encryption format is a series of
sequences of prime numbers. Given the automated signal processing
technology available that allow signals and messages to be processed
over
many different domains of analysis -- this cannot be good.
Forbidden or Forced
- You cannot do any digit re-grouping in the plaintext domain.
- You must however scramble the ECM prime numbers in the
ciphertext domain.
Maybe
- You could insert non-prime numbers into the message stream,
but the injection rate would have to be high -- something like 80%. The
numbers would have to be filtered out. Sounds simple, but its probably
a mess.
- A well designed 8999 or 18999 element codebook could allow
for some hefty message compression, but the codebook digits would be
required not to collide with the alpha numeric substitution
designators.
Codebooks are best when implemented in the Macro section to avoid this
possible conflict.
Things that must be done to lessen this possibility of message origin
- All bracketed output sequences must be scrambled. Sorting
of the multipliers via value should be forbidden.
- All message sequence indicators are OK, but the message
itself must have its segments sent out of sequence.
Encoding State Machine Issues
- I don't believe in the encoding state machine changing 365
times per year, that is just overkill.
- The encoding state machine should change with a frequency
no greater than 90 days to no fewer than 40 days.
- Agent telecom uplinks are not expected to be any more than
52 times per year, with 36 being more typical.
- The encoding state machine should change adequately to
allow for a network of 400 users to use this technology, based on a
probable message frequency of 36 nominal + 8 emergency messages per
year.
|
|
|
Conclusions
Although this ECM cryptography system can be powerful one, the day to
day message transmission issues may betray the ECM origin of the
message. There are many problems that may plague this transmission
system and hinder its development and security.
- Although it may be possible to hide the bracketed
"multiplication groups" of primes in each message fragment, this does
come at the cost of message length and increases encoder and decoder
complexity.
- This cryptography system does have error correction issues.
Without error correction levels similar to the "Voyager Code" or
"Cassini Code" messages may be damaged on arrival forcing extra work to
recover all the codegroups.
- There are a lot of extra decoder blocks (as well as encoder
blocks) due to using ECM -- and this increase in complexity will cause
multiple debugging and deployment problems.
- The "baseband" transmission formats for this format are
more limited due to its error correction properties. RTTY formats with
error correction are preferred over Morse Code, not only for their
higher transmission rates but their self correcting nature.
- Bell 103 modem and Bell 202 modem are recommended as
transmission formats as an alternate to self correcting RTTY
transmission systems providing the message is transmitted twice.
Providing that the known issues can be overcome, this system can
provide a viable and secure communications channel for messages of 2 KB
to 4 KB in maximal length.
|
|
|
References
|
|
|
Created : 04
March 2010 // Last Modified : 22
August 2010 // Created by : Max Power
|
|