EFFector Online 4.05 1/7/1993

                //////////////  //////////////// //////////////

            ////             ////             ////

_________ /////////________ /////////_______ /////////________________

        ////               ////             ////

      //////////////////  ////             ////


//////////////////////////////////////////////////////////////////////

EFFector Online 4.05          1/7/1993                 editors@eff.org

A Publication of the Electronic Frontier Foundation     ISSN 1062-9424


                     -==--==--==-<>-==--==--==-

[Editor's Note: With this issue, EFFector Online will begin to

examine the technical, social, moral, legal and political issues 

surrounding the uses of encryption in computer-based communications. 

While many in various online communities around the world are highly 

conversant with cryptography and encryption, many others are not. 

Because of this we begin our series with Larry Loen's superb primer on

basic cryptography. This article originally appeared as a proto-FAQ

in the Usenet group, sci.crypt. Our readers with an interest in

learning about encryption on an ad-hoc basis are advised to read

sci-crypt and to participate in it. As with any other place on

the Net, "Ask. People know.How the world works is not a secret." -GV]

                     -==--==--==-<>-==--==--==-


                    HIDING DATA IN PLAIN SIGHT

               Some Key Questions About Cryptography

            BY LARRY LOEN (lwloen@rchland.vnet.ibm.com)


NOTE: The information and opinions expressed in this article

are those of the author and his collaborators and do not

represent the final word on these matters or the opinions,

views or policies of any company or organization.



Q1:  What is cryptography?  How, basically, does it work?

  What are the basic terms used to describe cryptography?


  Cryptography is the art and science of hiding data in plain sight.

  It is also the art and science of stealing data hidden in plain sight.

  There are at least three players. The first is the one who has

  the original data, which is presumed to have high value to

  others. This data is presumed to reside in a safe place that

  no one but the originator and his/her friends can see. (If the

  originator cannot physically secure the original data,

  cryptography is a waste of time). Now, for cryptography to be

  necessary, the data must, for some reason, have to be

  transmitted over some public means such as a telephone line, a

  letter through the mail; any means that cannot be physically

  secured by the owner to a legitimate receiver of the data. The

  receiver is the second party.


  Cryptography is any transformation of the data into a form

  that cannot (it is hoped) be recovered in a timely manner by an

  unknown party, which is called here 'the opponent'.

  The transformation is not some physical means, such as hiding the

  data on microfilm or  some such; it is some mathematical

  transformation that scrambles the original data in a way

  that the receiver on the other end knows how to unscramble.


  The process of scrambling (transforming) the data is called

  'encryption'. Unscrambling is called decryption. An

  encryption system has two basic parts. 1) A general

  transformation process called the encryption algorithm. 2) A

  customization of that algorithm called a cipher key. The

  sender and the receiver must find a secure means to exchange

  the cipher key. That is, the same public means used to send

  the encrypted data cannot be used. This may be a difficult

  problem, and has a variety of solutions, but will be assumed

  solved for now. Once the key is successfully exchanged, the

  two parties can separately implement the encryption algorithm

  and its inverse, the decryption algorithm.


  At this point, the data can be transmitted in its encrypted form

  using the agreed-to key to customize the general algorithm to a

  particular version that transforms the data. Since the

  encrypted data is sent over some insecure medium, it is assumed

  that an opponent (the third party) may intercept the data,

  possibly without being detected, and analyze the encrypted

  text, also called cipher text.


  In theory, any encryption system can be defeated, given enough

  time. The amount of time it takes cannot always be predicted.

  This is because the opponent can supply extra information

  that might reduce the computation time a great deal. For one

  thing, the sender and receiver may make a very poor choice of

  cipher key. If the opponent has a list of poor keys, a

  computer may permit a large list of such keys to be tried;

  if the poor key actually used is on the list, the opponent wins

  even if the encryption system is otherwise secure.


  All methods the opponent dreams up have one thing in common.

  It is an attempt to recover the original data without advance

  knowledge of the particular cipher key. There are a wide

  variety of means available and some will be described later on.

  The name for any of these methods is called 'cryptanalysis' and

  the person who does the penetration is called the cryptanalyst.


  In diagram form (the boxes indicated physically secure areas)--


  -------------|                   --------------

    Sender     |                   | Receiver

   "x"         |                   |  cipher key

    cipher key |------->  y  ----->|

   y=Encrypt(  |          |        | x=Decrypt(y,key)

      x,key)   |          |        |

  -------------|          |        |-------------

                          V

                     Opponent

                     z = Cryptanalysis(y,Encrypt Algorithm,

                         general knowledge of x, guesses about

                         secret key, statistical analysis of y)


       The opponent uses Cryptanalysis of message y until

       the value z is either equal to x or z is "enough" like

       x to accomplish the illicit purpose. The sender and

       receiver win whenever recovery of z takes too much time.


Q2:  I have invented this wonderful, fast-running encryption

  algorithm. How do I find out if it is as great as I think it

  is?


  It is one thousand times easier to invent an encryption

  algorithm than it is to discover if it is worthwhile. Most

  designers who have not learned cryptography are used to dealing

  with mathematics that discusses the general case. But,

  successful cryptanalysis often relies on any number of

  fortuitous special cases that the designer must anticipate lest

  a given key and data stream create even one of them. Many

  practical illicit decryptions astonish the newcomer; they seem

  like cheating, but they do work.


  It is easy to get superficial reassurance that a poor

  encryption algorithm seems good. Most people reading this file

  have probably attempted the kinds of cryptograms one finds in

  newspapers and puzzle books (usually called 'cryptograms').

  That encryption algorithm is simple -- one replaces each letter

  of the alphabet with exactly one other letter of the alphabet.

  In less than an hour, sixth graders have been taught to

  successfully solve this kind of cipher. Yet, it has 26

  factorial possible keys (about 2 to the 88th power), which is

  much more than the 2 to the 56th keys of the well known

  commercial algorithm, DES. A large number of keys is

  important, but is not by itself secure. Obviously, the

  successful sixth graders do not attempt all possible keys.

  They use their general knowledge of English to shortcut the

  process and eliminate all but a few possible keys.


  Since the gross mathematical properties of an encryption

  system prove nothing, only cryptanalytic attacks matter

  and these require some study.


Q3:  What is an "attack"?


  An attack is a particular form of cryptanalysis. There are

  generic types of attacks, and some very specific attacks. In

  the end, cryptography is a war of specifics. The opponent

  will either invent a very clever and unique attack or will

  customize a general attack to suit the needs at hand. Some

  attacks might even exploit the properties of a particular

  message or settle for a partial illicit decryption.


  However, in sci.crypt, there is a great deal of discussion

  about attacks, both general and specific, and an assumption

  that the reader can fill in missing details at times. Since

  those who post here usually have other things to do, to-the-bit

  details are often omitted.


Q4:  Hmm. In spite of questions 2 and 3, I still think I

  have a pretty good system. But, it seems pretty hard

  to know if it can really defeat an expert. Still, I know

  it will work if I can keep my method secret, right?


  Good luck. There are documented cases aplenty where an

  encryption system was deduced based entirely on clever analysis

  of the encrypted text alone. There are also known cases

  where encryption systems were deduced because the encrypted

  text was later published verbatim somewhere (for instance, a

  press release) and so the system was figured out, eliminating

  the presumed secrecy advantage for the next cipher text.


Q5:  What are the principal cryptanalytic attacks?


  The first is "cipher text only". This is recovering the text

  or the key by analysis of the text (using statistical means,

  for instance) and by knowing broad details such as whether it

  is the text of someone's speech, a PC-DOS EXE file, whether text

  is in English or French. For non-puzzle examples, such broad

  information is almost always reliably known. People have some

  idea of what they wish to steal. The weakest systems fall to

  this form of attack.


  The next attack is "known plaintext". If one works with a

  newspaper cryptogram, one may have little idea of what is in

  the text. However, if one was illicitly trying to decrypt the

  source code of Megabarfoocorp's C++ compiler, one would be much

  better off. There would be lots of things one could

  confidently expect, such as long strings of the space

  character, strings like "if (" and "if(" and the like.


  There would even be whole phrases like "Copyright 1990,

  Megabarfoocorp International" or some such. With imagination,

  surprisingly long strings can be predicted. Computers can

  tirelessly try a number of trivial variations of such known

  text at a great many locations. Knowledge of the encryption

  system may reveal the correct placement outright or a small

  number of places to try. A wide variety of systems can be

  broken if enough known plaintext can be successfully placed.


  The next attack is "chosen plaintext". In some situations, it

  is possible for the opponent to pose as the legitimate user

  of the encryption system. This is especially true in "public

  key" systems (described later). In this case, the opponent

  can present fairly arbitrary unencrypted data of his/her

  choosing, cause it to be encrypted, and study the results.

  Very few systems ever invented pass this test, but it should be

  seriously considered for any significant use. Why?  No

  designer can dream up all attacks. Moreover, at some point, a

  form of known plaintext attack may well enough approximate a

  chosen plaintext attack to make it worthwhile for the designer

  to allow for it to begin with, especially as it might not be

  dreamed up by the designer!


  There are other attack strategies. One worth mentioning for

  telecommunications applications is the "replay" attack.

  Suppose one has an Automatic Teller Machine (ATM) which uses a

  substitution cipher. Since one assumes the telephone line to

  the ATM can be tapped (why encrypt if it cannot?), one can also

  assume the opponent can _inject_ false ciphertext. Thus,

  without even being aware of the actual system, an opponent may

  be able to simply replay old ciphertext and get the cash drawer

  to give him/her $50 from your account. There are encryption

  systems which avoid this difficulty.


  Another general form of attack often not considered by

  newcomers is comparing multiple messages using the same key.

  It is impractical to use a different key for each cipher

  text (with one important exception called the 'one time

  pad'). Therefore, an opponent will have several different

  texts encrypted in the same key and may be able to exploit

  this fact. All transpositions algorithms (those which merely

  scramble the order of the bytes) are vulnerable to this

  attack. More sophisticated systems are also vulnerable to this

  attack in some cases as well.


  This is a vast area and one of the things that is difficult,

  even for experienced designers, is anticipation of all possible

  attacks. Moreover, there is no obligation on the attacker's

  part to be less mathematically sophisticated than the designer.

  A system that survives the attacks the designer invents may

  fall easily to a mathematical approach the designer of the

  system is unfamiliar with.


  And, one even has to worry about items like a rare bug in the

  program that injects the cipher key rather than the cipher text

  into the output stream. It is amazing how often trifling

  errors in the implementation make theory irrelevant.


Q6: What does make a system 'good'?


  What makes a system 'good' relies on many details specific to

  a given situation. Only after a lot of ingenious attacks are

  tried can a system be released for use. There never can be any

  absolute guarantees. All that can be said is that it defeated

  the best experts available. The opponent may be smarter.


  However, there are some agreed-to minimums that a good system

  must have to even be worth serious analysis --


  1)  It must be suitable for computer use. Ordinary byte streams

   (as arbitrary as possible) would be the input "plain"

   text and byte streams would be the output "cipher" text.

   There should be few cases where some kinds of input text

   produces poor results and these, if they exist, should be

   easily known, described, and avoided.


  2)  To be cost-competitive, it must produce about the same number of

   output cipher bytes as input plain bytes. Relaxing this restriction

   is not as helpful as one might think; the best historical systems

   meet this restriction, so a new system must meet it also.


  3)  Output bytes should have a complex relationship between the key,

   the plain text being encrypted, and possibly some amount of

   text previously encrypted. "Simple", general methods, such

   as creation/construction of minterm sums and matrix inversions should

   not produce the cipher key or an equivalent, usable illicit

   decryption method.


  4)  Trying all keys must not be practical. Trying a cleverly ordered

   subset of the keys must not work. This is hard to achieve; it means

   that the failure of a particular key X to illicitly decrypt must

   not also allow an opponent to conclude that some large class of keys

   "similar" to X need not be tried.


   All keys must be equally strong; any exceptions must be well

   known, few in number, and easily avoided.


  5)  Assume all details about the encryption algorithm are known.

   Relying on a secret method has failed repeatedly. It is prudent to

   assume only the variable part of the system, called the cipher key,

   that is selected by the customer, is unknown.


  6)  Classical attacks must be tried in great variety and ingenuity.

   Details of this are beyond the scope of this file. For a summary

   of the principal attacks, see Question 5, "What are the principal

   cryptanalytic attacks?". It is easy to do this particular step

   incompletely. Remember, there may be little effective limit to the

   resources or the brain power of one's opponent. The data may be

   very valuable and it make take only a couple of days rental of the

   right kind of consultant and a supercomputer to get the job

   done. The legitimate user will, by contrast, have a smaller

   budget that is begrudged as "overhead".


  7) Performance must be competitive with existing solutions. A

   practical problem is that every moment spent encrypting is

   regarded as "overhead". Therefore, the method must not be

   uncompetitive with existing algorithms regarded as secure.


  Inventing one's own system is an interesting pastime and quite

  educational. However, any hope of really securing data requires, at

  the very minimum, mastery of illicit decipherment. It is very easy

  to scramble data impressively and please oneself. This is not

  the same as keeping it actually secure.


Q7:  OK, you guys seem to know a lot about this stuff. I

  think I have a neat system. Here's a message I encrypted.

  Tell me if the system is any good. Oh, and can I keep my

  algorithm secret?  I think I want to patent it, so Q5 does not

  apply in my case.


  Most people read and understand questions 5 and 6 and their

  implications. But, a few individuals, because of what they

  invented, believe they have an exception of some kind. If that

  is you, you're setting yourself up for disappointment. Even if

  you are stone cold right about what you have invented, you are

  not going about proving it properly. The main issue is a

  mindset issue. Analyzing a cipher is not like a proposition

  in geometry or the denouement of a mystery novel. One's

  intuition about proofs may not hold for cryptography.


  Finding out if a cipher is any good is perhaps most like

  debugging a program. Just as you can never be sure you got all

  of the bugs out, you can never be sure you have a cipher that

  will withstand all attacks an opponent might come up with.


  So, even if you do publish some form of challenge, and even if

  the posters of sci.crypt attack it vigorously, it may prove

  nothing.


  Second, you may not be in a position to form a good, sound

  test. Beginners who get this far are peeved that they are

  asked to reveal their encryption algorithm. They may also be

  asked to reveal whole paragraphs of cipher text or to encrypt

  certain texts in a secret key and give the answer back. All

  these things seem like cheating. The answer is: real opponents

  _do_ cheat. Unlike those who post here, they are not above

  burglarizing your home to get a copy of your source code, if

  that is cheaper than hiring experts by the hour, to give a

  relevant example. Whatever we ask for will represent a close

  analogy of a real-world attack.


  If you are still convinced you have a good system, by all means

  go out and try to patent it; you are not legally obliged to

  ask our help, after all. But history is against you; against

  you so much that you will find few people here that are willing

  to trust your definition of a good test of a cipher.


  There is no 'royal road' to cryptography. The best thing to do

  is take a couple of months and seriously study crytanalysis.


Q8:  What are the legal restrictions on cryptography?


  You'd have to ask a lawyer. Most governments consider cryptography a

  sensitive topic for one reason or another. There are a variety

  of restrictions possible.


  Most governments don't give their reasons publicly, so one

  may not know why there are restrictions, just that there are

  restrictions to follow.


  One can expect to find laws about sending encrypted data over national

  borders and may often expect to find laws about the import and

  export of encryption systems.


  Since sending data over Internet, BitNet, Usenet, Fidonet, etc.

  may cross national borders (even if the originator does not

  realize it), a basic familiarity with these laws is recommended

  before sending out encryption systems or encrypted data.


Q9:  What is a public key system?


  A public key system is an encryption method with a unique

  property -- encryption and decryption use different keys and

  one of those keys can be published freely.


  Being able to pass around the 'decrypt' key part of one's

  encryption algorithm allows some very interesting things to

  happen. For one thing, messages can be exchanged by people who

  had not planned on doing so in advance. One merely encrypts in

  one's private key, decrypts using the receiver's public key and

  passes on the result to the receiver. The receiver encrypts in

  his/her own private key, then decrypts using the sender's public

  key, recovering the message.


Q10:  What is key distribution?


  Key distribution is the practical problem of exchanging keys

  between the parties that wish to set up an encryption system

  between the two of them. Especially in a network environment,

  passing keys one can trust back and forth, can be difficult.

  How can one be sure a cipher key was not sent by an impostor?

  Unless the keys are exchanged in a secret, secure place,

  face-to-face, getting keys securely exchanged and with

  knowledge of the fact that the key was sent authentically can

  be difficult. Yet, any practical system must permit reasonably

  convenient, but very secure exchange of cipher keys.


  Once a few special keys are securely exchanged, it may be

  possible to send new cipher keys in encrypted form between the

  sender and receiver that have a known lifetime. That is, the

  cipher key is sent in a special encrypted message using a

  special key used only for exchanging keys. In

  telecommunications environments, this allows frequent change of

  keys between the parties 'safely' over the same insecure medium

  used to send the cipher text. While this idea is at the heart

  of much commercial use of cryptography, it is not easily

  accomplished and less easily summarized.


Q11:  What is the 'one time pad'?


  The 'one time pad' is an encryption method that is known to be

  absolutely, provably secure. How it works is as follows --

  1. Generate a huge number of bits using a naturally random

  process. 2. Both sides exchange this data, which is as much

  data as they are going to exchange later on. 3. Exclusive OR the

  original text, bit by bit, with the 'one time pad' data, never

  reusing the 'one time pad' data. 4. Have elaborate rules to

  keep the two sides in synch so that the data can be recovered

  reliably by the receiver. (Both sides must know where they are

  in terms of how much 'one time pad' has been consumed).


  Note that only genuine, naturally random processes will do. There

  must be no relationship between any prior bit of the 'one time pad'

  and a future bit of the key. "Random number generators", in

  particular, may not substitute and still be a 'one time pad'. The

  reason the one time pad works is precisely because there is no

  relationship between any bits of the key stream. All cipher

  keys are equally probable. All original data messages are

  equally probable. There is no 'hat' to hang analysis upon.

  Even if one can inject as much text into a one time pad as one

  wishes, recovering the key stream tells nothing about the next

  message.


  Unfortunately, one time pads are very ungainly, so they are not

  typically used. The requirement to have a genuinely random process,

  with the right kind of statistical probability, is not easy to

  to set up. The ability to exchange vast amounts of data,

  securely, in advance, is not easy to achieve in environments

  when encryption is needed in the first place.


  There are a variety of cipher systems which generate "pseudo

  one time pad" streams of cipher key, but all have the same

  theoretical vulnerability; any algorithmic process introduces

  relationships between some old key bit(s) and the new key bit

  and so permits cryptanalysis. "Random number generators" are

  frequently dreamed up by newcomers as a "pseudo one time pad",

  but they are notoriously vulnerable to analysis, all

  independent of whether the pseudo-random stream satisfies

  randomness tests or not.


  The favorite example is a "standard" pseudo-random number

  generator of the form x = ((A*x) +C) mod M where A, C, and M

  are fixed and x is the most recent value used to form the last

  "random" number. Thus, the key of the cipher is the initial

  value of x. Let's set M equal to two to the 32nd for this example.

 

  Now, if one can predict or simply find the word

  "the " (the word "the" followed by a space character) on a

  even four byte boundary in the file, one can recover an old

  value of "x" and predict the rest of the keystream from that

  point, which may be enough of a "break" to accomplish the

  purpose. This is true even if the particular A, C, and M

  perfectly satisfy any randomness test that anyone ever devised.


  Naturally, this example can be improved upon, but it

  illustrates the potential problem all such methods have.


Q12:  What is the NSA (National Security Agency)?


  The NSA has several tasks. The most relevant here is that it employs

  the United States' government's cryptographers. Most nations have some

  department that handles cryptography for it, but the US' NSA tends to

  draw the most attention. It is considered equal to or superior to any

  such department in the world. It keeps an extremely low public

  profile, and has a "large", but secret budget. Because of these and

  other factors, there will be many posts speculating about the

  activities and motives of the NSA.


Q13:  What is the American Cryptogram Association?


  American Cryptogram Association Information, Sept 1992


  The American Cryptogram Association is an international group

  of individuals who study cryptography together and publish

  puzzle ciphers to challenge each other and get practical

  experience in solving ciphers. It is a nonprofit group.


  The American Cryptogram Association (ACA) publishes the

  bi-monthly magazine, "The Cryptogram", which contains

  a wide variety of simple substitution ciphers ("cryptograms")

  in English and other languages as well as cryptograms

  using cipher systems of historical interest (such as Playfair).


  The level of difficulty varies from easy to difficult. Except

  for "foreign language" cryptograms, all text is in English.


  The magazine also features "how to" articles at the hobbyist level

  and other features of interest to members. A "Computer Supplement"

  is also available which features articles on computerizing various

  phases of cryptogram solving; the level of the articles varies from

  simple programming examples to automatic algorithmic solutions of

  various cipher systems. The supplement is available as a separate

  subscription, and is published when material permits; usually two

  or three times per year.


  Where to write for subscription or other information --


  ACA Treasurer

  18789 West Hickory St

  Mundelein IL 60060


Q14:  What are some good books on cryptography?


  Good books on this topic that weren't government classified used

  to be rare. There are now a host of good books. One informal

  test of a library's quality is how many of these are on the

  shelves. These are widely available, but few libraries have

  them all.


    David Kahn, The Codebreakers, Macmillan, 1967 [history; excellent]

    H. F. Gaines, Cryptanalysis, Dover, 1956 [originally 1939, as

        Elementary Cryptanalysis]

    Abraham Sinkov, Elementary Cryptanalysis, Math. Assoc. of Amer.,

        1966

    D Denning, Cryptography and Data Security, Addison-Wesley, 1983

    Alan G. Konheim, Cryptography:  A Primer, Wiley-Interscience, 1981

    Meyer and Matyas, Cryptography:  A New Dimension in Computer Data

        Security, John Wiley & Sons, 1982.

  Books can be ordered from Aegan Park Press. They are not inexpensive,

        but they are also the only known public source for most of these

        and other books of historical and analytical interest.

        From the Aegean Park Press P.O. Box 2837, Laguna

        Hills, CA 92654-0837

        [write for current catalog].


  The following is a quality, scholarly journal. Libraries may carry

  it if they are into high technology or computer science.


     Cryptologia:  a cryptology journal, quarterly since Jan 1977.

        Cryptologia; Rose-Hulman Institute of Technology; Terre Haute

        Indiana 47803 [general: systems, analysis, history, ...]


  Thanks to

     cme@ellisun.sw.stratus.com (Carl Ellison)

     Gwyn@BRL.MIL (Doug Gwyn)

     smb@ulysses.att.com (Steven Bellovin)

  for assembling this list of books in bibliography form. I knew

  of each here, but getting all the details is a lot of work.

  Thanks again.

------------------------]

   Larry W. Loen        | My Opinions are decidedly my own,so please

                        | do not attribute them to my employer

------------------------]

=====================================================================

     EFFector Online is published by

     The Electronic Frontier Foundation

     155 Second Street, Cambridge MA 02141

     Phone: +1 617 864 0665 FAX: +1 617 864 0866

     Internet Address: eff@eff.org

 Reproduction of this publication in electronic media is encouraged.

 Signed articles do not necessarily represent the view of the EFF.

 To reproduce signed articles individually, please contact the authors

 for their express permission. EFFector Online is edited by Gerard

 Van der Leun (van@eff.org) and Rita Rouvalis (rita@eff.org).

=====================================================================


Comments

Popular posts from this blog

BOTTOM LIVE script

Fawlty Towers script for "A Touch of Class"