Archie Manual

 ARCHIE(1L)        MISC. REFERENCE MANUAL PAGES         ARCHIE(1L)



NAME

     archie - an Internet archive server listing service


SYNOPSIS

     archie


DESCRIPTION

     The archie system is a program which can  query  a  database

     maintained  by  the  Computer  Science  Department of McGill

     University.  The database contains a list of software  which

     is available by means of anonymous ftp(1) to hosts connected

     to the Internet network.


     The system can be accessed in an interactive fashion or  via

     electronic mail (email). In order to use the interactive

     system:


     1)   Connect to  host  quiche.cs.mcgill.ca  (132.206.2.3  or

          132.206.51.1) with telnet(1).


     2)   Login  as  user  archie  (no  capitals,   no   password

          required).   The  system  prints  a  banner message and

          status report.


     3)   Type ``help'' for further information.


     In order to use the email interface, send requests to


               archie@cs.mcgill.ca


     Send the word ``help'' in a message for  available  commands

     and  features.  Please note that this is an automated inter-

     face: no human sees it. See "THE  EMAIL  INTERFACE"  section

     below.


     Comments and suggestions should be sent to


               archie-l@cs.mcgill.ca


     Administrative requests such as adding a site to the database

     or  modifying  the  Software  Description Database should be

     sent to


               archie-admin@cs.mcgill.ca


THE INTERACTIVE INTERFACE

     Variables


     archie has a number of variables which modify its  behavior.

     The  values  of these variables may be changed using the set

     command.  archie distinguishes between three types of  vari-

     able:


     boolean

          which may be either set or unset.


     numeric

          representing an integer within a pre-determined range.


     string

          whose value is a string of characters (which may or may

          not be restricted).


     The following variables are currently recognized:


     autologout


          By default, archie will exit after  one  hour  of  idle

          time.  This value can be changed through this variable,

          which represents, in minutes, the length of  idle  time

          before you are automatically logged out.


          The  minimum  and  maximum  values  are  1   and   300,

          representing one minute through five hours.


          Example:


             set autologout 45


          will cause you to be automatically logged out after  45

          minutes of idle time.



     mailto


          A string variable whose value is  a  mail  address,  or

          comma-separated list of addresses. Note that there must

          not be any spaces within the list of addresses. If this

          is  set  and  the  mail command is issued with no argu-

          ments, then the output of the last command is mailed to

          that address.


          Example:


             set mailto user@frobozz.com


          Example:


             set mailto user1@hello.edu,user2@goodbye.com


          All the various Internet addressing styles  are  under-

          stood. BITNET sites should use the convention


             user@sitename.bitnet


          UUCP addresses can be specified as


              user@sitename.uucp


     maxhits


          A numeric variable whose value is the maximum number of

          matches you want the prog command to generate.


          If archie seems to be slow, or you don't want a lot  of

          output  this  can be set to a small value.  ``maxhits''

          must be within the range 0 to 1000.  The default  value

          is 1000.


          Example:


             set maxhits 100


          prog will now stop after 100 matches have been found.


     pager


          A boolean variable which, when  set,  tells  archie  to

          filter  all  output  through  the pager less(1L).  When

          using the pager you may also want to set the term vari-

          able to your terminal type (see term variable).


          Example:


             set pager


     search


          This variable determines the kind of  search  performed

          on the database by the prog command, providing flexibility

          in search times and types.



          search is a string variable whose value is one  of  the

          following:


          sub


               Substring (case insensitive). A  simple,  everyday

               substring  search.  A  match occurs if  the  file

               (or directory) name in the database  contains  the

               user-given substring.


               Example:


               The pattern ``is'' will match ``islington'',

               ``this'' and ``poison''.


          subcase


               Substring (case sensitive). As above but the  case

               of the strings involved becomes significant.


               Example:


                   ``TeX'' will match ``LaTeX'' but not ``Latex''

               or ``TExTroff''.


          exact


               Exact match. The fastest  search  method  of  all.

               The restriction is that the user string (the argu-

               ment to the prog command)  has  to  exactly  match

               (including  case) the string in the database. This

               is provided for those of you who know  just  what

               you are looking for.


               For example, if you wanted to know where  all  the

               ``xlock.tar.Z''  files  were,  this is the kind of

               search to use.


          regex


               This is the default search method.   Searches  the

               database  with  the  user (search) string which is

               given in the form of an ed(1) regular expression.


               NOTE: Unless specifically anchored to  the  begin-

               ning  (with  ^)  or  end (with $) of a line, ed(1)

               regular  expressions  (effectively)  have   ``.*''

               prepended and appended to them. For example, it is

               not necessary to say


                    prog .*xnlock.*


               since


                    prog xnlock


               will suffice. Thus the regex match becomes a  sim-

               ple substring match.
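
The four search types described above can be sketched in Python as follows. This is only an illustration, not archie's implementation: the function name name_matches is hypothetical, and Python's re module stands in for ed(1) regular expressions, whose syntax differs in places (for example in how intervals and word boundaries are written).

```python
import re


def name_matches(pattern: str, name: str, search: str = "regex") -> bool:
    """Return True if `name` matches `pattern` under the given search type."""
    if search == "sub":        # substring, case insensitive
        return pattern.lower() in name.lower()
    if search == "subcase":    # substring, case sensitive
        return pattern in name
    if search == "exact":      # whole name must be identical, including case
        return pattern == name
    if search == "regex":      # unanchored regular expression (Python syntax here)
        return re.search(pattern, name) is not None
    raise ValueError(f"unknown search type: {search}")


print(name_matches("is", "Islington", "sub"))
print(name_matches("TeX", "Latex", "subcase"))
print(name_matches("xnlock", "xnlock.tar.Z"))
```

Because re.search is unanchored, the regex branch behaves like a substring match unless the pattern uses ^ or $, which mirrors the note above.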


     sortby


          This variable describes how the output  from  the  prog

          command  is  to be ordered. It can have one of 5 values

          (and their associated reverse orders). For each method,

          the  ``natural''  sort order (or at least, what we con-

          sider to be the natural order) is the default.


          hostname


               Output is sorted on the archive hostname in  lexi-

               cal order.


               Reverse order rhostname


          time


               Output is sorted with the most recent modification

               times  of  the  found  file/directory names coming

               first (youngest -> oldest).


               Reverse order rtime


          size


               Output  is  sorted  by  the  size  of  the   found

               files/directories, largest first.


               Reverse order rsize


          filename


               Sorted in file/directory name lexical order.


               Reverse order rfilename


          none


               This is the DEFAULT order.


               Unsorted. There is no reverse order although rnone

               is accepted for symmetry.


          Typing the keyboard interrupt character (Ctrl-C for

          most  people  on  UNIX)  during a search will cause the

          search to be aborted. The results up to that time will be

          sorted (determined by the value of the sortby variable)

          and the results output. The output phase may itself  be

          aborted by typing the abort character a second time.
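
As a rough illustration of the sortby values, the sketch below orders a list of search hits in Python. The Hit record and the sort_hits function are hypothetical names chosen for this example; archie's internal data layout is not documented here.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    # Hypothetical record for one match; not archie's actual layout.
    hostname: str
    filename: str
    size: int
    mtime: int  # modification time, seconds since the epoch


# For each method: the sort key and whether the "natural" order is descending.
_ORDERS = {
    "hostname": (lambda h: h.hostname, False),
    "filename": (lambda h: h.filename, False),
    "time": (lambda h: h.mtime, True),   # most recent first
    "size": (lambda h: h.size, True),    # largest first
}


def sort_hits(hits, sortby="none"):
    """Return hits ordered according to a sortby value such as 'time' or 'rsize'."""
    if sortby in ("none", "rnone"):
        return list(hits)                # unsorted; rnone accepted for symmetry
    reverse_requested = sortby not in _ORDERS and sortby.startswith("r")
    name = sortby[1:] if reverse_requested else sortby
    if name not in _ORDERS:
        raise ValueError(f"unknown sortby value: {sortby}")
    key, descending = _ORDERS[name]
    return sorted(hits, key=key, reverse=(descending != reverse_requested))
```

For example, sort_hits(hits, "rtime") returns the oldest entries first.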


     status


          This boolean variable  determines  if  the  status-line

          will  be  displayed while the prog command is searching

          through the database. If  set  (which  is  the  default

          value) then the number of matches and percentage of the

          database searched is displayed. Otherwise no output  is

          given until the search is complete.


     term


          This variable tells archie what type of  terminal  you

          are using, and optionally its size in rows and columns.

          This information is used by the pager.


          The usage is:


             set term <terminal-type> [<#rows> [<#columns>]]


          That is, the terminal type is required, but the  number

          of  rows  and  columns  is optional.  You may specify a

          value for rows only, but if  you  want  to  change  the

          number  of  columns you must give a value for both rows

          and columns.  The default values for rows  and  columns

          are 24 and 80.


          Examples:


             set term vt100


             set term xterm 60


             set term xterm 24 100




     Regular Expressions


          archie uses ed(1) regular expressions in  a  number  of

          commands.


          A regular expression, on the one hand, is a string like

          any  other;  a  sequence  of  characters.  On the other

          hand, special characters within the string have certain

          functions  which  make  regular expressions useful when

          trying to match portions of other strings.  In the fol-

          lowing  discussion  and examples, a string containing a

          regular expression will be called the ``pattern'',  and

          the  string against which it is to be matched is called

          the ``reference string''.


          Regular expressions  allow  one  to  search  for  ``all

          strings ending with the letters ize'' or ``all strings

          beginning with a number between 1 and 3 and ending in a

          comma''.


          In order to accomplish this, regular expressions co-opt

          the  use  of  some  characters to have special meaning.

          They also provide for these characters  to  lose  their

          special  meaning  if the user so desires. The rules for

          regular expressions are:



     c    Any character c  matches  itself  unless  it  has  been

          assigned  other  special  meaning as listed below. Most

          special characters can be escaped (made to lose their

          special meaning) by placing the character '\' in front

          of it. This doesn't apply to '{' which  is  non-special

          until  it  is  escaped.  Thus although '*' normally has

          special meaning the string '\*' matches itself.


          Example:


          The pattern


               acdef


          matches


               s83acdeffff or acdefsecs


          but not


               accdef or aacde1f


          That is, it will match any string that contains ``acdef''

          anywhere in the reference string.


          Example:


               Normally the characters '*'  and '$' are  special,

          but the pattern


               a\*bse\$


          acts as above. That is, any reference string containing

          ``a*bse$'' as a substring will be flagged as a match.




     .     A period matches  any  character  except  the  newline

          character. This is known as the wildcard character.


          Example:


               The pattern


                ....


          will match any 4 characters in  the  reference  string,

          except a newline character.



     ^    If `^' appears at the beginning of the pattern then  it

          is said to ``anchor'' the match to the beginning of the

          line. That is, the reference string must start with the

          pattern  following  the  `^'. If this character appears

          anywhere other than at the beginning of the pattern,

          then  it  is  no longer considered special, and matches

          itself as any non-special character would. Similarly if

          it starts a string but is escaped, it matches itself.


          Example:


          The pattern


               ^efghi


          Will match


               efghi or efghijlk


          but not


               abcefghi


          That is the pattern will  match  only  those  reference

          strings  starting  with  ``efghi''. Just containing the

          substring is not sufficient.



     $     Occurring at the end of the  pattern,  this  character

          ``anchors'' the pattern to the end of the line (reference

          string). A '$' occurring anywhere else in the pattern is

          regarded as non-special. Similarly, if it is

          at the end of the pattern but is escaped,  it  is  non-

          special.


          Example:


          The pattern


               efghi$


          Will match


               efghi or abcdefghi


          but not


               efghijkl


          That is the pattern will  match  only  those  reference

          strings ending with ``efghi''. Just containing the sub-

          string is not sufficient.
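
The anchoring examples above can be checked with Python's re module, used here only as a stand-in for ed(1). One difference to keep in mind: Python treats ^ and $ as special wherever they appear in a pattern, whereas the rules above make them ordinary characters away from the start or end. The helper name matches is hypothetical.

```python
import re


def matches(pattern: str, reference: str) -> bool:
    """True if the pattern matches anywhere in the reference string."""
    return re.search(pattern, reference) is not None


for pattern, reference in [
    ("^efghi", "efghijlk"),   # starts with efghi
    ("^efghi", "abcefghi"),   # contains efghi, but not at the start
    ("efghi$", "abcdefghi"),  # ends with efghi
    ("efghi$", "efghijkl"),   # contains efghi, but not at the end
]:
    print(pattern, reference, matches(pattern, reference))
```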



     \<    This sequence in the pattern causes the one  character

          regular expression following it only to match something

          at the beginning of a word: the beginning of a line  or

          just  before a letter, digit or underline character, or

          just after a character which is not one of these.


          Example:


               The pattern


               \<abc


          would match the last 'abc' in the reference string


               @hijabc#+abc


          but not the first since the first 'abc' did  not  start

          on a ``word'' boundary.



     \>    Constrains the one-character regular  expression  fol-

          lowing  it  to  be  at the end of a ``word'' as defined

          above.
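
Python's re module has no \< or \> operators; its nearest equivalent is \b, which matches at either edge of a word. Under that assumption, the word-boundary example above can be reproduced as a sketch (the function name find_start is hypothetical):

```python
import re


def find_start(pattern: str, reference: str) -> int:
    """Return the index where the pattern first matches, or -1 if it does not."""
    found = re.search(pattern, reference)
    return found.start() if found else -1


# \babc approximates the ed(1) pattern \<abc
print(find_start(r"\babc", "@hijabc#+abc"))

# abc\b approximates a pattern constrained to the end of a word
print(find_start(r"abc\b", "abcdef abc"))
```

In the first call the match is the second 'abc' (after '+'), since the first one follows the letter 'j' and so does not start a word.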



     [string]


          One or more characters within  square  brackets.   This

          pattern  matches any single character within the brack-

          ets. The caret, '^', has a special meaning if it is the

          first  character  in the series: the pattern will match

          any character other than one in the list.


          Example:


               The pattern


               [^abc]


          Will match any character except 'a', 'b' or 'c'.


          To match a right bracket, ']', in the list it  must  be

          put first:


               []ab01]


          For a caret, '^', in the list it  can  appear  anywhere

          but first.


          In


               [ab^01]


          the caret loses its special meaning.



          The '-' character is special within square brackets. It

          is  interpreted  as a range of characters (in the ASCII

          character set) and  will  match  any  single  character

          within  that  range.   '[a-z]'  matches  any lower case

          letter. The '-' can be made non special by  placing  it

          first or last within the square brackets.



          The characters '$', '*' and '.' are not special  within

          square brackets.



          Example:


               The pattern


               [ab01]


          matches a single occurrence of a character from the set

          'a', 'b', '0', '1'.


          Example:


               The pattern


               [^ab01]


          will match any single character other  than  'a',  'b',

          '0', '1'.



          Example :


               The pattern


               [a0-9b]


          matches one of 'a', 'b' or a digit between  0  and

          9 inclusive.


          Example :


               The pattern


               [^a0-9b.$]



          means any single character other than 'a', 'b', '.', '$' or a

          digit between 0 and 9 inclusive.
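
The bracket rules above behave the same way in Python's re module for these particular examples, so they can be tried out with a small helper. The name in_class is hypothetical, and this is a sketch rather than a statement about every ed(1) bracket feature.

```python
import re


def in_class(char_class: str, ch: str) -> bool:
    """True if the single character ch is matched by the bracket expression."""
    return re.fullmatch(char_class, ch) is not None


print(in_class("[ab01]", "a"))        # member of the set
print(in_class("[^ab01]", "c"))       # anything except a, b, 0, 1
print(in_class("[a0-9b]", "7"))       # range of digits
print(in_class("[^a0-9b.$]", "."))    # '.' and '$' are literal inside brackets
print(in_class("[]ab01]", "]"))       # ']' placed first is literal
print(in_class("[ab^01]", "^"))       # '^' not first is literal
```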


     *     An asterisk following a regular expression in the pat-

          tern   has   the   effect  of  matching  zero  or  more

          occurrences of that expression.


          Example:


               The pattern


               a*


          means zero or more occurrences of the character 'a'.



          Example:


               The pattern


               [A-Z]*


          means zero or more occurrences of any upper-case

          letter.





     \{m\}


     \{m,\}


     \{m,n\}


          A one-character regular expression followed by  one  of

          these three constructions  causes  a  range  of

          occurrences of that regular expression to  be  matched.

          If  it  is  followed by \{m\} where m is a non-negative

          integer between 0 and 255 (inclusive), then  exactly  m

          occurrences  of that regular expression are matched. If

          followed by \{m,\}, then at  least  m  occurrences  are

          matched.   Finally, if it is followed by \{m,n\} (where

          n is a non-negative integer between 0 and 255 and where

          n > m), then between m and n occurrences of the expres-

          sion are matched.


          Example:


               The pattern


               ab\{3\}


          would match any substring in the reference string of an

          'a' followed by exactly 3 'b's.


          Example:


               The pattern


               ab\{3,\}


          would match any substring in the reference string of an

          'a' followed by at least 3 'b's.



          Example:


               The pattern


               ab\{3,5\}


          would match any substring in the reference string of an

          'a' followed by at least 3 but at most 5 'b's.
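
Python writes intervals as {m,n} without the backslashes used by ed(1). The sketch below converts the three forms above with a naive text replacement (the helper to_python_intervals is hypothetical and does not handle other syntax differences) and tests them against whole strings.

```python
import re


def to_python_intervals(ed_pattern: str) -> str:
    """Rewrite ed-style \\{m,n\\} intervals as Python-style {m,n}."""
    return ed_pattern.replace(r"\{", "{").replace(r"\}", "}")


def whole_match(ed_pattern: str, reference: str) -> bool:
    """True if the entire reference string matches the converted pattern."""
    return re.fullmatch(to_python_intervals(ed_pattern), reference) is not None


print(to_python_intervals(r"ab\{3,5\}"))
print(whole_match(r"ab\{3\}", "abbb"))
print(whole_match(r"ab\{3,\}", "abbbbbbb"))
print(whole_match(r"ab\{3,5\}", "abbbbbb"))
```

Note that in an unanchored search, ab\{3\} still matches inside a longer run such as ``abbbb'', because the first three 'b's satisfy it; fullmatch is used here to make the counts visible.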



          Common Problems with Regular Expressions



     (1)  When matching a substring it is not  necessary  to  use

          the wildcard character to match the parts of the reference

          string preceding and following the substring.


          Example:


               The pattern


               abcd


          will match any reference string  containing  this  pat-

          tern. It is not necessary to use


                .*abcd.*


          as the pattern.



     (2)  In order to constrain a pattern to the entire reference

          string, use the construction:


               ^pattern$



     (3)  The easiest way to obtain case insensitivity in a regular

          expression is to use the '[]' operator. For example,

          a pattern to match the word ``hello'' regardless of

          the case of the letters would be:


               [Hh][Ee][Ll][Ll][Oo]
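
Writing such a pattern by hand is tedious for longer words. A small helper, sketched here in Python under the hypothetical name case_insensitive_pattern, can generate it:

```python
import re


def case_insensitive_pattern(word: str) -> str:
    """Build a pattern such as [Hh][Ee][Ll][Ll][Oo] from a plain word."""
    parts = []
    for ch in word:
        if ch.isascii() and ch.isalpha():
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            # Escape anything else so it matches itself literally.
            parts.append(re.escape(ch))
    return "".join(parts)


print(case_insensitive_pattern("hello"))
```

The result can be passed to prog when the search variable is set to regex.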



     Commands


          Arguments to commands shown  here  in  square  brackets

          '[]' are optional. All others are mandatory.


     help

          List the valid archie commands.


     list [pattern]

          This command provides a list  of  the  sites  currently

          stored  in the database and the time at which they were

          last updated.  There is an optional regular  expression

          argument to limit the list to specific sites.


          Note that the numerical (IP) address associated with  a

          site  name  is valid at the listed time, but since they

          do  occasionally  change,  it  is   possible   that   a

          discrepancy may occur until that site is updated in our

          database. Furthermore, the listed  IP  address  is  the

          primary,  as  listed  in  the  DNS  database: secondary

          addresses are not stored.


          Example:


               list


          will list all sites in the database, while


               list \.de$


          lists all German sites.
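
The effect of the optional pattern can be pictured as a filter over site names. The following Python sketch assumes a plain list of host names; list_sites and the host ftp.example.de are made up for illustration, while the other two hosts appear elsewhere in this manual.

```python
import re


def list_sites(sites, pattern=None):
    """Return all sites, or only those whose name matches the pattern."""
    if pattern is None:
        return list(sites)
    regex = re.compile(pattern)
    return [site for site in sites if regex.search(site)]


sites = ["quiche.cs.mcgill.ca", "col.hp.com", "ftp.example.de"]
print(list_sites(sites))
print(list_sites(sites, r"\.de$"))
```

The backslash makes the period literal and the $ anchors the match to the end of the name, so a host such as de.example.com would not be listed.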


     mail [address1,[address2...]]

          With an argument (or arguments) the output of the  last

          command  is  mailed  to the specified address or comma-

          separated list of addresses.   No  spaces  may   appear

          anywhere in the address list.


          Example:


               mail user1@hello.edu,user2@goodbye.com


          Without an argument the output of the last  command  is

          sent to the address specified in the mailto variable.


          Example:


               mail


          All the various Internet addressing styles  are  under-

          stood. BITNET sites should use the convention


               user@sitename.bitnet


          UUCP addresses can be specified as


               user@sitename.uucp


     prog pattern

          Find all occurrences of programs  with  names  matching

          pattern.  How  pattern  is  interpreted  depends on the

          value of the search variable.   The  output  lists  the

          names  of  hosts with matching entries, the size of the

          matching program, its last modification  date  and  its

          path.


          The results are sorted according to the  value  of  the

          sortby  variable, and are limited in number by the max-

          hits variable.


     set variable-name

          This command allows you to set one  of  archie's  vari-

          ables.   Their  values affect how archie interacts with

          the user.


          boolean variables are either set or unset


          Example:


               set pager


          numeric variables take a number within a certain range


          Example:


               set maxhits 500


          string variables take a  (possibly  restricted)  string

          value


          Example:


               set sortby time



          See entries on unset and show.




     show [variable-name]

          This command is  used to display the value of a partic-

          ular  variable,  or  all variables. With an argument it

          will display the value of  that  variable,  without  an

          argument it will display the value of all variables.


          Example:


             show maxhits


     site sitename

          This command allows you to get a  full  listing  of  an

          ftp(1)  site in the archie database.  The output format

          is similar to that of UNIX ls(1) long  recursive  (-lR)

          listing.


          Example:


             site col.hp.com


     unset variable

          This causes the specified variable to  have  no  value.

          This  means that it will not be used by archie until it

          has been given a value with the set command.


          Note: this may cause ``counter-intuitive'' behaviour in

          some  cases  (e.g.  in the case of maxhits ).  Although

          one might expect prog to print matches  without  regard

          for  any  limit, this is not the case.  If the value of

          maxhits is not available it will merely  fall  back  to

          some internal default.
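
The interplay of set, unset and the fallback behaviour can be sketched as below. This is a simplified Python model covering only the two numeric variables whose ranges and defaults are stated in this manual. The class name Variables is hypothetical, and the sketch assumes that the internal fallback equals the documented default, which the manual does not confirm.

```python
# Defaults and ranges as documented for autologout and maxhits.
DEFAULTS = {"autologout": 60, "maxhits": 1000}
RANGES = {"autologout": (1, 300), "maxhits": (0, 1000)}


class Variables:
    """Toy model of archie's numeric variables (not the real implementation)."""

    def __init__(self):
        self.values = dict(DEFAULTS)

    def set(self, name, value):
        low, high = RANGES[name]
        number = int(value)
        if not low <= number <= high:
            raise ValueError(f"{name} must be within the range {low} to {high}")
        self.values[name] = number

    def unset(self, name):
        self.values.pop(name, None)

    def effective(self, name):
        # Assumption: an unset variable falls back to the documented default.
        return self.values.get(name, DEFAULTS[name])


v = Variables()
v.set("maxhits", "100")
v.unset("maxhits")
print(v.effective("maxhits"))
```

After the unset, prog would still be limited; it simply uses the fallback value rather than running without a limit.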


     whatis substring

          This  command searches the archie Software  Description

          Database  for  the  given  substring,  with  case being

          ignored. This database  consists  of  names  and  short

          descriptions  of  many  of the software packages, docu-

          ments (like RFCs and  educational  material)  and  data

          files that are stored on the Internet.


          Example:


             whatis uucp


          in part gives as a result:


               findpath.sh             UUCP Pathfinder

               logfile-stats           UUCP LOGFILE analyzer

               mapstats                UUCP map statistics program


          We welcome and encourage additions and  corrections  to

          this  database  and depend on the archie user community

          to keep it up to date. To make your contribution to this

          database, mail to



                    archie-admin@cs.mcgill.ca


          For new additions, please keep the  description  to  25

          words or less.



THE EMAIL INTERFACE

     The archie email interface currently accepts a limited  sub-

     set of the interactive interface commands, plus a few of its

     own. Currently variables are  not  supported  in  the  email

     interface.



     Requests to this server should be addressed to


                    archie@cs.mcgill.ca


     Note that the ``Subject:'' line in  incoming  mail  is  pro-

     cessed  as if it were part of the main message body. No spe-

     cial keywords are required.


     Note that the help command is exclusive. All other  commands

     in the same message are ignored.


     The server recognizes the following commands. If  a  message

     not  containing  any  valid  requests or an empty message is

     received, it will be considered to be a help request.



     path path

          This lets the requestor override the address that would

          normally  be  extracted from the header.  If you do not

          hear from the archive server within a couple of hours, you

          might  consider  adding a path command to your request.

          The  path  describes  how  to  mail  a   message   from

          cs.mcgill.ca  to  your  address.  cs.mcgill.ca is fully

          connected to the Internet.



          BITNET users can use the convention


               user@site.bitnet


          UUCP users can use the convention


               user@site.uucp



     help Will send you a message describing how to use the email

          interface (basically this section).



     prog <reg exp1> [<reg exp2> ...]


          A search of the archie database is performed with  each

          <reg exp> (a regular expression as defined by ed(1)) in

          turn, and any matches found are returned to the reques-

          tor.  Note that multiple <reg exp> may be placed on one

          line, in which case the results will be mailed back  to

          you  in  one message.  If you have multiple prog lines,

          then multiple messages will be returned, one  for  each

          line  [This  doesn't  work as expected at the moment...

          stay tuned].


          Any regular expression containing spaces must be quoted

          with  single  (') or double (") quotes. ALL OTHER ed(1)

          rules must be followed.


          NOTE: The searches are CASE SENSITIVE. The  ability  to

          change this will hopefully be added soon.


          The prog command is currently executed as if the search

          variable were set to regex.



     site <site name> | <site IP address>


          A listing of the given <site name>  will  be  returned.

          The  fully  qualified  domain name or IP address may be

          used.



     compress


          ALL of your files in the current mail message will be run

          through  compress(1)  and uuencode(1). When you receive

          the reply, remove everything before the ``begin''  line

          and run it through uudecode(1).  This will produce a .Z

          file. You can then run uncompress(1) on this  file  and

          get the results of your request.
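
Where uudecode(1) is not at hand, the first half of this procedure can be approximated in Python with the standard binascii module. The sketch below drops everything before the ``begin'' line and decodes the body; the function name uudecode_reply is hypothetical, and the resulting bytes are still in compress(1) format, so uncompress(1) is needed afterwards.

```python
import binascii


def uudecode_reply(text: str) -> bytes:
    """Decode the uuencoded part of a reply, ignoring any text before 'begin'."""
    lines = text.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.startswith("begin ")), None
    )
    if start is None:
        raise ValueError("no 'begin' line found in reply")
    decoded = bytearray()
    for line in lines[start + 1:]:
        stripped = line.strip()
        if stripped == "end":
            break
        if stripped in ("", "`"):
            continue  # zero-length terminator lines carry no data
        decoded += binascii.a2b_uu(line.encode("ascii"))
    return bytes(decoded)
```

Typical use would be to write the returned bytes to a file ending in .Z and then run uncompress(1) on it.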




     quit Nothing past this point is interpreted.  This  is  pro-

          vided  so that the occasional lost soul whose signature

          contains a line that looks like a command can still use

          the server without getting a bogus response.
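
Putting the email commands together, a request message might be assembled as in the Python sketch below. The function build_request and its parameters are hypothetical, and the order of the command lines is an assumption, since the manual does not say whether order matters. The Subject header is deliberately left unset because the server treats it as part of the message body.

```python
from email.message import EmailMessage

ARCHIE_ADDRESS = "archie@cs.mcgill.ca"  # address given in this manual


def build_request(patterns, reply_path=None, compress=False):
    """Build an email request containing one prog line and a closing quit."""
    lines = []
    if reply_path:
        lines.append(f"path {reply_path}")
    if compress:
        lines.append("compress")
    # Expressions containing spaces must be quoted.
    quoted = [f'"{p}"' if " " in p else p for p in patterns]
    lines.append("prog " + " ".join(quoted))
    lines.append("quit")  # nothing after this line (e.g. a signature) is read

    msg = EmailMessage()
    msg["To"] = ARCHIE_ADDRESS
    msg.set_content("\n".join(lines) + "\n")
    return msg


print(build_request(["xnlock"], reply_path="user@site.bitnet").get_content())
```

Sending the message is left to whatever mail facility is available locally.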




THE ARCHIE DATABASE

     The archie database subsystem maintains a list of about  600

     Internet  ftp(1)  archive  sites.   Each night, the database

     subsystem executes an anonymous ftp(1) to a subset of  these

     sites  and  fetches a recursive directory listing (or a file

     containing the recursive directory listing if this  exists).


     Currently,  each  site  gets  updated  approximately  once a

     month.    The   directory    listings    are    stored    on

     quiche.cs.mcgill.ca  (132.206.2.3), where they are available

     to the Internet community via anonymous ftp(1).  They appear

     in the directory ~ftp/archie/listings in compressed form.


BUGS

     1)   Only UNIX sites are included in the database.


     2)   The user can not limit searches to specific sites.


     3)   There is no graphical user interface.


     4)   There is no way to abort the help facility completely.


     It is hoped that all these will change in coming versions.



LONG TERM PLANS

     The archie system is regarded as  being  ``in  development''

     and  is not being released to outside sites at present.  The

     current database requires about 70 MB of disk  storage,  and

     the  updates  and  searches put a noticeable load on the Sun

     4/280 on which it is operating.  Eventually, we hope to distribute

     archie to several sites around the world.


     We welcome comments and suggestions;  please  send  them  to

     archie-l@cs.mcgill.ca.


SEE ALSO

     ftp(1), telnet(1)


AUTHORS

     Alan Emtage (bajan@cs.mcgill.ca), McGill University.


     Bill Heelan (wheelan@cs.mcgill.ca), McGill University.



     Manual page by R. P. C. Rodgers, UCSF  School  of  Pharmacy,

     San           Francisco,           California          94143

     (rodgers@maxwell.mmwb.ucsf.edu) and Alan Emtage.
