Archie Manual
ARCHIE(1L) MISC. REFERENCE MANUAL PAGES ARCHIE(1L)
NAME
archie - an Internet archive server listing service
SYNOPSIS
archie
DESCRIPTION
The archie system is a program which can query a database
maintained by the Computer Science Department of McGill
University. The database contains a list of software which
is available via anonymous ftp(1) from hosts connected
to the Internet.
The system can be accessed interactively or via
electronic mail (email). To use the interactive
system:
1) Connect to host quiche.cs.mcgill.ca (132.206.2.3 or
132.206.51.1) with telnet(1).
2) Login as user archie (no capitals, no password
required). The system prints a banner message and
status report.
3) Type ``help'' for further information.
In order to use the email interface, send requests to
archie@cs.mcgill.ca
Send the word ``help'' in a message for available commands
and features. Please note that this is an automated inter-
face: no human sees it. See "THE EMAIL INTERFACE" section
below.
Comments and suggestions should be sent to
archie-l@cs.mcgill.ca
Administrative requests, such as adding a site to the database
or modifying the Software Description Database, should be
sent to
archie-admin@cs.mcgill.ca
THE INTERACTIVE INTERFACE
Variables
archie has a number of variables which modify its behavior.
The values of these variables may be changed using the set
command. archie distinguishes between three types of vari-
able:
boolean
which may be either set or unset.
numeric
representing an integer within a pre-determined range.
string
whose value is a string of characters (which may or may
not be restricted).
The following variables are currently recognized:
autologout
By default, archie will exit after one hour of idle
time. This value can be changed through this variable,
which gives, in minutes, the length of idle time
before you are automatically logged out.
The minimum and maximum values are 1 and 300,
representing one minute through five hours.
Example:
set autologout 45
will cause you to be automatically logged out after 45
minutes of idle time.
mailto
A string variable whose value is a mail address, or
comma-separated list of addresses. Note that there must
not be any spaces within the list of addresses. If this
is set and the mail command is issued with no argu-
ments, then the output of the last command is mailed to
that address.
Example:
set mailto user@frobozz.com
Example:
set mailto user1@hello.edu,user2@goodbye.com
All the various Internet addressing styles are under-
stood. BITNET sites should use the convention
user@sitename.bitnet
UUCP addresses can be specified as
user@sitename.uucp
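The rule that a mailto value is a comma-separated list with no
embedded spaces can be illustrated with a short Python sketch. The
function parse_mailto is hypothetical and is not part of archie; it
only checks the list syntax described above, not the addresses
themselves.

```python
def parse_mailto(value: str) -> list[str]:
    """Split a mailto value into its addresses (hypothetical helper).

    The whole list must be free of spaces, and empty entries such as
    a trailing comma are rejected. The addresses are not validated.
    """
    if any(ch.isspace() for ch in value):
        raise ValueError("the address list must not contain spaces")
    addresses = value.split(",")
    if "" in addresses:
        raise ValueError("empty address in list")
    return addresses


print(parse_mailto("user1@hello.edu,user2@goodbye.com"))
```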
maxhits
A numeric variable whose value is the maximum number of
matches you want the prog command to generate.
If archie seems to be slow, or you don't want a lot of
output, this can be set to a small value. ``maxhits''
must be within the range 0 to 1000. The default value
is 1000.
Example:
set maxhits 100
prog will now stop after 100 matches have been found.
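The documented ranges for the two numeric variables can be summarized
in a short Python sketch. The table NUMERIC_RANGES and the function
check_numeric are hypothetical names used for illustration; archie's
own validation is not described in this manual.

```python
# Ranges taken from the autologout and maxhits descriptions above.
# This table and function are illustrative, not archie's own code.
NUMERIC_RANGES = {
    "autologout": (1, 300),   # minutes of idle time before logout
    "maxhits": (0, 1000),     # maximum number of matches from prog
}


def check_numeric(name: str, value: int) -> int:
    """Return value if it lies within the documented range for name."""
    low, high = NUMERIC_RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}")
    return value


print(check_numeric("autologout", 45))  # 45
print(check_numeric("maxhits", 100))    # 100
```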
pager
A boolean variable which, when set, tells archie to
filter all output through the pager less(1L). When
using the pager you may also want to set the term vari-
able to your terminal type (see term variable).
Example:
set pager
search
This variable determines the kind of search performed
on the database by the prog command, providing flexi-
bility in search times and types.
search is a string variable whose value is one of the
following:
sub
Substring (case insensitive). A simple, everyday
substring search. A match occurs if the file
(or directory) name in the database contains the
user-given substring.
Example:
The pattern ``is'' will match ``islington''
and ``this'' and ``poison''
subcase
Substring (case sensitive). As above but the case
of the strings involved becomes significant.
Example:
``TeX'' will match ``LaTeX'' but not ``Latex''
or ``TExTroff''.
exact
Exact match. The fastest search method of all.
The restriction is that the user string (the argu-
ment to the prog command) has to exactly match
(including case) the string in the database. This
is provided for those of you who know exactly what
you are looking for.
For example, if you wanted to know where all the
``xlock.tar.Z'' files were, this is the kind of
search to use.
regex
This is the default search method. Searches the
database with the user (search) string which is
given in the form of an ed(1) regular expression.
NOTE: Unless specifically anchored to the begin-
ning (with ^) or end (with $) of a line, ed(1)
regular expressions (effectively) have ``.*''
prepended and appended to them. For example, it is
not necessary to say
prog .*xnlock.*
since
prog xnlock
will suffice. Thus the regex match becomes a sim-
ple substring match.
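As a rough illustration of how the four search methods differ, the
following Python sketch decides whether a single database name
matches. The function matches is hypothetical, and Python's re module
stands in for ed(1) regular expressions, whose syntax differs in
places (for example, intervals).

```python
import re


def matches(mode: str, pattern: str, name: str) -> bool:
    """Decide whether a database name matches under a search mode.

    Hypothetical sketch: Python's re module stands in for ed(1)
    regular expressions, which differ in some syntax.
    """
    if mode == "sub":        # substring, case insensitive
        return pattern.lower() in name.lower()
    if mode == "subcase":    # substring, case sensitive
        return pattern in name
    if mode == "exact":      # the whole name, including case
        return pattern == name
    if mode == "regex":      # unanchored search: implicit .* on both ends
        return re.search(pattern, name) is not None
    raise ValueError(f"unknown search mode: {mode}")


print(matches("subcase", "TeX", "LaTeX"))      # True
print(matches("subcase", "TeX", "Latex"))      # False
print(matches("regex", "xnlock", "xnlock.c"))  # True
```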
sortby
This variable describes how the output from the prog
command is to be ordered. It can have one of 5 values
(and their associated reverse orders). For each method,
the ``natural'' sort order (or at least, what we con-
sider to be the natural order) is the default.
hostname
Output is sorted on the archive hostname in lexi-
cal order.
Reverse order rhostname
time
Output is sorted with the most recent modification
times of the found file/directory names coming
first (youngest -> oldest).
Reverse order rtime
size
Output is sorted by the size of the found
files/directories, largest first.
Reverse order rsize
filename
Sorted in file/directory name lexical order.
Reverse order rfilename
none
This is the DEFAULT order.
Unsorted. There is no reverse order although rnone
is accepted for symmetry.
Typing the keyboard interrupt character (Ctrl-C for
most people on UNIX) during a search will cause the
search to be aborted. The results found up to that time
will be sorted (according to the value of the sortby
variable) and output. The output phase may itself be
aborted by typing the interrupt character a second time.
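The sort orders above can be sketched in Python as follows. The Hit
record, its field names and the function sort_hits are hypothetical;
the sketch only encodes the natural orders listed above and treats a
leading r as reversing them.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One prog match; the field names are assumptions for illustration."""
    hostname: str
    filename: str
    size: int
    mtime: int  # modification time, e.g. seconds since the epoch


# Natural (default) order for each sortby value: (key, descending?).
SORT_KEYS = {
    "hostname": (lambda h: h.hostname, False),  # lexical order
    "filename": (lambda h: h.filename, False),  # lexical order
    "time": (lambda h: h.mtime, True),          # youngest first
    "size": (lambda h: h.size, True),           # largest first
}


def sort_hits(hits: list[Hit], sortby: str = "none") -> list[Hit]:
    """Order hits by a sortby value; an 'r' prefix reverses the order."""
    if sortby in ("none", "rnone"):
        return list(hits)  # unsorted; rnone is accepted for symmetry
    if sortby.startswith("r") and sortby[1:] in SORT_KEYS:
        key_name, flip = sortby[1:], True
    else:
        key_name, flip = sortby, False
    if key_name not in SORT_KEYS:
        raise ValueError(f"unknown sortby value: {sortby}")
    key, descending = SORT_KEYS[key_name]
    return sorted(hits, key=key, reverse=descending != flip)


hits = [Hit("b.edu", "xlock.tar.Z", 10, 200), Hit("a.com", "xnlock.c", 30, 100)]
for hit in sort_hits(hits, "rtime"):
    print(hit.hostname, hit.mtime)
```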
status
This boolean variable determines if the status-line
will be displayed while the prog command is searching
through the database. If set (which is the default
value) then the number of matches and percentage of the
database searched is displayed. Otherwise no output is
given until the search is complete.
term
This variable tells archie what type of terminal you
are using, and optionally its size in rows and columns.
This information is used by the pager.
The usage is:
set term <terminal-type> [<#rows> [<#columns>]]
That is, the terminal type is required, but the number
of rows and columns is optional. You may specify a
value for rows only, but if you want to change the
number of columns you must give a value for both rows
and columns. The default values for rows and columns
are 24 and 80.
Examples:
set term vt100
set term xterm 60
set term xterm 24 100
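The optional-argument rule for set term can be expressed as a small
Python parser. The function parse_set_term is hypothetical; it only
mirrors the usage line and the defaults of 24 rows and 80 columns
given above.

```python
def parse_set_term(args: list[str]) -> tuple[str, int, int]:
    """Parse the arguments of 'set term <type> [<#rows> [<#columns>]]'.

    Hypothetical helper. Rows and columns default to 24 and 80, and
    columns can only be given together with rows.
    """
    if not 1 <= len(args) <= 3:
        raise ValueError("usage: set term <terminal-type> [<#rows> [<#columns>]]")
    term_type = args[0]
    rows = int(args[1]) if len(args) >= 2 else 24
    columns = int(args[2]) if len(args) == 3 else 80
    return term_type, rows, columns


print(parse_set_term(["xterm", "60"]))  # ('xterm', 60, 80)
```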
Regular Expressions
archie uses ed(1) regular expressions in a number of
commands.
A regular expression, on the one hand, is a string like
any other; a sequence of characters. On the other
hand, special characters within the string have certain
functions which make regular expressions useful when
trying to match portions of other strings. In the fol-
lowing discussion and examples, a string containing a
regular expression will be called the ``pattern'', and
the string against which it is to be matched is called
the ``reference string''.
Regular expressions allow one to search for ``all
strings ending with the letters ize'' or ``all strings
beginning with a number between 1 and 3 and ending in
a comma''.
In order to accomplish this, regular expressions co-opt
the use of some characters to have special meaning.
They also provide for these characters to lose their
special meaning if the user so desires. The rules for
regular expressions are:
c Any character c matches itself unless it has been
assigned other special meaning as listed below. Most
special characters can be escaped (made to lose their
special meaning) by placing the character '\' in front
of them. This doesn't apply to '{', which is non-special
until it is escaped. Thus although '*' normally has
special meaning, the pattern '\*' matches a literal '*'.
Example:
The pattern
acdef
matches
s83acdeffff or acdefsecs
but not
accdef or aacde1f
That is, it will match any string that contains ``acdef''
anywhere in the reference string.
Example:
Normally the characters '*' and '$' are special,
but the pattern
a\*bse\$
acts as above. That is, any reference string containing
``a*bse$'' as a substring will be flagged as a match.
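For this particular pattern, Python's re module follows the same
escaping rule, so the example can be checked with a short sketch.
Python is used here only as a stand-in for ed(1); the two differ
elsewhere (for instance, Python treats an unescaped {m,n} as an
interval, whereas ed(1) requires \{m,n\}).

```python
import re

# The ed(1) pattern a\*bse\$ written as a Python raw string: the
# backslashes make '*' and '$' match themselves.
pattern = r"a\*bse\$"


def contains(reference: str) -> bool:
    """True if the reference string contains the literal text a*bse$."""
    return re.search(pattern, reference) is not None


print(contains("xya*bse$zz"))  # True
print(contains("aaabse"))      # False
```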
. A period matches any character except the newline
character. This is known as the wildcard character.
Example:
The pattern
....
will match any 4 characters in the reference string,
except a newline character.
^ If `^' appears at the beginning of the pattern then it
is said to ``anchor'' the match to the beginning of the
line. That is, the reference string must start with the
pattern following the `^'. If this character appears
anywhere other than at the beginning of the pattern,
then it is no longer considered special, and matches
itself as any non-special character would. Similarly, if
it starts a pattern but is escaped, it matches itself.
Example:
The pattern
^efghi
will match
efghi or efghijlk
but not
abcefghi
That is, the pattern will match only those reference
strings starting with ``efghi''. Just containing the
substring is not sufficient.
$ Occurring at the end of the pattern, this character
``anchors'' the pattern to the end of the line (refer-
ence string). A '$' occurring anywhere else in the pat-
tern is non-special. Similarly, if it is at the end of
the pattern but is escaped, it is non-special.
Example:
The pattern
efghi$
will match
efghi or abcdefghi
but not
efghijkl
That is, the pattern will match only those reference
strings ending with ``efghi''. Just containing the sub-
string is not sufficient.
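The two anchor examples above can be reproduced with Python's re
module, which interprets '^' and '$' the same way for these patterns.
The sample names are the reference strings from the examples; Python
is only a stand-in for ed(1). One difference: Python's '$' also
matches just before a trailing newline, which does not arise for
database file names.

```python
import re

names = ["efghi", "efghijlk", "abcefghi", "abcdefghi", "efghijkl"]

# '^' anchors the pattern to the start of the reference string,
# '$' anchors it to the end.
starts = [n for n in names if re.search(r"^efghi", n)]
ends = [n for n in names if re.search(r"efghi$", n)]

print(starts)  # ['efghi', 'efghijlk', 'efghijkl']
print(ends)    # ['efghi', 'abcefghi', 'abcdefghi']
```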
\< This sequence in the pattern causes the one-character
regular expression following it to match only at the
beginning of a word: the beginning of a line or just
before a letter, digit or underline character, or just
after a character which is not one of these.
Example:
The pattern
\<abc
would match the last 'abc' in the reference string
@hijabc#+abc
but not the first since the first 'abc' did not start
on a ``word'' boundary.
\> Constrains the one-character regular expression fol-
lowing it to be at the end of a ``word'' as defined
above.
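Python's re module has no \< operator; the word-boundary escape \b is
the nearest equivalent, and Python likewise treats letters, digits and
underscore as word characters. Under that assumption, the \<abc
example can be sketched as follows.

```python
import re

reference = "@hijabc#+abc"

# \b stands in for ed(1)'s \<: the first 'abc' follows the word
# character 'j', so only the second one starts a word.
positions = [m.start() for m in re.finditer(r"\babc", reference)]
print(positions)  # [9]
```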
[string]
One or more characters within square brackets. This
pattern matches any single character within the brack-
ets. The caret, '^', has a special meaning if it is the
first character in the series: the pattern will match
any character other than one in the list.
Example:
The pattern
[^abc]
will match any character except 'a', 'b' or 'c'.
To match a right bracket, ']', in the list it must be
put first:
[]ab01]
A caret, '^', can be included in the list by placing
it anywhere but first.
In
[ab^01]
the caret loses its special meaning.
The '-' character is special within square brackets. It
is interpreted as a range of characters (in the ASCII
character set) and will match any single character
within that range. '[a-z]' matches any lower case
letter. The '-' can be made non special by placing it
first or last within the square brackets.
The characters '$', '*' and '.' are not special within
square brackets.
Example:
The pattern
[ab01]
matches a single occurrence of a character from the set
'a', 'b', '0', '1'.
Example:
The pattern
[^ab01]
will match any single character other than 'a', 'b',
'0', '1'.
Example:
The pattern
[a0-9b]
matches one of 'a', 'b' or a digit between 0 and
9 inclusive.
Example:
The pattern
[^a0-9b.$]
matches any single character other than 'a', 'b', '.',
'$' or a digit between 0 and 9 inclusive.
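The bracket examples above can be checked character by character with
a short Python sketch. Python's re module reads these four patterns
the same way (inside brackets, '.' and '$' are ordinary and a leading
'^' negates the set); the helper matching_chars is hypothetical.

```python
import re


def matching_chars(pattern: str, chars: list[str]) -> list[str]:
    """Return the single characters that the bracket expression matches."""
    return [c for c in chars if re.fullmatch(pattern, c)]


sample = ["a", "b", "c", "0", "5", ".", "$", "Z"]

for pattern in ["[ab01]", "[^ab01]", "[a0-9b]", "[^a0-9b.$]"]:
    print(pattern, matching_chars(pattern, sample))
```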
* An asterisk following a regular expression in the pat-
tern has the effect of matching zero or more
occurrences of that expression.
Example:
The pattern
a*
means zero or more occurrences of the character 'a'.
Example:
The pattern
[A-Z]*
means zero or more occurrences of any upper-case
letter.
\{m\}
\{m,\}
\{m,n\}
A one-character regular expression followed by one of
these three constructions causes a range of
occurrences of that regular expression to be matched.
If it is followed by \{m\} where m is a non-negative
integer between 0 and 255 (inclusive), then exactly m
occurrences of that regular expression are matched. If
followed by \{m,\}, then at least m occurrences are
matched. Finally, if it is followed by \{m,n\} (where
n is a non-negative integer between 0 and 255 and where
n > m), then between m and n occurrences of the expres-
sion are matched.
Example:
The pattern
ab\{3\}
would match any substring in the reference string of an
'a' followed by exactly 3 'b's.
Example:
The pattern
ab\{3,\}
would match any substring in the reference string of an
'a' followed by at least 3 'b's.
Example:
The pattern
ab\{3,5\}
would match any substring in the reference string of an
'a' followed by at least 3 but at most 5 'b's.
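Python's re module expresses the same intervals without the
backslashes. The sketch below pairs each ed(1) example with its
assumed Python spelling and shows which substring is found; the
helper first_match is hypothetical.

```python
import re

# Each ed(1) interval example above, paired with the Python spelling.
EXAMPLES = [
    (r"ab\{3\}", r"ab{3}"),      # an 'a' followed by exactly 3 'b's
    (r"ab\{3,\}", r"ab{3,}"),    # an 'a' followed by at least 3 'b's
    (r"ab\{3,5\}", r"ab{3,5}"),  # an 'a' followed by 3 to 5 'b's
]


def first_match(pattern: str, text: str):
    """Return the leftmost (greedy) matching substring, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


for ed_pattern, py_pattern in EXAMPLES:
    for text in ["abb", "abbb", "abbbbbbb"]:
        print(ed_pattern, text, first_match(py_pattern, text))
```

Because the search is unanchored, ab{3} still finds ``abbb'' inside
``abbbbbbb''; anchoring with '^' and '$' is needed to require exactly
three 'b's in the whole reference string.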
Common Problems with Regular Expressions
(1) When matching a substring it is not necessary to use
the wildcard character to match the parts of the refer-
ence string preceding and following the substring.
Example:
The pattern
abcd
will match any reference string containing this pat-
tern. It is not necessary to use
.*abcd.*
as the pattern.
(2) To constrain a pattern to match the entire reference
string, use the construction:
^pattern$
(3) The easiest way to obtain case insensitivity in a regu-
lar expression is to use the '[]' operator. For exam-
ple, a pattern to match the word ``hello'' regardless of
the case of the letters would be:
[Hh][Ee][Ll][Ll][Oo]
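Writing such patterns by hand is tedious for longer words, so a
helper can generate them. The Python function case_insensitive below
is a hypothetical sketch: it brackets each ASCII letter and escapes
the characters this manual lists as special ('.', '*', '[', '\', '^',
'$').

```python
import re

# Characters this manual describes as special in ed(1) patterns.
ED_SPECIAL = set(".*[\\^$")


def case_insensitive(word: str) -> str:
    """Build a pattern such as [Hh][Ee][Ll][Ll][Oo] (hypothetical helper)."""
    parts = []
    for ch in word:
        if ch.isascii() and ch.isalpha():
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        elif ch in ED_SPECIAL:
            parts.append("\\" + ch)  # escape so it matches itself
        else:
            parts.append(ch)
    return "".join(parts)


print(case_insensitive("hello"))  # [Hh][Ee][Ll][Ll][Oo]
print(re.fullmatch(case_insensitive("hello"), "HeLLo") is not None)  # True
```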
Commands
Arguments to commands shown here in square brackets
'[]' are optional. All others are mandatory.
help
List the valid archie commands.
list [pattern]
This command provides a list of the sites currently
stored in the database and the time at which they were
last updated. There is an optional regular expression
argument to limit the list to specific sites.
Note that the numerical (IP) address listed for a site
was valid at the time shown. Since IP addresses do
occasionally change, a discrepancy may exist until
that site is next updated in our database. Furthermore,
the listed IP address is the primary address, as listed
in the DNS database: secondary addresses are not stored.
Example:
list
will list all sites in the database, while
list \.de$
lists all German sites.
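The effect of the \.de$ pattern can be sketched with Python's re
module, which treats this pattern the same way as ed(1). The site
names below are hypothetical sample data, not entries from the archie
database.

```python
import re

# Hypothetical site names, used only to illustrate the pattern.
sites = ["archive.example.de", "quiche.cs.mcgill.ca", "de.example.com", "col.hp.com"]

# "\." matches a literal period and "$" anchors the match to the end,
# so only names ending in ".de" are kept.
german = [s for s in sites if re.search(r"\.de$", s)]
print(german)  # ['archive.example.de']
```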
mail [address1,[address2...]]
With an argument (or arguments) the output of the last
command is mailed to the specified address or comma-
separated list of addresses. No spaces may appear
anywhere in the address list.
Example:
mail user1@hello.edu,user2@goodbye.com
Without an argument the output of the last command is
sent to the address specified in the mailto variable.
All the various Internet addressing styles are under-
stood. BITNET sites should use the convention
user@sitename.bitnet
UUCP addresses can be specified as
user@sitename.uucp
prog pattern
Find all occurrences of programs with names matching
pattern. How pattern is interpreted depends on the
value of the search variable. The output lists the
names of hosts with matching entries, the size of the
matching program, its last modification date and its
path.
The results are sorted according to the value of the
sortby variable, and are limited in number by the max-
hits variable.
set variable-name
This command allows you to set one of archie's vari-
ables. Their values affect how archie interacts with
the user.
boolean variables are either set or unset
Example:
set pager
numeric variables take a number within a certain range
Example:
set maxhits 500
string variables take a (possibly restricted) string
value
Example:
set sortby time
See also the entries for unset and show.
show [variable-name]
This command is used to display the value of a partic-
ular variable, or all variables. With an argument it
will display the value of that variable, without an
argument it will display the value of all variables.
Example:
show maxhits
site sitename
This command allows you to get a full listing of an
ftp(1) site in the archie database. The output format
is similar to that of UNIX ls(1) long recursive (-lR)
listing.
Example:
site col.hp.com
unset variable
This causes the specified variable to have no value.
This means that it will not be used by archie until it
has been given a value with the set command.
Note: this may cause ``counter-intuitive'' behaviour in
some cases (e.g. in the case of maxhits). Although
one might expect prog to print all matches without
regard for any limit once maxhits is unset, this is not
the case: if maxhits has no value, prog merely falls
back to some internal default.
whatis substring
This command searches the archie Software Description
Database for the given substring, with case being
ignored. This database consists of names and short
descriptions of many of the software packages, docu-
ments (like RFCs and educational material) and data
files that are stored on the Internet.
Example:
whatis uucp
in part gives as a result:
findpath.sh UUCP Pathfinder
logfile-stats UUCP LOGFILE analyzer
mapstats UUCP map statistics program
We welcome and encourage additions and corrections to
this database and depend on the archie user community
to keep it up to date. To contribute to this
database, mail to
archie-admin@cs.mcgill.ca
For new additions, please keep the description to 25
words or less.
THE EMAIL INTERFACE
The archie email interface currently accepts a limited sub-
set of the interactive interface commands, plus a few of its
own. Currently, variables are not supported in the email
interface.
Requests to this server should be addressed to
archie@cs.mcgill.ca
Note that the ``Subject:'' line in incoming mail is pro-
cessed as if it were part of the main message body. No spe-
cial keywords are required.
Note that the help command is exclusive: if it appears,
all other commands in the same message are ignored.
The server recognizes the following commands. A message
that is empty or contains no valid requests is treated
as a help request.
path path
This lets the requestor override the address that would
normally be extracted from the header. If you do not
hear from the archive server within a couple of hours,
you might consider adding a path command to your request.
The path describes how to mail a message from
cs.mcgill.ca to your address. cs.mcgill.ca is fully
connected to the Internet.
BITNET users can use the convention
user@site.bitnet
UUCP users can use the convention
user@site.uucp
help
Sends you a message describing how to use the email
interface (basically this section).
prog <reg exp1> [<reg exp2> ...]
A search of the archie database is performed with each
<reg exp> (a regular expression as defined by ed(1)) in
turn, and any matches found are returned to the reques-
tor. Note that multiple <reg exp> may be placed on one
line, in which case the results will be mailed back to
you in one message. If you have multiple prog lines,
then multiple messages will be returned, one for each
line [This doesn't work as expected at the moment...
stay tuned].
Any regular expression containing spaces must be quoted
with single (') or double (") quotes. ALL OTHER ed(1)
rules must be followed.
NOTE: The searches are CASE SENSITIVE. The ability to
change this will hopefully be added soon.
The prog command is currently executed as if the search
variable were set to regex.
site <site name> | <site IP address>
A listing of the given <site name> will be returned.
The fully qualified domain name or IP address may be
used.
compress
ALL of the results generated by the current mail message
will be run through compress(1) and uuencode(1). When
you receive the reply, remove everything before the
``begin'' line and run it through uudecode(1). This will
produce a .Z file. You can then run uncompress(1) on
this file to get the results of your request.
quit
Nothing past this point is interpreted. This is pro-
vided so that the occasional lost soul whose signature
contains a line that looks like a command can still use
the server without getting a bogus response.
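A request message can also be prepared programmatically. The
following Python sketch uses the standard email.message module to
compose, but not send, a request with one command per line and a
closing quit, so that nothing after it (such as a signature) is
interpreted. The sender address and the helper name
build_archie_request are placeholders.

```python
from email.message import EmailMessage


def build_archie_request(sender: str, commands: list[str]) -> EmailMessage:
    """Compose (but do not send) a request for the archie email server.

    Hypothetical helper: one command per line, followed by quit.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "archie@cs.mcgill.ca"
    msg.set_content("\n".join(commands + ["quit"]) + "\n")
    return msg


request = build_archie_request("user@frobozz.com", ["prog xnlock", "site col.hp.com"])
print(request.get_content())
```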
THE ARCHIE DATABASE
The archie database subsystem maintains a list of about 600
Internet ftp(1) archive sites. Each night, the database
subsystem executes an anonymous ftp(1) to a subset of these
sites and fetches a recursive directory listing (or a file
containing the recursive directory listing if this exists).
Currently, each site gets updated approximately once a
month. The directory listings are stored on
quiche.cs.mcgill.ca (132.206.2.3), where they are available
to the Internet community via anonymous ftp(1). They appear
in the directory ~ftp/archie/listings in compressed form.
BUGS
1) Only UNIX sites are included in the database.
2) The user cannot limit searches to specific sites.
3) There is no graphical user interface.
4) There is no way to abort the help facility completely.
It is hoped that all these will change in coming versions.
LONG TERM PLANS
The archie system is regarded as being ``in development''
and is not being released to outside sites at present. The
current database requires about 70 MB of disk storage, and
the updates and searches put a noticeable load on the Sun
4/280 on which it operates. Eventually, we hope to
distribute archie to several sites around the world.
We welcome comments and suggestions; please send them to
archie-l@cs.mcgill.ca.
SEE ALSO
ftp(1), telnet(1)
AUTHORS
Alan Emtage (bajan@cs.mcgill.ca), McGill University.
Bill Heelan (wheelan@cs.mcgill.ca), McGill University.
Manual page by R. P. C. Rodgers, UCSF School of Pharmacy,
San Francisco, California 94143
(rodgers@maxwell.mmwb.ucsf.edu) and Alan Emtage.