concept-reader-algorithm.tex

%-*- Mode: TeX -*-
%% Reader Algorithm

%% 22.1.1 5
This section describes the algorithm used by the \term{Lisp reader}
to parse \term{objects} from an \term{input} \term{character} \term{stream},
including how the \term{Lisp reader} processes \term{macro characters}.

When dealing with \term{tokens}, the reader's basic function is to distinguish
representations of \term{symbols} from those of \term{numbers}.
%%Barmar didn't like the double negatives:
% When a \term{token} is
% accumulated, it is assumed to be a \term{number} unless it does
% not satisfy the Syntax for Numbers listed in \figref\SyntaxForNumericTokens.
When a \term{token} is accumulated, it is assumed to represent a \term{number} if it
satisfies the syntax for numbers listed in \figref\SyntaxForNumericTokens.
%%Ditto:
% If it is not a \term{number}, it is then assumed to be a potential
% number unless it does not satisfy the rules governing the syntax for a
% \term{potential number}.
If it does not represent a \term{number},
it is then assumed to be a \term{potential number} 
if it satisfies the rules governing the syntax for a \term{potential number}.
If a valid \term{token} is neither a representation of a \term{number} 
			       nor a \term{potential number},
it represents a \term{symbol}.

The algorithm performed by the \term{Lisp reader} is as follows:

%% 22.1.1 6
\beginlist
\item{1.}            
If at end of file, end-of-file processing is performed as specified
in \funref{read}.
Otherwise,
one \term{character}, \param{x},  is read from the \term{input} \term{stream}, and
dispatched according to the \term{syntax type} of \param{x} to one
of steps 2 to 7.

%% 22.1.1 7
\item{2.}                                          
If \param{x} is an \term{invalid} \term{character},
an error \oftype{reader-error} is signaled.

%% 22.1.1 8
\item{3.}
If \param{x} is a \term{whitespace}\meaning{2} \term{character},
then it is discarded and step 1 is re-entered.

%% 22.1.1 9
\item{4.}
If \param{x} is a \term{terminating} or \term{non-terminating} \term{macro character}
then its associated \term{reader macro function} is called with two \term{arguments},
the \term{input} \term{stream} and \param{x}.

%% 22.1.1 10
The \term{reader macro function} may read \term{characters} 
from the \term{input} \term{stream}; 
if it does, it will see those \term{characters} following the \term{macro character}.
The \term{Lisp reader} may be invoked recursively from the \term{reader macro function}.

%% 22.1.5 16
The \term{reader macro function} must not have any side effects other than on the
\term{input} \term{stream};
because of backtracking and restarting of the \funref{read} operation,
front ends to the \term{Lisp reader} (\eg ``editors'' and ``rubout handlers'') 
may cause the \term{reader macro function} to be called repeatedly during the
reading of a single \term{expression} in which \param{x} only appears once.

%% 22.1.1 11
The \term{reader macro function} may return zero values or one value.
If one value is returned,
then that value is returned as the result of the read operation;
the algorithm is done.
If zero values are returned, then step 1 is re-entered.

%% 22.1.1 12
\item{5.}
If \param{x} is a \term{single escape} \term{character}
then the next \term{character}, \param{y}, is read, or an error \oftype{end-of-file} 
is signaled if at the end of file.
\param{y} is treated as if it is a \term{constituent} 
whose only \term{constituent trait} is \term{alphabetic}\meaning{2}.
\param{y} is used to begin a \term{token}, and step 8 is entered.

%% 22.1.1 13
\item{6.}
If \param{x} is a \term{multiple escape} \term{character}
then a \term{token} (initially
containing no \term{characters}) is  begun and step 9 is entered.

%% 22.1.1 14
\item{7.}
If \param{x} is a \term{constituent} \term{character}, then it begins a \term{token}.
After the \term{token} is read in, it will be interpreted
either as a \Lisp\ \term{object} or as being of invalid syntax.
If the \term{token} represents an \term{object},
that \term{object} is returned as the result of the read operation.
If the \term{token} is of invalid syntax, an error is signaled.
% If \param{x} is a \term{lowercase} \term{character},
% it is replaced with the corresponding \term{uppercase} \term{character}.
%% Tentatively replaced with the following to satisfy Sandra:
If \param{x} is a \term{character} with \term{case},
it might be replaced with the corresponding \term{character} of the opposite \term{case}, 
depending on the \term{readtable case} of the \term{current readtable},
as outlined in \secref\ReadtableCaseReadEffect.
\param{X} is used to begin a \term{token}, and step 8 is entered.

%% 22.1.1 15
%% 22.1.1 16
%% 22.1.1 17
\item{8.}
At this point a \term{token} is being accumulated, and an even number
of \term{multiple escape} \term{characters} have been encountered.
If at end of file, step 10 is entered.
Otherwise, a \term{character}, \param{y}, is read, and
one of the following actions is performed according to its \term{syntax type}:

\beginlist
\itemitem{\bull}
If \param{y} is a \term{constituent} or \term{non-terminating} \term{macro character}:
\beginlist
\itemitem{--}
% If \param{y} is a \term{lowercase} \term{character}, it is replaced with the
% corresponding \term{uppercase} \term{character}.
%% Tentatively replaced with the following to satisfy Sandra:
If \param{y} is a \term{character} with \term{case},
it might be replaced with the corresponding \term{character} of the opposite \term{case}, 
depending on the \term{readtable case} of the \term{current readtable},
as outlined in \secref\ReadtableCaseReadEffect.
\itemitem{--}
\param{Y} is appended to the \term{token} being built.
\itemitem{--}
Step 8 is repeated.
\endlist

%% 22.1.1 18
\itemitem{\bull}
If \param{y} is a \term{single escape} \term{character}, then the next \term{character},
\param{z}, is read, or an error \oftype{end-of-file} is signaled if at end of file.
\param{Z} is treated as if it is a \term{constituent} 
whose only \term{constituent trait} is \term{alphabetic}\meaning{2}.
\param{Z} is appended to the \term{token} being built,
and step 8 is repeated.

%% 22.1.1 19
\itemitem{\bull}
If \param{y} is a \term{multiple escape} \term{character},
then step 9 is entered.

%% 22.1.1 20
\itemitem{\bull}
If \param{y} is an \term{invalid} \term{character},
an error \oftype{reader-error} is signaled.

%% 22.1.1 21
\itemitem{\bull}
If \param{y} is a \term{terminating} \term{macro character},
then it terminates the \term{token}.
First the \term{character} \param{y} is unread (see \funref{unread-char}),
and then step 10 is entered.

%% 22.1.1 22
\itemitem{\bull}
If \param{y} is a \term{whitespace}\meaning{2} \term{character}, then it terminates
the \term{token}.  First the \term{character} \param{y} is unread
if appropriate (see \funref{read-preserving-whitespace}),
and then step 10 is entered.
\endlist

%% 22.1.1 23
%% 22.1.1 24
\item{9.}
At this point a \term{token} is being accumulated, and an odd number
of \term{multiple escape} \term{characters} have been encountered.
If at end of file, an error \oftype{end-of-file} is signaled.
Otherwise, a \term{character}, \param{y}, is read, and
one of the following actions is performed according to its \term{syntax type}:

%% 22.1.1 25
\beginlist
\itemitem{\bull}
If \param{y} is a \term{constituent}, macro, or \term{whitespace}\meaning{2} \term{character},
\param{y} is treated as a \term{constituent} 
whose only \term{constituent trait} is \term{alphabetic}\meaning{2}.             
\param{Y} is appended to the \term{token} being built, and step 9 is repeated.

%% 22.1.1 26
\itemitem{\bull}
If \param{y} is a \term{single escape} \term{character}, then the next \term{character},
\param{z}, is read, or an error \oftype{end-of-file} is signaled if at end of file.
\param{Z} is treated as a \term{constituent}
whose only \term{constituent trait} is \term{alphabetic}\meaning{2}.
\param{Z} is appended to the \term{token} being built,
and step 9 is repeated.

%% 22.1.1 27
\itemitem{\bull}
If \param{y} is a \term{multiple escape} \term{character},
then step 8 is entered.

%% 22.1.1 28
\itemitem{\bull}
If \param{y} is an \term{invalid} \term{character},
an error \oftype{reader-error} is signaled.
\endlist

%% 22.1.1 29
\item{10.}
An entire \term{token} has been accumulated.
The \term{object} represented by the \term{token} is returned 
as the result of the read operation,
or an error \oftype{reader-error} is signaled if the \term{token} is not of valid syntax.
\endlist

%% 22.1.1 30
%% 22.1.1 31
%%Barmar observes that this is said elsewhere, and in any case is
%%implied by the algorithm above:
% \term{Single escape} and \term{multiple escape} \term{characters}
% can be included in a \term{token} when
% preceded by another \term{single escape} \term{character}.