The Text Interpreter (Gforth Manual)

5.13 The Text Interpreter

The text interpreter²³ is an endless loop that processes input from the current input device. It is also called the outer interpreter, in contrast to the inner interpreter (see Engine), which executes the compiled Forth code on interpretive implementations.

The text interpreter operates in one of two states: interpret state and compile state. The current state is recorded in the aptly named variable state, which holds zero in interpret state and a non-zero value in compile state.
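As a small illustration of the two states, the following sketch reads the variable state at two different moments. The word show-state is a hypothetical helper invented for this example, not a Gforth word; it is marked immediate so that it is executed even while a definition is being compiled.

```forth
\ show-state is a hypothetical helper, not part of Gforth.
\ It is immediate, so it runs even while a definition is being compiled.
: show-state ( -- )
  state @ if ." compiling " else ." interpreting " then ; immediate

show-state            \ executed in interpret state
: dummy show-state ;  \ executed while dummy is being compiled
```

The first use should print "interpreting" and the second "compiling". Words that inspect state like this are useful for experiments, but they are generally best avoided in real code.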

This section starts by describing how the text interpreter behaves when it is in interpret state, processing input from the user input device – the keyboard. This is the mode that a Forth system is in after it starts up.

The text interpreter works from an area of memory called the input buffer²⁴, which stores your keyboard input when you press the RET key. Starting at the beginning of the input buffer, it skips leading spaces (called delimiters) and then parses a string (a sequence of non-space characters) until it reaches either a space character or the end of the buffer. Having parsed a string, it makes two attempts to process it:

• It looks for the string in a dictionary of definitions. If the string is found, the string names a definition (also known as a word) and the dictionary search returns information that allows the text interpreter to perform the word’s interpretation semantics. In most cases, this simply means that the word will be executed.
• If the string is not found in the dictionary, the text interpreter attempts to treat it as a number, using the rules described in Number Conversion. If the string represents a legal number in the current radix, the number is pushed onto a parameter stack (the data stack for integers, the floating-point stack for floating-point numbers).

If both attempts fail, the text interpreter discards the remainder of the input buffer, issues an error message and waits for more input. If one of the attempts succeeds, the text interpreter repeats the parsing process until the whole of the input buffer has been processed, at which point it prints the status message “ ok” and waits for more input.
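The dictionary lookup and the number fallback described above can be tried at the keyboard with a short session like the one below. This is only a sketch; square is a hypothetical word defined for the example.

```forth
\ Each string is first looked up in the dictionary; if that fails it is tried as a number.
2 3 + .                        \ 2 and 3 are numbers; + and . are words
: square ( n -- n*n ) dup * ;  \ add a new word to the dictionary
7 square .                     \ square is now found by the dictionary search
hex 1F decimal .               \ 1F is a legal number only while the radix is 16
```

The three lines that print should show 5, 49 and 31 respectively. Typing 1F while the radix is 10 would instead fail both attempts and produce an error.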

The text interpreter keeps track of its position in the input buffer by updating a variable called >IN (pronounced “to-in”). The value of >IN starts out as 0, indicating an offset of 0 from the start of the input buffer. The region from offset >IN @ to the end of the input buffer is called the parse area²⁵. This example shows how >IN changes as the text interpreter parses the input buffer:

: remaining source >in @ /string
  cr ." ->" type ." <-" ; immediate 

1 2 3 remaining + remaining . 

: foo 1 2 3 remaining swap remaining ;

The result is:

->+ remaining .<-
->.<-5  ok

->swap remaining ;<-
->;<-  ok

The value of >IN can also be modified by a word in the input buffer that is executed by the text interpreter. This means that a word can “trick” the text interpreter into either skipping a section of the input buffer²⁶ or into parsing a section twice. For example:

: lat ." <<foo>>" ;
: flat ." <<bar>>" >IN DUP @ 3 - SWAP ! ;

When flat is executed as the last word on an input line, this output is produced²⁷:

<<bar>><<foo>>

This technique can be used to work around some of the interoperability problems of parsing words. Of course, it’s better to avoid parsing words where possible.
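The opposite trick, skipping part of the input buffer, can be sketched in the same way. The word skip-rest below is a hypothetical name chosen for this example; it sets >IN to the length of the input buffer, which leaves the parse area empty, much as the standard comment word \ does.

```forth
\ skip-rest is a hypothetical word for this example.
: skip-rest ( -- )
  source nip >in ! ;   \ set >IN to the buffer length: the parse area is now empty

1 2 + skip-rest this text is never parsed
.
```

The final line should print 3; the text after skip-rest is never seen by the text interpreter. As written, skip-rest only works in interpret state; it would need to be immediate to be used inside a definition.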

Two important notes about the behaviour of the text interpreter:

• It processes each input string to completion before parsing additional characters from the input buffer.
• It treats the input buffer as a read-only region (and so must your code).

When the text interpreter is in compile state, its behaviour changes in these ways:

• If a parsed string is found in the dictionary, the text interpreter will perform the word’s compilation semantics. In most cases, this simply means that the execution semantics of the word will be appended to the current definition.
• When a number is encountered, it is compiled into the current definition (as a literal) rather than being pushed onto a parameter stack.
• If an error occurs, state is modified to put the text interpreter back into interpret state.
• Each time a line is entered from the keyboard, Gforth prints “ compiled” rather than “ ok”.
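The difference between compiling a number and pushing it can be seen in a short sketch. The names five and ten are hypothetical; the words [, ] and literal are standard words that are not otherwise discussed in this section, and they are used here only to switch temporarily back to interpret state inside a definition.

```forth
: five ( -- n ) 5 ;                 \ 5 is compiled as a literal; nothing is pushed yet
: ten  ( -- n ) [ 5 2 * ] literal ; \ [ ... ] is interpreted; literal compiles the result
five . ten .
```

The last line should print 5 and then 10. In ten, the multiplication happens once, while the definition is being compiled, and only the result is stored in the definition.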

When the text interpreter is using an input device other than the keyboard, its behaviour changes in these ways:

• When the parse area is empty, the text interpreter attempts to refill the input buffer from the input source. When the input source is exhausted, the input source is set back to the previous input source.
• It doesn’t print out “ ok” or “ compiled” messages each time the parse area is emptied.
• If an error occurs, the input source is set back to the user input device.

You can read about this in more detail in Input Sources.
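One easy way to watch the text interpreter work from an input source other than the keyboard is the standard word evaluate, which makes a string the input source until it is exhausted. The sketch below assumes nothing beyond standard words; twice is a hypothetical name.

```forth
s" 2 3 + ." evaluate                    \ the string is interpreted like a typed line
: twice ( -- ) s" 21 2 * ." evaluate ;  \ the string is interpreted when twice runs
twice
```

The first line should print 5 and the call to twice should print 42. In both cases, once the string has been processed the text interpreter resumes with the previous input source, and no " ok" is printed for the string itself.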

>in       – addr         core       “to-in”

uvar variable – a-addr is the address of a cell containing the char offset from the start of the input buffer to the start of the parse area.

source       – addr u         core       “source”

Return the address addr and length u of the current input buffer.

tib       – addr         core-ext-obsolescent       “t-i-b”

#tib       – addr         core-ext-obsolescent       “number-t-i-b”

uvar variable – a-addr is the address of a cell containing the number of characters in the terminal input buffer. OBSOLESCENT: source supersedes the function of this word.

• Input Sources:
• Number Conversion:
• Interpret/Compile states:
• Interpreter Directives:
• Recognizers:
