The nicest way to write a string literal is to write it as "STRING".
You can use the same \-escapes inside as for s\". However, this
syntax is non-standard, so you may want to use one of the following
words for improved portability:
s\" ( compilation ’ccc"’ – ; run-time – c-addr u ) core-ext,file-ext “s-backslash-quote”
Like S", but translates C-like \-escape-sequences, as follows:

\a                BEL (alert)
\b                BS
\e                ESC (not in C99)
\f                FF
\n                newline
\r                CR
\t                HT
\v                VT
\"                "
\\                \
\[0-7]{1,3}       octal numerical character value (non-standard)
\x[0-9a-f]{0,2}   hex numerical character value (standard only with two digits)
\u[0-9a-f]{4}     Unicode code point (auto-merges surrogate pairs)
\U[0-9a-f]{8}     extended Unicode code point

A \ before any other character is reserved.
Note that \xXX produces raw bytes, while \uXXXX and \UXXXXXXXX
produce code points for the current encoding. E.g., if you use UTF-8
encoding and want to encode ä (code point U+00E4), you can write the
letter ä itself, or write \xc3\xa4 (the UTF-8 bytes for this code
point), \u00e4, or \U000000e4.
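The difference between raw bytes and code points can be tried out at the Gforth prompt. The following is a minimal sketch, assuming a recent Gforth (one that accepts \u and \U) running in a UTF-8 locale; under another encoding the second and third lines would produce different bytes.

```forth
\ Assumption: recent Gforth, UTF-8 locale.
s\" \xc3\xa4"   type cr   \ raw UTF-8 bytes for U+00E4
s\" \u00e4"     type cr   \ code point, converted to the current encoding
s\" \U000000e4" type cr   \ same code point, 8-digit form

s\" \xc3\xa4" nip .       \ length of the raw-byte form: 2
s\" \u00e4"   nip .       \ expected to be 2 as well under UTF-8
```

All three type lines should display ä; the last two lines show that the string length counts bytes, not code points.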
Note that, unlike in C, \n produces the preferred newline sequence
for the host OS, which may consist of several chars. I.e., "\n" is
equivalent to newline.
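A quick way to check this on a given system is to compare the two strings and look at the length. This is only a sketch for the Gforth prompt; the length shown depends on the host OS.

```forth
\ compare returns 0 for equal strings, so 0= turns that into a true flag.
s\" \n" newline compare 0= .   \ prints -1 (true) if both strings match
s\" \n" nip .                  \ length of the newline sequence on this host
```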
S" ( compilation ’ccc"’ – ; run-time – c-addr u ) core,file “s-quote”
Compilation: Parse a string ccc delimited by a " (double quote). At
run-time, return the length, u, and the start address, c-addr, of the
string. Interpretation: parse the string as before, and return
c-addr, u. Gforth allocates the string. The resulting memory leak is
usually not a problem; the exception is if you create strings
containing S" and evaluate them; then the leak is not bounded by the
size of the interpreted files and you may want to free the strings.
Forth-2012 only guarantees two buffers of 80 characters each, so in
standard programs you should assume that the string lives only until
the next-but-one s".
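In a program that should also run on other Forth-2012 systems, one way to deal with the limited lifetime of interpreted strings is to copy the string into a buffer you own right away. The sketch below illustrates this; the names greeting, greeting-len and .greeting are hypothetical, and the 80-character size simply mirrors the minimum buffer size mentioned above.

```forth
\ Hypothetical names; the buffer size mirrors the 80-char minimum.
create greeting 80 chars allot
variable greeting-len

\ Interpreted S": the result may live in a transient buffer,
\ so store the length and copy the characters immediately.
s" hello, world" dup greeting-len !   ( c-addr u )
greeting swap chars move              \ copy u chars into greeting

: .greeting ( -- ) greeting greeting-len @ type ;
.greeting cr                          \ prints hello, world
```

A string compiled with S" inside a colon definition does not need this treatment, because it is stored with the definition.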
Likewise, you can get the code xc of a character C with 'C'. This
syntax has been standardized since Forth-2012. An older way to get it
is to use one of the following words:
char ( ’<spaces>ccc’ – c ) core,xchar-ext “char”
Skip leading spaces. Parse the string ccc and return c, the display code representing the first character of ccc.
[char] ( compilation ’<spaces>ccc’ – ; run-time – c ) core,xchar-ext “bracket-char”
Compilation: skip leading spaces. Parse the string ccc. Run-time: return c, the display code representing the first character of ccc. Interpretation semantics for this word are undefined.
You usually use char outside and [char] inside colon definitions, or
you just use 'C'.
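The three forms can be compared side by side. This is a small sketch for the Gforth prompt; the word name .a-code is made up for the example.

```forth
char A .                        \ interpreted: prints 65
: .a-code ( -- ) [char] A . ;   \ compiled: the code becomes a literal
.a-code                         \ prints 65
'A' .                           \ Forth-2012 character literal: prints 65
```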
Note that, e.g., "C" type is (slightly) more efficient than 'C' xemit
because the latter converts the code point into a sequence of bytes
and individually emits them. Similarly, dealing with general
characters is usually more efficient when representing them as strings
rather than code points.
There are the following words for producing commonly-used characters
and strings that cannot be produced with S" or 'C':
newline ( – c-addr u ) gforth-0.5 “newline”
String containing the newline sequence of the host OS.
bl ( – c-char ) core “b-l”
c-char is the character value for a space.
#tab ( – c ) gforth-0.2 “number-tab”
#lf ( – c ) gforth-0.2 “number-l-f”
#cr ( – c ) gforth-0.2 “number-c-r”
#ff ( – c ) gforth-0.2 “number-f-f”
#bs ( – c ) gforth-0.2 “number-b-s”
#del ( – c ) gforth-0.2 “number-del”
#bell ( – c ) gforth-0.2 “number-bell”
#esc ( – c ) gforth-0.5 “number-esc”
#eof ( – c ) gforth-0.7 “number-e-o-f”
Actually EOT (ASCII code 4, aka ^D).
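As an illustration of how these words are typically combined, the following sketch prints two tab-separated fields followed by the host's newline sequence. The word name .row and the field texts are made up for the example.

```forth
\ Hypothetical word: two fields separated by a tab, ended by a newline.
: .row ( -- )
  ." name" #tab emit ." value"   \ #tab is a character, so use emit
  newline type ;                 \ newline is a string, so use type
.row
bl emit                          \ emits a single space
```

The character constants leave a single code on the stack, while newline leaves an address and length, which is why the sketch uses emit for one and type for the other.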