String and character literals (Gforth Manual)


6.8.3 String and Character literals

The nicest way to write a string literal is to write it as "STRING". You can use the same \-escapes inside it as for s\". However, this syntax is non-standard, so you may want to use one of the following words for improved portability:

s\" ( compilation ’ccc"’ – ; run-time – c-addr u  ) core-ext,file-ext “s-backslash-quote”

Like S", but translates C-like \-escape-sequences, as follows: \a BEL (alert), \b BS, \e ESC (not in C99), \f FF, \n newline, \r CR, \t HT, \v VT, \" ", \\ \, \[0-7]{1,3} octal numerical character value (non-standard), \x[0-9a-f]{0,2} hex numerical character value (standard only with two digits), \u[0-9a-f]{4} for unicode codepoints (auto-merges surrogate pairs), \U[0-9a-f]{8} for extended unicode code points; a \ before any other character is reserved.
Note that \xXX produces raw bytes, while \uXXXX and \UXXXXXXXX produce code points for the current encoding. E.g., if you use UTF-8 encoding and want to encode ä (code point U+00E4), you can write the letter ä itself, or write \xc3\xa4 (the UTF-8 bytes for this code point), \u00e4, or \U000000e4.
Note that, unlike in C, \n produces the preferred newline sequence for the host OS, which may consist of several chars. I.e., "\n" is equivalent to newline.
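
The following is a minimal sketch of a few of these escapes in use, not an exhaustive demonstration. The word names greet, a-bytes and a-point are made up for this example. The \u escape is taken from the list above; it may be missing in older Gforth releases, and the final comparison only yields 0 when the session uses UTF-8.

```forth
\ Hypothetical example words; only s\" , type and compare are Gforth words.

\ A tab, two escaped double quotes, and the host's newline sequence.
: greet ( -- )  s\" Hello,\t\"world\"\n" type ;
greet

\ \x inserts raw bytes; \u inserts a code point in the current encoding.
: a-bytes ( -- c-addr u )  s\" \xc3\xa4" ;   \ always two bytes
: a-point ( -- c-addr u )  s\" \u00e4" ;     \ two bytes only under UTF-8

\ 0 means both strings are identical (assumes UTF-8 and \u support).
a-bytes a-point compare .
```

Because \n expands to the host's newline sequence, the length of the string in greet can differ between systems.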

S" ( compilation ’ccc"’ – ; run-time – c-addr u  ) core,file “s-quote”

Compilation: Parse a string ccc delimited by a " (double quote). At run-time, return the length, u, and the start address, c-addr, of the string. Interpretation: parse the string as before, and return c-addr, u. Gforth allocates the string. The resulting memory leak is usually not a problem; the exception is if you create strings containing S" and evaluate them; then the leak is not bounded by the size of the interpreted files and you may want to free the strings. Forth-2012 only guarantees two buffers of 80 characters each, so in standard programs you should assume that the string lives only until the next-but-one s".
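
As a sketch of what the lifetime rule means for portable code, the example below copies an interpreted string into a buffer of its own before anything else can reuse the transient buffers. The names banner, msg-buf, msg-len and stash are hypothetical, and the 80-character limit simply mirrors the buffer size mentioned above.

```forth
\ Compiled: the string is stored with the definition and stays valid.
: banner ( -- c-addr u )  s" Gforth" ;
banner type

\ Interpreted: copy the string if it must outlive later uses of S".
create msg-buf 80 chars allot      \ hypothetical private buffer
variable msg-len                   \ number of characters stored

: stash ( c-addr u -- )            \ copy at most 80 chars into msg-buf
  80 min dup msg-len !  msg-buf swap move ;

s" transient" stash
msg-buf msg-len @ type
```

In Gforth itself the copy is not strictly needed, since the interpreted string is allocated and not reused; the copy matters for programs that should also run on other standard systems.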

Likewise, you can get the code xc of a character C with 'C'. This syntax has been standard since Forth-2012. An older way to get the code is to use one of the following words:

char ( ’<spaces>ccc’ – c  ) core,xchar-ext “char”

Skip leading spaces. Parse the string ccc and return c, the display code representing the first character of ccc.

[char] ( compilation ’<spaces>ccc’ – ; run-time – c  ) core,xchar-ext “bracket-char”

Compilation: skip leading spaces. Parse the string ccc. Run-time: return c, the display code representing the first character of ccc. Interpretation semantics for this word are undefined.

You usually use char outside and [char] inside colon definitions, or you just use 'C'.
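
A short sketch of the three forms side by side; the word names star and star2 are invented for the example.

```forth
\ Interpreted: char parses the next word when the line is executed.
char A .                        \ prints 65

\ Compiled: [char] parses at compile time and compiles the code as a literal.
: star ( -- c )  [char] * ;
star emit                       \ prints *

\ The 'C' syntax works the same way inside and outside definitions.
: star2 ( -- c )  '*' ;
star star2 = .                  \ prints -1 (true)
```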

Note that, e.g.,

"C" type

is (slightly) more efficient than

'C' xemit

because the latter converts the code point into a sequence of bytes and individually emits them. Similarly, dealing with general characters is usually more efficient when representing them as strings rather than code points.

There are the following words for producing commonly-used characters and strings that cannot be produced with S" or 'C':

newline ( – c-addr u ) gforth-0.5 “newline”

String containing the newline sequence of the host OS.

bl ( – c-char  ) core “b-l”

c-char is the character value for a space.

#tab ( – c  ) gforth-0.2 “number-tab”

#lf ( – c  ) gforth-0.2 “number-l-f”

#cr ( – c  ) gforth-0.2 “number-c-r”

#ff ( – c  ) gforth-0.2 “number-f-f”

#bs ( – c  ) gforth-0.2 “number-b-s”

#del ( – c  ) gforth-0.2 “number-del”

#bell ( – c  ) gforth-0.2 “number-bell”

#esc ( – c  ) gforth-0.5 “number-esc”

#eof ( – c  ) gforth-0.7 “number-e-o-f”

Actually EOT (ASCII code 4, also known as ^D).
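
As an illustration of how these words might be combined, the sketch below prints two fields separated by a tab and ends the line with the host's newline sequence. The word .row is hypothetical and not part of Gforth.

```forth
\ Hypothetical helper: print two strings as one tab-separated row.
: .row ( c-addr1 u1 c-addr2 u2 -- )
  2swap type        \ first field
  #tab emit         \ single tab character
  type              \ second field
  newline type ;    \ newline is a string, so it needs type, not emit

s" name" s" value" .row

\ The #-words leave plain character codes on the stack.
bl . #tab . #lf . #cr . #esc .   \ prints 32 9 10 13 27
```

Note the difference in stack effect: newline returns a string (c-addr u), while bl and the #-words return a single character code.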