6.8.4 String words

Words that operate on memory blocks are also useful for strings; for words that move, copy, compare, and search strings, see Memory Blocks. For words that display characters and strings, see Displaying characters and strings.

The following words work on previously existing strings:

str= ( c-addr1 u1 c-addr2 u2 – f  ) gforth-0.6 “str-equals”
str< ( c-addr1 u1 c-addr2 u2 – f  ) gforth-0.6 “str-less-than”
string-prefix? ( c-addr1 u1 c-addr2 u2 – f  ) gforth-0.6 “string-prefix-question”

Is c-addr2 u2 a prefix of c-addr1 u1?

string-suffix? ( c-addr1 u1 c-addr2 u2 – f  ) gforth-1.0 “string-suffix-question”

Is c-addr2 u2 a suffix of c-addr1 u1?
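As a small illustration of the two predicates above, the sketch below tests file names. The helper name forth-source? and the sample strings are made up for this example, and string-suffix? requires Gforth 1.0.

```forth
\ Hypothetical helper: does the file name end in ".fs"?
: forth-source? ( c-addr u -- f )
    s" .fs" string-suffix? ;

s" kernel.fs" forth-source? . cr            \ prints -1 (true)
s" kernel.c"  forth-source? . cr            \ prints 0  (false)
s" kernel.fs" s" kern" string-prefix? . cr  \ prints -1 (true)
```

Note the argument order: the string being examined comes first, the candidate prefix or suffix second.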

search ( c-addr1 u1 c-addr2 u2 – c-addr3 u3 flag  ) string “search”

Search the string specified by c-addr1, u1 for the string specified by c-addr2, u2. If flag is true: match was found at c-addr3 with u3 characters remaining. If flag is false: no match was found; c-addr3, u3 are equal to c-addr1, u1.
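A short sketch of both outcomes of search, using arbitrary sample strings:

```forth
\ Match found: the result starts at the match and runs to the end.
s" hello world" s" wor" search . type cr   \ prints -1 world

\ No match: the original string comes back unchanged.
s" hello world" s" xyz" search . type cr   \ prints 0 hello world
```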

scan ( c-addr1 u1 c – c-addr2 u2 ) gforth-0.2 “scan”

Skip all characters not equal to c. The result starts with c or is empty. Scan is limited to single-byte (ASCII) characters. Use search to search for multi-byte characters.
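For example, scan can locate a delimiter in a string; the sample string here is arbitrary. Combining it with /string drops the delimiter itself.

```forth
s" key=value" '=' scan type cr             \ prints =value
s" key=value" '=' scan 1 /string type cr   \ prints value
s" key=value" '#' scan nip . cr            \ prints 0 (no '#': empty result)
```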

scan-back ( c-addr u1 c – c-addr u2  ) gforth-0.7 “scan-back”
skip ( c-addr1 u1 c – c-addr2 u2 ) gforth-0.2 “skip”

Skip all characters equal to c. The result starts with the first non-c character, or it is empty. Skip is limited to single-byte (ASCII) characters.
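One way to combine skip and scan is to pick the first blank-delimited word out of a string. This is only a sketch; the name first-word is hypothetical and not part of Gforth.

```forth
s"    indented" bl skip type cr    \ prints indented

\ Hypothetical helper: return the first blank-delimited word.
: first-word ( c-addr1 u1 -- c-addr2 u2 )
    bl skip         \ drop leading spaces
    2dup bl scan    \ find the rest, starting at the next space (if any)
    nip - ;         \ shorten the string by the length of the rest

s"   two words" first-word type cr  \ prints two
```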

-trailing ( c-addr u1 – c-addr u2  ) string “dash-trailing”

Adjust the string specified by c-addr, u1 to remove all trailing spaces. u2 is the length of the modified string.
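A brief sketch with an arbitrary padded string; the '|' is emitted only to make the end of the output visible.

```forth
s" padded   " -trailing type '|' emit cr   \ prints padded|
s" padded   " -trailing nip . cr           \ prints 6
```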

/string ( c-addr1 u1 n – c-addr2 u2 ) string “slash-string”

Adjust the string specified by c-addr1, u1 to remove n characters from the start of the string.

safe/string ( c-addr1 u1 n – c-addr2 u2 ) gforth-1.0 “safe-slash-string”

Adjust the string specified by c-addr1, u1 to remove n characters from the start of the string. Unlike /string, safe/string removes at least 0 and at most u1 characters.
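The difference shows up when n exceeds the string length. A minimal sketch with arbitrary sample strings (safe/string requires Gforth 1.0):

```forth
s" hello world" 6 /string type cr    \ prints world

\ Asking for more characters than the string has: safe/string
\ removes at most u1 characters, so the result is an empty string.
s" abc" 10 safe/string nip . cr      \ prints 0
```

With plain /string, the same request would produce an address and length outside the original string.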

cstring>sstring ( c-addr – c-addr u  ) gforth-0.2 “cstring-to-sstring”

C-addr is the start address of a zero-terminated string, u is its length.
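To try this without calling into C, one can lay down a zero-terminated string by hand. The name greeting-z is made up for this sketch.

```forth
\ Build the bytes 'h' 'i' 0 in the dictionary.
create greeting-z  'h' c, 'i' c, 0 c,

greeting-z cstring>sstring type cr   \ prints hi
greeting-z cstring>sstring nip . cr  \ prints 2
```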

The following words compare ASCII characters case-insensitively, but non-ASCII characters case-sensitively (as in wordlist lookup).

capscompare ( c-addr1 u1 c-addr2 u2 – n ) gforth-0.7 “capscompare”

Compare two strings lexicographically, based on the values of the bytes in the strings, but comparing ASCII characters case-insensitively, and non-ASCII characters case-sensitively and without locale-specific collation order. If they are equal, n is 0; if the first string is smaller, n is -1; if the first string is larger, n is 1.
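A short sketch contrasting capscompare with the standard word compare, using arbitrary sample strings:

```forth
s" Hello" s" hELLO" capscompare . cr    \ prints 0  (equal ignoring case)
s" apple" s" BANANA" capscompare . cr   \ prints -1 ("a" sorts before "b")
s" apple" s" BANANA" compare . cr       \ prints 1  (byte 'a' > byte 'B')
```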

capsstring-prefix? ( c-addr1 u1 c-addr2 u2 – f  ) gforth-1.0 “capsstring-prefix?”

Like string-prefix?, but case-insensitive for ASCII characters: Is c-addr2 u2 a prefix of c-addr1 u1?

capssearch ( c-addr1 u1 c-addr2 u2 – c-addr3 u3 flag  ) gforth-1.0 “capssearch”

Like search, but case-insensitive for ASCII characters: Search for c-addr2 u2 in c-addr1 u1; flag is true if found.
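For instance, both case-insensitive words can be tried on arbitrary sample strings as below (both require Gforth 1.0). As with search, the result of capssearch points into the original string, so the original letter case is preserved in the output.

```forth
s" Content-Type: text/html" s" content-type" capsstring-prefix? . cr
\ prints -1

s" Hello World" s" WORLD" capssearch . type cr
\ prints -1 World
```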

The following words create or extend strings on the heap:

s+ ( c-addr1 u1 c-addr2 u2 – c-addr u  ) gforth-0.7 “s-plus”

c-addr u is a newly allocated string that contains the concatenation of c-addr1 u1 (first) and c-addr2 u2 (second).
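A minimal usage sketch. It assumes that the newly allocated result may be released with free once it is no longer needed.

```forth
s" foo" s" bar" s+     \ -- c-addr u   freshly allocated
2dup type cr           \ prints foobar
drop free throw        \ release the memory (assumption: allocated with allocate)
```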

append ( c-addr1 u1 c-addr2 u2 – c-addr u  ) gforth-0.7 “append”

C-addr u is the concatenation of c-addr1 u1 (first) and c-addr2 u2 (second). c-addr1 u1 is an allocated string, and append resizes it (possibly moving it to a new address) to accommodate u characters.
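Because the first string must live in allocated memory, this sketch first copies it to the heap with save-mem. The sample strings are arbitrary.

```forth
s" foo" save-mem       \ copy the first string into allocated memory
s" bar" append         \ -- c-addr u   possibly at a new address
2dup type cr           \ prints foobar
drop free throw        \ release the (resized) string
```

After append, only the returned address should be used; the old c-addr1 may no longer be valid.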

>string-execute ( ... xt – ... addr u  ) gforth-1.0 “>string-execute”

Execute xt while the standard output (type, emit, and everything that uses them) is redirected to a string. The resulting string is addr u, which is in allocated memory; it is the responsibility of the caller of >string-execute to free this string.

One could define s+ using >string-execute, as follows:

: s+ ( c-addr1 u1 c-addr2 u2 -- c-addr u ) [: 2swap type type ;] >string-execute ;

For concatenating just two strings, >string-execute is inefficient, but for concatenating many strings it can be more efficient.
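As a sketch of the many-pieces case, the word below builds a message from literal text and a formatted number in one pass. The name greeting and the message text are made up for this example; the word requires Gforth 1.0.

```forth
\ Hypothetical example: build a message from several pieces.
: greeting ( n -- c-addr u )
    [: ." You have " 0 .r ."  new messages" ;] >string-execute ;

3 greeting             \ -- c-addr u   in allocated memory
2dup type cr           \ prints You have 3 new messages
drop free throw        \ the caller frees the string
```

Any word that prints through type or emit can be used inside the quotation, which is what makes this convenient for formatted output.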