The String class is designed to extend GNU C++ to support
string processing capabilities similar to those in languages like
Awk. The class provides facilities that ought to be convenient
and efficient enough to be useful replacements for
char*-based processing via the C string library (e.g.,
strcmp, etc.) in many applications. Many details about String
representations are described in the Representation section.
A separate SubString class supports substring extraction
and modification operations. This is implemented in a way that
user programs never directly construct or represent substrings,
which are only used indirectly via String operations.
Another separate class, Regex, is also used indirectly via String
operations in support of regular expression searching, matching, and the
like. The Regex class is based entirely on the GNU Emacs regex
functions. See section `Syntax of Regular Expressions' in GNU Emacs Manual, for a full
explanation of regular expression syntax. (For implementation details,
see the internal documentation in files `regex.h' and
Strings are initialized and assigned as in the following examples:
String x; String y = 0; String z = "";
String x = "Hello"; String y("Hello");
String x = 'A'; String y('A');
String u = x; String v(x);
String u = x.at(1,4); String v(x.at(1,4));
String x("abc", 2);
String x = dec(20);
There are no directly accessible forms for declaring SubString variables.
The declaration Regex r("[a-zA-Z_][a-zA-Z0-9_]*"); creates
a compiled regular expression suitable for use in the String
operations described below (in this case, one that matches any
C++ identifier). The first argument may also be a String.
Be careful in distinguishing the role of backslashes in quoted
GNU C++ char* constants versus those in Regexes. For example, a Regex
that matches either one or more tabs or all strings beginning
with "ba" and ending with any number of occurrences of "na"
could be declared as
Regex r = "\\(\t+\\)\\|\\(ba\\(na\\)*\\)"
Note that only one backslash is needed to signify the tab, but
two are needed for the parenthesization and virgule, since the
GNU C++ lexical analyzer decodes and strips backslashes before
they are seen by Regex.
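The compiler's half of this decoding can be checked without libg++. The following is a minimal sketch using only the standard library; count_char and kPattern are hypothetical names introduced for illustration. It inspects the characters that remain in the literal once the compiler has processed its escapes, which are the characters a Regex constructor would receive.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration: count occurrences of ch in s.
std::size_t count_char(const std::string& s, char ch)
{
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), ch));
}

// The literal from the text, as stored after the compiler has decoded it.
// Each "\\" in the source has become one backslash character, and "\t"
// has become one tab character.
const std::string kPattern = "\\(\t+\\)\\|\\(ba\\(na\\)*\\)";
```

The stored string begins with a backslash, an open parenthesis, and a tab; it holds seven single backslashes in total, one for each grouping or alternation operator.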
There are three additional optional arguments to the Regex constructor that are less commonly useful:
fast (default 0)
fast may be set to true (1) if the Regex should be "fast-compiled". This causes an additional compilation step that is generally worthwhile if the Regex will be used many times.
bufsize (default max(40, length of the string))
transtable (default none == 0)
As a convenience, several Regexes are predefined and usable in any program. Here are their declarations from `String.h'.
extern Regex RXwhite;      // = "[ \n\t]+"
extern Regex RXint;        // = "-?[0-9]+"
extern Regex RXdouble;     // = "-?\\(\\([0-9]+\\.[0-9]*\\)\\|
                           //    \\([0-9]+\\)\\|
                           //    \\(\\.[0-9]+\\)\\)
                           //    \\([eE][---+]?[0-9]+\\)?"
extern Regex RXalpha;      // = "[A-Za-z]+"
extern Regex RXlowercase;  // = "[a-z]+"
extern Regex RXuppercase;  // = "[A-Z]+"
extern Regex RXalphanum;   // = "[0-9A-Za-z]+"
extern Regex RXidentifier; // = "[A-Za-z_][A-Za-z0-9_]*"
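For readers without libg++ at hand, three of these patterns can be approximated with std::regex. This is a sketch under stated assumptions: the patterns are hand-translated from Emacs syntax to the ECMAScript syntax that std::regex uses by default (grouping and alternation lose their backslashes), and the names kInt, kDouble, kIdentifier, and matches_whole are hypothetical, not part of `String.h'.

```cpp
#include <regex>
#include <string>

// Assumed ECMAScript translations of RXint, RXdouble, and RXidentifier.
// These are std::regex objects for illustration, not the libg++ globals.
const std::regex kInt("-?[0-9]+");
const std::regex kDouble(
    "-?(([0-9]+\\.[0-9]*)|([0-9]+)|(\\.[0-9]+))([eE][-+]?[0-9]+)?");
const std::regex kIdentifier("[A-Za-z_][A-Za-z0-9_]*");

// True when the whole of text (not just a part of it) matches re.
bool matches_whole(const std::string& text, const std::regex& re)
{
    return std::regex_match(text, re);
}
```

Note that the backslash before each dot is still doubled in the source, for the same reason as in the Regex declarations: the compiler strips one level before the regex engine sees the pattern.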
String class capabilities are best shown via example.
The examples below use the following declarations.
String x = "Hello";
String y = "world";
String n = "123";
String z;
char* s = ",";
String lft, mid, rgt;
Regex r = "e[a-z]*o";
Regex r2("/[a-z]*/");
char c;
int i, pos, len;
double f;
String words[10];
words[0] = "a";
words[1] = "b";
words[2] = "c";
The usual lexicographic relational operators (==, !=, <, <=, >, >=)
are defined. A functional form compare(String, String) is also
provided, as is fcompare(String, String), which compares
Strings without regard for upper vs. lower case.
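The idea behind a case-insensitive comparison can be sketched with std::string. The function below is a hypothetical analogue named fcompare_sketch, not the libg++ implementation, and it assumes a strcmp-style result (negative, zero, or positive), which the text above does not specify.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical analogue of a case-insensitive compare.
// Assumed convention: <0 if a sorts first, 0 if equal, >0 if b sorts first.
int fcompare_sketch(const std::string& a, const std::string& b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Fold both characters to lower case before comparing them.
        const int ca = std::tolower(static_cast<unsigned char>(a[i]));
        const int cb = std::tolower(static_cast<unsigned char>(b[i]));
        if (ca != cb) return ca - cb;
    }
    // All shared positions agree, so the shorter string sorts first.
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}
```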
All other matching and searching operations are based on some form of the
match and search functions. match and search differ in that
match attempts to match only at the given starting position, while
search starts at the position, and
then proceeds left or right looking for a match. As seen in the following
examples, the optional second startpos argument to functions using
search specifies the starting position of the
search: if non-negative, it results in a left-to-right search starting at
startpos, and if negative, a right-to-left search starting at
x.length() + startpos. In all cases, the index returned
is that of the beginning of the match, or -1 if there is no match.
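The startpos convention can be sketched over std::string as follows. index_from is a hypothetical helper, not a libg++ function. It assumes that in a right-to-left search the last character of a match may lie no further right than position length + startpos, and that an empty pattern never matches; the text above fixes neither detail.

```cpp
#include <string>

// Hypothetical helper illustrating the startpos convention described above.
// Returns the index of the beginning of the match, or -1 if there is none.
int index_from(const std::string& text, const std::string& pat, int startpos = 0)
{
    if (text.empty() || pat.empty()) return -1;  // assumption: no empty matches
    const long len = static_cast<long>(text.size());
    const long plen = static_cast<long>(pat.size());

    if (startpos >= 0) {
        // Left-to-right search beginning at startpos.
        if (startpos > len) return -1;
        const std::string::size_type hit =
            text.find(pat, static_cast<std::string::size_type>(startpos));
        return hit == std::string::npos ? -1 : static_cast<int>(hit);
    }

    // Right-to-left search. Assumption: the match must end at or before
    // position len + startpos, so it begins at or before last_begin.
    const long last_end = len + startpos;
    const long last_begin = last_end - (plen - 1);
    if (last_begin < 0) return -1;
    const std::string::size_type hit =
        text.rfind(pat, static_cast<std::string::size_type>(last_begin));
    return hit == std::string::npos ? -1 : static_cast<int>(hit);
}
```

With the text "Hello", a search for "l" yields 2 from startpos 0, 3 from startpos 3, and 3 from startpos -1, since -1 starts the right-to-left search at the final character.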
Three String functions serve as front ends to search and match:
index performs a search, returning the index; matches performs
a match, returning nonzero (actually, the length of the match) on success;
and contains is a boolean function performing either a search or
match, depending on whether an index argument is provided:
pos = r.search("leo", 3, len, 0)
Returns the index of the match of r in the char* string of length 3, starting at position 0, also placing the length of the match in reference parameter len.
The second argument to contains, if present, means to match the substring only at that position, and not to search elsewhere in the string.
RXwhite is a global whitespace Regex.
int f = x.freq("l")
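The difference between matching at a position and searching can be sketched with plain substrings over std::string. The names matches_at and contains_sketch are hypothetical stand-ins chosen for illustration; regular expressions are left out to keep the sketch short.

```cpp
#include <cstddef>
#include <string>

// Match: succeed only if pat occurs exactly at pos.
// Returns the length of the match on success and 0 on failure.
int matches_at(const std::string& text, const std::string& pat, std::size_t pos)
{
    if (pat.empty() || pos > text.size()) return 0;
    return text.compare(pos, pat.size(), pat) == 0
               ? static_cast<int>(pat.size())
               : 0;
}

// Without a position argument: search anywhere in the text.
bool contains_sketch(const std::string& text, const std::string& pat)
{
    return text.find(pat) != std::string::npos;
}

// With a position argument: match only at that position, with no search.
bool contains_sketch(const std::string& text, const std::string& pat,
                     std::size_t pos)
{
    return matches_at(text, pat, pos) != 0;
}
```

For the text "Hello", contains_sketch succeeds for "el" with no position or with position 1, and fails with position 0, because no search is made away from the given position.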
Substrings may be extracted via the at, before, through, from, and
after functions. These behave as either lvalues or rvalues.
z = x.at(2, 3)
x.at(2, 2) = "r"
x.at("He") = "je";
x.at("l", -1) = "i";
z = x.at(r)
z = x.before("o")
x.before("ll") = "Bri";
z = x.before(2)
z = x.after("Hel")
z = x.through("el")
z = x.from("el")
x.after("Hel") = "p";
z = x.after(3)
z = " ab c"; z = z.after(RXwhite)
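The extraction functions can be approximated over std::string as a rough guide to what each one selects. All names below (sub_before, sub_through, sub_from, sub_after, replace_at) are hypothetical; the sketch handles only plain substring arguments and assumes that a failed search yields an empty string. Because std::string has no assignable substring objects, the lvalue use is modelled by an explicit replace.

```cpp
#include <cstddef>
#include <string>

// Everything before the first occurrence of pat.
std::string sub_before(const std::string& s, const std::string& pat)
{
    const std::string::size_type i = s.find(pat);
    return i == std::string::npos ? std::string() : s.substr(0, i);
}

// Everything up to and including the first occurrence of pat.
std::string sub_through(const std::string& s, const std::string& pat)
{
    const std::string::size_type i = s.find(pat);
    return i == std::string::npos ? std::string() : s.substr(0, i + pat.size());
}

// Everything from the start of the first occurrence of pat onward.
std::string sub_from(const std::string& s, const std::string& pat)
{
    const std::string::size_type i = s.find(pat);
    return i == std::string::npos ? std::string() : s.substr(i);
}

// Everything after the first occurrence of pat.
std::string sub_after(const std::string& s, const std::string& pat)
{
    const std::string::size_type i = s.find(pat);
    return i == std::string::npos ? std::string() : s.substr(i + pat.size());
}

// Stand-in for assigning to x.at(pos, len): replace that span in place.
void replace_at(std::string& s, std::size_t pos, std::size_t len,
                const std::string& repl)
{
    s.replace(pos, len, repl);
}
```

Applied to "Hello", sub_before with "o" gives "Hell", sub_through with "el" gives "Hel", sub_from with "el" gives "ello", and sub_after with "Hel" gives "lo". Replacing the two characters at position 2 with "r" turns "Hello" into "Hero".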
x = 'J';
z = x + s + ' ' + y.at("w") + y.after("w") + ".";
x += y;
cat(x, y, z)
cat(z, y, x, x)
z = replicate(x, 3);
z = join(words, 3, "/")
z = "this string has five words"; i = split(z, words, 10, RXwhite);
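The join and split operations can be sketched with the standard library. split_words and join_words are hypothetical names. Unlike the array-based calls above, this version returns a std::vector rather than filling a fixed-size array, and it separates on any run of whitespace as recognised by stream extraction, which is an approximation of RXwhite.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split text into whitespace-separated words.
std::vector<std::string> split_words(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string w;
    while (in >> w) words.push_back(w);  // extraction skips runs of whitespace
    return words;
}

// Concatenate words, placing sep between adjacent elements.
std::string join_words(const std::vector<std::string>& words,
                       const std::string& sep)
{
    std::string out;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i != 0) out += sep;
        out += words[i];
    }
    return out;
}
```

Splitting "this string has five words" yields five elements, and joining "a", "b", and "c" with "/" yields "a/b/c".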
int nmatches = x.gsub("l","ll")
z = x + y; z.del("loworl");
z = reverse(x)
z = upcase(x)
z = downcase(x)
z = capitalize(x)
x.reverse(), x.upcase(), x.downcase(), x.capitalize()
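Global substitution with a match count, as in the gsub example, can be sketched over std::string. gsub_sketch is a hypothetical name; the sketch handles only plain substring patterns and assumes an empty pattern performs no substitutions.

```cpp
#include <string>

// Replace every non-overlapping occurrence of pat in s with repl.
// Returns the number of replacements made.
int gsub_sketch(std::string& s, const std::string& pat, const std::string& repl)
{
    if (pat.empty()) return 0;  // assumption: an empty pattern never matches
    int count = 0;
    std::string::size_type pos = 0;
    while ((pos = s.find(pat, pos)) != std::string::npos) {
        s.replace(pos, pat.size(), repl);
        // Resume after the inserted text so that the replacement itself is
        // never rescanned, even when repl contains pat.
        pos += repl.size();
        ++count;
    }
    return count;
}
```

Starting from "Hello", substituting "ll" for "l" makes two replacements and leaves "Hellllo".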
cout << x
cout << x.at(2, 3)
cin >> x
s = (const char*)x
This cast can be used to extract the char* char array. This coercion is useful for sending a String as an argument to any function expecting a
const char* argument (like
File::open). This operator must be used with care, since the conversion returns a pointer to
String internals without copying the characters: the resulting
(char*) is only valid until the next String operation, and you must not modify it. (The conversion is defined to return a const value so that GNU C++ will produce warning and/or error messages if changes are attempted.)
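The same caution applies to std::string::c_str(), which makes a convenient stand-in for illustrating the rule. In the sketch below, c_length and snapshot are hypothetical names: the first uses the pointer immediately, and the second copies the characters when they must survive later changes to the string.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Use the pointer immediately: safe because s is not modified in between.
std::size_t c_length(const std::string& s)
{
    const char* p = s.c_str();  // points at s's internal buffer, no copy made
    return std::strlen(p);
}

// Copy the characters when the text must outlive later changes to s.
std::string snapshot(const std::string& s)
{
    return std::string(s.c_str());
}
```

A snapshot taken before the original string is appended to still holds the earlier text, whereas a raw pointer obtained at the same moment could no longer be relied upon.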