This class can be used to convert text between multiple representations, e.g. UTF-8 to Unicode.
Public Types | |
enum | Encoding { EIso8859 = 0, EUtf8 = 1, EUnicode = 2 } |
Public Member Functions | |
TextEncoder () | |
TextEncoder (TextEncoder const copy) | |
appendText (string text) | |
Appends the indicated string to the end of the stored text. | |
appendUnicodeChar (int character) | |
Appends a single character to the end of the stored text. | |
appendWtext (string text) | |
Appends the indicated string to the end of the stored wide-character text. | |
clearText () | |
Removes the text from the TextEncoder. | |
string | decodeText (string text) |
Returns the given encoded string decoded to a wide-character string, via the current encoding system. | |
string | encodeWtext (string wtext) |
Encodes a wide-text string into a single-char string, according to the current encoding. | |
string | getEncodedChar (int index) |
Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string. | |
string | getEncodedChar (int index, Encoding encoding) |
Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string, using the indicated encoding. | |
Encoding | getEncoding () |
Returns the encoding by which the string set via setText() is to be interpreted. | |
int | getNumChars () |
Returns the number of characters in the stored text. | |
string | getText () |
Returns the current text, as encoded via the current encoding system. | |
string | getText (Encoding encoding) |
Returns the current text, as encoded via the indicated encoding system. | |
string | getTextAsAscii () |
Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation. | |
int | getUnicodeChar (int index) |
Returns the Unicode value of the nth character in the stored text. | |
string | getWtext () |
Returns the text associated with the TextEncoder, as a wide-character string. | |
string | getWtextAsAscii () |
bool | hasText () |
bool | isWtext () |
makeLower () | |
makeUpper () | |
setEncoding (Encoding encoding) | |
Specifies how the string set via setText() is to be interpreted. | |
setText (string text) | |
Changes the text that is stored in the encoder. | |
setText (string text, Encoding encoding) | |
The two-parameter version of setText() accepts an explicit encoding; the text is immediately decoded and stored as a wide-character string. | |
setUnicodeChar (int index, int character) | |
Sets the Unicode value of the nth character in the stored text. | |
setWtext (string wtext) | |
Changes the text that is stored in the encoder. | |
Static Public Member Functions | |
static string | decodeText (string text, Encoding encoding) |
static string | encodeWchar (wchar_t ch, Encoding encoding) |
static string | encodeWtext (string wtext, Encoding encoding) |
static Encoding | getDefaultEncoding () |
Returns the default encoding to be used for all subsequently created TextEncoder objects. | |
static string | lower (string source) |
Converts the string to lowercase, assuming the string is encoded in the default encoding. | |
static string | lower (string source, Encoding encoding) |
Converts the string to lowercase, assuming the string is encoded in the indicated encoding. | |
static string | reencodeText (string text, Encoding from, Encoding to) |
Given the indicated text string, which is assumed to be encoded via the encoding "from", decodes it and then reencodes it into the encoding "to", and returns the newly encoded string. | |
static | setDefaultEncoding (Encoding encoding) |
Specifies the default encoding to be used for all subsequently created TextEncoder objects. | |
static bool | unicodeIsalpha (int character) |
Returns true if the indicated character is an alphabetic letter, false otherwise. | |
static bool | unicodeIsdigit (int character) |
Returns true if the indicated character is a numeric digit, false otherwise. | |
static bool | unicodeIslower (int character) |
Returns true if the indicated character is a lowercase letter, false otherwise. | |
static bool | unicodeIspunct (int character) |
Returns true if the indicated character is a punctuation mark, false otherwise. | |
static bool | unicodeIsspace (int character) |
Returns true if the indicated character is a whitespace character, false otherwise. | |
static bool | unicodeIsupper (int character) |
Returns true if the indicated character is an uppercase letter, false otherwise. | |
static int | unicodeTolower (int character) |
Returns the lowercase equivalent of the given Unicode character. | |
static int | unicodeToupper (int character) |
Returns the uppercase equivalent of the given Unicode character. | |
static string | upper (string source) |
Converts the string to uppercase, assuming the string is encoded in the default encoding. | |
static string | upper (string source, Encoding encoding) |
Converts the string to uppercase, assuming the string is encoded in the indicated encoding. |
This class can be used to convert text between multiple representations, e.g. UTF-8 to Unicode. You may use it as a static class object, passing the encoding each time, or you may create an instance and use that object, which will record the current encoding and retain the current string.
This class is also a base class of TextNode, which inherits this functionality.
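The instance-style usage described above can be pictured with a small stand-in written against Python's built-in codecs. This is only a sketch: `MiniEncoder`, its method names, and the `CODECS` mapping are hypothetical and are not part of the TextEncoder API. It shows an object that records an encoding and retains the decoded (wide) string.

```python
# Hypothetical stand-in for illustration; this is not the TextEncoder API.
CODECS = {"iso8859": "latin-1", "utf8": "utf-8"}  # assumed codec names


class MiniEncoder:
    """Remembers the current encoding and keeps the decoded text."""

    def __init__(self, encoding="iso8859"):
        self._encoding = encoding
        self._wtext = ""  # decoded ("wide") text

    def set_encoding(self, encoding):
        self._encoding = encoding

    def set_text(self, data, encoding=None):
        # Decode with the explicit encoding if given, else the current one.
        self._wtext = data.decode(CODECS[encoding or self._encoding])

    def get_text(self, encoding=None):
        # Re-encode the stored wide text on the way out.
        return self._wtext.encode(CODECS[encoding or self._encoding])

    def get_wtext(self):
        return self._wtext


enc = MiniEncoder("utf8")
enc.set_text(b"caf\xc3\xa9")       # UTF-8 bytes for "café"
wide = enc.get_wtext()             # decoded text
latin1 = enc.get_text("iso8859")   # same text, re-encoded as Latin-1
```

Decoding once on input and re-encoding on output keeps a single canonical copy of the text, so the same content can be read back in any supported encoding.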
enum Encoding |
TextEncoder | ( | ) |
TextEncoder | ( | TextEncoder const | copy | ) |
appendText | ( | string | text | ) |
Appends the indicated string to the end of the stored text.
Reimplemented in TextNode.
appendUnicodeChar | ( | int | character | ) |
Appends a single character to the end of the stored text.
This may be a wide character, up to 16 bits in Unicode.
appendWtext | ( | string | text | ) |
Appends the indicated string to the end of the stored wide-character text.
Reimplemented in TextNode.
clearText | ( | ) |
Removes the text from the TextEncoder.
Reimplemented in TextNode.
string decodeText | ( | string | text | ) |
Returns the given encoded string decoded to a wide-character string, via the current encoding system.
static string decodeText | ( | string | text, |
Encoding | encoding | ||
) | [static] |
static string encodeWchar | ( | wchar_t | ch, |
Encoding | encoding | ||
) | [static] |
string encodeWtext | ( | string | wtext | ) |
Encodes a wide-text string into a single-char string, according to the current encoding.
static string encodeWtext | ( | string | wtext, |
Encoding | encoding | ||
) | [static] |
static Encoding getDefaultEncoding | ( | ) | [static] |
Returns the default encoding to be used for all subsequently created TextEncoder objects.
See setEncoding().
string getEncodedChar | ( | int | index | ) |
Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string.
string getEncodedChar | ( | int | index, |
Encoding | encoding | ||
) |
Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string, using the indicated encoding.
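To see why a single character can come back as one, two, or three bytes, here is a small sketch that uses Python's UTF-8 codec as a stand-in. The helper `encoded_char` is hypothetical and is not the library's implementation.

```python
def encoded_char(wtext: str, index: int, codec: str = "utf-8") -> bytes:
    """Encode only the character at `index` (hypothetical helper)."""
    return wtext[index].encode(codec)


text = "a\u00e9\u20ac"  # 'a', 'e' with acute accent, euro sign
sizes = [len(encoded_char(text, i)) for i in range(len(text))]
# In UTF-8 these three characters take 1, 2, and 3 bytes respectively.
```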
Encoding getEncoding | ( | ) |
Returns the encoding by which the string set via setText() is to be interpreted.
See setEncoding().
int getNumChars | ( | ) |
Returns the number of characters in the stored text.
This is a count of wide characters, after the string has been decoded according to setEncoding().
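The difference between counting decoded characters and counting encoded bytes can be checked with plain Python. This is an illustrative sketch only; `num_chars` is a hypothetical helper, not a TextEncoder method.

```python
def num_chars(data: bytes, codec: str) -> int:
    """Count characters after decoding, not raw bytes (hypothetical helper)."""
    return len(data.decode(codec))


data = "na\u00efve".encode("utf-8")    # "naive" with a diaeresis on the i
byte_count = len(data)                 # counts encoded bytes
char_count = num_chars(data, "utf-8")  # counts decoded characters
```

Here the accented letter occupies two bytes in UTF-8, so the byte count is one higher than the character count.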
string getText | ( | ) |
Returns the current text, as encoded via the current encoding system.
string getText | ( | Encoding | encoding | ) |
Returns the current text, as encoded via the indicated encoding system.
string getTextAsAscii | ( | ) |
Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation.
This means replacing accented letters with their unaccented ASCII equivalents.
It is possible that some characters in the string cannot be converted to ASCII. (The string may involve symbols like the copyright symbol, for instance, or it might involve letters in some other alphabet such as Greek or Cyrillic, or even Latin letters like thorn or eth that are not part of the ASCII character set.) In this case, as much of the string as possible will be converted to ASCII, and the nonconvertible characters will remain encoded in the encoding specified by setEncoding().
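The accent-stripping behavior can be approximated with Python's `unicodedata` module. This is a sketch under the assumption that canonical decomposition is an acceptable stand-in for whatever table the library uses internally. `text_as_ascii` is a hypothetical name, and the sketch works on decoded text rather than on encoded bytes.

```python
import unicodedata


def text_as_ascii(wtext: str) -> str:
    """Replace accented letters with ASCII bases; keep anything else as-is."""
    out = []
    for ch in wtext:
        if ord(ch) < 128:
            out.append(ch)
            continue
        # Decompose (e.g. e-acute -> 'e' + combining accent), drop the marks.
        base = "".join(
            c for c in unicodedata.normalize("NFD", ch)
            if not unicodedata.combining(c)
        )
        # Characters with no ASCII base (copyright sign, thorn) are unchanged.
        out.append(base if base and base.isascii() else ch)
    return "".join(out)


folded = text_as_ascii("caf\u00e9 \u00a9 \u00de\u00f3r")
```

The accented vowels fold to plain letters, while the copyright sign and thorn have no ASCII equivalent and pass through unchanged.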
int getUnicodeChar | ( | int | index | ) |
Returns the Unicode value of the nth character in the stored text.
This may be a wide character (greater than 255), after the string has been decoded according to setEncoding().
string getWtext | ( | ) |
Returns the text associated with the TextEncoder, as a wide-character string.
string getWtextAsAscii | ( | ) |
bool hasText | ( | ) |
bool isWtext | ( | ) |
static string lower | ( | string | source | ) | [static] |
Converts the string to lowercase, assuming the string is encoded in the default encoding.
static string lower | ( | string | source, |
Encoding | encoding | ||
) | [static] |
Converts the string to lowercase, assuming the string is encoded in the indicated encoding.
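The reason lower() needs to know the encoding is that case conversion applies to characters, not bytes. A minimal sketch with Python's codecs follows. `lower_encoded` is a hypothetical helper that decodes, lowercases, and re-encodes. It is not the library's implementation.

```python
def lower_encoded(source: bytes, codec: str) -> bytes:
    """Decode, lowercase the characters, then re-encode (hypothetical helper)."""
    return source.decode(codec).lower().encode(codec)


upper_bytes = "\u00c9COLE".encode("utf-8")   # "ECOLE" with an accented E
aware = lower_encoded(upper_bytes, "utf-8")  # lowercases the accented E too
naive = upper_bytes.lower()                  # bytes.lower() only touches ASCII
```

The byte-level conversion leaves the two-byte accented capital untouched, which is why the encoding has to be taken into account.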
makeLower | ( | ) |
makeUpper | ( | ) |
static string reencodeText | ( | string | text, |
Encoding | from, | ||
Encoding | to | ||
) | [static] |
Given the indicated text string, which is assumed to be encoded via the encoding "from", decodes it and then reencodes it into the encoding "to", and returns the newly encoded string.
This does not change or affect any properties on the TextEncoder itself.
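The decode-then-reencode step can be sketched as a pure function using Python's codec names. `reencode_text` here is a hypothetical stand-in that takes codec names as strings rather than Encoding enum values.

```python
def reencode_text(text: bytes, src: str, dst: str) -> bytes:
    """Decode from `src`, then encode into `dst`; no state is kept."""
    return text.decode(src).encode(dst)


latin1 = b"caf\xe9"                                # "café" in Latin-1
utf8 = reencode_text(latin1, "latin-1", "utf-8")   # same text in UTF-8
round_trip = reencode_text(utf8, "utf-8", "latin-1")
```

Because the function only reads its arguments, calling it has no effect on any stored text or encoding, matching the note above.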
static setDefaultEncoding | ( | Encoding | encoding | ) | [static] |
Specifies the default encoding to be used for all subsequently created TextEncoder objects.
See setEncoding().
setEncoding | ( | Encoding | encoding | ) |
Specifies how the string set via setText() is to be interpreted.
The default, EIso8859, means a standard string with one-byte characters (e.g. ASCII or Latin-1). Other encodings are possible to take advantage of character sets with more than 256 characters.
This affects only future calls to setText(); it does not change text that was set previously.
setText | ( | string | text | ) |
Changes the text that is stored in the encoder.
The text should be encoded according to the method indicated by setEncoding(). Subsequent calls to getText() will return this same string, while getWtext() will return the decoded version of the string.
Reimplemented in TextNode.
setText | ( | string | text, |
Encoding | encoding | ||
) |
The two-parameter version of setText() accepts an explicit encoding; the text is immediately decoded and stored as a wide-character string.
Subsequent calls to getText() will return the same text re-encoded using whichever encoding is specified by setEncoding().
Reimplemented in TextNode.
setUnicodeChar | ( | int | index, |
int | character | ||
) |
Sets the Unicode value of the nth character in the stored text.
This may be a wide character (greater than 255), after the string has been decoded according to setEncoding().
setWtext | ( | string | wtext | ) |
Changes the text that is stored in the encoder.
Subsequent calls to getWtext() will return this same string, while getText() will return the encoded version of the string.
Reimplemented in TextNode.
static bool unicodeIsalpha | ( | int | character | ) | [static] |
Returns true if the indicated character is an alphabetic letter, false otherwise.
This is akin to ctype's isalpha(), extended to Unicode.
static bool unicodeIsdigit | ( | int | character | ) | [static] |
Returns true if the indicated character is a numeric digit, false otherwise.
This is akin to ctype's isdigit(), extended to Unicode.
static bool unicodeIslower | ( | int | character | ) | [static] |
Returns true if the indicated character is a lowercase letter, false otherwise.
This is akin to ctype's islower(), extended to Unicode.
static bool unicodeIspunct | ( | int | character | ) | [static] |
Returns true if the indicated character is a punctuation mark, false otherwise.
This is akin to ctype's ispunct(), extended to Unicode.
static bool unicodeIsspace | ( | int | character | ) | [static] |
Returns true if the indicated character is a whitespace character, false otherwise.
This is akin to ctype's isspace(), extended to Unicode.
static bool unicodeIsupper | ( | int | character | ) | [static] |
Returns true if the indicated character is an uppercase letter, false otherwise.
This is akin to ctype's isupper(), extended to Unicode.
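For comparison, the same kind of code-point classification can be sketched with Python's built-in string predicates. The function names below are hypothetical stand-ins. Results may differ from the library's own tables for some characters, and treating Unicode general category P* as "punctuation" is an assumption (ctype's ispunct() also accepts symbols such as `$`).

```python
import unicodedata


def unicode_isalpha(code: int) -> bool:
    return chr(code).isalpha()


def unicode_isdigit(code: int) -> bool:
    return chr(code).isdigit()


def unicode_isspace(code: int) -> bool:
    return chr(code).isspace()


def unicode_isupper(code: int) -> bool:
    return chr(code).isupper()


def unicode_islower(code: int) -> bool:
    return chr(code).islower()


def unicode_ispunct(code: int) -> bool:
    # Assumption: "punctuation" means Unicode general category P*.
    return unicodedata.category(chr(code)).startswith("P")


greek_alpha_is_letter = unicode_isalpha(0x3B1)   # GREEK SMALL LETTER ALPHA
inverted_question = unicode_ispunct(0xBF)        # INVERTED QUESTION MARK
```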
static int unicodeTolower | ( | int | character | ) | [static] |
Returns the lowercase equivalent of the given Unicode character.
This is akin to ctype's tolower(), extended to Unicode.
static int unicodeToupper | ( | int | character | ) | [static] |
Returns the uppercase equivalent of the given Unicode character.
This is akin to ctype's toupper(), extended to Unicode.
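A code-point-to-code-point case mapping can be sketched with Python's `str.upper()` and `str.lower()`. The helper names are hypothetical. One design choice here is an assumption rather than documented library behavior: characters whose case mapping expands to more than one character (such as the German sharp s) are returned unchanged, since the result must be a single integer.

```python
def unicode_toupper(code: int) -> int:
    up = chr(code).upper()
    # Assumption: keep the original when the mapping is not one-to-one.
    return ord(up) if len(up) == 1 else code


def unicode_tolower(code: int) -> int:
    low = chr(code).lower()
    return ord(low) if len(low) == 1 else code


upper_e_acute = unicode_toupper(0xE9)   # e-acute -> E-acute
lower_e_acute = unicode_tolower(0xC9)   # E-acute -> e-acute
sharp_s = unicode_toupper(0xDF)         # sharp s has no single-char uppercase here
```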
static string upper | ( | string | source | ) | [static] |
Converts the string to uppercase, assuming the string is encoded in the default encoding.