This class can be used to convert text between multiple representations, e.g. UTF-8 to UTF-16. More...

#include "textEncoder.h"

Public Types
enum	Encoding { E_iso8859, E_utf8, E_utf16be, E_unicode = E_utf16be }

Public Member Functions
	TextEncoder (const TextEncoder &copy)

void	append_text (const std::string &text)
	Appends the indicated string to the end of the stored text. More...

void	append_unicode_char (char32_t character)
	Appends a single character to the end of the stored text. More...

void	append_wtext (const std::wstring &text)
	Appends the indicated string to the end of the stored wide-character text. More...

void	clear_text ()
	Removes the text from the TextEncoder. More...

std::wstring	decode_text (const std::string &text) const
	Returns the given encoded string decoded to a wide-character string, via the current encoding system. More...

std::string	encode_wtext (const std::wstring &wtext) const
	Encodes a wide-text string into a single-char string, according to the current encoding. More...

std::string	get_encoded_char (size_t index) const
	Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string. More...

std::string	get_encoded_char (size_t index, Encoding encoding) const
	Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string. More...

Encoding	get_encoding () const
	Returns the encoding by which the string set via set_text() is to be interpreted. More...

size_t	get_num_chars () const
	Returns the number of characters in the stored text. More...

std::string	get_text () const

std::string	get_text (Encoding encoding) const

std::string	get_text_as_ascii () const
	Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation. More...

int	get_unicode_char (size_t index) const
	Returns the Unicode value of the nth character in the stored text. More...

const std::wstring &	get_wtext () const
	Returns the text associated with the TextEncoder, as a wide-character string. More...

std::wstring	get_wtext_as_ascii () const
	Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation. More...

bool	has_text () const

bool	is_wtext () const
	Returns true if any of the characters in the string returned by get_wtext() are out of the range of an ASCII character (and, therefore, get_wtext() should be called in preference to get_text()). More...

void	make_lower ()
	Adjusts the text stored within the encoder to all lowercase letters (preserving accent marks correctly). More...

void	make_upper ()
	Adjusts the text stored within the encoder to all uppercase letters (preserving accent marks correctly). More...

void	set_encoding (Encoding encoding)
	Specifies how the string set via set_text() is to be interpreted. More...

void	set_text (const std::string &text)

void	set_text (const std::string &text, Encoding encoding)

void	set_unicode_char (size_t index, char32_t character)
	Sets the Unicode value of the nth character in the stored text. More...

void	set_wtext (const std::wstring &wtext)
	Changes the text that is stored in the encoder. More...

Static Public Member Functions
static std::wstring	decode_text (const std::string &text, Encoding encoding)
	Returns the given encoded string decoded to a wide-character string, via the given encoding system. More...

static std::string	encode_wchar (char32_t ch, Encoding encoding)
	Encodes a single Unicode character into a one-, two-, three-, or four-byte string, according to the given encoding system. More...

static std::string	encode_wtext (const std::wstring &wtext, Encoding encoding)
	Encodes a wide-text string into a single-char string, according to the given encoding. More...

static Encoding	get_default_encoding ()

static std::string	lower (const std::string &source)
	Converts the string to lowercase, assuming the string is encoded in the default encoding. More...

static std::string	lower (const std::string &source, Encoding encoding)
	Converts the string to lowercase, assuming the string is encoded in the indicated encoding. More...

static std::string	reencode_text (const std::string &text, Encoding from, Encoding to)
	Given the indicated text string, which is assumed to be encoded via the encoding "from", decodes it and then reencodes it into the encoding "to", and returns the newly encoded string. More...

static void	set_default_encoding (Encoding encoding)

static bool	unicode_isalpha (char32_t character)
	Returns true if the indicated character is an alphabetic letter, false otherwise. More...

static bool	unicode_isdigit (char32_t character)
	Returns true if the indicated character is a numeric digit, false otherwise. More...

static bool	unicode_islower (char32_t character)
	Returns true if the indicated character is a lowercase letter, false otherwise. More...

static bool	unicode_ispunct (char32_t character)
	Returns true if the indicated character is a punctuation mark, false otherwise. More...

static bool	unicode_isspace (char32_t character)
	Returns true if the indicated character is a whitespace character, false otherwise. More...

static bool	unicode_isupper (char32_t character)
	Returns true if the indicated character is an uppercase letter, false otherwise. More...

static int	unicode_tolower (char32_t character)
	Returns the lowercase equivalent of the given Unicode character. More...

static int	unicode_toupper (char32_t character)
	Returns the uppercase equivalent of the given Unicode character. More...

static std::string	upper (const std::string &source)
	Converts the string to uppercase, assuming the string is encoded in the default encoding. More...

static std::string	upper (const std::string &source, Encoding encoding)
	Converts the string to uppercase, assuming the string is encoded in the indicated encoding. More...

Public Attributes
	get_default_encoding
	Returns the default encoding used for all subsequently created TextEncoder objects. More...

	get_text
	Returns the current text, as encoded via the current encoding system. More...

	set_default_encoding
	Specifies the default encoding to be used for all subsequently created TextEncoder objects. More...

	set_text
	Changes the text that is stored in the encoder. More...

Detailed Description

This class can be used to convert text between multiple representations, e.g. UTF-8 to UTF-16. You may use it as a static class object, passing the encoding each time, or you may create an instance and use that object, which will record the current encoding and retain the current string.

This class is also a base class of TextNode, which inherits this functionality.

Definition at line 33 of file textEncoder.h.

Member Function Documentation

◆ append_text()

void TextEncoder::append_text ( const std::string & text )

inline

Appends the indicated string to the end of the stored text.

Definition at line 159 of file textEncoder.I.

References get_text.

◆ append_unicode_char()

void TextEncoder::append_unicode_char ( char32_t character )

inline

Appends a single character to the end of the stored text.

This may be a wide character, up to 16 bits in Unicode.

Definition at line 172 of file textEncoder.I.

References get_wtext().

◆ append_wtext()

void TextEncoder::append_wtext ( const std::wstring & text )

inline

Appends the indicated string to the end of the stored wide-character text.

Definition at line 468 of file textEncoder.I.

References get_wtext().

◆ clear_text()

void TextEncoder::clear_text ( )

inline

Removes the text from the TextEncoder.

Definition at line 116 of file textEncoder.I.

◆ decode_text() [1/2]

std::wstring TextEncoder::decode_text ( const std::string & text ) const

inline

Returns the given encoded string decoded to a wide-character string, via the current encoding system.

Definition at line 490 of file textEncoder.I.

Referenced by TextNode::calc_width(), get_wtext(), ButtonEvent::read_datagram(), reencode_text(), and WinGraphicsWindow::set_properties_now().

◆ decode_text() [2/2]

std::wstring TextEncoder::decode_text	(	const std::string &	text,
		TextEncoder::Encoding	encoding
	)

static

Returns the given encoded string decoded to a wide-character string, via the given encoding system.

Definition at line 222 of file textEncoder.cxx.
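
As a rough illustration of what decoding a byte string into wide characters involves, the following standalone sketch decodes UTF-8 into Unicode code points. It is not Panda3D's implementation: the function name utf8_decode is hypothetical, std::u32string is used instead of std::wstring so that the result does not depend on the platform's wchar_t width, and the choice to substitute U+FFFD for malformed input is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only: decode a UTF-8 byte string
// into Unicode code points. Malformed or truncated sequences become U+FFFD.
// Overlong forms and surrogate code points are not rejected here.
std::u32string utf8_decode(const std::string &text) {
  const char32_t replacement = 0xFFFD;
  std::u32string out;
  std::size_t i = 0;
  while (i < text.size()) {
    unsigned char lead = static_cast<unsigned char>(text[i++]);
    char32_t cp = 0;
    int extra = 0;  // number of continuation bytes expected
    if (lead < 0x80) {
      cp = lead;                                   // 0xxxxxxx
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F; extra = 1;                 // 110xxxxx
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F; extra = 2;                 // 1110xxxx
    } else if ((lead & 0xF8) == 0xF0) {
      cp = lead & 0x07; extra = 3;                 // 11110xxx
    } else {
      out += replacement;                          // stray byte
      continue;
    }
    bool ok = true;
    for (int k = 0; k < extra; ++k) {
      if (i >= text.size()) { ok = false; break; }
      unsigned char next = static_cast<unsigned char>(text[i]);
      if ((next & 0xC0) != 0x80) { ok = false; break; }
      cp = (cp << 6) | (next & 0x3F);              // append 6 payload bits
      ++i;
    }
    out += ok ? cp : replacement;
  }
  return out;
}
```

With this sketch, the two bytes 0xC3 0xA9 decode to the single code point U+00E9, which is the kind of byte-to-character mapping decode_text() is documented to perform for a UTF-8 encoding.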

◆ encode_wchar()

std::string TextEncoder::encode_wchar	(	char32_t	ch,
		TextEncoder::Encoding	encoding
	)

static

Encodes a single Unicode character into a one-, two-, three-, or four-byte string, according to the given encoding system.

Definition at line 116 of file textEncoder.cxx.

References UnicodeLatinMap::look_up().
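
To make the "one-, two-, three-, or four-byte" wording concrete, here is a standalone sketch of how a single code point maps to UTF-8. It is not taken from textEncoder.cxx; the name utf8_encode_char is hypothetical, and the sketch assumes the input is a valid Unicode scalar value (no range or surrogate checks).

```cpp
#include <string>

// Hypothetical helper, for illustration only: encode one Unicode code
// point as a UTF-8 byte sequence of one to four bytes.
// Assumes ch is a valid scalar value; no validation is performed.
std::string utf8_encode_char(char32_t ch) {
  std::string out;
  if (ch < 0x80) {
    // One byte: 0xxxxxxx
    out += static_cast<char>(ch);
  } else if (ch < 0x800) {
    // Two bytes: 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (ch >> 6));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  } else if (ch < 0x10000) {
    // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xE0 | (ch >> 12));
    out += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  } else {
    // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xF0 | (ch >> 18));
    out += static_cast<char>(0x80 | ((ch >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (ch & 0x3F));
  }
  return out;
}
```

The byte count depends only on the magnitude of the code point: ASCII stays one byte, U+00E9 takes two, U+20AC takes three, and anything above U+FFFF takes four.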

◆ encode_wtext() [1/2]

std::string TextEncoder::encode_wtext ( const std::wstring & wtext ) const

inline

Encodes a wide-text string into a single-char string, according to the current encoding.

Definition at line 481 of file textEncoder.I.

Referenced by MouseWatcherParameter::get_candidate_string_encoded(), get_encoded_char(), get_text_as_ascii(), TextNode::get_wordwrapped_text(), reencode_text(), and ButtonEvent::write_datagram().

◆ encode_wtext() [2/2]

std::string TextEncoder::encode_wtext	(	const std::wstring &	wtext,
		TextEncoder::Encoding	encoding
	)

static

Encodes a wide-text string into a single-char string, according to the given encoding.

Definition at line 190 of file textEncoder.cxx.
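
For the E_utf16be case, encoding a wide string amounts to writing each 16-bit unit with its high byte first. The standalone sketch below shows one way to do that. It is an illustration only and not Panda3D's code: utf16be_encode is a hypothetical name, the input is a std::u32string of code points, and the use of surrogate pairs for code points above U+FFFF is an assumption of this sketch, since this page does not say how TextEncoder treats them.

```cpp
#include <string>

// Hypothetical helper, for illustration only: encode code points as
// UTF-16 big-endian bytes. Code points above U+FFFF are written as a
// surrogate pair (an assumption of this sketch).
std::string utf16be_encode(const std::u32string &wtext) {
  std::string out;
  // Append one 16-bit unit, most significant byte first.
  auto put16 = [&out](unsigned int unit) {
    out += static_cast<char>((unit >> 8) & 0xFF);
    out += static_cast<char>(unit & 0xFF);
  };
  for (char32_t cp : wtext) {
    if (cp < 0x10000) {
      put16(cp);
    } else {
      char32_t v = cp - 0x10000;          // 20-bit value to split
      put16(0xD800 | (v >> 10));          // high surrogate
      put16(0xDC00 | (v & 0x3FF));        // low surrogate
    }
  }
  return out;
}
```

Note that the result can contain zero bytes (for example, "A" becomes 0x00 0x41), so it must be handled as a length-counted std::string rather than a C string.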

◆ get_encoded_char() [1/2]

std::string TextEncoder::get_encoded_char ( size_t index ) const

inline

Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string.

Definition at line 237 of file textEncoder.I.

References get_encoding().

◆ get_encoded_char() [2/2]

std::string TextEncoder::get_encoded_char	(	size_t	index,
		TextEncoder::Encoding	encoding
	)		const

inline

Returns the nth char of the stored text, as a one-, two-, or three-byte encoded string.

Definition at line 246 of file textEncoder.I.

References encode_wtext(), and get_unicode_char().

◆ get_encoding()

TextEncoder::Encoding TextEncoder::get_encoding ( ) const

inline

Returns the encoding by which the string set via set_text() is to be interpreted.

See set_encoding().

Definition at line 60 of file textEncoder.I.

Referenced by get_encoded_char().

◆ get_num_chars()

size_t TextEncoder::get_num_chars ( ) const

inline

Returns the number of characters in the stored text.

This is a count of wide characters, after the string has been decoded according to set_encoding().

Definition at line 199 of file textEncoder.I.

References get_wtext().
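
The distinction between bytes and decoded characters can be seen with a small standalone sketch. It counts the characters in a UTF-8 string by skipping continuation bytes. The name utf8_num_chars is hypothetical, the input is assumed to be well-formed UTF-8, and this is not how TextEncoder computes its count (per the text above, it decodes to wide characters first).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only: count code points in a
// well-formed UTF-8 string. Continuation bytes have the form 10xxxxxx,
// so every other byte starts a new character.
std::size_t utf8_num_chars(const std::string &text) {
  std::size_t count = 0;
  for (unsigned char byte : text) {
    if ((byte & 0xC0) != 0x80) {
      ++count;
    }
  }
  return count;
}
```

For the UTF-8 bytes of "café" (five bytes, because the final letter takes two), this returns 4, which is the kind of count get_num_chars() describes.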

◆ get_text_as_ascii()

std::string TextEncoder::get_text_as_ascii ( ) const

inline

Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation.

This means replacing accented letters with their unaccented ASCII equivalents.

It is possible that some characters in the string cannot be converted to ASCII. (The string may involve symbols like the copyright symbol, for instance, or it might involve letters in some other alphabet such as Greek or Cyrillic, or even Latin letters like thorn or eth that are not part of the ASCII character set.) In this case, as much of the string as possible will be converted to ASCII, and the nonconvertible characters will remain encoded in the encoding specified by set_encoding().

Definition at line 265 of file textEncoder.I.

References encode_wtext(), and get_wtext_as_ascii().

◆ get_unicode_char()

int TextEncoder::get_unicode_char ( size_t index ) const

inline

Returns the Unicode value of the nth character in the stored text.

This may be a wide character (greater than 255), after the string has been decoded according to set_encoding().

Definition at line 209 of file textEncoder.I.

References get_wtext().

Referenced by get_encoded_char().

◆ get_wtext()

const std::wstring & TextEncoder::get_wtext ( ) const

inline

Returns the text associated with the TextEncoder, as a wide-character string.

Definition at line 456 of file textEncoder.I.

References decode_text().

Referenced by append_unicode_char(), append_wtext(), PNMTextMaker::calc_width(), PNMTextMaker::generate_into(), Filename::get_fullpath_w(), get_num_chars(), get_unicode_char(), get_wtext_as_ascii(), is_wtext(), make_lower(), make_upper(), set_encoding(), set_unicode_char(), and Filename::to_os_specific_w().

◆ get_wtext_as_ascii()

std::wstring TextEncoder::get_wtext_as_ascii ( ) const

Returns the text associated with the node, converted as nearly as possible to a fully-ASCII representation.

This means replacing accented letters with their unaccented ASCII equivalents.

It is possible that some characters in the string cannot be converted to ASCII. (The string may involve symbols like the copyright symbol, for instance, or it might involve letters in some other alphabet such as Greek or Cyrillic, or even Latin letters like thorn or eth that are not part of the ASCII character set.) In this case, as much of the string as possible will be converted to ASCII, and the nonconvertible characters will remain in their original form.

Definition at line 70 of file textEncoder.cxx.

References get_wtext(), and UnicodeLatinMap::look_up().

Referenced by get_text_as_ascii().
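
The behavior described above (accented letters replaced, everything else left alone) can be sketched with a lookup table. The following standalone example is only an illustration: fold_to_ascii and its five-entry table are hypothetical, whereas the real class consults UnicodeLatinMap, whose contents are not reproduced here.

```cpp
#include <map>
#include <string>

// Hypothetical helper, for illustration only: replace accented letters
// with unaccented ASCII equivalents using a tiny hand-written table.
// Characters with no table entry are passed through unchanged.
std::u32string fold_to_ascii(const std::u32string &wtext) {
  static const std::map<char32_t, char32_t> table = {
    {0x00C9, U'E'},  // E with acute
    {0x00E8, U'e'},  // e with grave
    {0x00E9, U'e'},  // e with acute
    {0x00F1, U'n'},  // n with tilde
    {0x00FC, U'u'},  // u with diaeresis
  };
  std::u32string out;
  for (char32_t ch : wtext) {
    auto it = table.find(ch);
    out += (it != table.end()) ? it->second : ch;
  }
  return out;
}
```

In this sketch "mañana" becomes "manana", while the copyright sign U+00A9 and thorn U+00FE have no entry and come back unchanged, mirroring the "remain in their original form" rule.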

◆ is_wtext()

bool TextEncoder::is_wtext ( ) const

Returns true if any of the characters in the string returned by get_wtext() are out of the range of an ASCII character (and, therefore, get_wtext() should be called in preference to get_text()).

Definition at line 99 of file textEncoder.cxx.

References get_wtext().
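
A check of this kind reduces to asking whether any character lies above 0x7F. The standalone sketch below shows the idea; has_non_ascii is a hypothetical name, and the code is not copied from textEncoder.cxx.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper, for illustration only: true if any character in
// the wide string is outside the 7-bit ASCII range. The cast to an
// unsigned type makes the test independent of wchar_t's signedness.
bool has_non_ascii(const std::wstring &wtext) {
  return std::any_of(wtext.begin(), wtext.end(), [](wchar_t ch) {
    return static_cast<unsigned long>(ch) > 0x7F;
  });
}
```

A caller could use such a test to decide between a narrow-string and a wide-string code path, which is the decision the description above attributes to is_wtext().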

◆ lower() [1/2]

std::string TextEncoder::lower ( const std::string & source )

inline static

Converts the string to lowercase, assuming the string is encoded in the default encoding.

Definition at line 420 of file textEncoder.I.

References get_default_encoding.

◆ lower() [2/2]

std::string TextEncoder::lower	(	const std::string &	source,
		TextEncoder::Encoding	encoding
	)

inline static

Converts the string to lowercase, assuming the string is encoded in the indicated encoding.

Definition at line 429 of file textEncoder.I.

References get_text, make_lower(), set_encoding(), and set_text.

◆ make_lower()

void TextEncoder::make_lower ( )

Adjusts the text stored within the encoder to all lowercase letters (preserving accent marks correctly).

Definition at line 46 of file textEncoder.cxx.

References get_wtext(), and unicode_tolower().

Referenced by lower().

◆ make_upper()

void TextEncoder::make_upper ( )

Adjusts the text stored within the encoder to all uppercase letters (preserving accent marks correctly).

Definition at line 31 of file textEncoder.cxx.

References get_wtext(), and unicode_toupper().

Referenced by upper().

◆ reencode_text()

std::string TextEncoder::reencode_text	(	const std::string &	text,
		TextEncoder::Encoding	from,
		TextEncoder::Encoding	to
	)

inline static

Given the indicated text string, which is assumed to be encoded via the encoding "from", decodes it and then reencodes it into the encoding "to", and returns the newly encoded string.

This does not change or affect any properties on the TextEncoder itself.

Definition at line 276 of file textEncoder.I.

References decode_text(), and encode_wtext().
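
The decode-then-reencode pattern described above can be illustrated for one fixed pair of encodings. The standalone sketch below converts ISO 8859-1 (Latin-1) bytes to UTF-8. It is not Panda3D's code: latin1_to_utf8 is a hypothetical name, and treating E_iso8859 as Latin-1 is an assumption, since this page does not say which part of ISO 8859 is meant.

```cpp
#include <string>

// Hypothetical helper, for illustration only: re-encode Latin-1 bytes
// as UTF-8 by decoding each byte to a code point and encoding it again.
std::string latin1_to_utf8(const std::string &text) {
  std::string out;
  for (unsigned char byte : text) {
    // Decode step: in Latin-1 each byte value is the code point itself.
    char32_t cp = byte;
    // Encode step: code points below 0x80 stay one byte in UTF-8;
    // the rest of the Latin-1 range (0x80..0xFF) needs two bytes.
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

Like reencode_text(), this is a pure function of its input: it keeps no state and returns a newly encoded string.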

◆ set_encoding()

void TextEncoder::set_encoding ( TextEncoder::Encoding encoding )

inline

Specifies how the string set via set_text() is to be interpreted.

The default, E_iso8859, means a standard string with one-byte characters (ASCII plus the upper half of the 8-bit range). Other encodings are possible to take advantage of character sets with more than 256 characters.

This affects only future calls to set_text(); it does not change text that was set previously.

Definition at line 48 of file textEncoder.I.

References get_text, and get_wtext().

Referenced by Filename::from_os_specific_w(), Filename::get_fullpath_w(), lower(), Filename::scan_directory(), Filename::to_os_long_name(), Filename::to_os_short_name(), Filename::to_os_specific_w(), and upper().

◆ set_unicode_char()

void TextEncoder::set_unicode_char	(	size_t	index,
		char32_t	character
	)

inline

Sets the Unicode value of the nth character in the stored text.

This may be a wide character (greater than 255), after the string has been decoded according to set_encoding().

Definition at line 223 of file textEncoder.I.

References get_wtext().

◆ set_wtext()

void TextEncoder::set_wtext ( const std::wstring & wtext )

inline

Changes the text that is stored in the encoder.

Subsequent calls to get_wtext() will return this same string, while get_text() will return the encoded version of the string.

Definition at line 443 of file textEncoder.I.

Referenced by Filename::from_os_specific_w(), operator<<(), Filename::scan_directory(), Filename::to_os_long_name(), and Filename::to_os_short_name().

◆ unicode_isalpha()

bool TextEncoder::unicode_isalpha ( char32_t character )

inline static

Returns true if the indicated character is an alphabetic letter, false otherwise.

This is akin to ctype's isalpha(), extended to Unicode.

Definition at line 286 of file textEncoder.I.

References UnicodeLatinMap::look_up().

◆ unicode_isdigit()

bool TextEncoder::unicode_isdigit ( char32_t character )

inline static

Returns true if the indicated character is a numeric digit, false otherwise.

This is akin to ctype's isdigit(), extended to Unicode.

Definition at line 300 of file textEncoder.I.

References UnicodeLatinMap::look_up().

◆ unicode_islower()

bool TextEncoder::unicode_islower ( char32_t character )

inline static

Returns true if the indicated character is a lowercase letter, false otherwise.

This is akin to ctype's islower(), extended to Unicode.

Definition at line 359 of file textEncoder.I.

References UnicodeLatinMap::look_up().

◆ unicode_ispunct()

bool TextEncoder::unicode_ispunct ( char32_t character )

inline static

Returns true if the indicated character is a punctuation mark, false otherwise.

This is akin to ctype's ispunct(), extended to Unicode.

Definition at line 315 of file textEncoder.I.

References UnicodeLatinMap::look_up().

◆ unicode_isspace()

bool TextEncoder::unicode_isspace ( char32_t character )

inline static

Returns true if the indicated character is a whitespace character, false otherwise.

This is akin to ctype's isspace(), extended to Unicode.

Definition at line 342 of file textEncoder.I.

Referenced by extract_words(), trim(), trim_left(), and trim_right().

◆ unicode_isupper()

bool TextEncoder::unicode_isupper ( char32_t character )

inline static

Returns true if the indicated character is an uppercase letter, false otherwise.

This is akin to ctype's isupper(), extended to Unicode.

Definition at line 329 of file textEncoder.I.

References UnicodeLatinMap::look_up().

◆ unicode_tolower()

int TextEncoder::unicode_tolower ( char32_t character )

inline static

Returns the lowercase equivalent of the given Unicode character.

This is akin to ctype's tolower(), extended to Unicode.

Definition at line 385 of file textEncoder.I.

References UnicodeLatinMap::look_up().

Referenced by make_lower().
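
To show what "tolower(), extended to Unicode" can mean for accented letters, here is a standalone sketch limited to ASCII and the Latin-1 Supplement block. It is illustrative only: latin1_tolower is a hypothetical name, and the real function relies on UnicodeLatinMap, which covers more characters than this sketch.

```cpp
// Hypothetical helper, for illustration only: lowercase mapping for
// ASCII and the Latin-1 Supplement block. Other characters are
// returned unchanged.
char32_t latin1_tolower(char32_t ch) {
  // ASCII 'A'..'Z' map to 'a'..'z'.
  if (ch >= U'A' && ch <= U'Z') {
    return ch + 0x20;
  }
  // U+00C0..U+00DE are uppercase letters whose lowercase forms sit
  // 0x20 higher, except U+00D7, which is the multiplication sign.
  if (ch >= 0x00C0 && ch <= 0x00DE && ch != 0x00D7) {
    return ch + 0x20;
  }
  return ch;
}
```

The accent is preserved because the mapping goes from an accented capital to the matching accented small letter (U+00C9 to U+00E9), not to a bare ASCII letter.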

◆ unicode_toupper()

int TextEncoder::unicode_toupper ( char32_t character )

inline static

Returns the uppercase equivalent of the given Unicode character.

This is akin to ctype's toupper(), extended to Unicode.

Definition at line 372 of file textEncoder.I.

References UnicodeLatinMap::look_up().

Referenced by make_upper().

◆ upper() [1/2]

std::string TextEncoder::upper ( const std::string & source )

inline static

Converts the string to uppercase, assuming the string is encoded in the default encoding.

Definition at line 398 of file textEncoder.I.

References get_default_encoding.

◆ upper() [2/2]

std::string TextEncoder::upper	(	const std::string &	source,
		TextEncoder::Encoding	encoding
	)

inline static

Converts the string to uppercase, assuming the string is encoded in the indicated encoding.

Definition at line 407 of file textEncoder.I.

References get_text, make_upper(), set_encoding(), and set_text.

Member Data Documentation

◆ get_default_encoding

TextEncoder::Encoding TextEncoder::get_default_encoding

inline

Returns the default encoding used for all subsequently created TextEncoder objects.

See set_encoding().

Definition at line 54 of file textEncoder.h.

Referenced by lower(), and upper().

◆ get_text

std::string TextEncoder::get_text

inline

Returns the current text, as encoded via the current encoding system.

The overload that takes an Encoding argument returns the current text as encoded via the indicated encoding system instead.

Definition at line 124 of file textEncoder.h.

Referenced by append_text(), Filename::from_os_specific_w(), lower(), operator<<(), Filename::scan_directory(), set_encoding(), Filename::to_os_long_name(), Filename::to_os_short_name(), and upper().

◆ set_default_encoding

void TextEncoder::set_default_encoding

inline

Specifies the default encoding to be used for all subsequently created TextEncoder objects.

See set_encoding().

Definition at line 54 of file textEncoder.h.

◆ set_text

void TextEncoder::set_text

inline

Changes the text that is stored in the encoder.

With the one-parameter version, the text should be encoded according to the method indicated by set_encoding(). Subsequent calls to get_text() will return this same string, while get_wtext() will return the decoded version of the string.

The two-parameter version of set_text() accepts an explicit encoding; the text is immediately decoded and stored as a wide-character string. Subsequent calls to get_text() will return the same text re-encoded using whichever encoding is specified by set_encoding().

Definition at line 124 of file textEncoder.h.

Referenced by PNMTextMaker::calc_width(), PNMTextMaker::generate_into(), Filename::get_fullpath_w(), lower(), PGButton::setup(), Filename::to_os_specific_w(), and upper().

The documentation for this class was generated from the following files:

dtool/src/dtoolutil/textEncoder.h
dtool/src/dtoolutil/textEncoder.cxx
dtool/src/dtoolutil/textEncoder.I
