UTF8

class HPS::UTF8

The UTF8 class encapsulates a utf8 encoded array of characters and allows for easy encoding and decoding.

Public Functions

UTF8 &Assign(UTF8 &&in_utf8)

Moves the source UTF8 object to this object. This method is functionally equivalent to the overloaded assignment operator.

Parameters

in_utf8 – The source of the move.

Returns

A reference to this object.

UTF8 &Assign(UTF8 const &in_utf8)

Copies the source UTF8 object to this object. This method is functionally equivalent to the overloaded assignment operator.

Parameters

in_utf8 – The source of the copy.

Returns

A reference to this object.

inline char At(size_t in_index) const

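Retrieves the char at the specified index of the utf8 encoded character array. Because the index addresses individual chars rather than code points, this method may split up multi-byte code points.

Parameters

in_index – The index of the char to retrieve.

Returns

The char at the specified index.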
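The note about splitting code points can be illustrated without the HPS library. The following is a minimal sketch in standard C++, assuming well-formed utf8 input; the helper names is_continuation_byte and code_point_start are hypothetical and are not part of HPS::UTF8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True when the byte is a utf8 continuation byte (bit pattern 10xxxxxx),
// i.e. it belongs to a code point that started at an earlier index.
inline bool is_continuation_byte(char in_byte) {
    return (static_cast<unsigned char>(in_byte) & 0xC0) == 0x80;
}

// Walks backwards from in_index to the first byte of the code point that
// contains it. Assumes in_bytes holds well-formed utf8.
inline std::size_t code_point_start(std::string const &in_bytes, std::size_t in_index) {
    assert(in_index < in_bytes.size());
    while (in_index > 0 && is_continuation_byte(in_bytes[in_index])) {
        --in_index;
    }
    return in_index;
}
```

For the string "h\xC3\xA9llo" (the word "héllo"), index 2 holds the second byte of "é", so a byte-wise accessor returns only half of that character; code_point_start reports that the character begins at index 1.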

void Clear()

Resets all string data.

inline bool Empty() const

Indicates whether this utf8 string is empty.

Returns

true if the UTF8 string is empty, false otherwise.

inline char const *GetBytes() const

Retrieves the raw, utf8 encoded character array.

Returns

The utf8 encoded character array.

size_t GetHash() const

Returns a hash code for the utf8 encoded characters.

Returns

The size_t hash code.

inline size_t GetLength() const

Retrieves the number of bytes in the utf8 encoded string up to but not including the null terminator. This will return 0 if the utf8 object is uninitialized.

Returns

The number of bytes.

inline size_t GetWStrLength() const

Retrieves the number of wide characters in the wchar_t string up to but not including the null terminator. This will return 0 if the utf8 object is uninitialized.

Returns

The number of wide characters.
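The byte length and the wide character length differ whenever the string contains multi-byte characters. The sketch below shows the difference in standard C++ by counting code points in a utf8 byte string. The function count_code_points is a hypothetical helper, not an HPS call, and it assumes well-formed input. On platforms where wchar_t is 16 bits wide, characters outside the Basic Multilingual Plane may occupy two wchar_t units; how GetWStrLength treats that case is not stated here.

```cpp
#include <cassert>
#include <cstddef>

// Counts code points in a null-terminated, well-formed utf8 string by
// counting every byte that is not a continuation byte (10xxxxxx).
inline std::size_t count_code_points(char const *in_utf8) {
    assert(in_utf8 != nullptr);
    std::size_t count = 0;
    for (; *in_utf8 != '\0'; ++in_utf8) {
        if ((static_cast<unsigned char>(*in_utf8) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For "h\xC3\xA9llo" the byte length is 6 while the code point count is 5, which is the kind of gap to expect between GetLength and GetWStrLength.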

inline bool IsValid() const

Indicates whether this utf8 string has been initialized.

Returns

true if the UTF8 string has been initialized, false otherwise.

inline operator char const*() const

Allows typecasting to const char * by retrieving the raw, utf8 encoded character array.

Returns

The utf8 encoded character array.

inline bool operator!=(char const *in_utf8) const

This function is used to check a utf8-encoded character string for equivalence to this.

Parameters

in_utf8 – The utf8-encoded character string to compare to this.

Returns

true if the objects are not equivalent, false otherwise.

inline bool operator!=(UTF8 const &in_utf8) const

This function is used to check an object for equivalence to this.

Parameters

in_utf8 – The object to compare to this.

Returns

true if the objects are not equivalent, false otherwise.

UTF8 operator+(char const *in_utf8) const

Creates a new UTF8 object by appending a utf8 encoded string to the end of this object.

Parameters

in_utf8 – A string, assumed to be utf8 encoded, used as the tail end of the new string.

Returns

A new UTF8 object representing the concatenation of 2 strings.

UTF8 operator+(UTF8 const &in_utf8) const

Creates a new UTF8 object by appending a UTF8 object to the end of this object.

Parameters

in_utf8 – The tail end of the new string.

Returns

A new UTF8 object representing the concatenation of 2 strings.

UTF8 &operator+=(char const *in_utf8)

Appends a utf8 encoded string to the end of this object.

Parameters

in_utf8 – A string, assumed to be utf8 encoded, used as the tail end of the new string.

Returns

A reference to this object.

UTF8 &operator+=(UTF8 const &in_utf8)

Appends a UTF8 object to the end of this object.

Parameters

in_utf8 – The tail end of the new string.

Returns

A reference to this object.

inline UTF8 &operator=(UTF8 &&in_utf8)

The move assignment operator takes control of the underlying data from the source utf8 string.

Parameters

in_utf8 – The source of the move.

Returns

A reference to this object.

inline UTF8 &operator=(UTF8 const &in_utf8)

Copies the source UTF8 object to this object.

Parameters

in_utf8 – The source of the copy.

Returns

A reference to this object.

bool operator==(char const *in_utf8) const

This function is used to check a utf8-encoded character string for equivalence to this.

Parameters

in_utf8 – The utf8-encoded character string to compare to this.

Returns

true if the objects are equivalent, false otherwise.

bool operator==(UTF8 const &in_utf8) const

This function is used to check an object for equivalence to this.

Parameters

in_utf8 – The object to compare to this.

Returns

true if the objects are equivalent, false otherwise.

inline void Reset()

Resets this object to its initial, uninitialized state.

size_t ToWStr(wchar_t *out_wide_string) const

Decodes a utf8 encoded string into a wide character buffer.

Parameters

out_wide_string – The wide character buffer that receives the decoded string.

Returns

The number of wide characters (code points) in the wide string.

size_t ToWStr(WCharArray &out_wide_string) const

Decodes a utf8 encoded string into a wide character buffer.

Parameters

out_wide_string – The wide character array that receives the decoded string.

Returns

The number of wide characters (code points) in the wide string.
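To make the decoding step concrete, here is a minimal sketch of utf8 decoding in standard C++. It is not the HPS implementation: the function decode_utf8 is hypothetical, it writes char32_t values instead of wchar_t so that the result does not depend on the platform's wchar_t width, and it assumes well-formed input with no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes well-formed utf8 into one char32_t per code point.
// Malformed input is not handled; truncated sequences trip the assert.
inline std::u32string decode_utf8(std::string const &in_bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in_bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(in_bytes[i]);
        std::size_t extra = 0;   // number of continuation bytes that follow
        char32_t cp = lead;      // 1-byte (ASCII) case
        if (lead >= 0xF0)      { extra = 3; cp = lead & 0x07; }
        else if (lead >= 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { extra = 1; cp = lead & 0x1F; }
        assert(i + extra < in_bytes.size());
        for (std::size_t k = 1; k <= extra; ++k) {
            // Each continuation byte contributes its low 6 bits.
            cp = (cp << 6) | (static_cast<unsigned char>(in_bytes[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

The size of the returned string corresponds to the count of code points, which is what the return value above describes for the decoded wide string.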

UTF8()

The default constructor creates an empty UTF8 string.

UTF8(char const *in_string, char const *in_locale = 0)

This constructor can be used to encode a string from any known locale to utf8. Be careful not to re-encode a string that’s already utf8 encoded.

Parameters
  • in_string – The string to be encoded.

  • in_locale – A string identifying the source locale of in_string. If none is specified, the default locale on the local machine will be used. If in_string is already utf8 encoded, specify the locale as “utf8” to prevent re-encoding.
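The warning about re-encoding can be demonstrated with a small stand-alone sketch. It uses Latin-1 purely as an illustrative source encoding, and the function latin1_to_utf8 is a hypothetical helper written in standard C++, not an HPS call. Which locale names the constructor accepts is not covered by this example.

```cpp
#include <cassert>
#include <string>

// Encodes a null-terminated Latin-1 (ISO-8859-1) string as utf8. Each
// Latin-1 byte maps to the code point of the same value (U+0000..U+00FF).
inline std::string latin1_to_utf8(char const *in_latin1) {
    assert(in_latin1 != nullptr);
    std::string out;
    for (; *in_latin1 != '\0'; ++in_latin1) {
        unsigned char b = static_cast<unsigned char>(*in_latin1);
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));                 // ASCII: unchanged
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));   // lead byte
            out.push_back(static_cast<char>(0x80 | (b & 0x3F))); // continuation byte
        }
    }
    return out;
}
```

Encoding the Latin-1 byte 0xE9 ("é") yields the two utf8 bytes C3 A9. Feeding those two bytes through the same conversion again treats each of them as a separate Latin-1 character and yields four bytes, C3 83 C2 A9, which display as "Ã©". This is the corruption that specifying the locale as “utf8” is meant to prevent.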

UTF8(UTF8 &&in_that)

The move constructor takes control of the underlying data from the source utf8 string.

Parameters

in_that – The source of the move.

UTF8(UTF8 const &in_that)

The copy constructor copies the source utf8 string.

Parameters

in_that – The source to be copied.

UTF8(wchar_t const *in_string)

This constructor can be used to encode a wide character string to utf8.

Parameters

in_string – The string to be encoded.

~UTF8()

A destructor for a UTF8 string.

Friends

inline friend bool operator!=(char const *in_left, UTF8 const &in_right)

This function is used to check a utf8-encoded character string for equivalence to a UTF8 object.

Parameters
  • in_left – A utf8-encoded character string.

  • in_right – A UTF8 object.

Returns

true if the objects are not equivalent, false otherwise.

inline friend bool operator!=(wchar_t const *in_left, UTF8 const &in_right)

This function is used to check a wide character string for equivalence to a UTF8 object.

Parameters
  • in_left – A wide character string.

  • in_right – A UTF8 object.

Returns

true if the objects are not equivalent, false otherwise.

inline friend UTF8 operator+(char const *in_left, UTF8 const &in_right)

Creates a new UTF8 object by appending a UTF8 object to the end of a utf8-encoded character string.

Parameters
  • in_left – A string, assumed to be utf8 encoded, used as the head end of the new string.

  • in_right – A UTF8 object used as the tail end of the new string.

Returns

A new UTF8 object representing the concatenation of 2 strings.

inline friend UTF8 operator+(wchar_t const *in_left, UTF8 const &in_right)

Creates a new UTF8 object by appending a UTF8 object to the end of a wide character string.

Parameters
  • in_left – A wide character string used as the head end of the new string.

  • in_right – A UTF8 object used as the tail end of the new string.

Returns

A new UTF8 object representing the concatenation of 2 strings.

inline friend bool operator==(char const *in_left, UTF8 const &in_right)

This function is used to check a utf8-encoded character string for equivalence to a UTF8 object.

Parameters
  • in_left – A utf8-encoded character string.

  • in_right – A UTF8 object.

Returns

true if the objects are equivalent, false otherwise.

inline friend bool operator==(wchar_t const *in_left, UTF8 const &in_right)

This function is used to check a wide character string for equivalence to a UTF8 object.

Parameters
  • in_left – A wide character string.

  • in_right – A UTF8 object.

Returns

true if the objects are equivalent, false otherwise.
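The friend operators exist because a member operator only applies when the class object is the left operand. The sketch below shows the pattern with a hypothetical stand-in class named Label. It is not HPS::UTF8 and only mirrors the shape of the char const * overloads listed above, under the assumption that they follow this common C++ idiom.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in class; it only mirrors the shape of the
// comparison and concatenation operators documented above.
class Label {
public:
    explicit Label(char const *in_text) : text_(in_text) {
        assert(in_text != nullptr);
    }

    // Member form: handles `label == "text"` (Label on the left).
    bool operator==(char const *in_text) const { return text_ == in_text; }

    // Friend forms: handle `"text" == label` and `"text" + label`, where the
    // left operand is a raw pointer and so cannot supply a member function.
    friend bool operator==(char const *in_left, Label const &in_right) {
        return in_right == in_left;
    }
    friend bool operator!=(char const *in_left, Label const &in_right) {
        return !(in_left == in_right);
    }
    friend Label operator+(char const *in_left, Label const &in_right) {
        return Label((std::string(in_left) + in_right.text_).c_str());
    }

private:
    std::string text_;
};
```

With these overloads, both `a == "bar"` and `"bar" == a` compile for a Label a, and `"foo" + a` produces a new Label holding "foobar".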