C++: Parsing length-prefixed strings

Most programmers are fairly familiar with null-terminated strings: a sequence of characters followed by \0 (null) to mark the end of the string. However, there are other string representations out there. Another fairly common one (especially if you work with Microsoft systems) is the length-prefixed string.

Length-prefixed strings are, as the name suggests, prefixed with the string length. These strings are still represented by a pointer to the first character of the string. However, rather than relying on a trailing null terminator (\0) to mark the end of the string, they record the length of the string in the four bytes preceding the first character.
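
To make the layout concrete, here is a minimal sketch that builds such a buffer by hand and reads the prefix back. The helper names (`make_length_prefixed`, `first_character`, `stored_length`) are made up for illustration, and the sketch assumes a `char` string whose prefix is a 32-bit length stored in the machine's native byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string_view>
#include <vector>

// Hypothetical helper: builds the layout [4-byte length][characters...].
// No null terminator is stored; the prefix alone marks the end.
inline std::vector<unsigned char> make_length_prefixed(std::string_view text)
{
    assert(text.size() <= std::numeric_limits<std::uint32_t>::max());
    const auto length = static_cast<std::uint32_t>(text.size());

    std::vector<unsigned char> buffer(sizeof(length) + text.size());
    std::memcpy(buffer.data(), &length, sizeof(length));  // prefix
    if (!text.empty())
        std::memcpy(buffer.data() + sizeof(length), text.data(), text.size());  // characters
    return buffer;
}

// The pointer a consumer would be handed: the first character, just past the prefix.
inline const char* first_character(const std::vector<unsigned char>& buffer)
{
    return reinterpret_cast<const char*>(buffer.data() + sizeof(std::uint32_t));
}

// Reads the length back from the four bytes preceding the first character.
inline std::uint32_t stored_length(const char* first_char)
{
    std::uint32_t length = 0;
    std::memcpy(&length, first_char - sizeof(length), sizeof(length));
    return length;
}
```

For the text "hello", the buffer holds nine bytes: a four-byte prefix with the value 5, followed by the five characters.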

What follows is a C++20 function template that converts a null-terminated or length-prefixed string into a std::basic_string:

#include <cstdint>
#include <cstring>
#include <string>

/**
 * Function template to convert from raw strings to std::basic_string.
 *
 * @details If `length_prefixed` is `true`, `raw_string` must adhere to these requirements:
 *             1. It is aligned.
 *             2. It points to the first character of the string.
 *             3. The four bytes preceding the first character express the length of the string in number of bytes.
 *
 * @note Only the null-terminated branch can be evaluated at compile time. Reading the length prefix
 *       from raw memory is not possible in a constant expression.
 *
 * @tparam CharT The character type. Usually `char` or `wchar_t`.
 * @tparam length_prefixed `true` if the raw string is length-prefixed. `false` if it is null-terminated.
 * @param raw_string The raw string.
 * @return The corresponding std::basic_string.
 */
template<typename CharT, bool length_prefixed>
[[nodiscard]]
static
constexpr
std::basic_string<CharT>
to_cpp_string(const CharT* const raw_string)
{
    // Length-prefixed string
    if constexpr (length_prefixed)
    {
        // Copy the prefix out byte by byte instead of dereferencing a casted pointer,
        // which would violate the aliasing rules.
        std::uint32_t length_in_bytes = 0;
        std::memcpy(&length_in_bytes,
                    reinterpret_cast<const unsigned char*>(raw_string) - sizeof(length_in_bytes),
                    sizeof(length_in_bytes));

        // The prefix counts bytes, but std::basic_string expects a character count.
        return std::basic_string<CharT>(raw_string, length_in_bytes / sizeof(CharT));
    }

    // Null-terminated string
    else
        return std::basic_string<CharT>(raw_string);
}
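
To try the function out without a real length-prefixed API at hand, a small test fixture can help. The sketch below is only an illustration: `length_prefixed_buffer` is a hypothetical helper, and `to_cpp_string_demo` is a condensed stand-in for the function above, repeated only so that the snippet compiles on its own. It assumes the prefix counts bytes and is stored in native byte order.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Condensed stand-in for the function above (assumption: prefix counts bytes).
template<typename CharT, bool length_prefixed>
std::basic_string<CharT> to_cpp_string_demo(const CharT* raw_string)
{
    if constexpr (length_prefixed) {
        std::uint32_t byte_count = 0;
        std::memcpy(&byte_count,
                    reinterpret_cast<const unsigned char*>(raw_string) - sizeof(byte_count),
                    sizeof(byte_count));
        return std::basic_string<CharT>(raw_string, byte_count / sizeof(CharT));
    } else {
        return std::basic_string<CharT>(raw_string);
    }
}

// Hypothetical fixture: owns storage laid out as [4-byte prefix][characters].
// Using a vector of CharT keeps the characters correctly aligned.
template<typename CharT>
class length_prefixed_buffer
{
    static_assert(sizeof(std::uint32_t) % sizeof(CharT) == 0,
                  "the prefix must span a whole number of characters");
    static constexpr std::size_t prefix_units = sizeof(std::uint32_t) / sizeof(CharT);

public:
    explicit length_prefixed_buffer(const std::basic_string<CharT>& text)
        : storage_(prefix_units + text.size())
    {
        // Store the length in bytes, then the characters right after it.
        const auto byte_count = static_cast<std::uint32_t>(text.size() * sizeof(CharT));
        std::memcpy(storage_.data(), &byte_count, sizeof(byte_count));
        std::copy(text.begin(), text.end(), storage_.data() + prefix_units);
    }

    // Pointer to the first character, as a length-prefixed API would hand it out.
    const CharT* first_character() const { return storage_.data() + prefix_units; }

private:
    std::vector<CharT> storage_;
};
```

Passing `length_prefixed_buffer<wchar_t>(L"hello").first_character()` to the length-prefixed variant should give back `L"hello"`. Because the end of the string comes from the prefix rather than from a terminator, a string with an embedded \0 also survives the round trip.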

Feel free to use this however you see fit. As usual, it comes without any form of warranty, so use it at your own risk!
