* removed container interface (iterators and `size()`) because, fundamentally, a uuid is a single entity, an identifier, and not a container of other things
Universally unique identifiers (*uuid*), also known as Globally Unique Identifiers (*guid*s), are commonly used in many types of applications to uniquely identify data. A standard uuid library would benefit developers that currently have to either use operating system specific APIs for creating new uuids or resort to 3rd party libraries, such as *boost::uuid*.
UUIDs are 128-bit numbers that are for most practical purposes unique, without depending on a central registration authority for ensuring their uniqueness. Although the probability of UUID duplication exists, it is negligible. According to Wikipedia, "*for there to be a one in a billion chance of duplication, 103 trillion version 4 UUIDs must be generated.*" UUID is an Internet Engineering Task Force standard described by RFC 4122.
The library proposed on this paper is a light one: it enables developers to generate random and name-based UUIDs, serialize and deserialize UUIDs to and from strings, validate UUIDs and other common operations.
This proposal is a pure library extension. It does not require changes to any standard classes, functions or headers. It does not require any changes in the core language, and it has been implemented in standard C++ as per ISO/IEC 14882:2017. The implementation is available at https://github.com/mariusbancila/stduuid.
* function objects that generate UUIDs, called generators: `basic_uuid_random_generator<T>`, `uuid_random_generator`, `uuid_name_generator`, `uuid_time_generator`
* conversion functions from strings `from_string()` and to strings `std::to_string()`, as well as an overloaded `operator<<` for `std::basic_ostream`
The conversion constructor that takes two forward iterators constructs an `uuid` with the content of the range \[first, last). It requires the range to have exactly 16 elements, otherwise the behaviour is undefined. This constructor follows the conventions of other containers in the standard library.
A nil UUID is a special UUID that has all the bits set to 0. Its canonical textual representation is `00000000-0000-0000-0000-000000000000`. Member function `is_nil()` indicates whether the `uuid` has all the bits set to 0. A nil UUID is created by the default constructor or by parsing the strings `00000000-0000-0000-0000-000000000000` or `{00000000-0000-0000-0000-000000000000}`.
Member functions `variant()` and `version()` allow checking the variant type of the uuid and, respectively, the version type. These are defined by two strongly typed enums called `uuid_variant` and `uuid_version`.
Although it does not make sense to check whether a UUID is less (or less or equal) then another UUID, the overloading of this operator for `uuid` is necessary in order to be able to store `uuid` values in containers such as `std::set` that by default use `operator <` to compare keys. Because operators `==` and `!=` are needed for checking whether two UUIDs are the same (a basic operation for an identifier type) the three-way comparison operator `<=>` is defined so that all comparison operators are provided.
The input argument must have the form `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx` or `{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}` where `x` is a hexadecimal digit. The return value is an `std::optional<uuid>` that contains a valid `uuid` if the parsing completed successful.
The order of the bytes in the input string reflects directly into the internal representation of the UUID. That is, for an input string in the form `"aabbccdd-eeff-gghh-iijj-kkllmmnnoopp"` or `"{aabbccdd-eeff-gghh-iijj-kkllmmnnoopp}"` the internal byte order of the resulted UUID is `aa,bb,cc,dd,ee,ff,gg,hh,ii,jj,kk,ll,mm,nn,oo,pp`.
Non-member template function `to_string()` returns a string with the UUID formatted to the canonical textual representation `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`, where `x` is a lower case hexadecimal digit.
The order of the internal UUID bytes reflects directly into the string bytes order. That is, for a UUID with the internal bytes in the form `aa,bb,cc,dd,ee,ff,gg,hh,ii,jj,kk,ll,mm,nn,oo,pp` the resulted string has the form `"aabbccdd-eeff-gghh-iijj-kkllmmnnoopp"`.
A `std::hash<>` specialization for `uuid` is provided in order to enable the use of `uuid`s in associative unordered containers such as `std::unordered_set`.
**Footnote**: Microsoft is a registered trademark of Microsoft Corporation. This information is given for the convenience of users of this document and does not constitute an endorsement by ISO or IEC of these products.
`basic_uuid_random_generator<T>` is a class template for generating random or pseudo-random UUIDs (version 4, i.e. `uuid_version::random_number_based`).
The type template parameter represents a function object that implements both the [`RandomNumberEngine`](http://en.cppreference.com/w/cpp/concept/UniformRandomBitGenerator) and [`UniformRandomBitGenerator`](http://en.cppreference.com/w/cpp/concept/RandomNumberEngine) concepts.
`basic_uuid_random_generator` can be constructed with a reference or pointer to a an objects that satisfies the `UniformRandomNumberGenerator` requirements.
A type alias `uuid_random_generator` is provided for convenience as `std::mt19937` is probably the preferred choice of a pseudo-random number generator engine in most cases.
The algorithm for generating random uuids is as follows:
* Set the two most significant bits (bits 6 and 7) of the `clock_seq_hi_and_reserved` to zero and one, respectively.
* Set the four most significant bits (bits 12 through 15) of the `time_hi_and_version` to the binary value 0100 (representing version 3 as defined in the section _VII. UUID format specification_).
* Set all the other bits to randomly (or pseudo-randomly) chosen values.
#### Name-base uuid generator
`uuid_name_generator` is a function object that generates new UUIDs from a name and has to be initialized with another UUID.
This generator produces different uuids for the same text represented in different character sets or encodings. In order words, the uuids generated from "jane" and L"jane" are different.
The algorithm for generating name-based uuids is as follows:
* Use a uuid as a namespace identifier for all uuids generated from names in that namespace
* Convert the name to a canonical sequence of octets (as defined by the standards or conventions of its name space); put the name space ID in network byte order.
* Compute the SHA-1 hash of the name space ID concatenated with the name.
* Copy the octects of the hash to the octets of the uuid as follows:
* octets 0 to 3 of the hash to octets 0 to 3 of `time_low field`,
* octets 4 and 5 of the hash to octets 0 and 1 of `time_mid`,
* octets 6 and 7 of the hash to octets 0 and 1 of `time_hi_and_version`
* octet 8 of the hash to `clock_seq_hi_and_reserved`
* octet 9 of the hash to `clock_seq_low`
* octets 10 to 15 of the hash to octets 0 to 5 of `node`
* Set the four most significant bits (bits 12 through 15) of the `time_hi_and_version` field to binary value 0101 (representing version 5 as defined in the section _VII. UUID format specification_).
* Set the two most significant bits (bits 6 and 7) of the `clock_seq_hi_and_reserved` to zero and one, respectively.
* Convert the resulting uuid to local byte order.
#### Time-based uuid generator
`uuid_time_generator` is a function object that generates a time-based UUID as described in the RFC4122 document.
* the timestamp is a 60-bit value, representing the number of 100 nanosecond intervals since 15 October 1582 00:00:000000000.
* the clock sequence is a 14-bit value, that is initially a high-quality pseudo-random value; when the previous value is known, it is simply incremented by one.
* the node identifier is an IEEE 802 MAC address (when multiple are available any could be used); if no such address is available, a pseudo-randomly generated value may be used, in which case the multicast bit (least significant bit of the first byte) is set to 1, this to avoid clashes with legitimate IEEE 802 addresses.
The algorithm for generating time-based uuids is as follows:
* Consider the timestamp to be a 60-bit unsigned integer and the clock sequence to be a 14-bit unsigned integer. Sequentially number the bits in a field, starting with zero for the least significant bit.
* Determine the values for the UTC-based timestamp and clock sequence to be used in the UUID as a 60-bit count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582.
* Copy the bits of the timestamp to the uuid bits as follows, using the same order of significance:
* bits 0 through 31 to the `time_low` field
* bits 32 through 47 to the `time_mid` field
* bits 48 through 59 to the 12 least significant bits (bits 0 through 11) of the `time_hi_and_version` field
* Copy the bits of the clock sequence to the uuid bits as follows, using the same order of significance:
* bits 0 through 7 to the `clock_seq_low` field
* bits 8 through 13 to the 6 least significant bits (bits 0 through 5) of the `clock_seq_hi_and_reserved` field
* Set the four most significant bits (bits 12 through 15) of the `time_hi_and_version` field to the binary value 0001 (representing version 1 as defined in the section _VII. UUID format specification_).
* Set the two most significant bits (bits 6 and 7) of the `clock_seq_hi_and_reserved` to zero and one, respectively.
* Set the `node` field to the 48-bit IEEE address in the same order of significance as the address.
The comparison of UUIDs is not an operation that makes logical sense but it is required for using UUIDs as keys for containers such as `std::set` or `std::map`. The technical specification of the `uuid` class given in the previous section features the three-way operator `<=>` as member of the class. The C++ community, however, is divided on this particular aspect (with long discussions happening in the [ISO C++ proposals forum](https://groups.google.com/a/isocpp.org/forum/#!forum/std-proposals)) between those that want convenience and those that want rigour. Although the RFC 4122 document, as quoted in the next section, does specify rules for lexical equivalence, not everybody is happy with the definition of comparison operator `<`.
The alternative put forward is to define non-member ordering functors. In this case, the library can define the following functors:
*`uuid_lexicographical_order` performs a lexicographic comparison (can provide compatibility with other systems)
*`uuid_fast_order` performs a memberwise comparison for efficiency.
These could be used as the comparison function for containers such as `std::set` or `std::map`:
The content of this section is copied from the RFC 4122 document that describes the UUID standard.
The UUID format is 16 octets; some bits of the eight octet variant field determine the layout of the UUID. That is, the interpretation of all other bits in the UUID depends on the setting of the bits in the variant field. The variant field consists of a variable number of the most significant bits of octet 8 of the UUID. The following table lists the contents of the variant field, where the letter "x" indicates a "don't-care" value.
| Msb0 | Msb1 | Msb2 | Description |
| ---- | ---- | ---- | ----------- |
| 0 | x | x | Reserved, NCS backward compatibility. |
| 1 | 0 | x | The variant specified in this document. |
Only UUIDs of the variant with the bit pattern 10x are considered in this document. All the other references to UUIDs in this document refer to this variant. For simplicity, we will call this the RFC variant UUID.
The UUID record definition is defined only in terms of fields that are integral numbers of octets. The fields are presented with the most significant one first.
| Field | Data Type | Octet number | Note |
| ----- | --------- | ------------ | ---- |
| `time_low` | unsigned 32 bit integer | 0-3 | The low field of the timestamp |
| `time_mid` | unsigned 16 bit integer | 4-5 | The middle field of the timestamp |
| `time_hi_and_version` | unsigned 16 bit integer | 6-7 | The high field of the timestamp multiplexed with the version number |
| `clock_seq_hi_and_reserved` | unsigned 8 bit integer | 8 | The high field of the clock sequence multiplexed with the variant |
| `clock_seq_low` | unsigned 8 bit integer | 9 | The low field of the clock sequence |
| `node` | unsigned 48 bit integer | 10-15 | The spatially unique node identifier |
In the absence of explicit application or presentation protocol specification to the contrary, a UUID is encoded as a 128-bit object, as follows:
The fields are encoded as 16 octets, with the sizes and order of the fields defined above, and with each field encoded with the Most Significant Byte first (known as network byte order).
The version number is in the most significant 4 bits of the timestamp (bits 4 through 7 of the `time_hi_and_version` field). The following table lists the currently-defined versions for the RFC variant.
Consider each field of the UUID to be an unsigned integer as shown in the table in the section above. Then, to compare a pair of UUIDs, arithmetically compare the corresponding fields from each UUID in order of significance and according to their data type. Two UUIDs are equal if and only if all the corresponding fields are equal.
As an implementation note, equality comparison can be performed on many systems by doing the appropriate byte-order canonicalization, and then treating the two UUIDs as 128-bit unsigned integers.
UUIDs, as defined in this document, can also be ordered lexicographically. For a pair of UUIDs, the first one follows the second if the most significant field in which the UUIDs differ is greater for the first UUID. The second precedes the first if the most significant field in which the UUIDs differ is greater for the second UUID.
Because template argument deduction does not work, both [1] and [2] would produce compiler errors. Instead, it should be one of the following, neither being desirable.