From ddf92606ab848617250214428604c903f64b4b84 Mon Sep 17 00:00:00 2001
From: Niels Lohmann
Date: Sun, 24 May 2020 21:05:35 +0200
Subject: [PATCH] :memo: add PlantUML

---
 .../docs/features/binary_formats/index.md |   8 +-
 doc/mkdocs/docs/features/binary_values.md |  23 +-
 doc/mkdocs/docs/features/sax_interface.md |  23 ++
 doc/mkdocs/docs/features/types.md         | 264 ++++++++++++++++++
 doc/mkdocs/mkdocs.yml                     |   8 +
 doc/mkdocs/requirements.txt               |   6 +
 6 files changed, 323 insertions(+), 9 deletions(-)
 create mode 100644 doc/mkdocs/docs/features/types.md

diff --git a/doc/mkdocs/docs/features/binary_formats/index.md b/doc/mkdocs/docs/features/binary_formats/index.md
index 3583f43c..6d0ff82f 100644
--- a/doc/mkdocs/docs/features/binary_formats/index.md
+++ b/doc/mkdocs/docs/features/binary_formats/index.md
@@ -2,10 +2,10 @@
 
 Though JSON is a ubiquitous data format, it is not a very compact format suitable for data exchange, for instance over a network. Hence, the library supports
 
-- [BSON](bson) (Binary JSON),
-- [CBOR](cbor) (Concise Binary Object Representation),
-- [MessagePack](messagepack), and
-- [UBJSON](ubjson) (Universal Binary JSON Specification)
+- [BSON](bson.md) (Binary JSON),
+- [CBOR](cbor.md) (Concise Binary Object Representation),
+- [MessagePack](messagepack.md), and
+- [UBJSON](ubjson.md) (Universal Binary JSON Specification)
 
 to efficiently encode JSON values to byte vectors and to decode such vectors.
 
diff --git a/doc/mkdocs/docs/features/binary_values.md b/doc/mkdocs/docs/features/binary_values.md
index e5444a48..764b75be 100644
--- a/doc/mkdocs/docs/features/binary_values.md
+++ b/doc/mkdocs/docs/features/binary_values.md
@@ -1,11 +1,24 @@
 # Binary Values
 
-The library implements several [binary formats](binary_formats/index) that encode JSON in an efficient way. Most of these formats support binary values; that is, values that have semantics define outside the library and only define a sequence of bytes to be stored.
+The library implements several [binary formats](binary_formats/index.md) that encode JSON in an efficient way. Most of these formats support binary values; that is, values whose semantics are defined outside the library and which consist only of a sequence of bytes to be stored.
 
 JSON itself does not have a binary value. As such, binary values are an extension that this library implements to store values received by a binary format. Binary values are never created by the JSON parser, and are only part of a serialized JSON text if they have been created manually or via a binary format.
 
 ## API for binary values
 
+```plantuml
+class json::binary_t {
+    -- setters --
+    +void set_subtype(std::uint8_t subtype)
+    +void clear_subtype()
+    -- getters --
+    +std::uint8_t subtype() const
+    +bool has_subtype() const
+}
+
+"std::vector<uint8_t>" <|-- json::binary_t
+```
+
 By default, binary values are stored as `std::vector<std::uint8_t>`. This type can be changed by providing a template parameter to the `basic_json` type. To store binary subtypes, the storage type is extended and exposed as `json::binary_t`:
 
 ```cpp
@@ -105,7 +118,7 @@ JSON does not have a binary type, and this library does not introduce a new type
 
 ### BSON
 
-[BSON](binary_formats/bson) supports binary values and subtypes. If a subtype is given, it is used and added as unsigned 8-bit integer. If no subtype is given, the generic binary subtype 0x00 is used.
+[BSON](binary_formats/bson.md) supports binary values and subtypes. If a subtype is given, it is used and added as an unsigned 8-bit integer. If no subtype is given, the generic binary subtype 0x00 is used.
 
 !!! example
 
@@ -145,7 +158,7 @@ JSON does not have a binary type, and this library does not introduce a new type
 
 ### CBOR
 
-[CBOR](binary_formats/cbor) supports binary values, but no subtypes. Any binary value will be serialized as byte strings. The library will choose the smallest representation using the length of the byte array.
+[CBOR](binary_formats/cbor.md) supports binary values, but no subtypes. Any binary value will be serialized as a byte string. The library will choose the smallest representation using the length of the byte array.
 
 !!! example
 
@@ -183,7 +196,7 @@ JSON does not have a binary type, and this library does not introduce a new type
 
 ### MessagePack
 
-[MessagePack](binary_formats/messagepack) supports binary values and subtypes. If a subtype is given, the ext family is used. The library will choose the smallest representation among fixext1, fixext2, fixext4, fixext8, ext8, ext16, and ext32. The subtype is then added as singed 8-bit integer.
+[MessagePack](binary_formats/messagepack.md) supports binary values and subtypes. If a subtype is given, the ext family is used. The library will choose the smallest representation among fixext1, fixext2, fixext4, fixext8, ext8, ext16, and ext32. The subtype is then added as a signed 8-bit integer.
 
 If no subtype is given, the bin family (bin8, bin16, bin32) is used.
 
@@ -224,7 +237,7 @@ If no subtype is given, the bin family (bin8, bin16, bin32) is used.
 
 ### UBJSON
 
-[UBJSON](binary_formats/ubjson) neither supports binary values nor subtypes, and proposes to serialize binary values as array of uint8 values. This translation is implemented by the library.
+[UBJSON](binary_formats/ubjson.md) supports neither binary values nor subtypes, and proposes to serialize binary values as an array of uint8 values. This translation is implemented by the library.
 
 !!! example
 
diff --git a/doc/mkdocs/docs/features/sax_interface.md b/doc/mkdocs/docs/features/sax_interface.md
index 135fc23b..88ed1100 100644
--- a/doc/mkdocs/docs/features/sax_interface.md
+++ b/doc/mkdocs/docs/features/sax_interface.md
@@ -2,6 +2,29 @@
 
 The library uses a SAX-like interface with the following functions:
 
+```plantuml
+class sax {
+    + {abstract} bool null()
+
+    + {abstract} bool boolean(bool val)
+
+    + {abstract} bool number_integer(number_integer_t val)
+    + {abstract} bool number_unsigned(number_unsigned_t val)
+
+    + {abstract} bool number_float(number_float_t val, const string_t& s)
+
+    + {abstract} bool string(string_t& val)
+
+    + {abstract} bool start_object(std::size_t elements)
+    + {abstract} bool end_object()
+    + {abstract} bool start_array(std::size_t elements)
+    + {abstract} bool end_array()
+    + {abstract} bool key(string_t& val)
+
+    + {abstract} bool parse_error(std::size_t position, const std::string& last_token, const detail::exception& ex)
+}
+```
+
 ```cpp
 // called when null is parsed
 bool null();
diff --git a/doc/mkdocs/docs/features/types.md b/doc/mkdocs/docs/features/types.md
new file mode 100644
index 00000000..72a09f31
--- /dev/null
+++ b/doc/mkdocs/docs/features/types.md
@@ -0,0 +1,264 @@
+# Types
+
+This page gives an overview of how JSON values are stored and how this can be configured.
+
+## Overview
+
+By default, JSON values are stored as follows:
+
+| JSON type | C++ type |
+| --------- | -------- |
+| object | `std::map<std::string, basic_json>` |
+| array | `std::vector<basic_json>` |
+| null | `std::nullptr_t` |
+| string | `std::string` |
+| boolean | `bool` |
+| number | `std::int64_t`, `std::uint64_t`, and `double` |
+
+Note that there are three different types for numbers; when parsing JSON text, the best-fitting type is chosen.
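Below is a minimal sketch of the "best-fitting type" rule that uses only the standard library. `number_kind` and `classify_number` are hypothetical names and not part of the library. The sketch assumes three things: the input is already a valid JSON number, `long long` is 64 bits wide, and, as in the library's parser, integers without a minus sign go to the unsigned type. Integers that do not fit into 64 bits fall back to floating-point.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical tag for the three number types described above.
enum class number_kind { integer, unsigned_integer, floating_point };

// Sketch: choose the best-fitting type for a valid JSON number literal.
// Assumes long long / unsigned long long are 64 bits wide.
number_kind classify_number(const std::string& text) {
    assert(!text.empty());

    // A fraction or an exponent always means floating-point.
    if (text.find_first_of(".eE") != std::string::npos) {
        return number_kind::floating_point;
    }

    errno = 0;
    if (text[0] == '-') {
        // Negative integers use the signed type if they fit into 64 bits.
        std::strtoll(text.c_str(), nullptr, 10);
        return errno == ERANGE ? number_kind::floating_point : number_kind::integer;
    }

    // Non-negative integers use the unsigned type if they fit into 64 bits.
    std::strtoull(text.c_str(), nullptr, 10);
    return errno == ERANGE ? number_kind::floating_point : number_kind::unsigned_integer;
}
```

Under these assumptions, `9223372036854775808` (one past `INT64_MAX`) still fits the unsigned type. Both `18446744073709551616` (one past `UINT64_MAX`) and `-9223372036854775809` fall back to floating-point.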
+
+```plantuml
+enum value_t {
+    null
+    object
+    array
+    string
+    boolean
+    number_integer
+    number_unsigned
+    number_float
+    binary
+    discarded
+
+}
+
+class json_value << (U,orchid) >> {
+    object_t* object
+    array_t* array
+    string_t* string
+    binary_t* binary
+    boolean_t boolean
+    number_integer_t number_integer
+    number_unsigned_t number_unsigned
+    number_float_t number_float
+}
+
+class basic_json {
+    value_t m_type
+    json_value m_value
+    + typedef object_t
+    + typedef array_t
+    + typedef binary_t
+    + typedef boolean_t
+    + typedef number_integer_t
+    + typedef number_unsigned_t
+    + typedef number_float_t
+}
+
+basic_json .. json_value
+basic_json .. value_t
+```
+
+## Template arguments
+
+The data types to store a JSON value are derived from the template arguments passed to class `basic_json`:
+
+```cpp
+template<
+    template<typename U, typename V, typename... Args> class ObjectType = std::map,
+    template<typename U, typename... Args> class ArrayType = std::vector,
+    class StringType = std::string,
+    class BooleanType = bool,
+    class NumberIntegerType = std::int64_t,
+    class NumberUnsignedType = std::uint64_t,
+    class NumberFloatType = double,
+    template<typename U> class AllocatorType = std::allocator,
+    template<typename T, typename SFINAE = void> class JSONSerializer = adl_serializer,
+    class BinaryType = std::vector<std::uint8_t>
+>
+class basic_json;
+```
+
+Type `json` is an alias for `basic_json<>` and uses the default types.
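A reduced stand-in can show how the template template parameters become concrete container types. `toy_json` is a hypothetical class for illustration only. It keeps just the parameters needed to derive `object_t` and `array_t`, in the same way `basic_json` derives them from its template arguments. It requires C++14 for `std::less<>`.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical, reduced stand-in for basic_json: only the template
// parameters needed to derive object_t and array_t are kept.
template<
    template<typename U, typename V, typename... Args> class ObjectType = std::map,
    template<typename U, typename... Args> class ArrayType = std::vector,
    class StringType = std::string,
    template<typename U> class AllocatorType = std::allocator
>
class toy_json {
  public:
    using string_t = StringType;
    using object_comparator_t = std::less<>;  // transparent comparator (C++14)

    // The class itself is the mapped/element type, so values can nest.
    using object_t = ObjectType<StringType, toy_json, object_comparator_t,
                                AllocatorType<std::pair<const StringType, toy_json>>>;
    using array_t = ArrayType<toy_json, AllocatorType<toy_json>>;
};

// With all defaults, the derived types are the std::map and std::vector
// specializations described in this document.
using default_toy = toy_json<>;

static_assert(std::is_same<
                  default_toy::object_t,
                  std::map<std::string, default_toy, std::less<>,
                           std::allocator<std::pair<const std::string, default_toy>>>>::value,
              "object_t is a std::map keyed by std::string");

static_assert(std::is_same<
                  default_toy::array_t,
                  std::vector<default_toy, std::allocator<default_toy>>>::value,
              "array_t is a std::vector of values");
```

Inside the class, `std::map` and `std::vector` are only named, not instantiated. This is why the class can use itself as the element type while it is still incomplete.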
+
+From the template arguments, the following types are derived:
+
+```cpp
+using object_comparator_t = std::less<>;
+using object_t = ObjectType<StringType, basic_json, object_comparator_t,
+                            AllocatorType<std::pair<const StringType, basic_json>>>;
+
+using array_t = ArrayType<basic_json, AllocatorType<basic_json>>;
+
+using string_t = StringType;
+
+using boolean_t = BooleanType;
+
+using number_integer_t = NumberIntegerType;
+using number_unsigned_t = NumberUnsignedType;
+using number_float_t = NumberFloatType;
+
+using binary_t = nlohmann::byte_container_with_subtype<BinaryType>;
+```
+
+
+## Objects
+
+[RFC 7159](http://rfc7159.net/rfc7159) describes JSON objects as follows:
+
+> An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.
+
+### Default type
+
+With the default values for *ObjectType* (`std::map`), *StringType* (`std::string`), and *AllocatorType* (`std::allocator`), the default value for `object_t` is:
+
+```cpp
+std::map<
+    std::string,  // key_type
+    basic_json,   // value_type
+    std::less<>,  // key_compare
+    std::allocator<std::pair<const std::string, basic_json>>  // allocator_type
+>
+```
+
+### Behavior
+
+The choice of `object_t` influences the behavior of the JSON class. With the default type, objects have the following behavior:
+
+- When all names are unique, objects will be interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings.
+- When the names within an object are not unique, it is unspecified which one of the values for a given key will be chosen. For instance, `#!json {"key": 2, "key": 1}` could be equal to either `#!json {"key": 1}` or `#!json {"key": 2}`.
+- Internally, name/value pairs are stored in lexicographical order of the names. Objects will also be serialized (see `dump`) in this order. For instance, both `#!json {"b": 1, "a": 2}` and `#!json {"a": 2, "b": 1}` will be stored and serialized as `#!json {"a": 2, "b": 1}`.
+- When comparing objects, the order of the name/value pairs is irrelevant. This makes objects interoperable in the sense that they will not be affected by these differences. For instance, `#!json {"b": 1, "a": 2}` and `#!json {"a": 2, "b": 1}` will be treated as equal.
+
+### Key order
+
+The order in which name/value pairs are added to the object is *not* preserved by the library. Therefore, iterating an object may return name/value pairs in a different order than they were originally stored. In fact, keys will be traversed in lexicographical order, because `std::map` with `std::less` is used by default. Please note that this behavior conforms to [RFC 7159](http://rfc7159.net/rfc7159), because any order implements the specified "unordered" nature of JSON objects.
+
+### Limits
+
+[RFC 7159](http://rfc7159.net/rfc7159) specifies:
+
+> An implementation may set limits on the maximum depth of nesting.
+
+In this class, the object's limit of nesting is not explicitly constrained. However, a maximum depth of nesting may be introduced by the compiler or runtime environment. A theoretical limit can be queried by calling the `max_size` function of a JSON object.
+
+### Storage
+
+Objects are stored as pointers in a `basic_json` type. That is, for any access to object values, a pointer of type `object_t*` must be dereferenced.
+
+
+## Arrays
+
+[RFC 7159](http://rfc7159.net/rfc7159) describes JSON arrays as follows:
+
+> An array is an ordered sequence of zero or more values.
+
+### Default type
+
+With the default values for *ArrayType* (`std::vector`) and *AllocatorType* (`std::allocator`), the default value for `array_t` is:
+
+```cpp
+std::vector<
+    basic_json,                 // value_type
+    std::allocator<basic_json>  // allocator_type
+>
+```
+
+### Limits
+
+[RFC 7159](http://rfc7159.net/rfc7159) specifies:
+
+> An implementation may set limits on the maximum depth of nesting.
+
+In this class, the array's limit of nesting is not explicitly constrained. However, a maximum depth of nesting may be introduced by the compiler or runtime environment. A theoretical limit can be queried by calling the `max_size` function of a JSON array.
+
+### Storage
+
+Arrays are stored as pointers in a `basic_json` type. That is, for any access to array values, a pointer of type `array_t*` must be dereferenced.
+
+
+## Strings
+
+[RFC 7159](http://rfc7159.net/rfc7159) describes JSON strings as follows:
+
+> A string is a sequence of zero or more Unicode characters.
+
+Unicode values are split by the JSON class into byte-sized characters during deserialization.
+
+### Default type
+
+With the default value for *StringType* (`std::string`), the default value for `string_t` is `#!cpp std::string`.
+
+### Encoding
+
+Strings are stored in UTF-8 encoding. Therefore, functions like `std::string::size()` or `std::string::length()` return the number of **bytes** in the string rather than the number of characters or glyphs.
+
+### String comparison
+
+[RFC 7159](http://rfc7159.net/rfc7159) states:
+
+> Software implementations are typically required to test names of object members for equality. Implementations that transform the textual representation into sequences of Unicode code units and then perform the comparison numerically, code unit by code unit, are interoperable in the sense that implementations will agree in all cases on equality or inequality of two strings. For example, implementations that compare strings with escaped characters unconverted may incorrectly find that `"a\\b"` and `"a\u005Cb"` are not equal.
+
+This implementation is interoperable, as it compares strings code unit by code unit.
+
+### Storage
+
+String values are stored as pointers in a `basic_json` type. That is, for any access to string values, a pointer of type `string_t*` must be dereferenced.
+
+
+## Booleans
+
+[RFC 7159](http://rfc7159.net/rfc7159) implicitly describes a boolean as a type which differentiates the two literals `true` and `false`.
+
+### Default type
+
+With the default value for *BooleanType* (`#!cpp bool`), the default value for `boolean_t` is `#!cpp bool`.
+
+### Storage
+
+Boolean values are stored directly inside a `basic_json` type.
+
+## Numbers
+
+[RFC 7159](http://rfc7159.net/rfc7159) describes numbers as follows:
+
+> The representation of numbers is similar to that used in most programming languages. A number is represented in base 10 using decimal digits. It contains an integer component that may be prefixed with an optional minus sign, which may be followed by a fraction part and/or an exponent part. Leading zeros are not allowed. (...) Numeric values that cannot be represented in the grammar below (such as Infinity and NaN) are not permitted.
+
+This description includes both integer and floating-point numbers. However, C++ allows more precise storage if it is known whether the number is a signed integer, an unsigned integer, or a floating-point number. Therefore, three different types, `number_integer_t`, `number_unsigned_t`, and `number_float_t`, are used.
+
+### Default types
+
+With the default value for *NumberIntegerType* (`std::int64_t`), the default value for `number_integer_t` is `std::int64_t`.
+With the default value for *NumberUnsignedType* (`std::uint64_t`), the default value for `number_unsigned_t` is `std::uint64_t`.
+With the default value for *NumberFloatType* (`#!cpp double`), the default value for `number_float_t` is `#!cpp double`.
+
+### Default behavior
+
+- The restriction about leading zeros is not enforced in C++. Instead, leading zeros in integer literals lead to an interpretation as an octal number. Internally, the value will be stored as a decimal number. For instance, the C++ integer literal `#!c 010` will be serialized to `#!c 8`. During deserialization, leading zeros yield an error.
+- Not-a-number (NaN) values will be serialized to `#!json null`.
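Both default behaviors listed above can be reproduced with the standard library alone. The sketch below is hypothetical and is not the library's serializer. `to_json_number` prints finite values with `max_digits10` significant digits and writes `null` for non-finite ones. It treats infinities like NaN, which is an assumption of the sketch, since the text above only mentions NaN.

```cpp
#include <cassert>
#include <cmath>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// In C++, a leading zero makes an integer literal octal, so 010 is eight.
static_assert(010 == 8, "a leading zero denotes an octal literal");

// Hypothetical helper, not the library's serializer: writes a double as a
// JSON number token. JSON has no token for NaN, so non-finite values become
// null (mapping infinities to null as well is an assumption of this sketch).
std::string to_json_number(double value) {
    if (!std::isfinite(value)) {
        return "null";
    }
    std::ostringstream out;
    // max_digits10 significant digits suffice to round-trip any double.
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << value;
    return out.str();
}

// Example checks; call from main() or a test.
void check_number_serialization() {
    assert(to_json_number(std::nan("")) == "null");
    assert(to_json_number(std::numeric_limits<double>::infinity()) == "null");
    assert(to_json_number(2.5) == "2.5");
    assert(to_json_number(010) == "8");  // the octal literal arrives as 8
}
```

This simple formatting prints `8.0` as `8`. A real serializer may choose a different textual form to keep floating-point values recognizable.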
+
+### Limits
+
+[RFC 7159](http://rfc7159.net/rfc7159) specifies:
+
+> An implementation may set limits on the range and precision of numbers.
+
+When the default type is used, the maximal integer number that can be stored is `#!c 9223372036854775807` (`INT64_MAX`) and the minimal integer number that can be stored is `#!c -9223372036854775808` (`INT64_MIN`). Integer numbers that are out of range will yield over/underflow when used in a constructor. During deserialization, integers above `INT64_MAX` are automatically stored as `number_unsigned_t` (or as `number_float_t` if they also exceed `UINT64_MAX`), and integers below `INT64_MIN` are stored as `number_float_t`.
+
+When the default type is used, the maximal unsigned integer number that can be stored is `#!c 18446744073709551615` (`UINT64_MAX`) and the minimal unsigned integer number that can be stored is `#!c 0`. Integer numbers that are out of range will yield over/underflow when used in a constructor. During deserialization, negative integers are automatically stored as `number_integer_t` (or as `number_float_t` if they are below `INT64_MIN`), and integers above `UINT64_MAX` are stored as `number_float_t`.
+
+[RFC 7159](http://rfc7159.net/rfc7159) further states:
+
+> Note that when such software is used, numbers that are integers and are in the range $[-2^{53}+1, 2^{53}-1]$ are interoperable in the sense that implementations will agree exactly on their numeric values.
+
+As this range is a subrange of the exactly supported range [`INT64_MIN`, `INT64_MAX`], this class's integer type is interoperable.
+
+[RFC 7159](http://rfc7159.net/rfc7159) states:
+
+> This specification allows implementations to set limits on the range and precision of numbers accepted. Since software that implements IEEE 754-2008 binary64 (double precision) numbers is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision.
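The bounds of the interoperable integer range come from the 53-bit significand of IEEE 754 binary64. Every integer whose magnitude is at most 2^53 converts to `double` exactly. However, 2^53 + 1 rounds to the same `double` as 2^53, so the two can no longer be told apart. The hypothetical helper below checks this. It assumes `double` is binary64 with round-to-nearest-even, as on common platforms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: does an int64 value survive a round trip through
// double? Assumes double is IEEE 754 binary64 with round-to-nearest-even.
bool survives_double_round_trip(std::int64_t value) {
    const double as_double = static_cast<double>(value);
    // Values that round up to 2^63 cannot be converted back to int64
    // without undefined behavior, and they clearly did not survive.
    if (as_double >= 9223372036854775808.0) {
        return false;
    }
    return static_cast<std::int64_t>(as_double) == value;
}

// Example checks around the boundary 2^53 = 9007199254740992.
void check_round_trip_examples() {
    assert(survives_double_round_trip(9007199254740991));    // 2^53 - 1
    assert(survives_double_round_trip(-9007199254740991));   // -(2^53 - 1)
    assert(survives_double_round_trip(9007199254740992));    // 2^53 itself is exact
    assert(!survives_double_round_trip(9007199254740993));   // 2^53 + 1 rounds to 2^53
}
```

Because 2^53 and 2^53 + 1 map to the same `double`, the interoperable range stops at 2^53 - 1 rather than at 2^53.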
+
+This implementation follows exactly the approach described in the RFC, as it uses double-precision floating-point numbers. Note that values smaller than `#!c -1.79769313486232e+308` and values greater than `#!c 1.79769313486232e+308` will be stored as NaN internally and be serialized to `#!json null`.
+
+### Storage
+
+Integer number values, unsigned integer number values, and floating-point number values are stored directly inside a `basic_json` type.
diff --git a/doc/mkdocs/mkdocs.yml b/doc/mkdocs/mkdocs.yml
index f505c3a1..8e31be07 100644
--- a/doc/mkdocs/mkdocs.yml
+++ b/doc/mkdocs/mkdocs.yml
@@ -49,6 +49,7 @@ nav:
     - features/merge_patch.md
     - features/enum_conversion.md
     - features/sax_interface.md
+    - features/types.md
   - Integration:
     - integration/index.md
     - integration/cmake.md
@@ -98,6 +99,8 @@ markdown_extensions:
   - pymdownx.snippets:
       base_path: docs
       check_paths: true
+  - plantuml_markdown:
+      format: svg
 
 plugins:
   - search:
@@ -105,3 +108,8 @@ plugins:
   - mkdocs-simple-hooks:
       hooks:
         on_post_build: "docs.hooks:copy_doxygen"
+  - minify:
+      minify_html: true
+
+extra_javascript:
+  - https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-MML-AM_CHTML
diff --git a/doc/mkdocs/requirements.txt b/doc/mkdocs/requirements.txt
index 72a03a9a..b64e9b87 100644
--- a/doc/mkdocs/requirements.txt
+++ b/doc/mkdocs/requirements.txt
@@ -1,8 +1,11 @@
 click>=7.1.2
 future>=0.18.2
+htmlmin>=0.1.12
+httplib2>=0.18.1
 importlib-metadata>=1.6.0
 Jinja2>=2.11.2
 joblib>=0.15.1
+jsmin>=2.2.2
 livereload>=2.6.1
 lunr>=0.5.8
 Markdown>=3.2.2
@@ -11,8 +14,11 @@ MarkupSafe>=1.1.1
 mkdocs>=1.1.2
 mkdocs-material>=5.2.1
 mkdocs-material-extensions>=1.0
+mkdocs-minify-plugin>=0.3.0
 mkdocs-simple-hooks>=0.1.1
 nltk>=3.5
+plantuml>=0.3.0
+plantuml-markdown>=3.2.2
 Pygments>=2.6.1
 pymdown-extensions>=7.1
 PyYAML>=5.3.1