📝 added documentation for binary formats

2018-01-30 23:20:36 +01:00 · 2018-01-30 23:20:36 +01:00 · ce273af9b6
commit ce273af9b6
parent 8049442c2a
4 changed files with 175 additions and 2 deletions
--- a/doc/Doxyfile
+++ b/doc/Doxyfile
@@ -109,7 +109,8 @@ WARN_LOGFILE           =
 #---------------------------------------------------------------------------
 INPUT                  = ../src/json.hpp \
                         index.md \
-                         faq.md
+                         faq.md \
+                         binary_formats.md
 INPUT_ENCODING         = UTF-8
 FILE_PATTERNS          =
 RECURSIVE              = NO
--- a/doc/binary_formats.md
+++ b/doc/binary_formats.md
@@ -0,0 +1,172 @@
+# Binary formats
+
+![conversion between JSON and binary formats](images/binary.png)
+
+Several formats exist that encode JSON values in binary form to reduce both the size of the encoded value and the effort required to parse it. The library implements three of them:
+
+- [CBOR](https://tools.ietf.org/html/rfc7049) (Concise Binary Object Representation)
+- [MessagePack](https://msgpack.org)
+- [UBJSON](http://ubjson.org) (Universal Binary JSON)
+
+## Interface
+
+### JSON to binary format
+
+For each format, the `to_*` functions (i.e., `to_cbor`, `to_msgpack`, and `to_ubjson`) convert a JSON value into the respective binary format. Taking CBOR as an example, the concrete prototypes are:
+
+```cpp
+static std::vector<uint8_t> to_cbor(const basic_json& j);                    // 1
+static void to_cbor(const basic_json& j, detail::output_adapter<uint8_t> o); // 2
+static void to_cbor(const basic_json& j, detail::output_adapter<char> o);    // 3
+```
+
+The first function creates a byte vector from the given JSON value. The second and third functions write to an output adapter of `uint8_t` and `char`, respectively. Output adapters are implemented for strings, output streams, and vectors.
+
+Given a JSON value `j`, the following calls are possible:
+
+```cpp
+std::vector<uint8_t> v;
+v = json::to_cbor(j);   // 1
+
+json::to_cbor(j, v);    // 2
+
+std::string s;
+json::to_cbor(j, s);    // 3
+
+std::ostringstream oss;
+json::to_cbor(j, oss);  // 3
+```
+
+### Binary format to JSON
+
+Likewise, the `from_*` functions (i.e., `from_cbor`, `from_msgpack`, and `from_ubjson`) convert a binary encoded value into a JSON value. Taking CBOR as an example, the concrete prototypes are:
+
+```cpp
+static basic_json from_cbor(detail::input_adapter i, const bool strict = true); // 1
+static basic_json from_cbor(A1 && a1, A2 && a2, const bool strict = true);      // 2
+```
+
+Both functions read from an input adapter: the first function takes it directly from argument `i`, whereas the second function creates it from the provided arguments `a1` and `a2`. If the optional parameter `strict` is true, the input must be read completely (otherwise a parse error exception is thrown). If it is false, parsing succeeds even if the input is not completely read.
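The effect of `strict` can be illustrated with a toy decoder. This is a minimal sketch and not the library's implementation: the function `decode_small_uint` is hypothetical and understands only CBOR unsigned integers in the range 0..23, which occupy a single byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical toy decoder (an assumption for illustration only): it accepts
// a single CBOR unsigned integer in the range 0..23, encoded as one byte.
inline std::uint8_t decode_small_uint(const std::vector<std::uint8_t>& input,
                                      bool strict = true)
{
    if (input.empty() || input[0] > 0x17)
    {
        throw std::runtime_error("expected a CBOR unsigned integer in 0..23");
    }

    // Exactly one byte was consumed. In strict mode, leftover bytes are an error.
    const std::size_t consumed = 1;
    if (strict && consumed != input.size())
    {
        throw std::runtime_error("unexpected trailing bytes after the value");
    }

    return input[0];
}
```

With the input `{0x05, 0x06}`, the strict call throws because the byte `0x06` is left over, whereas the non-strict call returns 5 and ignores the rest.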
+
+Input adapters are implemented for input streams, character buffers, string literals, and iterator ranges.
+
+Given several inputs (which we assume to be filled with a CBOR value), the following calls are possible:
+
+```cpp
+std::string s;
+json j1 = json::from_cbor(s);                         // 1
+
+std::ifstream is("somefile.cbor", std::ios::binary);
+json j2 = json::from_cbor(is);                        // 1
+
+std::vector<uint8_t> v;
+json j3 = json::from_cbor(v);                         // 1
+
+const char* buff;
+std::size_t buff_size;
+json j4 = json::from_cbor(buff, buff_size);           // 2
+```
+
+## Details
+
+### CBOR
+
+The mapping from CBOR to JSON is **incomplete** in the sense that not all CBOR types can be converted to a JSON value. The following CBOR types are not supported and will yield parse errors (parse_error.112):
+
+- byte strings (0x40..0x5F)
+- date/time (0xC0..0xC1)
+- bignum (0xC2..0xC3)
+- decimal fraction (0xC4)
+- bigfloat (0xC5)
+- tagged items (0xC6..0xD4, 0xD8..0xDB)
+- expected conversions (0xD5..0xD7)
+- simple values (0xE0..0xF3, 0xF8)
+- undefined (0xF7)
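The hexadecimal ranges in the list above follow from the layout of a CBOR initial byte as described in RFC 7049: the three high-order bits select the major type, and the five low-order bits carry additional information. The following sketch shows that split; the names `cbor_major_type`, `major_type`, and `additional_info` are chosen for illustration and are not part of the library.

```cpp
#include <cstdint>

// Major types as defined in RFC 7049, section 2.1.
enum class cbor_major_type : std::uint8_t
{
    unsigned_integer = 0,
    negative_integer = 1,
    byte_string      = 2,
    text_string      = 3,
    array            = 4,
    map              = 5,
    tag              = 6,
    simple_or_float  = 7
};

// The three high-order bits of the initial byte select the major type.
constexpr cbor_major_type major_type(std::uint8_t initial_byte)
{
    return static_cast<cbor_major_type>(initial_byte >> 5);
}

// The five low-order bits carry the "additional information".
constexpr std::uint8_t additional_info(std::uint8_t initial_byte)
{
    return static_cast<std::uint8_t>(initial_byte & 0x1F);
}
```

Under this split, every byte in 0x40..0x5F maps to `byte_string`, every byte in 0xC0..0xDB maps to `tag`, and 0xF7 maps to `simple_or_float` with additional information 23.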
+
+CBOR further allows map keys of any type, whereas JSON only allows strings as keys in object values. Therefore, CBOR maps with keys other than UTF-8 strings are rejected (parse_error.113).
+
+The mapping from JSON to CBOR is **complete** in the sense that any JSON value type can be converted to a CBOR value.
+
+If NaN or Infinity are stored inside a JSON number, they are serialized properly. This behavior differs from the `dump()` function which serializes NaN or Infinity to `null`.
+
+The following CBOR types are not used in the conversion:
+
+- byte strings (0x40..0x5F)
+- UTF-8 strings terminated by "break" (0x7F)
+- arrays terminated by "break" (0x9F)
+- maps terminated by "break" (0xBF)
+- date/time (0xC0..0xC1)
+- bignum (0xC2..0xC3)
+- decimal fraction (0xC4)
+- bigfloat (0xC5)
+- tagged items (0xC6..0xD4, 0xD8..0xDB)
+- expected conversions (0xD5..0xD7)
+- simple values (0xE0..0xF3, 0xF8)
+- undefined (0xF7)
+- half and single-precision floats (0xF9-0xFA)
+- break (0xFF)
+
+### MessagePack
+
+The mapping from MessagePack to JSON is **incomplete** in the sense that not all MessagePack types can be converted to a JSON value. The following MessagePack types are not supported and will yield parse errors:
+
+- bin 8 - bin 32 (0xC4..0xC6)
+- ext 8 - ext 32 (0xC7..0xC9)
+- fixext 1 - fixext 16 (0xD4..0xD8)
+
+The mapping from JSON to MessagePack is **complete** in the sense that any JSON value type can be converted to a MessagePack value.
+
+The following values cannot be converted to a MessagePack value:
+
+- strings with more than 4294967295 bytes
+- arrays with more than 4294967295 elements
+- objects with more than 4294967295 elements
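The limit of 4294967295 comes from the MessagePack format itself: the widest length field is 32 bits. As a minimal sketch for the string case, the hypothetical helper `msgpack_string_header` below picks the smallest header defined by the MessagePack specification for a given byte length; it is not the library's code, and the exception type is an assumption made for the example.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (illustration only): returns the MessagePack header
// bytes for a string of `length` bytes. The payload would follow the header.
inline std::vector<std::uint8_t> msgpack_string_header(std::uint64_t length)
{
    if (length <= 31)  // fixstr: 101xxxxx
    {
        return {static_cast<std::uint8_t>(0xA0 | length)};
    }
    if (length <= 0xFF)  // str 8: one length byte
    {
        return {0xD9, static_cast<std::uint8_t>(length)};
    }
    if (length <= 0xFFFF)  // str 16: two length bytes, big-endian
    {
        return {0xDA,
                static_cast<std::uint8_t>((length >> 8) & 0xFF),
                static_cast<std::uint8_t>(length & 0xFF)};
    }
    if (length <= 0xFFFFFFFF)  // str 32: four length bytes, big-endian
    {
        return {0xDB,
                static_cast<std::uint8_t>((length >> 24) & 0xFF),
                static_cast<std::uint8_t>((length >> 16) & 0xFF),
                static_cast<std::uint8_t>((length >> 8) & 0xFF),
                static_cast<std::uint8_t>(length & 0xFF)};
    }
    // No MessagePack string type has a length field wider than 32 bits.
    throw std::length_error("string too long for MessagePack");
}
```

Arrays and objects follow the same pattern with their own markers, which is why they share the same upper bound.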
+
+The following MessagePack types are not used in the conversion:
+
+- bin 8 - bin 32 (0xC4..0xC6)
+- ext 8 - ext 32 (0xC7..0xC9)
+- float 32 (0xCA)
+- fixext 1 - fixext 16 (0xD4..0xD8)
+
+Any MessagePack output created by `to_msgpack` can be successfully parsed by `from_msgpack`.
+
+If NaN or Infinity are stored inside a JSON number, they are serialized properly. This behavior differs from the `dump()` function which serializes NaN or Infinity to `null`.
+
+### UBJSON
+
+The mapping from UBJSON to JSON is **complete** in the sense that any UBJSON value can be converted to a JSON value.
+
+The mapping from JSON to UBJSON is **complete** in the sense that any JSON value type can be converted to a UBJSON value.
+
+The following values cannot be converted to a UBJSON value:
+
+- strings with more than 9223372036854775807 bytes (theoretical)
+- unsigned integer numbers above 9223372036854775807
+
+The following markers are not used in the conversion:
+
+- `N`: no-op values are not created.
+- `C`: single-byte strings are serialized with `S` markers.
+
+Any UBJSON output created by `to_ubjson` can be successfully parsed by `from_ubjson`.
+
+If NaN or Infinity are stored inside a JSON number, they are serialized properly. This behavior differs from the `dump()` function which serializes NaN or Infinity to `null`.
+
+The optimized formats for containers are supported: the parameter `use_size` adds size information to the beginning of a container and removes the closing marker. The parameter `use_type` further checks whether all elements of a container have the same type and adds the type marker to the beginning of the container. The `use_type` parameter must only be used together with `use_size = true`. Note that `use_size = true` alone may result in larger representations; the benefit of this parameter is that the receiving side is immediately informed of the number of elements in the container.
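To make the size trade-off concrete, the sketch below builds the three UBJSON layouts for an array of small integers by hand, following the UBJSON container syntax (`[` ... `]`, `#` for the count, `$` for the type). The helper `ubjson_int8_array` is hypothetical and deliberately narrow: it assumes int8 elements and at most 127 of them, and it is not how the library is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (illustration only), limited to arrays of int8 values
// with at most 127 elements so that the count itself fits into an int8 ('i').
inline std::vector<std::uint8_t> ubjson_int8_array(const std::vector<std::int8_t>& values,
                                                   bool use_size, bool use_type)
{
    assert(values.size() <= 127);
    assert(use_size || !use_type);  // use_type requires use_size

    std::vector<std::uint8_t> out;
    out.push_back('[');
    if (use_type)
    {
        out.push_back('$');  // type marker for all elements
        out.push_back('i');
    }
    if (use_size)
    {
        out.push_back('#');  // element count, itself encoded as int8
        out.push_back('i');
        out.push_back(static_cast<std::uint8_t>(values.size()));
    }
    for (const std::int8_t v : values)
    {
        if (!use_type)
        {
            out.push_back('i');  // per-element type marker
        }
        out.push_back(static_cast<std::uint8_t>(v));
    }
    if (!use_size)
    {
        out.push_back(']');  // closing marker only without a count
    }
    return out;
}
```

For the three elements 1, 2, 3 this yields 8 bytes unoptimized, 10 bytes with the size only, and 9 bytes with size and type. In general the sizes in this sketch are 2 + 2n, 4 + 2n, and 6 + n bytes for n elements, so the type optimization only pays off for arrays with more than four elements.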
+
+## Size comparison examples
+
+The following table shows the size (in bytes) of different files in the `test/data` directory for the different formats.
+
+| format                  | sample.json | floats.json | all_unicode.json |
+| ----------------------- | -----------:| -----------:| ----------------:|
+| JSON                    | 687491      | 22670390    | 13279259         |
+| CBOR                    | **147095**  | 9000005     | **5494662**      |
+| MsgPack                 | 148395      | 9000005     | **5494662**      |
+| UBJSON unoptimized      | 148695      | 9000002     | 7718787          |
+| UBJSON size-optimized   | 150569      | 9000007     | 7718792          |
+| UBJSON format-optimized | 150883      | **8000009** | 7718792          |
+
+The results show that no single encoding is "best" for every input. Furthermore, it is not always worthwhile to use UBJSON's optimizations.
+
--- a/doc/images/binary.png
+++ b/doc/images/binary.png
--- a/doc/index.md
+++ b/doc/index.md
@@ -42,7 +42,7 @@ These pages contain the API documentation of JSON for Modern C++, a C++11 header
    - @link nlohmann::basic_json::parse parse @endlink parse from string
    - @link nlohmann::basic_json::operator>>(std::istream&, basic_json&) operator>> @endlink parse from stream
    - @link nlohmann::basic_json::accept accept @endlink check for syntax errors without parsing
-  - binary formats:
+  - [binary formats](binary_formats.md):
    - CBOR: @link nlohmann::basic_json::from_cbor from_cbor @endlink / @link nlohmann::basic_json::to_cbor to_cbor @endlink
    - MessagePack: @link nlohmann::basic_json::from_msgpack from_msgpack @endlink / @link nlohmann::basic_json::to_msgpack to_msgpack @endlink
    - UBJSON: @link nlohmann::basic_json::from_ubjson from_ubjson @endlink / @link nlohmann::basic_json::to_ubjson to_ubjson @endlink