Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

DAWN-478 ⁃ Dynamic Tables #354

Closed
arhag opened this issue Sep 6, 2017 · 10 comments
Closed

DAWN-478 ⁃ Dynamic Tables #354

arhag opened this issue Sep 6, 2017 · 10 comments

Comments

@arhag
Copy link
Contributor

arhag commented Sep 6, 2017

Background

Currently, EOS.IO is required to support a table schema for all the schemas a contract developer may want to use. Actually, the schema can ignore any "columns" of the table that are not acting as part of an index for the table. The EOS.IO system only needs to understand the type of the object of a table to the extent that it needs to be able to sort it for the table's indices. The rest of the fields (or "columns") of the object can be treated as a dumb vector of bytes by the EOS.IO system, and they only have meaning to the contract code.

So, for example, currently EOS.IO supports three types of tables. The first is a table supporting a single 64-bit key. The second is a table supporting two 128-bit keys. The third is a table supporting three 64-bit keys. The second table, as an example, provides two indices: the first sorts according to the primary key having priority over the secondary key (meaning the secondary key is used only to break ties in the primary key); the second sorts according to the secondary key have priority over the primary key.

Problem and Motivation

If the existing tables do not meet the requirements of contract developers, they are either out of luck or EOS.IO developers need to add a new table type to support the contract developers' needs and upgrade EOS.IO (possibly as a hard fork?) to allow use of that new table type.

For example, say a developer wanted to sort an object by a primary key in descending order and break ties with a secondary key in ascending order. In this example, the primary key could perhaps be a timestamp and the secondary key an account name. A contract developer could using hacky solutions to implement this with the currently available table types. For example, they could use the table with three 64-bit keys, and in particular use the second index of that table that sorts according to the secondary key in ascending order and then breaks ties with the tertiary key (again in ascending order). The contract developers would have to negate the timestamp (so that they can approximate descending order using a key that is sorted according to ascending order) and store it in a 64-bit key (wasting 32-bit of space). The primary key would also be entirely wasted. Finally, the account name would be stored as the tertiary key.

In the previous example, the contract developer was able to find a solution (though a very ugly one) with the existing table types. But what if they really needed a table with 3 128-bit indices? Or what if they needed to be able to store a more general string (not just the restricted strings that can be stored as uint64_t Names) that was sorted in lexicographical ordering? Then they would be out of luck, unless EOS.IO developers added a specific table type to meet their needs.

The EOS.IO platform would be far more powerful and useful to contract developers if it had dynamic table support with a sophisticated type system backing it which allows the contract developer to specify very permissive table schemas in their ABI and makes it possible for the EOS.IO system dynamically support it at run-time.

This document describes a design of a possible system that achieves that goal.

Table implementation

Currently, EOS.IO tables are implemented using the Boost.MultiIndex library. The new dynamic tables proposed in this document will continue using Boost.MultiIndex to implement the tables. This can be possible despite the dynamic nature of the schema because of a few features of Boost.MultiIndex.

One feature that could be of use in implementing dynamic tables using Boost.MultiIndex, is user-defined key extractors. A user-defined key extractor is a functor that receives a const reference to the object being considered within the table and returns another object of the key type. Since each instantiation of a boost::multi_index_container can have its own unique instantiation of the key extractor, it is possible to provide a functor that knows how to deal with the particular type associated with a given dynamic table, even if at the C++ compilation level Boost.MultiIndex treats all the object types as a dumb vector of bytes.

The key extractor does have to return the same particular type (known during compilation) for all the dynamic tables. So the object it returns would itself need to contain a vector of bytes holding the actual instance of the dynamic object but also the metadata needed to know how to compare it against other dynamic objects of the same dynamic type.

Using a user-defined key extractor as described above is one approach to implementing dynamic tables, but is not the approach taken for the proposal in this document. The user-defined key extractor approach requires copying bytes from the dynamic object instance to the dynamic key instance each time a key must be extracted by Boost.MultiIndex. And Boost.MultiIndex extracts a key for each object in boost::multi_index_container that it needs to do a comparison with in order to figure out where to place a new (or modified) object within the container or to simply find an existing object within the container. This approach would add unneccessary overhead and hurt performance.

The approach proposed in this document is to instead take full advantage of another feature of Boost.MultiIndex: custom comparison predicates. Custom functors are specified as part of the definition of the boost::multi_index_container and associated with each index (instead of using a user-defined key extractor, the identity extractor is used for all indices). This functor know how to compare two dynamic objects within the context of some other key type. In other words, although it may be comparing two dynamic objects of type T, it is really using the sorting information of a key type K (in which a definition for how type K can "look into" type T is provided in the ABI) to determine how to compare the two objects of type T. The functor instance needs to be made specific for not only for the table object type T but also for the key type K associated with a particular index, which means it needs access to all the metadata generated from the ABI for a K-type-shaped window into T. When instantiating a boost::multi_index_container, ctor_args_list is used to match the particular instance of the functor to the appropriate index of the container.

This approach works well when adding new objects or modifying existing ones. But one other piece of the puzzle is needed to allow lookups. A contract developer trying to find some object (or get an iterator to some point within a table index) would almost never want to specify the entire object as the lookup key. Instead other key types are used for lookups, and in fact each index of a table has a specific key type that must be used to do lookups (which of course must be specified as part of the contract ABI). To support lookups by custom keys, this proposal suggests the use of another Boost.MultiIndex feature: special lookup operations.

Special lookup operations allow specifying an alternative comparison predicate for lookups within an index as long as they follow a compatible sorting criteria. Since these comparison predicates would be generated by the EOS.IO system from the contract ABI, it can guarantee that the sorting criteria is compatible with that of the index the lookup is being done within. This alternative comparison predicate is another functor that allows comparing two different types. In this case, the two different types would actually be the same type (at the C++ compilation level): a dynamic object. However, one of them would be representing the object type of the table and the other would be representing the type of the key associated with the particular index the lookup is being done within. To properly compare the two instances of dynamic objects, the functor would need metadata for the key-type-shaped window into the table object type (much like in the case of the custom comparison predicates described earlier) as well as the metadata for the key type itself.

New EOS.IO type system

From the previous section, it should be clear that metadata is required which describes the various types involved in the dynamic tables. There should be sufficient metadata to allow the comparison predicates to do their job in comparing two compatible dynamic types.

Though this document has not yet described what sort of types will be supported (that will be discussed shortly within this section), it should be clear that the comparison predicates will eventually need to compare certain primitive types with each other. These primitive types include types such as: int8_t, uint8_t, int16_t, ..., uint64_t, and possibly additional types like bool. The way the values of these types are stored in the raw vector of bytes has a significant impact on the performance of the comparisons (and thus on things like table lookups or adding new objects to a table). Using the existing fc::pack and fc::unpack to, respectively, serialize types to and deserialize types from the raw vector of bytes would lead to very bad performance. Even if the comparison predicate needs to compare a single uint32_t somewhere within a struct, it would be forced to first unpack the entire struct (for both the left-hand side and right-hand side objects of the comparison) to then just compare integers.

Instead, the proposed design in this document takes inspiration from serialization libraries like Cap'n Proto such that the serialization outputs a vector of bytes which allow direct memory mapping of the data to a struct (at least for simple structs that do not involve dynamically-size vectors). This means care has to be paid to how the serialized types are actually laid out in memory so that the a particular integer type has appropriate alignment for fast access by the CPU. The layout algorithm for this design retains the flexibility to reorder fields within a struct type defined by a contract developer for more efficient packing (less padding required for alignment reasons), however, it still has strict constraints imposed to allow a contract developer to map the raw byte data of the serialization to a compatible C struct (these structs could be automatically generated from the ABI using code generators) without any deserialization overhead. Note that a good reference that I used to better understand C struct packing is: The Lost Art of C Structure Packing. Again following the design of Cap'n Proto, reader helper functions would still be necessary to traverse through the serialized type when it is more complicated than a simple struct (for example if it has dynamically-size vectors or sum types such as variant or optional), but this would only add very minimal overhead.

As a side note, I looked into just using Cap'n Proto for the type system rather than building a new one, but ultimately decided it was not the best choice. Cap'n Proto itself does not handle any comparison predicates automatically, so all of that custom code would need to be written anyway. That comparison code would then be forced to deal with the Reader interface of Cap'n Proto and deal with all of the edge cases of Cap'n Proto types that we would not care to support in EOS.IO (Cap'n Proto types have some additional features than the EOS.IO type system design proposed in this document does not). The Cap'n Proto C++ type compiler would have to be heavily modified to disallow the usage of the additional features we do not wish to support (including floating point types). So in terms of developer time it was not at all clear that using it would provide an advantage. In addition, Cap'n Proto does not currently support statically-sized arrays, which I thought was an important feature to have as part of the EOS.IO type system. Statically-sized arrays would need to be implemented as dynamically-sized lists in Cap'n Proto which comes with additional overhead (both in memory and in computation time) and due to the way the layout works would likely mean more cache misses. Statically-sized arrays can be particularly useful when using it as a fixed-size byte array to hold some type that is not directly supported in the EOS.IO type system. For example an std::array<uint8_t, 32> could be used as a substitute for a SHA256 hash. A contract developer may wish to store SHA256 hashes in a table with an index guaranteeing uniqueness (in that case comparison of the byte array would be done according to lexicographical sorting). In the EOS.IO type system design proposed in this document, storing such a hash would just take 32 extra bytes (ignore potential padding for alignment requirements) and be stored inline within the struct. Whereas with Cap'n Proto, the raw 32 bytes of data of the hash would be stored further away from data of the struct and at least 16 extra bytes (two 64-bit words) would be needed as overhead.

Supported types

The new EOS.IO type system proposed in this document supports a few classes of types.

The first class are the builtin types which include the primitive types (like the various integers and bool) and non-primitive types that are added because of the usefulness they are predicted to provide to contract developers, such as String (a NUL-terminated C string), Bytes (a vector of bytes), Rational, and larger sized integers, to name a few.

The second class are product types (to use the jargon of algebraic data types) which are essentially just structs. Note that structs are allowed to inherit from at most one other struct.

The third class are sum types (again using the jargon of algebraic data types): for example a variant (actually a static_variant) or an optional. The variant and optional types are laid out in-place, meaning there is enough space allocated with an appropriate alignment to fit the worst-case type that may be set for that sum type. For that reason, the variant has to be a static_variant, meaning that all of its possible case types must be specified as part of the ABI. For a "dynamic variant" a contract developer would instead want to use an Any type, which is actually part of the builtin class of types.

In the case of the Any type, the actual type it is instantiated to is located outside of the struct or container in which the Any belongs in, and the actual field within struct or container is a 64-bit word which not only holds the offset of the actual type instance but also a type_id which identifies which type the instance has.

The fourth and final class are homogeneous contiguous list containers of which there are two: an array container (which has a fixed size defined in the ABI) and a vector container (which is dynamically-sized). As mentioned earlier, arrays are stored in-place (within a struct or another container) but vectors are not. Vector data is stored somewhere in the raw serialized output but away from the actual struct or container the Vector type exists in. The actual field of the Vector type within the struct or container is a 64-bit word which holds the number of elements in the vector and the offset to the start of the actual data of the vector.

The type system prevents cycles that would lead to infinite sizes. So, for example, a struct cannot inherit from itself or contain itself (either directly or indirectly). It also cannot contain an array of itself, an optional of itself, or a variant that includes the struct as a possible case (again, in all cases, either directly or indirectly). However, vectors (and Any types) are the exception to this rule. A struct can contain as a field a vector of itself. This is because the vector only takes up a 64-bit word within the actual struct and that 64-bit word contains the size of the vector and the offset to the start of the vector data (the data of the vector is stored elsewhere within the serialized data).

Notice that there is no support for containers such as sets and maps. The EOS.IO type system proposed in this document only concerns itself with the size/alignment and comparison aspects of types.

First and foremost, the type system wants to know how much space it needs to hold the instance of the type and what its alignment requirements need to be for fast access to the instance. It also cares about how EOS.IO is supposed to compare two instances of the same type, which is why, for example, there is both a type for uint16_t and int16_t. Both of those types have the same size and alignment requirements, but they have different comparison requirements. A contract developer wants to know that, for example, -1 is sorted before 2 in ascending order. But if all 32-bit integers were treated the same (ignoring its sign) then 2 (or 0x0002 when represented as a uint16_t) would come before -1 (or 0xFFFF when represented as a uint16_t) in ascending order. It is also the reason for a Rational builtin type. From a size/alignment perspective only, a contract developer could use:

struct not_quite_a_rational 
{
    int64_t numerator;
    uint64_t denominator;
};

But the EOS.IO type system would sort this in lexicographical order. So, for example, the rational 4/5 would be sorted before 5/8 in ascending order (even though 0.8 > 0.625) if a contract developer used the custom not_quite_a_rational struct rather than using the EOS.IO rational type. As one can imagine, this would be a major problem if those rationals were, for example, representing prices in an order book table of an exchange contract.

The focus on just the size/alignment and comparison aspects of EOS.IO types however means that certain types that one is used to using in programming languages do not actually need a corresponding EOS.IO type. For example, a map can be represented using vector<pair<key, value>> (the pair type is just another product type and could be represented with a custom struct). Even if the map type is implemented using a red-black tree within the programming environment of the contract code, when it comes time to serialize it to a raw vector of bytes, it would be treated like vector<pair<key, value>> sorted in the appropriate order. The EOS.IO system would then allow comparisons between instances of vector<pair<key, value>> using the straightforward lexicographical ordering, which is good enough if all the contract developer cares about is (as an example) ensuring that each map in the table is unique. It usually does not make sense to compare two sets or two maps for some purpose other than checking for equality, hence the EOS.IO type system proposed in this document does not bother with supporting sets and maps through some special type with a special comparison predicate.

Comparisons between instances of the supported types

Each of the supported types has an appropriate comparison defined.

The array and vector types are sorted in lexicographical order, in which the comparison function of their element type is used to figure out how to compare any two elements within the lists.

The optional type is sorted as follows: two empty optionals are equal; an empty optional is less than a non-empty optional; and, if comparing two non-empty optionals the comparison is delegated to the comparison function of the contained type of the optional.

The variant type is sorted as follows: if the case index of two variants are not the same, the one with the smaller case index is less than the one with the larger case index; otherwise, if both variants have the same case index, then the comparison is delegated to the comparison function of the contained type of the variants.

The Any type is sorted as follows: two empty Anys are equal; an empty Any is less than a non-empty Any; if comparing two non-empty Anys of different types, the comparison is based on the comparison of their type_ids (definition of type_id comparison will be left for later); and, if comparing two non-empty Anys of the same type, the comparison is delegated to the comparison function of the actual type of the Anys.

The Bytes and String types of the builtin types class are sorted in lexicographical order similar to the array and vector types.

The primitive integer types (and bool) of the builtin types class are sorted in the usual way integers are sorted.

The rational type of the builtin types class is sorted according to: lhs.numerator * rhs.denominator < rhs.numerator * lhs.denominator.

Struct types are defined in the ABI with a particular sorting specification. The sorting specification describes which members of the struct are to be compared (in a particular order) and in which manner (ascending or descending order) when comparing two instances of the same struct type. In this case, "members" can refer to either one of the fields of the struct or the base of a derived struct.

Table specification

Tables are specified in the ABI by specifying a particular struct type as the main object of the table, and specifying one or more table indices. The table indices specify another type (either a struct type or a type from the builtin class) which acts as the key type for that index. The ABI for the table index also specifies whether it is a unique or non-unique index and whether to reverse the sorting order of the key type (i.e. make it in descending order rather than its default of ascending order).

In order for the table index to specify a key type different from the main object type of the table, the table index specification also needs to include a mapping between the main object type and the key type. This mapping associates to each of the sorted members of the key type a particular member of the main object type. This mapping provides the metadata needed to get a "key-type-shaped window into the table object type."

Contract developer interface

The goal is to provide a convenient interface to the contract developers to handle most of the complexity of this type system for them. A contract developer should be able to use the regular types of their programming environment and just add some extra metadata to them (either through annotations or perhaps some macros) to reflect those types to their corresponding EOS.IO types and provide the sorting specification when appropriate. Then tools provided by EOS.IO developers would take this information and automatically generate the ABI for the contract as well as the interfaces in the contract developers programming environment that allows the developers to read/create/modify and generally interact with these EOS.IO types and the dynamic table system.

One possible approach, and the approach taken so far in the implementation of this system, which is specific to a C++ contract development environment, is to use macros and advanced C++ template features to generate code during regular compilation that automatically discovers the types used by the contract developer and their association with corresponding EOS.IO types, and provides a type initialization function (to be called during the init stage of the EOS.IO contract) which automatically generates a TypeManager object to help handle the management of the types.

In particular, the TypeManager object along with the helper functions that are automatically generated during contract compilation allow the contract developer to easily deserialize to the C++ types that they use from the raw vector of bytes representing the type instance that is provided by the EOS.IO system via the WASM interface. It also allows the contract developer to take the C++ types they use and serialize them to the raw vector of bytes which they then pass to the EOS.IO system via the WASM interface (for example to create a new object in the table or modify an existing one).

Other more sophisticated interfaces for dealing with the EOS.IO types are possible as well, which can provide a more performant way of accessing or manipulating the data under certain situations. There could be support added for immutable access to the type via the raw data directly, meaning no deserialization step at all (this is similar to the approach taken with Cap'n Proto).

Another option that could be supported is mutable window into the type via the raw data (again no deserialization step). In this case, though there would be mutability, it would be a limited mutability. In particular, no vectors would be allowed to be resized, even though the contents of the vectors could be swapped and modified as the contract developer wishes. This would be a very useful option for parts of contracts that, for example, just need to modify a few primitive fields in a struct and do not bother with more complicated types such as vectors. For this case, the contract developer gets a big performance boost by using this limited mutability interface rather than going through the overhead of deserialization and reserialization. Instead the raw vector of bytes received from the EOS.IO system is modified in-place directly by the contract and then returned back to the EOS.IO system to actually make the change occur in the database.

But if the contract developer needs to modify an existing object in a way that requires changing the size of vectors contained within the object, they would be forced to go through the deserialization step so that they have a normal C++ object they can manipulate as they wish, before eventually serializing back to the raw vector of bytes that is passed via the WASM interface to the EOS.IO system.

The ability to deserialize from raw data to a C++ type and then reserialize back to raw data is the more general approach that will work in all situations. The additional interfaces of the immutable access interface and the limited mutability interface are nice features to eventually have because they provide a performance benefit to contract developers. But since the advantage they provide is only in performance, it is not a priority to support those interfaces in the beginning. Those interfaces can be added later. The EOS.IO type system proposed in this document is already designed to support these interfaces and so adding them would not require a redesign or a hard fork. In fact, EOS.IO developers need not even be the ones to add such interfaces. These can be added by third-party developers as better tooling to support contract developers in their favorite programming environment (assuming it compiles to WebAssembly of course).

Implementation done so far

As of September 5, 2017, I have developed a C++ implementation for most of this new type system. One important class in the implementation is TypeConstructor. TypeConstructor provides an interface to add various types into the TypeConstructor object as well as to add tables. Here is look at the interface for some of the relevant methods:

      struct table_index
      {
         type_id key_type;
         bool unique;
         bool ascending;
         vector<uint16_t> mapping;
      };

      type_id::index_t forward_declare(TypeNameConstRef declared_name);
      type_id::index_t add_struct(TypeNameConstRef name, const vector<pair<type_id, int16_t>>& fields, type_id base = type_id(), int16_t base_sort = 0 );
      type_id::index_t add_array(type_id element_type, uint32_t num_elements);
      type_id::index_t add_vector(type_id element_type);
      type_id::index_t add_optional(type_id element_type);
      type_id::index_t add_variant(const type_list& cases);
      
      type_id::index_t add_table(type_id::index_t object_index, const vector<table_index>& indices);

This is an interface that could be used by, for example, compiler plugins, or, in the case of my implementation, by macros and C++ template visitors to actually build the metadata for the set of types used by the contract. That metadata is held as two vectors in another class called TypeManager:

class TypeManager
{
      vector<uint32_t>        types;
      vector<field_metadata>  fields;
// Rest omitted... 
};

The field_metadata structure includes a type_id (which is represented in a 32-bit number) and an additional 32-bit of data storing the offset of the field as well as sorting information. The TypeManager object can be extracted from the TypeConstructor object assuming that all the types (as they are constructed within the TypeConstructor object) validate.

The TypeManager object has all the metadata needed by the EOS.IO system to do comparisons between dynamic objects and implement dynamic tables.

The TypeManager object also has the metadata needed by the contract code to serialize and deserialize C++ objects to and from raw data.

Rather than using TypeConstructor directly, my implementation uses macros and template visitors to automatically discover the relevant types (after being given some information by the contract developer via macros) and create a type initialization function that can be called to build the TypeConstructor appropriately.

While I still have changes planned for the interface of this mechanism, I can show a small example of how it works today:

#include <eos/types/reflect.hpp>

#include <iostream>
#include <iterator>

using std::vector;
using std::array;

struct type1
{
   uint32_t a;
   uint64_t b;
   vector<uint32_t> c;
   vector<type1> v;
};

struct type2 : public type1
{
   int16_t d;
   array<type1, 2> arr;
};

EOS_TYPES_REFLECT_STRUCT( type1,       (a)(b)(c)(v),  ((c, asc))((a, desc)) )
//                        struct name, struct fields, sequence of pairs for sorted fields (field name, asc/desc) where asc = ascending order and desc = descending order.

EOS_TYPES_REFLECT_STRUCT_DERIVED( type2,       type1,     (d)(arr),      ((d, desc))((type1, asc)) )
//                                struct name, base name, struct fields, sequence of pairs for sorted members (as described above), however notice the base type name can also be included as member

EOS_TYPES_REGISTER_STRUCTS((type2))
// Only need to bother registering type2 because the reflection system will automatically discover type1 since type2 uses it.



int main()
{
   auto tc = eos::types::initialize_types();

   std::cout << "type1:" << std::endl;
   auto type1_index = eos::types::get_struct_index<type1>();
   tc.write_type(std::cout, eos::types::type_id::make_struct(type1_index));
   auto sa1 = tc.get_size_align_of_struct(type1_index);
   std::cout << "Size: " << sa1.get_size() << std::endl;
   std::cout << "Alignment: " << (int) sa1.get_align() << std::endl;
   std::cout << "Sorted members (in sort priority order):" << std::endl;
   for( auto f : tc.get_sorted_members(type1_index) )
      std::cout << f << std::endl;
   std::cout << std::endl;

   std::cout << "type2:\n";
   auto type2_index = eos::types::get_struct_index<type2>();
   tc.write_type(std::cout, eos::types::type_id::make_struct(type2_index));
   auto sa2 = tc.get_size_align_of_struct(type2_index);
   std::cout << "Size: " << sa2.get_size() << std::endl;
   std::cout << "Alignment: " << (int) sa2.get_align() << std::endl;
   std::cout << "Sorted members (in sort priority order):" << std::endl;
   for( auto f : tc.get_sorted_members(type2_index) )
      std::cout << f << std::endl;
   std::cout << std::endl;


   auto tm = tc.destructively_extract_type_manager();
   eos::types::serialization_region r(tm);

   type1 s1{ .a = 8, .b = 9, .c = {1, 2, 3, 4, 5, 6} };

   type2 s2;
   s2.a = 42; s2.b = 43; s2.d = -1;
   s2.arr[0] = s1;
   s2.arr[1].a = 1; s2.arr[1].b = 2; s2.arr[1].c.push_back(1); s2.arr[1].c.push_back(2);

   r.write_struct(s1);

   std::cout << "Raw data of serialization of s1:\n" << std::hex;
   for( auto b : r.get_raw_data() )
      std::cout << (int) b << " ";
   std::cout << std::dec << std::endl;

   std::ostream_iterator<uint32_t> oi(std::cout, ", ");
   std::cout << "Reading back struct from raw data." << std::endl;
   type1 s3;
   r.read_struct(s3);
   std::cout << "a = " << s3.a << ", b = " << s3.b << ", and c = [";
   std::copy(s3.c.begin(), s3.c.end(), oi);
   std::cout << "]\n" << std::endl;

   r.clear();
   r.write_struct(s2);

   std::cout << "Raw data of serialization of s2:\n" << std::hex;
   for( auto b : r.get_raw_data() )
      std::cout << (int) b << " ";
   std::cout << std::dec << std::endl;

   std::cout << "Reading back struct from raw data." << std::endl;
   type2 s4;
   r.read_struct(s4);
   std::cout << "a = " << s4.a << ", b = " << s4.b << ", c = [";
   std::copy(s4.c.begin(), s4.c.end(), oi);
   std::cout << "]";
   std::cout << ", d = " << s4.d << ", and arr = [";
   for( auto i = 0; i < 2; ++i )
   {
      std::cout << "{a = " << s4.arr[i].a << ", b = " << s4.arr[i].b << ", and c = [";
      std::ostream_iterator<uint32_t> oi(std::cout, ", ");
      std::copy(s4.arr[i].c.begin(), s4.arr[i].c.end(), oi);
      std::cout << "]}, ";
   }
   std::cout << "]\n" << std::endl;

   return 0;
}

And that code generates the following output:

type1:
struct type1 /* struct(4) */
{
    UInt32 f0; // Sorted in descending order
    UInt64 f1;
    Vector<UInt32> f2; // Sorted in ascending order
    Vector<T1> /*[T1 = struct(4)]*/ f3;
};
Size: 32
Alignment: 8
Sorted members (in sort priority order):
Vector<UInt32> (sorted in ascending order) located at offset 0x8
UInt32 (sorted in descending order) located at offset 0x18

type2:
struct type2 /* struct(0) */ : type1 /* struct(4) */ // Base sorted in ascending order
{
    Int16 f0; // Sorted in descending order
    Array<T1, 2> /*[T1 = struct(4)]*/ f1;
};
Size: 104
Alignment: 8
Sorted members (in sort priority order):
Int16 (sorted in descending order) located at offset 0x60
T [T = struct(4)] (sorted in ascending order) located at offset 0x0

Raw data of serialization of s1:
9 0 0 0 0 0 0 0 6 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0 5 0 0 0 6 0 0 0
Reading back struct from raw data.
a = 8, b = 9, and c = [1, 2, 3, 4, 5, 6, ]

Raw data of serialization of s2:
2b 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2a 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 6 0 0 0 68 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 2 0 0 0 80 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 ff ff 0 0 0 0 0 0 1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0 5 0 0 0 6 0 0 0 1 0 0 0 2 0 0 0
Reading back struct from raw data.
a = 42, b = 43, c = [], d = -1, and arr = [{a = 8, b = 9, and c = [1, 2, 3, 4, 5, 6, ]}, {a = 1, b = 2, and c = [1, 2, ]}, ]

As of now, my implementation of the macro reflection shown above does not supporting the following types (although my implementation of TypeConstructor and TypeManager do support these types): variants, Any, Rational, String, Bytes. I also do not yet have support (either at the macro reflection level or the TypeConstructor/TypeManager level) for builtin integers larger in size than 64-bits. Also, as of now, I have not yet completed work on the comparison predicates for Boost.MultiIndex using this new type system needed to make dynamic tables possible, although I have tested using comparison predicates in an earlier and extremely rough prototype just to prove to myself that it was in fact possible to do what I want to do.

@bytemaster
Copy link
Contributor

Things that are missing from this:

  1. What does the JSON representation of the ABI look like
  2. How do we convert a type to JSON / from JSON given just the ABI and a binary blob or JSON and no pre-compiled data

I would find your code above more readable if you removed std:: everywhere and spaced things out a bit more.

@bytemaster
Copy link
Contributor

It may not be possible to do 0 copy because the WASM code can only access its address space.

That said, I think we can do 1 copy rather than 2 copies and we can avoid dynamic memory allocation for parsing types from the DB.

Unfortunately, for dynamically sized types created by WASM code we will need to normalize and ensure it is all located in contiguous memory before we can store it.

Is your code checked in anywhere it can be reviewed?

@arhag
Copy link
Contributor Author

arhag commented Sep 11, 2017

I have been redesigning a few things in my implementation to meet some of the requirements I have learned are necessary for the type system to be used in EOS.IO. Also, I continued to add the functionality needed to compare instances of compatible types, which is critical to make Dynamic Tables work,

ABI

Initially, I just had a TypeConstructor class which had the helper methods to construct the "compiled" version of the type metadata that could directly be used by the system to compare table objects/keys. This was good for some initial testing, but ultimately I needed a proper ABI specification that could be serialized and included within a transaction when changing/setting the contract code.

So now I have designed a tentative ABI that can be passed to the EOS.IO platform. I haven't written up a formal description of it yet, but just to get an idea of what it currently looks like, here is the definition of the ABI struct in my current implementation:

   struct ABI
   {
      enum type_specification : uint8_t
      {
         struct_type,
         array_type,
         vector_type,
         optional_type,
         variant_type
      };

      struct type_definition
      {
         uint32_t           first;
         int32_t            second;
         type_specification ts;
      };

      struct struct_t
      {
         string                        name;
         vector<pair<string, type_id>> fields;
         vector<int16_t>               sort_order;
      };

      struct table_index
      {
         type_id          key_type;
         bool             unique;
         bool             ascending;
         vector<uint16_t> mapping;
     };

      struct table
      {
         type_id::index_t    object_index;
         vector<table_index> indices;
      };

      vector<type_definition> types;
      vector<struct_t>        structs;
      vector<type_id>         variant_cases;
      vector<table>           tables;
   };

Much of it is self-explanatory, but a few points are worth mentioning.

The type_definition can either be for a struct, array, vector, optional, or variant. In the case of a vector or optional, the first field holds the type_id of the element type that the vector/optional contains, and the second field is ignored. In the case of an array, the first field again holds the type_id of the element type that the array contains, and the second field holds the number of elements of the array. In the case of a variant, first is an index into variant_cases to the start of the cases of the variants and second is the number of cases for that variant. And finally, in the case of a struct, second specifies the base type of the struct (-1 if no base type, otherwise it is the index into types to the struct which is the base type) and first is an index into structs vector which provides the remaining definition of the struct.

The ABI includes extra information that I did not bother including before such as the name of the structs and the name of their fields. This information is not necessary to make dynamic tables work nor to allow interoperability between the contract code and the EOS.IO platform. It is only there to allow the platform to convert the structs to/from JSON. I believe the ABI should have everything needed to be able convert the objects/keys of the table (and any types they contain within) to/from JSON dynamically (although I have not yet started working on the actual implementation of that).

A table must specify one struct as its object type. A given struct type cannot be used in more than one table. The table inherits the name of the struct object it is defined for. A table can have 1 or more indices (currently I have a hard cap of 255 indices for a table). The index specifies a key_type which must either be a builtin type or a struct type. The index also specifies whether it is unique or non-unique (note: non-unique indices are implemented in Boost.MultiIndex as unique indices that break ties if the key is equal by comparing the object IDs) and whether to keep the normal sorting order of the key_type (ascending == true) or to reverse it (ascending == false). The mapping field is needed to map the sorted fields (and only the sorted fields) of the key_type (or the whole type itself if key_type is a builtin type) to the fields of the table object type.

This ABI could be serialized/deserialized using fc::pack and fc::unpack because there is no need to compare two ABIs together in dynamic tables. It should be possible to serialize it using this new type system I am designing (if that was desired for some reason), but it requires getting the C++ serialization implementation to support enums (to handle type_specification) and to support automatic conversions of arbitrary classes to builtin types for which they have conversions defined (to handle automatically serializing type_id as a uint32_t). Both of those requirements are something I want the C++ serialization implementation to support anyway, especially the latter one. The latter requirement means allowing a contract developer to create a token C++ class (which is just a smarter int64_t that automatically manages precision info and other useful things related to tokens) and have the serialization system automatically (if the class has the necessary methods defined and the class has been reflected appropriately) handle it as if it is a int64_t from the perspective of the Dynamic Tables system.

Comparing compatible objects according to sorting specification of a table index

The table indices specified in the ABI above tell the Dynamic Table system how to compare two table objects according to a custom sorting rule. It is as if the key_type is extracted from the objects and those keys are compared to figure out how to sort the objects. However, that is not what actually happens in the implementation. Instead the ABI is "compiled" to metadata that describes a "key-shaped window into the object" which is later used by the Dynamic Table system to compare two instances of the object type in the appropriate way at runtime. Also, using the additional compiled metadata of the key type, it is possible for the Dynamic Table system to compare an object type with the key type at run-time (again it is as if the key type is being compared with another key extracted from the table object). This heterogeneous form of comparison is necessary to handle the contract developer's requests for index lookups by key, while the earlier homogeneous form of comparison is needed by the Dynamic Table system to handle insertion and modification of objects within the tables.

I have implemented both of these comparison algorithms for this new type system. They are accessible through two new methods of the types_manager class:

      int8_t compare_object_with_key(const raw_region& object_data, const raw_region& key_data, const table_index& ti)const;
      int8_t compare_objects(uint64_t lhs_id, const raw_region& lhs, uint64_t rhs_id, const raw_region& rhs, const table_index& ti)const;

They both take references to the raw data (handled in the raw_region class) for two type instances to be compared and a reference to the table_index providing the sorting specifications.

compare_object_with_key takes an object (object_data) being considered from the table and a provided key (key_data) and returns:

  • -1 if the appropriate key extracted from the object would precede the provided key in the sorting order specified by table_index;
  • 1 if the appropriate key extracted from the object would follow the provided key in the sorting order specified by table_index;
  • or 0 if the appropriate key extracted from the object is considered indistinguishable from the provided key according to the sorting order specified by table_index.

compare_objects takes two objects of the same type to compare: one on the left-hand side (lhs) of the comparison (lhs together with lhs_id) and one on the right-hand side (rhs) of the comparison (rhs together with rhs_id). Similar to compare_object_with_key, it returns:

  • -1 if the lhs object precedes the rhs object in the sorting order specified by table_index;
  • 1 if the lhs object follows the rhs object in the sorting order specified by table_index;
  • or 0 if the lhs and rhs objects are indistinguishable from one another according to the sorting order specified by table_index.
    Keep in mind that unique indices do not use the ID information of the lhs and rhs objects at all, while non-unique indices use the IDs as a last resort to distinguish two different objects that would have otherwise been considered indistinguishable according to the sorting rules of the table index (i.e. the keys that could be extracted from the two objects would be identical).

Updated code showing new interface and new features

Once again, the interface of the C++ implementation will likely change. It already has changed a bit since last week. Now the global initialize_types function returns an abi_constructor rather than TypeConstructor. One can use the get_abi() method of abi_constructor to get the ABI which could, eventually, then be serialized and put into a transaction including a setcode message.

As before the initialize_types function is built using macros and uses template visitors to automatically discover (via the reflected structs created via macros) the relevant types and generate the ABI from them. I have now added a couple more macros to be used by the contract developer. EOS_TYPES_CREATE_TABLE allows the contract developer to specify a new table including its indices (see the comments in the example code below for how to use it). EOS_TYPES_REGISTER_STRUCTS has been replaced by the new macro EOS_TYPES_REGISTER_TYPES which allows the contract developer to specify the tables they want registered as well as any other miscellaneous structs that would not be automatically discovered from the specified tables to be registered.

The idea behind these macros and the tools provided in the current C++ implementation is that they can be used in an executable (which runs on the contract developer's machine) that automatically discovers the types used in the contract and generates the serialized ABI to be used in the setcode message when the developer is ready to deploy it to EOS.IO network (or a test net).

The executable could also generate the C++ files to be included by the contract code which provides the serialization/deserialization code to allow the contract to interface with the Dynamic Table system conveniently. While the current implementation is capable of generating that serialization/deserialization code during C++ compilation (and the metadata that compiled code needs to do its job needs to be generated during an initialization stage at runtime) that may not ultimately be a viable approach since a lot of that code uses exceptions and C++ STL and Boost libraries, which apparently will not be available in the WebAssembly environment, So, it may just be easier to do the harder "type compilation" in a fuller C++ environment (for developer convenience) and generate much simpler serialization/deserialization C++ code that can be compiled to WebAssembly and run in that environment without problems.

Note: Even with this code generation approach, the strict WebAssembly environment puts a lot of developer burden on the implementation of the serialization/deserialization tools. First, all the existing error handling in my serialization/deserialization implementation has to be redesigned to be C-style, since apparently exceptions are not allowed. Second, I at least need a vector class; so a minimal vector class similar to the C++ STL vector needs to be reimplemented for the WebAssembly environment (this of course means we need malloc and free #129). Third, some of the type traits feature of the STL need to be reimplemented. And all of this is assuming optional and variant types are not initially supported in the serialization/deserialization implementation (though they would still be supported by the Dynamic Tables system). If optional and variant types were to be supported as well, that would require even more reimplementation of existing libraries.

Example code:

#include <eos/types/reflect.hpp>
#include <eos/types/serialization_region.hpp>
#include <eos/types/abi_constructor.hpp>
#include <eos/types/TypeConstructor.hpp>

#include <iostream>
#include <iterator>
#include <string>

using std::vector;
using std::array;
using std::string;

struct type1
{
   uint32_t a;
   uint64_t b;
   vector<uint32_t> c;
   vector<type1> v;
};

struct type2 : public type1
{
   int16_t d;
   array<type1, 2> arr;
};

struct type3
{
   uint64_t x;
   uint32_t y;
};

EOS_TYPES_REFLECT_STRUCT( type1,       (a)(b)(c)(v),  ((c, asc))((a, desc)) )
//                        struct name, struct fields, sequence of pairs for sorted fields (field name, asc/desc) where asc = ascending order and desc = descending order.

EOS_TYPES_REFLECT_STRUCT_DERIVED( type2,       type1,    (d)(arr),       ((d, desc))((type1, asc)) )
//                                struct name, base name, struct fields, sequence of pairs for sorted members (as described above), however notice the base type name can also be included as member

EOS_TYPES_REFLECT_STRUCT( type3, (x)(y), ((x, asc))((y, asc)) )

EOS_TYPES_CREATE_TABLE( type1,  ((uint32_t, nu_asc, ({0}) ))((type3,    u_desc,     ({1,0})   )) )
//                      object, sequence of index tuples:   ((key_type, index_type, (mapping) )) where: mapping is a list (in curly brackets) of uint16_t member indices
//                                                                                                        which map the key_type's sort members (specified in that order) to the object type's members,
//                                                                                               and    index_type = { u_asc,  // unique index sorted according to key_type in ascending order
//                                                                                                                     u_desc, // unique index sorted according to key_type in descending order
//                                                                                                                     nu_asc, // non-unique index sorted according to key_type in ascending order
//                                                                                                                     nu_desc // non-unique index sorted according to key_type in descending order
//                                                                                                                   }

EOS_TYPES_REGISTER_TYPES( (type1), (type2) )
//                        tables,  other structs
// By registering the table for type1, it automatically registers type1 (since it is the object type of the table) and type3 (since it is the key type of one of the indices of the table).
// However, type2 is not discovered from the table of type1, so it needs to be explicitly registered through specifying it within the other structs arguments to the macro.

int main()
{
   using namespace eos::types;
   using std::cout;
   using std::endl;


   auto ac = initialize_types();

   TypeConstructor tc(ac.get_abi());

   auto print_type_info = [&](type_id tid)
   {
      auto index = tid.get_type_index();
      cout << tc.get_struct_name(index) << ":" << endl;
      auto sa = tc.get_size_align_of_struct(index);

      tc.write_type(cout, tid);
      cout << "Size: " << sa.get_size() << endl;
      cout << "Alignment: " << (int) sa.get_align() << endl;

      cout << "Sorted members (in sort priority order):" << endl;
      for( auto f : tc.get_sorted_members(index) )
         cout << f << endl;
   };

   auto type1_tid = type_id::make_struct(tc.get_struct_index<type1>());
   print_type_info(type1_tid);
   cout << endl;

   auto type2_tid = type_id::make_struct(tc.get_struct_index<type2>());
   print_type_info(type2_tid);
   cout << endl;

   auto type3_tid = type_id::make_struct(tc.get_struct_index<type3>());
   print_type_info(type3_tid);
   cout << endl;

   //auto tm = tc.destructively_extract_types_manager();
   auto tm = tc.copy_types_manager();
   serialization_region r(tm);

   type1 s1{ .a = 8, .b = 9, .c = {1, 2, 3, 4, 5, 6} };

   type2 s2;
   s2.a = 42; s2.b = 43; s2.d = -1;
   s2.arr[0] = s1;
   s2.arr[1].a = 1; s2.arr[1].b = 2; s2.arr[1].c.push_back(1); s2.arr[1].c.push_back(2);

   r.write_struct(s1, type1_tid);

   cout << "Raw data of serialization of s1:" << std::hex << endl;
   for( auto b : r.get_raw_data() )
      cout << (int) b << " ";
   cout << std::dec << endl;

   std::ostream_iterator<uint32_t> oi(cout, ", ");
   cout << "Reading back struct from raw data." << endl;
   type1 s3;
   r.read_struct(s3, type1_tid);
   cout << "a = " << s3.a << ", b = " << s3.b << ", and c = [";
   std::copy(s3.c.begin(), s3.c.end(), oi);
   cout << "]" << endl << endl;


   raw_region r2;
   r2.extend(4);
   auto tbl_indx1 = tm.get_table_index(tm.get_table("type1"), 0);
   auto tbl_indx2 = tm.get_table_index(tm.get_table("type1"), 1);

   r2.set<uint32_t>(0, 13);
   auto c1 = tm.compare_object_with_key( r.get_raw_region(), r2, tbl_indx1 );
   cout << "Comparing object s1 with uint32_t key of value " << r2.get<uint32_t>(0) << " via the first index of table 'type1' gives: " << (int) c1 << endl;
   r2.set<uint32_t>(0, 5);
   auto c2 = tm.compare_object_with_key( r.get_raw_region(), r2, tbl_indx1 );
   cout << "Comparing object s1 with uint32_t key of value " << r2.get<uint32_t>(0) << " via the first index of table 'type1' gives: " << (int) c2 << endl;
   r2.set<uint32_t>(0, 8);
   auto c3 = tm.compare_object_with_key( r.get_raw_region(), r2, tbl_indx1 );
   cout << "Comparing object s1 with uint32_t key of value " << r2.get<uint32_t>(0) << " via the first index of table 'type1' gives: " << (int) c3 << endl;
   r2.clear();
   cout << endl;

   type3 s5{ .x = 8, .y = 9 };
   cout << "Struct s5 is {x = " << s5.x << ", y = " << s5.y << "}." << endl;
   serialization_region r3(tm);
   r3.write_struct(s5, type3_tid);
   auto c4 = tm.compare_object_with_key( r.get_raw_region(), r3.get_raw_region(), tbl_indx2 );
   cout << "The relevant key extracted from object s1 would " << (c4 == 0 ? "be equal to" : (c4 < 0 ? "come before" : "come after"))
        << " the key s5 according to the sorting specification of the second index of table 'type1'." << endl;
   cout << endl;


   r.clear();
   r.write_struct(s2, type2_tid);

   cout << "Raw data of serialization of s2:" << std::hex << endl;
   for( auto b : r.get_raw_data() )
      cout << (int) b << " ";
   cout << std::dec << endl;

   cout << "Reading back struct from raw data." << endl;
   type2 s4;
   r.read_struct(s4, type2_tid);
   cout << "a = " << s4.a << ", b = " << s4.b << ", c = [";
   std::copy(s4.c.begin(), s4.c.end(), oi);
   cout << "]";
   cout << ", d = " << s4.d << ", and arr = [";
   for( auto i = 0; i < 2; ++i )
   {
      cout << "{a = " << s4.arr[i].a << ", b = " << s4.arr[i].b << ", and c = [";
      std::copy(s4.arr[i].c.begin(), s4.arr[i].c.end(), oi);
      cout << "]}, ";
   }
   cout << "]" << endl << endl;

   auto type1_table = tm.get_table("type1");
   for( auto i = 0; i < tm.get_num_indices_in_table(type1_table); ++i )
   {
      auto ti = tm.get_table_index(type1_table, i);

      auto key_type = ti.get_key_type();
      string key_type_name;
      if( key_type.get_type_class() == type_id::builtin_type )
         key_type_name = type_id::get_builtin_type_name(key_type.get_builtin_type());
      else
         key_type_name = tc.get_struct_name(key_type.get_type_index());

      cout << "Index " << (i+1)  << " of the 'type1' table is " << (ti.is_unique() ? "unique" : "non-unique")
           << " and sorted according to the key type '" << key_type_name
           << "' in " << (ti.is_ascending() ? "ascending" : "descending" ) << " order." << endl;
   }

   return 0;
}

That code generates the following output:

type1:
struct type1 /* struct(4) */
{
    UInt32 f0; // Sorted in descending order
    UInt64 f1;
    Vector<UInt32> f2; // Sorted in ascending order
    Vector<T1> /*[T1 = struct(4)]*/ f3;
};
Size: 32
Alignment: 8
Sorted members (in sort priority order):
Vector<UInt32> (sorted in ascending order) located at offset 0x8
UInt32 (sorted in descending order) located at offset 0x18

type2:
struct type2 /* struct(8) */ : type1 /* struct(4) */ // Base sorted in ascending order
{
    Int16 f0; // Sorted in descending order
    Array<T1, 2> /*[T1 = struct(4)]*/ f1;
};
Size: 104
Alignment: 8
Sorted members (in sort priority order):
Int16 (sorted in descending order) located at offset 0x60
T [T = struct(4)] (sorted in ascending order) located at offset 0x0

type3:
struct type3 /* struct(0) */
{
    UInt64 f0; // Sorted in ascending order
    UInt32 f1; // Sorted in ascending order
};
Size: 16
Alignment: 8
Sorted members (in sort priority order):
UInt64 (sorted in ascending order) located at offset 0x0
UInt32 (sorted in ascending order) located at offset 0x8

Raw data of serialization of s1:
9 0 0 0 0 0 0 0 6 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0 5 0 0 0 6 0 0 0
Reading back struct from raw data.
a = 8, b = 9, and c = [1, 2, 3, 4, 5, 6, ]

Comparing object s1 with uint32_t key of value 13 via the first index of table 'type1' gives: -1
Comparing object s1 with uint32_t key of value 5 via the first index of table 'type1' gives: 1
Comparing object s1 with uint32_t key of value 8 via the first index of table 'type1' gives: 0

Struct s5 is {x = 8, y = 9}.
The relevant key extracted from object s1 would come before the key s5 according to the sorting specification of the second index of table 'type1'.

Raw data of serialization of s2:
2b 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2a 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 6 0 0 0 68 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 2 0 0 0 80 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 ff ff 0 0 0 0 0 0 1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0 5 0 0 0 6 0 0 0 1 0 0 0 2 0 0 0
Reading back struct from raw data.
a = 42, b = 43, c = [], d = -1, and arr = [{a = 8, b = 9, and c = [1, 2, 3, 4, 5, 6, ]}, {a = 1, b = 2, and c = [1, 2, ]}, ]

Index 1 of the 'type1' table is non-unique and sorted according to the key type 'UInt32' in ascending order.
Index 2 of the 'type1' table is unique and sorted according to the key type 'type3' in descending order.

Sorry for posting all this code in the GitHub issue. I am getting close to cleaning up what I currently have and posting it in some repo for review (though I expect a lot of what I currently have to change a lot). For the time being it will be easier to just keep it in separate repo (not eos) as I redesign/refactor, test, and finish implementing the remaining features to actually demonstrate a Boost.MultiIndex table sorting dynamic objects. Then after that I will be figure out how to integrate it into eos code base and ChainBase.

Tasks to do

  • Finish cleaning up existing code and put in a GitHub repository somewhere.
  • Add serialization support for String, Bytes, and Rational.
  • Implement code to use the new comparison methods within the comparison functors of Boost.MultiIndex and build demonstration application showing dynamic tables (in a very rudimentary form) working.
  • Unit tests
  • Add serialization support for automatically converting C++ enums and general classes/structs to/from their backing builtin types of this new type system (supports use cases like the token class discussed earlier). Possibly also add support in the serialization system to automatically convert C++ pairs and tuples to/from a compatible struct type of this new type system (supports serializing a map as a vector of pairs, for example).
  • Change implementation of serialization/deserialization code so that it can work in the restricted WebAssembly environment: no exceptions; no STL or Boost dependencies; and, reimplement critical STL features needed from type_traits and vector (better yet avoid using if possible and assuming that requires less development time than just reimplementing these features). This is also when I would probably switch over to the code generation approach rather than the approach I am currently using.
  • Design and build the convenient contract developer interface to the dynamic tables system.
  • Integrate my code into EOS.IO code base.
  • Add more builtin types (e.g. larger integers).
  • Conversion to/from JSON.
  • More tests
  • Much later: figure out how to support optional/variants and Any in the serialization system; also, add support for immutable_region (allows read-only access to object without a deserialization step) and mutable_region (allows limited in-place mutability to an object while avoiding deserialization and reserialization steps).

@arhag
Copy link
Contributor Author

arhag commented Sep 12, 2017

Today I cleaned up some of my existing code, found and corrected a few bugs, and added serialization/deserialization support for the String, Bytes, and Rational types. In the process I also figured out that it would be easy to build serialization/deserialization support for "general classes/structs to/from their backing builtin types of this new type system" so I went ahead and added that as well.

I also wrote another example program demonstrating these new features:

#include <eos/types/reflect.hpp>
#include <eos/types/serialization_region.hpp>
#include <eos/types/abi_constructor.hpp>
#include <eos/types/TypeConstructor.hpp>

#include <type_traits>
#include <iostream>
#include <iterator>
#include <string>

using std::vector;
using std::array;
using std::string;

struct eos_symbol;
struct currency_symbol;


template<typename NumberType, class TokenType>
class token
{
private:

   using enable_if_t = typename std::enable_if< std::is_integral<NumberType>::value && !std::is_same<NumberType, bool>::value >::type; 
      
   NumberType quantity;

public:

   token()
      : quantity()
   {}

   explicit token( NumberType v )
      : quantity(v)
   {}

   // This assignment operator and the conversion to NumberType below it allow this custom class to be used 
   // as if it is the primitive integer type NumberType by the serialization/deserialization system.
   token& operator=( NumberType v ) { quantity = v; return *this; }    
   explicit operator NumberType()const { return quantity; }

   token& operator-=( const token& a ) {
    assert( quantity >= a.quantity ); // "integer underflow subtracting token balance" );
    quantity -= a.quantity;
    return *this;
   }

   token& operator+=( const token& a ) {
    assert( quantity + a.quantity >= a.quantity ); //  "integer overflow adding token balance" );
    quantity += a.quantity;
    return *this;
   }

   inline friend token operator+( const token& a, const token& b ) {
    token result = a;
    result += b;
    return result;
   }

   inline friend token operator-( const token& a, const token& b ) {
    token result = a;
    result -= b;
    return result;
   }

   friend bool operator <= ( const token& a, const token& b ) { return a.quantity <= b.quantity; }
   friend bool operator <  ( const token& a, const token& b ) { return a.quantity <  b.quantity; }
   friend bool operator >= ( const token& a, const token& b ) { return a.quantity >= b.quantity; }
   friend bool operator >  ( const token& a, const token& b ) { return a.quantity >  b.quantity; }
   friend bool operator == ( const token& a, const token& b ) { return a.quantity == b.quantity; }
   friend bool operator != ( const token& a, const token& b ) { return a.quantity != b.quantity; }

   explicit operator bool()const { return quantity != 0; }
};

using eos_token      = token<uint64_t, eos_symbol>;
using currency_token = token<uint64_t, currency_symbol>; // Compilation error if you try to add a currency_token to an eos_token

EOS_TYPES_REFLECT_BUILTIN( eos_token, builtin_uint64 )
EOS_TYPES_REFLECT_BUILTIN( currency_token, builtin_uint64 )
// Both eos_token and currency_token will be treated as a uint64_t in the type system.

class time_point_sec
{
public:
   time_point_sec()
      :utc_seconds(0){}

   explicit time_point_sec(uint32_t seconds )
      :utc_seconds(seconds){}

   static time_point_sec maximum() { return time_point_sec(0xffffffff); }
   static time_point_sec min() { return time_point_sec(0); }

   uint32_t sec_since_epoch()const { return utc_seconds; }

   friend bool      operator < ( const time_point_sec& a, const time_point_sec& b )  { return a.utc_seconds < b.utc_seconds; }
   friend bool      operator > ( const time_point_sec& a, const time_point_sec& b )  { return a.utc_seconds > b.utc_seconds; }
   friend bool      operator <= ( const time_point_sec& a, const time_point_sec& b )  { return a.utc_seconds <= b.utc_seconds; }
   friend bool      operator >= ( const time_point_sec& a, const time_point_sec& b )  { return a.utc_seconds >= b.utc_seconds; }
   friend bool      operator == ( const time_point_sec& a, const time_point_sec& b ) { return a.utc_seconds == b.utc_seconds; }
   friend bool      operator != ( const time_point_sec& a, const time_point_sec& b ) { return a.utc_seconds != b.utc_seconds; }
   time_point_sec&  operator += ( uint32_t m ) { utc_seconds+=m; return *this; }
   time_point_sec&  operator -= ( uint32_t m ) { utc_seconds-=m; return *this; }
   time_point_sec   operator +( uint32_t offset )const { return time_point_sec(utc_seconds + offset); }
   time_point_sec   operator -( uint32_t offset )const { return time_point_sec(utc_seconds - offset); }

   friend class eos::types::serialization_region;
   friend class eos::types::deserialize_visitor;

private:
   uint32_t utc_seconds;

   // The assignment operator from and conversion to the integer type can also be private if desired as long as the appropriate classes are added as friends (see above).
   time_point_sec& operator=(uint32_t seconds) { utc_seconds = seconds; return *this; }
   explicit operator uint32_t()const { return utc_seconds; }
};

EOS_TYPES_REFLECT_BUILTIN( time_point_sec, builtin_uint32 )


struct order_id
{
   string name; // EOS.IO account names are actually represented with a uint64_t, but this is just to show off the String type in this new type system.
   uint32_t id;
};

EOS_TYPES_REFLECT_STRUCT( order_id, (name)(id), ((name, asc))((id, asc)) )

using eos::types::rational;

struct bid
{
   order_id       buyer;
   rational       price;
   eos_token      quantity;
   time_point_sec expiration;
};

EOS_TYPES_REFLECT_STRUCT( bid, (buyer)(price)(quantity)(expiration) )

struct ask
{
   order_id       seller;
   rational       price;
   currency_token quantity;
   time_point_sec expiration;
};

EOS_TYPES_REFLECT_STRUCT( ask, (seller)(price)(quantity)(expiration) )

struct order_id_key
{
   order_id     oid;
};

EOS_TYPES_REFLECT_STRUCT( order_id_key, (oid), ((oid, asc)) )

struct expiration_key
{
   time_point_sec expiration;
   order_id       oid;
};

EOS_TYPES_REFLECT_STRUCT( expiration_key, (expiration)(oid), ((expiration, asc))((oid, asc)) )

EOS_TYPES_CREATE_TABLE( bid,  
                        (( order_id_key,   u_asc,   ({0})    ))
                        (( rational,       nu_desc, ({1})    ))
                        (( expiration_key, u_asc,   ({3, 0}) ))
                      )

EOS_TYPES_CREATE_TABLE( ask,  
                        (( order_id_key,   u_asc,   ({0})    ))
                        (( rational,       nu_asc,  ({1})    ))
                        (( expiration_key, u_asc,   ({3, 0}) ))
                      )

EOS_TYPES_REGISTER_TYPES( (bid)(ask) )

int main()
{
   using namespace eos::types;
   using std::cout;
   using std::endl;


   auto ac = initialize_types();
  
   TypeConstructor tc(ac.get_abi());

   auto print_type_info = [&](type_id tid)
   {
      auto index = tid.get_type_index();
      cout << tc.get_struct_name(index) << ":" << endl;
      auto sa = tc.get_size_align_of_struct(index);

      tc.print_type(cout, tid);
      cout << "Size: " << sa.get_size() << endl;
      cout << "Alignment: " << (int) sa.get_align() << endl;

      cout << "Sorted members (in sort priority order):" << endl;
      for( auto f : tc.get_sorted_members(index) )
         cout << f << endl;
   };

   auto order_id_tid = type_id::make_struct(tc.get_struct_index<order_id>());
   print_type_info(order_id_tid);
   cout << endl;

   auto bid_tid = type_id::make_struct(tc.get_struct_index<bid>());
   print_type_info(bid_tid);
   cout << endl;

   auto ask_tid = type_id::make_struct(tc.get_struct_index<ask>());
   print_type_info(ask_tid);
   cout << endl;

   //auto tm = tc.destructively_extract_types_manager();
   auto tm = tc.copy_types_manager();
   serialization_region r(tm);

   auto tbl_indx1 = tm.get_table_index(tm.get_table("bid"), 0);
   auto tbl_indx2 = tm.get_table_index(tm.get_table("bid"), 1);
   auto tbl_indx3 = tm.get_table_index(tm.get_table("bid"), 2);

   time_point_sec exp_time(1506000000);

   ask a1 = { .seller = { .name = "Alice", .id = 0 },
              .price = { .numerator = 8 , .denominator = 5 },
              .quantity = currency_token(10),
              .expiration = exp_time
            }; 

   bid b1 = { .buyer = { .name = "Bob", .id = 0 }, 
              .price = { .numerator = 6 , .denominator = 7 },
              .quantity = eos_token(4),
              .expiration = exp_time
            }; 

   bid b2 = { .buyer = { .name = "Bob", .id = 1 }, 
              .price = { .numerator = 1 , .denominator = 1 },
              .quantity = (eos_token(7) - eos_token(4)), // Note that (eos_token(7) - currency_token(4)) would be a compilation error.
              .expiration = exp_time + 6
            }; 

   r.write_type( a1, ask_tid );
 
   cout << "Raw data of serialization of a1:" << std::hex << endl;
   for( auto b : r.get_raw_data() )
      cout << (int) b << " ";
   cout << std::dec << endl;

   raw_region a1_data = r.get_raw_region();  

   r.clear();   
   r.write_type( b1, bid_tid );
 
   cout << "Raw data of serialization of b1:" << std::hex << endl;
   for( auto b : r.get_raw_data() )
      cout << (int) b << " ";
   cout << std::dec << endl;

   raw_region b1_data = r.get_raw_region();  

   r.clear();   
   r.write_type( b2, bid_tid );
 
   cout << "Raw data of serialization of b2:" << std::hex << endl;
   for( auto b : r.get_raw_data() )
      cout << (int) b << " ";
   cout << std::dec << endl;
   cout << endl;

   raw_region b2_data = r.get_raw_region();  

   bid b2_copy;
   r.read_type( b2_copy, bid_tid );
   if( b2_copy.buyer.name != b2.buyer.name || b2_copy.quantity != b2.quantity || b2_copy.expiration != b2.expiration )
      cout << "Something went wrong with deserialization of b2." << endl;

   r.clear();
   r.write_type( rational(13, 14), type_id(type_id::builtin_rational) );

   cout << "Raw data of serialization of rat_key:" << std::hex << endl;
   for( auto b : r.get_raw_data() )
      cout << (int) b << " ";
   cout << std::dec << endl;

   raw_region rat_key = r.get_raw_region();


   auto c1 = tm.compare_object_with_key( b1_data, rat_key, tbl_indx2 );
   cout << "Comparing b1 by price with rat_key: " << (int) c1 << endl;
   auto c2 = tm.compare_object_with_key( b2_data, rat_key, tbl_indx2 );
   cout << "Comparing b2 by price with rat_key: " << (int) c2 << endl;
   cout << endl;

   auto c3 = tm.compare_objects( 0, b1_data, 1, b2_data, tbl_indx2 );
   cout << "b1 " << (c3 == 0 ? "is equal to" : (c3 < 0 ? "comes before" : "comes after")) 
        << " b2 by price." << endl;
   auto c4 = tm.compare_objects( 0, b1_data, 1, b2_data, tbl_indx1 );
   cout << "b1 " << (c4 == 0 ? "is equal to" : (c4 < 0 ? "comes before" : "comes after")) 
        << " b2 by order_id." << endl;
   auto c5 = tm.compare_objects( 0, b1_data, 1, b2_data, tbl_indx3 );
   cout << "b1 " << (c5 == 0 ? "is equal to" : (c5 < 0 ? "comes before" : "comes after")) 
        << " b2 by expiration." << endl;
   cout << endl;

   auto bid_table = tm.get_table("bid");
   for( auto i = 0; i < tm.get_num_indices_in_table(bid_table); ++i )
   {
      auto ti = tm.get_table_index(bid_table, i);
      
      auto key_type = ti.get_key_type();
      string key_type_name;
      if( key_type.get_type_class() == type_id::builtin_type )
         key_type_name = type_id::get_builtin_type_name(key_type.get_builtin_type());
      else
         key_type_name = tc.get_struct_name(key_type.get_type_index());

      cout << "Index " << (i+1)  << " of the 'bid' table is " << (ti.is_unique() ? "unique" : "non-unique")
           << " and sorted according to the key type '" << key_type_name
           << "' in " << (ti.is_ascending() ? "ascending" : "descending" ) << " order." << endl;
   }

   return 0;
}

That code produces the following output:

order_id:
struct order_id /* struct(4) */
{
    String f0; // Sorted in ascending order
    UInt32 f1; // Sorted in ascending order
};
Size: 16
Alignment: 8
Sorted members (in sort priority order):
String (sorted in ascending order) located at offset 0x0
UInt32 (sorted in ascending order) located at offset 0x8

bid:
struct bid /* struct(12) */
{
    T1 /*[T1 = struct(4)]*/ f0;
    Rational f1;
    UInt64 f2;
    UInt32 f3;
};
Size: 48
Alignment: 8
Sorted members (in sort priority order):

ask:
struct ask /* struct(16) */
{
    T1 /*[T1 = struct(4)]*/ f0;
    Rational f1;
    UInt64 f2;
    UInt32 f3;
};
Size: 48
Alignment: 8
Sorted members (in sort priority order):

Raw data of serialization of a1:
6 0 0 0 30 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 a 0 0 0 0 0 0 0 80 bc c3 59 0 0 0 0 41 6c 69 63 65 0
Raw data of serialization of b1:
4 0 0 0 30 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 80 bc c3 59 0 0 0 0 42 6f 62 0
Raw data of serialization of b2:
4 0 0 0 30 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 86 bc c3 59 0 0 0 0 42 6f 62 0

Raw data of serialization of rat_key:
d 0 0 0 0 0 0 0 e 0 0 0 0 0 0 0
Comparing b1 by price with rat_key: 1
Comparing b2 by price with rat_key: -1

b1 comes after b2 by price.
b1 comes before b2 by order_id.
b1 comes before b2 by expiration.

Index 1 of the 'bid' table is unique and sorted according to the key type 'order_id_key' in ascending order.
Index 2 of the 'bid' table is non-unique and sorted according to the key type 'Rational' in descending order.
Index 3 of the 'bid' table is unique and sorted according to the key type 'expiration_key' in ascending order.

Over the next day or two I should be getting a repo up with the current code.

@arhag
Copy link
Contributor Author

arhag commented Sep 14, 2017

I cleaned up the existing code and put it up on GitHub.

You can find the repository here: https://github.com/arhag/eos-dynamic-tables

My work on this issue will continue within that repository, so if you are interested in the details of the progress of that work, make sure to check out the commits there. However, I will still make comments here (though less frequently) summarizing larger changes and milestones achieved regarding this issue.

arhag added a commit to arhag/eos-dynamic-tables that referenced this issue Sep 20, 2017
* Modularized `intialize_types` to allow more than one subset of the types available to the program to be initialized into their own ABI.
  This was needed to bootstrap serializing the ABI itself, but also will be helpful in later unit tests.
* Changed deserializer for custom builtins to use conversion constructor (and copy/move assignment operator) rather than the custom
  assignment operator I was initially using to allow the same code to work with enumerations as well.
* Added support to reflect pairs/tuples which are just treated as structs (using mangled C++ type name)
  with anonymous fields (in ABI the fields are named f0, f1, ... ).
* With the above changes, I had the necessary tools to reflect `ABI`.
  Now the `ABI` struct is reflected with the same type system which it can describe, and so a serialized version of a contracts ABI can
  be included in a transaction (as a possible alternative to using `fc::pack` and `fc::unpack`).
  I also demonstrated constructing the `types_manager` from a serialized ABI by modifying the `reflection_test1` example.
arhag added a commit to arhag/eos-dynamic-tables that referenced this issue Sep 22, 2017
…s struct) (for EOSIO/eos#354):

 * Modified ABI to allow tuple types which do not require specifying a name
   for the type nor names for the "fields".
 * Updated types_constructor to support the new tuple type.
 * Updated reflection macros to still automatically work with std::pair and
   std::tuple but now without depending on implementation-specific C++ type names.
 * Found and corrected a small bug in `abi_constructor::add_variant`.
 * Tidying up.
arhag added a commit to arhag/eos-dynamic-tables that referenced this issue Sep 23, 2017
 * `types_constructor` now creates two different versions of types managers:
   - One is a minimal one, `types_manager`, which has just what will be needed by
     EOS.IO to compare objects and make dynamic tables work on the platform side.
   - The other is the full one, `full_types_manager`, which has enough information
     to do serialization/deserialization between raw data and C++ types, as well as
     serialization/deserialization between raw data and JSON (to be implemented later).
 * Both versions have a lot of common functionality, and this functionality is
   contained in the class `types_manager_common` which both versions inherit.
 * The C++ contract code would need a types manager version that has all struct fields
   (which `types_manager` does not) rather than just the sorted fields. Right now that
   is `full_types_manager` but it also includes extra information that are not needed
   for serialization/deserialization between raw data and C++ types and so it is
   information that the C++ contract code would presumably not need (such as field names).
   I am considering adding a third (middleground) version of types manager to be
   used exclusively by the C++ contract code. Maybe this third version will be very
   different in implementation than the other two (perhaps the source might even be
   automatically generated by a tool) in order to satisfy the stricter programming
   constraints imposed by the WebAssembly environment.
arhag added a commit to arhag/eos-dynamic-tables that referenced this issue Sep 26, 2017
  * Fixed bug where a missing static keyword led to undefined behavior
    which happen to result in all sorted fields being treated as if
    they were all specified in the ascending order.
  * Now `full_types_manager` has field names and sort order information.
  * This enables more informative printing of types, which was in fact
    implemented along with making the printed struct formatting look nicer.
  * While printing the structure of types is not super important, the above
    changes provide the framework necessary for conversion to/from JSON
    using just: the raw data, the `type_id` associated with that raw data,
    and the information available within the `full_types_manager` class.
@arhag
Copy link
Contributor Author

arhag commented Sep 26, 2017

I just want to remind any interested observers of this issue that you can look at the messages of commits in my eos-dynamic-table branch which cross-reference this issue to see the overall progress for this issue (for example, there are four such cross-referencing commits over the past week located above this comment). There are additional smaller commits that I do not always bother cross-referencing. If you are interested in those as well, check out the full commit log in my repo.

While my cross-referenced commit messages are pretty descriptive, it may sometimes be worth it to go over the current state of dynamic tables project in more detail every once in a while as a comment on this issue. The last time I did this was a little over two weeks ago, so it is probably a good time to do that now.

ABI

The ABI has been slightly changed since two weeks ago:

   struct ABI
   {
      enum type_specification : uint8_t
      {
         struct_type,
         tuple_type,
         array_type,
         vector_type,
         optional_type,
         variant_type
      };

      struct type_definition
      {
         uint32_t           first;
         int32_t            second;
         type_specification ts;
      };

      struct struct_t
      {
         string                        name;
         vector<pair<string, type_id>> fields;
         vector<int16_t>               sort_order;
      };

      struct table_index
      {
         type_id          key_type;
         bool             unique;
         bool             ascending;
         vector<uint16_t> mapping;
     };

      struct table
      {
         type_id::index_t    object_index;
         vector<table_index> indices;
      };


      vector<type_definition> types;
      vector<struct_t>        structs;
      vector<type_id>         type_sequences;
      vector<table>           tables;
   };

It now includes a tuple type, which is specified in a very similar way (within the ABI data structure) as the variant type. However, remember that the variant is a sum type, meaning an instance of it can only have one of the available case types at a time, whereas the tuple is a product type (similar to a struct), meaning any given instance of the tuple type (or struct type) involves all of the specified sequence of types (they make up the "fields" of the product type).

Prior to this change in the ABI, tuples/pairs were supported (and in fact the serialization/deserialization implementation for them was done 6 days ago in commit 8a095d0) but they had to be treated as structs. This meant giving the type a name and also giving its fields names (my implementation chose the names f0, f1, ... for the field names and it used the implementation-specific C++ name for the name of the tuple/pair). This was undesirable primarily because the redundant name information took up unnecessary space within the ABI serialization. So now the ABI treats tuples differently than structs, although the same code is used for both when figuring out where the fields should be laid out within the raw data for serialization, deserialization, and sorting purposes.

More features in serialization/deserialization tool

The C++ code which handles serialization of C++ types to raw data (compatible with the new EOS.IO type system) and deserialization of that raw data back to C++ types has been greatly improved over the past couple weeks.

Modularize initialize_types

First, the initialize_types function has been replaced by a modularized version so that it is possible to select (at compile time) different subsets of reflected types (or in fact repeat an identical subset as a separate ABI instance) and get back an ABI instance for that specific subset. This makes it easier to do unit testing, but it also makes it possible to separate out the reflection of the ABI struct and all the types it depends on (which by the way is now supported along with serializing/deserializng the ABI just like any other user-defined struct) from the reflection of the structs specific to a contract.

This feature is already being used in programs/reflection_test1.

One selected subset involves the eos::types::ABI struct and all the types it depends on. You can see this happening in this part of the code:

struct abi_reflection;
EOS_TYPES_REGISTER_TYPES( abi_reflection, BOOST_PP_SEQ_NIL, (eos::types::ABI) )

(Note: the BOOST_PP_SEQ_NIL is there to tell the macro that no tables are being reflected since there is no desire to reflect any tables within this subset.)

Another subset involves the user-defined tables (and all the user-defined types they depend whether as table objects, table index keys, or the structs/tuples that those objects and keys depend on) which are meant to be used as part of the specific example in programs/reflection_test1 as well as any additional structs that the user is also interested in serializing/deserializing (in this case type2). After all the structs (in this case there are three user-defined structs: type1, type2, and type3) have been defined and reflected, and any desired tables are created (in this case a table was created for type1), the subset can be selected as simply as:

struct reflection_test1_types;
EOS_TYPES_REGISTER_TYPES( reflection_test1_types, (type1), (type2) )

Notice that type3 does not need to be mentioned because the type1 table already depends on it since it acts as the key for table's second (user-defined) index (see the table definition). The struct type1 is also obviously needed by the table type1 (it acts as the table object) and so it is also included in the ABI for this reflection_test1_types subset. However, type2 is not reached from table type1, and so if the user is interested in serializing/deserializing type2, they must explicitly add it in the sequence passed as the third parameter to the macro. Now, the struct type1 would also be discovered from type2 (since type2 inherits from type1), but it will not make a difference since it was already discovered from table type1 (the user does not need to worry about duplication concerns).

To actually get the ABI for that subset at run-time, the following code is used:

   auto ac = types_initializer<reflection_test1_types>::init();
   auto abi = ac.get_abi(); // This returns the ABI struct

The ABI can be validated (to check for errors in the macro specifications) by passing the ABI as an argument to the constructor of the types_constructor class and then trying to extract the generated types managers:

   types_constructor tc(abi);
   auto types_managers = tc.destructively_extract_types_managers();

If those lines succeed without throwing exceptions, then the ABI is good. This is the same process that EOS.IO would follow to both validate the ABI and generate the types managers it needs to enable dynamic tables.

Serializing/deserializing the ABI

The ABI itself can be serialized/deserialized in accordance with this new type system which the ABI is supposed to describe. This is similar to how Cap'n Proto's schemas can describe the structure of Cap'n Proto schemas.

An ABI is first constructed describing itself using the same macros described above. You can find the reflection macros for the ABI structs here:

EOS_TYPES_REFLECT_BUILTIN( eos::types::ABI::type_specification, builtin_uint8 )
EOS_TYPES_REFLECT_STRUCT( eos::types::ABI::type_definition, (first)(second)(ts) )
EOS_TYPES_REFLECT_STRUCT( eos::types::ABI::struct_t, (name)(fields)(sort_order) )
EOS_TYPES_REFLECT_STRUCT( eos::types::ABI::table_index, (key_type)(unique)(ascending)(mapping) )
EOS_TYPES_REFLECT_STRUCT( eos::types::ABI::table, (object_index)(indices) )
EOS_TYPES_REFLECT_STRUCT( eos::types::ABI, (types)(structs)(type_sequences)(tables) )

Then that ABI struct is passed to types_constructor to generate the types managers which are then used with the serialization/deserialization functions to convert the C++ ABI struct to/from raw data.

An example of this process can also be seen in programs/reflection_test1 in which the ABI of the reflection_test1_types subset (described in the prior sub-section) is serialized to raw data, printed out, and then deserialized back to a struct to be passed to types_constructor. Here is the relevant snippet of code (see programs/reflection_test1.cpp for the full code):

//...
struct abi_reflection;
EOS_TYPES_REGISTER_TYPES( abi_reflection, BOOST_PP_SEQ_NIL, (eos::types::ABI) )

int main()
{
   using namespace eos::types;
   using std::cout;
   using std::endl;

   auto abi_ac = types_initializer<abi_reflection>::init();
   types_constructor abi_tc(abi_ac.get_abi());
   auto abi_types_managers = abi_tc.destructively_extract_types_managers();
   const auto& abi_tm = abi_types_managers.second;
   auto abi_tid = type_id::make_struct(abi_tm.get_struct_index<ABI>());
   serialization_region abi_serializer(abi_tm);

   vector<type_id::index_t> abi_structs = { abi_tid.get_type_index(),
                                            abi_tm.get_struct_index<ABI::type_definition>(),
                                            abi_tm.get_struct_index<ABI::struct_t>(),
                                            abi_tm.get_struct_index<ABI::table_index>(),
                                            abi_tm.get_struct_index<ABI::table>()
                                          };
   for( auto indx : abi_structs )
   {
      abi_tm.print_type(cout, type_id::make_struct(indx));
      cout << "Members:" << endl;
      for( auto f : abi_tm.get_all_members(indx) )
         cout << f << endl;
      cout << endl;
   }

   auto ac = types_initializer<reflection_test1_types>::init();
   abi_serializer.write_type(ac.get_abi(), abi_tid);

   cout << "Raw data of serialization of abi:" << endl;
   cout << abi_serializer.get_raw_region() << endl;

   ABI abi;
   abi_serializer.read_type(abi, abi_tid);

   types_constructor tc(abi);
   auto types_managers = tc.destructively_extract_types_managers();
   const auto& tm  = types_managers.first;
   const auto& ftm = types_managers.second;
   serialization_region r(ftm);
//...

And here is the relevant snippet of output:

struct eos::types::ABI
{
    Vector<eos::types::ABI::type_definition> types;
    Vector<eos::types::ABI::struct_t>        structs;
    Vector<UInt32>                           type_sequences;
    Vector<eos::types::ABI::table>           tables;
};
Members:
Vector<T> [T = struct(4)] located at offset 0x0
Vector<T> [T = struct(8)] located at offset 0x8
Vector<UInt32> located at offset 0x10
Vector<T> [T = struct(16)] located at offset 0x18

struct eos::types::ABI::type_definition
{
    UInt32 first;
    Int32  second;
    UInt8  ts;
};
Members:
UInt32 located at offset 0x0
Int32 located at offset 0x4
UInt8 located at offset 0x8

struct eos::types::ABI::struct_t
{
    String                       name;
    Vector<Pair<String, UInt32>> fields;
    Vector<Int16>                sort_order;
};
Members:
String located at offset 0x0
Vector<T> [T = struct(12)] located at offset 0x8
Vector<Int16> located at offset 0x10

struct eos::types::ABI::table_index
{
    UInt32         key_type;
    Bool           unique;
    Bool           ascending;
    Vector<UInt16> mapping;
};
Members:
UInt32 located at offset 0x8
Bool located at offset 0xc
Bool located at offset 0xd
Vector<UInt16> located at offset 0x0

struct eos::types::ABI::table
{
    UInt32                               object_index;
    Vector<eos::types::ABI::table_index> indices;
};
Members:
UInt32 located at offset 0x8
Vector<T> [T = struct(20)] located at offset 0x0

Raw data of serialization of abi:
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 03 00 00 00 20 00 00 00   03 00 00 00 44 00 00 00
0x01Z | 00 00 00 00 00 00 00 00   01 00 00 00 34 01 00 00
0x02Z | 00 00 00 00 ff ff ff ff   00 00 00 00 01 00 00 00
0x03Z | ff ff ff ff 00 00 00 00   02 00 00 00 01 00 00 00
0x04Z | 00 00 00 00 06 00 00 00   8c 00 00 00 02 00 00 00
0x05Z | 94 00 00 00 02 00 00 00   b0 00 00 00 06 00 00 00
0x06Z | b4 00 00 00 04 00 00 00   bc 00 00 00 02 00 00 00
0x07Z | f4 00 00 00 06 00 00 00   f8 00 00 00 03 00 00 00
0x08Z | 00 01 00 00 02 00 00 00   30 01 00 00 74 79 70 65
0x09Z | 33 00 00 00 02 00 00 00   ac 00 00 00 07 00 01 00
0x0aZ | 02 00 00 00 ae 00 00 00   05 00 01 00 78 00 79 00
0x0bZ | 01 00 02 00 74 79 70 65   31 00 00 00 02 00 00 00
0x0cZ | ec 00 00 00 05 00 01 00   02 00 00 00 ee 00 00 00
0x0dZ | 07 00 01 00 02 00 00 00   f0 00 00 00 0a 00 01 00
0x0eZ | 02 00 00 00 f2 00 00 00   01 00 00 02 61 00 62 00
0x0fZ | 63 00 76 00 03 00 fe ff   74 79 70 65 32 00 00 00
0x10Z | 02 00 00 00 24 01 00 00   02 00 01 00 04 00 00 00
0x11Z | 26 01 00 00 01 00 00 08   05 00 00 00 2a 01 00 00
0x12Z | 09 00 01 00 64 00 61 72   72 00 6e 61 6d 65 00 00
0x13Z | fe ff 00 00 02 00 00 00   40 01 00 00 01 00 00 00
0x14Z | 01 00 00 00 60 01 00 00   05 00 01 00 00 01 00 00
0x15Z | 02 00 00 00 62 01 00 00   00 00 00 04 01 00 00 00
0x16Z | 00 00 01 00 00 00

The ability to serialize/deserialize the ABI could be useful if it was desirable to include the ABI serialized with this new type system as part of the setcode message that changed the code of an account. This could be an alternative to fc::pack and fc::unpack that is currently used. Doing it that way means that future improvement to the tooling for this type system could allow processing the ABI without bothering to go through a deserialization step first. However, the constraints on the serialization for this type system (alignment requirements to allow efficient comparisons of type instances) means that the serialization is not going to be as efficient as what fc::pack does due to all the extra zeros (as can probably be seen from the example raw data above). A simple compression scheme that compresses many consecutive zeros (similar to Cap'n Proto) could help with the size, if that is a concern, at the expense of requiring the overhead of an extra (but still lightweight in computation) decompression stage at the beginning. But even then it could not compete with the small sizes that something like fc::pack can achieve.

Mapping custom C++ types to builtin types and automatically handling tuples/pairs

One thing you might have noticed in the previous sub-section is that the serialization/deserialization tool supported enums and pairs with very little effort. In fact the std::pair was handled automatically, while the type_specification enum (from ABI) required the macro reflection:

EOS_TYPES_REFLECT_BUILTIN( eos::types::ABI::type_specification, builtin_uint8 )

In fact, the reflection system has been built to already support std::pair and std::tuple (up to reasonable length tuples) and it can be easily tweaked to support custom C++ types that behave in similar ways (requires std::get<N>(...) to be supported on that type).

Also, the reflection allows the EOS_TYPES_REFLECT_BUILTIN macro to be used on any C++ enum/struct/class to map it to any compatible backing EOS.IO builtin type.

For an enum to be compatible with some specified EOS.IO builtin integer type, it needs to have an underlying C++ primitive integer type specified that has the specific EOS.IO builtin integer as its backing. So for example, the underlying type of eos::types::ABI::type_specification is uint8_t and so the macro can be used to map the enum to builtin_uint8. If the underlying type of some other enum is uint32_t, then it would instead be mapped to builtin_uint32 with the macro.

For a struct/class to be compatible with some specified EOS.IO integer builtin type (my intention is to have other non-integer types supported as a backing type for a user-defined struct/class but that is not yet implemented) it needs to have a constructor that takes that the corresponding C++ primitive integer type as its only argument (e.g. if mapping user1 class to builtin_int16, the class must have a constructor explicit user1(int16_t v); defined) and it needs to have a conversion function defined to that same primitive integer type (e.g. continuing the previous example the class user1 would also need to define explicit operator int16_t()const;).

A good example of this feature can be seen in programs/reflection_test2. Click the link to see the full code as I will not repeat it here. But within that example I create a token class (meant to provide type safe operations dealing with token values) which maps to a builtin_uint64 EOS.IO type and time_point_sec class (meant to represent timestamps) that maps to a builtin_uint32 EOS.IO type.

In that example program you can also see the implementation of the ask and bid tables of very simple exchange order book. One of the table indices of each table sorts on the rational price. Sorting of the rationals (see code here) works as expected: 6/7 is less than 1/1 (which wouldn't be the case with lexicographical ordering on the numerator and denominator pair).

Two different kinds of types managers

In the snippets of code in earlier sections you may have noticed that destructively_extract_types_managers of the class types_constructor now returns a pair of types managers. The first is types_manager which just has the information needed by the EOS.IO engine to implement dynamic tables for the WebAssembly contracts (but not other reflection information needed to convert to/from JSON at run-time). The second is full_types_manager which has all the information types_manager has and then some. In particular it has all the information needed to convert the EOS.IO types to/from JSON at run-time (which includes things like field names).

While the JSON conversion has not been implemented yet, the framework for it has been implemented (i.e. all the information needed to do the conversion are appropriately constructed in the data structures within full_types_manager). Furthermore, a demonstration of how to access this information from the internal data structure of full_types_manager is provided by the code that pretty prints the structure of EOS.IO struct types. You have already seen examples of this pretty printing in the output snippet included in the "Serializing/deserializing the ABI" sub-section. Notice that unlike before it now also prints the reflected field names (as well as sort order priority information, but the example of that can be seen in the program output provided in the last section of this document).

The full_types_manager class also has the necessary information for doing serialization/deserialization while types_manager does not (this is why full_types_manager rather than types_manager is passed into the serialization_region constructor in the various example programs). This is because types_manager only stores the sorted fields of a struct whereas full_types_manager stores all the fields.

The idea behind the separation is that a block producing node (or some full nodes) which does not care about providing a convenient API to inspect the data within tables to users (or provide JSON forms of the data to web services using the node as a backend) can get away with just storing types_manager and discarding full_types_manager. Those nodes end up saving space because they do not need to store unnecessary data like field names. However a full node which wants to provide all these convenient services will need access to full_types_manager to do things such as JSON conversion or to reflect the schema of the table for things like block explorers.

The full_types_manager could in theory also be used by the contract to do the serialization and deserialization (assuming I can get it to run in the restricted WebAssembly environment) and that is in fact what I am currently doing in these example programs that are currently not running in a WebAssembly environment. However, I am considering a third kind of types manager that is constructed from full_types_manager which perhaps stores less information (field names wouldn't really be needed for serialization/deserialization) but more importantly is implemented in a manner that is better suited for the restricted WebAssembly environment. In this case the implementation code would actually be generated by another tool that uses the full_types_manager instance (constructed in the usual way that I currently do in the example programs) to write out code into a generated header file which is then included by the contract C++ source code. But these are just my current thoughts on the matter and I have not yet fully decided how to best go about working around this issue of a restricted WebAssembly environment.

Boost.MultiIndex using the new type system

About a week ago (see commit 8f2ff41) I added the functionality to integrate this new type system into Boost.MultiIndex using comparison functors (as was discussed in my original post at the top of this issue). I also added a new example program programs/table_test1 the demonstrate this functionality. I won't repeat the entire source code here (click the link to see the entire code) but I will add a few relevant snippets and also show the entire program output.

The code is focused around two structs (type1 and type2) and one table type1 which obviously has struct type1 as the table object but also uses struct type2 as the key for its second user-defined table index. The fields b and a of type1 are mapped to the field x and y (respectively) of type2:

struct type1
{
   uint32_t a;
   uint64_t b;
   vector<uint8_t> c;
};

struct type2
{
   uint64_t x;
   uint32_t y;
};

EOS_TYPES_REFLECT_STRUCT( type1, (a)(b)(c), ((c, asc))((a, desc)) )

EOS_TYPES_REFLECT_STRUCT( type2, (x)(y), ((x, asc))((y, asc)) )

EOS_TYPES_CREATE_TABLE( type1,  
                        (( uint32_t,        nu_asc, ({0})   ))
                        (( type2,           u_desc, ({1,0}) )) 
                        (( vector<uint8_t>, nu_asc, ({2})   )) 
                      )

struct table_test1_types;
EOS_TYPES_REGISTER_TYPES( table_test1_types, (type1) )

You might also notice that the dynamic table system is flexible enough to allow field c (which is a vector of bytes) of type1 to be used as the key for the third table index. In this case, that index is a non-unique index sorted in ascending order (which means ascending lexicographical order in this case).

After all the typical set up (creating ABI, using it to create the type managers and initialize the serialization_region denoted r, and printing out the structure of the two struct types) we get to the interesting part of the example program:

   dynamic_table_3 table_type1(make_dynamic_table_ctor_args_list<3>(tm, tm.get_table("type1")));

   cout << "Size of table 'type1': " << table_type1.size() << endl << endl;

   type1 s1{ .a = 8, .b = 4, .c = {1, 2, 3, 4, 5, 6} };
   dynamic_object s1_obj{.id = 0};
   r.write_type(s1, type1_tid);
   print_raw_data(r.get_raw_region(), "s1");
   s1_obj.data = r.move_raw_region();

   cout << "Inserting object s1 into table 'type1'... "; 
   auto res1 = table_type1.insert(s1_obj);
   cout << (res1.second ? "Success." : "Failed.") << endl;

   cout << "Size of table 'type1': " << table_type1.size() << endl << endl;

   type1 s2{ .a = 8, .b = 10, .c = {13} }; // Insertion into table would fail (i.e. res2.second == false) if b was 4 because of uniqueness violation of index 2 (with type2 key).
   dynamic_object s2_obj{.id = 1};
   r.write_type(s2, type1_tid);
   print_raw_data(r.get_raw_region(), "s2");
   s2_obj.data = r.move_raw_region();

   cout << "Inserting object s2 into table 'type1'... "; 
   auto res2 = table_type1.insert(s2_obj);
   cout << (res2.second ? "Success." : "Failed.") << endl;

   cout << "Size of table 'type1': " << table_type1.size() << endl << endl;

   type1 s3{ .a = 3, .b = 4, .c = {13, 14} };
   dynamic_object s3_obj{.id = 2};
   r.write_type(s3, type1_tid);
   print_raw_data(r.get_raw_region(), "s3");
   s3_obj.data = r.move_raw_region();

   cout << "Inserting object s3 into table 'type1'... "; 
   auto res3 = table_type1.insert(s3_obj);
   cout << (res3.second ? "Success." : "Failed.") << endl;

   cout << "Size of table 'type1': " << table_type1.size() << endl << endl;

In the above code snippet, a new (empty) Boost.MultiIndex instance is created in the first line to represent the table type1. Then the instance s1 (of struct type1) is created, serialized, and inserted into the table. Then this is repeated with a second instance s2 and then finally a third instance s3.

After the insertions, the program then displays the objects in the table sorted in the order of its three different table indices:

   cout << "Table 'type1' objects sorted by index 1 (field 'a' of type1 as the key sorted in ascending order):" << endl;
   for( const auto& obj : table_type1.get<1>() )
   {
      cout << "Object id = " << obj.id << " and data = " << endl << obj.data << endl;
   }   
   cout << endl;

   cout << "Table 'type1' objects sorted by index 2 (type2, mapped to fields 'b' and 'a' of type1, as the key sorted by type2's sorting specification except in descending order):" << endl;
   for( const auto& obj : table_type1.get<2>() )
   {
      cout << "Object id = " << obj.id << " and data = " << endl << obj.data << endl;
   }   
   cout << endl;

   cout << "Table 'type1' objects sorted by index 3 (field 'c' of type1 as the key sorted in ascending lexicographical order):" << endl;
   for( const auto& obj : table_type1.get<3>() )
   {
      cout << "Object id = " << obj.id << " and data = " << endl << obj.data << endl;
   }   
   cout << endl;

Finally the last bit of code in the example program is to demonstrate how lookups work:

   dynamic_key_compare dkc1(tm.get_table_index(tm.get_table("type1"), 0)); // Comparison functor for lookups in index 1 of table 'type1'

   uint32_t v = 3;
   r.write_type(v, type_id(type_id::builtin_uint32));
   dynamic_key k1(r.get_raw_region());

   auto itr = table_type1.get<1>().lower_bound(k1, dkc1);
   if( itr != table_type1.get<1>().end() )
   {
      cout << "Found lower bound using index 1 of table 'type1' and a key of value " << v << ":" << endl;
      cout << "Object id = " << itr->id << " and data = " << endl << itr->data << endl;
   }

   itr = table_type1.get<1>().upper_bound(k1, dkc1);
   if( itr != table_type1.get<1>().end() )
   {
      cout << "Found upper bound using index 1 of table 'type1' and a key of value " << v << ":" << endl;
      cout << "Object id = " << itr->id << " and data = " << endl << itr->data << endl;
   }

Here is the full output of programs/table_test:

type1 (index = 4):
struct type1
{
    UInt32 a; // Sorted in descending order (priority 2)
    UInt64 b;
    Bytes  c; // Sorted in ascending order (priority 1)
};
Size: 24
Alignment: 8
Sorted members (in sort priority order):
Bytes (sorted in ascending order) located at offset 0x8
UInt32 (sorted in descending order) located at offset 0x10

type2 (index = 0):
struct type2
{
    UInt64 x; // Sorted in ascending order (priority 1)
    UInt32 y; // Sorted in ascending order (priority 2)
};
Size: 16
Alignment: 8
Sorted members (in sort priority order):
UInt64 (sorted in ascending order) located at offset 0x0
UInt32 (sorted in ascending order) located at offset 0x8

Size of table 'type1': 0

Raw data of serialization of s1:
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 04 00 00 00 00 00 00 00   06 00 00 00 18 00 00 00
0x01Z | 08 00 00 00 00 00 00 00   01 02 03 04 05 06

Inserting object s1 into table 'type1'... Success.
Size of table 'type1': 1

Raw data of serialization of s2:
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 0a 00 00 00 00 00 00 00   01 00 00 00 18 00 00 00
0x01Z | 08 00 00 00 00 00 00 00   0d

Inserting object s2 into table 'type1'... Success.
Size of table 'type1': 2

Raw data of serialization of s3:
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 04 00 00 00 00 00 00 00   02 00 00 00 18 00 00 00
0x01Z | 03 00 00 00 00 00 00 00   0d 0e

Inserting object s3 into table 'type1'... Success.
Size of table 'type1': 3

Table 'type1' objects sorted by index 1 (field 'a' of type1 as the key sorted in ascending order):
Object id = 2 and data =
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 04 00 00 00 00 00 00 00   02 00 00 00 18 00 00 00
0x01Z | 03 00 00 00 00 00 00 00   0d 0e

Object id = 0 and data =
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 04 00 00 00 00 00 00 00   06 00 00 00 18 00 00 00
0x01Z | 08 00 00 00 00 00 00 00   01 02 03 04 05 06

Object id = 1 and data =
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 0a 00 00 00 00 00 00 00   01 00 00 00 18 00 00 00
0x01Z | 08 00 00 00 00 00 00 00   0d


Table 'type1' objects sorted by index 2 (type2, mapped to fields 'b' and 'a' of type1, as the key sorted by type2's sorting specification except in descending order):
Object id = 1 and data =
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 0a 00 00 00 00 00 00 00   01 00 00 00 18 00 00 00
0x01Z | 08 00 00 00 00 00 00 00   0d

Object id = 0 and data =
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 04 00 00 00 00 00 00 00   06 00 00 00 18 00 00 00
0x01Z | 08 00 00 00 00 00 00 00   01 02 03 04 05 06

Object id = 2 and data =
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 04 00 00 00 00 00 00 00   02 00 00 00 18 00 00 00
0x01Z | 03 00 00 00 00 00 00 00   0d 0e

Table 'type1' objects sorted by index 3 (field 'c' of type1 as the key sorted in ascending lexicographical order):
Object id = 0 and data =
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 04 00 00 00 00 00 00 00   06 00 00 00 18 00 00 00
0x01Z | 08 00 00 00 00 00 00 00   01 02 03 04 05 06

Object id = 1 and data =
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 0a 00 00 00 00 00 00 00   01 00 00 00 18 00 00 00
0x01Z | 08 00 00 00 00 00 00 00   0d

Object id = 2 and data =
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 04 00 00 00 00 00 00 00   02 00 00 00 18 00 00 00
0x01Z | 03 00 00 00 00 00 00 00   0d 0e

Found lower bound using index 1 of table 'type1' and a key of value 3:
Object id = 2 and data =
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 04 00 00 00 00 00 00 00   02 00 00 00 18 00 00 00
0x01Z | 03 00 00 00 00 00 00 00   0d 0e

Found upper bound using index 1 of table 'type1' and a key of value 3:
Object id = 0 and data =
    Z = 0  1  2  3  4  5  6  7    8  9  a  b  c  d  e  f
0xYYZ |--------------------------------------------------
0x00Z | 04 00 00 00 00 00 00 00   06 00 00 00 18 00 00 00
0x01Z | 08 00 00 00 00 00 00 00   01 02 03 04 05 06

arhag added a commit to arhag/eos-dynamic-tables that referenced this issue Sep 28, 2017
arhag added a commit to arhag/eos-dynamic-tables that referenced this issue Oct 2, 2017
… standard lib to work in restricted C++ environment (no exceptions, no standard lib) in preparation for later getting the serialization/deserialization code to run within WebAssembly (for EOSIO/eos#354)
@minimizeTrustLess
Copy link

@arhag @bytemaster @thomasbcox Pretty cool idea. Can you wrap EOS with python and use standard lib numpy/pandas/vector furthermore generate/store these data type using some sort of serialization?

@andriantolie
Copy link
Contributor

@arhag is this done and the issue can be closed?

@brianjohnson5972
Copy link
Contributor

@arhag , @wanderingbort, or @heifner My plan is to close this, since if it has any validity still, that small portion should be removed and captured in a new ticket since this is a very large writeup. Please indicate if you disagree.

@brianjohnson5972
Copy link
Contributor

Code has gone through multiple refactorings since this was written, so it is OBE

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants