This repository has been archived by the owner on Jul 6, 2023. It is now read-only.

[OEP 18] Community Requirements For 3.1 #19

Open
7 of 11 tasks
tglman opened this issue Mar 2, 2018 · 3 comments

Comments

@tglman
Member

tglman commented Mar 2, 2018

Summary:
List and discussion of the requirements for 3.1 from the community.

Goals:
Collect and prioritize the requirements for 3.1

Non-Goals:

Success metrics:
Shared list of 3.1 requirements

Motivation:

Description:
As of today, the various discussions have produced a not-yet-sorted list of requirements:

  • Persistent result set for ORDER BY and GROUP BY queries
  • Index optimization in terms of page size and serialization size
  • Smaller storage pages (8k, 16k)
  • Evaluation of a new tool for the distributed network layer
  • Query engine optimizations, both algorithmic and technical (raw byte access)
  • Coordination of distributed structural operations (db create/drop, etc.)
  • Improve support for third-party drivers
  • Persistence of transactions (some discussions were already keeping this out)
  • Indexing of embedded properties
  • Better support for locale in indexes
  • Evaluate whether to define a new hooks/triggers API

Alternatives:

Risks and assumptions:

Impact matrix

  • Storage engine
  • SQL
  • Protocols
  • Indexes
  • Console
  • Java API
  • Geospatial
  • Lucene
  • Security
  • Hooks
  • EE
@andrii0lomakin
Member

andrii0lomakin commented Mar 12, 2018

Hi guys.
As usual, I do not think these are requirements for 3.1. IMHO it is better to say that this is a discussion of issue priorities: some will be implemented in time for 3.1, and some will be moved to another version. Otherwise, this will just be another forever-running release. So, about priorities. At a high level, I want to:

  1. Migrate to key-level and record-level locks at the transaction level.
  2. Use page locking instead of component locking, at least for the most popular index, the B-Tree index.
  3. Decrease the WAL write overhead and, as a result, increase the write speed of other components.
  4. Implement a full durability mode in our transactions.
  5. Make our storage engine more lightweight in general.

That is the high level; I am not sure that all of it will be done in 3.1.
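To make the first high-level goal more concrete, here is a minimal sketch of key-level locks held for the lifetime of a transaction, as opposed to one lock for a whole component. All names here (`KeyLockTable`, `SketchTransaction`) are hypothetical and are not OrientDB APIs. The sketch assumes each transaction runs on a single thread, and it leaves out deadlock handling and lock eviction.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only: one lock per key instead of one lock per index/cluster.
final class KeyLockTable<K> {
  private final ConcurrentHashMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

  // Returns the lock for a key, creating it on first use. Locks are never evicted here.
  ReentrantLock lockFor(K key) {
    return locks.computeIfAbsent(key, k -> new ReentrantLock());
  }
}

// Hypothetical transaction that keeps every key lock it takes until commit.
final class SketchTransaction<K> {
  private final KeyLockTable<K> table;
  private final Deque<ReentrantLock> held = new ArrayDeque<>();

  SketchTransaction(KeyLockTable<K> table) {
    this.table = table;
  }

  // Blocks until the key is free, then remembers the lock for release at commit.
  void lockKey(K key) {
    ReentrantLock lock = table.lockFor(key);
    lock.lock();
    held.push(lock);
  }

  // Releases all held locks in reverse acquisition order.
  void commit() {
    while (!held.isEmpty()) {
      held.pop().unlock();
    }
  }

  int heldCount() {
    return held.size();
  }
}
```

With this shape, two transactions touching different keys never wait for each other, which is the point of moving away from component-level locks. A real implementation would also need a lock-ordering or timeout policy to deal with deadlocks.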

  1. Let's look at what can be done in the short term with a big impact. Currently, I am working on an immutable WAL, and this change is a must-have for implementing the full durability mode. The problem is the following. When we write new records from the WAL to disk after a flush and the page is not fully written, we read this page back and write additional data into it. This breaks the main invariant of our durability framework: if data is written to a page, it has to be in the WAL. Of course, changing the strategy of writes in the WAL will change both the calculation of LSNs and the strategy of writing data to disk. An interesting side effect is that the new WAL is mostly based on CAS operations and, as a result, should be more scalable.
  2. The next thing, which is really fast to implement because we already have it implemented with some modifications, is a lock-free read cache for our disk data. We have a separate issue for that: [OEP 8] Asynchronous read cache with state machine #8. It will take about two weeks, and the change will automatically include support for small pages inside the read cache.
  3. The two issues above are very quick to implement. The next feature takes longer, but it unlocks big possibilities: physiological logging. It allows us to: a) decrease WAL overhead; b) make the storage engine much more lightweight by removing tracking of page changes; c) weaken the requirements for the locks needed to implement durability; d) implement a full durability mode with overhead comparable to RDBMS overhead.
  4. Once physiological logging is implemented, the next logical step is to migrate to key/record-level locking at the transactional level.
  5. The next step is the implementation of the full durability mode.
  6. The final step that I see is the implementation of a B-Link tree index, which allows implementing range indexes using page locks instead of component locks.
  7. And of course, the issue with small pages can be solved right after the implementation of the lock-free read cache.
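As an illustration of why a CAS-based WAL can scale better than a lock-based one, here is a minimal sketch of an append-only log where writers reserve space with a single compare-and-set and never rewrite earlier bytes. This is not the OrientDB WAL. The class `CasLogSketch`, the in-memory buffer, and the use of the start offset as an LSN are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only: writers reserve a byte range with CAS, then copy into it.
final class CasLogSketch {
  private final AtomicLong tail = new AtomicLong(0);
  private final byte[] buffer;

  CasLogSketch(int capacity) {
    this.buffer = new byte[capacity];
  }

  // Appends a record and returns its start offset (used here as the LSN),
  // or -1 if the record does not fit. Already written bytes are never modified.
  long append(byte[] record) {
    while (true) {
      long start = tail.get();
      long end = start + record.length;
      if (end > buffer.length) {
        return -1;
      }
      // Only one writer wins the range [start, end); losers retry with the new tail.
      if (tail.compareAndSet(start, end)) {
        System.arraycopy(record, 0, buffer, (int) start, record.length);
        return start;
      }
    }
  }

  // End of the reserved region. A real log would also track how far the copies have completed.
  long tail() {
    return tail.get();
  }
}
```

Because each writer owns a disjoint range after a successful CAS, the copy itself needs no lock. A production design would additionally need a "fully written up to" marker before flushing, which this sketch omits.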

I suppose that covers the list of short-term and not-so-short-term tasks.

@andrii0lomakin
Member

I forgot to add the implementation of support for files that are not covered by the WAL, and for temporary files. But because a lot of emphasis is currently placed on testing new components to keep the storage engine very stable, I suppose it can be done in the gaps while tests for other features are running. And a small note: even if performance before and after the physiological logging change is the same, I will consider this a win, because it will unlock the possibility of increasing system scalability. But I have significant doubts that performance will be the same.
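For readers unfamiliar with the term, physiological logging in the general database literature means that a log record names a physical page but describes a logical operation inside that page, rather than storing changed bytes. The following is a minimal sketch of that idea under the usual page-LSN redo rule. The classes `Page` and `InsertKeyRecord` are hypothetical and do not reflect OrientDB's actual record formats.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: physical to a page, logical within a page.
final class PhysiologicalLogSketch {

  // A page holding sorted keys, plus the LSN of the last change applied to it.
  static final class Page {
    final int id;
    long lsn = -1;
    final List<Integer> keys = new ArrayList<>();

    Page(int id) {
      this.id = id;
    }
  }

  // Log record: "insert this key into that page", with no byte offsets or page images.
  static final class InsertKeyRecord {
    final long lsn;
    final int pageId;
    final int key;

    InsertKeyRecord(long lsn, int pageId, int key) {
      this.lsn = lsn;
      this.pageId = pageId;
      this.key = key;
    }

    // Redo is idempotent: a page whose LSN is already at or past this record is left alone.
    void redo(Page page) {
      if (page.id != pageId) {
        throw new IllegalArgumentException("record belongs to page " + pageId);
      }
      if (page.lsn >= lsn) {
        return;
      }
      int pos = Collections.binarySearch(page.keys, key);
      if (pos < 0) {
        page.keys.add(-pos - 1, key); // insert at the sorted position
      }
      page.lsn = lsn;
    }
  }
}
```

The record stays small regardless of how many bytes shift inside the page, which is one way such logging can reduce WAL overhead and remove the need to track byte-level page changes.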

@andrii0lomakin
Member

OK, after discussion, I suppose the shortest list for 3.1 is issues #8 and #9. I will change their versions accordingly.
