[OEP 12] Go from "Clusters" to "Shards" #12
Comments
+1 |
+1 However, you point to orientdb/issues/6608 (which I created) as a mixup of the 2 cluster word usages, but all the times the word is used there it has the semantics of shard. |
Hi all +1 to the general idea, the word What we are discussing here is the naming for
My 2 cents Luigi |
You did a great job describing the problem despite the terms. I just added it because I found the whole issue a good example of the terms being mixed in a problem that should actually only concern one or the other, nothing more.
Thanks for your insights. I see what you mean. "Data Segment" sounds good, but isn't the most elegant solution. Maybe just "segment"?
According to the sharding docs, for a distributed database, the clusters seem to be used just like shards. Is this true? Are there plans to use the current cluster concept and sharding differently, as in sharding as you explained it could be?
On the other hand, I personally don't think trying to relate the physical form of a cluster (i.e. files) to the logical form within ODB is necessary. It really doesn't matter how a cluster/data segment is stored physically to understand ODB's logical architectural structure.
The only other thing I can think of is "chunk", like MongoDB has, which is a smaller container under a shard. But that sounds even less elegant, and it's used by MongoDB. LOL! 😄 How about just simply "partition"? Scott |
+1 for I agree with you that relating the name to the implementation is not a good idea (btw, I think in the future we will have to re-think the 1-1 association of file-segment) Thanks Luigi |
+1 for |
Yeah, I am warming up to
Not too shabby. But this is the place in the docs that has always bothered me the most.
Hmmmmm......? Reading that, the I feel the term should be shard, unless the sharding within ODB is going to change. If that is the case, then this chapter would have to be rewritten anyway. Let me try and rewrite that same part with shard as the term.
This sentence makes no sense.
That would need to be reworked. I also have no idea what it means in the original form. 😄 Scott |
Summary:
From the start of ODB, the logical unit for the separation and location of data has been called a "cluster". This was, unfortunately, an incorrect term for the concept. If the analogy was in fact taken from a disk operating system, where there are clusters, then the better term would actually have been "sector". But that term would make even less sense when discussing a database technology.
Within the distributed database concepts of today, a cluster is considered the entire distribution of the database.
https://www.techopedia.com/definition/17/clustering-databases
The proper term for the unit of data storage and location in a database scenario is a "shard".
Here is the definition:
https://en.wikipedia.org/wiki/Shard_(database_architecture)
Goals:
Throughout the code and documentation, replace the term "cluster" with the term "shard", when referring to a single grouping of data and its location.
In the end, this change will raise the level of professionalism for OrientDB dramatically, because the docs and code logic will conform to the standard terms used for distributed database technology.
Non-Goals:
None currently.
Success metrics:
None currently.
Motivation:
To conform more closely to today's terminology for distributed database technology, the term "cluster", as in "clusters make up a class", should be exchanged with the term "shard", as in "shards make up a class". This will help new DBAs and programmers find their way around the ODB architectural landscape faster. It will also make discussions about data separation, data locality and server architecture less ambiguous and confusing.
In the docs, the mixup is evident when the discussion covers both clusters as single units of grouped data and clusters of servers doing sharding.
Example:
The mention of (clusters) shouldn't be necessary. See also the discussion with @luigidellaquila.
https://groups.google.com/forum/#!topic/orient-database/2_iTzne1eXo
Here is a good example of the mixup in terms: orientechnologies/orientdb#6608
Description:
See above.
Alternatives:
None.
Risks and assumptions:
Here is an example of ODB code with the term Cluster replaced with Shard: https://gist.github.com/smolinari/609335f498c456bd97c3ff86ad6136db
It took me all of 1 minute to replace the terms in this single file. I am not able to judge if the changes make sense directly, but it seems they would be ok.
Doing the same with the docs would be just as easy; however, the changes in the docs would need to be examined in more depth to be deemed correct from a comprehension standpoint.
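As a rough illustration of the kind of mechanical replacement described above, here is a minimal Python sketch. It is not the procedure used to produce the gist; the helper name `rename_term` and the sample string are made up for this example and are not taken from the OrientDB source. It keeps the casing of each occurrence and returns the number of replacements, so every hit can still be reviewed by hand.

```python
from __future__ import annotations

import re

# Matches "cluster" in any casing, including inside identifiers
# such as "clusterId" (hypothetical example names).
_PATTERN = re.compile(r"cluster", re.IGNORECASE)


def _match_case(match: re.Match) -> str:
    """Return "shard" in the same casing style as the matched word."""
    word = match.group(0)
    if word.isupper():
        return "SHARD"
    if word[0].isupper():
        return "Shard"
    return "shard"


def rename_term(text: str) -> tuple[str, int]:
    """Return the rewritten text and the number of replacements made."""
    return _PATTERN.subn(_match_case, text)


if __name__ == "__main__":
    # Illustrative input only; not copied from any real source file.
    sample = "OCluster getClusterById(int clusterId); // CLUSTER_ID"
    new_text, count = rename_term(sample)
    print(new_text)
    print(count)
```

A blind replacement like this also rewrites places where "cluster" really does mean a cluster of servers, which is why the docs in particular would need the manual review mentioned above.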
There are already two people willing to work on the docs, including myself.
Impact matrix
Note: I added two points to the matrix below.