Altering cluster configuration leads to orientdb node deadlock #8298
Comments
I managed to reproduce the scenario: one of the nodes was blocked in the same manner as described above, and these are the corresponding logs. IMPORTANT: I removed the configuration used for the cluster assignment. It didn't help much.
Blocked node:
Node sending chunks into the void:
Note that the last chunk written by the starting node was chunk #156, whereas the transfer process went haywire on the other node at #158. There seems to be an offset between the two nodes, which may be a hint, I hope. I also tried to remove everything related to this database in order to clean up the system, but couldn't manage it, either through the system or manually. I don't know where they are hidden, but there are still temporary files that force OrientDB to try to recover the database I want to delete. I've checked the directory where OrientDB is installed, as well as /var/run, /var/lib and /tmp/orientdb; I'm running out of ideas. Best Regards, Cyprien Gottstein.
Well, I've found what was going wrong. I wanted to start from scratch, without any database already there, so I hid the old database inside a subdirectory. The idea was "OrientDB won't do a recursive lookup into the file tree of the databases directory", and damn, was I wrong. I had the following layout: $ORIENTDB_HOME/databases/backup/my/database.ocf... OrientDB was still able to detect the db even though it was hidden. Unfortunately, it could detect it but not truly access the data, and thus was endlessly running into errors until I finally got a hint. Just out of curiosity, I changed the name of the directory storing the db, and the name also changed in the log; that gave me the hint that the db was still being detected. I still have other issues, and I've yet to make clusters work, but at least part of the problem has been solved.
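To check for leftover databases like the one described above, a minimal sketch follows. It assumes that the presence of a `database.ocf` file is what makes a directory look like a database to OrientDB (inferred from the path quoted in this comment, not from the OrientDB source), and the `/opt/orientdb` fallback path and the function name `find_database_dirs` are hypothetical.

```python
import os
from pathlib import Path


def find_database_dirs(databases_root):
    """Return every directory under databases_root that holds a database.ocf file.

    Assumption: a database.ocf file marks a directory as a database,
    however deeply it is nested below the databases directory.
    """
    root = Path(databases_root)
    if not root.is_dir():
        return []
    # rglob walks the whole tree, so nested "backup" folders are included.
    return sorted(p.parent for p in root.rglob("database.ocf") if p.is_file())


if __name__ == "__main__":
    # The fallback install path is an assumption; set ORIENTDB_HOME to override.
    home = os.environ.get("ORIENTDB_HOME", "/opt/orientdb")
    for db_dir in find_database_dirs(Path(home) / "databases"):
        print(db_dir)
```

Any directory printed that is not a database you expect to be served is a candidate for moving outside `$ORIENTDB_HOME/databases` entirely.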
A deadlock while syncing the database was fixed in 3.0.2, which is going to be released soon; feel free to update and retry on that release. Regards
OrientDB Version: 2.2.30
Java Version: 1.8.0_171
OS: Ubuntu 16.04
Hi,
I am facing an issue where my OrientDB cluster fails to synchronize properly and never comes up. I created a whole bunch of classes to prepare for my application to run; OrientDB automatically built 8 clusters for those classes, as each of the nodes of my cluster has 8 cores.
Everything was fine at this point. Then I added some configuration to default-distributed-db-config.json in order to reduce cluster replication, because I wanted to scale up on writes. The file looks like this:
To make sure the new cluster configuration is used by all of the nodes, I shut down the orientdb service on each instance of the cluster, removed the file distributed-config.json (again, on each instance) and then restarted OrientDB.
The file distributed-config.json was regenerated and looks like this (mind that it's incomplete; I can't show the whole file, as it reveals sensitive information about our data model):
Two of the nodes of the cluster managed to come ONLINE, but the last one is stuck transferring chunks... Instance2 is blocked at STARTING, endlessly receiving the same chunk from instance3.
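The removal step above could be scripted roughly as follows, to be run on each node while the server is stopped. This is only a sketch: it assumes the per-database `distributed-config.json` files live somewhere under `$ORIENTDB_HOME/databases`, the function name `remove_distributed_configs` and the `/opt/orientdb` fallback path are hypothetical, and it defaults to a dry run so nothing is deleted by accident.

```python
import os
from pathlib import Path


def remove_distributed_configs(databases_root, dry_run=True):
    """Find distributed-config.json files under databases_root.

    With dry_run=True (the default) the files are only listed.
    With dry_run=False they are deleted. Run this only while the
    OrientDB service is stopped on the node.
    """
    root = Path(databases_root)
    if not root.is_dir():
        return []
    found = sorted(p for p in root.rglob("distributed-config.json") if p.is_file())
    if not dry_run:
        for path in found:
            path.unlink()
    return found


if __name__ == "__main__":
    # The fallback install path is an assumption; set ORIENTDB_HOME to override.
    home = os.environ.get("ORIENTDB_HOME", "/opt/orientdb")
    for path in remove_distributed_configs(Path(home) / "databases"):
        print("would remove:", path)
```

Because the search is recursive, it also lists copies sitting in nested backup directories, which is worth reviewing before switching `dry_run` off.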
Expected behavior
Instance2 moving from STARTING to ONLINE
Actual behavior
The node instance3 is stuck transferring chunks.
log of instance3 :
log of instance2:
log of instance1:
And when I log in to the database, it shows me this:
CONFIGURED SERVERS
Steps to reproduce
Hard to reproduce for now. All I can say is that I kind of managed to make it work with an empty database; once the database had a minimum of data to work with, it crashed. We currently have around 7.5 million records (vertices and edges) in this OrientDB database. I did the same on a smaller database with around 300k records, and it also crashed.
Before that, I also tried to change a cluster name on another database using the command
alter cluster cluster name
but it hung forever, corrupted the database, and I had no choice but to purge it from the disks by hand.
The server was running "fine" before that; we already had problems, but nothing related to the current issue. Everything is okay in our network and the ports are all open; otherwise we wouldn't have been able to make anything work at all. This was all triggered because we changed something in the cluster configuration.
Now I have several questions:
Best Regards,
Cyprien Gottstein.