From Uber, can we chat about lessons learned? Issues? #2
Hi Alain. This fork has been used for work on updating Titan to support more recent versions of Elasticsearch, resolving issues running OLAP queries with the HBase backend, and contributing to the PR to upgrade to TinkerPop 3.2.3. The master branch merges these contributions and also includes updates to limit test logging and to support Travis CI.
Our development has been very necessity-driven. Beyond wanting to run Titan with a more recent version of Elasticsearch, we also wanted to try out what looked like significant OLAP improvements in TinkerPop 3.2. With Titan-1.0/TinkerPop-3.0.1 we moved to Spark for full graph processing. We initially ran into memory and timeout errors using SparkGraphComputer when graph sizes reached hundreds of millions of nodes and billions of edges. But TinkerPop development has been solid, and many of the updates from TinkerPop-3.0.1 to TinkerPop-3.2.3 appeared to be OLAP improvements. In particular, Spark was updated from 1.2.1 to 1.6.1. We suspected this alone might resolve some of our errors because of the memory management improvements added in Spark 1.5/1.6. Since updating, we have indeed seen significant performance improvements with SparkGraphComputer in our testing.
It is worth noting that with SparkGraphComputer the whole graph does have to fit in the memory of your Spark cluster. We're running Spark on Mesos to better support scaling without a dedicated Spark cluster. Incidentally, running SparkGraphComputer through Mesos seems to be easier to configure than running it on YARN.
Where extra development work might be useful is in better understanding the new capabilities added between TinkerPop 3.0.1 and 3.2.3, and where Titan could be updated to take full advantage of them to improve performance. Beyond this, I'd look over existing issues/PRs and see if any would impact your chosen storage/indexing backend. Getting PRs submitted to resolve any outstanding issues would be a great service to the community.
As you've probably seen, there is an effort underway by the community to start a fresh project for continuing the open source work of Titan more generally. I can only assume that one of the first things they'd do would be to merge some of the existing contributions into the new project. If you haven't already, you might reach out to @pluradj for more information on this effort and to get his perspective on where additional development is most needed.
Hi there, I am Alain from Uber, Inc., and my team is standing up a knowledge graph based on Titan. Your team seems to be running one of the most up-to-date Titan forks, and we would love to hear about your experience with Titan and where you think extra development work would be needed.
You can reach me at [email protected]. I look forward to hearing from you!