Generate nodes2ways DB #274

wangyoucao577 · 2020-04-10T07:59:15Z

Issue

Closes #273

Create a new cmd tool, nodes2way-builder, to generate the db from wayid2nodeids.csv or wayid2nodeids.csv.snappy.
- It works correctly but performs poorly; performance will be improved later in Improve Generate nodes2way db performance #272.
Create a new cmd tool, nodes2way-cli, to query ways from nodes and to measure query performance. Expose wayID in osrm route response by post processing #257 (comment) shows excellent query performance.
- It uses the wayidsflag package, which will be refactored later in Refactor wayidsflag package #275.


CodeBear801 · 2020-04-10T14:47:34Z

integration/util/waysnodes/nodes2wayblotdb/db.go

+
+var (
+	errEmptyDB     = errors.New("empty db")
+	errKeyNotFound = errors.New("key not found")


This is good practice: it is clear to define all errors for a component/interface in one organized place.

wangyoucao577 · 2020-04-13

Only the common errors, the ones used more than once, are defined here, not all errors. I have seen many packages define exported errors at the top of the file, which I think is good because callers can handle them more easily.

CodeBear801 · 2020-04-10T15:07:47Z

integration/util/waysnodes/nodes2wayblotdb/db.go

+}
+
+const (
+	defaultBucket = "bucket"


Shall we give the bucket a specific name, like node2wayids_bucket? Is it possible that the same db will later hold different buckets for different queries?

wangyoucao577 · 2020-04-13

It's probably good to have a named bucket even if we only ever have one. I'll rename it.

CodeBear801 · 2020-04-10T15:23:30Z

integration/util/waysnodes/nodes2wayblotdb/db.go

+	}
+
+	// to improve write performance, but will manually sync before close
+	db.db.NoSync = true


I assume NoSync could improve performance a lot in our case. Do you have a rough number?

I think the benefit of NoSync is a single commit: no db.log in the middle, and the OS arranges the write throughput. By the way, I think bboltdb opens the bucket file with mmap even when NoSync is set, but physical memory usage should not be as large as the total data we plan to write?

But the doc says NoSync is dangerous and might corrupt the db's metadata. Is it only suitable for the following cases?

- Our case here: a single bucket, where all data has been written into the db once pre-processing finishes, so if anything goes wrong we just rebuild. If we might later turn the db into a common module with multiple buckets, exposing NoSync as a parameter might be better.
- Bulk load all data from bucket

wangyoucao577 · 2020-04-13

The comment on NoSync explains it clearly (https://pkg.go.dev/go.etcd.io/bbolt?tab=doc#DB):

// Setting the NoSync flag will cause the database to skip fsync()
// calls after each commit. This can be useful when bulk loading data
// into a database and you can restart the bulk load in the event of
// a system failure or database corruption.

I think it's very similar to fwrite(): by default the OS decides when to sync to disk, for performance, and users have to call fsync() manually if they want to be sure all contents are committed. (There may be some implementation differences that I haven't figured out yet.)
I haven't tested the performance yet. Currently the db build process is very slow. I'll profile it to improve the performance later.

CodeBear801 · 2020-04-10T15:36:24Z

integration/cmd/nodes2way-builder/store.go

+		if err := db.Write(wayNodes.WayID, wayNodes.NodeIDs); err != nil {
+			if glog.V(3) { // avoid affect performance by verbose log
+				glog.Infof("Update %+v into db failed, err: %v", wayNodes, err)
+			}


Just to confirm: glog.V(3).Infof should achieve the same goal. I saw that the documentation mentions both:

// if glog.V(2) {
//     glog.Info("Starting transaction...")
// }
//
// glog.V(2).Infoln("Processed", nItems, "elements")

wangyoucao577 · 2020-04-13

Functionally they're the same. The difference is that the first one is cheaper when logging is off because it does not evaluate its arguments, and the second one is shorter. See more in https://godoc.org/github.com/golang/glog#V.
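The argument-evaluation difference can be demonstrated without glog itself. In the sketch below, `verbose`, `V`, `level`, and `describe` are hypothetical stand-ins that imitate the shape of the glog API; they are not the real library types. With logging off, the guarded form never calls `describe`, while the chained form calls it once and then discards the result.

```go
package main

import "fmt"

// verbose is a minimal stand-in for the value returned by glog.V.
// It is an assumption for illustration, not the real glog type.
type verbose bool

// Infof prints only when the verbosity check passed.
func (v verbose) Infof(format string, args ...interface{}) {
	if v {
		fmt.Printf(format+"\n", args...)
	}
}

// level plays the role of the -v flag; 0 means V(3) logging is off.
var level = 0

// V reports whether messages at verbosity l should be logged.
func V(l int) verbose { return verbose(l <= level) }

// calls counts how often the (pretend) expensive argument is built.
var calls int

func describe(wayID int64) string {
	calls++
	return fmt.Sprintf("way %d", wayID)
}

// countEvaluations returns how many times describe ran for each style.
func countEvaluations() (guarded, chained int) {
	calls = 0
	if V(3) { // guarded: the body, including describe, is skipped
		V(3).Infof("update failed: %s", describe(1))
	}
	guarded = calls

	calls = 0
	V(3).Infof("update failed: %s", describe(1)) // describe runs first
	chained = calls
	return guarded, chained
}

func main() {
	g, c := countEvaluations()
	fmt.Printf("guarded=%d chained=%d\n", g, c) // prints "guarded=0 chained=1"
}
```

In the diff above the argument is just `wayNodes`, so the saving comes mostly from skipping the call setup inside a hot loop rather than from an expensive argument.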

wangyoucao577 added 17 commits April 9, 2020 02:21

feat: interface declaration of waysnodes

9bf6b0b

feat: add writer interface

ebd9e7d

feat: implement nodes2way stores in boltdb

b9d6a6d

feat: implement query way/ways

1a6cbb4

test: db ut

3943cda

chore: dependency for boltdb

ff8c033

chore: fix dependency for boltdb

f27deba

feat: way2nodes csv utilities

dd20e52

feat: add new nodes2way-db-builder to extract way-nodes from csv and …

4aad342

…insert into db

feat: manually sync db to improve write performance

34c9939

chore: ignore new binary and vscode config for source code management

c21d529

fix: improve error info

cedcc30

feat: cli to query ways from nodes

f10aaf2

refactor: rename cmd

e01781d

feat: measure querying time cost, only output by glog which will not …

e91fe38

…display in stdout/stderr by default

feat: print db statitics

04da7af

feat: more specified error message

6e87f77

wangyoucao577 mentioned this pull request Apr 10, 2020

Refactor wayidsflag package #275

Closed

docs: add two cli tools

cc2d5bc

wangyoucao577 requested review from CodeBear801 and hellotechx April 10, 2020 08:35

wangyoucao577 self-assigned this Apr 10, 2020

wangyoucao577 added NewFeature (New feature or feature improvement) and Prototype (Proof of concept) labels Apr 10, 2020

wangyoucao577 added this to the Sprint 61(Dev:04/21) milestone Apr 10, 2020

CodeBear801 reviewed Apr 10, 2020


CodeBear801 approved these changes Apr 10, 2020


refactor: rename bucket name

d53ba82

wangyoucao577 merged commit 8ffd446 into master Apr 13, 2020

wangyoucao577 deleted the feature/convert-nodes2ways branch April 13, 2020 02:09
