-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate nodes2ways DB #274
Conversation
…display in stdout/stderr by default
|
||
var ( | ||
errEmptyDB = errors.New("empty db") | ||
errKeyNotFound = errors.New("key not found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good behavior, its clear to define all errors for a component/interface in a organized way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here only defines common errors instead of all errors: will be used more than once. I saw many packages define some exported errors at the beginning, I think that's good since callers can handle them easer.
} | ||
|
||
const ( | ||
defaultBucket = "bucket" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we give bucket a specific name, like node2wayids_bucket
? Will it possible that we have same db but would have different buckets for queries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's good to have a named bucket even we may only has one. I'll rename it.
} | ||
|
||
// to improve write performance, but will manually sync before close | ||
db.db.NoSync = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume NoSync
could improve performance a lot in our case, do you have a draft value?
I think the benefit of NoSync
is single commit
, means no db.log in the middle and let OS to arrange write throughput. By the way, I think bboltdb will open bucket file by mmap
, even set to NoSync
, but physical memory usage should not as large as total data we plan to write?
But it says that NoSync
is dangerous, might corrupt db's metadata. Is it only suitable for following cases:
- Our case here, single bucket, and after pre-processing, all data has been written into db, even if anything wrong, just rebuild. If there is possibility that we make a common module of
db
with multiplebucket
, putNoSync
as parameter might be better. - Bulk load all data from bucket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments for NoSync
explains clearly(https://pkg.go.dev/go.etcd.io/bbolt?tab=doc#DB:):
// Setting the NoSync flag will cause the database to skip fsync()
// calls after each commit. This can be useful when bulk loading data
// into a database and you can restart the bulk load in the event of
// a system failure or database corruption.
I think it's very smililar with the fwrite()
: the default behavior is let OS decide when to sync to disk for performance, and users have to manually fsync()
if want to make sure all contents are committed.(Maybe some implementation difference that I haven't figured out yet.)
I haven't test the performance yet. Currently the build db process is very slow. I'll profile to improve the performance later.
if err := db.Write(wayNodes.WayID, wayNodes.NodeIDs); err != nil { | ||
if glog.V(3) { // avoid affect performance by verbose log | ||
glog.Infof("Update %+v into db failed, err: %v", wayNodes, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just confirm: glog.V(3).Infof
should be able to achieve the same goal. I saw the document mentioned both.
// if glog.V(2) {
// glog.Info("Starting transaction...")
// }
//
// glog.V(2).Infoln("Processed", nItems, "elements")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally they're the same. The difference is that the first one is cheaper if logging is off because it does not evaluate its arguments, and the seconder one is shorter. See more in https://godoc.org/github.com/golang/glog#V.
Issue
Closes #273
nodes2way-builder
to generate the db fromwayid2nodeids.csv
orwayid2nodeids.csv.snappy
.nodes2way-cli
to able to query ways from nodes, and able to measure the querying performance. Expose wayID in osrm route response by post processing #257 (comment) shows excellcent querying performance.wayidsflag
package which will be refactored in Refactor wayidsflag package #275 later