Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate nodes2ways DB #274

Merged
merged 19 commits into from
Apr 13, 2020
Merged

Generate nodes2ways DB #274

merged 19 commits into from
Apr 13, 2020

Conversation

wangyoucao577
Copy link

@wangyoucao577 wangyoucao577 commented Apr 10, 2020

Issue

Closes #273

@wangyoucao577 wangyoucao577 self-assigned this Apr 10, 2020
@wangyoucao577 wangyoucao577 added NewFeature New feature or feature improvement Prototype Proof of concept labels Apr 10, 2020
@wangyoucao577 wangyoucao577 added this to the Sprint 61(Dev:04/21) milestone Apr 10, 2020

var (
errEmptyDB = errors.New("empty db")
errKeyNotFound = errors.New("key not found")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good behavior, its clear to define all errors for a component/interface in a organized way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here only defines common errors instead of all errors: will be used more than once. I saw many packages define some exported errors at the beginning, I think that's good since callers can handle them easer.

}

const (
defaultBucket = "bucket"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we give bucket a specific name, like node2wayids_bucket? Will it possible that we have same db but would have different buckets for queries?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's good to have a named bucket even we may only has one. I'll rename it.

}

// to improve write performance, but will manually sync before close
db.db.NoSync = true
Copy link

@CodeBear801 CodeBear801 Apr 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume NoSync could improve performance a lot in our case, do you have a draft value?

I think the benefit of NoSync is single commit, means no db.log in the middle and let OS to arrange write throughput. By the way, I think bboltdb will open bucket file by mmap, even set to NoSync, but physical memory usage should not as large as total data we plan to write?

But it says that NoSync is dangerous, might corrupt db's metadata. Is it only suitable for following cases:

  • Our case here, single bucket, and after pre-processing, all data has been written into db, even if anything wrong, just rebuild. If there is possibility that we make a common module of db with multiple bucket, put NoSync as parameter might be better.
  • Bulk load all data from bucket

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments for NoSync explains clearly(https://pkg.go.dev/go.etcd.io/bbolt?tab=doc#DB:):

	// Setting the NoSync flag will cause the database to skip fsync()
	// calls after each commit. This can be useful when bulk loading data
	// into a database and you can restart the bulk load in the event of
	// a system failure or database corruption.

I think it's very smililar with the fwrite(): the default behavior is let OS decide when to sync to disk for performance, and users have to manually fsync() if want to make sure all contents are committed.(Maybe some implementation difference that I haven't figured out yet.)
I haven't test the performance yet. Currently the build db process is very slow. I'll profile to improve the performance later.

if err := db.Write(wayNodes.WayID, wayNodes.NodeIDs); err != nil {
if glog.V(3) { // avoid affect performance by verbose log
glog.Infof("Update %+v into db failed, err: %v", wayNodes, err)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just confirm: glog.V(3).Infof should be able to achieve the same goal. I saw the document mentioned both.

//	if glog.V(2) {
//		glog.Info("Starting transaction...")
//	}
//
//	glog.V(2).Infoln("Processed", nItems, "elements")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally they're the same. The difference is that the first one is cheaper if logging is off because it does not evaluate its arguments, and the seconder one is shorter. See more in https://godoc.org/github.com/golang/glog#V.

@wangyoucao577 wangyoucao577 merged commit 8ffd446 into master Apr 13, 2020
@wangyoucao577 wangyoucao577 deleted the feature/convert-nodes2ways branch April 13, 2020 02:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NewFeature New feature or feature improvement Prototype Proof of concept
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generate nodeIDs2wayID db
2 participants