
Fix Txn too big error in bulk loader #3998

Closed
wants to merge 2 commits into from

Contributor

@ashish-goswami ashish-goswami commented Sep 16, 2019

Fixes #3916

I was able to reproduce this issue by running the bulk loader on a file generated with the code below:

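package main

import (
	"bufio"
	"fmt"
	"os"
)

// Writes one million N-Quads to ./many.rdf, each with a distinct predicate
// (pred0, pred1, ...).
func main() {
	f, err := os.Create("./many.rdf")
	if err != nil {
		fmt.Println("unable to create file:", err)
		return
	}
	defer f.Close()

	w := bufio.NewWriter(f)
	for i := 0; i < 1000000; i++ {
		if _, err := fmt.Fprintf(w, "_:id%d <pred%d> \"val\" .\n", i, i); err != nil {
			fmt.Println("unable to write triple:", err)
			return
		}
	}
	// Flush buffered data and sync it to disk before the deferred Close runs.
	if err := w.Flush(); err != nil {
		fmt.Println("unable to flush file:", err)
		return
	}
	if err := f.Sync(); err != nil {
		fmt.Println("unable to sync file:", err)
	}
}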

I was thinking of using TxnWriter (dgraph/posting/writer.go, lines 29 to 33 in d048d5c):

type TxnWriter struct {
	db  *badger.DB
	wg  sync.WaitGroup
	che chan error
}

However, I found that it creates a new transaction for every SetAt().



@pullrequest pullrequest bot left a comment


✅ A review job has been created and sent to the PullRequest network.


@ashish-goswami you can click here to see the review status or cancel the code review job.


@pullrequest pullrequest bot left a comment


1 Message
The description of this pull request is blank. Adding a high-level summary will help our reviewers provide better feedback.


@pullrequest pullrequest bot left a comment


I don't have any qualms about the code itself, looks good to me. Were there any test cases that should either be modified, or new ones added for this change?

Also, I see the PR is marked as a WIP. I will come back for another round if there are any additional changes.


Reviewed with ❤️ by PullRequest

err = txn.SetEntry(e)
if err == badger.ErrTxnTooBig {
	commitTxn(txn)


This extra newline doesn't seem necessary

Contributor

@mangalaman93 mangalaman93 left a comment


Reviewable status: 0 of 2 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami and @mangalaman93)


dgraph/cmd/bulk/reduce.go, line 48 at r1 (raw file):

		}(reduceJob)
	}
	thr.Wait()

Either add a comment explaining that we need to wait for thr before writesThr, or do the increment before the goroutine is created.


dgraph/cmd/bulk/schema.go, line 139 at r1 (raw file):

			x.Check(txn.CommitAt(1, nil))
			txn = db.NewTransactionAt(math.MaxUint64, true)
			x.Check(txn.SetEntry(entry)) // We are not checking ErrTxnTooBig for second time.

Add more comments here explaining why it doesn't make sense to check for ErrTxnTooBig.

@mangalaman93 mangalaman93 self-requested a review September 17, 2019 14:52
@ashish-goswami ashish-goswami marked this pull request as ready for review September 17, 2019 16:29
@martinmr martinmr requested review from manishrjain and a team September 17, 2019 18:55
Contributor

@martinmr martinmr left a comment


LGTM. Just a couple of minor comments.

For the commit message, make sure you include a short description of the root cause and the solution for future reference.

Reviewed 2 of 2 files at r1.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @ashish-goswami, @mangalaman93, and @manishrjain)


dgraph/cmd/bulk/reduce.go, line 93 at r1 (raw file):

			txn = newTxn()
			x.Check(txn.SetEntry(e)) // We are not checking ErrTxnTooBig second time.

nit: "for the second time"


dgraph/cmd/bulk/schema.go, line 133 at r1 (raw file):

		// If error returned while setting entry is badger.ErrTxnTooBig, we should
		// commit current txn and start new one.

nit: "a new one"


dgraph/cmd/bulk/schema.go, line 139 at r1 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

Add more comments here explaining why it doesn't make sense to check for ErrTxnTooBig.

"Not checking for ErrTxnTooBig because transaction was just created" should be enough to clarify the reason.

Contributor Author

@ashish-goswami ashish-goswami left a comment


Reviewable status: 0 of 2 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @mangalaman93, @manishrjain, and @martinmr)


dgraph/cmd/bulk/reduce.go, line 48 at r1 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

Either add a comment explaining that we need to wait for thr before writesThr, or do the increment before the goroutine is created.

I have added a comment here.


dgraph/cmd/bulk/schema.go, line 139 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

"Not checking for ErrTxnTooBig because transaction was just created" should be enough to clarify the reason.

Added a detailed comment.

Contributor

@manishrjain manishrjain left a comment


Reviewable status: 0 of 2 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami, @mangalaman93, and @martinmr)


dgraph/cmd/bulk/schema.go, line 139 at r1 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

Added a detailed comment.

Why write all this logic? Why not use TxnWriter?


@pullrequest pullrequest bot left a comment


Due to inactivity, PullRequest has cancelled this review job. You can reactivate the code review job from the PullRequest dashboard

@ashish-goswami
Contributor Author

Closing this PR as fixed via #4296

@ashish-goswami ashish-goswami deleted the ashish/bulk_txn_too_big branch November 28, 2019 11:42