Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How can i prevent multiple queries to the database when fetching foreign keys? #20

Closed
RubenSchmidt opened this issue May 16, 2017 · 15 comments

Comments

@RubenSchmidt
Copy link

RubenSchmidt commented May 16, 2017

Hi! I am trying to use this library with graphql-go but i am struggling to get the batching to work when doing list queries that reference foreign key objects. Here's an example schema of what i mean:

type Query {
    meetups: [Meetup]!
}
type User {
    ID: Int!
    Name: String!
}

type Meetup {
    ID: Int!
    Title: String!
    Owner: User!
}

In the MeetupResolver i have this:

func (r *MeetupResolver) Owner() *UserResolver {
	thunk := r.loader.Load(r.meetup.OwnerID)
	data, _ := thunk()
	user, _ := data.(model.User)
	return NewUserResolver(&user)

When doing a query to meetups and asking for the Owner of the objects it is still doing a single query to the database for each of the different owners, like this:

[2017-05-16 16:41:13]  [0.85ms]  SELECT * FROM "users"  WHERE ("id" IN ('6'))
[2017-05-16 16:41:13]  [1.34ms]  SELECT * FROM "users"  WHERE ("id" IN ('7'))

Is there a way to batch all the owner ids and then send the query to the database?

@RubenSchmidt RubenSchmidt changed the title How can i prevent n+1 queries to the database when fetching foreign keys? How can i prevent multiple queries to the database when fetching foreign keys? May 16, 2017
@tonyghita
Copy link
Member

Are you sharing the loaders between resolvers? Or does each FooResolver have it's own dataloader.Loader instance?

@RubenSchmidt
Copy link
Author

RubenSchmidt commented May 16, 2017

I send the loader with the request context like this:

func dataLoaderMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		userLoader := &loaders.UserLoader{Db: db}
		batchFunc := userLoader.Attach(ctx)
		uloader := dataloader.NewBatchedLoader(batchFunc, dataloader.WithCache(data.NewCache()))
		ctx = context.WithValue(ctx, "userLoader", uloader)
		newRequest := r.WithContext(ctx)
		*r = *newRequest
		next.ServeHTTP(w, r)
	})
}

And in the query resolver i use it like this:


func (r *Resolver) Meetups(ctx context.Context) ([]*resolvers.MeetupResolver, error) {
	loader, found := ctx.Value("userLoader").(*dataloader.Loader)
	if !found {
		return nil, errors.New("unable to find the user loader")
	}
        ....
	for _, meetup := range meetups {
		meetupResolvers = append(meetupResolvers, resolvers.NewMeetupResolver(meetup, loader))
	}
	return meetupResolvers, nil
}

@RubenSchmidt
Copy link
Author

So this is basically the same as this issue: graph-gophers/graphql-go#17. Dataloader is recommended there, so surely it is just something wrong with my implementation?

@nicksrandall
Copy link
Member

Can you share your code for the batchFn? It sounds like the issue is there. It's probably a result of bad documentation on my end.

@RubenSchmidt
Copy link
Author

I tried to implement it like @tonyghita proposed in #19

type UserLoader struct {
	Db *gorm.DB
}

func (l *UserLoader) Attach(ctx context.Context) dataloader.BatchFunc {
	return func(ids []string) []*dataloader.Result {
		var users []model.User
		if err := l.Db.Where(ids).Find(&users).Error; err != nil {
			log.Printf("Error in userloader: %v", err)
		}

		results := []*dataloader.Result{}
		for _, user := range users {
			results = append(results, &dataloader.Result{user, nil})
		}
		return results
	}
}

@nicksrandall
Copy link
Member

That batchFunc code looks correct to me. The default batch window is 6ms. Is it possible that your resolvers are taking longer than that to resolve? You could configure the batch window to be something longer to see if that solves your issue?

@RubenSchmidt
Copy link
Author

RubenSchmidt commented May 16, 2017

I can't really see why the resolver would take so long, there is really not much going on there as you can see in Meetups() and UserResolver above. I tried setting the batch window to a 1 sec like this:
uloader := dataloader.NewBatchedLoader(batchFunc, dataloader.WithCache(data.NewCache()), dataloader.WithWait(1000*time.Millisecond))
The query is clearly loading slower, however the results are the same.

[2017-05-16 21:32:29]  [2.81ms]  SELECT * FROM "meetups"
[2017-05-16 21:32:29]  [3.24ms]  SELECT * FROM "users"  WHERE ("id" IN ('2'))
[2017-05-16 21:32:30]  [1.64ms]  SELECT * FROM "users"  WHERE ("id" IN ('3'))

The returned response from the server is a list with 20 meetups and their owner. There are only two users that are owners to all the meetups, hence only the two select statements from users. So the caching works in the way that it prevents the resolver function from querying the database for the same user again. But in my application there could easily be 20 meetups with 20 different owners and the application would then query the database one time for each distinct user.

@RubenSchmidt
Copy link
Author

RubenSchmidt commented May 16, 2017

Update: i found that calling thunk() from the Owner() field in the resolver directly after Load() causes the load to send the query as stated above. But when i change the the Owner method to:

type MeetupResolver struct {
   meetup *model.Meetup
   thunk  dataloader.Thunk   <--- Not the loader, but the thunk
}

func (r *MeetupResolver) Owner() *UserResolver {
   data, _ := r.thunk()
   user, _ := data.(model.User)
   return NewUserResolver(&user)

And changing the Meetups() query resolver method to:

func (r *Resolver) Meetups(ctx context.Context) ([]*resolvers.MeetupResolver, error) {
	loader, found := ctx.Value("userLoader").(*dataloader.Loader)
	if !found {
		return nil, errors.New("unable to find the thing loader")
	}
	meetups, err := model.GetAllMeetups(r.db)
	if err != nil {
		return nil, err
	}
	var meetupResolvers []*resolvers.MeetupResolver
	for _, meetup := range meetups {
                // Load here now
		thunk := loader.Load(strconv.Itoa(meetup.OwnerID)) 
		meetupResolvers = append(meetupResolvers, resolvers.NewMeetupResolver(meetup, thunk))
	}
	return meetupResolvers, nil
}

The result is then:

[2017-05-16 22:19:33]  [0.74ms]  SELECT * FROM "meetups"
[2017-05-16 22:19:33]  [0.56ms]  SELECT * FROM "users"  WHERE ("id" IN ('2','3'))

However this is not exactly the result i want, because then the SELECT * FROM "users" statement would be run even though Owner in the Meetup is not requested.

@RubenSchmidt
Copy link
Author

RubenSchmidt commented May 23, 2017

@nicksrandall any updates on this?
This is easy replicated using the examples:

func main() {
	// go-cache will automaticlly cleanup expired items on given diration
	c := cache.New(time.Duration(15*time.Minute), time.Duration(15*time.Minute))
	cache := &Cache{c}
	loader := dataloader.NewBatchedLoader(batchFunc, dataloader.WithCache(cache))

	// immediately call the future function from loader
	loader.Load("key1")()
        loader.Load("key2")()
        loader.Load("key3")()
}

func batchFunc(keys []string) []*dataloader.Result {
	var results []*dataloader.Result
	fmt.Printf("got keys: %v \n", keys)
	for _, key := range keys {
		results = append(results, &dataloader.Result{key, nil})
	}
	return results
}

Prints:
got keys: [key1]
got keys: [key2]
got keys: [key3]

When calling the thunk the wait for the batching is not taken into account.

@tonyghita
Copy link
Member

I'm not sure that nil results are cached

@nicksrandall
Copy link
Member

nicksrandall commented May 23, 2017

@RubenSchmidt the "thunk" (or future) calls will block until they are resolved. The load method returns a thunk and not the values so that we can defer work. To help explain, the following code will work as you expected it to.

func main() {
	// go-cache will automaticlly cleanup expired items on given diration
	c := cache.New(time.Duration(15*time.Minute), time.Duration(15*time.Minute))
	cache := &Cache{c}
	loader := dataloader.NewBatchedLoader(batchFunc, dataloader.WithCache(cache))

        // queue up 
	thunk1 := loader.Load("key1")
        thunk2 := loader.Load("key2")
        thunk3 := loader.Load("key3")

       // wait for resolution
        thunk1()
        thunk2()
        thunk3()
}

func batchFunc(keys []string) []*dataloader.Result {
	var results []*dataloader.Result
	fmt.Printf("got keys: %v \n", keys)
	for _, key := range keys {
		results = append(results, &dataloader.Result{key, nil})
	}
	return results
}

The reason that you can immediately call the thunk in your resolvers is because they are each run in their own goroutine. So the following would also work as you expected:

func main() {
	// go-cache will automaticlly cleanup expired items on given diration
	c := cache.New(time.Duration(15*time.Minute), time.Duration(15*time.Minute))
	cache := &Cache{c}
	loader := dataloader.NewBatchedLoader(batchFunc, dataloader.WithCache(cache))

	// immediately call the future function from loader
	go loader.Load("key1")()
        go loader.Load("key2")()
        go loader.Load("key3")()

        // wait for goroutines to finish before exiting program.
        time.Sleep(1 * time.Second)
}

func batchFunc(keys []string) []*dataloader.Result {
	var results []*dataloader.Result
	fmt.Printf("got keys: %v \n", keys)
	for _, key := range keys {
		results = append(results, &dataloader.Result{key, nil})
	}
	return results
}

@nicksrandall
Copy link
Member

P.S. @tonyghita thanks for always jumping in and lending a hand in the issues. I sincerely appreciate it!

@RubenSchmidt
Copy link
Author

Alright thanks for clarifying! After looking some more into it, I found that by getting the value from the context in the Owner field and not when creating the resolver from the Meetups query it works as expected:

func (r *MeetupResolver) Owner(ctx context.Context) (*UserResolver, error) {
	loader, found := ctx.Value("userLoader").(*dataloader.Loader)
	if !found {
		return nil, errors.New("unable to find the userloader")
	}
	thunk := loader.Load(strconv.Itoa(r.meetup.OwnerID))
	data, _ := thunk()
	user, _ := data.(model.User)
	return NewUserResolver(&user), nil

Passing the resolvers in the context feels a little bit clunky tho and an example of a better way would be much appreciated :)

Thanks for all the help and a great project!

@tonyghita
Copy link
Member

tonyghita commented May 23, 2017

@RubenSchmidt I agree it's pretty clunky.

Off-topic, but I've started organizing the resolvers to have constructors where their dependencies are checked.

I also pull out a load() method on each resolver type that does all loader-dependency checking and data manipulation.

These allow for really minimal and clean resolver functions.

type MeetupResolver struct {
  id     string
  loader *dataloader.Loader
}

// NewMeetupResolver validates dependencies and returns a new instance of MeetupResolver.
func NewMeetupResolver(ctx context.Context, id string) (*MeetResolver, error) { 
  loader, found := ctx.Value("meetupLoader").(*dataloader.Loader)
  if !found {
    return nil, errors.New("unable to find meetup loader")
  }

  if id == "" {
    return nil, errors.New("no meetup ID specified")
  }
  
  return &MeetupResolver{id, loader}, nil
}

func (r *MeetupResolver) load() (*model.Meetup, error) {
  // we can have any kinds of necessary checks here
  if r.loader == nil {
     return nil, errors.New("missing meetup loader")
  }

  // kind of verbose, but makes code bulletproof and easy to debug
  if r.id == "" {
    return nil, errors.New("missing meetup key")
  } 

  // use the loader we attached in the constructor
  thunk := r.loader.Load(r.id)
  data, err := thunk()
  if err != nil {
    return nil, err
  }

  meetup, ok := data.(model.Meetup)
  if !ok {
    return nil, errors.New("unable to convert response to Meetup")
  }
  return meetup, nil
}

func (r *MeetupResolver) Owner(ctx context.Context) (*UserResolver, error) {
  meetup, err := r.load() // keep the logic simple for resolvers
  if err != nil { 
    return nil, err
  }

  return NewUserResolver(ctx, meetup.OwnerID) // analogous to NewMeetupResolver()
}

@RubenSchmidt
Copy link
Author

@tonyghita That looks cleaner! I'm going to do it that way for now! Again, thanks for all the help! I'm closing this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants