Cloud Datastore - Entity Count Query - StructuredQuery.Builder - Allow limit of 0 #5061

Closed
codeconsole opened this issue May 4, 2019 · 8 comments
Assignees
Labels
api: datastore Issues related to the Datastore API. needs more info This issue needs more information from the customer to proceed. type: question Request for information or clarification. Not an issue.

Comments

@codeconsole

codeconsole commented May 4, 2019

According to the group, the best solution for counting entities is:

What Alfred describes is how countEntities is implemented. It simply does a normal query with a limit of 0 and a max_int offset. The count is then available via response.batch.skipped_results (where response is the RunQueryResponse returned by RunQuery); however, it may be necessary to run the query multiple times, starting from response.batch.skipped_cursor, if response.batch.more_results is not NO_MORE_RESULTS/MORE_RESULTS_AFTER_CURSOR.

https://groups.google.com/forum/#!topic/gcd-discuss/wH8lVOA-a8Y

This was made possible by this ticket.
#3279
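
As a rough illustration of the loop described in the quoted explanation, here is a sketch that runs against an in-memory stand-in rather than the real Datastore API. `Batch`, `MoreResults`, `FakeBackend`, and the per-call skip cap are all hypothetical, invented for this example; only the shape of the loop (sum the skipped results, resume from the returned cursor until no more results are reported) comes from the description.

```java
public class SkipCountSketch {
    /** Hypothetical stand-in for the more_results field of a response batch. */
    enum MoreResults { NOT_FINISHED, NO_MORE_RESULTS }

    /** Hypothetical stand-in for one batch of a limit-0, max-offset query. */
    static final class Batch {
        final int skippedResults;      // entities skipped by this call
        final int skippedCursor;       // position to resume from
        final MoreResults moreResults; // whether another call is needed

        Batch(int skippedResults, int skippedCursor, MoreResults moreResults) {
            this.skippedResults = skippedResults;
            this.skippedCursor = skippedCursor;
            this.moreResults = moreResults;
        }
    }

    /** Hypothetical backend holding `total` entities that skips at most `cap` per call. */
    static final class FakeBackend {
        private final int total;
        private final int cap;

        FakeBackend(int total, int cap) {
            this.total = total;
            this.cap = cap;
        }

        Batch run(int startCursor) {
            int skipped = Math.min(cap, total - startCursor);
            int next = startCursor + skipped;
            MoreResults more = next >= total ? MoreResults.NO_MORE_RESULTS : MoreResults.NOT_FINISHED;
            return new Batch(skipped, next, more);
        }
    }

    /** Sums skipped results across calls, resuming from the cursor each time. */
    static int countEntities(FakeBackend backend) {
        Batch batch = backend.run(0);
        int count = batch.skippedResults;
        while (batch.moreResults != MoreResults.NO_MORE_RESULTS) {
            batch = backend.run(batch.skippedCursor);
            count += batch.skippedResults;
        }
        return count;
    }

    public static void main(String[] args) {
        // 2500 entities, at most 1000 skipped per call: three calls are needed.
        System.out.println(countEntities(new FakeBackend(2500, 1000)));
    }
}
```

The point of the model is that a single call may not skip everything, so the count has to be accumulated over several calls.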

However, setLimit currently enforces a value > 0.

    @Override
    public B setLimit(Integer limit) {
      Preconditions.checkArgument(limit == null || limit > 0, "limit must be positive");
      this.limit = limit;
      return self();
    }

https://googleapis.github.io/google-cloud-java/google-cloud-clients/apidocs/com/google/cloud/datastore/StructuredQuery.Builder.html#setLimit-java.lang.Integer-

StructuredQuery.java needs to be updated to allow a limit >= 0.

https://github.com/googleapis/google-cloud-java/blob/master/google-cloud-clients/google-cloud-datastore/src/main/java/com/google/cloud/datastore/StructuredQuery.java#L760
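
A minimal sketch of the relaxed check being requested. `checkLimit` is a hypothetical standalone helper, not the library's code, and it throws IllegalArgumentException directly instead of going through Guava's Preconditions so that it stays self-contained. In the library itself the change would presumably amount to relaxing `limit > 0` to `limit >= 0` inside setLimit.

```java
public class LimitCheck {
    /**
     * Validates a query limit. null means "no limit"; 0 means "return no
     * results" (useful when only the skipped count matters); negative
     * values are rejected.
     */
    static Integer checkLimit(Integer limit) {
        if (limit != null && limit < 0) {
            throw new IllegalArgumentException("limit must be greater than or equal to 0");
        }
        return limit;
    }

    public static void main(String[] args) {
        System.out.println(checkLimit(0));    // accepted
        System.out.println(checkLimit(null)); // accepted, no limit
        try {
            checkLimit(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```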

This will allow the following code to work:

        // Keys-only query that skips everything and returns nothing (limit 0).
        StructuredQuery<Key> query = Query.newKeyQueryBuilder().setKind(kind)
                .setOffset(Integer.MAX_VALUE).setLimit(0).build();
        QueryResults<Key> results = datastore.run(query);
        int count = results.getSkippedResults();
        // i only counts the extra round trips; it does not affect the result.
        for (int i = 0; results.getMoreResults() != QueryResultBatch.MoreResultsType.NO_MORE_RESULTS; i++) {
            query = query.toBuilder().setStartCursor(results.getCursorAfter()).build();
            results = datastore.run(query);
            count += results.getSkippedResults();
        }
@codeconsole codeconsole changed the title from "StructuredQuery.Builder - Entity Count Query - Allow limit of 0" to "Cloud Datastore - Entity Count Query - StructuredQuery.Builder - Allow limit of 0" May 4, 2019
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label May 4, 2019
@kolea2 kolea2 added api: datastore Issues related to the Datastore API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed triage me I really want to be triaged. labels May 7, 2019
@ajaaym
Contributor

ajaaym commented May 24, 2019

@codeconsole Why do you want to set the limit to 0 instead of not setting it at all?

@ajaaym ajaaym added type: question Request for information or clarification. Not an issue. and removed type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels May 29, 2019
@kolea2 kolea2 added the needs more info This issue needs more information from the customer to proceed. label Jul 10, 2019
@ajaaym
Contributor

ajaaym commented Sep 12, 2019

Closing this due to staleness. Please feel free to reopen should you need further help.

@ajaaym ajaaym closed this as completed Sep 12, 2019
@codeconsole
Author

@ajaaym Your question was already answered in the description:

What Alfred describes is how countEntities is implemented. It simply does a normal query with a limit of 0 and a max_int offset.

That is quoted from the thread below, where it was explained by David Gray, a Google employee, on Mon, May 14, 2018 at 7:25 PM:

https://groups.google.com/forum/#!topic/gcd-discuss/wH8lVOA-a8Y

@codeconsole
Author

@ajaaym I imagine the reason is that returning 1 key every time is useless and wastes network traffic, although the cost impact is probably small, as I believe there is no charge for a keys-only query.

If you don't set a limit at all, you will get numerous keys returned, which would be even less efficient.

@montss

montss commented Nov 4, 2019

We tried to implement a method to count results based on the above:

	int countEntities(StructuredQuery.Builder<?> qb) {
		StructuredQuery<?> q = qb.build();
		//copy query, but do NOT copy cursor!
		StructuredQuery.Builder<?> builderKeysOnly = Query.newKeyQueryBuilder().setKind(q.getKind()).setFilter(q.getFilter()).setLimit(0).setOffset(Integer.MAX_VALUE);

		int count = 0;

		QueryResults<?> forCursor = this.gCloudDatastore.run(builderKeysOnly.build());
		count = forCursor.getSkippedResults();
		while (forCursor.getMoreResults() != MoreResultsType.NO_MORE_RESULTS && forCursor.getMoreResults() != MoreResultsType.MORE_RESULTS_AFTER_LIMIT) {
			builderKeysOnly.setStartCursor(forCursor.getCursorAfter());
			forCursor = this.gCloudDatastore.run(builderKeysOnly.build());
			count += forCursor.getSkippedResults();
		}

		return count;
	}

First, locally with cloud-datastore-emulator 2.1.0, getMoreResults() always returns MoreResultsType.MORE_RESULTS_AFTER_LIMIT. That is why I added the second check to the while loop; with it, the counting works fine.

Once deployed, however, the counts are not correct at all, even though debugging live builds shows that getMoreResults() is MoreResultsType.NO_MORE_RESULTS.

The Javadoc for getSkippedResults() says:

Returns the number of results skipped, typically because of an offset.

A simple use case to count entities:

Query<Key> query = Query.newKeyQueryBuilder().setOffset(Integer.MAX_VALUE).build();
 QueryResults<Key> result = datastore.run(query);
 if (!result.hasNext()) {
  int numberOfEntities = result.getSkippedResults();
 }

The sample code there is not clear: what if result.hasNext() is actually true?

@codeconsole
Author

Just keep it simple and do not re-invent the wheel. I have not had any issues:

        StructuredQuery<Key> query = Query.newKeyQueryBuilder().setKind(kind)
                .setOffset(Integer.MAX_VALUE).setLimit(0).build();
        QueryResults<Key> results = datastore.run(query);
        int count = results.getSkippedResults();
        // i only counts the extra round trips; it does not affect the result.
        for (int i = 0; results.getMoreResults() != QueryResultBatch.MoreResultsType.NO_MORE_RESULTS; i++) {
            query = query.toBuilder().setStartCursor(results.getCursorAfter()).build();
            results = datastore.run(query);
            count += results.getSkippedResults();
        }

@codeconsole
Author

Note: this isn't something you should run all the time, as the query takes a long time on large tables. It is nothing like a COUNT query in SQL and is extremely slow. It is also not guaranteed to be accurate, given the limitations of this type of datastore.

@montss

montss commented Nov 5, 2019

Just keep it simple and do not re-invent the wheel. I have not had any issues:

        StructuredQuery<Key> query = Query.newKeyQueryBuilder().setKind(kind)
                .setOffset(Integer.MAX_VALUE).setLimit(0).build();
        QueryResults<Key> results = datastore.run(query);
        int count = results.getSkippedResults();
        // i only counts the extra round trips; it does not affect the result.
        for (int i = 0; results.getMoreResults() != QueryResultBatch.MoreResultsType.NO_MORE_RESULTS; i++) {
            query = query.toBuilder().setStartCursor(results.getCursorAfter()).build();
            results = datastore.run(query);
            count += results.getSkippedResults();
        }

Thank you.

I'm not trying anything different here. The two versions are almost identical: mine uses a while loop instead of a for loop, and the i++ in your loop is not needed. As I said, I have two main problems:

  1. Locally with the emulator, getMoreResults() always returns MoreResultsType.MORE_RESULTS_AFTER_LIMIT.
  2. The count is almost never accurate; however, most of our tables are big ones, with hundreds of thousands of items.
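
For problem 1, one defensive option is to stop not only on NO_MORE_RESULTS but also when a call skips nothing, so the loop cannot spin forever on a status that never becomes terminal. The sketch below tests that idea against an in-memory model only. `Batch`, `MoreResults`, and `AlwaysAfterLimitBackend` are hypothetical stand-ins, and the assumption that a batch skipping zero results means the scan is finished has not been verified against the emulator or the live service.

```java
public class GuardedCountSketch {
    /** Hypothetical stand-in for the more_results status of a batch. */
    enum MoreResults { MORE_RESULTS_AFTER_LIMIT, NO_MORE_RESULTS }

    /** Hypothetical stand-in for one response batch. */
    static final class Batch {
        final int skippedResults;
        final int cursor;
        final MoreResults moreResults;

        Batch(int skippedResults, int cursor, MoreResults moreResults) {
            this.skippedResults = skippedResults;
            this.cursor = cursor;
            this.moreResults = moreResults;
        }
    }

    /** Hypothetical backend that never reports NO_MORE_RESULTS, mimicking problem 1. */
    static final class AlwaysAfterLimitBackend {
        private final int total;
        private final int cap;

        AlwaysAfterLimitBackend(int total, int cap) {
            this.total = total;
            this.cap = cap;
        }

        Batch run(int startCursor) {
            int skipped = Math.min(cap, total - startCursor);
            return new Batch(skipped, startCursor + skipped, MoreResults.MORE_RESULTS_AFTER_LIMIT);
        }
    }

    /** Counts by summing skipped results; stops on a terminal status or on no progress. */
    static int countEntities(AlwaysAfterLimitBackend backend) {
        int count = 0;
        int cursor = 0;
        while (true) {
            Batch batch = backend.run(cursor);
            count += batch.skippedResults;
            cursor = batch.cursor;
            boolean finished = batch.moreResults == MoreResults.NO_MORE_RESULTS;
            boolean noProgress = batch.skippedResults == 0;
            if (finished || noProgress) {
                return count;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(countEntities(new AlwaysAfterLimitBackend(2500, 1000)));
    }
}
```

Compared with treating MORE_RESULTS_AFTER_LIMIT as terminal, this costs one extra round trip at the end, but it keeps paging if that status is returned before everything has been skipped.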


5 participants