Virtuoso imposes a limit of 2^20 = 1048576 results on HTTP response #700

Open
saleem-muhammad opened this issue Dec 13, 2017 · 9 comments

@saleem-muhammad

saleem-muhammad commented Dec 13, 2017

This issue might not be new. I am running queries with large result sets against a Virtuoso SPARQL endpoint from my Java application. I have noticed that the endpoint does not return more than 1048576 results in an HTTP response, no matter how large ResultSetMaxRows is set in the configuration file. Is there a way to remove this limit? I have also noticed that ISQL does not have such a limit.
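
The behavior reported here can be modeled as a fixed cap applied on top of the configured value. The following is a minimal Python sketch of that model, not Virtuoso's actual logic. It assumes ResultSetMaxRows lives in a [SPARQL] section of virtuoso.ini and that the smaller of the two values wins; the function name and the sample INI text are made up for illustration.

```python
import configparser

HTTP_HARD_CAP = 2 ** 20  # 1048576, the limit observed over HTTP in this issue


def effective_http_row_limit(ini_text: str, hard_cap: int = HTTP_HARD_CAP) -> int:
    """Return the row limit an HTTP client would see under the assumed model."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case as written in the INI text
    parser.read_string(ini_text)
    # Assumption: a missing section or key behaves as if only the hard cap applied.
    configured = parser.getint("SPARQL", "ResultSetMaxRows", fallback=hard_cap)
    return min(configured, hard_cap)


# Hypothetical configuration asking for 50 million rows.
sample_ini = """
[SPARQL]
ResultSetMaxRows = 50000000
"""
print(effective_http_row_limit(sample_ini))  # 1048576
```

Under this model, raising ResultSetMaxRows past 1048576 changes nothing for HTTP clients, which matches what is described above.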

@HughWilliams
Collaborator

1048576 is a known limit on the size of a Virtuoso result set, so using LIMIT and OFFSET is the way to go if you really need this many results.
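
A rough Python sketch of the LIMIT and OFFSET approach is shown here. It assumes the base query has no LIMIT or OFFSET of its own and includes an ORDER BY, since page boundaries are not stable without a defined order. `run_query` is a hypothetical callable that sends one query to the endpoint and returns a list of rows; it is not part of any Virtuoso or Jena API.

```python
from typing import Callable, Iterator, List

PAGE_SIZE = 1_000_000  # kept below the 1048576-row cap


def paged_query(base_query: str, limit: int, offset: int) -> str:
    """Append LIMIT/OFFSET to a query that has neither yet."""
    return f"{base_query.rstrip()}\nLIMIT {limit} OFFSET {offset}"


def fetch_all(
    base_query: str,
    run_query: Callable[[str], List],
    page_size: int = PAGE_SIZE,
) -> Iterator:
    """Yield every row by requesting consecutive pages until one comes back short."""
    offset = 0
    while True:
        rows = run_query(paged_query(base_query, page_size, offset))
        yield from rows
        if len(rows) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size


# Show the text of the third page of an example query.
print(paged_query("SELECT ?s WHERE { ?s ?p ?o } ORDER BY ?s", PAGE_SIZE, 2 * PAGE_SIZE))
```

Each page is a separate query execution, so the total cost differs from running the original query once.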

@saleem-muhammad
Author

saleem-muhammad commented Dec 13, 2017

Thanks for the reply.
The problem is that I want to benchmark engines using queries with large result sets (more than 1 million results). A single query therefore cannot be split by adding LIMIT and OFFSET. The queries come from a standard benchmark, and adding OFFSET and LIMIT would defeat the purpose of the benchmarking.

@kidehen

kidehen commented Dec 13, 2017

@saleem-muhammad,

You are dealing with an HTTP limitation. You can use ODBC or JDBC connections to Virtuoso that execute SPARQL queries too. The problem is that a document comprising a solution of 1 million+ tuples over HTTP is not the norm for any benchmark.

You can perform a variety of relational operations over databases of various sizes, but the solutions themselves do not amount to a dump of 1 million plus tuples (be it records in a table or statement graphs). Anyway, if you want to retrieve data progressively over HTTP, which is what we offer, then you have OFFSET and LIMIT; otherwise, you can push this all through alternative protocols like ODBC or JDBC.
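
For the ODBC/JDBC route, Virtuoso accepts a SPARQL query on a SQL connection when the statement starts with the `SPARQL` keyword. The Python sketch below shows that shape against a generic DB-API 2.0 cursor. It is only an illustration: `as_sql_statement` and `stream_rows` are hypothetical helper names, and no driver or connection details are shown because the thread does not give any.

```python
from typing import Iterator


def as_sql_statement(sparql_query: str) -> str:
    """Prefix a SPARQL query so a Virtuoso SQL connection treats it as SPARQL."""
    return "SPARQL " + sparql_query.strip()


def stream_rows(cursor, sparql_query: str, batch_size: int = 10_000) -> Iterator:
    """Execute over a DB-API cursor and yield rows batch by batch.

    `cursor` is assumed to be any DB-API 2.0 cursor (for example one obtained
    from an ODBC binding); fetching in batches avoids holding the whole
    result set in client memory at once.
    """
    cursor.execute(as_sql_statement(sparql_query))
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


print(as_sql_statement("SELECT ?s WHERE { ?s ?p ?o }"))
```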

I hope this helps.

@saleem-muhammad
Author

saleem-muhammad commented Dec 14, 2017

@kidehen
Thanks indeed, it worked over JDBC. However, I think this is not an HTTP limitation, as I am able to retrieve more than one million results from other triple stores over HTTP. The examples given at http://vos.openlinksw.com/owiki/wiki/VOS/VOSDownload are very helpful. The only problem I can see now is that other triple stores would also need to support JDBC or ODBC connections; otherwise, the comparison would not be fair. In addition, this limitation makes experiments with large-result queries (millions of results) over live Virtuoso SPARQL endpoints, or with SPARQL federation engines, tricky.

@pkleef
Collaborator

pkleef commented Dec 14, 2017

As a temporary workaround, edit the file

libsrc/Wi/sparql_io.sql

Around line 3236 you will find the following:

maxrows := 1024*1024; -- More than enough for web-interface.

Change this to

maxrows := 10*1024*1024; -- More than enough for web-interface.

and recompile.

I am working on a permanent fix which will be committed to VOS probably tomorrow.
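
For reference, the two values in this workaround evaluate as follows. This is plain arithmetic in Python, with variable names chosen only for illustration.

```python
current_cap = 1024 * 1024        # value currently in sparql_io.sql
proposed_cap = 10 * 1024 * 1024  # value suggested as a temporary workaround

print(current_cap, current_cap == 2 ** 20)  # 1048576 True
print(proposed_cap)                         # 10485760
```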

@kidehen

kidehen commented Dec 14, 2017

@saleem-muhammad,

To be clear, I should have stated this was a Virtuoso HTTP interface limitation. Anyway, as per comment by @pkleef, the arbitrary limit can be increased.

@iv-an-ru
Collaborator

iv-an-ru commented Dec 14, 2017 via email

@saleem-muhammad
Author

saleem-muhammad commented Dec 14, 2017

Thanks. I have changed the maxrows limit to 64*1024*1024-2 in libsrc/Wi/sparql_io.sql, and I am now able to get up to 20 million results. Beyond that, I get the following error:

Exception in thread "main" HttpException: 500
        at com.hp.hpl.jena.sparql.engine.http.HttpQuery.rewrap(HttpQuery.java:414)
        at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:358)
        at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:295)
        at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execResultSetInner(QueryEngineHTTP.java:346)
        at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:338)

I think it would be cool if this parameter were somehow tied to ResultSetMaxRows in the configuration file, i.e., virtuoso.ini.

@pfps

pfps commented Sep 6, 2018

I don't see how you are getting 20 million results, as that is bigger than the MAX_BOX_ELEMENTS limit mentioned by iv-an-ru.
