convert templated queries to named queries and separate concerns by introducing named query middleware #2412

WolfgangFahl · 2024-01-26T10:17:23Z

Is your feature request related to a problem? Please describe.
Blazegraph is getting close to the 4TB limit. The Wikimedia Foundation is testing a graph split in Q1/2024.
This will eventually, and likely, force the use of:

federated queries
different SPARQL endpoint(s)
different triplestores
different flavors of SPARQL
maybe even different query languages

Also, there is the already limiting 1-minute timeout of the official WDQS.

Describe the solution you'd like

Change most or all relevant queries to named queries with parameters
Call a middleware to run the queries
Let the middleware do the necessary translation
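
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the implementation of Scholia or pyLodStorage: the class names `NamedQuery` and `QueryMiddleware`, the backend keys `wdqs` and `qlever`, the query name `item_statements` and the query text itself are all hypothetical. Each named query carries one SPARQL variant per backend, and the caller only supplies a name and parameters.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class NamedQuery:
    """A query identified by name, with one SPARQL text per backend."""
    name: str
    params: list[str]
    # Hypothetical layout: backend key -> SPARQL template using ${param}
    variants: dict[str, str] = field(default_factory=dict)


class QueryMiddleware:
    """Resolves a query name plus parameters to backend-specific SPARQL."""

    def __init__(self, backend: str):
        self.backend = backend
        self.queries: dict[str, NamedQuery] = {}

    def register(self, query: NamedQuery) -> None:
        self.queries[query.name] = query

    def resolve(self, name: str, **params: str) -> str:
        query = self.queries[name]
        missing = [p for p in query.params if p not in params]
        if missing:
            raise ValueError(f"missing parameters: {missing}")
        # The "translation" step: pick the variant for the active backend
        template = query.variants[self.backend]
        return Template(template).substitute(params)


# Illustrative query only; the second variant declares its prefix explicitly
# as an example of a difference that a backend might require.
demo = NamedQuery(
    name="item_statements",
    params=["q"],
    variants={
        "wdqs": "SELECT ?p ?o WHERE { wd:${q} ?p ?o } LIMIT 10",
        "qlever": (
            "PREFIX wd: <http://www.wikidata.org/entity/>\n"
            "SELECT ?p ?o WHERE { wd:${q} ?p ?o } LIMIT 10"
        ),
    },
)
middleware = QueryMiddleware(backend="qlever")
middleware.register(demo)
sparql = middleware.resolve("item_statements", q="Q80")
print(sparql)
```

Switching the backend then only means constructing the middleware with a different key; the calling code stays unchanged.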

Describe alternatives you've considered
Get your own copy of Wikidata and use it; see the CEUR-WS Vol-3262 paper "Getting and hosting your own copy of Wikidata".

Additional context

Search Platform Office Hours 2023-12-06

Named Query handling:

Queries may be referenced these days with e.g. short URLs, which are both supported by the Wikidata Query Service and QLever. Personally I think it would be good to go one step further and have "named queries". See e.g. https://cr.bitplan.com/index.php/List_of_Queries as an example for queries. Scholia also uses a similar idea internally. See https://github.com/WDscholia/scholia/tree/master/scholia/app/templates. Quite a few of these queries have only a few parameters. E.g. https://github.com/WDscholia/scholia/blob/master/scholia/app/templates/author_topics.sparql only takes a single q identifier as input.

In my own pyLodStorage project https://pypi.org/project/pyLodStorage/ I am already offering named queries, but without parameters. WolfgangFahl/pyLoDStorage#113 is the issue to parameterize the queries. The queries are described in YAML files in this solution. I imagine a RESTful service that takes a query name and a set of parameters and returns the result in a SPARQL server compatible way. This would mean that the details of the query (e.g. whether it is federated or on which endpoint it runs) are hidden. I believe that this approach would work well with the intended Wikidata split attempt in Q1/2024.
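
To illustrate what "SPARQL server compatible" could mean for the response, here is a minimal sketch that wraps plain rows in the layout of the SPARQL 1.1 Query Results JSON Format (`head.vars` plus `results.bindings`). The function name `to_sparql_json` is hypothetical, and as a simplification every value is emitted as a plain literal; a real service would need to distinguish URIs, typed literals and language tags.

```python
import json


def to_sparql_json(rows: list[dict[str, str]]) -> str:
    """Serialize plain rows in the SPARQL 1.1 Query Results JSON layout.

    Simplification: every value is emitted as a plain literal.
    """
    # Collect variable names in first-seen order across all rows
    variables: list[str] = []
    for row in rows:
        for key in row:
            if key not in variables:
                variables.append(key)
    bindings = [
        {key: {"type": "literal", "value": value} for key, value in row.items()}
        for row in rows
    ]
    document = {"head": {"vars": variables}, "results": {"bindings": bindings}}
    return json.dumps(document)


# Example rows taken from the kind of output a named query might return
payload = to_sparql_json([{"date": "2019-10-25", "eventLabel": "WikidataCon 2019"}])
print(payload)
```

A client written against a regular SPARQL endpoint could then parse the response without knowing that a named query was behind it.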

Links:

Previous analysis of blazegraph alternatives:

Qlever federation

https://phabricator.wikimedia.org/T339347
And on the QLever side, wikidata query service/qlever endpoint federated query ad-freiburg/qlever#1077

Scaling Wikidata Query Service - Split the Graph experiment

WolfgangFahl · 2024-01-26T10:28:06Z

Potential test platform https://scholia.portal.mardi4nfdi.de/

see also https://phabricator.wikimedia.org/T329368#8872693

tholzheim · 2024-01-26T10:49:01Z

Further example Queries:

WolfgangFahl · 2024-01-29T04:56:26Z

see also #2063 and ad-freiburg/qlever#859

fnielsen · 2024-01-29T08:48:16Z

I am trying to understand this. Do you propose the use of FROM against a special endpoint that distributes queries? Where does federation come in?

WolfgangFahl · 2024-01-29T10:05:17Z

@fnielsen the intention is to do information hiding and not reveal what the actual query looks like. Take
author_events as an example: that query has the name "author_events" and a single QID parameter.

The specific query for your personal QID Q20980928 on QLever would be e.g. https://qlever.cs.uni-freiburg.de/wikidata/084HGc and does not run out of the box. The query on Wikidata does give 71 results, but the URL shortening fails, so I can't give a short link here, and I purposely don't intend to show the details of the query. You'd just be interested in the result.

Our pyLodStorage library already allows commands such as:

sparqlquery -qp wikidata.yaml -qn author_events_fan -f github

which will pick up the query specification from a YAML file containing the author_events_fan named query spec (see result below).
The proposal here is to offer the same behavior as a SPARQL endpoint compatible web service that hides all technical details. That way, if a query needs to be rewritten to a federated query, we may do so "behind the scenes" in the black box we are providing. We might even check whether the result is the same as without the federation.
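
The check mentioned in the last sentence could look roughly like this. It is a sketch under the assumption that both the original and the federated query return their rows as lists of flat dictionaries; the function name `same_results` is hypothetical. Rows are compared as a multiset, so row order does not matter but duplicate rows do.

```python
from collections import Counter


def same_results(rows_a: list[dict[str, str]], rows_b: list[dict[str, str]]) -> bool:
    """Compare two result sets as multisets of rows, ignoring row order."""

    def canonical(rows: list[dict[str, str]]) -> Counter:
        # Sort each row's items so that column order does not matter either
        return Counter(tuple(sorted(row.items())) for row in rows)

    return canonical(rows_a) == canonical(rows_b)


plain = [
    {"event": "Q42449814", "roles": "speaker"},
    {"event": "Q59917009", "roles": "participant, author"},
]
federated = list(reversed(plain))
print(same_results(plain, federated))
```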

author_events_fan

try it!

result

date	event	eventLabel	eventUrl	roles	locations
2023-09-13	http://www.wikidata.org/entity/Q117314306	First Wikibase Lexical Data Workshop	/event/Q117314306	speaker	Centre for Translation Studies
2023-05-28	http://www.wikidata.org/entity/Q115781177	ESWC 2023	/event/Q115781177	participant	Aldemar Knossos Royal
2023-05-28	http://www.wikidata.org/entity/Q115972632	Semantic Technologies for Scientific, Technical and Legal Data	/event/Q115972632	speaker, author	Aldemar Knossos Royal
2023-05-28	http://www.wikidata.org/entity/Q121334813	ESWC 2023 Workshops and Tutorials	/event/Q121334813	author	Chersonesos
2023-05-22	http://www.wikidata.org/entity/Q115497966	The 24th Nordic Conference on Computational Linguistics	/event/Q115497966	author	Tórshavn
2023-05-11	http://www.wikidata.org/entity/Q114794722	Wiki Workshop 2023	/event/Q114794722	author
2022-11-30	http://www.wikidata.org/entity/Q113956029	Sprogteknologisk Konference 2022	/event/Q113956029	participant	Søndre Campus
2022-11-07	http://www.wikidata.org/entity/Q113954954	Danish Data Science 2022	/event/Q113954954	participant	Hotel LEGOLAND
2021-11-16	http://www.wikidata.org/entity/Q108377974	Sprogteknologisk Konference 2021	/event/Q108377974	participant	Søndre Campus
2021-10-25	http://www.wikidata.org/entity/Q106591764	Deep Learning for Knowledge Graphs 2021	/event/Q106591764	program committee member
2021-10-24	http://www.wikidata.org/entity/Q106429029	The 2nd Wikidata Workshop	/event/Q106429029	program committee member
2021-05-31	http://www.wikidata.org/entity/Q102274071	The 23rd Nordic Conference on Computational Linguistics	/event/Q102274071	author	Reykjavík University
2021-04-14	http://www.wikidata.org/entity/Q104835330	Wiki Workshop 2021	/event/Q104835330	participant
2020-11-02	http://www.wikidata.org/entity/Q86530254	The 1st Wikidata Workshop	/event/Q86530254	program committee member
2020-10-26	http://www.wikidata.org/entity/Q100741900	WikiCite 2020 Virtual conference	/event/Q100741900	speaker, participant	online
2020-10-19	http://www.wikidata.org/entity/Q98083516	Combining Symbolic and Sub-symbolic methods and their Applications	/event/Q98083516	program committee member	Galway
2020-09-01	http://www.wikidata.org/entity/Q102070516	Digitally support Environment Assessment for Sustainable Development Goals	/event/Q102070516	participant
2020-06-22	http://www.wikidata.org/entity/Q79137947	7th Workshop on Linked Data in Linguistics	/event/Q79137947	author
2020-06-01	http://www.wikidata.org/entity/Q84430072	3rd Workshop on Quality of Open Data	/event/Q84430072	program committee member	University of Colorado, at Colorado Springs
2020-05-31	http://www.wikidata.org/entity/Q83793571	Deep Learning for Knowledge Graphs 2020	/event/Q83793571	program committee member	Chersonesos
2020-05-26	http://www.wikidata.org/entity/Q94759294	WikiLunch	/event/Q94759294	participant	German National Library of Science and Technology, World Wide Web, Wikiversity
2020-05-26	http://www.wikidata.org/entity/Q94495218	#vBIB20	/event/Q94495218	speaker	German National Library of Science and Technology, World Wide Web
2019-10-25	http://www.wikidata.org/entity/Q42449814	WikidataCon 2019	/event/Q42449814	speaker	Urania
2019-10-09	http://www.wikidata.org/entity/Q63686495	Conference on Natural Language Processing 2019	/event/Q63686495	author	Kollegienhaus
2019-09-09	http://www.wikidata.org/entity/Q59917009	SEMANTiCS 2019	/event/Q59917009	participant, author	Karlsruhe
2019-08-01	http://www.wikidata.org/entity/Q48010913	Wikimania 2019	/event/Q48010913	speaker	Stockholm University
2019-07-23	http://www.wikidata.org/entity/Q61983755	The 10th Global WordNet Conference	/event/Q61983755	participant, author	Wrocław University of Science and Technology
2019-06-26	http://www.wikidata.org/entity/Q61141551	2nd Workshop on Quality of Open Data	/event/Q61141551	program committee member	Seville
2019-06-17	http://www.wikidata.org/entity/Q59979937	5th International Conference on Computational Social Science	/event/Q59979937	program committee member	University of Amsterdam
2019-06-02	http://www.wikidata.org/entity/Q60808888	Workshop at ESWC 2019 on Deep Learning for Knowledge Graphs	/event/Q60808888	program committee member	Grand Hotel Bernardin
2019-06-02	http://www.wikidata.org/entity/Q59620529	ESWC 2019	/event/Q59620529	participant, author	Grand Hotel Bernardin
2019-05-17	http://www.wikidata.org/entity/Q44062313	Wikimedia Hackathon 2019	/event/Q44062313	participant	National Library of Technology building
2019-04-16	http://www.wikidata.org/entity/Q63171054	Women in Data Science Conference 2019 Copenhagen	/event/Q63171054	participant	IT University of Copenhagen
2019-03-29	http://www.wikidata.org/entity/Q59848782	Wikimedia Summit 2019	/event/Q59848782	participant	Mercure Hotel Berlin Tempelhof Airport
2018-11-27	http://www.wikidata.org/entity/Q55117737	WikiCite 2018	/event/Q55117737	speaker, participant	David Brower Center
2018-11-06	http://www.wikidata.org/entity/Q55910942	Second Linked Open Citation Database Workshop	/event/Q55910942	speaker	Mannheim Palace
2018-10-03	http://www.wikidata.org/entity/Q56876300	Research Output & Impact Analyzed and Visualized: Concluding Conference	/event/Q56876300	speaker	DGI-byen
2018-09-25	http://www.wikidata.org/entity/Q48563023	10th International Conference on Social Informatics	/event/Q48563023	program committee member	St. Petersburg
2018-09-03	http://www.wikidata.org/entity/Q51955163	Workshop on Open Citations	/event/Q51955163	speaker	University of Bologna
2018-07-20	http://www.wikidata.org/entity/Q48548111	1st Workshop on Quality of Open Data	/event/Q48548111	program committee member	Berlin
2018-07-12	http://www.wikidata.org/entity/Q47482917	4th Annual International Conference on Computational Social Science	/event/Q47482917	program committee member	Kellogg School of Management
2018-06-04	http://www.wikidata.org/entity/Q48621961	1st International Workshop on Deep Learning for Knowledge Graphs and Semantic Technologies	/event/Q48621961	participant, author	Aldemar Knossos Royal
2018-06-03	http://www.wikidata.org/entity/Q54496448	3rd International Workshop on Geospatial Linked Data	/event/Q54496448	participant, author	Aldemar Knossos Royal
2018-06-03	http://www.wikidata.org/entity/Q50290385	ESWC 2018	/event/Q50290385	participant	Aldemar Knossos Royal
2018-05-27	http://www.wikidata.org/entity/Q47501229	11th International Conference on Chemical Structures	/event/Q47501229	author	Noordwijkerhout
2018-05-01	http://www.wikidata.org/entity/Q30087264	Wikimedia Hackathon 2018	/event/Q30087264	participant	Bellaterra Campus
2018-04-24	http://www.wikidata.org/entity/Q47035167	Wiki Workshop 2018	/event/Q47035167	participant, author	Palais des congrès de Lyon
2018-04-23	http://www.wikidata.org/entity/Q48910401	The Web Conference 2018	/event/Q48910401	participant, author	Palais des congrès de Lyon
2018-04-20	http://www.wikidata.org/entity/Q50132215	Wikimedia Conference 2018	/event/Q50132215	participant	Mercure Hotel Berlin Tempelhof Airport
2018-01-09	http://www.wikidata.org/entity/Q64864052	Teaching platform for developing and automatically tracking early stage literacy skill	/event/Q64864052	participant
2017-11-17	http://www.wikidata.org/entity/Q43254255	8th Language & Technology Conference	/event/Q43254255	speaker, participant, author	Poznań
2017-10-28	http://www.wikidata.org/entity/Q37807682	WikidataCon 2017	/event/Q37807682	speaker, participant	Tagesspiegel building
2017-09-13	http://www.wikidata.org/entity/Q48612170	9th International Conference on Social Informatics	/event/Q48612170	program committee member	Wolfson College
2017-09-07	http://www.wikidata.org/entity/Q28052808	2017 Conference on Empirical Methods in Natural Language Processing	/event/Q28052808	participant	Øksnehallen, DGI-byen, Copenhagen
2017-05-28	http://www.wikidata.org/entity/Q30090453	ESWC 2017	/event/Q30090453	participant, author	Portorož
2017-05-28	http://www.wikidata.org/entity/Q113625218	1st International Workshop on Scientometrics	/event/Q113625218	author	Portorož
2017-05-28	http://www.wikidata.org/entity/Q113744888	1st International Workshop on Enabling Decentralised Scholarly Communication	/event/Q113744888	author	Portorož
2017-05-19	http://www.wikidata.org/entity/Q28053831	Wikimedia Hackathon 2017	/event/Q28053831	participant	JUFA Wien City
2017-03-31	http://www.wikidata.org/entity/Q29169189	Wikimedia Conference 2017	/event/Q29169189	participant
2017-01-01	http://www.wikidata.org/entity/Q54856362	WikiCite 2017	/event/Q54856362	participant	Vienna
2016-06-16	http://www.wikidata.org/entity/Q24632656	The People's Meeting 2016	/event/Q24632656	participant	Allinge
2016-05-17	http://www.wikidata.org/entity/Q75540679	Wiki Workshop 2016, ICWSM 2016	/event/Q75540679	author	Cologne
2014-01-01	http://www.wikidata.org/entity/Q14506843	Wikimania 2014	/event/Q14506843	participant	Barbican Centre
2012-05-28	http://www.wikidata.org/entity/Q113505637	2nd Workshop on Semantic Publishing	/event/Q113505637	author	Chersonesos
2012-05-27	http://www.wikidata.org/entity/Q42431329	ESWC 2012	/event/Q42431329	author	Aldemar Knossos Royal
2011-05-30	http://www.wikidata.org/entity/Q113659299	ESWC2011 Workshop on 'Making Sense of Microposts': Big things come in small packages	/event/Q113659299	author	Heraklion
2010-01-01	http://www.wikidata.org/entity/Q14507062	Wikimania 2010	/event/Q14507062	participant	Gdańsk
2008-01-01	http://www.wikidata.org/entity/Q11756041	Wikimania 2008	/event/Q11756041	participant	Alexandria
2004-12-13	http://www.wikidata.org/entity/Q73025763	Neural Information Processing Systems 2004	/event/Q73025763	author	Whistler, Vancouver
2000-06-05	http://www.wikidata.org/entity/Q75936725	ICASSP 2000	/event/Q75936725	author	Istanbul
	http://www.wikidata.org/entity/Q114647284	Wikidata WikiProject COVID-19	/event/Q114647284	participant

WolfgangFahl · 2024-02-01T05:36:24Z

see also https://www.w3.org/2009/sparql/docs/tests/README.html#queryevaltests

WolfgangFahl · 2024-02-09T04:36:45Z

We have been hard at work on our Graph Split experiment [1], and we
now have a working graph split that is loaded onto 3 test servers. We
are running tests on a selection of queries from our logs to help
understand the impact of the split. We need your help to validate the
impact of various use cases and workflows around Wikidata Query
Service.

What is the WDQS Graph Split experiment?

We want to address the growing size of the Wikidata graph by splitting
it into 2 subgraphs of roughly half the size of the full graph, which
should support the growth of Wikidata for the next 5 years. This
experiment is about splitting the full Wikidata graph into a scholarly
articles subgraph and a “main” graph that contains everything else.

See our previous update for more details [2].

Who should care?

Anyone who uses WDQS through the UI or programmatically should check
the impact on their use cases, scripts, bots, code, etc.

What are those test endpoints?

We expose 3 test endpoints, for the full, main and scholarly articles
graphs. Those graphs are all created from the same dump and are not
live updated. This allows us to compare queries between the different
endpoints, with stable / non changing data (the data are from the
middle of October 2023).

The endpoints are:

Each of the endpoints is backed by a single dedicated server of
performance similar to the production WDQS servers. We don’t expect
performance to be representative of production due to the different
load and to the lack of updates on the test servers.

What kind of feedback is useful?

We expect queries that don’t require scholarly articles to work
transparently on the “main” subgraph. We expect queries that require
scholarly articles to need to be rewritten with SPARQL federation
between the “main” and scholarly subgraphs (federation is supported
for some external SPARQL servers already [3], this just happens to be
for internal server-to-server communication). We are doing tests and
analysis based on a sample of query logs.

We want to hear about:

General use cases or classes of queries which break under federation
Bots or applications that need significant rewrite of queries to work
with federation
And also about use cases that work just fine!

Examples of queries and pointers to code will be helpful in your feedback.

Where should feedback be sent?

You can reach out to us using the project’s talk page [1], the
Phabricator ticket for community feedback [4] or by pinging directly
Sannita (WMF) [5].

Will feedback be taken into account?

Yes! We will review feedback and it will influence our path forward.
That being said, there are limits to what is possible. The size of the
Wikidata graph is a threat to the stability of WDQS and thus a threat
to the whole Wikidata project. Scholarly articles is the only split we
know of that would reduce the graph size sufficiently. We can work
together on providing support for a migration, on reviewing the rules
used for the graph split, but we can’t just ignore the problem and
continue with a WDQS that provides transparent access to the full
Wikidata graph.

Have fun!

  Guillaume

[1] https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split
[2] https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_backend_update/October_2023_scaling_update
[3] https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual#Federation
[4] https://phabricator.wikimedia.org/T356773
[5] https://www.wikidata.org/wiki/User:Sannita_(WMF)

Guillaume Lederrey (he/him)
Engineering Manager
Wikimedia Foundation

WolfgangFahl · 2024-05-01T07:20:20Z

There is now a Wikimedia Hackathon 2024 project task for this https://phabricator.wikimedia.org/T363894

WolfgangFahl · 2024-05-06T07:55:15Z

Check out http://snapquery.bitplan.com/query/scholia/author_list-of-publications
with Q80 - Tim Berners-Lee to get

http://snapquery.bitplan.com has the demo and project is at https://github.com/WolfgangFahl/snapquery with further links to the Hackathon results - thanks to Tim and Dennis for making this happen!

fnielsen · 2024-05-06T19:24:46Z

Check out http://snapquery.bitplan.com/query/scholia/author_list-of-publications with Q80 - Tim Berners-Lee to get

http://snapquery.bitplan.com has the demo and project is at https://github.com/WolfgangFahl/snapquery with further links to the Hackathon results - thanks to Tim and Dennis for making this happen!

I get TimeoutError: No connection after 3.0 seconds

WolfgangFahl · 2024-05-06T19:52:09Z

@fnielsen there is another server at https://snapquery.wikidata.dbis.rwth-aachen.de/query/scholia/author_list-of-publications which might work. A socket connection is created which might not work behind firewalls or on internet connections with high latency.

WolfgangFahl · 2024-05-10T18:40:55Z

version 0.0.8 of snapquery is ready. It has e.g.
http://snapquery.bitplan.com/api/meta_query/params_stats.github

params_stats

query

SELECT count(*),
    params 
FROM "QueryDetails" 
GROUP BY params 
ORDER BY 1 desc

result

count(*)	params
374
293	q
14	q1,q2
9	q,q
3	q,q,q
3	p
1	q,q2
1	q,q,q,q,q
1	q,doi,q,doi,q,doi,q,doi,q,doi
1	lexeme
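
How snapquery derives these parameter lists is not shown here. Purely as an illustration, and assuming Jinja-style placeholders of the form `{{ name }}` as used in the Scholia templates, a list with one entry per occurrence (which would yield values such as `q,q`) could be extracted like this. The function name `placeholder_names` is hypothetical.

```python
import re

# Matches Jinja-style expression placeholders such as {{ q }} or {{q2}}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def placeholder_names(template_text: str) -> str:
    """Return the placeholder names of a template, one entry per occurrence."""
    return ",".join(PLACEHOLDER.findall(template_text))


# Illustrative template text, not an actual Scholia query
example = "SELECT * WHERE { wd:{{ q }} ?p ?o . ?s ?p2 wd:{{ q }} }"
print(placeholder_names(example))
```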

egonw · 2024-10-19T11:14:16Z

version 0.0.8 of snapquery is ready. It has e.g. http://snapquery.bitplan.com/api/meta_query/params_stats.github

Some queries have more complex query parameters like here:

SELECT ?venue (COUNT(DISTINCT ?work) AS ?number_of_works) (COUNT(?citing_work) AS ?number_of_citations)
  WHERE {
    VALUES ?venue {  {% for q in qs %} wd:{{ q }} {% endfor %}  }
    OPTIONAL {
      ?work wdt:P1433 ?venue .
      OPTIONAL { ?citing_work wdt:P2860 ?work }
    }
  } 
  GROUP BY ?venue

See https://github.com/WDscholia/scholia/blob/master/scholia/app/templates/venues_list-of-venues.sparql#L5C5-L5C66
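
For a named query with such a list parameter, the middleware would have to expand the list itself. A minimal sketch of that step, written in plain Python rather than Jinja and with a hypothetical function name `values_clause`, could validate each identifier before building the VALUES clause:

```python
import re

# Wikidata item identifiers: "Q" followed by a positive integer
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def values_clause(variable: str, qids: list[str]) -> str:
    """Build a SPARQL VALUES clause from a list of Wikidata item IDs."""
    for qid in qids:
        # Reject anything that is not a plain QID to avoid query injection
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a valid QID: {qid!r}")
    items = " ".join(f"wd:{qid}" for qid in qids)
    return f"VALUES ?{variable} {{ {items} }}"


clause = values_clause("venue", ["Q1", "Q2"])
print(clause)
```

Validating the identifiers matters more once queries are exposed through a web service, because the parameter values then come from untrusted callers.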

WolfgangFahl added the enhancement label (some suggestions to improve Scholia) Jan 26, 2024

This was referenced Jan 29, 2024

SPARQL named query infrastructure tholzheim/named-queries#1

Open

QLever missing features or unexpected behavior ad-freiburg/qlever#615

Open

Daniel-Mietchen mentioned this issue Feb 8, 2024

How does the Wikidata graph split affect scholia? #2423

Open

Adafede mentioned this issue Feb 8, 2024

Keep an eye on Wikidata graph split lotusnprod/lotus-search#70

Open

WolfgangFahl mentioned this issue Feb 19, 2024

Look into QLever as a potential query engine for Scholia #1774

Open

egonw mentioned this issue Feb 19, 2024

Top-level configurable #2429

Closed
