Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate db query use by the dataverse page #7802

Closed
landreev opened this issue Apr 14, 2021 · 5 comments
Closed

Investigate db query use by the dataverse page #7802

landreev opened this issue Apr 14, 2021 · 5 comments
Assignees

Comments

@landreev
Copy link
Contributor

A sub-issue to #7788, overall performance improvement issue.
We have a process (that's documented, maybe?) for counting the queries a page generates; and there are known techniques for reducing that number.
We've applied it in the past to the dataverse and dataset pages. It is time to revisit it (let's start with the dataverse page; which appears to have gotten very slow in production).

@djbrooke
Copy link
Contributor

  • Desire would be that we identify queries that are problematic and fix them
  • (ex. 200 queries on the page ~2 years ago, we knocked it down, but we should check again)

@djbrooke djbrooke added the Large label Apr 14, 2021
@pdurbin
Copy link
Member

pdurbin commented Apr 20, 2021

We have a process (that's documented, maybe?) for counting the queries a page generates

The process is documented as scripts and a README, added in 5b3cc46.

@kcondon
Copy link
Contributor

kcondon commented Jun 2, 2021

This may need a separate issue or just be part of this one at root but navigating to page 2, 3, etc of homepage shows a similar slowness. May be obvious but noticeable when you interact with it.

@landreev
Copy link
Contributor Author

landreev commented Sep 1, 2021

This one was supposed to be a compact issue just for counting the database queries issued by the page; i.e. just one segment of the overall speedup process. It just got stuck in the "in progress" column, as a placeholder for the whole optimization effort.
#7804 could have been a better choice.
But yes, the queries. Here's a typical snapshot (this is from loading the page under generic 5.6, as a guest user, on the performance cluster):

    157 SELECT ID, ALIAS, DESCRIPTION, NAME, PERMISSIONBITS, OWNER_ID FROM DATAVERSEROLE
    128 SELECT t0.ID, t0.DTYPE, t0.AUTHORITY, t0.CREATEDATE, t0.GLOBALIDCREATETIME, t0.IDENTIFIER, t0.IDENTIFIERREGISTERED, t0.INDEXTIME, t0.MODIFICATIONTIME, t0.PERMISSIONINDEXTIME, t0.PERMISSIONMODIFICATIONTIME, t0.PREVIEWIMAGEAVAILABLE, t0.PROTOCOL, t0.PUBLICATIONDATE, t0.STORAGEIDENTIFIER, t0.CREATOR_ID, t0.OWNER_ID, t0.RELEASEUSER_ID, t1.ID, t1.AFFILIATION, t1.ALIAS, t1.DATAVERSETYPE, t1.description, t1.FACETROOT, t1.GUESTBOOKROOT, t1.METADATABLOCKROOT, t1.METADATALANGUAGE, t1.NAME, t1.PERMISSIONROOT, t1.STORAGEDRIVER, t1.TEMPLATEROOT, t1.THEMEROOT, t1.DEFAULTCONTRIBUTORROLE_ID, t1.DEFAULTTEMPLATE_ID FROM DVOBJECT t0, DATAVERSE t1
    128 SELECT ID, BACKGROUNDCOLOR, LINKCOLOR, LINKURL, LOGO, LOGOALIGNMENT, LOGOBACKGROUNDCOLOR, LOGOFOOTER, LOGOFOOTERALIGNMENT, LOGOFOOTERBACKGROUNDCOLOR, LOGOFORMAT, TAGLINE, TEXTCOLOR, dataverse_id FROM DATAVERSETHEME
    118 SELECT ID, AUTHENTICATIONPROVIDERID, PERSISTENTUSERID, AUTHENTICATEDUSER_ID FROM AUTHENTICATEDUSERLOOKUP
    118 SELECT ID, AFFILIATION, CREATEDTIME, DEACTIVATED, DEACTIVATEDTIME, EMAIL, EMAILCONFIRMED, FIRSTNAME, LASTAPIUSETIME, LASTLOGINTIME, LASTNAME, POSITION, SUPERUSER, USERIDENTIFIER FROM AUTHENTICATEDUSER
     49 SELECT t0.ID, t0.DESCRIPTION, t0.DISPLAYNAME, t0.GROUPALIAS, t0.GROUPALIASINOWNER, t0.OWNER_ID FROM EXPLICITGROUP t0, ExplicitGroup_CONTAINEDROLEASSIGNEES t1
     49 SELECT DISTINCT DTYPE FROM DVOBJECT
     45 SELECT ID, ASSIGNEEIDENTIFIER, PRIVATEURLANONYMIZEDACCESS, PRIVATEURLTOKEN, DEFINITIONPOINT_ID, ROLE_ID FROM ROLEASSIGNMENT
     34 SELECT ID, CONTENT, LANG, NAME FROM SETTING
     30 SELECT ID, DISPLAYNAME, NAME, namespaceuri, owner_id FROM METADATABLOCK
     10 SELECT ID, CREATETIME, NAME, USAGECOUNT, DATAVERSE_ID, termsOfUseAndAccess_id FROM TEMPLATE
     10 SELECT ID, ADVANCEDSEARCHFIELDTYPE, ALLOWCONTROLLEDVOCABULARY, ALLOWMULTIPLES, description, DISPLAYFORMAT, DISPLAYONCREATE, DISPLAYORDER, FACETABLE, FIELDTYPE, name, REQUIRED, title, uri, VALIDATIONFORMAT, WATERMARK, METADATABLOCK_ID, PARENTDATASETFIELDTYPE_ID FROM DATASETFIELDTYPE
      8 SELECT t0.VERSIONSTATE, t1.ALIAS, t2.THUMBNAILFILE_ID, t2.USEGENERICTHUMBNAIL, t3.STORAGEIDENTIFIER FROM DATASETVERSION t0, DATAVERSE t1, DATASET t2, DVOBJECT t3
      7 SELECT t0.ID, t0.CREATEDATE, t0.INDEXTIME, t0.MODIFICATIONTIME, t0.PERMISSIONINDEXTIME, t0.PERMISSIONMODIFICATIONTIME, t0.PUBLICATIONDATE, t0.CREATOR_ID, t0.RELEASEUSER_ID, t0.PREVIEWIMAGEAVAILABLE, t1.CONTENTTYPE, t0.STORAGEIDENTIFIER, t1.FILESIZE, t1.INGESTSTATUS, t1.CHECKSUMVALUE, t1.RESTRICTED, t3.ID, t2.AUTHORITY, t2.IDENTIFIER, t1.CHECKSUMTYPE, t1.PREVIOUSDATAFILEID, t1.ROOTDATAFILEID, t0.AUTHORITY, T0.PROTOCOL, T0.IDENTIFIER FROM DVOBJECT t0, DATAFILE t1, DVOBJECT t2, DATASET t3
      7 SELECT df.id FROM datafile df, filemetadata fm, datasetversion dv, dvobject o
      2 SELECT t0.AFFILIATION, t0.ALIAS, t2.ALIAS FROM DATAVERSE t0, DVOBJECT t1, DATAVERSE t2, DVOBJECT t3
      2 SELECT logo FROM dataversetheme
      2 SELECT ID, UNF, ARCHIVALCOPYLOCATION, ARCHIVENOTE, ARCHIVETIME, CREATETIME, DEACCESSIONLINK, LASTUPDATETIME, MINORVERSIONNUMBER, RELEASETIME, VERSION, VERSIONNOTE, VERSIONNUMBER, VERSIONSTATE, DATASET_ID, termsOfUseAndAccess_id FROM DATASETVERSION
      2 SELECT ID, DISPLAYORDER, datasetfieldtype_id, dataverse_id FROM DATAVERSEFACET
      2 SELECT ID, AVAILABILITYSTATUS, CITATIONREQUIREMENTS, CONDITIONS, CONFIDENTIALITYDECLARATION, CONTACTFORACCESS, DATAACCESSPLACE, DEPOSITORREQUIREMENTS, DISCLAIMER, FILEACCESSREQUEST, LICENSE, ORIGINALARCHIVE, RESTRICTIONS, SIZEOFCOLLECTION, SPECIALPERMISSIONS, STUDYCOMPLETION, TERMSOFACCESS, TERMSOFUSE FROM TERMSOFUSEANDACCESS
      2 SELECT ID, ADVANCEDSEARCHFIELDTYPE, ALLOWCONTROLLEDVOCABULARY, ALLOWMULTIPLES, description, DISPLAYFORMAT, DISPLAYONCREATE, DISPLAYORDER, FACETABLE, FIELDTYPE, name, REQUIRED, title, uri, VALIDATIONFORMAT, WATERMARK, METADATABLOCK_ID, PARENTDATASETFIELDTYPE_ID FROM DATASETFIELDTYPE ORDER BY ID
      2 SELECT ID, ACTIVE, DISMISSIBLEBYUSER FROM BANNERMESSAGE
      2 SELECT DISTINCT t0.ID, t0.DTYPE, t0.DESCRIPTION, t0.DISPLAYNAME, t0.PERSISTEDGROUPALIAS FROM IPV4RANGE t1 LEFT OUTER JOIN PERSISTEDGLOBALGROUP t0 ON (t0.ID = t1.OWNER_ID)
      1 UPDATE dataset SET thumbnailfile_id=4810047
      1 UPDATE dataset SET thumbnailfile_id=4810032
      1 UPDATE dataset SET thumbnailfile_id=4810012
      1 UPDATE dataset SET thumbnailfile_id=4809998
      1 UPDATE dataset SET thumbnailfile_id=4809990
      1 UPDATE dataset SET thumbnailfile_id=4808029
      1 UPDATE dataset SET thumbnailfile_id=4143664
      1 SELECT d.id, h.archiveDescription FROM harvestingClient h, dataset d
      1 SELECT d.id, d.alias, d.name FROM dataverse d, DataverseFeaturedDataverse f
      1 select count(o.id) from GuestbookResponse  o

there obviously numerous repeated queries (on closer inspection, they are indeed repeated lookup of the same objects - like the root dataverse, etc.)
What's interesting is that the number of these queries doesn't go by that much, if you comment out the inclusion of the search fragment, and disable the search query on page init: T
(This is counter-intuitive because running that search and displaying the results is what the page does, it should constitute most of its workload!)

    144 SELECT ID, ALIAS, DESCRIPTION, NAME, PERMISSIONBITS, OWNER_ID FROM DATAVERSEROLE
    102 SELECT t0.ID, t0.DTYPE, t0.AUTHORITY, t0.CREATEDATE, t0.GLOBALIDCREATETIME, t0.IDENTIFIER, t0.IDENTIFIERREGISTERED, t0.INDEXTIME, t0.MODIFICATIONTIME, t0.PERMISSIONINDEXTIME, t0.PERMISSIONMODIFICATIONTIME, t0.PREVIEWIMAGEAVAILABLE, t0.PROTOCOL, t0.PUBLICATIONDATE, t0.STORAGEIDENTIFIER, t0.CREATOR_ID, t0.OWNER_ID, t0.RELEASEUSER_ID, t1.ID, t1.AFFILIATION, t1.ALIAS, t1.DATAVERSETYPE, t1.description, t1.FACETROOT, t1.GUESTBOOKROOT, t1.METADATABLOCKROOT, t1.METADATALANGUAGE, t1.NAME, t1.PERMISSIONROOT, t1.STORAGEDRIVER, t1.TEMPLATEROOT, t1.THEMEROOT, t1.DEFAULTCONTRIBUTORROLE_ID, t1.DEFAULTTEMPLATE_ID FROM DVOBJECT t0, DATAVERSE t1
    102 SELECT ID, BACKGROUNDCOLOR, LINKCOLOR, LINKURL, LOGO, LOGOALIGNMENT, LOGOBACKGROUNDCOLOR, LOGOFOOTER, LOGOFOOTERALIGNMENT, LOGOFOOTERBACKGROUNDCOLOR, LOGOFORMAT, TAGLINE, TEXTCOLOR, dataverse_id FROM DATAVERSETHEME
    102 SELECT ID, AUTHENTICATIONPROVIDERID, PERSISTENTUSERID, AUTHENTICATEDUSER_ID FROM AUTHENTICATEDUSERLOOKUP
    102 SELECT ID, AFFILIATION, CREATEDTIME, DEACTIVATED, DEACTIVATEDTIME, EMAIL, EMAILCONFIRMED, FIRSTNAME, LASTAPIUSETIME, LASTLOGINTIME, LASTNAME, POSITION, SUPERUSER, USERIDENTIFIER FROM AUTHENTICATEDUSER
     42 SELECT t0.ID, t0.DESCRIPTION, t0.DISPLAYNAME, t0.GROUPALIAS, t0.GROUPALIASINOWNER, t0.OWNER_ID FROM EXPLICITGROUP t0, ExplicitGroup_CONTAINEDROLEASSIGNEES t1
     42 SELECT ID, ASSIGNEEIDENTIFIER, PRIVATEURLANONYMIZEDACCESS, PRIVATEURLTOKEN, DEFINITIONPOINT_ID, ROLE_ID FROM ROLEASSIGNMENT
     42 SELECT DISTINCT DTYPE FROM DVOBJECT
     30 SELECT ID, CONTENT, LANG, NAME FROM SETTING
      2 SELECT ID, ACTIVE, DISMISSIBLEBYUSER FROM BANNERMESSAGE
      1 SELECT d.id, d.alias, d.name FROM dataverse d, DataverseFeaturedDataverse f
      1 select count(o.id) from GuestbookResponse  o

The 100+ repeated lookups on the dataverse, dvobject and authenticateduser objects are still there. And should be reducible. I don't have a working fix yet, but have some ideas.

@mreekie mreekie removed the sz.Large label Jan 11, 2023
@cmbz
Copy link

cmbz commented Jan 21, 2025

2025-01-22: Closing as not planned (as part of our backlog reduction effort).

@cmbz cmbz closed this as completed Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants