
Fix multirequest API in pycurl_manager to support parameters encoding and properly handle gzip responses #11403

Closed
vkuznet opened this issue Dec 16, 2022 · 0 comments · Fixed by #11404


vkuznet commented Dec 16, 2022

Impact of the bug
We need to fix the multirequest API in pycurl_manager so that it supports encoding of input parameters and properly handles gzip-compressed HTTP responses.

Describe the bug
The current version of the multirequest API in pycurl_manager has two bugs:

  • it does not encode input HTTP request parameters, because the API lacks an encode parameter;
  • it does not properly handle gzip-compressed HTTP responses from servers that send them, e.g. DBS.
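To illustrate the first point: DBS block names contain a '#' character, and in URL syntax (RFC 3986) everything from '#' onward is a fragment, not part of the query string. If block_name is appended to the URL without encoding, the server would therefore see a truncated block name. The minimal stdlib sketch below shows this; the naive_url and encoded_url helpers are hypothetical illustrations, not functions from pycurl_manager.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = 'https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader/filelumis'
BLOCK = ('/LQToDEle_M-4000_single_TuneCP2_13TeV-madgraph-pythia8/'
         'RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v1/'
         'MINIAODSIM#471e5596-af04-4423-a850-5ef9091f154f')


def naive_url(base, params):
    """Hypothetical helper: build the query string WITHOUT encoding."""
    query = '&'.join(f'{k}={v}' for k, v in params.items())
    return f'{base}?{query}'


def encoded_url(base, params):
    """Hypothetical helper: percent-encode params ('#' becomes '%23')."""
    return f'{base}?{urlencode(params, doseq=True)}'


for build in (naive_url, encoded_url):
    parts = urlsplit(build(BASE, {'block_name': BLOCK}))
    # Only the query part identifies the block; the fragment is split off.
    print(build.__name__, '->', parse_qs(parts.query)['block_name'][0])
```

With naive_url the parsed block_name stops at MINIAODSIM, while encoded_url round-trips the full name including the UUID suffix.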

How to reproduce it
I tested the following code, which reveals both issues:

```python
#!/usr/bin/env python3
import os
import time
from WMCore.Services.pycurl_manager import RequestHandler


def blockLumis(blocks):
    """Sequential baseline: one getdata call per block, returns the number of unique (lumi, run) pairs."""
    mgr = RequestHandler()
    ckey = os.getenv('X509_USER_KEY')
    cert = os.getenv('X509_USER_CERT')
    headers = {'Accept': 'application/json'}
    furl = 'https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader/filelumis'
    pairs = set()
    for blk in blocks:
        params = {'block_name': blk}
        data = mgr.getdata(furl, params=params, headers=headers, ckey=ckey, cert=cert, encode=True, decode=True)
        for row in data:
            pair = (row['lumi_section_num'], row['run_num'])
            pairs.add(pair)
    return len(pairs)


def concurrentBlockLumis(blocks):
    """Concurrent version: a single multirequest call for all blocks.

    This call exposes the bugs: the current multirequest does not accept
    an encode argument and does not decode gzip'ed responses.
    """
    mgr = RequestHandler()
    pairs = set()
    ckey = os.getenv('X509_USER_KEY')
    cert = os.getenv('X509_USER_CERT')
    headers = {'Accept': 'application/json'}
    url = 'https://cmsweb-prod.cern.ch/dbs/prod/global/DBSReader/filelumis'
    # one parameter dict per block
    parray = [{'block_name': blk} for blk in blocks]
    data = mgr.multirequest(url, parray, headers=headers, ckey=ckey, cert=cert, encode=True)
    for row in data:
        pair = (row['lumi_section_num'], row['run_num'])
        pairs.add(pair)
    return len(pairs)


# block names contain '#', so request parameters must be encoded
blocks = [
    '/LQToDEle_M-4000_single_TuneCP2_13TeV-madgraph-pythia8/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v1/MINIAODSIM#471e5596-af04-4423-a850-5ef9091f154f',
    '/LQToDEle_M-4000_single_TuneCP2_13TeV-madgraph-pythia8/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v1/MINIAODSIM#6eb03689-167a-472f-8b09-f4bfadad6a8a',
    '/LQToDEle_M-4000_single_TuneCP2_13TeV-madgraph-pythia8/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v1/MINIAODSIM#b8cdec8f-b664-49a6-ab2d-bb2a89893581',
    '/LQToDEle_M-4000_single_TuneCP2_13TeV-madgraph-pythia8/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v1/MINIAODSIM#ff78bb73-0e8c-41cb-9e51-381cfbdf15e2',
]

time0 = time.time()
res = blockLumis(blocks)
print(f"{res} pairs in {time.time() - time0:.2f} seconds")
time0 = time.time()
res = concurrentBlockLumis(blocks)
print(f"{res} pairs in {time.time() - time0:.2f} seconds")
```

Expected behavior
An encode=False keyword argument should be added to the multirequest API and passed through to its internal function calls. In addition, multirequest should properly handle gzip-compressed content from the upstream server.
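The expected behavior could look roughly like the sketch below. It is a hypothetical, stdlib-only stand-in, not WMCore's implementation: the real multirequest drives requests through pycurl, whereas here the transport is an assumed fetch(full_url) -> (body_bytes, headers_dict) callable and concurrency comes from a thread pool. It shows the two requested changes: an encode=False argument passed through to the URL-building step, and gzip detection (via a Content-Encoding: gzip header or the gzip magic bytes) before JSON parsing. Decoded rows are flattened into one list, matching how the reproducer above iterates over the result.

```python
import gzip
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of any gzip stream


def decode_body(body, headers):
    """Gunzip the body if the server compressed it, then parse JSON."""
    encoding = headers.get('Content-Encoding', '').lower()
    if encoding == 'gzip' or body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return json.loads(body)


def multirequest_sketch(url, parray, fetch, encode=False, max_workers=4):
    """Hypothetical stand-in for RequestHandler.multirequest.

    fetch(full_url) -> (body_bytes, headers_dict) is an assumed transport
    callable; encode is passed through to the query-building step.
    """
    def one(params):
        if encode:
            query = urlencode(params, doseq=True)  # '#' -> '%23', '/' -> '%2F'
        else:
            query = '&'.join(f'{k}={v}' for k, v in params.items())
        body, headers = fetch(f'{url}?{query}')
        return decode_body(body, headers)

    rows = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as parray
        for result in pool.map(one, parray):
            rows.extend(result)
    return rows
```

For a quick offline check, fetch can be a stub that returns gzip.compress(json.dumps(rows).encode()) together with {'Content-Encoding': 'gzip'}.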

Additional context and error message
