Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include size in pkgselect responses #1744

Merged
merged 15 commits into from
Sep 24, 2020
Merged

Include size in pkgselect responses #1744

merged 15 commits into from
Sep 24, 2020

Conversation

kevinemoore
Copy link
Contributor

@kevinemoore kevinemoore commented Aug 11, 2020

Description

Add size to the pkgselect response. The implementation still does most of the work in s3select and pandas (C), but could be slow for large responses (e.g., a folder with many objects).

TODO

  • Unit Tests
  • Performance testing

@kevinemoore kevinemoore requested review from nl0 and akarve August 11, 2020 02:42
Kevin Moore added 2 commits August 11, 2020 12:18
Updating tests to account for the addition of size and physical key in the lambda response.

Switching from `to_json` to `to_dict` in file_list_to_folder to avoid double encoding the folder view.
@kevinemoore kevinemoore marked this pull request as ready for review August 12, 2020 21:19
@codecov-commenter
Copy link

codecov-commenter commented Aug 12, 2020

Codecov Report

Merging #1744 into master will increase coverage by 0.02%.
The diff coverage is 96.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1744      +/-   ##
==========================================
+ Coverage   89.44%   89.46%   +0.02%     
==========================================
  Files          62       62              
  Lines        7190     7215      +25     
==========================================
+ Hits         6431     6455      +24     
- Misses        759      760       +1     
Flag Coverage Δ
#api-python 87.87% <ø> (ø)
#lambda 92.27% <96.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
lambdas/pkgselect/index.py 84.05% <85.71%> (+0.72%) ⬆️
lambdas/pkgselect/test/test_pkgselect.py 100.00% <100.00%> (ø)
lambdas/shared/t4_lambda_shared/utils.py 98.63% <100.00%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2018aa2...af31ec2. Read the comment docs.

Copy link
Member

@akarve akarve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good and you can merge at will, i've requested some comments and suggested some small changes and asked questions but nothing to be considered blocking

folder.reset_index(inplace=True)
folder.rename(columns={0: 'logical_key'}, inplace=True)
prefixes = folder[folder.logical_key.str.contains('/')].to_dict(orient='records')
objects = folder[~folder.logical_key.str.contains('/')].to_dict(orient='records')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in keys with successive // to both the regex in the groupby and here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// is illegal in logical keys isn't it?

lambdas/pkgselect/index.py Outdated Show resolved Hide resolved
lambdas/pkgselect/index.py Outdated Show resolved Hide resolved
lambdas/pkgselect/index.py Outdated Show resolved Hide resolved
Copy link
Member

@nl0 nl0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, but the catalog will need adjustments to pick up the new response format

@nl0
Copy link
Member

nl0 commented Sep 17, 2020

@kevinemoore can you return meta for the package / prefix as well?

@nl0 nl0 self-requested a review September 21, 2020 15:01
Copy link
Member

@nl0 nl0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. let's not return physical_keys for prefixes, bc they are not relevant (they actually point to random objects under the prefix) and waste bandwidth

  2. when trying to query an object the api throws local variable 'result' referenced before assignment

  3. should we return the whole meta object here or just user_meta? @akarve

Kevin Moore added 2 commits September 21, 2020 13:31
Moving metadata for prefixes and top-level package metadata under `contents` in the response so that meta will show up in the same place for the detail and folder views.
Copy link
Member

@nl0 nl0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it works now

@akarve
Copy link
Member

akarve commented Sep 22, 2020

  1. should we return the whole meta object here or just user_meta? @akarve
    Whole object

objects = folder[~folder.str.endswith('/')].sort_values().tolist()
except AttributeError:
groups = df.groupby(df.logical_key.str.extract('([^/]+/?).*')[0], dropna=True)
folder = groups.agg(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this automatically skip undefined values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

lambdas/pkgselect/index.py Outdated Show resolved Hide resolved
@akarve akarve merged commit e5c62b4 into master Sep 24, 2020
@akarve akarve deleted the pkgselect-size branch September 24, 2020 04:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants