Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add S3 data to existing package #2180

Merged
merged 12 commits into from
Apr 27, 2021
Merged

add S3 data to existing package #2180

merged 12 commits into from
Apr 27, 2021

Conversation

sir-sigurd
Copy link
Member

@sir-sigurd sir-sigurd commented Apr 22, 2021

Description

See #2171 for frontend part.

TODO

@codecov
Copy link

codecov bot commented Apr 22, 2021

Codecov Report

Merging #2180 (949b82a) into master (ff6eb3f) will increase coverage by 0.54%.
The diff coverage is 98.29%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2180      +/-   ##
==========================================
+ Coverage   47.42%   47.96%   +0.54%     
==========================================
  Files         441      441              
  Lines       21138    21357     +219     
  Branches     2436     2436              
==========================================
+ Hits        10024    10244     +220     
+ Misses      10206    10205       -1     
  Partials      908      908              
Flag Coverage Δ
api-python 89.85% <ø> (ø)
catalog 16.30% <ø> (ø)
lambda 94.00% <98.29%> (+0.41%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
lambdas/pkgpush/tests/conftest.py 100.00% <ø> (ø)
lambdas/pkgpush/index.py 96.68% <95.65%> (+2.30%) ⬆️
lambdas/pkgpush/tests/test_index.py 100.00% <100.00%> (ø)
lambdas/shared/t4_lambda_shared/utils.py 98.70% <100.00%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ff6eb3f...949b82a. Read the comment docs.

lambdas/pkgpush/index.py Outdated Show resolved Hide resolved
lambdas/pkgpush/index.py Outdated Show resolved Hide resolved
@sir-sigurd sir-sigurd force-pushed the pkg-lambda-large-request branch 3 times, most recently from 2eacd11 to 2567a7d Compare April 23, 2021 16:36
@sir-sigurd sir-sigurd force-pushed the pkg-lambda-large-request branch 2 times, most recently from fb43228 to db50b31 Compare April 26, 2021 17:57
@sir-sigurd sir-sigurd marked this pull request as ready for review April 26, 2021 18:37
@sir-sigurd sir-sigurd requested a review from akarve April 26, 2021 18:49
@sir-sigurd sir-sigurd changed the title WIP: package creation lambda large requests add S3 data to existing package Apr 26, 2021
@@ -120,11 +127,51 @@
}


PACKAGE_CREATE_SCHEMA = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I've seen this before. Does it belong in lambda_shared?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could have seen that in the registry 🙂,

def validator(data):
ex = next(iter_errors(data), None)
if ex is not None:
raise ApiException(HTTPStatus.BAD_REQUEST, ex.message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More useful if we send all errors back. This only sends the first one back.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that's how it currently works and there is no reason to change it right now.

def inner(f):
@functools.wraps(f)
def wrapper(request):
version_id = request.data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is version_id really the only thing in the data string? What about unversioned buckets?



def large_request_handler(request_type):
user_request_key = f'user-requests/{request_type}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this is thread safe at all. What happens if multiple users issue concurrent requests of the same type? Won't the be reading/deleting the same file in the TemporaryFile loop below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users use the same key to write requests, but they write to the versioned bucket created by us via Cloudformation, so each request get its own ID (version ID of object). A single lambda instance process only a single request at the same time.

def create_package(request):
json_iterator = map(json.JSONDecoder().decode, (line.decode() for line in request.stream))

data = next(json_iterator)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest calling this first_line or something because the first line is special. data makes more sense if you're going to iterate with it.

lambdas/pkgpush/index.py Outdated Show resolved Hide resolved
lambdas/pkgpush/tests/test_index.py Outdated Show resolved Hide resolved
@@ -15,6 +15,8 @@

PACKAGE_INDEX_SUFFIX = "_packages"

LAMBDA_TMP_SPACE = 512 * 2 ** 20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AWS says 512 MB not 512 MiB

Suggested change
LAMBDA_TMP_SPACE = 512 * 2 ** 20
LAMBDA_TMP_SPACE = 512_000_000

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to run shutil.disk_usage('/tmp/') and got this:

usage(total=551346176, used=892928, free=538333184)
In [23]: 538333184 / 2 ** 20
Out[23]: 513.39453125

I think it might make sense to drop that check and simply try to preallocate space using os.posix_fallocate() to fail early.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.posix_fallocate() fails with PermissionError in lambda 😞 .

lambdas/pkgpush/tests/test_index.py Outdated Show resolved Hide resolved
@sir-sigurd sir-sigurd merged commit 85edf0c into master Apr 27, 2021
@sir-sigurd sir-sigurd deleted the pkg-lambda-large-request branch April 27, 2021 17:34
nl0 added a commit that referenced this pull request May 4, 2021
* master: (54 commits)
  Use stable nginx version for catalog image (#2182)
  Ability to add S3 folders / files to package (#2171)
  lambda for adding S3 data to existing package (#2180)
  use github tarball for faster installation (#2181)
  Bump py from 1.7.0 to 1.10.0 in /lambdas/es/indexer (#2176)
  Bump py from 1.8.0 to 1.10.0 in /lambdas/s3select (#2177)
  Bump py from 1.8.0 to 1.10.0 in /lambdas/thumbnail (#2178)
  Allow unicode characters for package routes by allowing any character (#2179)
  Additional NotFoundPage scoped to Bucket (#2175)
  Docs: fix catalog config path (#2168)
  rework pkgpush auth (#2170)
  Use AWS credentials for directory package and copy package submit (#2172)
  Document package push limitations in catalog [ci skip] (#2161)
  Preview warnings accordion (#2167)
  tweak warning text (#2169)
  Copy tweaks (#2164)
  Don't crash pkgselect for empty manifests (#2147)
  add codecov config (#2155)
  Simplify warning messages for package name (#2134)
  Move package API requests to one file, consolidate naming and internal API (#2154)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants