-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dataset upload test (without using frontend) #8184
Conversation
WalkthroughThe pull request introduces enhancements to the end-to-end testing framework for dataset uploads. It adds a new test case in Changes
Assessment against linked issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I’m not very familiar with typescript or the existing e2e tests. @daniel-wer could you maybe also have a quick look?
43fff69
to
84eed1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delayed review. Great addition to our tests, thank you! 🙏
Why did you choose to implement this without any frontend api code (for example, the reserveDatasetUpload
function from the admin_rest_api
)? I can imagine the resumable-js lib to be hard to work with in the tests, but was that the reason?
If it was by choice, do you think it would be worthwhile and possible (with reasonable effort) to add a similar test in a followup issue, testing the frontend functionality as well?
frontend/javascripts/test/backend-snapshot-tests/datasets.e2e.ts
Outdated
Show resolved
Hide resolved
Co-authored-by: Daniel <[email protected]>
@daniel-wer I have tried to test the frontend. However it is just not easy to do. The resumable JS library uses Browser APIs e.g. File and XMLHttpRequest that are not available in the node test environment. I tried to mock them e.g. using JSDom, but it failed at sending the Multipart Formdata correctly (compare jsdom/jsdom#2300). So maybe it would be possible to do it but that way was tried and failed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (6)
test/e2e/End2EndSpec.scala (1)
54-58
: Consider enhancing error handling and loggingWhile the directory cleanup logic is a good addition for ensuring test isolation, consider these improvements:
- Add error handling for directory deletion
- Use proper logging instead of println
- Consider adding file locks or synchronization for parallel test execution
Here's a suggested improvement:
if (dataDirectory.exists()) { - println("Deleting existing data directory Organization_X") - PathUtils.deleteDirectoryRecursively(dataDirectory.toPath) + logger.info(s"Deleting existing data directory: ${dataDirectory.getPath}") + try { + PathUtils.deleteDirectoryRecursively(dataDirectory.toPath) + } catch { + case e: Exception => + logger.error(s"Failed to delete directory: ${e.getMessage}") + throw new Exception(s"Failed to prepare test environment: ${e.getMessage}", e) + } }frontend/javascripts/test/e2e-setup.ts (1)
70-70
: Consider using URL parsing for more robust protocol checkingWhile checking for both protocols is good, using
indexOf
for URL validation can be fragile. Consider using the URL API for more robust protocol checking.- if (url.indexOf("http:") === -1 && url.indexOf("https:") === -1) { + try { + const urlObj = new URL(url.toString()); + if (!['http:', 'https:'].includes(urlObj.protocol)) { + newUrl = `http://localhost:9000${url}`; + } + } catch { + newUrl = `http://localhost:9000${url}`; + }frontend/javascripts/test/backend-snapshot-tests/datasets.e2e.ts (4)
123-123
: Avoid hardcodingfolderId
; retrieve it dynamicallyHardcoding the
folderId
may lead to test failures if the ID changes or differs between environments. Consider retrieving thefolderId
dynamically or using a constant defined in the test setup.
151-151
: Generate a dynamic boundary instead of hardcoding itIf manual construction of the multipart body is necessary, avoid hardcoding the boundary value. Generate it dynamically to prevent potential conflicts and comply with HTTP specifications.
Example:
- const boundary = "----WebKitFormBoundaryAqTsFa4N9FW7zF7I"; + const boundary = "----WebKitFormBoundary" + Math.random().toString(36).substring(2);
181-206
: Differentiate error messages for clearer debuggingThe error messages in
t.fail()
are identical at lines 182, 197, and 205. This can make debugging more difficult. Consider making each message unique to reflect the specific failure point.Apply this diff to update the error messages:
if (uploadResult.status !== 200) { - t.fail("Dataset upload failed"); + t.fail("Dataset upload failed at initial POST"); } if (finishResult.status !== 200) { - t.fail("Dataset upload failed at finish"); + t.fail("Dataset upload failed at finishUpload"); } if (result.status !== 200) { - t.fail("Dataset upload failed"); + t.fail("Dataset health check after upload failed"); }
127-127
: Avoid hardcoding organization names; use variables or configHardcoding
"Organization_X"
may lead to issues if the organization name changes. Consider using a variable or retrieving the organization name dynamically to make the test more maintainable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
frontend/javascripts/test/backend-snapshot-tests/datasets.e2e.ts
(2 hunks)frontend/javascripts/test/e2e-setup.ts
(3 hunks)test/e2e/End2EndSpec.scala
(2 hunks)
🔇 Additional comments (5)
test/e2e/End2EndSpec.scala (2)
3-3
: LGTM: Import statement properly updated
The addition of PathUtils
to the import statement is well-organized and properly grouped with related IO utilities.
54-58
: Well-implemented test environment preparation
The addition of directory cleanup before test dataset setup is a good practice that:
- Ensures test isolation
- Prevents interference from previous test runs
- Makes the tests more reliable and repeatable
This change aligns well with the PR's objective of implementing robust dataset upload tests.
frontend/javascripts/test/e2e-setup.ts (2)
6-6
: LGTM: Import additions support dataset upload testing
The additional imports of FormData
and File
from node-fetch are appropriate for implementing dataset upload tests without frontend dependencies.
87-88
: LGTM: Global assignments follow established pattern
The addition of FormData
and File
to the global scope is consistent with the file's pattern of making fetch-related APIs globally available for testing.
frontend/javascripts/test/backend-snapshot-tests/datasets.e2e.ts (1)
133-134
: Ensure the test dataset file exists before reading
Attempting to read a file that doesn't exist will cause the test to fail. Verify that the file test/dataset/test-dataset.zip
exists in the specified path, or include a check to handle the absence of the file gracefully.
Run the following script to check if the test dataset file exists:
✅ Verification successful
Test dataset file exists at the specified path
The file test/dataset/test-dataset.zip
exists in the repository, so there's no need for additional file existence checks in the test code.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the test dataset file exists at the expected path.
# Test: Check for the existence of 'test-dataset.zip'. Expect: File exists.
if [ -f "test/dataset/test-dataset.zip" ]; then
echo "File exists."
else
echo "File does not exist."
fi
Length of output: 77
|
||
const boundary = "----WebKitFormBoundaryAqTsFa4N9FW7zF7I"; | ||
let bodyString = `--${boundary}\r\n`; | ||
// @ts-ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Resolve the TypeScript error instead of using // @ts-ignore
Using // @ts-ignore
suppresses TypeScript errors, potentially hiding real issues. Instead, address the type mismatch by casting formData
appropriately or adjusting the code to satisfy the compiler.
Apply this change to resolve the typing issue:
- // @ts-ignore
for (const [key, value] of formData.entries()) {
+ for (const [key, value] of (formData as any).entries()) {
Committable suggestion skipped: line range outside the PR's diff.
const boundary = "----WebKitFormBoundaryAqTsFa4N9FW7zF7I"; | ||
let bodyString = `--${boundary}\r\n`; | ||
// @ts-ignore | ||
for (const [key, value] of formData.entries()) { | ||
bodyString += `Content-Disposition: form-data; name="${key}"\r\n\r\n${value}\r\n`; | ||
bodyString += `--${boundary}\r\n`; | ||
} | ||
bodyString += `Content-Disposition: form-data; name="file"; filename="test-dataset.zip"\r\n`; | ||
bodyString += "Content-Type: application/octet-stream\r\n\r\n"; | ||
|
||
// We have to send the file as bytes, otherwise JS does some encoding, resulting in erroneous bytes | ||
|
||
const formBytes = new TextEncoder().encode(bodyString); | ||
const fileBytes = new Uint8Array(testDataset); | ||
const endBytes = new TextEncoder().encode(`\r\n--${boundary}--`); | ||
const body = new Uint8Array(formBytes.length + fileBytes.length + endBytes.length); | ||
body.set(formBytes, 0); | ||
body.set(fileBytes, formBytes.length); | ||
body.set(endBytes, formBytes.length + fileBytes.length); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Simplify multipart/form-data handling by using built-in FormData
methods
Manually constructing the multipart/form-data body and hardcoding the boundary can lead to errors and maintenance challenges. Use the built-in FormData
object directly with the fetch
API to handle multipart/form-data encoding automatically.
Apply this diff to simplify the upload request:
- // Manual construction of the multipart/form-data body
- const boundary = "----WebKitFormBoundaryAqTsFa4N9FW7zF7I";
- let bodyString = `--${boundary}\r\n`;
- // ...additional code to construct `bodyString`...
- const formBytes = new TextEncoder().encode(bodyString);
- const fileBytes = new Uint8Array(testDataset);
- const endBytes = new TextEncoder().encode(`\r\n--${boundary}--`);
- const body = new Uint8Array(formBytes.length + fileBytes.length + endBytes.length);
- body.set(formBytes, 0);
- body.set(fileBytes, formBytes.length);
- body.set(endBytes, formBytes.length + fileBytes.length);
- let content_type = `multipart/form-data; boundary=${boundary}`;
+ formData.append("file", new Blob([testDataset], { type: "application/octet-stream" }), "test-dataset.zip");
+
+ const uploadResult = await fetch("/data/datasets", {
+ method: "POST",
+ body: formData,
+ });
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for elaborating, that's understandable and good to have documented as part of this PR 👍
Steps to test:
Issues:
Summary by CodeRabbit
New Features
FormData
andFile
.Bug Fixes
fetchWrapper
function to ensure correct protocol handling.Refactor
ensureTestDataset
method to delete and recreate the data directory, improving test setup reliability.