Ft future #1
Conversation
Unable to locate .performanceTestingBot config file
It seems you are using me, but OPENAI_API_KEY is not set in Variables/Secrets for this repo. You can follow the readme for more information.
Processing PR updates...
Thanks @2lambda123 for opening this PR! For COLLABORATOR only:
My review is in progress 📖 - I will have feedback for you in a few minutes!
@2lambda123
Thank you for your contribution to this repository! We appreciate your effort in opening this pull request.
Happy coding!
Reviewer's Guide by Sourcery
This pull request implements significant changes to the project structure, build process, and functionality. It includes updates to Go package imports, modifications to protobuf files, changes to shell scripts, and the addition of new Go files and shell scripts. The changes appear to be part of a larger refactoring effort to improve the distributed execution capabilities of the system.
File-Level Changes
Tips
Their most recent publicly accepted PR is: 2lambda123/gchq-stroom-timeline#2
First PR by @2lambda123. PR details of @2lambda123 in binpash-dish:
@2lambda123
Thank you for your contribution to this repository! We appreciate your effort in closing this pull request.
Happy coding!
Warning: Rate limit exceeded
@labels-and-badges[bot] has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 15 minutes and 20 seconds before requesting another review.
How to resolve this issue?
After the wait time has elapsed, a review can be triggered using the
We recommend that you space out your commits to avoid hitting the rate limit.
How do rate limits work?
CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.
Walkthrough
The changes encompass updates to various configuration files, the introduction of new scripts, and modifications to existing code in a project focused on distributed systems and data processing. Key alterations include adjustments to
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant Script
participant HDFS
User->>Script: Execute words.sh
Script->>Script: Create words-repeated.txt
Script->>Script: Append words to file
Script->>HDFS: Upload words-repeated.txt
Script->>Script: Remove local file
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?
Tips
Chat
There are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.
CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
Failed to generate code suggestions for PR
for script_faults_out in "$folder"/*faults.out; do
    # Extract the script name without the extension
    script_name=$(basename "$script_faults_out" .faults.out)

    # Check if there is a corresponding .distr.out file
    script_distr_out="$folder/$script_name.distr.out"

    if [ -f "$script_distr_out" ]; then
        # Perform a diff between the two files
        echo "Comparing faults_out and distr_out for script $script_name.sh"
        if diff -q "$script_faults_out" "$script_distr_out"; then
            echo "Outputs are identical"
        else
            echo "Files are different. Differences are as follows:"
            diff -y "$script_faults_out" "$script_distr_out"
        fi
        echo "-------------------------------------------"
    fi
done
The script does not handle the case where there are no .faults.out files in the specified folder. This could lead to misleading output or no output at all, which might be confusing for users.
Recommended Solution:
Add a check to see if any .faults.out files exist in the folder before entering the loop. If no such files are found, print a message indicating that no files were found.
# Check if there are any .faults.out files in the folder
if compgen -G "$folder/*faults.out" > /dev/null; then
    # Loop through the files in the folder
    for script_faults_out in "$folder"/*faults.out; do
        # Extract the script name without the extension
        script_name=$(basename "$script_faults_out" .faults.out)
        # Check if there is a corresponding .distr.out file
        script_distr_out="$folder/$script_name.distr.out"
        if [ -f "$script_distr_out" ]; then
            # Perform a diff between the two files
            echo "Comparing faults_out and distr_out for script $script_name.sh"
            if diff -q "$script_faults_out" "$script_distr_out"; then
                echo "Outputs are identical"
            else
                echo "Files are different. Differences are as follows:"
                diff -y "$script_faults_out" "$script_distr_out"
            fi
            echo "-------------------------------------------"
        fi
    done
else
    echo "No .faults.out files found in the folder."
fi
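An alternative to the compgen check is Bash's nullglob option; the following is a minimal sketch (not part of the PR) in which the echo stands in for the full comparison logic:

# Sketch: with nullglob set, an unmatched glob expands to nothing, so the loop
# body never runs and the "no files" message is driven by a simple counter.
shopt -s nullglob
found=0
for script_faults_out in "$folder"/*faults.out; do
    found=1
    script_name=$(basename "$script_faults_out" .faults.out)
    echo "Found faults output for $script_name.sh"
done
if [ "$found" -eq 0 ]; then
    echo "No .faults.out files found in the folder."
fi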
hdfs dfs -mkdir -p /oneliners

if [ ! -f ./1M.txt ]; then
    curl -sf --connect-timeout 10 'http://ndr.md/data/dummy/1M.txt' > 1M.txt
    curl -sf --connect-timeout 10 'atlas-group.cs.brown.edu/data/dummy/1M.txt' > 1M.txt
    if [ $? -ne 0 ]; then
        curl -f 'https://zenodo.org/record/7650885/files/1M.txt' > 1M.txt
        [ $? -ne 0 ] && eexit 'cannot find 1M.txt'
The logic for downloading the 1M.txt file does not handle the case where both URLs fail to download the file. The script attempts to download from the first URL and, if it fails, tries the second URL. If both attempts fail, it calls eexit, but eexit is not defined in the provided code snippet, which will cause the script to fail with a "command not found" error.
Recommended Solution:
Define the eexit function or replace it with a standard error handling mechanism like echo and exit 1.
eexit() {
    echo "$1" >&2
    exit 1
}
Or replace the eexit call with:
[ $? -ne 0 ] && { echo 'cannot find 1M.txt' >&2; exit 1; }
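For completeness, the download attempts and the error exit can also be chained with ||, so the fallback only runs when the primary mirror fails. This is a sketch built from the URLs in the diff above, not the PR's actual code:

if [ ! -f ./1M.txt ]; then
    # Try the primary mirror, fall back to Zenodo, and abort with a message if
    # both downloads fail (-f makes curl return non-zero on HTTP errors).
    curl -sf --connect-timeout 10 'atlas-group.cs.brown.edu/data/dummy/1M.txt' > 1M.txt \
        || curl -f 'https://zenodo.org/record/7650885/files/1M.txt' > 1M.txt \
        || { echo 'cannot find 1M.txt' >&2; exit 1; }
fi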
    fi

if [ ! -f ./1G.txt ]; then
    curl -sf --connect-timeout 10 'http://ndr.md/data/dummy/1G.txt' > 1G.txt
    curl -sf --connect-timeout 10 'atlas-group.cs.brown.edu/data/dummy/1G.txt' > 1G.txt
    if [ $? -ne 0 ]; then
        touch 1G.txt
        for (( i = 0; i < 10; i++ )); do
The logic for creating the 1G.txt file does not handle the case where the curl command fails to download the file from the specified URL. If the curl command fails, it proceeds to create the file by concatenating 100M.txt ten times. However, if 100M.txt does not exist, this will result in an empty 1G.txt file without any error notification.
Recommended Solution:
Add a check to ensure 100M.txt exists before attempting to concatenate it to create 1G.txt.
if [ ! -f ./1G.txt ]; then
curl -sf --connect-timeout 10 'atlas-group.cs.brown.edu/data/dummy/1G.txt' > 1G.txt
if [ $? -ne 0 ]; then
if [ ! -f ./100M.txt ]; then
echo "100M.txt not found, cannot create 1G.txt" >&2
exit 1
fi
touch 1G.txt
for (( i = 0; i < 10; i++ )); do
cat 100M.txt >> 1G.txt
done
fi
fi
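A further caveat not raised above: the redirection > 1G.txt creates the file even when curl fails, so a later run would skip the download because the [ ! -f ./1G.txt ] guard sees the empty file. A sketch that avoids this by downloading to a temporary name first:

if [ ! -f ./1G.txt ]; then
    # Download to a temp file and only move it into place on success, so a
    # failed curl never leaves an empty 1G.txt behind.
    if curl -sf --connect-timeout 10 'atlas-group.cs.brown.edu/data/dummy/1G.txt' -o 1G.txt.tmp; then
        mv 1G.txt.tmp 1G.txt
    else
        rm -f 1G.txt.tmp
        echo "Download failed, building 1G.txt from 100M.txt" >&2
        for (( i = 0; i < 10; i++ )); do
            cat 100M.txt >> 1G.txt
        done
    fi
fi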
# download wamerican-insane dictionary and sort according to machine
if [ ! -f ./dict.txt ]; then
    curl -sf --connect-timeout 10 'http://ndr.md/data/dummy/dict.txt' | sort > dict.txt
    curl -sf --connect-timeout 10 'atlas-group.cs.brown.edu/data/dummy/dict.txt' | sort > dict.txt
    if [ $? -ne 0 ]; then
        sort words > sorted_words
    fi
fi

if [ ! -f ./all_cmds.txt ]; then
    curl -sf --connect-timeout 10 'http://ndr.md/data/dummy/all_cmds.txt' > all_cmds.txt
    curl -sf --connect-timeout 10 'atlas-group.cs.brown.edu/data/dummy/all_cmds.txt' > all_cmds.txt
    if [ $? -ne 0 ]; then
        # This should be OK for tests, no need for abort
        ls /usr/bin/* > all_cmds.txt
The logic for downloading the dict.txt and all_cmds.txt files does not handle the case where the curl command fails to download the files from the specified URLs. If the curl command fails, it proceeds to use alternative methods to create the files, but it does not notify the user of the failure.
Recommended Solution:
Add error notifications to inform the user when the curl command fails to download the files.
if [ ! -f ./dict.txt ]; then
    curl -sf --connect-timeout 10 'atlas-group.cs.brown.edu/data/dummy/dict.txt' | sort > dict.txt
    if [ $? -ne 0 ]; then
        echo "Failed to download dict.txt, using local words file" >&2
        sort words > sorted_words
    fi
fi

if [ ! -f ./all_cmds.txt ]; then
    curl -sf --connect-timeout 10 'atlas-group.cs.brown.edu/data/dummy/all_cmds.txt' > all_cmds.txt
    if [ $? -ne 0 ]; then
        echo "Failed to download all_cmds.txt, using local /usr/bin/*" >&2
        ls /usr/bin/* > all_cmds.txt
    fi
    append_nl_if_not ./all_cmds.txt
fi
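One additional subtlety for the dict.txt case: because the download runs through a pipeline (curl ... | sort > dict.txt), $? holds the exit status of sort rather than curl, so the failure branch may never trigger. A sketch of how pipefail addresses this, assuming Bash:

# With pipefail, a failing curl makes the whole pipeline return non-zero,
# so the fallback branch actually runs when the download fails.
set -o pipefail
if [ ! -f ./dict.txt ]; then
    if ! curl -sf --connect-timeout 10 'atlas-group.cs.brown.edu/data/dummy/dict.txt' | sort > dict.txt; then
        echo "Failed to download dict.txt, using local words file" >&2
        sort words > sorted_words
    fi
fi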
for file in "${input_files[@]}"; do
    hdfs dfs -put $file /oneliners/$file
    rm -f $file
done
The logic for uploading files to HDFS and then deleting them locally does not handle errors that may occur during the hdfs dfs -put command. If the hdfs dfs -put command fails, the script will still proceed to delete the local files, potentially resulting in data loss.
Recommended Solution:
Add error handling to ensure that local files are only deleted if the hdfs dfs -put command succeeds.
for file in "${input_files[@]}"; do
    hdfs dfs -put $file /oneliners/$file
    if [ $? -eq 0 ]; then
        rm -f $file
    else
        echo "Failed to upload $file to HDFS" >&2
    fi
done
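The same check can be expressed more compactly by testing the command itself; this sketch also quotes the variables, which is a minor hardening rather than something the PR requires:

for file in "${input_files[@]}"; do
    # Remove the local copy only if the upload succeeded.
    if hdfs dfs -put "$file" "/oneliners/$file"; then
        rm -f "$file"
    else
        echo "Failed to upload $file to HDFS" >&2
    fi
done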
# pkill -9 -f worker.sh
# pkill -9 -f hdfs

ps aux | grep -E 'dish|pash|hdfs' | grep -Ev 'killall|dish\|pash\|hdfs|worker.py' | awk '{print $2}' | xargs kill -9 |
The use of kill -9 is very forceful and can lead to data corruption or other unintended side effects because it does not allow the processes to clean up. It is generally better to use a gentler signal like SIGTERM (signal 15) first, and only use SIGKILL (signal 9) if the process does not terminate.
Recommended Solution:
Use kill -15 initially and only escalate to kill -9 if necessary.
ps aux | grep -E 'dish|pash|hdfs' | grep -Ev 'killall|dish\|pash\|hdfs|worker.py' | awk '{print $2}' | xargs kill -15
sleep 5
ps aux | grep -E 'dish|pash|hdfs' | grep -Ev 'killall|dish\|pash\|hdfs|worker.py' | awk '{print $2}' | xargs kill -9
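A slightly gentler variant of the suggestion (a sketch, not the PR's code) escalates to SIGKILL only for processes that are still alive after the grace period:

# Collect the matching PIDs once, send SIGTERM, wait, then SIGKILL survivors.
pids=$(ps aux | grep -E 'dish|pash|hdfs' | grep -Ev 'killall|dish\|pash\|hdfs|worker.py' | awk '{print $2}')
[ -n "$pids" ] && kill -15 $pids 2>/dev/null
sleep 5
for pid in $pids; do
    # kill -0 only checks whether the process still exists.
    kill -0 "$pid" 2>/dev/null && kill -9 "$pid"
done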
@@ -0,0 +1,3 @@
command="$DISH_TOP/runtime/bin/datastream --type read $@" |
Using eval to construct and execute commands can be dangerous, as it may lead to command injection vulnerabilities. Instead, consider using an array to safely handle arguments.
Recommended Solution:
command=("$DISH_TOP/runtime/bin/datastream" "--type" "read" "$@")
"${command[@]}"
Micro-Learning Topic: OS command injection (Detected by phrase)
Matched on "command injection"
In many situations, applications will rely on OS provided functions, scripts, macros and utilities instead of reimplementing them in code. While functions would typically be accessed through a native interface library, the remaining three OS provided features will normally be invoked via the command line or launched as a process. If unsafe inputs are used to construct commands or arguments, it may allow arbitrary OS operations to be performed that can compromise the server.
Try a challenge in Secure Code Warrior
Helpful references
- OWASP Command Injection - OWASP community page with comprehensive information about command injection, and links to various OWASP resources to help detect or prevent it.
- OWASP testing for Command Injection - This article is focused on providing testing techniques for identifying command injection flaws in your applications
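To make the risk described above concrete, here is a small, entirely hypothetical demonstration: under eval the shell re-parses metacharacters inside an argument as code, while the array form passes the same argument through as plain data.

# Hypothetical crafted argument; echo stands in for the real command.
arg='foo; echo INJECTED'
eval "echo --file $arg"    # eval re-parses the string: prints "--file foo", then "INJECTED"
cmd=(echo --file "$arg")
"${cmd[@]}"                # prints "--file foo; echo INJECTED" as a literal argument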
@@ -0,0 +1,3 @@
command="$DISH_TOP/runtime/bin/datastream --type write $@" |
Using eval to construct and execute commands can be dangerous, as it may lead to command injection vulnerabilities. Instead, consider using an array to safely handle arguments.
Recommended Solution:
command=("$DISH_TOP/runtime/bin/datastream" "--type" "write" "$@")
"${command[@]}"
@@ -60,10 +60,12 @@ echo -e "\nexport PATH=\$PATH:$(go env GOPATH)/bin" >> ~/.bashrc
export PATH="$PATH:$(go env GOPATH)/bin" |
The line export PATH="$PATH:$(go env GOPATH)/bin" is redundant, as it appears earlier in the script at line 43. This redundancy can lead to confusion and maintenance issues.
Recommended Solution:
Remove the redundant line to ensure the script is clean and maintainable.
for i in {1..100}
do
    cat /usr/share/dict/words >> words-repeated.txt
done
Repeatedly using cat in a loop to append the contents of a file can be inefficient, especially for large files. This approach can be optimized to improve performance.
Recommended Solution:
Instead of using a loop, you can use the yes command to repeat the content and then trim it to the desired number of repetitions:
yes "$(cat /usr/share/dict/words)" | head -n 100 > words-repeated.txt
Hey @2lambda123 - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
Description
Related Issue
Types of changes
Checklist:
Note
I'm currently writing a description for your pull request. I should be done shortly (<1 minute). Please don't edit the description field until I'm finished, or we may overwrite each other.
Summary by Sourcery
Refactor the project structure by updating protobuf paths and package imports, introduce a poison pill mechanism for node termination, and enhance the build and setup scripts. Add new scripts for distributed execution and fault tolerance testing, and remove obsolete files from the old directory structure.
New Features:
Enhancements:
Build:
Chores:
Summary by CodeRabbit
New Features
- check_ft_correctness.sh for automated output comparison in distributed benchmarks.
- words.sh to create and upload a repeated dictionary file to HDFS.
Improvements
- setup.sh for downloading necessary files.
- dfs_split_reader.
Chores
- .gitignore files to streamline project management by excluding unnecessary files.