Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #2088 from FedML-AI/alexleung/dev_v070_for_refactor
Alexleung/dev v070 for refactor
Showing 70 changed files with 3,378 additions and 1,210 deletions.
python/examples/deploy/dummy_failed/dummy_failed_scale_out/config.yaml
16 changes: 16 additions & 0 deletions
@@ -0,0 +1,16 @@

```yaml
workspace: "./src"

inference_image_name: "raphaeljin/fedml"
enable_custom_image: true

bootstrap: |
  echo "Bootstrap start..."
  cat serve_main.py
  echo "Bootstrap finished"
# Simulate a successful deployment
job: |
  python3 serve_main.py
auto_detect_public_ip: true
use_gpu: true
```
python/examples/deploy/dummy_failed/dummy_failed_scale_out/src/serve_main.py
39 changes: 39 additions & 0 deletions
@@ -0,0 +1,39 @@

```python
import os

from fedml.serving import FedMLPredictor
from fedml.serving import FedMLInferenceRunner
import uuid
import torch

# Calculate the number of elements
num_elements = 1_073_741_824 // 4  # using integer division for whole elements


class DummyPredictor(FedMLPredictor):
    def __init__(self):
        super().__init__()
        # Create a tensor with these many elements
        tensor = torch.empty(num_elements, dtype=torch.float32)

        # Move the tensor to GPU
        tensor_gpu = tensor.cuda()

        # for debug
        with open("/tmp/dummy_gpu_occupier.txt", "w") as f:
            f.write("GPU is occupied")

        if os.getenv("FEDML_REPLICA_RANK") == "2":
            # Simulate a failure
            raise Exception("Simulated failure")
            exit(1)

        self.worker_id = uuid.uuid4()

    def predict(self, request):
        return {f"AlohaV0From{self.worker_id}": request}


if __name__ == "__main__":
    predictor = DummyPredictor()
    fedml_inference_runner = FedMLInferenceRunner(predictor)
    fedml_inference_runner.run()
```
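Each replica in this example allocates `1_073_741_824 // 4` float32 elements (4 bytes each), which is 2^30 bytes or about 1 GiB of tensor data, and raises only when the `FEDML_REPLICA_RANK` environment variable equals the string `"2"`. The stdlib-only sketch below isolates those two pieces of logic so they can be checked without a GPU or FedML installed. The helper names `bytes_occupied` and `should_simulate_failure` are hypothetical and are not part of the FedML API.

```python
import os
from typing import Mapping, Optional

FLOAT32_BYTES = 4  # size of one torch.float32 element in bytes
NUM_ELEMENTS = 1_073_741_824 // 4  # same expression as in serve_main.py


def bytes_occupied(num_elements: int, element_size: int = FLOAT32_BYTES) -> int:
    """Return the raw tensor payload in bytes (ignores CUDA/allocator overhead)."""
    return num_elements * element_size


def should_simulate_failure(env: Optional[Mapping[str, str]] = None) -> bool:
    """Mirror the rank check in DummyPredictor.__init__ (hypothetical helper).

    os.getenv returns a string or None, so the comparison is against the
    string "2", not the integer 2.
    """
    env = os.environ if env is None else env
    return env.get("FEDML_REPLICA_RANK") == "2"


if __name__ == "__main__":
    print(bytes_occupied(NUM_ELEMENTS) / 2**30, "GiB")
    for rank in ("0", "1", "2"):
        print(rank, should_simulate_failure({"FEDML_REPLICA_RANK": rank}))
```

The byte count covers only the tensor itself; the CUDA context and PyTorch's allocator add memory on top of it.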
python/examples/federate/cross_silo/mpi_fedavg_mnist_lr_example/mpi_host_file
2 changes: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-fedml-a6000-node-1
+Dimitris-FedML.local
```
...ples/federate/cross_silo/mqtt_s3_fedavg_mnist_lr_example/custom_data_and_model/run_all.sh
36 changes: 36 additions & 0 deletions
@@ -0,0 +1,36 @@

```bash
#!/usr/bin/env bash

set -e
cd "$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"/

# The name of the current run.
RUN_ID=$1
if [ -z "${RUN_ID}" ]; then
  echo "Need to provide the id of the run as a string."
  exit 1
fi

# The number of workers.
WORKER_NUM=$2
if [ -z "${WORKER_NUM}" ]; then
  echo "Need to provide the number of workers you want to run the experiment for."
  exit 1
fi

# Spawn server process.
echo "Starting server"
python3 torch_server.py --cf config/fedml_config.yaml --rank 0 --role server --run_id $RUN_ID &
sleep 3 # Sleep for 3s to give the server enough time to start

# Spawn client(s) process.
# Change the number next to seq for spawning more than 1 clients.
for i in `seq $WORKER_NUM`; do
  echo "Starting client $i"
  python3 torch_client.py --cf config/fedml_config.yaml --rank $i --role client --run_id $RUN_ID &
done

# Enable CTRL+C to stop all background processes
trap "trap - SIGTERM && kill -- -$$" SIGINT SIGTERM

# Wait for all background processes to complete
wait
```
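`run_all.sh` starts one server with rank 0, sleeps 3 seconds, then starts `WORKER_NUM` clients whose ranks come from `seq $WORKER_NUM` (1 through N), all sharing the same `--run_id`. The sketch below builds the same command lines in Python, assuming the file names and flags used by the script. `build_launch_commands` is a hypothetical helper; it only constructs argv lists and does not spawn any processes.

```python
import shlex
from typing import List


def build_launch_commands(run_id: str, worker_num: int,
                          config: str = "config/fedml_config.yaml") -> List[List[str]]:
    """Return argv lists for one server (rank 0) and `worker_num` clients.

    Hypothetical helper mirroring the flags passed by run_all.sh.
    """
    # Same guard as `[ -z "${RUN_ID}" ]` in the script.
    if not run_id:
        raise ValueError("Need to provide the id of the run as a string.")
    commands = [["python3", "torch_server.py", "--cf", config,
                 "--rank", "0", "--role", "server", "--run_id", run_id]]
    # `seq N` yields 1..N, so client ranks start at 1 (server holds rank 0).
    for rank in range(1, worker_num + 1):
        commands.append(["python3", "torch_client.py", "--cf", config,
                         "--rank", str(rank), "--role", "client", "--run_id", run_id])
    return commands


if __name__ == "__main__":
    for argv in build_launch_commands("demo_run", 2):
        print(shlex.join(argv))
```

Passing each list to `subprocess.Popen` would approximate the script's background launches; the script's `trap ... kill -- -$$` and final `wait` handle cleanup and joining, which this sketch leaves out.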
Binary file not shown.