
Commit

Merge pull request #2160 from FedML-AI/dev/v0.7.0
Dev/v0.7.0
fedml-alex authored Jun 11, 2024
2 parents 31d8e7c + af026fb commit 9c227bb
Showing 53 changed files with 861 additions and 9,055 deletions.
42 changes: 19 additions & 23 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,44 +1,40 @@

# FEDML Open Source: A Unified and Scalable Machine Learning Library for Running Training and Deployment Anywhere at Any Scale

Backed by FEDML Nexus AI: Next-Gen Cloud Services for LLMs & Generative AI (https://fedml.ai)
Backed by TensorOpera AI: Your Generative AI Platform at Scale (https://TensorOpera.ai)

<div align="center">
<img src="docs/images/fedml_logo_light_mode.png" width="400px">
<img src="docs/images/TensorOpera_arch.png" width="600px">
</div>

FedML Documentation: https://doc.fedml.ai
TensorOpera Documentation: https://docs.TensorOpera.ai

FedML Homepage: https://fedml.ai/ \
FedML Blog: https://blog.fedml.ai/ \
FedML Medium: https://medium.com/@FedML \
FedML Research: https://fedml.ai/research-papers/
TensorOpera Homepage: https://TensorOpera.ai/ \
TensorOpera Blog: https://blog.TensorOpera.ai/

Join the Community: \
Join the Community:
Slack: https://join.slack.com/t/fedml/shared_invite/zt-havwx1ee-a1xfOUrATNfc9DFqU~r34w \
Discord: https://discord.gg/9xkW8ae6RV


FEDML® stands for Foundational Ecosystem Design for Machine Learning. [FEDML Nexus AI](https://fedml.ai) is the next-gen cloud service for LLMs & Generative AI. It helps developers *launch* complex model *training*, *deployment*, and *federated learning* anywhere on decentralized GPUs, multi-clouds, edge servers, and smartphones, *easily, economically, and securely*.
TensorOpera® AI (https://TensorOpera.ai) is the next-gen cloud service for LLMs & Generative AI. It helps developers launch complex model training, deployment, and federated learning anywhere on decentralized GPUs, multi-clouds, edge servers, and smartphones, easily, economically, and securely.

Highly integrated with [FEDML open source library](https://github.com/fedml-ai/fedml), FEDML Nexus AI provides holistic support of three interconnected AI infrastructure layers: user-friendly MLOps, a well-managed scheduler, and high-performance ML libraries for running any AI jobs across GPU Clouds.
Highly integrated with TensorOpera open source library, TensorOpera AI provides holistic support of three interconnected AI infrastructure layers: user-friendly MLOps, a well-managed scheduler, and high-performance ML libraries for running any AI jobs across GPU Clouds.

![fedml-nexus-ai-overview.png](./docs/images/fedml-nexus-ai-overview.png)
A typical workflow is shown in the figure above. When a developer wants to run a pre-built job in Studio or Job Store, TensorOpera®Launch swiftly pairs AI jobs with the most economical GPU resources, auto-provisions, and effortlessly runs the job, eliminating complex environment setup and management. While running the job, TensorOpera®Launch orchestrates the compute plane in different cluster topologies and configurations, so that any complex AI job is enabled, whether it is model training, deployment, or even federated learning. TensorOpera®Open Source is a unified and scalable machine learning library for running these AI jobs anywhere at any scale.

A typical workflow is shown in the figure above. When a developer wants to run a pre-built job in Studio or Job Store, FEDML®Launch swiftly pairs AI jobs with the most economical GPU resources, auto-provisions, and effortlessly runs the job, eliminating complex environment setup and management. While running the job, FEDML®Launch orchestrates the compute plane in different cluster topologies and configurations, so that any complex AI job is enabled, whether it is model training, deployment, or even federated learning. FEDML®Open Source is a unified and scalable machine learning library for running these AI jobs anywhere at any scale.
In the MLOps layer of TensorOpera AI
- **TensorOpera® Studio** embraces the power of Generative AI! Access popular open-source foundational models (e.g., LLMs), fine-tune them seamlessly with your specific data, and deploy them scalably and cost-effectively using the TensorOpera Launch on GPU marketplace.
- **TensorOpera® Job Store** maintains a list of pre-built jobs for training, deployment, and federated learning. Developers are encouraged to run them directly with customized datasets or models on cheaper GPUs.

In the MLOps layer of FEDML Nexus AI
- **FEDML® Studio** embraces the power of Generative AI! Access popular open-source foundational models (e.g., LLMs), fine-tune them seamlessly with your specific data, and deploy them scalably and cost-effectively using the FEDML Launch on GPU marketplace.
- **FEDML® Job Store** maintains a list of pre-built jobs for training, deployment, and federated learning. Developers are encouraged to run them directly with customized datasets or models on cheaper GPUs.
In the scheduler layer of TensorOpera AI
- **TensorOpera® Launch** swiftly pairs AI jobs with the most economical GPU resources, auto-provisions, and effortlessly runs the job, eliminating complex environment setup and management. It supports a range of compute-intensive jobs for generative AI and LLMs, such as large-scale training, serverless deployments, and vector DB searches. TensorOpera Launch also facilitates on-prem cluster management and deployment on private or hybrid clouds.

In the scheduler layer of FEDML Nexus AI
- **FEDML® Launch** swiftly pairs AI jobs with the most economical GPU resources, auto-provisions, and effortlessly runs the job, eliminating complex environment setup and management. It supports a range of compute-intensive jobs for generative AI and LLMs, such as large-scale training, serverless deployments, and vector DB searches. FEDML Launch also facilitates on-prem cluster management and deployment on private or hybrid clouds.

In the Compute layer of FEDML Nexus AI
- **FEDML® Deploy** is a model serving platform for high scalability and low latency.
- **FEDML® Train** focuses on distributed training of large and foundational models.
- **FEDML® Federate** is a federated learning platform backed by the most popular federated learning open-source library and the world’s first FLOps (federated learning Ops), offering on-device training on smartphones and cross-cloud GPU servers.
- **FEDML® Open Source** is a unified and scalable machine learning library for running these AI jobs anywhere at any scale.
In the Compute layer of TensorOpera AI
- **TensorOpera® Deploy** is a model serving platform for high scalability and low latency.
- **TensorOpera® Train** focuses on distributed training of large and foundational models.
- **TensorOpera® Federate** is a federated learning platform backed by the most popular federated learning open-source library and the world’s first FLOps (federated learning Ops), offering on-device training on smartphones and cross-cloud GPU servers.
- **TensorOpera® Open Source** is a unified and scalable machine learning library for running these AI jobs anywhere at any scale.

# Contributing
FedML embraces and thrives through open source. We welcome all kinds of contributions from the community. Kudos to all of <a href="https://github.com/fedml-ai/fedml/graphs/contributors" target="_blank">our amazing contributors</a>!
Binary file added docs/images/TensorOpera_arch.png
10 changes: 10 additions & 0 deletions python/examples/deploy/debug/inference_timeout/config.yaml
@@ -0,0 +1,10 @@
workspace: "./src"
entry_point: "serve_main.py"
bootstrap: |
  echo "Bootstrap start..."
  sleep 5
  echo "Bootstrap finished"
auto_detect_public_ip: true
use_gpu: true

request_timeout_sec: 10
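The `request_timeout_sec` key above caps how long a pending inference request may run before it is failed. As an illustration only (this is not the FedML implementation), the timeout semantics can be sketched client-side with a worker-pool timeout; `slow_predict` and `REQUEST_TIMEOUT_SEC` are hypothetical names for this sketch:

```python
import concurrent.futures
import time


def slow_predict(request):
    """Pretend inference that takes 2 seconds to answer."""
    time.sleep(2)
    return {"ok": request}


REQUEST_TIMEOUT_SEC = 1  # analogous to request_timeout_sec in config.yaml

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_predict, {"text": "hi"})
    try:
        result = future.result(timeout=REQUEST_TIMEOUT_SEC)
    except concurrent.futures.TimeoutError:
        # The caller gives up after 1 s even though the worker is still busy
        result = {"error": "inference timed out"}

print(result)  # → {'error': 'inference timed out'}
```

Because the simulated inference (2 s) outlives the 1 s budget, the caller observes a timeout, which is the behavior this example's config is designed to exercise.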
32 changes: 32 additions & 0 deletions python/examples/deploy/debug/inference_timeout/src/serve_main.py
@@ -0,0 +1,32 @@
from fedml.serving import FedMLPredictor
from fedml.serving import FedMLInferenceRunner
import uuid
import torch

# Calculate the number of elements
num_elements = 1_073_741_824 // 4 # using integer division for whole elements


class DummyPredictor(FedMLPredictor):
    def __init__(self):
        super().__init__()
        # Create a tensor with this many elements
        tensor = torch.empty(num_elements, dtype=torch.float32)

        # Move the tensor to the GPU so it occupies device memory
        tensor_gpu = tensor.cuda()

        # For debugging: leave a marker file on disk
        with open("/tmp/dummy_gpu_occupier.txt", "w") as f:
            f.write("GPU is occupied")

        self.worker_id = uuid.uuid4()

    def predict(self, request):
        return {f"AlohaV0From{self.worker_id}": request}


if __name__ == "__main__":
    predictor = DummyPredictor()
    fedml_inference_runner = FedMLInferenceRunner(predictor)
    fedml_inference_runner.run()
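The dummy tensor in this example is sized so that it pins roughly 1 GiB of GPU memory, which is what makes it useful for debugging occupancy and timeouts. A quick back-of-the-envelope check of that sizing:

```python
# 1_073_741_824 bytes is exactly 1 GiB; dividing by the 4-byte width of a
# float32 gives the number of elements that fill 1 GiB of device memory.
num_elements = 1_073_741_824 // 4   # 268,435,456 float32 elements
bytes_per_element = 4               # torch.float32 is 4 bytes wide
total_bytes = num_elements * bytes_per_element

print(total_bytes == 2**30)         # → True (exactly 1 GiB)
print(total_bytes / (1024 ** 3))    # → 1.0
```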
Empty file.
21 changes: 4 additions & 17 deletions python/examples/deploy/quick_start/config.yaml
@@ -1,21 +1,8 @@
workspace: "./src"
workspace: "."
entry_point: "main_entry.py"

# If you want to install some packages
# Please write the command in the bootstrap.sh
bootstrap: |
echo "Bootstrap start..."
sh ./config/bootstrap.sh
echo "Bootstrap finished"
# If you do not have any GPU resource but want to serve the model
# Try the FedML® Nexus AI Platform, and uncomment the following lines.
# ------------------------------------------------------------
computing:
minimum_num_gpus: 1 # minimum # of GPUs to provision
maximum_cost_per_hour: $3000 # max cost per hour for your job per gpu card
#allow_cross_cloud_resources: true # true, false
#device_type: CPU # options: GPU, CPU, hybrid
resource_type: A100-80G # e.g., A100-80G,
# please check the resource type list by "fedml show-resource-type"
# or visiting URL: https://open.fedml.ai/accelerator_resource_type
# ------------------------------------------------------------
echo "Install some packages..."
echo "Install finished!"
27 changes: 27 additions & 0 deletions python/examples/deploy/quick_start/main_entry.py
@@ -0,0 +1,27 @@
from fedml.serving import FedMLPredictor
from fedml.serving import FedMLInferenceRunner


class Bot(FedMLPredictor):  # Inherit from FedMLPredictor
    def __init__(self):
        super().__init__()

        # --- Your model initialization code here ---

        # -------------------------------------------

    def predict(self, request: dict):
        input_dict = request
        question: str = input_dict.get("text", "").strip()

        # --- Your model inference code here ---
        response = "I do not know the answer to your question."
        # ---------------------------------------

        return {"generated_text": f"The answer to your question {question} is: {response}"}


if __name__ == "__main__":
    chatbot = Bot()
    fedml_inference_runner = FedMLInferenceRunner(chatbot)
    fedml_inference_runner.run()
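The request/response contract of this template can be sanity-checked without the fedml runtime installed. The sketch below mirrors `predict()` in a plain class; `BotLogic` is a hypothetical stand-in for illustration, not part of the FedML API:

```python
class BotLogic:
    """Hypothetical stand-in that mirrors Bot.predict without fedml installed."""

    def predict(self, request: dict) -> dict:
        # Same logic as the template above: read "text", strip whitespace,
        # and wrap a canned response into the expected output key.
        question: str = request.get("text", "").strip()
        response = "I do not know the answer to your question."
        return {"generated_text": f"The answer to your question {question} is: {response}"}


out = BotLogic().predict({"text": "  What is TensorOpera?  "})
print(out["generated_text"])
# → The answer to your question What is TensorOpera? is: I do not know the answer to your question.
```

Note that the contract is dict-in, dict-out: callers should expect the reply under the `"generated_text"` key, matching the template.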
Empty file.
Empty file.
Empty file.
68 changes: 0 additions & 68 deletions python/examples/deploy/quick_start/src/app/pipe/constants.py

This file was deleted.

