Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Azure automated deployment for OPEA applications - Infosys #629

Merged
merged 16 commits into from
Dec 11, 2024

Conversation

kkrishTa
Copy link
Contributor

@kkrishTa kkrishTa commented Dec 6, 2024

Description

Infosys contribution to OPEA.
Azure AKS terraforms and steps to execute to deploy OPEA applications.

Issues

NA

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)

Dependencies

None.

Tests

Tested terraforms on Azure platform.

@kkrishTa kkrishTa changed the title AKS Deployment for OPEA applications - Infosys AKS automated deployment for OPEA applications - Infosys Dec 6, 2024
@kkrishTa kkrishTa changed the title AKS automated deployment for OPEA applications - Infosys Azure automated deployment for OPEA applications - Infosys Dec 6, 2024
Copy link

@arun-gupta arun-gupta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

I did not try out the instructions but leaving comments based upon reading.

@@ -0,0 +1,85 @@
# OPEA applications Azure ASK deployment guide

This guide shows how to deploy OPEA applications on Azure Kubernetes Service (ASK) using Terraform.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASK -> AKS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-commit bot has changed "AKS" -> "ASK" amd "aks" -> "ask".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about that consistent error!!


## Prerequisites

- Access to Azure ASK

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AKS


## Setup

The setup uses Terraform to create ASK cluster with the following properties:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AKS - this is a global search and replace through out this doc

The setup uses Terraform to create ASK cluster with the following properties:

- 1-node ASK cluster with 50 GB disk and `Standard_D32d_v5` SPOT instance (16 vCPU and 32 GB memory)
- Cluster autoscaling up to 10 nodes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this autoscaling to 10 nodes?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is reasonable number.


The setup uses Terraform to create ASK cluster with the following properties:

- 1-node ASK cluster with 50 GB disk and `Standard_D32d_v5` SPOT instance (16 vCPU and 32 GB memory)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a SPOT instance? Can this be a standard virtual machine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To support cost optimization, we have mentioned SPOT. Since instance is a variable, we will mention standard/SPOT in the readme file and mention about cost optimization in the variable file.

- Cluster autoscaling up to 10 nodes
- Storage Class (SC) `azfs-sc` and Persistent Volume Claim (PVC) `model-volume` for storing the model data
- `LoadBalancer` address type for the service for external consumption
- Updates the kubeconfig file for `kubectl` access

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what needs to be done here?


```bash
kubectl get pod -n chatqna
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to state that "ensure all pods are running or show 1/1 or something similar"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure Will add the statement.

@lianhao lianhao requested review from poussa and removed request for daisy-ycguo December 9, 2024 00:57
Copy link
Collaborator

@poussa poussa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just change the cluster name from opea-chatqna to opea.

@@ -0,0 +1,6 @@
cluster_name = "opea-chatqna"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use cluster name opea. The AWS/EKS cluster will change to use that name as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated cluster name to opea

@kkrishTa
Copy link
Contributor Author

Hello @poussa ,
Review comments are fixed. Could you please approve and merge the code?

@poussa poussa requested a review from lianhao December 10, 2024 07:19
@poussa
Copy link
Collaborator

poussa commented Dec 10, 2024

Hello @poussa , Review comments are fixed. Could you please approve and merge the code?

I have already approved. We need a 2nd approver. I'll work on that now.

@poussa
Copy link
Collaborator

poussa commented Dec 10, 2024

@kkrishTa can you also fix the DCO. That is, add signed-off-by line to the commit (-s).

@kkrishTa
Copy link
Contributor Author

@kkrishTa can you also fix the DCO. That is, add signed-off-by line to the commit (-s).

@poussa Sure. Fixed this one.

@kkrishTa
Copy link
Contributor Author

Hello @poussa / @lianhao ,
The pipeline got failed with an error:
/home/runner/work/GenAIInfra/GenAIInfra/docs/_build/rst/GenAIInfra/cloud-service-provider/azure/aks/terraform/README.md: WARNING: document isn't included in any toctree

But I see that ".md files not in toc tree" is ignored as per https://github.com/opea-project/docs/blob/main/.known-issues/sphinx.conf. Though this is a warning, the pipeline step has failed.

Could you please advise fix on this?

@lianhao
Copy link
Collaborator

lianhao commented Dec 10, 2024

Could you please advise fix on this?

@poussa will create another PR to add the doc link to those exiting md files under cloud-service-provider directory, then you can add the link follow his convention.

It's a newly added doc CI check complaining.

@poussa
Copy link
Collaborator

poussa commented Dec 10, 2024

@kkrishTa once this PR#633 is merged, you can update yours to add link to Azure/AKS readme. Then the ci/cd is happy and we can merge.

lianhao and others added 11 commits December 10, 2024 21:23
* helm: Add service account support in common services

1. Add service account creation support, disabled by default.

2. Add support of sharing the same service account by setting
   global.sharedSAName, disabled by default.

Signed-off-by: Lianhao Lu <[email protected]>

* helm: Add service account support in e2e charts

1. Add service account creation support, enabled by default.

2. Add support of sharing the same service account by setting
   global.sharedSAName, enabled by default.

Signed-off-by: Lianhao Lu <[email protected]>

---------

Signed-off-by: Lianhao Lu <[email protected]>
Signed-off-by: kkrishTa <[email protected]>
* README: add links to terraform docs

Signed-off-by: Sakari Poussa <[email protected]>

* README: fix broken links

Signed-off-by: Sakari Poussa <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Sakari Poussa <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: kkrishTa <[email protected]>
@poussa
Copy link
Collaborator

poussa commented Dec 10, 2024

@kkrishTa can you resolve the conflict, please?

@poussa
Copy link
Collaborator

poussa commented Dec 10, 2024

The readme filename is README.md not README.MD as it is in the link now. That's why ci/cd is still failing.

One more fix and we are there...

@lianhao
Copy link
Collaborator

lianhao commented Dec 11, 2024

The doc team is investigating why the Check Online Document Building CI fails now. Let's hold on this until they figure out the solution.

Copy link
Collaborator

@mkbhanda mkbhanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -0,0 +1,85 @@
# OPEA applications Azure ASK deployment guide

This guide shows how to deploy OPEA applications on Azure Kubernetes Service (ASK) using Terraform.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about that consistent error!!

@mkbhanda
Copy link
Collaborator

Let me find someone on the docs team to help on this @kkrishTa. You have been most patient with our CICD tests.

@lianhao
Copy link
Collaborator

lianhao commented Dec 11, 2024

@kkrishTa The doc teams has resolved the doc build failure. For the Check Paths and Hyperlinks / check-the-validity-of-hyperlinks-in-README issue, could you please use relative link instead, just like PR #634

@kkrishTa
Copy link
Contributor Author

kkrishTa commented Dec 11, 2024

@kkrishTa The doc teams has revolved the doc build failure. For the Check Paths and Hyperlinks / check-the-validity-of-hyperlinks-in-README issue, could you please use relative link instead, just like PR #634

sure @lianhao . Once #634 is merged, I will pull the latest and update the link apporpriately.

@lianhao
Copy link
Collaborator

lianhao commented Dec 11, 2024

@kkrishTa The doc teams has revolved the doc build failure. For the Check Paths and Hyperlinks / check-the-validity-of-hyperlinks-in-README issue, could you please use relative link instead, just like PR #634

sure @lianhao . Once this is merged, I will pull the latest and update the link apporpriately.

No need to wait for PR #634, just go ahead to fix your PR by using relative links to newly added .md files. Let's get your PR to land-in first.

Signed-off-by: kkrishTa <[email protected]>
@kkrishTa
Copy link
Contributor Author

@kkrishTa The doc teams has revolved the doc build failure. For the Check Paths and Hyperlinks / check-the-validity-of-hyperlinks-in-README issue, could you please use relative link instead, just like PR #634

sure @lianhao . Once this is merged, I will pull the latest and update the link apporpriately.

No need to wait for PR #634, just go ahead to fix your PR by using relative links to newly added .md files. Let's get your PR to land-in first.

Updated readme. Since it has common read me, upadted other relative links as in #634 without breaking AWS readme link.

@mkbhanda mkbhanda merged commit e9dc58a into opea-project:main Dec 11, 2024
7 checks passed
@joshuayao joshuayao added this to the v1.2 milestone Dec 11, 2024
@joshuayao joshuayao added the feature New feature or request label Dec 15, 2024
@joshuayao joshuayao changed the title Azure automated deployment for OPEA applications - Infosys [Feature] Azure automated deployment for OPEA applications - Infosys Dec 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

7 participants