Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Monorepo Setup & Separate by Role #133

Merged
merged 33 commits into from
Feb 6, 2024
Merged

Fix Monorepo Setup & Separate by Role #133

merged 33 commits into from
Feb 6, 2024

Conversation

arpitjasa-db
Copy link
Collaborator

@arpitjasa-db arpitjasa-db commented Jan 9, 2024

Changes

This PR makes multiple changes in line with the Separate by Role Project Proposal to separate CICD generation from project generation to allow monorepos to work and making the division more intuitive:

  • Add a new parameter input_setup_cicd_and_project that will allow three parameters: CICD_and_Project (the current default flow), Project_Only which sets up only project files, and CICD_Only which only sets up the root directory and provides a workflow to generate CI/CD files for each project in the directory.
  • Update CI/CD workflows pull requests to only trigger if files within respective project folder are modified in a PR.
  • Update docs to clarify how to set this up.

Fixes #122

Also includes various quality-of-life improvements and minor bug fixes such as constant config versions, clean-up of unused variables, and proper file generation.

Tests

Project Generation Workflow:

Screenshot 2024-01-25 at 2 53 22 PM Screenshot 2024-01-26 at 2 55 57 AM Screenshot 2024-01-25 at 2 54 16 PM

CICD Zip Generation Workflow

Screenshot 2024-02-03 at 2 08 59 AM

@vladimirk-db vladimirk-db self-requested a review January 9, 2024 19:48
Copy link
Contributor

@mingyu89 mingyu89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach of copying the CICD files of the first project will not work well because it assumes that all the projects have the same CICD platform and feature store selections, which may not be the case.

@arpitjasa-db arpitjasa-db requested a review from mingyu89 January 11, 2024 00:19
@arpitjasa-db arpitjasa-db requested a review from mingyu89 January 11, 2024 03:07
template/{{.input_root_dir}}/README.md.tmpl Outdated Show resolved Hide resolved
template/{{.input_root_dir}}/README.md.tmpl Outdated Show resolved Hide resolved
template/{{.input_root_dir}}/README.md.tmpl Outdated Show resolved Hide resolved
@arpitjasa-db arpitjasa-db changed the title Fix Monorepo Setup Fix Monorepo Setup & Separate by Role Jan 25, 2024
description: |
MLflow registered model for the "{{template `project_name` .}}" ML Project. See the corresponding [Git repo]($#{var.git_repo_url}) for details on the project.

Links:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing these links since var.git_repo_url has been broken for a while, and linking to these jobs does not merit providing databricks_prod_workspace_host as a variable.

@niall-turbitt
Copy link
Contributor

testing this out and have taken the following workflow:

  • create a project with cicd plus project (intending to make this a monorepo)
  • add second project, using project_only option

All works fine at this stage. I have a mono repo with 2 projects and CI/CD for the first project. I now want to add the CI/CD pipelines for the second project, and run through initialization again, selecting CICD_only, and setting the input_root_dir to the monorepo project root dir

  • I get an error at this point, that there is already a .gitignore file
    image
  • I try deleting this gitignore file and try the same setup, but this time get an error that the readme exists
    image

What should be the intended workflow to add the CI/CD pipelines for a second project in a monorepo?

@niall-turbitt
Copy link
Contributor

I'm getting this fs_param.json file generated for projects, when feature store is not selected:
image

Can we remove this?

@arpitjasa-db
Copy link
Collaborator Author

I'm getting this fs_param.json file generated for projects, when feature store is not selected: image

Can we remove this?

Yeah good question, we could remove it and check for the presence/absence of the file instead in our workflows (but that does complicate the code a bit). Or we can rename is to project_params.json. What would be better?

@niall-turbitt
Copy link
Contributor

in the generated top level README, it looks like we're generated an additional ( under the Setting up CI/CD section
image

Copy link
Contributor

@mingyu89 mingyu89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass and didn't see any major issue. This is indeed a complex PR.

@mingyu89 mingyu89 self-requested a review February 1, 2024 10:26
@mingyu89 mingyu89 dismissed their stale review February 1, 2024 10:27

The preview comments has been addressed

Copy link
Contributor

@mingyu89 mingyu89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really a huge improvement of the CUJs.

@niall-turbitt
Copy link
Contributor

I'm getting this fs_param.json file generated for projects, when feature store is not selected: image
Can we remove this?

Yeah good question, we could remove it and check for the presence/absence of the file instead in our workflows (but that does complicate the code a bit). Or we can rename is to project_params.json. What would be better?

project_params.json seems like a good option

@arpitjasa-db arpitjasa-db requested a review from mingyu89 February 3, 2024 10:11
@niall-turbitt
Copy link
Contributor

couple of minor things that need to be fixed in order to get the deploy cicd pipeline running for Azure DevOps. Aside from those, looks good. Thanks for this @arpitjasa-db , appreciate this was a tricky one to work through!

Set up a mono repo to test here https://dev.azure.com/niallturbitt/_git/monorepo-mlops-stacks - have added you as an admin Arpit

@niall-turbitt
Copy link
Contributor

just catching this. For the second project I generated there were no staging or prod profiles defined in the databricks.yaml file. Understand the reasoning for this; a data scientist adding a second project may not know up front the staging/prod workspace URLs.

image

What this means though, is that when the CI/CD pipelines are triggered for the second project, there is not staging/prod profiles defined for the bundles CI/CD pipeline to use

image

We either need to somehow automate the addition of staging/test profiles to the databricks.yaml file of the new project whenever we are running the deploy cicd pipeline for the new project. Or direct the user to do so somewhere in documentation

@arpitjasa-db
Copy link
Collaborator Author

just catching this. For the second project I generated there were no staging or prod profiles defined in the databricks.yaml file. Understand the reasoning for this; a data scientist adding a second project may not know up front the staging/prod workspace URLs.

image

What this means though, is that when the CI/CD pipelines are triggered for the second project, there is not staging/prod profiles defined for the bundles CI/CD pipeline to use

image

We either need to somehow automate the addition of staging/test profiles to the databricks.yaml file of the new project whenever we are running the deploy cicd pipeline for the new project. Or direct the user to do so somewhere in documentation

@niall-turbitt I actually added the profile generation as part of the deploy CICD pipeline. Did it not work?

@arpitjasa-db
Copy link
Collaborator Author

@niall-turbitt I actually added the profile generation as part of the deploy CICD pipeline. Did it not work?

Ah actually I see we forgot to include that file when we did git add. It should be fixed now! Also made the other changes you had in the monorepo I think. Would you mind taking a final pass @niall-turbitt and see if all the issues you found were fixed? Thank you!

git add .azure
git commit -m "Add CICD for {{`${{ parameters.ProjectName }}`}}"
git push origin add-cicd-for-{{`${{ parameters.ProjectName }}`}}
displayName: Push CICD Bundle to a New Branch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to be annoying with Azure DevOps...

In order to run the Push CICD Bundle to a New Branch step, we will need to enable the project Build Service to be able to contribute and create a branch for the project. i.e when the deploy CI/CD pipeline is triggered, the build service that runs this pipeline needs the necessary permissions to be able to push

To do this, you need to do the following: go to Project Settings -> Repositories -> Security -> Select <project_name> Build Service under users and set “Contribute”, “Create Branch”, and "Contribute to pull requests" to “Allow”

@arpitjasa-db this will only be applicable to the Monorepo case where you have an instantiate project, have added a second project and then want to set up CI/CD for this second project. I would suggest we add this to the Configure CI/CD - Azure DevOps docs in mlops-setup.md

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also hitting issues on this with the push.

Resolved this with : git push --force origin add-cicd-for-${{ parameters.ProjectName }}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit concerned about adding --force on the off-chance someone has the same branch or has force pushes disabled on their repository. Seems a bit weird to need to do add this flag

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added --force-with-lease to prevent the overwriting possibility, and someone can just update the pipeline to remove this if they have force disabled on the whole repo/org

@niall-turbitt
Copy link
Contributor

niall-turbitt commented Feb 5, 2024

@arpitjasa-db the above comments I had added but didn't properly publish, so don't think you caught them. I see the refs/heads/feature-branch-pr branch issue has already been resolved.

The only other thing I think needs addressing is this point around documenting the permissions for the build service to be able to run the deploy CICD pipeline successfully

@niall-turbitt
Copy link
Contributor

niall-turbitt commented Feb 5, 2024

one other minor nit that I think could be confusing for users is the following in a monorepo setup:

  • I am adding a second project to a monorepo, and I specify the project name as project-2 during initialization
  • This is added under a folder project_2 in the monorepo
  • When running the deploy CICD pipeline to set up the CICD pipelines for project 2 we have to set the name of the project via a param:
parameters:
  - name: ProjectName
    displayName: Name of the project to deploy CI/CD for.
    default: mono-mlops-stacks
    type: string

From the default mono-mlops-stacks I would assume this param should be project-2 to set up for the new project. However this param has to be project_2 to map to the folder

@arpitjasa-db
Copy link
Collaborator Author

one other minor nit that I think could be confusing for users is the following in a monorepo setup:

  • I am adding a second project to a monorepo, and I specify the project name as project-2 during initialization
  • This is added under a folder project_2 in the monorepo
  • When running the deploy CICD pipeline to set up the CICD pipelines for project 2 we have to set the name of the project via a param:
parameters:
  - name: ProjectName
    displayName: Name of the project to deploy CI/CD for.
    default: mono-mlops-stacks
    type: string

From the default mono-mlops-stacks I would assume this param should be project-2 to set up for the new project. However this param has to be project_2 to map to the folder

@niall-turbitt good catch! Added a step in GH and ADO to create a underscore version of the project name and use that for the directory instead so they can continue using the normal name

@arpitjasa-db arpitjasa-db merged commit a684031 into main Feb 6, 2024
1 check passed
@arpitjasa-db arpitjasa-db deleted the fixMonorepo branch February 6, 2024 03:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

monorepo doesn't work as expected
5 participants