Simplify code in the repository #48

Closed
atiftw opened this issue Jan 30, 2024 · 1 comment · Fixed by #57

Comments

atiftw commented Jan 30, 2024

These are the high-level proposed changes, listed before we start working on them.

Remove Docker support for people running on their local machines only; move the Docker files into a subfolder to keep Gitpod support

From the discussion: we should not support Docker, as it complicates the setup for most people, especially on Windows. Many people who show up for the interview already know how to set up Spark on their machines. One outcome of this is that

Remove batect for running tests

Not many people are familiar with batect, and they find it hard to work their way around it. Replace it completely with pytest.

Code refactoring

  1. src/it - this should ideally be in src/test/it, which is the more standard convention for Scala
  2. Remove the geo location from package names and rename organization := "com.thoughtworks.cd.de" to organization := "com.thoughtworks"
  3. For the Scala version, remove all curried functions and implicit classes. For Python, simplify the code further and reduce method bloat.

This is just the general set of things. Let's add more to the list, and we can start merging these changes one by one. This work runs in parallel to the changes being planned to support Databricks. Please add your thoughts. @lauris-tw @lleites @darshanj @jmolina4 @svishal9

Contributor

svishal9 commented Feb 6, 2024

Docker/Gitpod: Different people have different levels of familiarity with the code. I feel we should be open to more ways of interacting with the code, and hence keep Docker and the other existing options while adding new ones.
While we are talking about adding options, it would be great to add CI/CD for executing tests as well.

Batect: This was more of a dogfooding exercise. We were contributing to batect, and this was a chance to use the simplification it offers. Having said that, I am more than happy to support a combination of pytest and PySpark tests if batect is adding complexity.

Code refactoring: +1 to all points. We might also look at some of the tech debt.
