Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add persistence package #107

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Add persistence package #107

wants to merge 1 commit into from

Conversation

jraddaoui
Copy link
Contributor

@jraddaoui jraddaoui commented Jan 14, 2025

This is the first of at least three PRs to address #106. The next PR will
do the actual check in workflow (hopefully it can happen before this
week release), and the other one will deal with versioned migrations.


Create a persistence package for database integration using Ent, with
support for SQLite and MySQL databases. Define database schema with
a single sip table with name and checksum columns that will be
used to check for duplicates at the bigginging of the preprocessing
workflow. The persistence layer will only be used when checkDuplicates
is set to true in the configuration.

Update the dev env/overlay to use an SQLite database and the Enduro
env/overlay to use MySQL. Enable cgo at build time and install gcc
and musl-dev in the build Docker image to be able to build an static
binary that uses github.com/mattn/go-sqlite3.

@jraddaoui jraddaoui self-assigned this Jan 14, 2025
@jraddaoui jraddaoui force-pushed the dev/issue-106-duplicates branch from 875ee0b to 41b4a87 Compare January 14, 2025 07:29
Create a persistence package for database integration using Ent, with
support for SQLite and MySQL databases. Define database schema with
a single `sip` table with `name` and `checksum` columns that will be
used to check for duplicates at the bigginging of the preprocessing
workflow. The persistence layer will only be used when `checkDuplicates`
is set to `true` in the configuration.

Update the dev env/overlay to use an SQLite database and the Enduro
env/overlay to use MySQL. Enable `cgo` at build time and install `gcc`
and `musl-dev` in the build Docker image to be able to build an static
binary that uses `github.com/mattn/go-sqlite3`.

[skip-codecov]
@jraddaoui jraddaoui force-pushed the dev/issue-106-duplicates branch from 41b4a87 to a1e1c51 Compare January 14, 2025 21:52
@jraddaoui jraddaoui changed the title WIP: Add persistence package and duplicates check Add persistence package Jan 14, 2025
@jraddaoui
Copy link
Contributor Author

This is ready for code review @djjuhasz. @mcantelon this could be a good PR to know a bit more about the persistence layer/package in other projects where we use Ent/Atlas.

@sevein I'd also like that you take a look when you have some time, specially about the cgo changes at build time to be able to use SQLite, I see other projects with support for it but not in the build process, I guess just for tests. If we see an issue with this approach we could just build as before and leave the SQLite side just for the tests.

@jraddaoui jraddaoui marked this pull request as ready for review January 14, 2025 22:14
Copy link
Contributor

@djjuhasz djjuhasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me @jraddaoui. 👍

I guess adding a database was inevitable, but it was nice to avoid it as long as we did. 😉

@@ -9,6 +9,12 @@ stringData:
verbosity = 2

sharedPath = "/home/preprocessing/shared"
checkDuplicates = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want checkDuplicates to be false in the dev environment?

// be required, and a SIP that has already been processed will fail the
// preprocessing workflow.
CheckDuplicates bool
Persistence persistence.Config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be a blank line after CheckDuplicates and no blank line after Persistence.

case "mysql":
db, err = OpenMySQL(ds)
case "sqlite3":
db, err = sql.Open(driver, ds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put all this in an OpenMySQL(ds) function like with MySQL?

}

// OpenMySQL opens the MySQL database driver.
func OpenMySQL(ds string) (*sql.DB, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be exported?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants