Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Database layer with doctrine/dbal #6632

Closed
dhensby opened this issue Feb 17, 2017 · 26 comments
Closed

Replace Database layer with doctrine/dbal #6632

dhensby opened this issue Feb 17, 2017 · 26 comments

Comments

@dhensby
Copy link
Contributor

dhensby commented Feb 17, 2017

More and more we're seeing issues raised about our DB layer and it not working under specific circumstances. Our DB layer is also in need of a slight refactor to properly separate concerns and stop us hard coding logic related to DB drivers into our generic connectors.

Rather than spending our time re-factoring our DB connectors and maintaining an incomplete set of DB drivers I suggest we move our DB layer to a thirdparty - this will have the advantages of:

  1. Reducing maintenance burden on core team - We currently struggle with the maintenance of external DB adapters especially MSSQL
  2. Increase support of other DB drivers - The number of DB drivers we offer is limited and can be increased by moving to a thirdparty lib

I've looked at Laravel's Illuminate and Doctrine's DBAL. Laravel's database component is far too tightly coupled to the Eloquent ORM - we'd have to pull it in even though we aren't going to use it.

Looking into doctrine/dbal and it's got a really nice API and a very similar architecture to our current DB layer. It looks like a pretty awesome and well architected DB layer as well as giving us the advantage of a lot of stock drivers and any other open-source drivers that have subsequently been written.

If we're able to swap out our DB layer, we'll drop a lot of hand-rolled code whilst improving functionality and focusing on adding value.

Other advantages:

  1. More easily move to database migrations (https://github.com/doctrine/migrations/)
  2. Defer DB connections to be "on demand" rather than connecting on every request

Issues:

  1. There is no Enum support in doctrine/dbal and so we'll have to move this logic to the application layer.*

*Personally, I think this is something that we should be doing anyway as Enums can be painful for a number of reasons see doctrine's docs on enums

@dhensby dhensby added this to the CMS 4.0.0-beta1 milestone Feb 17, 2017
@sminnee
Copy link
Member

sminnee commented Feb 17, 2017

Dan, could you outline which classes you'd see as being replaced by dbal, and where the coupling points to dbal woud be?

@dhensby
Copy link
Contributor Author

dhensby commented Feb 17, 2017

I think this work is something that will need to be done in steps. In the long term I'd like to see pretty much everything in our SilverStripe\ORM\Connect namespace go. However, this is a lot of work and I'm starting by aiming at keeping most of our SQL generation and simply using doctrine/dbal to execute the queries for us.

The more I'm looking into doctirne/dbal the more I'm seeing how powerful it is and that it appears that it will solve some problems for us or, at least, make it simpler to do things we do now. Automatica aliasing of table names is one example.

I don't see a big change in coupling points; anywhere we currently couple to DBQueryBuilder we'd likely replace with the dbal query builder. So the main coupling points would be DataQuery, the DBField classes (they need to present represent themselves as columns in the DB table, as they do already) and SilverStripe\ORM\Filters would also need to be able to build WHERE clauses.

@sminnee
Copy link
Member

sminnee commented Feb 19, 2017

Ok I'm not opposed to this, but I'm not sure I'd rate its priority highly for SS4. My instinct is that it's probably more work to do this that fix the issues we already know about.

That said, if you were going to drive this, you'd get to make the priority call. ;-)

I've got mixed feelings about doing a half-job of this to hit the SS4 timeline too. It depends a bit on what a "half job" would look like but I'd be more keen to leave SS4 in a consistent state than to just try and squeeze something in.

I'd also note that I'm much more nervous about replacing the ORM, but you seem to be more or less in agreement with that too.

@dhensby
Copy link
Contributor Author

dhensby commented Feb 20, 2017

My instinct is that it's probably more work to do this that fix the issues we already know about.

That may well be the case; but I think this is the wrong way to approach. Whilst the cost to solve the known issues now may be less than integrating a 3rd party lib, the integration will pay-off in the long run, especially if SS4 is going to be an LTS release.

Having said that, I don't think it should be a priority for SS4 - there's only so much that can be done in time for the release.

I've been taking a look at integrating it in my evenings and it does look very nice and I've got the CMS loading and so on, so I'll spend some more time looking into it but with no commitment to being able to complete for the beta.

@dhensby
Copy link
Contributor Author

dhensby commented Feb 20, 2017

RE the ORM - yes, I don't think that's anything we can remove any time soon.

Laravel's is the closest in terms of syntax and familiarity, but there's no MTI. No matter what we'd be maintaining our own extension of another ORM if that were ever to happen... I'm focused on the areas where we have little differentiation from other libraries.

Right now, doctrine/dbal looks like a great fit.

@stevie-mayhew
Copy link
Contributor

I haven't used either of those database connectors extensively so can't really comment on the specifics for each, but I think that the idea of changing out the DB connector is a good one for the exact reasons you've outlined.

@sminnee
Copy link
Member

sminnee commented Feb 20, 2017

I'd merge a PR if you did this, but it won't get into open sourcerers' backlog.

@sminnee sminnee removed this from the CMS 4.0.0-beta1 milestone Mar 6, 2017
@sminnee
Copy link
Member

sminnee commented Mar 6, 2017

This won't block the beta1 milestone. If you manage to do this, then great, otherwise it'll likely be an SS5 item.

@dhensby
Copy link
Contributor Author

dhensby commented Mar 8, 2017

From my initial investigation work on this there are a few key areas where doctrine/dbal doesn't work quite the same way that our DB layer does or has "missing features".

Potential blockers:

  1. Renaming columns doesn't seem to be possible in the DBAL
  2. Fixing table case is not currently possible because change detection on tables is done in a case-insensitive manner (see issue response from dbal)
  3. No abstract way to get the RAND() method for a DBMS
  4. Can't find a cross DBMS way to select a database. When connecting to a DB, if we supply the DB name and it doesn't exist, we get an error - so we have to reconnect without the DB name and then create and select it, this appears problematic at the moment.
  5. Enums are not supported
  6. Can't set a select statement to be DISTINCT in the query builder

High effort upgrades:

  1. Move DB query building to the DBAL query builder (though creating our own DQL builder may be better)
  2. Have to use Convert::symbol2sql() for quoting columns (ANSI mode is not on by default with dbal because of "inconsistencies" so we can't hard code " as our identifier quote character)
  3. Can't "convert" an SQLExpression object from one type to another (currently we can build an SQLUpdate query and turn it into an SQLSelect)

Benefits of doctrine/dbal:

  1. Closely aligned APIs for DB abstraction (similar architecture of classes as well as method names)
  2. On demand DB connection (doesn't connect until a query needs to be run)
  3. Good typing of data pulled out of the DB (currently all data is pulled out as string, dbal will type it correctly for us)
  4. Simple to use query builder with easy parametrisation including support for: array expansion, and named and positional parameter tracking.

@sminnee
Copy link
Member

sminnee commented Mar 8, 2017

OK, based on that feedback, particularly point 2 of "high-effort upgrades", I can't see this happening in SS4.

On the potential blockers:

  • 1: I'm not sure that we use renaming in dev/build, and if we do, I expect it's only to shuffle obsolete columns out the way. If people use it in their code it'll just be for requireDefaultRecords() migrations, and presumably we could switch to a migration system based on dbal
  • 2: If that means that tables are consistently accessible in a case-insensitive manner then perhaps that's okay?
  • 3: Seems like a PR we could supply to DBAL, although I should note that ORDER BY RAND(), which is probably our use case, is an anti-pattern (prevents use of indexes) and it might be better to avoid its use.
  • 4: I don't see an issue with try { /* connect */ } catch { /* create */ /* connect */ }. The code around switching databases for temp dbs and the like can do with a clean-up
  • 5: Guh, WTF? PR to DBAL seems like the way ahead there.
  • 6: We can probably do better than these DISTINCT queries, with tighter GROUP BY controls. We may already have that in place (the DISTINCT goes way back to SS2.0 days from memory).

@sminnee
Copy link
Member

sminnee commented Mar 8, 2017

On the benefits: I see point 2 as being achieved by moving the database to an Injector service, which is something that is much more likely for SS4.

@sminnee
Copy link
Member

sminnee commented Mar 8, 2017

If the ORM were split into a separate composer package, then you could potentially use dev-master of that package and the 4.x versions of all other packages, which would mean that a re-write based on this would be something you'd have to wait as long to use.

@dhensby
Copy link
Contributor Author

dhensby commented Mar 8, 2017

Blockers:

  1. We use it to name columns _obsolete_[colname]. I'm not really sure why this isn't in the dbal core - I understand that from schema migration you can't tell if a col is renamed or just removed, but you should still be able to rename a column explicity IMO. There's nothing stopping us implementing a column renaming method ourselves.
  2. I'm afraid it doesn't. We'd have to look at some other kind of work around.
  3. Yes, they probably don't have it for this reason, it's not great practice, we should probably discourage its usage
  4. Agreed.
  5. Doctrine have explained their logic and I agree with their points. I've thought for some time we should move the enum DBField to the application layer - this isn't really a blocker, we just need to implement it in the app layer
  6. agreed

If the ORM were split into a separate composer package, then you could potentially use dev-master of that package and the 4.x versions of all other packages, which would mean that a re-write based on this would be something you'd have to wait as long to use.

That's true and I'd like that myself, but we'd have to decide how granular we really want to split the framework... A discussion for another thread, perhaps.

@dhensby
Copy link
Contributor Author

dhensby commented Mar 8, 2017

In terms of high effort upgrade points, really the solution is to improve the abstraction layer (DataQuery) between our ORM and the DB queries so that no one is writing SQL in the application. This will remove the need for anyone to be touching Convert::symbol2sql(), devs won't need the DB QueryBuilder nor the SQLConditionalExpression class.

@wakes
Copy link

wakes commented Jun 1, 2017

Have you looked at readbeanphp http://redbeanphp.com/index.php? Very lightweight, manages the schema for you (unless you say 'freeze' e.g. in test/live environments), I've used in play code and great for prototyping, not sure re production, performance etc.

@Firesphere
Copy link
Contributor

I've had a few plays with redbean, it's pretty nice, but I'm not well-informed enough to know if it suits SilverStripe

@dhensby
Copy link
Contributor Author

dhensby commented Jun 21, 2017

I haven't looked at redbean.

I've take a quick browse now but it doesn't look quite right for SS. doctrine/dbal works really well with our current approach

@sminnee sminnee added this to the Recipe 5 milestone Jun 29, 2017
@tractorcow
Copy link
Contributor

tractorcow commented Feb 12, 2018

Summary of stuff @dhensby and I discussed on slack:

  • No more quotes for identifiers anywhere; ANSI sql or not
  • Shift all query objects to properly constructed object representation of SQL queries that can be augmented in a type-gnostic fashion; No more SQL strings anymore.
  • field / table rewriting must still work (e.g. for versioned / fluent) but it will operate on specific logical steps instead of string manipulations. E.g. rewrite field vs rewrite string
  • Middleware for object -> sql generation
  • All identifiers lowercase, some field name (foreignkey_id) conventions changed.
  • DB abstraction will probably be shifted to inside doctrine/dbal instead of outside it.
  • I think it would be cool to do closure -> sql transpiling, probably never going to get around to it. :)

On enums, they aren't supported well in many DB connectors anyway. I don't mind it being supported above the DB level if necessary.

Framework also has multi enum, which is again harder to support.

@sminnee
Copy link
Member

sminnee commented Feb 13, 2018

No more quotes for identifiers anywhere; ANSI sql or not

You need quotes for capital letters. Although you're recommending that capital letters are removed from identifiers, you're likely to get edge-case issues for example in the code that migrates from the old identifier names to the new.

As such I'd recommend that anything that auto-generates SQL only adds quotes to identifiers that contain a capital letter or other reserved character, rather than avoiding them entirely.

@tractorcow
Copy link
Contributor

tractorcow commented Feb 13, 2018

Apparently quoting happens internally in doctrine so not anything we need to worry about. :)

I'm not sure I understand the significance of capital letters in identifiers. @dhensby says we need to use all lowercase from here on anyway. I think we probably just strtolower in the sql generation (middleware?).

@dhensby
Copy link
Contributor Author

dhensby commented Feb 13, 2018

As such I'd recommend that anything that auto-generates SQL only adds quotes to identifiers that contain a capital letter or other reserved character, rather than avoiding them entirely.

doctrine/dbal does this automatically (at least for reserved words).

You can see their response to a ticket I opened a while ago here: doctrine/dbal#2710 (comment)

for example in the code that migrates from the old identifier names to the new.

We will need to write this kind of migration manually, I'm afraid. The dbal layer doesn't provide a way to distinguish between different cased identifiers in the abstraction layer. This could be part of upgrader 2.0.

It'll also spell the end of case correction logic for table names / columns.

@robbieaverill
Copy link
Contributor

I've raised an RFC to remove support for enums at #8401

@sminnee
Copy link
Member

sminnee commented Aug 25, 2020

This hasn't been touched in a few years, and I want to flag that I question whether the benefit of this will ever be worth the upgrade pain that it would inevitably cause. The fact is that the current base access layer, while not perfect, is not in my view a particularly large source of either bugs or or maintenance (feel free to challenge that).

I noted on #9520 a comment from @dnsl48:

I agree with @dhensby that it may not be worthwhile without introducing a proper dbal first.

As such, I wonder if keeping this ticket open as a "someday, maybe" item is preventing appropriate decision-making on other issues.

@dnsl48
Copy link
Contributor

dnsl48 commented Aug 25, 2020

Regarding #9520, I don't think this issue affected my judgement there :)
I was rather thinking about the suggested feature increasing complexity of the existing code base and its maintainability.

@sminnee
Copy link
Member

sminnee commented Aug 25, 2020

Ah, right :)

@dhensby
Copy link
Contributor Author

dhensby commented Aug 26, 2020

I agree there probably isn't much value in keeping this open. It's a huge amount of work. I've actually had a DBAL rewrite done and then got most of the tests green only to find our approach of using case sensitive column names was actually a bit of a problem and that using ansi-quotes was also an issue. This drastically increased the scope of the work and I haven't been able to progress it since then.

To get a new DB layer in we actually need a proper domain specific querying layer (which we almost have with DataQuery) except that we also allow raw SQL to be injected and that needs to be replaced with an abstracted layer.

All-in-all, it's going to need some proper concerted effort and a hell-of-a-lot of cross-module breaking changes...

@dhensby dhensby closed this as completed Aug 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants