Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot test database-created objects in the context of a webtest #15634

Closed
miketheman opened this issue Mar 19, 2024 · 12 comments · Fixed by #16031
Closed

Cannot test database-created objects in the context of a webtest #15634

miketheman opened this issue Mar 19, 2024 · 12 comments · Fixed by #16031
Labels
developer experience Anything that improves the experience for Warehouse devs help needed We'd love volunteers to advise on or help fix/implement this. testing Test infrastructure and individual tests

Comments

@miketheman
Copy link
Member

Here's an example test case:

from tests.common.db.accounts import UserFactory

def test_user_profile(db_session, webtest):
    user = UserFactory.create()
    assert user.username
    # vist user's page
    resp = webtest.get(f"/user/{user.username}/")
    assert resp.status_code == 200

This fails with a 404 Not Found. I want to get a 200.

I tried following the guidance here https://docs.pylonsproject.org/projects/pyramid_tm/en/latest/#testing
but I'm overlooking something with how we create sessions/transactions, and can't seem to figure out how to share the session from db_session into the webtest test app, since pyramid_tm will start a fresh transaction for the web request.

There might be something to restructure between our database, app_config fixtures to allow this - I'm a little lost, so I thought I'd write it down for anyone else to poke at.

@miketheman miketheman added testing Test infrastructure and individual tests developer experience Anything that improves the experience for Warehouse devs labels Mar 19, 2024
@miketheman
Copy link
Member Author

Oh, and the motivation behind this is that this kind of setup would allow us to write more functional (integration) tests that create a database state, and then perform actions via the web/api interface (a la browsing, logging in, uploading things, etc) with fewer mocks.

@jzohrab
Copy link
Contributor

jzohrab commented Mar 31, 2024

Hi @miketheman - I opened a draft PR that doesn't fix anything, but maybe helps others reason with the code a bit.

I've at least confirmed that the db contains an object :-) In my projects I sometimes do db-level queries to assert table state. I also do cleanup prior to running tests, usually doing a simple and stupid for t in tables: delete_all_data_in(t). It's heavy-handed but it works.

The db_session fixture creates a session, webtest creates a tm, and pyramid creates a tm ... lots of potential sessions/tms.

@miketheman
Copy link
Member Author

Thanks @jzohrab ! I haven't had a chance to pursue that branch, but maybe @di is interested, as he's tinkered with this problem as well.

@miketheman
Copy link
Member Author

I took another look today, and am still scratching my head.
Thanks for the draft branch, @jzohrab sadly it didn't make it much clearer.

There's a bunch of pieces involved, and trying to tease them apart with what I know so far. We use (from the bottom up?) these libraries:

  • psycopg
  • sqlalchemy
  • zope.sqlalchemy
  • pyramid_tm

Then we layer on our own logic in config.py and db.py.

THEN we create fixtures to take advantage of behaviors in tests:

  • database (test session-wide) - inits the database and returns a connection string
  • app_config (test session-wide) - uses database, creates a pyramid config with the conn string, and runs all db migrations
  • db_session (initiated per test) - uses app_config, gets the engine from config, connects, starts a transaction, creates a session, starts a nested transaction, does some SAVEPOINT magic, and yields the session. When the test ends, runs finally block and closes, tears down, etc.
  • webtest (initiated per test) - uses app_config, creates a customized version of webtest.TestApp to support xmlrpc calls and disable https requirement

So something in all of these layers is not joining the active transaction/session correctly, or potentially the step that inserts data into the database isn't closing, or sqlalchemy's implicit begin is causing the previous session data to not be visible yet? More effort needed here.

@miketheman miketheman added the help needed We'd love volunteers to advise on or help fix/implement this. label Apr 12, 2024
@miketheman
Copy link
Member Author

Was reading some other SQLAlchemy docs today, and saw this, might have something to do with the issue: https://docs.sqlalchemy.org/en/20/orm/session_transaction.html#joining-a-session-into-an-external-transaction-such-as-for-test-suites

@di
Copy link
Member

di commented Apr 19, 2024

This is pretty much already what we're doing here:

warehouse/tests/conftest.py

Lines 343 to 366 in 97d8de8

@pytest.fixture
def db_session(app_config):
engine = app_config.registry["sqlalchemy.engine"]
conn = engine.connect()
trans = conn.begin()
session = Session(bind=conn)
# Start the session in a SAVEPOINT
session.begin_nested()
# Then each time that SAVEPOINT ends, reopen it
@event.listens_for(session, "after_transaction_end")
def restart_savepoint(session, transaction):
if transaction.nested and not transaction._parent.nested:
session.begin_nested()
try:
yield session
finally:
session.close()
Session.remove()
trans.rollback()
conn.close()
engine.dispose()

I added join_transaction_mode="create_savepoint" but your failing test still didn't pass.

I also tried removing the session nesting that we have, in addition to that, but that didn't change anything either.

@di
Copy link
Member

di commented Apr 19, 2024

The good news is I think we can use this to simplify the fixture a little bit: #15826

@benjaoming
Copy link
Contributor

benjaoming commented May 31, 2024

I've proposed solving this by reusing the same db session, rather than always creating a new db session in warehouse.db.

Approach was like this:

  1. Ran a logged container of Postgres (add command: ["postgres", "-c", "log_statement=all"] to the db service in docker-compose.yml) + docker compose logs db -f
  2. Noticed 2 different PIDs (unverified if that also means 2 different sessions, but log output is below)
  3. Had also wondered about the tutorials reusing the same db session: https://docs.pylonsproject.org/projects/pyramid/en/latest/tutorials/wiki2/tests.html
  4. Added a way to override creating a db session in warehouse.db and instead feed the db_session fixture: Tell warehouse.db to reuse db_session pytest fixture #16031

In the output, I note that there are two different PIDs (3727 and 3791) mentioned. I haven't found any definitive documentation about whether the PID indicates a new connection or session, but it was enough to start playing with a hypothesis that the WSGI application had created a new db session/connection.

View postgres log

db-1  | 2024-05-31 09:17:15.642 UTC [3727] LOG:  execute <unnamed>: BEGIN ISOLATION LEVEL READ COMMITTED
db-1  | 2024-05-31 09:17:15.643 UTC [3727] LOG:  statement: SAVEPOINT sa_savepoint_1
db-1  | 2024-05-31 09:17:15.648 UTC [3727] LOG:  execute <unnamed>: INSERT INTO users (username, name, password, is_active, is_superuser, is_moderator, is_psf_staff, date_joined, last_login, disabled_for, totp_secret, last_totp_value, observer_association_id) VALUES ($1, $2::VARCHAR, $3::VARCHAR, $4, $5, $6, $7, $8::TIMESTAMP WITHOUT TIME ZONE, $9::TIMESTAMP WITHOUT TIME ZONE, $10, $11, $12::VARCHAR, $13::UUID) RETURNING users.password_date, users.is_frozen, users.is_observer, users.prohibit_password_reset, users.hide_avatar, users.sitemap_bucket, users.id
db-1  | 2024-05-31 09:17:15.648 UTC [3727] DETAIL:  parameters: $1 = 'kvtYJaeoytrC', $2 = 'movement', $3 = '!', $4 = 't', $5 = 'f', $6 = 'f', $7 = 'f', $8 = '2005-11-02 21:49:24.866345', $9 = '2023-09-30 08:53:29.62694', $10 = NULL, $11 = '\x02c807f5e8f6b992119ec8b127720796adf72276', $12 = NULL, $13 = NULL
db-1  | 2024-05-31 09:17:15.661 UTC [3727] LOG:  execute <unnamed>: SELECT users.username AS users_username, users.name AS users_name, users.password AS users_password, users.password_date AS users_password_date, users.is_active AS users_is_active, users.is_frozen AS users_is_frozen, users.is_superuser AS users_is_superuser, users.is_moderator AS users_is_moderator, users.is_psf_staff AS users_is_psf_staff, users.is_observer AS users_is_observer, users.prohibit_password_reset AS users_prohibit_password_reset, users.hide_avatar AS users_hide_avatar, users.date_joined AS users_date_joined, users.last_login AS users_last_login, users.disabled_for AS users_disabled_for, users.totp_secret AS users_totp_secret, users.last_totp_value AS users_last_totp_value, users.sitemap_bucket AS users_sitemap_bucket, users.observer_association_id AS users_observer_association_id, users.id AS users_id, user_emails_1.id AS user_emails_1_id, user_emails_1.user_id AS user_emails_1_user_id, user_emails_1.email AS user_emails_1_email, user_emails_1."primary" AS user_emails_1_primary, user_emails_1.verified AS user_emails_1_verified, user_emails_1.public AS user_emails_1_public, user_emails_1.unverify_reason AS user_emails_1_unverify_reason, user_emails_1.transient_bounces AS user_emails_1_transient_bounces 
db-1  | 	FROM users LEFT OUTER JOIN user_emails AS user_emails_1 ON users.id = user_emails_1.user_id 
db-1  | 	WHERE users.id = $1::UUID
db-1  | 2024-05-31 09:17:15.661 UTC [3727] DETAIL:  parameters: $1 = '4c0686b5-3f9b-4d40-a815-5693b23be08e'
db-1  | 2024-05-31 09:17:15.664 UTC [3727] LOG:  statement: select * from users
db-1  | 2024-05-31 09:17:15.674 UTC [3791] LOG:  execute <unnamed>: BEGIN ISOLATION LEVEL READ COMMITTED
db-1  | 2024-05-31 09:17:15.675 UTC [3791] LOG:  execute <unnamed>: SELECT admin_flags.id AS admin_flags_id, admin_flags.description AS admin_flags_description, admin_flags.enabled AS admin_flags_enabled, admin_flags.notify AS admin_flags_notify 
db-1  | 	FROM admin_flags 
db-1  | 	WHERE admin_flags.id = $1::VARCHAR
db-1  | 2024-05-31 09:17:15.675 UTC [3791] DETAIL:  parameters: $1 = 'read-only'
db-1  | 2024-05-31 09:17:15.677 UTC [3791] LOG:  statement: select * from users
db-1  | 2024-05-31 09:17:15.687 UTC [3791] LOG:  execute <unnamed>: SELECT users.username AS users_username, users.name AS users_name, users.password AS users_password, users.password_date AS users_password_date, users.is_active AS users_is_active, users.is_frozen AS users_is_frozen, users.is_superuser AS users_is_superuser, users.is_moderator AS users_is_moderator, users.is_psf_staff AS users_is_psf_staff, users.is_observer AS users_is_observer, users.prohibit_password_reset AS users_prohibit_password_reset, users.hide_avatar AS users_hide_avatar, users.date_joined AS users_date_joined, users.last_login AS users_last_login, users.disabled_for AS users_disabled_for, users.totp_secret AS users_totp_secret, users.last_totp_value AS users_last_totp_value, users.sitemap_bucket AS users_sitemap_bucket, users.observer_association_id AS users_observer_association_id, users.id AS users_id, user_emails_1.id AS user_emails_1_id, user_emails_1.user_id AS user_emails_1_user_id, user_emails_1.email AS user_emails_1_email, user_emails_1."primary" AS user_emails_1_primary, user_emails_1.verified AS user_emails_1_verified, user_emails_1.public AS user_emails_1_public, user_emails_1.unverify_reason AS user_emails_1_unverify_reason, user_emails_1.transient_bounces AS user_emails_1_transient_bounces 
db-1  | 	FROM users LEFT OUTER JOIN user_emails AS user_emails_1 ON users.id = user_emails_1.user_id 
db-1  | 	WHERE users.username = $1
db-1  | 2024-05-31 09:17:15.687 UTC [3791] DETAIL:  parameters: $1 = 'kvtYJaeoytrC'
db-1  | 2024-05-31 09:17:15.932 UTC [3791] LOG:  execute <unnamed>: ROLLBACK
db-1  | 2024-05-31 09:17:16.033 UTC [3727] LOG:  statement: ROLLBACK TO SAVEPOINT sa_savepoint_1
db-1  | 2024-05-31 09:17:16.034 UTC [3727] LOG:  execute <unnamed>: ROLLBACK
db-1  | 2024-05-31 09:17:16.040 UTC [3799] LOG:  statement: ALTER DATABASE "tests" with allow_connections false;
db-1  | 2024-05-31 09:17:16.046 UTC [3799] LOG:  execute <unnamed>: SELECT pg_terminate_backend(pg_stat_activity.pid)FROM pg_stat_activity WHERE pg_stat_activity.datname = $1;
db-1  | 2024-05-31 09:17:16.046 UTC [3799] DETAIL:  parameters: $1 = 'tests'
db-1  | 2024-05-31 09:17:16.047 UTC [3799] LOG:  statement: DROP DATABASE IF EXISTS "tests";

@woodruffw
Copy link
Member

woodruffw commented Sep 26, 2024

I could be wrong, but I think CI failures like https://github.com/pypi/warehouse/actions/runs/11057423074/job/30721387457?pr=16778 indicate that the current approach isn't 100% right -- I haven't fully triaged it, but from what I can tell:

  1. In the integration tests, everything uses a transaction.TransactionManager instead of pyramid_tm. This gets plumbed in by overriding the default TM in TestApp.
  2. However, the "factory" helpers (e.g. ProjectFactory) manage their own sessions, which get pulled directly from a scoped_session that isn't tied to the custom TM.

For reference, here's the Session factory that the test model factories use:

Session = scoped_session(Session)

and here's where it gets loaded into each model factory, via Meta:

class WarehouseFactory(SQLAlchemyModelFactory):
class Meta:
abstract = True
sqlalchemy_session = Session

If I understand the lifetimes correctly here, the end result is that calls like ProjectFactory.create() are tied to session that isn't actually tracked by the integration test TM.

I can't tell if this is the actual root cause, but it seems like a decent candidate for the strange test behavior I've seen (e.g. where tests sometimes pass, sometimes fail because the models appear to not be updated, sometimes fail because a route that traverses them 404s instead of 200ing).

I think the fix for the above is to somehow bind these model factories' sessions to the custom TM, but I'm not sure how best to do that.

@benjaoming
Copy link
Contributor

@woodruffw that might indeed be the case. Here's something that I load in my Pytest db session fixtures:

@pytest.fixture
def db_session(...):

    ...

    # Ensure that all factories use the same session
    # From: https://github.com/tiangolo/sqlmodel/discussions/615
    for factory in SQLAlchemyModelFactory.__subclasses__():
        factory._meta.sqlalchemy_session = session

@woodruffw
Copy link
Member

Nice, thanks @benjaoming! I was trying to get something similar with patching _meta.sqlalchemy_session working, but I was overthinking it.

@benjaoming
Copy link
Contributor

Just sharing someone else's idea - open source ftw 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
developer experience Anything that improves the experience for Warehouse devs help needed We'd love volunteers to advise on or help fix/implement this. testing Test infrastructure and individual tests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants