-
Notifications
You must be signed in to change notification settings - Fork 0
Conversation
@jeancochrane tests are failing for me. first i ran into this.
i think we need to install i did but then, my tests entered another world of hurt.
so that seems like it was the wrong thing to do... |
oh, i see, it looks like test directories no longer need an i deleted
|
@hancush Thanks for flagging this! I did some digging and discovered that all of the mysterious errors go away when you remove the Unfortunately I'm still confused by Python packaging and I can't quite figure out why pip is installing the |
@hancush Got the testing environment fixed up thanks to @fgregg's help! You'll find updated installation instructions here: datamade/django-councilmatic#252 This still won't address the problem of specifying a GitHub source for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you so much for taking this on, @jeancochrane!
i especially like that you've kept the methods in tact, even though they follow a similar pattern of behavior. i think that we erred a little too far on the side of abstraction in our now-obsolete import_data
script, so i'm happy to see this landed closer to the middle!
to summarize the comments:
- we started implementing custom managers for the first-class objects (events and bills, in particular) in django-councilmatic that cast their date attributes to datetime objects, but it seems like we didn't always handle null dates. do you think we ought to make that change in django-councilmatic, so you can leverage it here?
- i think excluding objects with null end dates instead of improvising the coalesce operation would be more straightforward.
- does
distinct
onid
remove anything from a queryset?
happy to talk more about this out loud, if you like!
), | ||
Value(str(self.get_threshold(minutes+1))) | ||
), | ||
# If the 'date' is null, set it beyond the threshold to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you simplify this by excluding actions with null dates before you run the query, e.g., BillAction.objects.exclude(date__exact='').annotate(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a feeling there was a smarter way to do this 😅After refactoring the Coalesce
logic into django-councilmatic
, I'll swap this to .exclude(date__isnull=True)
.
cursor = connection.cursor() | ||
cursor.execute(new_actions, [tuple(bill_ids)]) | ||
def find_bill_action_updates(self, bill_ids, minutes=15): | ||
new_actions = ocd_legislative_models.BillAction.objects.annotate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, this annotation
operation strikes me as something that should happen in django-councilmatic
, like we do for memberships. then you would query the proxy model, instead of the base ocd model. what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's definitely the right place for this to happen! If we can do that it'll also simplify these queries enormously. I'll go ahead and make a PR onto django-councilmatic
moving this logic over.
committee_updates.append(committee_group) | ||
|
||
for action in new_actions.distinct( | ||
'date', 'organization__id', 'bill__id', 'id' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this distinct
? wouldn't it return everything, if it's distinct
on id
(the pk?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure! I couldn't figure out why it was in the previous query so I kept it on superstitiously. I suspect there was something about the preceding joins that could return duplicate rows. But since neither of us can figure it out I'll go ahead and remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i think in general the distincts in the queries were because of the joins, e.g., joining event to event participant would return a row for every participant -> duplicate events. i feel ok about omitting them here!
def find_committee_event_updates(self, committee_ids, minutes=15): | ||
committee_updates = [] | ||
for committee_id in committee_ids: | ||
new_event_particip = ocd_legislative_models.EventParticipant.objects.annotate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's becoming clear to me that we might need to handle null end dates in all the custom managers in django-councilmatic
.
otoh, if you think it's just notifications that needs to deal with this, maybe break out this annotation into a helper method on this class, to reduce repeat code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this particular instance, could you the query councilmatic Event
, whose default manager includes the datetime annotation, then grab the event participant off those objects meeting the filter criteria?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's becoming clear to me that we might need to handle null end dates in all the custom managers in django-councilmatic.
It could be a good idea to standardize django-councilmatic
model managers to provide _dt
attributes that are cast to DateTimeField
s with proper nulls, but I don't think I could make that change with confidence in the context of this PR. I went ahead and added/updated these attributes for models that are implicated in Notifications, however.
could you the query councilmatic Event, whose default manager includes the datetime annotation, then grab the event participant off those objects meeting the filter criteria?
I can't believe I missed EventManager
! This is very smart -- done.
created_at__lte=self.get_threshold(minutes), | ||
updated_at__gte=self.get_threshold(minutes), | ||
start_datetime__gte=datetime.now(pytz.timezone(settings.TIME_ZONE)) | ||
).distinct('start_datetime', 'id').order_by('start_datetime') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto distinct on id.
).filter( | ||
created_at__gte=self.get_threshold(minutes), | ||
start_datetime__gte=datetime.now(pytz.timezone(settings.TIME_ZONE)) | ||
).distinct('start_datetime', 'id').order_by('start_datetime') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto distinct on id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, is distinct
necessary at all here?
notifications/urls.py
Outdated
SubscriptionsManageView, person_subscribe, person_unsubscribe, bill_subscribe, \ | ||
bill_unsubscribe, committee_events_subscribe, committee_events_unsubscribe, \ | ||
committee_actions_subscribe, committee_actions_unsubscribe, search_check_subscription, \ | ||
search_subscribe, search_unsubscribe, events_subscribe, events_unsubscribe, \ | ||
send_notifications | ||
|
||
import django_rq | ||
if django.VERSION < (1, 11): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this release depends on django-councilmatic>=1.0
, and that requires django>=2.0
, do we need to do this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point! I thought I was being clever supporting multiple Django versions, but you're right, the dependency tree excludes <=1.0
.
notifications/views.py
Outdated
from django.core.exceptions import ObjectDoesNotExist | ||
from django.core.mail import EmailMessage | ||
from django.core.cache import cache | ||
from django.core import management | ||
|
||
if django.VERSION < (2, 0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto on version question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @hancush! Your point about factoring date-casting logic out into django-councilmatic
was a really nice design suggestion and wound up simplifying the queries here by a big margin. I'll open up a PR in django-councilmatic
corresponding to those changes.
I think my two biggest remaining questions are:
-
What are the
distinct
filters doing in these queries, and do we need to preserve them in the ORM? As far as I can tell they're superfluous but I don't understand what they were doing before so I'm worried I don't grok the data model well enough to make an informed decision. -
Am I being too cautious about timezones? I made sure to adjust queries so that any time we retrieve a
datetime.now()
object we make it timezone-aware according tosettings.TIME_ZONE
, but if e.g. all OCD models are stored in the database in UTC time, then it might be better to enforce the timezone as UTC.
committee_updates.append(committee_group) | ||
|
||
for action in new_actions.distinct( | ||
'date', 'organization__id', 'bill__id', 'id' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure! I couldn't figure out why it was in the previous query so I kept it on superstitiously. I suspect there was something about the preceding joins that could return duplicate rows. But since neither of us can figure it out I'll go ahead and remove it.
def find_committee_event_updates(self, committee_ids, minutes=15): | ||
committee_updates = [] | ||
for committee_id in committee_ids: | ||
new_event_particip = ocd_legislative_models.EventParticipant.objects.annotate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's becoming clear to me that we might need to handle null end dates in all the custom managers in django-councilmatic.
It could be a good idea to standardize django-councilmatic
model managers to provide _dt
attributes that are cast to DateTimeField
s with proper nulls, but I don't think I could make that change with confidence in the context of this PR. I went ahead and added/updated these attributes for models that are implicated in Notifications, however.
could you the query councilmatic Event, whose default manager includes the datetime annotation, then grab the event participant off those objects meeting the filter criteria?
I can't believe I missed EventManager
! This is very smart -- done.
} | ||
for event_particip in new_event_particip.distinct( | ||
'event__start_date', 'organization__id', 'event__id' | ||
).order_by('-event__start_date'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hancush Do you think this distinct
filter is necessary?
).filter( | ||
created_at__gte=self.get_threshold(minutes), | ||
start_datetime__gte=datetime.now(pytz.timezone(settings.TIME_ZONE)) | ||
).distinct('start_datetime', 'id').order_by('start_datetime') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, is distinct
necessary at all here?
notifications/urls.py
Outdated
SubscriptionsManageView, person_subscribe, person_unsubscribe, bill_subscribe, \ | ||
bill_unsubscribe, committee_events_subscribe, committee_events_unsubscribe, \ | ||
committee_actions_subscribe, committee_actions_unsubscribe, search_check_subscription, \ | ||
search_subscribe, search_unsubscribe, events_subscribe, events_unsubscribe, \ | ||
send_notifications | ||
|
||
import django_rq | ||
if django.VERSION < (1, 11): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point! I thought I was being clever supporting multiple Django versions, but you're right, the dependency tree excludes <=1.0
.
I think I'm all set for another look @hancush! Still curious what you think about timezones, but other than that this should be good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: datetime, it looks like created and updated at timestamps, plus event start times, are localized, so the way you've handled comparisons should be ok!
i do have a question about how to identify new or updated bill-related objects, e.g., bill actions and bill sponsorships. we used to separate these in the data import, but we don't do that anymore. i think we can use the created at timestamp from the bill to find new related objects, because sponsorships and actions associated with a new bill will also be new, but i don't think we can use the updated at timestamp to find updates to related objects, because it will be updated for any change in the bill.
am i missing something obvious, or misinterpreting intent? called out specific instances inline.
def find_bill_action_updates(self, bill_ids, minutes=15): | ||
new_actions = councilmatic_models.BillAction.objects.filter( | ||
bill__id__in=bill_ids, | ||
date_dt__gte=self.get_threshold(minutes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, this column refers to the date of the action itself, not necessarily when it was added to the database. my first instinct was to query through the updated at on the bill, however that can be toggled for any bill update, not just new actions. we used to create a new table by left joining incoming objects with the existing table to separate them into new and updated objects. i don't know that we have a good way of knowing whether a bill action (or other event- or bill-related object without its own updated at timestamp) is "new" in the refactor... @fgregg, do you have any thoughts here?
person_updates = [] | ||
for person_id in person_ids: | ||
new_sponsorships = councilmatic_models.BillSponsorship.objects.filter( | ||
bill__created_at__gte=self.get_threshold(minutes), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are sponsorships ever added some time after a bill is created? if so, i don't think this will capture them, but i have the same uncertainty about how to capture changes in related objects as above.
for committee_id in committee_ids: | ||
new_actions = councilmatic_models.BillAction.objects.filter( | ||
organization__id=committee_id, | ||
date_dt__gte=self.get_threshold(minutes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto on changes in related objects.
You are right @hancush. I don't have a perfect idea. I think the best is to create subclasses in django-councilmatic that have updated_at fields. |
Creating subclasses with |
yes.
…On Mon, Jul 15, 2019 at 5:10 PM Jean Cochrane ***@***.***> wrote:
Creating subclasses with updated_at attributes seems reasonable. Will we
then have to update the django-councilmatic signals to set these fields
when objects get created or updated?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#33?email_source=notifications&email_token=AAEDC3KALS3VGFJV63GKCYDP7TYVDA5CNFSM4H67SPH2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZ7DV7Y#issuecomment-511589119>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEDC3NRJDU554EFFS5SPTDP7TYVDANCNFSM4H67SPHQ>
.
|
Looks like the signals approach won't work for
Curious to hear what @fgregg thinks about all this. |
I think the easiest thing to do would be the add update_at fields to the
base python-opencivicdata models. I bet the Jameses will accept that PR.
…On Tue, Jul 16, 2019 at 4:29 PM Jean Cochrane ***@***.***> wrote:
Looks like the signals approach won't work for BillActions and
BillSponsorships. Pupa uses bulk_create() to create related entities
<https://github.com/opencivicdata/pupa/blob/fd599b7fa82059f2ef94bce432dada193bbfdd6c/pupa/importers/base.py#L401>,
which doesn't fire pre_save or post_save signals
<https://docs.djangoproject.com/en/2.2/ref/models/querysets/#bulk-create>.
@hancush <https://github.com/hancush> and I talked about a few possible
alternatives:
1. *Write a migration for django-councilmatic to set DB-level Postgres
triggers, a la BGA
<https://github.com/datamade/bga-payroll/blob/master/payroll/migrations/0007_slug_trigger.py>.*
This is a close approximation to the signals approach, but feels like the
wrong level of abstraction to me, in that would introduce application-level
functionality via a database migration.
2. *Adjust python-opencivicdata model managers to use a custom
bulk_create method that fires post_save signals.* This is the solution
recommended by the SO threads we found on the topic (see example
<https://stackoverflow.com/a/30632974/7781189>). This would let us
preserve the signals approach as-is, but would require a higher-order
change to an upstream library. The custom manager would also be a bit
confusing since it would only exist to adjust behavior in an upstream
library, not in python-opencivicdata itself.
3. *Adjust pupa to allow optionally creating related models with a
naked save() instead of post_save().* This would preserve the signals
approach, but require an upstream change and reduce the performance of the
operation that creates related entities.
4. *Add created_at and updated_at fields to the RelatedBase class in
python-opencivicdata.* This is my personal favorite approach but I
suspect it'll be controversial, since it requires a data model change to an
upstream library. I don't fully understand why created_at and
updated_at aren't already part of the spec
<https://opencivicdata.readthedocs.io/en/latest/data/bill.html#sponsors-and-actions>
for related entities, though, so maybe I'm missing something important.
Curious to hear what @fgregg <https://github.com/fgregg> thinks about all
this.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#33?email_source=notifications&email_token=AAEDC3L6SMLIWU6FFKCQULTP7Y4UPA5CNFSM4H67SPH2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2CG2GA#issuecomment-511995160>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEDC3LAJUUKCD52WIBDNRTP7Y4UPANCNFSM4H67SPHQ>
.
|
Superceded by #34. |
Overview
Make sure the app is compatible with
django-councilmatic
v1.0. The major changes in this update include:send_notifications
management command to use the ORM and the new OCD model structureTesting instructions
pip install -e .[tests]
pytest
and confirm all tests pass