CRM-20958 - Track creation+modification times for activities+cases #10754

Merged
merged 13 commits into from
Sep 6, 2017

@totten totten commented Jul 25, 2017

Overview

In developing workflows, reports, and UIs for activities and cases, it is useful to sort and filter based on when a record was created or when a record was last modified.

Before

  • The tables civicrm_activity and civicrm_case do not have the timestamp columns created_date and modified_date.
  • The table civicrm_log includes some records to indicate creation/modification time:
    • It does include records for activities which have been created or updated via BAO. (Notwithstanding the possibility of local-admin pruning.)
    • It may not include records for sample activities created during installation or others created via SQL.
    • It does not include records for cases. (According to the original design philosophy, all changes to cases would be reflected in the activities, so I suppose civicrm_log would have been redundant and low-priority.)
    • The timestamps stored in civicrm_log.modified_date are not entirely reliable -- they are stored as DATETIME without any adjustments for timezone. This is a pre-existing and systemic problem with fields being flagged as DATETIME instead of TIMESTAMP.

After

  • The tables civicrm_activity and civicrm_case do have the timestamp columns created_date and modified_date.
    • These fields can be read via APIv3/BAO/DAO/SQL. They cannot be directly modified via APIv3/BAO.
    • These fields are automatically initialized and updated whenever one:
      • Creates or updates an Activity via APIv3/BAO/DAO/SQL.
      • Creates or updates a Case via APIv3/BAO/DAO/SQL.
      • Creates or updates an Activity via "Contacts => New Activity", "View Contact => Actions", or "View Contact => Activities". (This stems from SQL triggers, but it's been tested separately.)
      • Creates or updates a Case via "Cases => New Case" or "Manage Cases" UI. (This stems from SQL triggers, but it's been tested separately.)
  • The system status check displays NOTICEs if...
    • There are any case or activity records which do not have created_date or modified_date.
    • Any columns are marked as DATETIME instead of a more sensible TIMESTAMP. (This addresses both civicrm_log.modified_date and the pre-existing discrepancies left over from CRM-9683).
  • Both NOTICEs suggest installation of the new (experimental) extension, doctorwhen, to handle migration+cleanup.

screen shot 2017-07-31 at 8 45 53 pm

If the admin installs doctorwhen, then a new item appears in "Administer => Doctor When":
screen shot 2017-07-31 at 8 59 33 pm

Technical Details

  • For initializing/maintaining the fields {civicrm_activity,civicrm_case}.{created_date,modified_date}, there are several SQL triggers. The configuration closely parallels civicrm_contact:
    • The SQL column definition for modified_date specifies ON UPDATE CURRENT_TIMESTAMP. (Note: MySQL only allows one column to be flagged with a default value of CURRENT_TIMESTAMP.)
    • The services civi.activity.triggers and civi.case.triggers install SQL triggers which initialize the created_date.
    • The services civi.activity.triggers and civi.case.triggers install SQL triggers which update modified_date whenever custom-data changes.
    • The service civi.case.staticTriggers installs SQL triggers which update the civicrm_case.modified_date whenever a related activity is created, modified, or deleted.
  • The list of SQL columns to consider migrating is tracked in CRM_Utils_Check_Component_Timestamps.
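
To make the trigger setup above more concrete, here is a minimal sketch of trigger descriptors in the array shape used by hook_civicrm_triggerInfo (table, when, event, sql). The helper name sketchTimestampTriggers and the custom-data table civicrm_value_example_1 are hypothetical; the real services (civi.activity.triggers, civi.case.triggers) may build their triggers differently.

```php
<?php
/**
 * Hypothetical helper (not part of CiviCRM): describe the triggers that
 * maintain created_date and modified_date for one entity table.
 */
function sketchTimestampTriggers($table, array $customDataTables) {
  $info = array();

  // Initialize created_date when a row is inserted.
  $info[] = array(
    'table' => array($table),
    'when' => 'BEFORE',
    'event' => array('INSERT'),
    'sql' => "\nSET NEW.created_date = CURRENT_TIMESTAMP;\n",
  );

  // Touch the parent row's modified_date whenever its custom data changes.
  // Assumption: custom-data tables point at the parent row via entity_id.
  if ($customDataTables) {
    $info[] = array(
      'table' => $customDataTables,
      'when' => 'AFTER',
      'event' => array('INSERT', 'UPDATE'),
      'sql' => "\nUPDATE {$table} SET modified_date = CURRENT_TIMESTAMP WHERE id = NEW.entity_id;\n",
    );
  }

  return $info;
}

$triggers = sketchTimestampTriggers('civicrm_activity', array('civicrm_value_example_1'));
foreach ($triggers as $trigger) {
  printf("%s %s on %s\n", $trigger['when'], implode('/', $trigger['event']), implode(', ', $trigger['table']));
}
```

No trigger is needed for ordinary updates to the entity table itself, since the ON UPDATE CURRENT_TIMESTAMP column definition covers that case.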

Comments

  • The JIRA ticket CRM-20958 includes a lengthy list of acceptance criteria.
  • This PR description has been updated to reflect issues that were discussed during development.


$title = sprintf('CRM-20958 - Compute civicrm_activity.created_date from civicrm_log (%d => %d)', $startId, $endId);
$sql = 'UPDATE civicrm_activity
SET created_date = (SELECT MIN(l.modified_date) FROM civicrm_log l WHERE l.entity_table ="civicrm_activity" AND civicrm_activity.id = l.entity_id)
Member Author

One thing I'm not certain about... this code initializes civicrm_activity.created_date (et al.) using civicrm_log.modified_date, but they have different data-types (TIMESTAMP vs DATETIME, respectively).

@eileenmcnaughton @seamuslee001 To my recollection, DATETIME and TIMESTAMP handle timezones differently, so wondering if there's some kind of filtering I should do here to adjust for timezone.

However, my gut says that storing civicrm_log.modified_date as DATETIME is actually a subtle dirtiness/corruption. Ex: This suggests that civicrm_log.modified_date would be initialized based on date('YmdHis') (ie the user's perceived timestamp based on active PHP timezone setting) rather than a canonical timestamp. But different civicrm_log records can be created by different people/sessions/timezones, so the raw data in civicrm_log wouldn't be reliably articulated in any one timezone.

This wouldn't really be the fault of my issue/patch -- if the theory is true, you'd expect to see idiosyncrasies in the display of this data. For migration purposes, somewhere you'd have to bite the bullet and accept that timestamps would be wrong +/- a few hours?

Member Author

@totten totten Jul 26, 2017

Confirmed the data is dirty:

  • Open a new browser. Login to d46 as admin. Set your timezone to America/New_York. Edit contact 139. Observe "Change Log" has new entry with timestamp July 26th, 2017 7:29 PM.
  • Open a new browser. Login to d46 as demo. Set your timezone to America/Los_Angeles. Edit contact 139. Observe "Change Log" has new entry with timestamp July 26th, 2017 4:30 PM. (In reality, this is 1 minute later.)

But the changelog (as viewed by either user) shows weird data: admin's change actually happened 1 minute before demo's change, but it looks like it happened 179 minutes after.
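
The arithmetic behind this observation can be reproduced in plain PHP. This is only a sketch: the UTC instants are assumed (chosen so the local times match the ones reported above), and nothing here touches CiviCRM or MySQL.

```php
<?php
// Assumption: fixed UTC instants, one minute apart, so the output is reproducible.
$utc = new DateTimeZone('UTC');
$adminEdit = new DateTimeImmutable('2017-07-26 23:29:00', $utc);
$demoEdit = $adminEdit->modify('+1 minute');

// A DATETIME column keeps whatever wall-clock string each session sends.
$adminStored = $adminEdit->setTimezone(new DateTimeZone('America/New_York'))->format('Y-m-d H:i:s');
$demoStored = $demoEdit->setTimezone(new DateTimeZone('America/Los_Angeles'))->format('Y-m-d H:i:s');

// Reading both strings back as if they shared one timezone reverses the order.
$apparentGapMinutes = (strtotime("$adminStored UTC") - strtotime("$demoStored UTC")) / 60;

// The underlying instants (what a TIMESTAMP column preserves) keep the real order.
$realGapMinutes = ($demoEdit->getTimestamp() - $adminEdit->getTimestamp()) / 60;

printf("admin stored: %s\ndemo stored:  %s\n", $adminStored, $demoStored);
printf("apparent: admin is %d min after demo; real: demo is %d min after admin\n", $apparentGapMinutes, $realGapMinutes);
```

The stored strings are 2017-07-26 19:29:00 and 2017-07-26 16:30:00, which is where the 179-minute apparent gap comes from.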

screen shot 2017-07-26 at 4 34 29 pm

@seamuslee001
Contributor

Yep, so TIMESTAMP stores everything in UTC and does the conversion to UTC if the connection is in another timezone. DATETIME is just whatever time is sent to it from the client.

@totten totten force-pushed the master-actcase-ts branch from 71aebd9 to b1abc8e Compare July 27, 2017 02:00
@xurizaemon
Member

xurizaemon commented Jul 27, 2017

CRM-9683 seems relevant to all this - same basic problem, over in CiviMail?

@seamuslee001
Contributor

I thought that CiviMail fields had been converted to timestamp already

@seamuslee001
Contributor

Maybe that was only on the Fuzion branch of things

@eileenmcnaughton
Contributor

@seamuslee001 we have converted some fields on new installs but not on existing ones, due to nervousness about the - the fundamental thing, I believe, is that when you convert you should be in the most accurate tz possible

@seamuslee001
Contributor

@eileenmcnaughton for some reason i thought the CiviMail was already converted, maybe AUG / fuzion thing

@eileenmcnaughton
Contributor

@seamuslee001 yes, converted on all Fuzion customers & any site that wishes to convert can do so with a simple ALTER TABLE mysql statement. Also converted for new installs - but we have not figured out what to do with existing ones (mostly because the timestamp will be relative to the tz used when converting & there is some room for debate there)
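
A rough sketch of what such a conversion could look like, and why the timezone matters: MySQL interprets the existing DATETIME values in the session time zone when the column is converted. The helper name sketchConvertToTimestamp is hypothetical, and the column definition (nullable, no default) is an assumption that would need to match the real schema.

```php
<?php
/**
 * Hypothetical helper: build the SQL statements for converting one DATETIME
 * column to TIMESTAMP. It only returns strings; nothing is executed.
 */
function sketchConvertToTimestamp($table, $column, $sessionTimezone) {
  return array(
    // Existing values are read as wall-clock times in this session time zone.
    sprintf("SET time_zone = '%s'", $sessionTimezone),
    // Assumption: the column is nullable and has no automatic default.
    sprintf('ALTER TABLE `%s` MODIFY `%s` TIMESTAMP NULL DEFAULT NULL', $table, $column),
  );
}

$statements = sketchConvertToTimestamp('civicrm_log', 'modified_date', 'America/New_York');
foreach ($statements as $sql) {
  echo $sql, ";\n";
}
```

Choosing the session time zone is the debatable part: if records were written from several timezones, no single value makes every converted row correct.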

@eileenmcnaughton
Contributor

(there is a case to be made for providing a change over routine that people can run through UI or cli depending which is more appropriate)

@totten totten force-pushed the master-actcase-ts branch from b1abc8e to 93891bf Compare July 28, 2017 05:05
@totten
Member Author

totten commented Jul 28, 2017

(there is a case to be made for providing a change over routine that people can run through UI or cli depending which is more appropriate)

Yeah, this is my main question at this point.

On one hand... this PR seems to be working for me. The schema is created; the data migrates according to the evil scenario described in the commit-notes; the API seems to return and sort the fields; the fields for cases+activities are updated when using "Manage Case" UI; there's new test coverage; and the existing tests are now passing.

On the other hand... it might be a safer phase-in process if we only add the new columns -- and put the other stuff (the triggers and migration) behind some kind of opt-in.

@eileenmcnaughton
Contributor

I think the triggers should be uncontroversial - since they are also in the contact schema. The question is more the migration. On balance I probably would, since that was done for the civicrm_contact table. But if you were going to leave something off I would limit it to the migration

@totten totten force-pushed the master-actcase-ts branch from 93891bf to 1ee5377 Compare August 1, 2017 01:55
totten added a commit to totten/civicrm-core that referenced this pull request Aug 1, 2017
As discussed in [civicrm#10754](civicrm#10754),
the source material for populating `created_date` and `modified_date` has some
quality issues (vis-a-vis timezones).

If we attempt to immediately initialize `created_date` and `modified_date` from
pre-existing data, then it gets a lot harder to clean up the pre-existing data.
Fortunately, most systems don't actually *need* this data immediately.
So we can defer.

I've moved these migration rules over to an extension

https://github.com/civicrm/org.civicrm.doctorwhen
@totten totten changed the title (WIP) CRM-20958 - Track creation+modification times for activities+cases CRM-20958 - Track creation+modification times for activities+cases Aug 1, 2017
@totten
Member Author

totten commented Aug 1, 2017

@eileenmcnaughton @xurizaemon @seamuslee001 I've moved the migration logic to an extension https://github.com/civicrm/org.civicrm.doctorwhen and put a status check in CRM_Utils_Check_Component_Timestamps. Additionally, since CRM-9683 sorta left the transition from DATETIME to TIMESTAMP in a netherrealm, and since doctorwhen is labelled "experimental", it also includes checkboxes for migrating those columns.

I've removed the "WIP" flag.

@eileenmcnaughton
Contributor

cool - I wonder how people would find out about that if they wanted to switch

@totten
Member Author

totten commented Aug 1, 2017

CC @demeritcowboy - you might find this PR interesting.

@seamuslee001
Contributor

makes sense to me

@demeritcowboy
Contributor

Thanks for the notice. Hmm, DoctorWhen - Timezones and Relative Dates in Civi?

Don't think it makes a difference here but noting the original use of civicrm_log wasn't entirely redundant - for example to see an activity's original creation date you still needed to look in civicrm_log.

It also probably doesn't make a difference but since the opening paragraph mentions it as the rationale, noting that at least in my experience Activity Date (activity_date_time) is usually the field used for sorting/reporting, and it allows users to have some control over how things sort without messing up the "system" dates like creation date. The system dates are usually just used during investigation. Cases and some of their activities are often allocated to certain reporting periods via the activity date, similar to financial reporting. But none of that prevents making changes for create/modify.

Member

@davialexandre davialexandre left a comment

hi @totten, left some inline comments

/**
* Check that the timestamp columns are populated. (CRM-20958)
*
* @return array<CRM_Utils_Check_Message>
Member

I don't remember seeing this syntax before and I'm not sure of what it means. If it returns an array or a CRM_Utils_Check_Message instance, shouldn't it be array|CRM_Utils_Check_Message?

Member Author

@totten totten Aug 2, 2017

array<CRM_Utils_Check_Message> means "an array in which the elements are instances of class CRM_Utils_Check_Message".

Personally, I go a bit bonkers reading @return array because it doesn't really say anything.

AFAIK, php.net hasn't blessed a notation for array typing, but there are a few references for comparison:

  • It's similar to Java's notation for generics (eg Vector<Message>, HashMap<String,Message>).
  • It matches Hack's array notation. (Hack is a PHP derivative with stronger typing.)
  • PHP-FIG has a draft spec for docblocks. It mentions both C-style arrays (Message[]) as well as Java-style generics (Collection<Message>).
  • In my local copy of PHPStorm, it provides drilldown for C-style arrays (Message[]) but not Java-style generics.
  • I'm not particularly consistent. Sometimes it seems meaningful to convey even more information, so I use a really verbose descriptor like:
  @return array
     Array(string $msgId => CRM_Utils_Check_Message $msg).

I guess standardizing/documenting/cleaning-up would be good, but that should probably be a separate project.
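
For comparison, the three docblock notations discussed above can be put side by side. This is a sketch with a hypothetical stand-in class Message (not CRM_Utils_Check_Message); the annotations are documentation only and PHP itself does not enforce them.

```php
<?php
// Hypothetical stand-in for a message class.
class Message {
  public $text;

  public function __construct($text) {
    $this->text = $text;
  }
}

/**
 * Generics-style notation.
 *
 * @return array<Message>
 */
function listGeneric() {
  return array(new Message('generic'));
}

/**
 * C-style array notation, as used by phpDocumentor.
 *
 * @return Message[]
 */
function listCStyle() {
  return array(new Message('c-style'));
}

/**
 * Verbose notation that also documents the array keys.
 *
 * @return array
 *   Array(string $msgId => Message $msg).
 */
function mapVerbose() {
  return array('first' => new Message('verbose'));
}

$generic = listGeneric();
$cStyle = listCStyle();
$verbose = mapVerbose();
echo $generic[0]->text, ' ', $cStyle[0]->text, ' ', $verbose['first']->text, "\n";
```

All three functions return a plain PHP array at runtime; the notations differ only in how much they tell the reader and the IDE.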

Member

I believe that most projects use the notation defined by http://phpdoc.org, which uses CRM_Utils_Check_Message[]. Given that the PHP-FIG draft is a derivative of phpdoc and that most projects already use the C-style arrays, I think it's safe enough to assume this is the "right" way to do it.

Sometimes I prefer a more verbose description too

*/
public function checkSchema() {
$problems = array();
foreach (self::getConvertedTimestamps() as $tgt) {
Member

What does tgt mean? Perhaps, for the sake of readability, it would be better to not use an abbreviation?

Member Author

I'll rename it to $target.

* @package CRM
* @copyright CiviCRM LLC (c) 2004-2017
*/
class StaticTriggers {
Member

Usually, we use singular for class names. Any reason for using the plural here? The same applies for the namespace, which, in a way, is part of the class name.

Member Author

@totten totten Aug 2, 2017

Re: SqlTriggers namespace -- Agree that should be singular. Pretty much all namespaces can be equally described as singular or plural, and it's nice to have consistent style.

Re: StaticTriggers class -- I don't see the singular/plural issue as a matter of style here -- it's really a semantic difference. SqlTrigger would just be one object managing one trigger. But if one object manages multiple triggers, then you'd say SqlTriggers or SqlTriggerSet or SqlTriggerList or SqlTriggerArray or SqlTriggerGroup. Personally, I prefer the simplest construction that's accurate.

In this case, the instance of StaticTriggers contains three triggers that serve a common goal. The goal is to update civicrm_case.modified_date whenever a related activity changes -- but that decomposes to three triggers for INSERT, UPDATE, and DELETE.
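
As a rough illustration of that decomposition, the sketch below builds one trigger descriptor per event. The function name, the choice of civicrm_activity as the trigger table, and the join through civicrm_case_activity are assumptions for illustration; the real StaticTriggers service may attach its triggers to different tables or use different SQL.

```php
<?php
/**
 * Hypothetical sketch: three triggers that share one goal, namely touching
 * civicrm_case.modified_date when a related activity changes.
 */
function sketchCaseActivityTriggers() {
  $triggers = array();
  // DELETE triggers can only read OLD; INSERT and UPDATE can read NEW.
  $rowAliasByEvent = array('INSERT' => 'NEW', 'UPDATE' => 'NEW', 'DELETE' => 'OLD');

  foreach ($rowAliasByEvent as $event => $row) {
    $triggers[] = array(
      'table' => array('civicrm_activity'),
      'when' => 'AFTER',
      'event' => array($event),
      'sql' => "\nUPDATE civicrm_case SET modified_date = CURRENT_TIMESTAMP "
        . "WHERE id IN (SELECT ca.case_id FROM civicrm_case_activity ca "
        . "WHERE ca.activity_id = {$row}.id);\n",
    );
  }

  return $triggers;
}

$caseTriggers = sketchCaseActivityTriggers();
foreach ($caseTriggers as $trigger) {
  echo $trigger['event'][0], ': ', trim($trigger['sql']), "\n";
}
```

One caveat with this simplified version: on INSERT the linking row in civicrm_case_activity may not exist yet, so a real implementation would probably also need a trigger on the linking table.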

Member

I can't say that I agree, but I don't have a better name to suggest. Since you think it's good enough, I won't argue :)

* @see \CRM_Core_DAO::triggerRebuild
* @see http://issues.civicrm.org/jira/browse/CRM-10554
*
* @param $info
Member

what is the type of this param?

Member Author

It's a good question, but the answer is to drill-down on @see \CRM_Utils_Hook::triggerInfo. I don't want to reproduce the hook spec here. Will put another comment directing folks there.

* @see http://issues.civicrm.org/jira/browse/CRM-10554
*
* @param $info
* @param null $tableFilter
Member

What type of value is expected here, when it isn't null?

Member Author

It's a good question, but the answer is to drill-down on @see \CRM_Utils_Hook::triggerInfo. I don't want to reproduce the hook spec here. Will put another comment directing folks there.

));
$this->assertRegExp(';^\d\d\d\d-\d\d-\d\d \d\d:\d\d;', $case_1['created_date']);
$this->assertRegExp(';^\d\d\d\d-\d\d-\d\d \d\d:\d\d;', $case_1['modified_date']);
$this->assertApproxEquals(strtotime($case_1['created_date']), strtotime($case_1['created_date']), 2);
Member

You're comparing the created_date with itself here. Is this correct?

Member Author

Thanks, I'll fix that.

));
$this->assertRegExp(';^\d\d\d\d-\d\d-\d\d \d\d:\d\d;', $activity_1['created_date']);
$this->assertRegExp(';^\d\d\d\d-\d\d-\d\d \d\d:\d\d;', $activity_1['modified_date']);
$this->assertApproxEquals(strtotime($activity_1['created_date']), strtotime($activity_1['created_date']), 2);
Member

Here too


$this->assertEquals($activity_1['created_date'], $activity_2['created_date']);
$this->assertNotEquals($activity_1['modified_date'], $activity_2['modified_date']);
$this->assertTrue(strtotime($activity_1['modified_date']) < strtotime($activity_2['modified_date']),
Member

What about using assertLessThan() instead of assertTrue()?

$this->assertRegExp(';^\d\d\d\d-\d\d-\d\d \d\d:\d\d;', $case_2['modified_date']);
$this->assertEquals($case_1['created_date'], $case_2['created_date']);
$this->assertNotEquals($case_2['created_date'], $case_2['modified_date']);
}
Member

I must confess it was a bit hard for me to read this test and understand what was going on. I feel it would have been easier if it was split into smaller tests, each one having a descriptive name of what is being tested. Examples:

  • testCreatedDateShouldBeTheSameAsModifiedDateWhenACaseIsCreated
  • testCreatedDateIsNotUpdatedWhenACaseIsUpdated
  • testModifiedDateShouldBeUpdatedWhenACaseIsUpdated
  • testCreatedDateAndModifiedDateCannotBeManuallyChanged
  • testDateFormatOfCreatedDateAndModifiedDateShouldBeCorrect

Note that the examples have some scenarios that were not even covered by tests. In addition to that, do you think it would make sense to also have tests for the BAO (that is, check that the same rules will work when you do something like Case::create())?

*
* @see CRM_Core_DAO::executeQuery
*/
public static function task_executeQuery($ctx, $sql, $vars) {
Member

Maybe I missed it, but it doesn't look like you're using this anywhere. Are you?

Member Author

Valid point. It was a helper to support the migration task, but then the "migration task" migrated itself off to doctorwhen, so we don't really need this straggler.

OTOH, it is a handy straggler...

@totten totten force-pushed the master-actcase-ts branch from eeb3739 to dfe984e Compare August 2, 2017 20:56
totten added a commit to totten/civicrm-core that referenced this pull request Aug 2, 2017
$result = $this->callAPISuccessGetSingle('Case', array('case_id' => $id));
// Modification dates are likely to differ by 0-2 sec. Check manually.
$this->assertTrue($result['modified_date'] >= $case['modified_date']);
Member

I saw you accepted my suggestion to use assertLessThan(), so maybe you can use assertGreaterThanOrEqual() here, especially because there's no custom message for this assertion.

@totten totten force-pushed the master-actcase-ts branch from 40399e1 to 609ecd0 Compare August 3, 2017 22:12
totten added a commit to totten/civicrm-core that referenced this pull request Aug 3, 2017
@totten
Member Author

totten commented Aug 11, 2017

@agh1 @MegaphoneJon @Stoob - I think you guys are pretty thoughtful about messaging. Could you comment on how well these screens present the matter of transitioning/cleaning-up dates/times? See the screenshots and hyperlink from the PR description.

In years past, we might have tried to address the change/cleanup of time data with an automated upgrade script, and then there'd be fall-out when the script doesn't anticipate some issue... which is fairly likely because no one has access to the kind of context/tools/datasets needed to ensure a change works everywhere. In theory, the RC period helps, but I'm feeling a little untrusting there. So this PR tries to break it into smaller steps -- eg make the smallest change possible, show a notification, say "Don't panic", and then link to experimental docs/tools for the rest.

The in-app messages will probably set the tone+reactions. Most of the folks on this ticket have already formed some perspective (due to historical issues) -- so I wanted to ping you guys for a fresh perspective on the messaging/transition-path.

@totten totten force-pushed the master-actcase-ts branch from a18077f to 9443b00 Compare August 18, 2017 04:50
totten added a commit to totten/civicrm-core that referenced this pull request Aug 18, 2017
@totten totten force-pushed the master-actcase-ts branch from 9443b00 to d0f064f Compare August 18, 2017 05:10
== Before ==

 * SQL triggers to populate `civicrm_contact.created_date` and `civicrm_contact.modified_date` are
   generated via `CRM_Contact_BAO_Contact::triggerInfo($info, $tableName)`

== After ==

 * `CRM_Contact_BAO_Contact::triggerInfo` calls a helper `TimestampTriggers`
 * The helper `TimestampTriggers` accepts arguments describing the names of the tables/columns
   which are needed for the timestamp triggers.

== Comments ==

To test, I used this command to update and dump the schema:

```
cv api system.flush triggers=1 && mysqldump --triggers ...
```

The schema was identical before and after.  (Notably, by alternately hacking
the code, I was able to validate the test was capable of revealing
discrepancies.)
…reation/modification date.

The technique of using a hard-coded example record doesn't work with this data.
There appears to be some application logic which follows a process like this:

 1. Read the case
 2. Tweak the data
 3. Save updated case

The problem comes if step #3 resaves a timestamp loaded in step #1, which
is fairly likely to happen if you read+save the same record.

This was specifically observed on the "Manage Case" screen when editing
activities, but the data-flow is pretty common, so make a general fix to the
BAO.
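
A minimal sketch of the idea behind such a fix, assuming it amounts to dropping the system-managed timestamps from the parameters before saving so that the database can set them. The helper name sketchStripSystemTimestamps and the sample record are hypothetical; the actual BAO change may be implemented differently.

```php
<?php
/**
 * Hypothetical helper: remove system-managed timestamps so a read+save
 * round trip cannot write stale values back.
 */
function sketchStripSystemTimestamps(array $params) {
  unset($params['created_date'], $params['modified_date']);
  return $params;
}

// 1. Read the case (the old modified_date comes along with the record).
$loaded = array(
  'id' => 7,
  'subject' => 'Old subject',
  'created_date' => '2017-08-01 10:00:00',
  'modified_date' => '2017-08-01 10:00:00',
);

// 2. Tweak the data.
$loaded['subject'] = 'New subject';

// 3. Save the updated case, without the stale timestamps.
$toSave = sketchStripSystemTimestamps($loaded);
print_r($toSave);
```

With the timestamps removed, the ON UPDATE CURRENT_TIMESTAMP column definition decides the new modified_date instead of the value loaded in step #1.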
…ation date.

Checking the `modified_date` is a bit racy -- depending on sub-second
performance/alignment, the original `Case` creation and the subsequent
`Case` update may have the same `modified_date` or may have different
`modified_date`.
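
The race can be illustrated without a database. This sketch uses assumed sample values with one-second resolution and compares a strict "modified_date changed" check against a "not earlier than" check.

```php
<?php
// Assumed sample values with one-second resolution.
$created = '2017-08-18 05:10:00';
$updatedSameSecond = '2017-08-18 05:10:00';
$updatedNextSecond = '2017-08-18 05:10:01';

// A strict "modified_date changed" check depends on sub-second timing.
$strictSameSecond = ($updatedSameSecond !== $created);
$strictNextSecond = ($updatedNextSecond !== $created);

// A "not earlier than" check holds either way, so it is safe to assert.
$stableSameSecond = strtotime($updatedSameSecond) >= strtotime($created);
$stableNextSecond = strtotime($updatedNextSecond) >= strtotime($created);

var_dump($strictSameSecond, $strictNextSecond, $stableSameSecond, $stableNextSecond);
```

Only the strict check gives different answers for the two updates, which is why a test asserting inequality of `modified_date` can fail intermittently.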
@totten
Member Author

totten commented Sep 6, 2017

Following up on my questions about messaging... I escalated to civicrm-dev and moved half of the messaging to a separate/smaller JIRA issue and PR (CRM-21079, #10874), which reduced the scope of the current PR.

Also, just rebased to handle a merge-conflict.

@colemanw colemanw merged commit 27b55ab into civicrm:master Sep 6, 2017
@totten totten deleted the master-actcase-ts branch September 6, 2017 20:44