Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analysis v2 alpha 01: io and new lagged diff for sparse time series #100

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

eraderna
Copy link
Contributor

@eraderna eraderna commented Jan 7, 2025

WiP, hence under an "alpha" directory.

Main development for review here is proposed replacement for add-diff, add-sparse-lag1-diff-by-group. For this, request review of:

  • rationale, in comment block in witan.send.adroddiad.analysis-v2.alpha.summarise-domain-test
  • interface for add-sparse-lag1-diff-by-group, described in defn docstring in witan.send.adroddiad.analysis-v2.alpha.summarise-domain.
  • tests for add-sparse-lag1-diff-by-group illustrating capabilities and usage.
  • implementation of add-sparse-lag1-diff-by-group.

Previously relied on deps of calling project to specify some of the
required libs.
@eraderna eraderna requested a review from otfrom January 7, 2025 17:55
Copy link
Member

@otfrom otfrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if there is an easier way to do the add-diff. I certainly think we should look at factoring a few things out to give them names and make the main body clearer.

Comment on lines +19 to +23
(defn transitions->transitions-count
"Given a `transitions` dataset (with one record per CYP per `:calendar-year`),
returns a `transitions-count` dataset with counts in `:transition-count`."
;; TODO: Move to `witan.send.adroddiad.transitions`
;; TODO: Extend to cope with existing `:transition-count`?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename this to count-transitions so we can have a separate sum-transitions?

Comment on lines +61 to +62
[{:keys [project-dir]
{historic-transitions-file :transitions} :file-inputs}]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix formatting, tho this might be a bit too much clever destructuring. We should probably learn it though

The `projections-config` must contain key `:output-parameters`,
containing keys `:project-dir`, `:output-dir` and `:prefix`
(like when read from a projections config.edn by `read-projections-config`)."
[{{:keys [project-dir output-dir prefix]} :output-parameters}]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

someone has definitely been reading up on destructuring




;;; # Components
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a very different use of the word components from the rest of the clojure community https://github.com/stuartsierra/component

Just remove it as there is only one function after this?

Comment on lines +194 to +202
_ (when (and check-index-range
(or (and (some? min-index)
(> min-index min-ds-index))
(and (some? max-index)
(< max-index max-ds-index))))
(throw (ex-info (str "Input dataset temporal index range exceeds that specified.")
{:temporal-index-col temporal-index-col
:index-range-in-ds [min-ds-index max-ds-index]
:index-range-specified [min-index max-index]})))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice check. Perhaps worth factoring out to give it a name?

Comment on lines +211 to +220
(when check-key-unique
(let [non-unique-keys-ds (-> ds
(tc/group-by key-cols)
(tc/aggregate {:row-count tc/row-count})
(tc/select-rows #(-> % :row-count (> 1))))]
(when (< 0 (tc/row-count non-unique-keys-ds))
(throw (ex-info (str "Input dataset has non-unique keys.")
{:group-key group-key
:temporal-index-col temporal-index-col
:non-unique-keys-ds non-unique-keys-ds})))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be factored into a separate function as well, and might just be a useful predicate (so it wouldn't throw, but you could have another wrapper that did)

{:group-key group-key
:temporal-index-col temporal-index-col
:non-unique-keys-ds non-unique-keys-ds})))))
(-> (tc/full-join (-> ds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(-> ds) can just be ds here I think

(tc/update-columns temporal-index-col (partial map inc))
;; Add (blank) record for `min-index` for each group (to ensure in final dataset)
((fn [ds']
(tc/concat-copying (-> ds')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(-> ds') can just be ds' here I think

;;; # Components
;; Replacement for add-diff (though returns previous value rather than calculating pct-diff):
;; See test ns `witan.send.adroddiad.analysis-v2.alpha.summarise-domain-test` for rationale and examples.
(defn add-sparse-lag1-diff-by-group
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to discuss this function a bit more and I'm curious to know how well it performs with our typical data sizes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants