-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Analysis v2 alpha 01: io and new lagged diff for sparse time series #100
base: main
Are you sure you want to change the base?
Conversation
Previously relied on deps of calling project to specify some of the required libs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if there is an easier way to do the add-diff. I certainly think we should look at factoring a few things out to give them names and make the main body clearer.
(defn transitions->transitions-count | ||
"Given a `transitions` dataset (with one record per CYP per `:calendar-year`), | ||
returns a `transitions-count` dataset with counts in `:transition-count`." | ||
;; TODO: Move to `witan.send.adroddiad.transitions` | ||
;; TODO: Extend to cope with existing `:transition-count`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename this to count-transitions
so we can have a separate sum-transitions
?
[{:keys [project-dir] | ||
{historic-transitions-file :transitions} :file-inputs}] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix formatting, tho this might be a bit too much clever destructuring. We should probably learn it though
The `projections-config` must contain key `:output-parameters`, | ||
containing keys `:project-dir`, `:output-dir` and `:prefix` | ||
(like when read from a projections config.edn by `read-projections-config`)." | ||
[{{:keys [project-dir output-dir prefix]} :output-parameters}] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
someone has definitely been reading up on destructuring
|
||
|
||
|
||
;;; # Components |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a very different use of the word components from the rest of the clojure community https://github.com/stuartsierra/component
Just remove it as there is only one function after this?
_ (when (and check-index-range | ||
(or (and (some? min-index) | ||
(> min-index min-ds-index)) | ||
(and (some? max-index) | ||
(< max-index max-ds-index)))) | ||
(throw (ex-info (str "Input dataset temporal index range exceeds that specified.") | ||
{:temporal-index-col temporal-index-col | ||
:index-range-in-ds [min-ds-index max-ds-index] | ||
:index-range-specified [min-index max-index]}))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice check. Perhaps worth factoring out to give it a name?
(when check-key-unique | ||
(let [non-unique-keys-ds (-> ds | ||
(tc/group-by key-cols) | ||
(tc/aggregate {:row-count tc/row-count}) | ||
(tc/select-rows #(-> % :row-count (> 1))))] | ||
(when (< 0 (tc/row-count non-unique-keys-ds)) | ||
(throw (ex-info (str "Input dataset has non-unique keys.") | ||
{:group-key group-key | ||
:temporal-index-col temporal-index-col | ||
:non-unique-keys-ds non-unique-keys-ds}))))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be factored into a separate function as well, and might just be a useful predicate (so it wouldn't throw, but you could have another wrapper that did)
{:group-key group-key | ||
:temporal-index-col temporal-index-col | ||
:non-unique-keys-ds non-unique-keys-ds}))))) | ||
(-> (tc/full-join (-> ds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(-> ds)
can just be ds
here I think
(tc/update-columns temporal-index-col (partial map inc)) | ||
;; Add (blank) record for `min-index` for each group (to ensure in final dataset) | ||
((fn [ds'] | ||
(tc/concat-copying (-> ds') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(-> ds') can just be ds' here I think
;;; # Components | ||
;; Replacement for add-diff (though returns previous value rather than calculating pct-diff): | ||
;; See test ns `witan.send.adroddiad.analysis-v2.alpha.summarise-domain-test` for rationale and examples. | ||
(defn add-sparse-lag1-diff-by-group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to discuss this function a bit more and I'm curious to know how well it performs with our typical data sizes
WiP, hence under an "alpha" directory.
Main development for review here is proposed replacement for
add-diff
,add-sparse-lag1-diff-by-group
. For this, request review of:witan.send.adroddiad.analysis-v2.alpha.summarise-domain-test
add-sparse-lag1-diff-by-group
, described indefn
docstring inwitan.send.adroddiad.analysis-v2.alpha.summarise-domain
.add-sparse-lag1-diff-by-group
illustrating capabilities and usage.add-sparse-lag1-diff-by-group
.