Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TS typing for Dataframe #2382

Merged
merged 7 commits into from
Aug 17, 2021
Merged

TS typing for Dataframe #2382

merged 7 commits into from
Aug 17, 2021

Conversation

bkmartinjr
Copy link
Contributor

@bkmartinjr bkmartinjr commented Aug 15, 2021

This PR includes:

  • TS typing for Dataframe
  • a few minor bug fixes turned up by static type analysis
  • Partial upstream typing of Dataframe users. (the PR was getting big, so I will finish the rest of the Dataframe-consumer typing in another PR).

Notable API changes:

  • summarize() and histogram() are no longer polymorphic and have separate functions for categorical, continuous and by-category (eg, summarizeContinuous(), summarizeCategorical(), etc).
  • LabelIndex.getOffset() and getOffsets() returns -1 instead of undefined if the label is unknown.

This is another partial fix for #2366

@codecov
Copy link

codecov bot commented Aug 15, 2021

Codecov Report

Merging #2382 (54142b1) into main (3fdf5ca) will not change coverage.
The diff coverage is n/a.

❗ Current head 54142b1 differs from pull request most recent head ec89306. Consider uploading reports for the commit ec89306 to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2382   +/-   ##
=======================================
  Coverage   68.23%   68.23%           
=======================================
  Files         127      127           
  Lines       12259    12259           
=======================================
  Hits         8365     8365           
  Misses       3894     3894           
Flag Coverage Δ
frontend 66.99% <ø> (ø)
javascript 66.99% <ø> (ø)
smokeTest ∅ <ø> (∅)
smokeTestAnnotations ∅ <ø> (?)
unitTest 66.99% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3fdf5ca...ec89306. Read the comment docs.

@bkmartinjr bkmartinjr marked this pull request as ready for review August 15, 2021 23:29
@bkmartinjr bkmartinjr self-assigned this Aug 16, 2021
Copy link
Contributor

@tihuan tihuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!! Thanks for the massive PR, Bruce 🤩 👏 !! This looks amazing!

Just a few nits I could find 😆 Thanks again!

@@ -141,17 +163,13 @@ class Dataframe {
Object.freeze(this);
}

// eslint-disable-next-line @typescript-eslint/explicit-module-boundary-types --- FIXME: disabled temporarily on migrate to TS.
/** @internal */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ayyy @internal 🔥 !!

return offset >= 0 && offset < length;
const has = function has(rlabel: LabelType) {
const offset = getRowOffset(rlabel);
return offset !== undefined && offset >= 0 && offset < length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: do we need to check offset !== undefined if getRowOffset() always returns a number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch - that is a BUG!!!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hurray! Do we need code change here? It seems to still be offset !== undefined && offset >= 0 && offset < 😆

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only because the comment refers to the outdated code. Take a look at the most recent commit to the PR, not the obsolete commit :-)

// eslint-disable-next-line @typescript-eslint/explicit-module-boundary-types, @typescript-eslint/no-explicit-any -- - FIXME: disabled temporarily on migrate to TS.
isubsetMask(rowMask: any, colMask = null, withRowIndex = null) {
isubsetMask(
rowMask: Uint8Array | boolean[] | null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!! Types are so helpful here 🤩 👏

return Number.isInteger(i) && i >= 0 && i < this.maxOffset ? i : undefined;
return Number.isInteger(offset) && offset >= 0 && offset < this.maxOffset
? offset
: undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to return -1 instead of undefined here to have consistency with getOffset() above ?

Copy link
Contributor Author

@bkmartinjr bkmartinjr Aug 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't. -1 is a legal label.

Remember:

  • offsets are numeric indices. -1 is "no such offset"
  • labels are any string|bool|number. undefined is "no such label"

This is documented in the comments for getOffset() and getLabel()

What makes this particular class confusing is that it is the identity label index (aka offset===label).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh got it! Thanks for the explanation 🎉 Makes sense to me now 😄 !

// eslint-disable-next-line @typescript-eslint/no-explicit-any --- FIXME: disabled temporarily on migrate to TS.
return arr.map((i: any) => this.getLabel(i));
this.getLabel = function getLabel(offset: number) {
return Number.isInteger(offset) ? labels[offset] : undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question here for undefined vs. -1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same answer as above. See the abstract method definition.

// eslint-disable-next-line @typescript-eslint/no-explicit-any --- FIXME: disabled temporarily on migrate to TS.
return arr.map((i: any) => this.getLabel(i));
this.getLabel = function getLabel(offset: number) {
return Number.isInteger(offset) ? rindex[offset] : undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question here for undefined vs. -1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same answer as above. See the abstract method definition.

}

// eslint-disable-next-line @typescript-eslint/no-explicit-any --- a legitimate use of any.
export type AnyFunction<T = unknown> = (...args: any[]) => T;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious if unknown[] would work here instead of any[]?

https://blog.logrocket.com/when-to-use-never-and-unknown-in-typescript-5e4d6c5799ad/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it won't work as you can't call the memoized function without a type error. See, for example, how TS defines the signatures to Function.call, Function.bind, etc.

What would be better is to improve the type pass through for the memoize() function, which is the only user of this type. I'll fix that for both memoize() and callOnceLazy().

Eg,

export function memoize<
  T extends (...args: any[]) => any = (...args: any[]) => any
>(
  fn: T,
  hashFn: (...args: Parameters<T>) => string,
  maxResultsCached = -1
): (...args: Parameters<T>) => ReturnType<T> {
    ...
}

Side note: fixing this pointed out a small typing bug! Yay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ps. if you haven't already done so, it is worth studying the type signatures for stuff like Function.bind, Function.apply, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh didn't think of checking out those signatures! Thanks for the tip 👍

Using Parameters<T> SGTM!

Copy link
Collaborator

@MillenniumFalconMechanic MillenniumFalconMechanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Bruce!

@bkmartinjr bkmartinjr merged commit 45cecad into main Aug 17, 2021
@bkmartinjr bkmartinjr deleted the bkmartinjr/2366-dataframe branch August 17, 2021 17:43
bkmartinjr pushed a commit that referenced this pull request Aug 19, 2021
This was referenced Aug 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants