
Remove most "type: ignore"s (fix #738) #761

Merged
merged 18 commits into from
Aug 29, 2023


rec
Contributor

@rec rec commented Aug 24, 2023

Description

Related Issues

Checklist

  • Is this code covered by new or existing unit tests or integration tests?
  • Did you run make test successfully?
  • Do new classes, functions, methods and parameters all have docstrings?
  • Were existing docstrings updated, if necessary?
  • Was external documentation updated, if necessary?

Additional Notes or Comments

@rec rec changed the title Types 2 Massive type fix Aug 24, 2023
@rec rec changed the title Massive type fix Massive type fix (fix #738) Aug 24, 2023
@codecov-commenter

codecov-commenter commented Aug 24, 2023

Codecov Report

Patch coverage: 69.58% and project coverage change: -0.12% ⚠️

Comparison is base (dd3ef35) 78.54% compared to head (8edfbd8) 78.43%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #761      +/-   ##
==========================================
- Coverage   78.54%   78.43%   -0.12%     
==========================================
  Files          81       80       -1     
  Lines        5080     5160      +80     
==========================================
+ Hits         3990     4047      +57     
- Misses       1090     1113      +23     
Files Changed Coverage Δ
superduperdb/container/serializable.py 95.12% <ø> (ø)
superduperdb/db/base/download_content.py 86.53% <0.00%> (-1.70%) ⬇️
superduperdb/db/mongodb/data_backend.py 42.37% <0.00%> (-3.09%) ⬇️
superduperdb/server/client.py 74.81% <ø> (ø)
superduperdb/ext/openai/model.py 50.45% <28.57%> (+0.45%) ⬆️
superduperdb/cli/config.py 75.00% <33.33%> (+3.57%) ⬆️
superduperdb/container/component.py 80.00% <40.00%> (-7.50%) ⬇️
superduperdb/ext/sklearn/model.py 76.08% <42.85%> (-2.74%) ⬇️
superduperdb/db/mongodb/query.py 78.44% <46.66%> (-0.88%) ⬇️
superduperdb/data/cache/uri_cache.py 75.00% <50.00%> (ø)
... and 21 more


@rec
Contributor Author

rec commented Aug 24, 2023

This fixes #738!


Once this is committed, we should strongly avoid adding new noqa or type: ignore comments to the codebase.

I believe that almost all the remaining instances of type: ignore in the codebase mark code that can't work; each one is highlighted with a TODO and an explanation of the problem.


We also have the issue of "color-changing objects", where a member or variable has different types at different stages in the program.

(I see this as a minor defect we should address eventually, as it makes everyone's IDE type results a bit confusing if nothing else.)

To satisfy the type checker for these, I am using asserts, for example assert not isinstance(thing, str): a lot of things start out as possibly a str and get converted later.

It works fairly well and is self-documenting. I don't believe any of these will actually trigger in practice, but if one does, the code would have crashed on the next line anyway.

I propose that we use asserts only for this purpose. For all other error checking, we explicitly raise an exception.
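A minimal sketch of the pattern described above, assuming a hypothetical Listener component whose key member starts out as a raw str and is replaced by a parsed Key during setup (Key, Listener and parse_key are illustrative names, not from the codebase). The assert only narrows the type for mypy; the genuine validation raises explicitly.

```python
import dataclasses as dc
import typing as t


@dc.dataclass
class Key:
    """Hypothetical parsed form of a key string."""

    name: str


def parse_key(text: str) -> Key:
    # Ordinary error checking: raise an explicit exception, don't assert.
    if not text:
        raise ValueError("key must not be empty")
    return Key(text)


@dc.dataclass
class Listener:
    """Hypothetical component with a "color-changing" member."""

    # Holds a raw str when constructed and a Key once setup() has run.
    key: t.Union[str, Key]

    def setup(self) -> None:
        if isinstance(self.key, str):
            self.key = parse_key(self.key)

    def key_name(self) -> str:
        # This assert only disambiguates the type for mypy. If it ever fires,
        # the attribute access below would have crashed anyway.
        assert not isinstance(self.key, str)
        return self.key.name
```

Usage: Listener("text") followed by setup() makes key_name() return "text". Since python -O strips asserts, restricting them to type narrowing (and never using them for real validation) also keeps behavior the same under optimized runs.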

@rec rec requested review from thejumpman2323 and blythed August 24, 2023 10:28
Contributor

@nenb nenb left a comment


This is really great work, I'm far too excited about all of this! 🚀

I haven't approved simply because there are a couple of sections where I needed to defer to @blythed for further inspection (these are tagged in the PR).

To unlock this, we do need to shift a bunch of work to runtime, i.e. assert statements. Long-term, we will want to remove these and structure the code so that (hopefully) the type checker can do all this work at validation time, i.e. with no runtime hit. Is this issue tracked somewhere? Could you create it if not?

Great work 💪

superduperdb/__init__.py
superduperdb/__main__.py Outdated
superduperdb/cli/config.py
@@ -21,6 +21,8 @@ def wrapper(


class Job:
    callable: t.Optional[t.Callable]
Contributor


Could we put the type hint on the instance parameter instead? Then we would not need this class variable.

Contributor Author


Then our documentation builder doesn't pick it up:

https://superduperdb.github.io/superduperdb/source/superduperdb.container.html#module-superduperdb.container.job

Whoa, is that out of date. :-o


import dataclasses as dc


class One:
    """dox"""

    def __init__(self, one: str):
        self.one = one


@dc.dataclass
class Two:
    """dox"""

    two: str


class Three:
    """dox"""

    three: str

    def __init__(self, three: str):
        self.three = three

In class One, the member is visible to mypy but not to our documentation builder, though the builder does represent the constructor correctly.

In Two, everything is visible.

In Three, everything is visible, but there's duplication in our code.

Contributor


What about the suggested solutions?

superduperdb/container/vector_index.py
superduperdb/container/model.py Outdated
if self.train_y is not None:
out.append(self.train_y)
return out # type: ignore[return-value]
if isinstance(self.train_y, list):
Contributor


Contributor Author


The issue is with self.train_X and self.train_y which I think are supposed to be strings, but in some code paths are treated as lists of strings? Or... the other way around?
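If these fields genuinely can be either a single key or a list of keys, one option is to normalize them in one place so the return type is always a list of str and the type: ignore[return-value] disappears. This is a hypothetical sketch (as_key_list and training_keys are illustrative names), not the fix adopted in this PR.

```python
import typing as t


def as_key_list(keys: t.Union[str, t.Sequence[str], None]) -> t.List[str]:
    """Normalize a single key, a sequence of keys, or None to a list of keys."""
    if keys is None:
        return []
    # A str is itself a Sequence[str], so it must be checked first.
    if isinstance(keys, str):
        return [keys]
    return list(keys)


def training_keys(
    train_X: t.Union[str, t.Sequence[str]],
    train_y: t.Optional[t.Union[str, t.Sequence[str]]] = None,
) -> t.List[str]:
    # Both inputs are normalized, so the result is always List[str].
    return as_key_list(train_X) + as_key_list(train_y)
```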

superduperdb/container/model.py
superduperdb/ext/torch/model.py Outdated
@@ -86,6 +91,29 @@ def evaluating(self):
def train(self):
raise NotImplementedError

if t.TYPE_CHECKING:
Contributor


@blythed For lines 94-116

Collaborator

@blythed blythed left a comment


This is great for reducing the number of mypy quibbles, so that we can actually heed its warnings. What is the secret sauce here? I see that assert isinstance... helps. Is there anything else?

There are some changes you made to fix the disagreement between typing and code. I think that the intention was the other way around. Good spot, however, and the fixes would have been correct under the assumptions you made.

@@ -203,11 +203,11 @@ def _predict_with_select(
):
    ids = []
    if overwrite:
-       query = select.select_ids
+       query = select.select_ids()  # TODO: is this right?
Collaborator


This fix is not correct. (Original is correct.) Was this flagged by mypy or similar?

Contributor Author


Ah, it's an error in a declaration elsewhere, this code is correct as you say!

Fixed the declaration and this code.
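The thread doesn't say which declaration was wrong. Purely as an illustration, assuming the mismatch was a member declared as a method but used as an attribute at the call site (query = select.select_ids), declaring it as a property lets that original line type-check. Select, Find and ids_query here are hypothetical stand-ins, not the project's classes.

```python
import abc
import typing as t


class Select(abc.ABC):
    """Hypothetical query base class, for illustration only."""

    @property
    @abc.abstractmethod
    def select_ids(self) -> "Select":
        """A query returning only the ids of the matched documents."""


class Find(Select):
    """Hypothetical concrete query."""

    def __init__(self, fields: t.Tuple[str, ...] = ()) -> None:
        self.fields = fields

    @property
    def select_ids(self) -> "Select":
        return Find(fields=("_id",))


def ids_query(select: Select, overwrite: bool) -> Select:
    # Plain attribute access: with a property declaration mypy types this
    # as Select; with a method declaration it would be a bound method.
    return select.select_ids if overwrite else select
```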

outputs = [
self.encoder(
x
).encode() # very very very very asdfasdfasdfasdfasdfasdfasdfasdfa
Collaborator


Is this a TODO?

Contributor Author


URG, hahaha! I needed to break the line into parts so I could find exactly where the error was, and I put the comment in to stop ruff from joining the lines back together.

The code is fine.

)
results = {}

# TODO: metrics is definitely not iterable
# TODO: metrics is definitely not iterable: this can't work
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is an error: the type should be t.List[Metric].

for r in validation_set.data
],
)
out = m(prediction_X, prediction_y) # type: ignore[index]
Collaborator


metrics should be a list.
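With metrics typed as a list, the validation loop can iterate over it and call each metric without any type: ignore[index]. A minimal sketch, assuming a hypothetical Metric with an identifier and a scoring function (the project's real Metric class may differ):

```python
import dataclasses as dc
import typing as t


@dc.dataclass
class Metric:
    """Hypothetical stand-in for a metric: a name plus a scoring function."""

    identifier: str
    function: t.Callable[[t.Sequence[t.Any], t.Sequence[t.Any]], float]

    def __call__(self, x: t.Sequence[t.Any], y: t.Sequence[t.Any]) -> float:
        return self.function(x, y)


def validate(
    metrics: t.List[Metric],
    prediction: t.Sequence[t.Any],
    target: t.Sequence[t.Any],
) -> t.Dict[str, float]:
    # metrics is a list, so iterating over it type-checks cleanly.
    return {m.identifier: m(prediction, target) for m in metrics}


def accuracy(x: t.Sequence[t.Any], y: t.Sequence[t.Any]) -> float:
    """Fraction of positions where prediction and target agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)
```

For example, validate([Metric("acc", accuracy)], [1, 0, 1, 1], [1, 1, 1, 1]) gives {"acc": 0.75}.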

@rec rec closed this Aug 24, 2023
@rec rec deleted the types-2 branch August 24, 2023 14:51
@rec rec reopened this Aug 24, 2023
@blythed
Collaborator

blythed commented Aug 25, 2023

There was an issue with compute_classification_metrics - this is a deprecated function which can be removed.

@rec rec force-pushed the types-2 branch 2 times, most recently from 039cb36 to 7542b93 Compare August 25, 2023 09:28
@rec
Contributor Author

rec commented Aug 25, 2023

Mostly the secret sauce was deducing the right types and putting them in.

The assert has exactly one purpose - to disambiguate a type when there is more than one possibility.

Occasionally mypy doesn't understand the type of a local variable. Often just reorganizing the expressions makes it work. Sometimes you need to declare the local variable the first time it is seen.

In one case I had to add a type annotation and I have no idea why.
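A small sketch of declaring a local variable the first time it is seen, assuming default mypy settings (normalize is a hypothetical function). Without the bare declaration, mypy infers int from the first assignment and then rejects the str assignment in the else branch.

```python
import typing as t


def normalize(raw: str) -> t.Union[int, str]:
    # Declared where the name first appears, before either assignment.
    value: t.Union[int, str]
    if raw.isdigit():
        value = int(raw)
    else:
        value = raw.strip()
    return value
```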


About the duplication of the declaration of member variables: sphinx doesn't parse the code so it can't guess any type annotations from inside executable code, e.g. the constructor.

So without the top-level annotations, the sphinx documentation won't show class members at all.

Our trouble is that we have to serve three masters: Python, mypy and sphinx. Sphinx doesn't do anything fancy; it just imports stuff and then pulls the type annotations and docstrings off it.

mypy can look into the constructor, see the assignment of self.member, and deduce its type, but sphinx cannot.

Using one of our three possible data classes solves all three problems neatly.
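As a rough proxy for what an import-based documentation tool sees, typing.get_type_hints reports only class-level annotations. The sketch below (Plain and Data are illustrative names) shows that a member assigned only in __init__ is invisible this way, while a dataclass declares the member once and also gets a generated constructor from that same declaration.

```python
import dataclasses as dc
import inspect
import typing as t


class Plain:
    """The member is only assigned inside the constructor."""

    def __init__(self, member: str):
        self.member = member


@dc.dataclass
class Data:
    """The member is declared once, at class level."""

    member: str


# Class-level annotations: none on Plain, one on Data.
print(t.get_type_hints(Plain))
print(t.get_type_hints(Data))
# The dataclass constructor is generated from the same annotation.
print(inspect.signature(Data))
```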


Two out of four CI builds failed: Python 3.8 and 3.11. Weirdly, these are the two I test locally!

I think it's a glitch but I see too many of these...

@rec
Contributor Author

rec commented Aug 25, 2023

So I need approval!

There are a couple more TODOs yet unanswered, but I'll push those at you in a later commit.

@rec rec force-pushed the types-2 branch 3 times, most recently from a06052b to 34faa1f Compare August 27, 2023 11:08
@rec
Contributor Author

rec commented Aug 27, 2023

@blythed @nenb There are a couple more TODOs, but let's get this out; I can address those later if need be, and they will probably vanish in the dead code binge.

@nenb
Contributor

nenb commented Aug 27, 2023

@rec I'll need to wait for @blythed to approve this. There are a couple of parts that I can't tell are right or wrong, and I need @blythed to check this (@blythed see the parts where I have tagged you).

Sorry.

@rec rec force-pushed the types-2 branch 2 times, most recently from cbed66e to a499ff7 Compare August 27, 2023 14:36
@rec rec changed the title Massive type fix (fix #738) Remove most "type: ignore"s (fix #738) Aug 28, 2023
@rec rec force-pushed the types-2 branch 2 times, most recently from 80564c4 to bb98a09 Compare August 28, 2023 15:31
@rec
Contributor Author

rec commented Aug 29, 2023

@blythed Let's get it out!!!

@rec rec merged commit 83c4224 into superduper-io:main Aug 29, 2023
4 participants