Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#269: fixed token class span output #270

Merged
merged 23 commits into from
Nov 21, 2024

Conversation

MarleneKress79789
Copy link
Collaborator

@MarleneKress79789 MarleneKress79789 commented Oct 31, 2024

All Submissions:

  • Is the title of the Pull Request correct?
  • Is the title of the corresponding issue correct?
  • Have you updated the changelog?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Are you mentioning the issue which this PullRequest fixes ("Fixes...")
  • Before you merge don't forget to run tests in AWS CodeBuild, by adding [CodeBuild] to the commit message

fixes #269
fixes #272
fixes #273

result_output = Output(result[0].rows)
mock_base_model_factory: Union[ModelFactoryProtocol, MagicMock] = create_autospec(ModelFactoryProtocol,
_name="mock_base_model_factory")
number_of_intendet_used_models = params.expected_model_counter# todo is this always same?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This depends on the test case, so having the number in the parameters is correct

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the name expected_model_counters is a bit odd. Maybe expected_model_calls or something like thst

class ErrorNotCachedMultipleModelMultipleBatch:
"""
not cached error, multiple model, multiple batch
"""
expected_model_counter = 0
expected_model_counter = 1
Copy link
Collaborator

@tkilias tkilias Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve docstring

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

improve docstrings

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new issue for docstring improvements #276

class ErrorNotCachedSingleModelMultipleBatch:
"""
not cached error, single model, multiple batch
"""
expected_model_counter = 1
expected_model_counter = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve doc string

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve docstring

entity_covered_text, entity_type, score, entity_docid, entity_char_begin, entity_char_end,
error_msg)]

#todo if use in all tests
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is a permanent todo, please create a ticket

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

score=0.1
error_msg = None

#todo comment explain entity/token naming mess
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is with this todo

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed


def run(ctx: UDFContext):
udf.run(ctx)

class ErrorOnPredictionMultipleModelMultipleBatch:
"""
not cached error, multiple model, multiple batch
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

improve docstring


def run(ctx: UDFContext):
udf.run(ctx)

class ErrorOnPredictionSingleModelMultipleBatch:
"""
error on prediction, single model, multiple batch,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve docstring

entity_type="ENTITY_TYPE"
score=0.1
error_msg = None

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add type hints to functions

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also added to new ticket

while the type/class of the found token is called "entity_group".
unless aggregation_strategy == "none", then the type/class of the found
token is called "entity" in the model output.
returns a list of number_entities times the model output row.
Copy link
Collaborator

@tkilias tkilias Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use proper docstring formatting for return values, such that we use a unified style, see other projects. See https://google.github.io/styleguide/pyguide.html#383-functions-and-methods

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also added to new ticket


def make_number_of_strings(input_str: str, desired_number: int):
"""
returns desired number of "input_strX", where X is counting up to desired_number.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring style

@@ -110,6 +119,66 @@ def create_mock_metadata(udf_wrapper):
)
return meta

# todo these functions should be reusable for the other unit tests. should we move them to a utils file or something?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need them, yes it is probably a good idea to move them. However, I suggest you create a ticket and do it in another PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added to #274

token_docid = 1
start = 0
end = 20
bfs_conn1, bfs_conn2 = make_number_of_strings(bucketfs_conn, 2) # todo why two in this test case? multiple model could still be same bfs con right?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably no specific reason might make sense to use the same bfsconn

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@@ -13,34 +13,33 @@ class MultipleModelMultipleBatchComplete:
data_size = 2
n_entities = 3

bfs_conn1, bfs_conn2 = make_number_of_strings(bucketfs_conn, 2) # todo why two in this test case? multiple model could still be same bfs con right?
sub_dir1, sub_dir2 = make_number_of_strings(sub_dir, 2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe for another PR, also multiple subdir might be not necessary

@MarleneKress79789 MarleneKress79789 merged commit 4091c57 into main Nov 21, 2024
4 checks passed
@MarleneKress79789 MarleneKress79789 deleted the bug/#269_fix_token_class_span_output branch November 21, 2024 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants