Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix name parsing #979

Closed
wants to merge 1 commit into from
Closed

Conversation

khl02007
Copy link
Collaborator

Description

My name is not parsed correctly when it is given by the NWB file as Lee, Kyu Hyun because there is a space within my first name. This change tells the code to use whatever is on the left of , as the last name and whatever is on the right as the first name, regardless of the presence of space within the first or the last name.

Checklist:

  • This PR should be accompanied by a release: (yes/no/unsure)
  • If release, I have updated the CITATION.cff
  • This PR makes edits to table definitions: (yes/no)
  • If table edits, I have included an alter snippet for release notes.
  • I have updated the CHANGELOG.md with PR number and description.
  • I have added/edited docs/notebooks to reflect the changes

@khl02007 khl02007 requested review from CBroz1, edeno and samuelbray32 May 18, 2024 00:10
samuelbray32
samuelbray32 previously approved these changes May 20, 2024
@CBroz1
Copy link
Member

CBroz1 commented May 20, 2024

Sorry this wasn't working for you. I'm confused, though, about where the prev version was breaking down. I see the same outcome when I load in the diff versions

image

@khl02007
Copy link
Collaborator Author

@CBroz1 This is the error that I got. Maybe I should have changed somehwhere else?

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 1
----> 1 sgi.insert_sessions(nwb_file_name, rollback_on_fail=True, raise_err=True)

File ~/repos/spyglass/src/spyglass/data_import/insert_sessions.py:77, in insert_sessions(nwb_file_names, rollback_on_fail, raise_err)
     75 copy_nwb_link_raw_ephys(nwb_file_name, out_nwb_file_name)
     76 Nwbfile().insert_from_relative_file_name(out_nwb_file_name)
---> 77 return populate_all_common(
     78     out_nwb_file_name,
     79     rollback_on_fail=rollback_on_fail,
     80     raise_err=raise_err,
     81 )

File ~/repos/spyglass/src/spyglass/common/populate_all_common.py:146, in populate_all_common(nwb_file_name, rollback_on_fail, raise_err)
    122 table_lists = [
    123     [  # Tables that can be inserted in a single transaction
    124         Session,
   (...)
    142     ],
    143 ]
    145 for tables in table_lists:
--> 146     single_transaction_make(
    147         tables=tables,
    148         nwb_file_name=nwb_file_name,
    149         raise_err=raise_err,
    150         error_constants=error_constants,
    151     )
    153 err_query = InsertError & error_constants
    154 nwbfile_query = Nwbfile & {"nwb_file_name": nwb_file_name}

File ~/repos/spyglass/src/spyglass/common/populate_all_common.py:87, in single_transaction_make(tables, nwb_file_name, raise_err, error_constants)
     85 except Exception as err:
     86     if raise_err:
---> 87         raise err
     88     log_insert_error(
     89         table=table, err=err, error_constants=error_constants
     90     )

File ~/repos/spyglass/src/spyglass/common/populate_all_common.py:84, in single_transaction_make(tables, nwb_file_name, raise_err, error_constants)
     82 for pop_key in (key_source & file_restr).fetch("KEY"):
     83     try:
---> 84         table().make(pop_key)
     85     except Exception as err:
     86         if raise_err:

File ~/repos/spyglass/src/spyglass/common/common_session.py:87, in Session.make(self, key)
     84 Lab().insert_from_nwbfile(nwbf)
     86 logger.info("LabMember...")
---> 87 LabMember().insert_from_nwbfile(nwbf)
     89 logger.info("Subject...")
     90 Subject().insert_from_nwbfile(nwbf)

File ~/repos/spyglass/src/spyglass/common/common_lab.py:68, in LabMember.insert_from_nwbfile(cls, nwbf)
     64 # each person is by default the member of their own LabTeam
     65 # (same as their name)
     67 full_name, _, _ = decompose_name(experimenter)
---> 68 LabTeam.create_new_team(
     69     team_name=full_name, team_members=[full_name]
     70 )

File ~/repos/spyglass/src/spyglass/common/common_lab.py:195, in LabTeam.create_new_team(cls, team_name, team_members, team_description)
    193 member_list = []
    194 for team_member in team_members:
--> 195     LabMember.insert_from_name(team_member)
    196     query = (
    197         LabMember.LabMemberInfo() & {"lab_member_name": team_member}
    198     ).fetch("google_user_name")
    199     if not query:

File ~/repos/spyglass/src/spyglass/common/common_lab.py:81, in LabMember.insert_from_name(cls, full_name)
     72 @classmethod
     73 def insert_from_name(cls, full_name):
     74     """Insert a lab member by name.
     75 
     76     Parameters
   (...)
     79         The name to be added.
     80     """
---> 81     _, first, last = decompose_name(full_name)
     82     cls.insert1(
     83         dict(
     84             lab_member_name=f"{first} {last}",
   (...)
     88         skip_duplicates=True,
     89     )

File ~/repos/spyglass/src/spyglass/common/common_lab.py:294, in decompose_name(full_name)
    291 parts = full_trimmed.split(delim)
    293 if delim is None or len(parts) != 2:  # catch unsupported format
--> 294     raise ValueError(
    295         f"Name has unsupported format for {full_name}. \n\t"
    296         + "Must use exactly one comma+space (i.e., ', ') or space.\n\t"
    297         + "And provide exactly two name parts.\n\t"
    298         + "For example, 'First Last' or 'Last1 Last2, First1 First2' "
    299         + "if name(s) includes spaces."
    300     )
    302 first, last = parts if delim == " " else parts[::-1]
    303 full = f"{first} {last}"

ValueError: Name has unsupported format for Kyu Hyun Lee. 
	Must use exactly one comma+space (i.e., ', ') or space.
	And provide exactly two name parts.
	For example, 'First Last' or 'Last1 Last2, First1 First2' if name(s) includes spaces.

@khl02007
Copy link
Collaborator Author

The experimenter field of this NWB file has two names: experimenter: ['Lee, Kyu Hyun' 'Adenekan, Philip']. Could this be the problem?

@CBroz1
Copy link
Member

CBroz1 commented May 20, 2024

It looks to me like common/common_lab.py:68 is assuming (a) only one experimenter in the file, and (b) using the lab member name as the team name - which wouldn't make sense for larger teams. I think that's where we would need to target edits. For A, that seems as easy as looping over members, but B is tougher. Is there a team name that could be pulled from the nwb file?

@CBroz1 CBroz1 dismissed samuelbray32’s stale review May 20, 2024 16:36

PR does not address motivating issue

@khl02007
Copy link
Collaborator Author

@CBroz1

  1. I think we can just use the first experimenter as the team name. Thoughts?
  2. I now realize that my change doesn't address the issue as you point out. I thought that this had to do with parsing Lee, Kyu Hyun since that is what the NWB file contains but what is passed to decompose_name is actually Kyu Hyun Lee which leads to value error for both this and the previous version. Is this because team names are supposed to be first last instead of last, first?

@CBroz1
Copy link
Member

CBroz1 commented May 20, 2024

I think we can just use the first experimenter as the team name. Thoughts?

Ideally, the database would reflect the actual structure of the team involved, rather than just assigning each member their own team, but there's no reason we can't

I now realize that my change doesn't address the issue as you point out. I thought that this had to do with parsing `Lee, Kyu Hyun` since that is what the NWB file contains but what is passed to `decompose_name` is actually `Kyu Hyun Lee` which leads to value error for both this and the previous version. Is this because team names are supposed to be `first last` instead of `last, first`?

Yes, a byproduct of the current design is 'names with spaces must use comma+space delimiter'. Can that be fixed on the nwb side before this stage? For this specific case, we could change the logic to accommodate a first name with a space, but that would exclude last names with spaces.

@khl02007
Copy link
Collaborator Author

Closing and moving the discussion to #983

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants