Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sgi.insert_session crashes if first name has a space #810

Closed
zoldello opened this issue Jan 31, 2024 · 2 comments · Fixed by #822
Closed

sgi.insert_session crashes if first name has a space #810

zoldello opened this issue Jan 31, 2024 · 2 comments · Fixed by #822
Assignees
Labels
bug Something isn't working common

Comments

@zoldello
Copy link
Collaborator

zoldello commented Jan 31, 2024

Describe the bug
if experimenter has a space in hs/her first name in the NWB file, like "Kyu Hyun" (this is Kyu's real first name), spyglass crashes in sgi.insert_session. This is related is #809 and may be a band-aid for it.

To Reproduce

  1. create a nwb file with an experimenter named "Kyu Hyun Lee"
  2. sgi.insert_session crashes and leaving the data insertion in an complete state

Expected behavior
sgi.insert_session to not crash and populate relevant tables

Potential solution
I am trying to meet a deadline and no one else is identifying this issue. So, I will polish up the potential fix, write unit test and make a PR later in the week. A potential untested solution is common_lab.py -

def decompose_name(full_name: str) -> tuple:
    """Return the full, first and last name parts of a full name.

    Names capitalized. Decomposes either comma- or space-separated.

    Parameters
    ----------
    full_name: str
        The full name to decompose.

    Returns
    -------
    tuple
        A tuple of the full, first, last name parts.
        
        
    Raises
    ------
    KeyError
        When full_name is in an unsupported format
    
    """
    
    # Notes - 
    #(1) spyglass supports having space in a first name, but not the last name.
    #(2) regex was considered but abandoned as it is overkill
    
    full_name_trimmed = None if not full_name else full_name.strip()  
    
    if not full_name_trimmed or len(full_name_trimmed) == 0:
        raise ValueError(f"Name is none/null or empty, Skipping {full_name}")
    elif full_name_trimmed.count(", ") > 1 or full_name_trimmed.count(",") > 1:
        raise ValueError(
            f"Names has multiple commas. Skipping {full_name}"
        )
    elif full_name_trimmed.count(", ") == 0 and full_name_trimmed.count(' ') == 0:
        raise ValueError(
            f"One-syllable name not supported. Skipping {full_name}"
        )
    elif ", " in full_name_trimmed:
        last, first = full_name_trimmed.title().rsplit(", ", 1)
    else:
        first, last = full_name_trimmed.title().rsplit(" ", 1)

    full = f"{first} {last}"

    return full, first, last
@zoldello zoldello self-assigned this Jan 31, 2024
@edeno
Copy link
Collaborator

edeno commented Jan 31, 2024

Also related to #304

@CBroz1
Copy link
Member

CBroz1 commented Feb 1, 2024

I have some suggestions for the draft above to reduce the number of checks/raises

def decompose_name(full_name: str) -> tuple:
    """Return the full, first and last name parts of a full name.

    Names capitalized. Decomposes either comma- or space-separated.
    If name includes spaces, must use exactly one comma.

    Parameters
    ----------
    full_name: str
        The full name to decompose.

    Returns
    -------
    tuple
        A tuple of the full, first, last name parts.

    Raises
    ------
    ValueError
        When full_name is in an unsupported format
    """

    # default to empty string if None so that .count() doesn't fail
    full_trimmed = "" if not full_name else full_name.strip()

    # decide case early to avoid multiple checks
    # catches empty string, multiple commas, multispace w/o comma
    delim = (
        ", "
        if full_trimmed.count(", ") == 1
        else " "
        if full_trimmed.count(" ") == 1
        else None
    )

    # rsplit -> split bc exactly one enforced above
    # .title removed to avoid enforcing unwanted capitalization
    #     on names like "McDonald" -> "Mcdonald"
    # running split w/None delim still works
    parts = full_trimmed.split(delim)

    if delim is None or len(parts) != 2:  # catch unsupported format
        raise ValueError(
            f"Name has unsupported format for {full_name}. \n\t"
            + "Must use exactly one comma+space (i.e., ', ') or space.\n\t"
            + "And provide exactly two name parts.\n\t"
            + "For example, 'First Last' or 'Last1 Last2, First1 First2' "
            + "if name(s) includes spaces."
        )

    first, last = parts if delim == " " else parts[::-1]
    full = f"{first} {last}"

    return full, first, last

@edeno edeno added bug Something isn't working common labels Feb 2, 2024
CBroz1 added a commit to CBroz1/spyglass that referenced this issue Feb 8, 2024
@CBroz1 CBroz1 linked a pull request Feb 8, 2024 that will close this issue
4 tasks
@CBroz1 CBroz1 mentioned this issue Feb 8, 2024
4 tasks
@edeno edeno closed this as completed in #822 Feb 9, 2024
edeno pushed a commit that referenced this issue Feb 9, 2024
* Change dependencies. See details.

Across `pyproject.toml` and both `environment.yml`:
- Alphabetical sort to allow for easier comparison across toml/yaml files.
- Remove dependencies required by other dependencies to reduce build time.
- Move `black` to `test` dependencies. Could be `dev` instead?
- Move `click` to `test` dependencies. CLI should be its own package.
- Add ruff config to `pyproject.toml`. Flake8 replacement to allow removal of
  `setup.cfg`.
- Remove `tqdm`, as it is required by `datajoint`

Standard environment:
- Add comments for all dependencies specific to position environment to allow
  for easier comparison across yaml files.
- Remove `dask`, as it is required by `ghostipy`
- Remove `hdmf`, as it is required by `pynwb`
- Remove `pymysql`, as it is required by `datajoint`
- Remove `pyyaml`, as it is required by `sortingview`
- Remove `pydotplus` from `pip` section, as it is already in `conda` section

Position environment:
- Rename `_position` or `-position` to `_dlc` or `-dlc` to be more precise.
- Remove `ipython`, `numba`, and `tensorflow`, as they are all required by
  `deeplabcut`
- Add items present in standard environment but not in position environment.

* Add pybind11 for std env.yml

* Change dependencies. See details.

- `trajectory_analysis_tools` and `replay_trajectory_classification` are
  superceded by `non_local_detector`.
- `dask-cuda` not in use, per @edeno

* Fixes for running notebooks up to 41

* Further dependency adjustments following PR review @edeno

* Depedencies for panel

* #810
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working common
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants