Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove parse_sequence_from_file #110

Merged
merged 3 commits into from
Jan 21, 2025
Merged

Remove parse_sequence_from_file #110

merged 3 commits into from
Jan 21, 2025

Conversation

davidbbeyer
Copy link
Contributor

@davidbbeyer davidbbeyer commented Jan 20, 2025

Removes parse_sequence_from_file.

This function is no longer needed to parse sequence when pmb.df have been read from file since the parsing is done when using pmb.df #30

@pm-blanco
Copy link
Collaborator

pm-blanco commented Jan 21, 2025

@davidbbeyer thank you for taking care of this. Revisiting this method, I see that it was introduced as a temporary fix in calculate_HH because sequence was not a list when retrieved from pyMBE dataframes that had been read from file. However, we fixed this issue in #30 and now sequence should always be a list regardless if the pyMBE dataframe was read or not from file.

Since right now this function is simply adding non-functional lines of code to the library I suggest to deprecate it instead of adding coverage a dedicated CI test for it. What do you think @davidbbeyer?

I tested that calculate_HH works without needing parse_sequence_from_file by doing the following modifications:

  1. in pyMBE.py
def calculate_HH(self, molecule_name, pH_list=None, pka_set=None):
        if pH_list is None:
            pH_list=np.linspace(2,12,50)
        if pka_set is None:
            pka_set=self.get_pka_set()
        self.check_pka_set(pka_set=pka_set)
        charge_number_map = self.get_charge_number_map()
        Z_HH=[]
        for pH_value in pH_list:
            Z=0
            index = self.df.loc[self.df['name'] == molecule_name].index[0].item()
            residue_list = self.df.at [index,('residue_list','')]
            sequence = self.df.at [index,('sequence','')]
            if np.any(pd.isnull(sequence)):
                # Molecule has no sequence
                for residue in residue_list:
                    list_of_particles_in_residue = self.search_particles_in_residue(residue)
                    for particle in list_of_particles_in_residue:
                        if particle in pka_set.keys():
                            if pka_set[particle]['acidity'] == 'acidic':
                                psi=-1
                            elif pka_set[particle]['acidity']== 'basic':
                                psi=+1
                            else:
                                psi=0
                            Z+=psi/(1+10**(psi*(pH_value-pka_set[particle]['pka_value'])))
                Z_HH.append(Z)
            else:
                """
                This lines are no longer needed
                # Molecule has a sequence
                if not isinstance(sequence, list):
                    # If the df has been read by file, the sequence needs to be parsed.
                    sequence = self.parse_sequence_from_file(sequence=sequence)
                """
                for name in sequence:
                    if name in pka_set.keys():
                        if pka_set[name]['acidity'] == 'acidic':
                            psi=-1
                        elif pka_set[name]['acidity']== 'basic':
                            psi=+1
                        else:
                            psi=0
                        Z+=psi/(1+10**(psi*(pH_value-pka_set[name]['pka_value'])))
                    else:
                        state_one_type = self.df.loc[self.df['name']==name].state_one.es_type.values[0]
                        Z+=charge_number_map[state_one_type]
                Z_HH.append(Z)

        return Z_HH
  1. In samples/plot_branched_polyampholyte.py
pmb.read_pmb_df("df.csv")
""" Read the definitions from file instead than defining them on the same script
# Acidic particle
pmb.define_particle(
    name = "A",
    acidity = "acidic",
    pka = 4,
    sigma = 1*pmb.units('reduced_length'),
    epsilon = 1*pmb.units('reduced_energy'))
    
# Basic particle
pmb.define_particle(
    name = "B",
    acidity = "basic",
    pka = 9,
    sigma = 1*pmb.units('reduced_length'),
    epsilon = 1*pmb.units('reduced_energy'))

# Define different residues
pmb.define_residue(
    name = "Res_1",
    central_bead = "I",
    side_chains = ["A","B"])
    
pmb.define_residue(
    name = "Res_2",
    central_bead = "I",
    side_chains = ["Res_1"])

# Define the molecule
pmb.define_molecule(
    name = "polyampholyte",
    residue_list = 5*["Res_1"] + 5*["Res_2"])
"""
  1. After these modification, the pyMBE pipeline for the polyampholyte sample case still runs fine:
python3 branched_polyampholyte.py 
python3 analyze_time_series.py --data_folder time_series/branched_polyampholyte 
python3 plot_branched_polyampholyte.py 

@davidbbeyer
Copy link
Contributor Author

@pm-blanco Even better, then we can just remove it.

@pm-blanco
Copy link
Collaborator

Agreed. Can you please do it @davidbbeyer ?

@davidbbeyer
Copy link
Contributor Author

Done.

@pm-blanco pm-blanco changed the title Add parsing test Remove parse_sequence_from_file Jan 21, 2025
Copy link
Collaborator

@pm-blanco pm-blanco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@pm-blanco pm-blanco merged commit 6149dc1 into pyMBE-dev:main Jan 21, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants