Errors in checking peak_position availability using select_runs #388

Closed
Jianyu010 opened this issue Jan 30, 2021 · 2 comments
Comments


Jianyu010 commented Jan 30, 2021

The select_runs function behaves inconsistently:

The first time st.select_runs(run_mode, available='peak_positions') is called, with or without a run_mode, it raises an error like:

st.select_runs(available='peak_positions_cnn',exclude_tags=('messy', 'bad'))
Fetching run info from MongoDB: 100%|██████████| 5208/5208 [00:00<00:00, 25052.25it/s]
Checking data availability: 100%|██████████| 2/2 [00:12<00:00,  6.35s/it]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-7aa359056c1e> in <module>
----> 1 st.select_runs(available='peak_positions_cnn',exclude_tags=('messy', 'bad'))

/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.8/site-packages/strax/run_selection.py in select_runs(self, run_mode, run_id, include_tags, exclude_tags, available, pattern_type, ignore_underscore)
    215             # available = ('data_type',)
    216             self.runs[d + '_available'] = d_available
--> 217             dsets[d + '_available'] = d_available
    218     for d in have_available:
    219         dsets = dsets[dsets[d + '_available']]

/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.8/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   3161         else:
   3162             # set column
-> 3163             self._set_item(key, value)
   3164 
   3165     def _setitem_slice(self, key: slice, value):

/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.8/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   3237         """
   3238         self._ensure_valid_index(value)
-> 3239         value = self._sanitize_column(key, value)
   3240         NDFrame._set_item(self, key, value)
   3241 

/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.8/site-packages/pandas/core/frame.py in _sanitize_column(self, key, value, broadcast)
   3894 
   3895             # turn me into an ndarray
-> 3896             value = sanitize_index(value, self.index)
   3897             if not isinstance(value, (np.ndarray, Index)):
   3898                 if isinstance(value, list) and len(value) > 0:

/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.8/site-packages/pandas/core/internals/construction.py in sanitize_index(data, index)
    749     """
    750     if len(data) != len(index):
--> 751         raise ValueError(
    752             "Length of values "
    753             f"({len(data)}) "

ValueError: Length of values (5208) does not match length of index (4568)

The error persists when the available argument is switched between 'peak_positions' and 'peak_position_bla'. However, if I then add a (valid) run_mode argument, the error goes away and the output DataFrame looks correct.

The same happens if I start with a run_mode argument, get the error, and then remove the run_mode argument. Once the error is gone, I can add the run_mode back without any problem.

Likewise if I start with one run_mode, get the error, and then switch to another run_mode.

The problem even goes away if, after seeing the error, I simply rerun exactly the same code in another notebook cell.
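The traceback points to a length mismatch: d_available appears to hold one entry per run in the full run table (5208), while dsets has already been cut down to 4568 rows, plausibly by exclude_tags. That reading is an assumption, not something confirmed in this thread. The pandas-only sketch below reproduces the same kind of ValueError with made-up data; the names runs, dsets and d_available only mirror the traceback, and the table contents are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the full run table (5 runs instead of 5208).
runs = pd.DataFrame({
    "name": [f"run_{i}" for i in range(5)],
    "tags": ["", "messy", "", "bad", ""],
})

# Hypothetical stand-in for dsets: the table after exclude_tags filtering (3 rows).
dsets = runs[~runs["tags"].isin(["messy", "bad"])].copy()

# Availability computed for *every* run, as a plain array of length 5.
d_available = np.array([True, False, True, True, False])

# A bare ndarray is not aligned by index, so the length mismatch raises ValueError.
raised = False
try:
    dsets["peak_positions_available"] = d_available
except ValueError as err:
    raised = True
    print(err)

# Workaround sketch: store the result on the full table first, then assign the
# column as a Series so pandas matches rows on the shared index.
runs["peak_positions_available"] = d_available
dsets["peak_positions_available"] = runs["peak_positions_available"]
print(dsets)
```

Assigning a Series rather than a bare array lets pandas align rows by index, which is one way such a mismatch could be avoided; the thread does not say how strax itself handles it.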


Jianyu010 commented Jan 30, 2021

The following doesn't work either (and, unlike the case above, the error does not go away when the code is rerun):

st.set_config(dict(check_available=('peak_positions_gcn', 'peak_positions_mlp','peak_positions_cnn', 'peak_basics')))
some_mode=st.select_runs(run_mode='some_mode',exclude_tags=('messy', 'bad'))
mask_available= some_mode.peak_positions_gcn_available.values
some_mode=some_mode[mask_available]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-aeaf275b3c05> in <module>
      1 st.set_config(dict(check_available=('peak_positions_gcn', 'peak_positions_mlp','peak_positions_cnn', 'peak_basics')))
      2 some_mode=st.select_runs(run_mode='some_mode',exclude_tags=('messy', 'bad'))
----> 3 mask_available= some_mode.peak_positions_gcn_available.values
      4 some_mode=some_mode[mask_available]
      5 #selected = some_mode[(some_mode.number>12127)]

/opt/XENONnT/anaconda/envs/XENONnT_development/lib/python3.8/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5460             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5461                 return self[name]
-> 5462             return object.__getattribute__(self, name)
   5463 
   5464     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'peak_positions_gcn_available'

JoranAngevaare (contributor) commented:

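The example should have used st.set_context_config rather than st.set_config. check_available is a context option, so passing it through set_config does not make select_runs add the *_available columns, hence the AttributeError. The corrected example: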

st.set_context_config(
    dict(
        check_available=(
            'peak_positions_gcn',
            'peak_positions_mlp',
            'peak_positions_cnn',
            'peak_basics',
        )
    )
)
some_mode = st.select_runs(run_mode='some_mode', exclude_tags=('messy', 'bad'))
mask_available = some_mode.peak_positions_gcn_available.values
some_mode_with_pp = some_mode[mask_available]
some_mode_with_pp
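To keep only runs where every data type listed in check_available is present, the same masking idea extends to several columns. The sketch below is pandas-only and uses a made-up table in place of the select_runs output. It assumes, as in the example above, that select_runs adds one boolean '<data_type>_available' column per entry in check_available. The wanted list is hypothetical.

```python
import pandas as pd

# Made-up stand-in for the DataFrame returned by st.select_runs(...).
runs = pd.DataFrame({
    "name": ["r0", "r1", "r2", "r3"],
    "peak_positions_gcn_available": [True, True, False, True],
    "peak_positions_mlp_available": [True, False, True, True],
    "peak_basics_available": [True, True, True, False],
})

# Hypothetical list mirroring the data types passed to check_available.
wanted = ["peak_positions_gcn", "peak_positions_mlp", "peak_basics"]
columns = [f"{d}_available" for d in wanted]

# Fail early with a clear message if a column was never created,
# e.g. because check_available was not set in the context config.
missing = [c for c in columns if c not in runs.columns]
if missing:
    raise KeyError(f"missing availability columns: {missing}")

# Keep only runs where every requested data type is available.
runs_all_available = runs[runs[columns].all(axis=1)]
print(runs_all_available["name"].tolist())
```

Checking for missing columns up front turns the AttributeError from the second comment into an explicit message naming the absent columns.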
