BUG: Assign with "df.loc[index_value][column_name] = value" fails to assign properly #35743

Closed
Boris-Molina opened this issue Aug 15, 2020 · 4 comments
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Usage Question

Comments

@Boris-Molina

  • [Y] I have checked that this issue has not already been reported.

  • [Y] I have confirmed this bug exists on the latest version of pandas.

  • [N] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

This line of code:

self.strategy.loc[bar]['FundingRate'] = np.log(F1/F0)

fails to assign the proper value. I think it either does nothing (leaving the original NaN from when the DataFrame "strategy" was created) or assigns a NaN. To capture the inputs, I added a set of debug prints and an assert around the assignment:

print('Inputs for np.log F0={} F1={}'.format(F0,F1))
print('Index label bar={},  type={}'.format(bar, type(bar)))
value = np.log(F1/F0)
print('This should be the assigned value={}'.format(value))
self.strategy.loc[bar]['FundingRate'] = value
assert self.strategy.loc[bar]['FundingRate'] == value, 'Error PANDAS fails to assign {}, instead we find {}'.format(value, self.strategy.loc[bar]['FundingRate'])

Problem description

Assignment operations started failing to assign values after I upgraded from pandas v1.0.4 to v1.0.5. I need to run my code on Python 3.7 / pandas 1.0.5 because of other package dependencies, but I reproduced the problem with Python 3.8 and pandas v1.1.0.

The assignments work if I change to:

df.loc[bar, 'FundingRate'] = value

Or with:

df['FundingRate'].loc[bar] = value

I can't reproduce this problem in a simple setting. It only occurs at runtime in a system of 6k+ lines of code that has been working seamlessly with pervasive use of this type of assignment (df.loc[index_value][column_name] = value). Also, I can't do a dill.dump because the system holds TensorFlow objects that are not serializable.
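For reference, a minimal sketch of the single-call form mentioned above, `df.loc[bar, 'FundingRate'] = value`. The frame below is a hypothetical stand-in for the reporter's `strategy` DataFrame (the `Signal` column and the three-day index are assumptions made only for illustration); the `F0`/`F1` values are the ones from the printout.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reporter's `strategy` frame.
index = pd.date_range("2000-01-03", periods=3, freq="D")
strategy = pd.DataFrame({"FundingRate": np.nan, "Signal": "hold"}, index=index)

bar = index[0]  # a pandas Timestamp, as in the printout
F0 = F1 = 180.4753051802489
value = float(np.log(F1 / F0))

# One indexing call: the row label and the column label are passed together,
# so the write goes to the frame itself and not to an intermediate object.
strategy.loc[bar, "FundingRate"] = value

assigned = strategy.loc[bar, "FundingRate"]
print(assigned)
```

Because row and column are resolved in a single `.loc` call, no intermediate object is created between the lookup and the write. The other form mentioned above, `df['FundingRate'].loc[bar] = value`, still indexes twice, so the single-call form is the safer of the two.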

Runtime Output:

The assert fails. This is the printout of the code output:

Inputs for np.log F0=180.4753051802489 F1=180.4753051802489
Index label bar=2000-01-03 00:00:00,  type=<class 'pandas._libs.tslibs.timestamps.Timestamp'>
This should be the assigned value=0.0
Traceback (most recent call last):

  File "/media/WORK/Boris/LEM_Strategy/Software/LEM_Classes/main_nn_seq.py", line 150, in <module>
    run_stats = AA.run_strategy(consensus_type=consensus_type)

  File "/media/WORK/Boris/LEM_Strategy/Software/LEM_Classes/lib/nn_seqstrat.py", line 642, in run_strategy
    date, _ = self.get_date_price(bar, when='_CLOSE')

  File "/media/WORK/Boris/LEM_Strategy/Software/LEM_Classes/lib/backtester_sequential.py", line 152, in get_date_price
    assert self.strategy.loc[bar]['FundingRate'] == value, 'Error PANDAS fails to assign {}, instead we find {}'.format(value, self.strategy.loc[bar]['FundingRate'])

AssertionError: Error PANDAS fails to assign 0.0, instead we find nan

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit : d9fff27
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.1.15-surface-linux-surface
Version : #8 SMP Thu Jun 27 12:03:55 EDT 2019
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 49.6.0.post20200814
Cython : 0.29.21
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.50.1

This is the file used to create the environment where the error occurs.

########################################
#
# LEM_Strategy Conda Environment
#
# run: conda env create -f quant_conda_env.yml
#
########################################
name: quant  # COMMENT OUT TO CREATE TEST ENVIRONMENTS (conda env create -n test -f quant_conda_env.yml)
########################################
channels:
  - plotly
  - defaults
  - conda-forge
  #- bjrn        # Channel for google, V20 package (RAY)
########################################
dependencies:
  - python=3.8  # 
  - numpy
  - scipy
  - pandas  #==1.0.4
  - numba
  - cython
  - numexpr
  - statsmodels
  - scikit-learn
  - xlrd
  - xlsxwriter
  - ipywidgets
  - pathos
  - tensorflow-gpu  #==2.1.*
  - keras-gpu
  - tsfresh
  - pytables
  - pyzmq  # ZeroMQ: sockets
# Graphics and Plotting
  - plotly
  - plotly-orca
  - requests
  - matplotlib
  - seaborn
  - cufflinks-py
# Utilities
  - nb_conda_kernels  # For Jupyter
  - spyder-kernels    # Spyder
  - git               # Github support: custom packages and bug forks
# PIP Dependencies (otherwise installed via PIP: Use Conda to install to improve environment integrity over time)
  - yaml                  # For TPQOA (OANDA Wrapper)
  - ujson                 # For TPQOA (OANDA Wrapper)
 # - v20                   # OANDA API V2.0 (from brjn conda channel) (TPQOA installs from PIP))
 # - modin                 # Ray Tutorial  INSTALL MANUALLY
 # - opencv                # Ray Tutorial  INSTALL MANUALLY
 # - gym                   # Ray Tutorial  INSTALL MANUALLY
  - aiohttp               # Ray
  - colorama              # Ray
  - filelock              # Ray
  - redis                 # Ray
  - multidict             # Ray
  - yarl                  # Ray
  - async_timeout         # Ray
  - beautifulsoup4        # Ray
  - soupsieve             # Ray
 # - redis                 # Ray (get from "$pip_deps ray"  shell script. Output '<3.5.0,>=3.3.2') (Ray installs from PIP))
 # - py-spy>=0.2.0         # Ray (get from "$pip_deps ray"  shell script) (Ray installs from PIP))
 # - google                # Ray (from brjn conda channel) (Ray installs from PIP))
# Non-Conda Packages via PIP 
  - pip               # First install PIP PACKAGE MANAGER
  - pip: 
    - "git+git://github.com/yhilpisch/tpqoa.git"      # TPQ OANDA Wrapper
    - "git+git://github.com/yhilpisch/tstables.git"   # Time Series Tables with pandas=1.0 bug fix
    #- "git+git://github.com/tensorflow/model-optimization.git" #  TensorFlow Model Optimization  "import tensorflow_model_optimization as tfmot"
    - cardinality  
    - ray #  Ray: fast and simple framework for building and running distributed applications.
@Boris-Molina Boris-Molina added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 15, 2020
@Boris-Molina Boris-Molina changed the title BUG: BUG: Assign with df.loc[index_value][column_name] = value fails to assign properly Aug 15, 2020
@Boris-Molina Boris-Molina changed the title BUG: Assign with df.loc[index_value][column_name] = value fails to assign properly BUG: Assign with df.loc[index_value][column_name] = value fails to assign properly Aug 15, 2020
@Boris-Molina Boris-Molina changed the title BUG: Assign with df.loc[index_value][column_name] = value fails to assign properly BUG: Assign with "df.loc[index_value][column_name] = value" fails to assign properly Aug 15, 2020
@jreback
Contributor

jreback commented Aug 15, 2020

You are using chained assignment, which is explicitly never recommended: https://pandas.pydata.org/docs/user_guide/indexing.html#indexing-view-versus-copy

you should be getting a warning
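One way to check whether a given chained assignment warns is to record warnings around it with the standard-library `warnings` module. The sketch below uses a hypothetical two-column frame (not the reporter's data). Whether a warning is recorded, and which class it has, depends on the pandas version and on how the frame is stored internally, so no particular output is claimed here.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical frame with mixed column dtypes (an assumption for illustration).
index = pd.date_range("2000-01-03", periods=3, freq="D")
strategy = pd.DataFrame({"FundingRate": np.nan, "Signal": "hold"}, index=index)
bar = index[0]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # report every warning, including repeats
    strategy.loc[bar]["FundingRate"] = 0.0  # the chained assignment under test

# Names of the warning classes that were raised, if any.
warning_names = [w.category.__name__ for w in caught]
print(warning_names)
```

Replacing `"always"` with `"error"` turns any such warning into an exception, so a test run stops with a traceback at the offending line, which may be easier than reading through a large codebase by hand. The linked indexing guide also describes the `mode.chained_assignment` option, which could be set to `'raise'` in the pandas 1.x releases discussed in this thread.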

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Usage Question and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 15, 2020
@jreback jreback added this to the No action milestone Aug 15, 2020
@jreback jreback closed this as completed Aug 15, 2020
@Boris-Molina
Author

Thanks, but there are no warnings, and this has been working seamlessly until now. Was this functionality ever deprecated?

I have 6k+ lines of code to check manually... and no warning?

The lack of warning is clearly a bug.

@jreback
Contributor

jreback commented Aug 15, 2020

Please read the docs.
This is related to the memory layout and is never guaranteed to work.
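To illustrate the memory-layout point, here is a minimal sketch on a hypothetical frame with mixed column dtypes (an assumption chosen because, in that case, selecting a row has to build a new Series). `df.loc[bar][col] = value` is two operations: a `__getitem__` that returns an intermediate Series, then a `__setitem__` on that Series. The original frame only changes if the intermediate happens to be a view of its data.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical frame: one float column and one string column.
index = pd.date_range("2000-01-03", periods=3, freq="D")
strategy = pd.DataFrame({"FundingRate": np.nan, "Signal": "hold"}, index=index)
bar = index[0]

# Step 1 on its own: the row comes back as a separate Series with its own buffer.
row = strategy.loc[bar]
shares_buffer = np.shares_memory(
    row.to_numpy(), strategy["FundingRate"].to_numpy()
)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # keep the demo quiet if pandas warns
    # Steps 1 and 2 chained: the write lands on the temporary row, not the frame.
    strategy.loc[bar]["FundingRate"] = 0.0

still_nan = bool(np.isnan(strategy.loc[bar, "FundingRate"]))
print(shares_buffer, still_nan)
```

With a frame whose columns all share one dtype, older pandas versions could return a view in step 1, in which case the chained write did reach the frame. That dependence on internal storage is a plausible reason why the same code can appear to work in one program and fail in another.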

@Boris-Molina
Author

OK, I understand.

I once went over the user guide section on indexing: maybe the warning about "Why does assignment fail when using chained indexing?" should be at the very top!

Thanks again.
