pandas.errors.IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer #30

Closed
ktasha45 opened this issue Sep 19, 2022 · 5 comments · Fixed by #35

Comments

@ktasha45

input:
python sortgs.py --kw "recommender system survey" OR "recommentation system survey" --startyear 2021

output:
Loading next 10 results
Loading next 20 results
Loading next 30 results
Loading next 40 results
Loading next 50 results
Loading next 60 results
Loading next 70 results
Loading next 80 results
Loading next 90 results
Loading next 100 results
Traceback (most recent call last):
  File "C:\Users\ktash\Downloads\sort-google-scholar-master\sort-google-scholar-master\sortgs.py", line 313, in <module>
    main()
  File "C:\Users\ktash\Downloads\sort-google-scholar-master\sort-google-scholar-master\sortgs.py", line 285, in main
    data['cit/year']=data['cit/year'].round(0).astype(int)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 5815, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 418, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py", line 591, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 1309, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 1257, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 1168, in astype_nansafe
    return astype_float_to_int_nansafe(arr, dtype, copy)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py", line 1213, in astype_float_to_int_nansafe
    raise IntCastingNaNError(
pandas.errors.IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer

My OS is Windows. I run the code with Anaconda.
Thank you.

@HmzaMalik


Greetings,
This issue is addressed in the following Stack Overflow question:
https://stackoverflow.com/questions/48511484/data-type-conversion-error-valueerror-cannot-convert-non-finite-values-na-or
There are NaN values in the dataset, specifically in the 'cit/year' column. We can add a "fillna" call in "sortgs.py" at line 285, i.e.
285-> data['cit/year']=data['cit/year'].fillna(0).round(0).astype(int)
To be on the safe side, I have also added "fillna" at line 280, where the data frame is created, i.e.
279-> data = pd.DataFrame(list(zip(author, title, citations, year, publisher, venue, links)), index = rank[1:],
280-> columns=['Author', 'Title', 'Citations', 'Year', 'Publisher', 'Venue', 'Source']).fillna(0)

It may be that the 'cit/year' column is created later from the dataset, in which case a division by 0 could produce the non-finite values.

Hope it helps.
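
One caveat worth illustrating: the error message mentions both NA and inf, and fillna(0) only handles the first. The sketch below uses a made-up Series in place of the real 'cit/year' column (the values are hypothetical, not taken from sortgs.py) to show that inf survives fillna(0), and that replacing inf with NaN first lets one fillna(0) cover both cases.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for the 'cit/year' column.
cit_per_year = pd.Series([12.4, np.nan, 3.0, np.inf], name="cit/year")

# fillna(0) replaces NaN but leaves inf untouched,
# so a later astype(int) would still fail on this Series.
only_nan_filled = cit_per_year.fillna(0)
print(np.isinf(only_nan_filled).any())  # True

# Turning inf into NaN first lets a single fillna(0) cover both cases.
cleaned = (
    cit_per_year.replace([np.inf, -np.inf], np.nan)
    .fillna(0)
    .round(0)
    .astype(int)
)
print(cleaned.tolist())  # [12, 0, 3, 0]
```

Whether 0 is the right replacement is a judgment call: it keeps the script running but hides which results had missing or invalid inputs.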

@WittmannF
Owner

Tried replicating here but didn't get your output:

❯ python sortgs.py --kw "recommender system survey" OR "recommentation system survey" --startyear 2021
Loading next 10 results
Loading next 20 results
Loading next 30 results
Loading next 40 results
Loading next 50 results
Loading next 60 results
Loading next 70 results
Loading next 80 results
Loading next 90 results
Loading next 100 results
                                    Author  ... cit/year
Rank                                        ...
39                 Koren, S Rendle, R Bell  ...      539
59      Ji, S Pan, E Cambria, P Marttinen…  ...      308
73       Reig, A Forner, J Rimola, J Ferre  ...      264
46        Chaudhari, V Mithal, G Polatkan…  ...      156
55    Li, Z Wen, Z Wu, S Hu, N Wang, Y Li…  ...      121
...                                    ...  ...      ...
23                         Raj, VG Renumol  ...       14
58             Chen, M Jiang, F Jia, G Liu  ...       12
83                    Kwon, J Park, JY Son  ...        6
71       Shen, J Li, MR Bouadjenek, Z Mai…  ...        1
38          Chakraoui, A Elkalay, N Mouhni  ...        0

[100 rows x 8 columns]

@WittmannF
Owner

For now, a quick fix is to add a fillna(0). Ideally, though, it would be nice to find the source of these NaN values (a missing number of citations, or the count of years?).
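
A rough way to track down the source, sketched on a small made-up frame (the column names match the data frame quoted earlier in this thread, but the rows are hypothetical): flag every non-finite cell in the numeric inputs, then list the affected rows and count them per column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the columns that feed 'cit/year'.
data = pd.DataFrame(
    {
        "Title": ["A", "B", "C"],
        "Citations": [10, np.nan, 5],
        "Year": [2020, 2021, np.nan],
    }
)

# True for every cell that is NaN or +/-inf in the numeric columns.
numeric = data[["Citations", "Year"]]
non_finite = ~np.isfinite(numeric)

# Rows where at least one input is non-finite.
bad_rows = data[non_finite.any(axis=1)]
print(bad_rows)

# Number of non-finite cells per column, to see which input is the culprit.
print(non_finite.sum())
```

Running something like this just before the cast would show whether the missing values come from the citations, the year, or both.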

@kevinsmia1939

I can replicate the error with this
python3 sortgs.py --kw "ionic liquid water splitting electrolysis" --sortby "cit/year"

@WittmannF
Owner

WittmannF commented Nov 15, 2023

Thanks @kevinsmia1939, found the issue. It seems there are references "from the future" (which I find weird). The following result for your keyword caused the error to be raised: https://www.sciencedirect.com/science/article/abs/pii/S001623612302793X

It is from 2024. This messes up the cit/year calculation. Let me fix that.
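
A minimal sketch of how a future-dated result can break the calculation, and how clipping the year avoids it. The formula here is an assumption (citations divided by the paper's age, counting the current year as 1); the actual expression in sortgs.py is not shown in this thread, and the data is made up.

```python
from datetime import date

import numpy as np
import pandas as pd

current_year = date.today().year

# Hypothetical results; the last one is dated one year in the future.
data = pd.DataFrame(
    {
        "Citations": [30, 8, 2],
        "Year": [current_year - 2, current_year, current_year + 1],
    }
)

# Assumed formula: citations / age, where a paper from this year has age 1.
age_unclipped = current_year - data["Year"] + 1
naive = data["Citations"] / age_unclipped  # age is 0 for the future paper, giving inf
print(np.isinf(naive).any())  # True

# Clipping the year to the current year keeps the age at 1 or more,
# so the division stays finite and the integer cast succeeds.
age_clipped = current_year - data["Year"].clip(upper=current_year) + 1
data["cit/year"] = (data["Citations"] / age_clipped).round(0).astype(int)
print(data["cit/year"].tolist())  # [10, 8, 2]
```

With clipping, a paper dated next year is treated as if it were published this year, which seems a reasonable reading for articles that are available online ahead of their official date.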

WittmannF added a commit that referenced this issue Nov 15, 2023
- Clip reference year to current year when calculating cit/year to avoid zero division in cit/year
@WittmannF WittmannF linked a pull request Nov 15, 2023 that will close this issue