TypeError: unsupported operand type(s) for +: 'float' and 'str' #45

Open
jteijema opened this issue Aug 6, 2024 · 4 comments

Comments

@jteijema
Member

jteijema commented Aug 6, 2024

```
C:\Users\5927226\Downloads> asreview data compose output.xlsx -r '.\asreview_dataset_all_Search update 2023.xlsx' -u .\combined_update_dedup.xlsx
C:\Users\5927226\AppData\Local\Programs\Python\Python311\Lib\site-packages\asreview\io\utils.py:142: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[col].fillna(LABEL_NA, inplace=True)
Detected 260 records with label '1', from which 0 duplicate records with the same label were removed.
Detected 285 records with label '-1', from which 0 duplicate records with the same label were removed.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\5927226\AppData\Local\Programs\Python\Python311\Scripts\asreview.exe\__main__.py", line 7, in <module>
  File "C:\Users\5927226\AppData\Local\Programs\Python\Python311\Lib\site-packages\asreview\__main__.py", line 50, in main
    _execute_entry_point(entry, sys.argv[2:])
  File "C:\Users\5927226\AppData\Local\Programs\Python\Python311\Lib\site-packages\asreview\__main__.py", line 28, in _execute_entry_point
    entry().execute(args)
  File "C:\Users\5927226\AppData\Local\Programs\Python\Python311\Lib\site-packages\asreviewcontrib\datatools\entrypoint.py", line 95, in execute
    compose(
  File "C:\Users\5927226\AppData\Local\Programs\Python\Python311\Lib\site-packages\asreviewcontrib\datatools\compose.py", line 246, in compose
    df_composition = create_composition(
                     ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\5927226\AppData\Local\Programs\Python\Python311\Lib\site-packages\asreviewcontrib\datatools\compose.py", line 157, in create_composition
    df_conflicting_dups = as_conflict.df[as_conflict.duplicated(pid)]
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\5927226\AppData\Local\Programs\Python\Python311\Lib\site-packages\asreview\data\base.py", line 538, in duplicated
    pd.Series(self.texts)
              ^^^^^^^^^^
  File "C:\Users\5927226\AppData\Local\Programs\Python\Python311\Lib\site-packages\asreview\data\base.py", line 282, in texts
    [self.title[i] + " " + self.abstract[i] for i in range(len(self))],
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\5927226\AppData\Local\Programs\Python\Python311\Lib\site-packages\asreview\data\base.py", line 282, in <listcomp>
    [self.title[i] + " " + self.abstract[i] for i in range(len(self))],
     ~~~~~~~~~~~~~~^~~~~
TypeError: unsupported operand type(s) for +: 'float' and 'str'
```
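The FutureWarning at the top of the output appears unrelated to the crash, and its message already names the replacement. A minimal sketch of the suggested pattern on a toy `included` column, with `LABEL_NA` assumed to be `-1` purely for illustration:

```python
import pandas as pd

LABEL_NA = -1  # assumed "no label" value; illustration only
col = "included"  # toy column name
df = pd.DataFrame({col: [1, None, 0]})

# Chained-assignment form flagged by the warning (per the warning, it will
# not modify df under pandas 3.0):
#     df[col].fillna(LABEL_NA, inplace=True)

# Replacement: assign the filled column back to the DataFrame.
df[col] = df[col].fillna(LABEL_NA)

# Equivalent alternative, also suggested by the warning:
#     df.fillna({col: LABEL_NA}, inplace=True)

print(df[col].tolist())  # [1.0, -1.0, 0.0]
```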

When testing on two different devices, @Gui921 and I ran into this error when using the compose feature.

@jteijema
Member Author

jteijema commented Aug 9, 2024

data (1).xlsx
data (2).xlsx

Here are small excerpts of the datasets. They still give the error.

@PeterLombaers
Member

PeterLombaers commented Aug 15, 2024

When reading your data, ASReview needs to know which columns contain the titles and abstracts. It accepts two column names for the title (title and primary_title) and three for the abstract (abstract, abstract note, and notes_abstract); they are defined here. If more than one of these columns is present, which one ASReview picks depends on the order of the columns.
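To see whether a dataset is affected, one could check which of these candidate names are present. The sketch below is hypothetical: the helper names are invented, and matching on lowercased, stripped column names is an assumption rather than a description of how ASReview compares names. It reports every field for which more than one candidate column exists:

```python
import warnings

import pandas as pd

# Candidate column names as listed in this thread (title and abstract only).
CANDIDATES = {
    "title": ["title", "primary_title"],
    "abstract": ["abstract", "abstract note", "notes_abstract"],
}


def find_ambiguous_columns(df: pd.DataFrame, candidates=CANDIDATES) -> dict:
    """Map each field to its candidate columns, if more than one is present."""
    present = {str(c).strip().lower() for c in df.columns}
    ambiguous = {}
    for field, names in candidates.items():
        hits = [name for name in names if name in present]
        if len(hits) > 1:
            ambiguous[field] = hits
    return ambiguous


def warn_if_ambiguous(df: pd.DataFrame) -> None:
    """Emit one UserWarning per field that has several candidate columns."""
    for field, hits in find_ambiguous_columns(df).items():
        warnings.warn(f"Multiple columns found for '{field}': {hits}; only one will be used.")


merged = pd.DataFrame(columns=["title", "primary_title", "abstract", "notes_abstract"])
warn_if_ambiguous(merged)  # warns once for 'title' and once for 'abstract'
```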

One of your datasets contains the columns title, primary_title, abstract, and notes_abstract; the other contains only title and abstract. When the two datasets are combined, the result has all four columns, but the rows from the second dataset have missing values for primary_title and notes_abstract. In your case, ASReview picks primary_title and notes_abstract as the columns to use. Because compose merges the dataframes, the missing values in half the rows end up as NaN, which is a float rather than a string. ASReview then tries to join those floats with strings (title + " " + abstract), which raises the error you see.
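The mechanism can be reproduced with pandas alone. This is a toy sketch with invented data, not the datatools code path: `pd.concat` fills the columns missing from the second frame with NaN, and NaN is a float, so the same `title + " " + abstract` expression from the traceback fails:

```python
import pandas as pd

# Toy stand-ins for the two input datasets (contents invented).
df_a = pd.DataFrame({
    "title": ["A study"],
    "primary_title": ["A study"],
    "abstract": ["Some text"],
    "notes_abstract": ["Some text"],
})
df_b = pd.DataFrame({"title": ["Another study"], "abstract": ["More text"]})

# Rows from df_b get NaN in the two columns df_b lacks.
merged = pd.concat([df_a, df_b], ignore_index=True)
missing = merged.at[1, "primary_title"]
print(isinstance(missing, float), pd.isna(missing))  # True True

# Joining title and abstract the way the traceback does fails on that row.
error = None
try:
    merged.at[1, "primary_title"] + " " + merged.at[1, "notes_abstract"]
except TypeError as exc:
    error = exc
print(repr(error))
```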

We should probably fix this both in the main ASReview repo and here in datatools. When merging, compose should make sure that every value in a column has the same data type (all strings or all numbers), because this can go wrong when there is missing data in one of the input datasets. In ASReview, it is probably wise to give a warning when a dataset has multiple columns for the same type of data. We are in the process of changing how the ASReview data API works, though, so this might get fixed along the way.
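A minimal sketch of the first suggestion, assuming the merge step can post-process the combined DataFrame: every known text column is filled with empty strings and cast to `str`, so joining title and abstract can no longer hit a float. `normalize_text_columns` and `TEXT_COLUMNS` are hypothetical names, not part of datatools:

```python
import pandas as pd

# Text columns named in this thread; a real fix would presumably reuse
# ASReview's own column definitions instead of a hard-coded list.
TEXT_COLUMNS = ["title", "primary_title", "abstract", "abstract note", "notes_abstract"]


def normalize_text_columns(df: pd.DataFrame, columns=TEXT_COLUMNS) -> pd.DataFrame:
    """Return a copy in which every present text column contains only strings."""
    out = df.copy()
    for col in columns:
        if col in out.columns:
            # NaN/None from rows that lacked this column become "" instead of a float.
            out[col] = out[col].fillna("").astype(str)
    return out


df_a = pd.DataFrame({"title": ["A study"], "primary_title": ["A study"],
                     "abstract": ["Some text"], "notes_abstract": ["Some text"]})
df_b = pd.DataFrame({"title": ["Another study"], "abstract": ["More text"]})

merged = normalize_text_columns(pd.concat([df_a, df_b], ignore_index=True))
texts = [t + " " + a for t, a in zip(merged["primary_title"], merged["notes_abstract"])]
print(texts)  # ['A study Some text', ' ']
```

This only removes the crash: the second record still ends up with empty text because primary_title and notes_abstract are the columns being read, which is why a warning about multiple candidate columns is worth adding as well.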

For now, the quick fix would be to remove the primary_title and notes_abstract columns from your dataset or rename them before running asreview data compose.
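A minimal pandas sketch of that quick fix. The helper names are hypothetical, the `_orig` suffix for renamed columns is an arbitrary choice, and reading or writing `.xlsx` files with pandas requires the openpyxl package:

```python
import pandas as pd

# Alternative column names that caused ASReview to skip "title" and "abstract".
ALTERNATIVE_COLUMNS = ["primary_title", "notes_abstract"]


def drop_alternative_columns(df: pd.DataFrame) -> pd.DataFrame:
    # errors="ignore" lets the same function run on files lacking these columns.
    return df.drop(columns=ALTERNATIVE_COLUMNS, errors="ignore")


def rename_alternative_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical new names, presumably ignored because they match no known column name.
    return df.rename(columns={c: f"{c}_orig" for c in ALTERNATIVE_COLUMNS})


def fix_file(src: str, dst: str) -> None:
    # Read the Excel file, drop the alternative columns, and write a new file.
    df = pd.read_excel(src)
    drop_alternative_columns(df).to_excel(dst, index=False)
```

Applied to whichever input file contains the extra columns, `fix_file` writes a copy that can then be passed to `asreview data compose` in place of the original.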

@jteijema
Member Author

Thank you, @PeterLombaers. This fixes our issue; we now have the new dataset!

@J535D165 J535D165 reopened this Aug 19, 2024
@J535D165
Member

We reopened it and are waiting for an upstream fix. Additional changes to datatools (a fix or an informative error) are also welcome.
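For the informative-error option, a check could run on the merged data before record texts are built. A hypothetical sketch (`check_text_columns` is not an existing datatools function) that names each offending column and the first affected row instead of surfacing a bare `TypeError`:

```python
import pandas as pd


def check_text_columns(df: pd.DataFrame, columns) -> None:
    """Raise a descriptive ValueError if any given text column holds non-string values."""
    problems = []
    for col in columns:
        if col not in df.columns:
            continue
        # True for NaN (a float) or any other non-string value.
        bad = df[col].map(lambda v: not isinstance(v, str))
        if bad.any():
            first_row = bad.idxmax()  # index label of the first offending row
            problems.append(
                f"column '{col}' has {int(bad.sum())} non-text value(s), first at row {first_row}"
            )
    if problems:
        raise ValueError(
            "Cannot combine titles and abstracts: " + "; ".join(problems)
            + ". Consider removing or renaming duplicate title/abstract columns before composing."
        )


merged = pd.concat(
    [pd.DataFrame({"title": ["A"], "primary_title": ["A"]}),
     pd.DataFrame({"title": ["B"]})],
    ignore_index=True,
)

message = None
try:
    check_text_columns(merged, ["title", "primary_title"])
except ValueError as exc:
    message = str(exc)
print(message)
```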
