I have been using the SingleCells class from the pycytominer cells.py file to extract image and object features measured with CellProfiler. I noticed that the image features were not being extracted into the output CSV file when I set these parameters:
The output CSV has the same number of columns whether the add_image_features parameter is set to True or False.
In the code, the self.add_image_features branch calls a function named extract_image_features, whose docstring says it should return two things:
Returns
-------
image_features_df : pandas.core.frame.DataFrame
Dataframe with extracted image features.
image_feature_categories : list of str
Correctly formatted image feature categories.
First, this function only returns image_features_df; it does not return a list of correctly formatted image feature categories.
Second, and more importantly, the function comes back empty because of how it is written.
The extract_image_features function first uses the check_image_features function to determine whether the given categories are present in image_df. It does this by checking whether each category in the list appears at index 1 of a column name, i.e. the part right after the Image_ prefix (e.g., if I have Correlation in my list, it will not raise an error as long as image_df has columns named Image_Correlation...). Since the expected format isn't stated in the documentation, this check is the only thing that tells me image_feature_categories should look like this:
['Correlation', 'ImageQuality', 'Texture', ...]
Each entry in the list is the index-1 component of the column names in image_df.
But when I use this list, extract_image_features returns an empty list because of this portion of the function:
# Extract Image features from image_feature_categories
image_features = list(
    image_df.columns[
        image_df.columns.str.startswith(tuple(image_feature_categories))
    ]
)
This code block builds a list of image columns from any column in image_df whose name starts with one of the feature categories. The only way to pass check_image_features is to format the list as I showed earlier, so this code ends up looking for columns that start with Correlation, ImageQuality, and so on. But when I inspect the SQLite file exported from CellProfiler, all of these columns start with Image_.
What confuses me is that check_image_features expects the category to be at index 1 of the column name, while extract_image_features uses startswith, which can never match in this situation because the Image prefix sits at index 0.
Taken together, this means the function (and therefore the class) will never output image features if the SQLite file has been exported directly from CellProfiler.
The only workaround I found was to edit this function so that it skips check_image_features and uses the same list as above, but with every string prefixed with Image_. However, this fix creates a separate CSV file with the metadata and image features; it does not add the image features alongside the object features in one CSV, which I assume is what this class was meant to do.
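To make the mismatch concrete, here is a minimal sketch in plain Python. It does not call pycytominer; the column names are made up to resemble a CellProfiler Image table, and the `second_token` helper is my assumption about how an index-1 check could work, not the library's actual code.

```python
# Hypothetical column names shaped like a CellProfiler Image table.
columns = [
    "ImageNumber",
    "Image_Correlation_Correlation_DNA_ER",
    "Image_ImageQuality_PowerLogLogSlope_DNA",
    "Image_Texture_Contrast_DNA_3_00",
]
categories = ["Correlation", "ImageQuality", "Texture"]


def second_token(column):
    """Return the part at index 1 after splitting on '_' (assumed check)."""
    parts = column.split("_")
    return parts[1] if len(parts) > 1 else None


# Check in the style described above: categories are compared to index 1.
found = {second_token(c) for c in columns}
missing = [cat for cat in categories if cat not in found]

# Extraction in the style of the quoted snippet: prefix match on the full name.
extracted = [c for c in columns if c.startswith(tuple(categories))]

print("missing categories:", missing)
print("extracted columns:", extracted)
```

With these inputs the check finds every category, yet the prefix match selects no columns, which is the behavior I am seeing.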
What would be the best way to edit this function to be more flexible to the CellProfiler SQLite output since this works for SQLite files using CellProfiler features collected differently?
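One possible direction, sketched below in plain Python, is to normalize the categories so that both bare names (Correlation) and prefixed names (Image_Correlation) resolve to the same Image_ prefix before matching. The helper names `normalize_categories` and `select_image_feature_columns` are hypothetical and are not part of pycytominer; this is only an illustration of the idea, under the assumption that image columns follow the Image_<Category>_... pattern.

```python
def normalize_categories(categories, prefix="Image"):
    """Prepend '<prefix>_' to any category that does not already have it."""
    return [
        cat if cat.startswith(f"{prefix}_") else f"{prefix}_{cat}"
        for cat in categories
    ]


def select_image_feature_columns(columns, categories, prefix="Image"):
    """Return the columns whose names start with '<prefix>_<category>_'."""
    # The trailing underscore avoids matching a longer category name
    # that merely begins with the requested one.
    prefixes = tuple(f"{c}_" for c in normalize_categories(categories, prefix))
    return [col for col in columns if col.startswith(prefixes)]


# Hypothetical column names shaped like a CellProfiler Image table.
columns = [
    "ImageNumber",
    "Image_Correlation_Correlation_DNA_ER",
    "Image_ImageQuality_PowerLogLogSlope_DNA",
    "Image_Texture_Contrast_DNA_3_00",
]

bare = select_image_feature_columns(columns, ["Correlation", "Texture"])
prefixed = select_image_feature_columns(columns, ["Image_Correlation", "Image_Texture"])
print(bare)
print(prefixed)
```

Both calls select the same two columns, so the check and the extraction would agree on a single category format regardless of which form the user passes.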