
inconsistent input preprocessing in PyTorch demo #18

Open
function2-llx opened this issue Dec 21, 2023 · 2 comments

@function2-llx

Dear author,

Thank you for contributing this work. I'm trying to use the pre-trained network as a feature extractor. To make the best use of the pre-trained weights, I need to know exactly how input images were pre-processed during pre-training and follow the same procedure. However, I found two different pre-processing approaches in this repository.

The first one is found in the TensorFlow training code. The data is first rescaled to [0, 1]. Then, in the preprocess_input function, since the default mode is "caffe", the channels are reordered from RGB to BGR and the ImageNet mean is subtracted (doc).

train_data_generator = ImageDataGenerator(
    rescale=1./255,
    preprocessing_function=preprocess_input,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest')

The second one (in the PyTorch demo) rescales the input to [-1, 1]. This differs from the first approach, so it produces a different input distribution and therefore different outputs from the pre-trained network.

class createDataset(Dataset):
    def __init__(self, dataframe, transform=None):
        self.dataframe = dataframe
        self.transform = transforms.Compose([transforms.ToTensor()])

    def __len__(self):
        return self.dataframe.shape[0]
        
    def __getitem__(self, index):
        image = self.dataframe.iloc[index]["img_dir"]
        image = cv2.imread(image)
        image = (image-127.5)*2 / 255
        image = cv2.resize(image,(224,224))
        #image = np.transpose(image,(2,0,1))   
        if self.transform is not None:
            image = self.transform(image)
        label = self.dataframe.iloc[index]["label"]
        return {"image": image , "label": torch.tensor(label, dtype=torch.long)}

It would be much appreciated if you could clarify this. Thanks!
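For reference, here is a minimal sketch of the value ranges the two pipelines produce, assuming the standard caffe-mode ImageNet BGR means used by Keras preprocess_input and that ImageDataGenerator applies preprocessing_function before rescale (the constants and ordering are my assumptions, not taken from this repo):

```python
import numpy as np

# Caffe-mode ImageNet channel means (BGR order), as used by Keras preprocess_input.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68])

rgb = np.random.randint(0, 256, size=(8, 8, 3)).astype(np.float64)

# Pipeline 1 (TensorFlow training code): RGB -> BGR, subtract ImageNet mean,
# then ImageDataGenerator applies rescale=1./255 on top.
tf_style = (rgb[..., ::-1] - IMAGENET_MEAN_BGR) / 255.0
# Roughly zero-centered: values fall in about [-0.49, 0.60] for inputs in [0, 255].

# Pipeline 2 (PyTorch demo): cv2 loads BGR, then rescale to exactly [-1, 1].
bgr = rgb[..., ::-1]
pt_style = (bgr - 127.5) * 2 / 255

print("TF-style range: ", tf_style.min(), tf_style.max())
print("PT-style range: ", pt_style.min(), pt_style.max())
```

The ranges alone show the two pipelines feed the network noticeably different distributions.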

@Can-Zhao

Can-Zhao commented Mar 1, 2024

I'm also confused by this. I checked the ImageDataGenerator code: https://github.com/keras-team/keras/blob/601488fd4c1468ae7872e132e0f1c9843df54182/keras/preprocessing/image.py#L1849-L1852. Internally, it first applies preprocess_input to zero-center the images, then applies rescale=1./255. So during training the images are normalized as img = (img - mean) / 255. For an original image in [0, 255] with a mean of 127.5, this gives roughly [-0.5, 0.5]. If so, it seems the demo code needs to be changed.

@jooho7lee

Are there any updates regarding this matter?
