Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure python Inception and scala Inception examples have the same result #2515

Open
jason-dai opened this issue May 2, 2018 · 19 comments
Assignees

Comments

@jason-dai
Copy link
Contributor

Especially for the training results.

@jason-dai
Copy link
Contributor Author

Any update?

@yiheng
Copy link
Contributor

yiheng commented May 23, 2018

@jenniew Can you provide the example code to reproduce this issue?

@jason-dai
Copy link
Contributor Author

@yiheng yiheng assigned wzhongyuan and unassigned yiheng and qiuxin2012 May 23, 2018
@yiheng
Copy link
Contributor

yiheng commented May 23, 2018

@wzhongyuan please take a look at this issue

@wzhongyuan
Copy link
Contributor

The python code and related parameters are same as scala ones, finally found the difference caused by preprocessing :) , Python uses OpenCV while Scala still uses legacy ones

 DataSet.SeqFileFolder.files(path, sc, classNumber).transform(
      MTLabeledBGRImgToBatch[ByteRecord](
        width = imageSize,
        height = imageSize,
        batchSize = batchSize,
        transformer = (BytesToBGRImg() -> BGRImgCropper(imageSize, imageSize)
          -> DatasetHFlip(0.5) -> BGRImgNormalizer(0.485, 0.456, 0.406, 0.229, 0.224, 0.225))
      ))
    train_transformer = Pipeline([PixelBytesToMat(),
                                  RandomCrop(image_size, image_size),
                                  RandomTransformer(HFlip(), 0.5),
                                  ChannelNormalize(0.485, 0.456, 0.406, 0.229, 0.224, 0.225),
                                  MatToTensor(to_rgb=True),
                                  ImageFrameToSample(input_keys=["imageTensor"], target_keys=["label"])
                                  ])

Will raise a PR to fix it

@wzhongyuan
Copy link
Contributor

Need to implement a replacement of MTLabeledBGRImgToBatch compatible with OpenCV and then apply new Opencv transformers

@jason-dai
Copy link
Contributor Author

I think the main issue is that the Top5 accuracy results of these two examples are different (the Scala version has higher accuracy). @jenniew can you provide the testing results?

@jenniew
Copy link
Contributor

jenniew commented May 25, 2018

scala: top5: 88.2 top1: 68.2
python: top5: 87.4 top1: 67.1

@wzhongyuan
Copy link
Contributor

Seems we cannot refactor scala code for now as python has worse performance... holding the change for now.
I will run a full training to re-produce the number and will try to find if any referenced caffe parameters and run accordingly with the same config to see how it works if caffe has better accuracy

@wzhongyuan
Copy link
Contributor

@jenniew thanks for providing the data

@jenniew
Copy link
Contributor

jenniew commented May 25, 2018

Python ChannelNormalize has an issue: #2544
Maybe it's related with the accuracy difference.

@jenniew
Copy link
Contributor

jenniew commented May 25, 2018

caffe also uses opencv to transform, so I think using opencv would not decrease accuracy.

@wzhongyuan
Copy link
Contributor

@jenniew I think that's the issue causing the discrepancy. The training result after the fix is much better now. I will put the fix in, quite simple change.

@wzhongyuan
Copy link
Contributor

Will refactor the scala transformers as well

@jason-dai
Copy link
Contributor Author

I believe caffe also use opencv based preprocessing? maybe we can just compared to caffe.

@wzhongyuan
Copy link
Contributor

We mirrored intel caffe (please refer to #2555), but did not get compatible result, the final accuracy is below than what we've done with legacy transformers.

@wzhongyuan
Copy link
Contributor

Closing this issue as it's fixed

@jason-dai
Copy link
Contributor Author

Reopen - cannot reproduce the results

@wzhongyuan
Copy link
Contributor

Probably caused by regression from commit ce7e573 , this commit is made after the convergence testing. we'll check and get back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants