This project implements a conditional random field (CRF) for optical character recognition (OCR), with an emphasis on inference and performance testing. The project is divided into four parts:
- Conditional Random Fields: We derive the gradients with respect to the weights Wj and the transition matrix Tij, and show that the gradient of log(Z) is exactly the expectation of the features. We then implement a decoder with computational cost O(m|Y|^2) using the max-sum algorithm (see the decoder sketch after this list), along with a brute-force O(|Y|^m) solution used to verify that the decoder implementation is correct.
- Training Conditional Random Fields: We implement a dynamic programming algorithm to compute log(p(y|X)) and its gradient with respect to Wj and Tij, and use scipy.optimize.check_grad to verify the correctness of the computed gradient. We then use fmin_bfgs to obtain the optimal values of Wj and Tij (see the training sketch after this list). Finally, we use the decoder from part 1 with the learned weights to predict labels for the characters in test.txt.
- Benchmarking with Other Methods: We benchmark our implementation against other methods, namely SVM-Struct and SVM-MC, and plot the letter-wise and word-wise prediction accuracy on the test data.
- Robustness to Tampering: We explore the effect that rotation and translation of the images have on the decoder and its accuracy (see the distortion sketch after this list). We run the CRF and SVM-MC algorithms on the distorted images and report the letter-wise and word-wise prediction accuracy on the test data.
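The decoder from part 1 can be summarized by the following minimal sketch (an illustration, not the actual q1.py code). It assumes node weights W of shape (num_labels, num_features), a transition matrix T of shape (num_labels, num_labels), and a word X given as an (m, num_features) array; these names are assumptions for the example.

```python
import numpy as np

def max_sum_decode(X, W, T):
    """Return the most likely label sequence in O(m * |Y|^2) via max-sum."""
    m, num_labels = X.shape[0], W.shape[0]
    node = X @ W.T                        # node potentials <W_y, x_s>, shape (m, |Y|)
    score = np.zeros((m, num_labels))     # best score of any path ending at (position, label)
    back = np.zeros((m, num_labels), dtype=int)
    score[0] = node[0]
    for s in range(1, m):
        # candidate scores for every (previous label, current label) pair
        cand = score[s - 1][:, None] + T + node[s][None, :]
        back[s] = np.argmax(cand, axis=0)
        score[s] = np.max(cand, axis=0)
    # backtrack from the best final label
    y = np.zeros(m, dtype=int)
    y[-1] = int(np.argmax(score[-1]))
    for s in range(m - 2, -1, -1):
        y[s] = back[s + 1, y[s + 1]]
    return y
```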
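The training step in part 2 follows the pattern below. This is a hedged sketch: crf_obj and crf_grad are hypothetical stand-ins for the routines in q2.py, assumed to return the (regularized) negative log-likelihood for parameters theta and its gradient.

```python
from scipy.optimize import check_grad, fmin_bfgs

def train(crf_obj, crf_grad, theta0, X_list, y_list, C):
    """Gradient-check the objective, then optimize it with BFGS."""
    # sanity-check the analytic gradient against finite differences
    err = check_grad(crf_obj, crf_grad, theta0, X_list, y_list, C)
    print("gradient check error:", err)
    # minimize the objective to obtain the optimal Wj and Tij (packed in theta)
    theta_opt = fmin_bfgs(crf_obj, theta0,
                          fprime=crf_grad,
                          args=(X_list, y_list, C))
    return theta_opt
```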
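For the robustness experiments in part 4, rotation and translation can be applied with scipy.ndimage along these lines. The 16 x 8 pixel shape is an assumption for the example; adjust it to the dataset actually used.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def distort(x, angle=0.0, offset=(0, 0), height=16, width=8):
    """Rotate and translate one flattened character image, then re-flatten it."""
    img = x.reshape(height, width)                     # assumed 16 x 8 layout
    img = rotate(img, angle, reshape=False, order=1)   # rotation in degrees
    img = shift(img, offset, order=1)                  # (row, col) translation in pixels
    return img.ravel()
```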
The packages required to run the code are listed in requirements.txt:
- scipy
- scikit-learn
- numpy
Install them with: pip install -r requirements.txt
Run each script for the respective question. Comment out any long-running parts if needed.
Note: TensorFlow is only used to cross-check our final solution. To run that check, use pip install tensorflow, or pip install tensorflow-gpu if CUDA is installed.
- Run q1.py to execute the max-sum decoder on 100 characters from the initial dataset. The results are stored in result/decoder_output.txt.
- Run q2.py to calculate log(p(y|X)), compute the gradients, and start the optimizer. The gradients can be found in result/gradient.txt, and the predictions from 2(b) in result/prediction.txt.
- The optimal parameters are pre-computed for each value of C and can be found under result/solution[C].txt, where C = 1, 10, 100, 1000.
- The optimal parameters for the distortion experiments are pre-computed and stored under result/solution[C]distortion.txt, where C = 500, 1000, 1500, 2000.
- Professor Xinhua Zhang
- Vignesh Ganapathiraman