I've passed the test, but I still wanted to convert the uneditable PDF into plain text, so that we can make flashcards, tag them, draw a graph showing their relations, and analyse the frequency of each word. To get this done, I used software including PyCharm, Sublime Text, MS Excel, and Adobe Acrobat, along with techniques such as RegEx and Python scripting.
As a result, the original PDF is broken down into plain text and images, and the plain text is structured like this:
# | Answer | Choice A | Choice B | Choice C | Choice D (if any) | Picture links (if any) |
---|---|---|---|---|---|---|
1 | (B) | ... | ... | ... | ... | <img src = "../pics/xxx.jpeg"> |
pics is a zip file that needs to be extracted first!
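
If you want to load the rows for flashcards or word counting, something like the following could be a starting point. It is only a rough sketch, not the script I actually used: the names `parse_rows` and `word_frequency` and the sample rows are made up for illustration, and it assumes no choice text contains a `|` character.

```python
import re
from collections import Counter

# Made-up sample rows in the layout shown above (not real questions).
SAMPLE = """\
# | Answer | Choice A | Choice B | Choice C | Choice D(if any) | picture links(if any) |
---|---|---|---|---|---|---|
1 | (B) | red light | green light | yellow light | no light | <img src = "../pics/xxx.jpeg"> |
2 | (A) | stop the car | speed up the car | turn the car | | |
"""


def parse_rows(text):
    """Parse the pipe-separated rows into a list of flashcard dicts.

    Assumes the first two lines are the header and the separator, and that
    no cell contains a literal "|".
    """
    cards = []
    for line in text.splitlines()[2:]:  # skip the header and separator lines
        if not line.strip():
            continue
        cells = [cell.strip() for cell in line.split("|")]
        if cells[-1] == "":  # the trailing "|" leaves one empty cell
            cells.pop()
        cells += [""] * (7 - len(cells))  # pad rows that omit optional columns
        number, answer, a, b, c, d, pics = cells[:7]
        cards.append({
            "number": int(number),
            "answer": answer.strip("()"),  # "(B)" becomes "B"
            # keep only the choices that are actually filled in
            "choices": {k: v for k, v in zip("ABCD", (a, b, c, d)) if v},
            # collect every src="..." path from the picture column
            "images": re.findall(r'src\s*=\s*"([^"]+)"', pics),
        })
    return cards


def word_frequency(cards):
    """Count how often each lowercase word appears across all choices."""
    counts = Counter()
    for card in cards:
        for text in card["choices"].values():
            counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


if __name__ == "__main__":
    cards = parse_rows(SAMPLE)
    print(cards[0]["answer"], cards[0]["images"])
    print(word_frequency(cards).most_common(3))
```

Each card keeps its answer letter, the non-empty choices, and the image paths, so the same list can feed a flashcard tool or the word-frequency step.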
ENJOY!