Encountered error of preprocess data #127
Comments
Hi @yingdehuijin, I don't know if there is anything wrong with this dataset; I have never used it. However, it does seem like the files there will not parse. Are they raw Java files? Maybe they have a different format? Our preprocessing pipeline expects raw Java files. Can you provide a single example from the dataset? Best,
Thank you for your reply.
Is the "nl: matches a float" line part of the same file? See also: #45 Best,
Hello, I encountered the same issue while preprocessing the files. Does the original JAR package handle exceptions, for example by skipping files that do not meet the format requirements instead of preprocessing them? I'm using it to process my own dataset, but it's throwing errors, and I'm not sure whether it will keep getting stuck there.
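The skip-on-failure behavior asked about here can be sketched generically. This is only an illustration of the pattern, not a description of what the JAR does: `process_all` and `extract` are hypothetical names, and `extract` stands in for whatever per-file step may raise on malformed input.

```python
def process_all(paths, extract):
    """Apply extract(path) to every path, skipping the ones that fail.

    `extract` is a hypothetical callable standing in for a per-file
    preprocessing step; it is NOT part of code2seq.
    Returns (results, skipped), where skipped holds (path, error message).
    """
    results, skipped = [], []
    for path in paths:
        try:
            results.append(extract(path))
        except Exception as exc:
            # Record the failure and continue with the next file
            # instead of aborting the whole run.
            skipped.append((path, str(exc)))
    return results, skipped


def toy_extract(path):
    # Toy stand-in: pretend that any path containing "bad" fails to parse.
    if "bad" in path:
        raise ValueError("parse error")
    return path.upper()


results, skipped = process_all(["a.java", "bad.java", "c.java"], toy_extract)
print(len(results), "processed;", len(skipped), "skipped")
```

Collecting the skipped paths together with their error messages makes it possible to inspect afterwards which inputs were dropped and why.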
Hi @lidiancracy, The truth is that I don't remember; this code was written about 5 years ago. If you wish to debug it, go ahead: the entire Java code is available in this repo. But I recommend using newer models such as PolyCoder. Best,
@urialon Thank you for your timely reply. My .sh file now terminates normally and has produced 4 files with the .c2s extension, so I think the logic in the JAR package is probably fine. By the way, can I continue training an already well-trained model on a new dataset, similar to transfer learning or incremental training? I did not find any relevant information in the readme; did I miss something? Thank you in advance.
Yes |
Hi Uri,
I am running code2seq on the newest EMSE-DeepCom datasets (https://github.com/xing-hu/EMSE-DeepCom). I followed your suggestions and ran the preprocess.sh script, but I encountered errors on the test/val/train datasets. The error_log.txt and stdout show the following information:
b'java.util.concurrent.ExecutionException: com.github.javaparser.ParseProblemException: Encountered unexpected token: ">" ">"\n at line 2, column 407.\n\nWas expecting one of:\n\n
The number of examples has also decreased: the 20000 test methods decreased to 17060, the 20000 valid methods decreased to 17043, and the 480000 methods decreased to 380001. Is there something wrong with the datasets?
Looking forward to your reply!
Wcc
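For scale, the counts reported in this post can be turned into per-split drop rates. This is a minimal sketch that uses only the numbers quoted above; the dictionary keys are illustrative labels, and labelling the 480000-method set as "train" is an assumption, since the post does not name that split.

```python
# (before, after) method counts as reported in the post.
# The "train" label for the 480000-method set is an assumption.
counts = {
    "test": (20000, 17060),
    "valid": (20000, 17043),
    "train": (480000, 380001),
}


def dropped_fraction(before, after):
    """Fraction of examples lost between the raw and preprocessed sets."""
    return (before - after) / before


for split, (before, after) in counts.items():
    lost = before - after
    print(f"{split}: {lost} dropped ({dropped_fraction(before, after):.1%})")
```

With these counts, roughly 15% of the test and valid methods and roughly 21% of the largest set are lost.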