Application to real case study #176

Open
Avv22 opened this issue Feb 18, 2023 · 11 comments

Comments


Avv22 commented Feb 18, 2023

Hello Code2Vec team,

Could you please give some hints on how to apply your tool to a whole software system whose code is written in different programming languages?

Collaborator

urialon commented Feb 18, 2023

Hey @Avv22,
Thank you for your interest in our work!

Our repository supports only Java and C#.
We have a newer model called PolyCoder that supports all languages. Loading it takes only a few lines of code using the Hugging Face Transformers library. See:

https://arxiv.org/pdf/2202.13169.pdf
https://github.com/VHellendoorn/Code-LMs#october-2022---polycoder-is-available-on-huggingface

Best,
Uri

Author

Avv22 commented Feb 18, 2023

Thank you. I mean, if we have a large software system written in Java, what strategy would you use to decompose it and feed it to your tool?

Collaborator

urialon commented Feb 19, 2023

Sorry, I don't understand your question.
What is your goal? What are you trying to do?

Author

Avv22 commented Feb 21, 2023

Thank you for your quick reply. I meant: if we have a complete system, how can we decompose it and pass it to your model so that it predicts the names of the blocks inside the system? Do you suggest decomposing the system method-wise and then trying to predict a name for each method?

I was trying to use your tool to generate a script (a name that tells what the software does).

Collaborator

urialon commented Feb 28, 2023

Do you suggest decomposing the system method-wise and then try to predict a name for each method?

Yes, this is basically what our preprocessing pipeline does automatically.
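
As a rough illustration of what method-wise decomposition means (this is not the repository's Java pipeline), the sketch below uses Python's standard `ast` module to split one source string into per-method examples. Each example pairs the method's name, which is the label a model would be asked to predict, with the method's body. The function name `split_into_methods` and the sample class are hypothetical, and `ast.unparse` assumes Python 3.9 or newer.

```python
import ast

def split_into_methods(source: str) -> list[tuple[str, str]]:
    """Return (method_name, body_source) pairs for every function in `source`.

    Hypothetical helper for illustration only.
    """
    tree = ast.parse(source)
    examples = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Unparse only the body statements so the name itself is not
            # part of the input that a model would see.
            body = "\n".join(ast.unparse(stmt) for stmt in node.body)
            examples.append((node.name, body))
    return examples

sample = '''
class Stack:
    def push(self, item):
        self.items.append(item)

    def is_empty(self):
        return len(self.items) == 0
'''

for name, body in split_into_methods(sample):
    print(name, "<-", body)
```

Each pair could then be turned into whatever input format the chosen model expects; that conversion step is model-specific and is not shown here.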

Author

Avv22 commented Feb 28, 2023

Thank you very much. You split the code by method, but could you please point to where you do that in your code? Did you implement it yourself, or did you use a tool for that?

Collaborator

urialon commented Feb 28, 2023

First, our code goes through all files in the directory:
https://github.com/tech-srl/code2vec/blob/master/JavaExtractor/JPredict/src/main/java/JavaExtractor/App.java#L43-L47

Then, I used JavaParser to parse each file in the project, traverse the resulting AST, and extract the "method nodes":
https://github.com/tech-srl/code2vec/blob/master/JavaExtractor/JPredict/src/main/java/JavaExtractor/FeatureExtractor.java#L39-L49

But that's a very Java-specific pipeline; I wouldn't use the same code for JavaScript.
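
For a language other than Java, the same two steps (visit every file in a directory, then parse each file and collect its method nodes) can be sketched with that language's own parser. The example below is a minimal Python analogue that uses only the standard library (`pathlib` and `ast`) in place of JavaParser. It is not the code from `App.java` or `FeatureExtractor.java`, and the function name `iter_function_nodes` and the temporary demo files are hypothetical.

```python
import ast
import tempfile
from pathlib import Path

def iter_function_nodes(root: str):
    """Yield (file_path, function_node) for every function under `root`."""
    # Step 1: go through all Python files in the directory tree.
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError, ValueError):
            continue  # skip files that cannot be read or parsed
        # Step 2: traverse the AST and keep only the "method nodes".
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                yield path, node

# Small demo on a throwaway project layout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "pkg").mkdir()
    (Path(tmp) / "a.py").write_text("def load():\n    pass\n", encoding="utf-8")
    (Path(tmp) / "pkg" / "b.py").write_text(
        "class C:\n    def run(self):\n        pass\n", encoding="utf-8"
    )
    found = [(p.name, fn.name) for p, fn in iter_function_nodes(tmp)]

print(found)
```

Files that fail to parse are skipped silently in this sketch; a real pipeline would probably want to log them instead.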

Author

Avv22 commented Mar 13, 2023

@urialon, thank you, I appreciate it. Could you recommend a similar parser for Python to extract the method nodes, if possible?

Collaborator

urialon commented Mar 13, 2023

Yes,
our newer project, code2seq, has a Python extractor, and the model itself is also much better.

Best,
Uri

Author

Avv22 commented Mar 14, 2023

Thanks. Does the Python extractor you developed work similarly to JavaParser, by extracting method nodes from the AST of the Python source code?

Collaborator

urialon commented Mar 14, 2023

It was contributed by the community, so it might be a little different. I think that, by default, it was designed to process a specific dataset. However, its logic is the same and its output fits the code2seq model.

Best,
Uri
