Error Parsing C++ Files for Code2Seq #198
Comments
Hi! At startup, astminer reports how many threads will be created (by default only 1, but you can change it in the config with |
Of course, here is the file. Thanks for your support. |
It's really strange. Astminer created only 1 thread, but some subcomponents created multiple threads. Could you also provide us with some of your system info: OS, gcc compiler version, JDK version, etc.? I think anything can help :) In the meantime, you can run astminer from docker. For this run |
@illided Thank you very much for your support. Unfortunately, I still have the issue. This time the processing got further than in previous attempts, but it still didn't reach even 20%. The strangest part is that this happened both through the CLI and through docker. The app runs on a machine instance with 4 CPUs and 132 GB of RAM, under Linux (Ubuntu 20.04 and Debian 10), with the latest stable version of Java 11. P.S. When I ran the app under docker, I didn't know how to configure the Java heap parameters such as -Xms and -Xmx, so I ran it with the defaults. I'm sharing the last 100000 lines of the log. |
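For readers unsure how to pass heap settings, here is a minimal sketch that relies on the standard `JAVA_TOOL_OPTIONS` environment variable, which the JVM reads at startup. The helper name `heap_opts` and the sizes `2g` and `8g` are assumptions made for illustration, not values taken from astminer. Heap flags do not by themselves raise the number of threads the OS allows, so this may not help with a thread-creation failure.

```shell
# Build a JVM heap option string from an initial and a maximum heap size.
# heap_opts is a hypothetical helper, not part of astminer.
heap_opts() {
    printf '%s\n' "-Xms$1 -Xmx$2"
}

# 2g and 8g are placeholder sizes, not recommendations.
JAVA_TOOL_OPTIONS=$(heap_opts 2g 8g)
export JAVA_TOOL_OPTIONS

# Any JVM started from this shell now picks the options up.
# For a container, `docker run -e JAVA_TOOL_OPTIONS ...` forwards the
# exported value; the image name and the remaining arguments are left out
# because they depend on your setup.
echo "JAVA_TOOL_OPTIONS=$JAVA_TOOL_OPTIONS"
```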
Ok, now it's super strange :) |
Also, if you are in a real hurry, you can implement C++ support through a tree-sitter grammar. The parser that we are using for C++ right now is not the most stable. We are planning to do this in the future, but a PR is welcome. |
@illided Of course, very kind of you :). At the following link you will find the dataset that I'm trying to process. |
I'm now running into the same situation: an OOM error when trying to create a thread, and my dataset is also very large. I hope you will find the reason soon, thanks! |
Sorry, I didn't have time to respond. As I suspected, the fuzzy parser creates multiple threads and can't close them properly. In fact, it creates up to 6 threads for each file! This issue is fixed in the new version of the fuzzy parser, but unfortunately it's not compatible with astminer, and we have no time right now to fully rewrite this part :( You can try to experiment with the code or, as I said earlier, try to implement support for C++ through tree-sitter. It shouldn't be that hard, as there are a lot of examples of what you need to implement. See |
Hi, illided. Thank you for your prompt reply. I wrote a shell script that runs astminer on a small batch of the dataset at a time and concatenates the resulting path_contexts.c2s files. I think this can serve as a quick temporary workaround for this problem. |
Sounds interesting! Could you share this solution here in case anyone else runs into the same problem? |
Could you please share the script? |
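To check whether threads really pile up during a run, one option on Linux is to sample the `Threads:` field of `/proc/<pid>/status`. The sketch below assumes a Linux host; the function names are made up for illustration, and the PID has to be supplied by the caller (for example from `pgrep -f astminer`, which assumes the process command line contains that word).

```shell
# Print the thread count recorded in a Linux /proc/<pid>/status file.
thread_count_from_status() {
    awk '/^Threads:/ { print $2 }' "$1"
}

# Sample a process once per second until it exits.
# $1 is the PID of the running JVM (an assumption: you look it up yourself).
watch_threads() {
    while [ -r "/proc/$1/status" ]; do
        echo "$(date +%T) threads=$(thread_count_from_status "/proc/$1/status")"
        sleep 1
    done
}
```

A count that keeps rising with the number of processed files would be consistent with the leak described above.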
Hi everyone! |
Hi, sorry for not replying in time. This is my shell script. I haven't fully tested it, so treat it as sample code.
VAL_DIR="your val set"
AST_MINER_HOME="your astminer directory"
DATASET_NAME="your dataset name"
# model.yaml: an example is given below
MODEL_CONFIG=${AST_MINER_HOME}/model.yaml
CURRENT_WORK_DIRECTORY=$PWD
VAL_OUTPUT_DIR=${CURRENT_WORK_DIRECTORY}/data/${DATASET_NAME}/output/val
# mkdir -p creates all missing parent directories in one call.
mkdir -p "${VAL_OUTPUT_DIR}/total"
#This is the c2s file after concatenating.
VAL_DATA_FILE=${VAL_OUTPUT_DIR}/total/path_contexts.c2s
# Create (or truncate) the concatenated output file.
# Only the val split is handled in this sample; repeat for train/test if needed.
: > "$VAL_DATA_FILE"
# Before this, you have to split your dataset into small folders!
cd $VAL_DIR
echo "change pwd to:$PWD"
nums=$(ls -l|grep "^d"| wc -l)
echo $nums
dirs=$(ls -l |awk '/^d/ {print $NF}')
dirs=(${dirs//,/ })
VAL_CONFIG_TMP=$AST_MINER_HOME/val_tmp.yaml
touch $VAL_CONFIG_TMP
> $VAL_CONFIG_TMP
cd $AST_MINER_HOME
echo "change pwd to:$PWD"
for dir in "${dirs[@]}"; do
    echo "$VAL_DIR/$dir"
    # Remove the previous batch's output so it cannot be appended twice.
    rm -f "${VAL_OUTPUT_DIR}/c/data/path_contexts.c2s"
    # Build a per-batch config: input/output directories plus the shared model settings.
    echo "inputDir: $VAL_DIR/$dir" > "$VAL_CONFIG_TMP"
    echo "outputDir: ${VAL_OUTPUT_DIR}" >> "$VAL_CONFIG_TMP"
    cat "$MODEL_CONFIG" >> "$VAL_CONFIG_TMP"
    source "$AST_MINER_HOME/cli.sh" "$VAL_CONFIG_TMP"
    VAL_DATA_FILE_TMP=${VAL_OUTPUT_DIR}/c/data/path_contexts.c2s
    if [ -e "$VAL_DATA_FILE_TMP" ]; then
        cat "$VAL_DATA_FILE_TMP" >> "$VAL_DATA_FILE"
        echo "" >> "$VAL_DATA_FILE"
    fi
done
|
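The script above assumes the dataset has already been split into small folders. A minimal sketch of that step follows, assuming all `.cpp` files sit directly in one source directory; the function name `split_into_batches` and the `batch_N` folder naming are made up for illustration and are not part of astminer.

```shell
# Copy the .cpp files of one flat directory into numbered batch folders.
# $1 = source directory, $2 = destination directory, $3 = files per batch.
# Prints the number of batch folders created.
split_into_batches() {
    src=$1
    dst=$2
    size=$3
    count=0
    batch=0
    for file in "$src"/*.cpp; do
        # Skip the unexpanded pattern when the directory has no .cpp files.
        [ -e "$file" ] || continue
        # Start a new batch folder every $size files.
        if [ $((count % size)) -eq 0 ]; then
            batch=$((batch + 1))
            mkdir -p "$dst/batch_$batch"
        fi
        cp "$file" "$dst/batch_$batch/"
        count=$((count + 1))
    done
    echo "$batch"
}

# Example (paths and batch size are placeholders):
#   split_into_batches /path/to/val /path/to/val_batches 1000
```

The destination directory can then be used as `VAL_DIR`, since the loop in the script iterates over its subdirectories.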
Hi sir, I want to take this occasion to express my admiration for your great work, which is extremely useful for my Bachelor's thesis.
Currently, I am trying to convert my dataset of about 100000 cpp files, but I always get this error.
I originally thought the issue was related to memory capacity, but I quickly dismissed this because I work with machine instances with up to 150 GB of RAM. I also checked the limits set by the Linux OS, and they are far above the number of threads that exist when the app crashes. Given the capacity of these machines, it would even be possible to create one thread per cpp file without any performance or configuration problems.
I have kept track of the JVM logs, and with the help of an analysis tool I found that over 99% of the threads are in the TIMED_WAITING state and that all of them are associated with MVStore.
Could you help me with this issue?
Thank you very much for your attention.
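The thread-state breakdown mentioned above can also be reproduced without a separate analysis tool. As a rough sketch, the function below tallies the `java.lang.Thread.State:` lines of a thread dump read from standard input, such as the output of the JDK's `jstack <pid>`. The function name is hypothetical, and the PID is something you have to look up yourself.

```shell
# Count threads per state in a JVM thread dump read from stdin.
# Output: one "<count> <STATE>" line per state, most frequent first.
count_thread_states() {
    awk '/java\.lang\.Thread\.State:/ {
             for (i = 1; i <= NF; i++) {
                 if ($i == "java.lang.Thread.State:") { n[$(i + 1)]++; break }
             }
         }
         END { for (s in n) print n[s], s }' | sort -rn
}

# Example (replace <pid> with the PID of the running JVM):
#   jstack <pid> | count_thread_states
```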