Settings configuration

In order to run the tool, you need to create a file settings.json in this directory, based on settings.json.example. Here are the detailed explanations of the settings:

Settings for the collect-cgs mode:

Name	Description
gumtree_bin_path	path to GumTree binary file
git_repositories_dir	path to the directory with Git repositories
traverse_file_max_line_count	the maximum number of lines in the analyzed files (processing larger files may sometimes cause memory issues)
traverse_async	true for the asynchronous processing of repositories
traverse_min_date	(optional) the date in the %d.%m.%Y format, no changes older than this date will be processed
change_graphs_storage_dir	path to the output directory
change_graphs_store_interval	batch size of the number of change graphs to be saved in a single file (to prevent the files from getting too big)

Settings for the patterns mode:

By default, the input for mining patterns is the output of collecting change graphs, so this step should be run after the previous one.

Name	Description
patterns_output_dir	path to the output directory (in order to correctly visualize the results, this directory must be located here or have all the same files in the parent directory)
patterns_output_details	true for saving a JSON for each pattern instance with its details: repository name, commit hash, contacts of the author, and the names of the functions (please note that if you want such information for all the instances, you also need to switch patterns_full_print to true)
patterns_min_frequency	minimum frequency of the changes graph repetition to be considered a pattern
patterns_max_frequency	frequency of the changes graph repetition for the pattern to be considered common (for such patterns, some instances are ignored for optimization purposes)
patterns_async_mining	true for the asynchronous mining of patterns (not recommended)
patterns_full_print	true for saving the information about every individual instance of a pattern, false for saving one instance per pattern
patterns_hide_overlapped_fragments	true for ignoring pattern instances with overlapping code fragments
patterns_min_size	minimum number of nodes that the pattern must have to be included in the output
patterns_min_date	(optional) the date in the %d.%m.%Y format, no changes older than this date will be processed

Additional settings:

Name	Description
logger_file_path	path to the log output file
logger_file_log_level	log level to be saved into the log file, possible values: ERROR, WARNING, INFO, DEBUG
logger_stdout_log_level	log level to be printed into the console, possible values: ERROR, WARNING, INFO, DEBUG

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

help.md

help.md

Settings configuration

Settings for the collect-cgs mode:

Settings for the patterns mode:

Additional settings:

Files

help.md

Latest commit

History

help.md

File metadata and controls

Settings configuration

Settings for the collect-cgs mode:

Settings for the patterns mode:

Additional settings: