Preserve dataset structure #175
Looks good, but it needs a few small changes.
Also, could you describe the dataset logic in docs/storage.md?
configs/antlr_java_js_ast.yaml
Outdated
name: code2seq
length: 9
width: 2
Why did you change the config example?
parsingResultFactory.parseFiles(files) { parseResult ->
    for (labeledResult in branch.process(parseResult)) {
        storage.store(labeledResult)
for ((holdoutType, holdout) in holdouts) {
Let's name it holdoutPath
But the holdout's type is File. Maybe holdoutDir?
Resolve conflicts, please
# Conflicts:
#   src/main/kotlin/astminer/pipeline/Pipeline.kt
private fun <T : Closeable, R> T.useSynchronously(callback: (T) -> R) =
    this.use {
        synchronized(this) {
            callback(this)
        }
    }
That looks complicated ;)
Maybe we can add synchronizedStore to the storage interface:
fun synchronizedStore(labeledResult: LabeledResult<out Node>) = synchronized(this) {
    store(labeledResult)
}
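A minimal sketch of how that suggestion might look, not the PR's actual code. It assumes a simplified `Storage` interface; `LabeledResult` is a stand-in for the astminer type, and `InMemoryStorage` and `storeConcurrently` are hypothetical names used only for illustration:

```kotlin
import java.io.Closeable
import kotlin.concurrent.thread

// Stand-in for astminer's LabeledResult; the real type is generic over Node.
data class LabeledResult(val label: String)

interface Storage : Closeable {
    fun store(labeledResult: LabeledResult)

    // Default method: concurrent callers take turns by locking on the storage instance.
    fun synchronizedStore(labeledResult: LabeledResult) = synchronized(this) {
        store(labeledResult)
    }
}

// Hypothetical storage that is not thread-safe on its own.
class InMemoryStorage : Storage {
    val stored = mutableListOf<LabeledResult>()

    override fun store(labeledResult: LabeledResult) {
        stored.add(labeledResult)
    }

    override fun close() {}
}

// Hypothetical helper: several worker threads write to the same storage.
fun storeConcurrently(storage: Storage, workers: Int, perWorker: Int) {
    val threads = (0 until workers).map { w ->
        thread {
            repeat(perWorker) { i -> storage.synchronizedStore(LabeledResult("w$w-$i")) }
        }
    }
    threads.forEach { it.join() }
}
```

With the lock inside the interface, the pipeline could keep a plain `storage.use { ... }` and would not need a separate `useSynchronously` extension.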
}

private fun printHoldoutStat(files: List<File>, holdoutType: DatasetHoldout) {
    var output = "${files.size} file(s) found"
You can use StringBuilder to build the message instead of reassigning a mutable `var`.
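For instance, a sketch using Kotlin's `buildString`, which wraps a `StringBuilder`. The rest of `printHoldoutStat` is not shown in the diff, so the text appended for the holdout is an assumption, and the `DatasetHoldout` enum below is a hypothetical stand-in for the PR's type:

```kotlin
import java.io.File

// Hypothetical stand-in for the PR's DatasetHoldout; the real enum may differ.
enum class DatasetHoldout { None, Train, Validation, Test }

// Builds the message in one expression instead of reassigning a `var`.
fun holdoutStat(files: List<File>, holdoutType: DatasetHoldout): String = buildString {
    append("${files.size} file(s) found")
    // Assumed behaviour: mention the holdout only when the dataset is split.
    if (holdoutType != DatasetHoldout.None) {
        append(" in ")
        append(holdoutType.name.lowercase())
    }
}
```

Returning the string rather than printing it inside the function also makes the message easy to check in a unit test.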
This PR adds the ability to recognize the dataset structure (train, test, and val folders) and recreate it in the final output directory.
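A rough sketch of that behaviour, for illustration only and not taken from the PR. The `Holdout` enum, `findHoldouts`, and `mirrorHoldouts` are hypothetical names; only the folder names (train, test, val) come from the description above:

```kotlin
import java.io.File

// Hypothetical holdout description; folder names follow the PR summary.
enum class Holdout(val dirName: String) {
    Train("train"),
    Validation("val"),
    Test("test")
}

// Returns the holdout folders that exist directly under inputDir.
// An empty map means the input is not laid out as a dataset.
fun findHoldouts(inputDir: File): Map<Holdout, File> =
    Holdout.values()
        .associateWith { File(inputDir, it.dirName) }
        .filterValues { it.isDirectory }

// Creates a folder with the same name under outputDir for each detected holdout
// and returns the created directories.
fun mirrorHoldouts(holdouts: Map<Holdout, File>, outputDir: File): Map<Holdout, File> =
    holdouts.mapValues { (holdout, _) ->
        File(outputDir, holdout.dirName).also { it.mkdirs() }
    }
```

A pipeline could then open one storage per entry of the returned map, so results parsed from `train` end up under the output's `train` folder, and so on.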