-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better performance and feedback #171
Conversation
Will be ready for review after adding some stress tests to check if critical section is synchronized |
Thank you for taking care of this!
Do we have a working performance benchmark at the moment?
… On 2 Aug 2021, at 20:53, Ilya Utkin ***@***.***> wrote:
Will be ready for review after adding some stress tests to check if critical section is synchronized
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
Triage notifications on the go with GitHub Mobile for iOS or Android.
|
Unfortunately, the old benchmarks are very outdated and require a serious revision, so for now they have been removed from the astminer. 30-50 percent was obtained on my (not the most powerful 😄) computer, so it is possible that the numbers are slightly different. |
For some reason if you use anything then extension function chunked to split files by threads it results in +20% working time. If anyone has an idea why it happens please write it down here. |
|
||
val normalizedToken: String by lazy { | ||
val normalizedToken: String = run { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you use run
command? I think you can directly call originalToken?.let {}
|
||
private val logger = KotlinLogging.logger("HandlerFactory") | ||
private const val NUM_OF_THREADS = 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to use configure parameters. So users can easily change the number of threads.
fun <T> parseFiles(files: List<File>, action: (ParsingResult<out Node>) -> T) = | ||
fun <T> parseFiles( | ||
files: List<File>, | ||
progressBar: ProgressBar? = null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like passing the progress bar into this function. Maybe it's better to put an updating progress bar at the end of the action function, which is defined inside the pipeline. And pipeline is the right place for working with progress bar.
return results | ||
} | ||
|
||
fun <T> parseFilesAsync(files: List<File>, action: (ParsingResult<out Node>) -> T): List<T?> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's multithread, not async.
parsingResultFactory.parseFilesAsync(files) { parseResult -> | ||
for (labeledResult in branch.process(parseResult)) { | ||
storage.store(labeledResult) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, you can update progress bar here
import java.io.FileReader | ||
import kotlin.test.assertEquals | ||
|
||
class PipelineAsyncStressTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once again, you use multithreading, not asynchronous exectution.
|
||
synchronized(results) { | ||
files.chunked(files.size / NUM_OF_THREADS + 1).filter { it.isNotEmpty() } | ||
files.chunked(files.size / numOfThreads + 1).filter { it.isNotEmpty() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use ceil(files.size / numOfThreads)
. This is more accurate (the case is taken into account if it is divided entirely), as well as more readable (here I had to hang up, why are you +1 doing)
@@ -14,5 +14,6 @@ data class PipelineConfig( | |||
val parser: ParserConfig, | |||
val filters: List<FilterConfig> = emptyList(), | |||
@SerialName("label") val labelExtractor: LabelExtractorConfig, | |||
val storage: StorageConfig | |||
val storage: StorageConfig, | |||
val performance: PerformanceConfig = defaultPerformanceConfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use yet another config object? Maybe we can set the number of threads directly in this config?
Also, I suggest setting the number of threads nullable. Null means no thread at all.
This PR is aimed to improve user experience when working with the astminer.