Better performance and feedback #171

Merged: 20 commits, Aug 6, 2021

illided
Contributor

@illided illided commented Aug 2, 2021

This PR aims to improve the user experience of working with astminer.

  • Better performance. Astminer is very slow and runs in a single thread. Adding parallelism and some optimizations (precompiling regexes, for example) sped astminer up by 30-50%. This PR uses ordinary Java threads, but if you have a better idea of how to improve astminer performance with coroutines, please write it down here. It just doesn't work that well when I try it 😔
  • Status bar and minor feedback improvements. Right now astminer gives no feedback about what it is doing or whether it is actually working. This PR adds some feedback and a cool-looking status bar that is thread-safe and puts little load on the CPU.
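
The regex precompilation mentioned in the first bullet can be sketched as follows. This is only an illustration, not astminer's actual normalization code: the function names and the pattern are assumptions made up for the example.

```kotlin
// Hypothetical pattern, compiled once and reused by every call.
val NON_LETTERS = Regex("[^A-Za-z]+")

// Slow variant: String.toRegex() builds a new Regex on every call.
fun normalizeSlow(token: String): String =
    token.replace("[^A-Za-z]+".toRegex(), "").lowercase()

// Faster variant: reuses the precompiled Regex.
fun normalizeFast(token: String): String =
    token.replace(NON_LETTERS, "").lowercase()
```

On the JVM a compiled pattern is immutable, so a single shared instance can also be used safely from several worker threads.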

@illided illided marked this pull request as draft August 2, 2021 18:52
@illided
Contributor Author

illided commented Aug 2, 2021

Will be ready for review after I add some stress tests that check whether the critical section is synchronized.

@vovak
Member

vovak commented Aug 2, 2021 via email

@illided
Contributor Author

illided commented Aug 3, 2021

Do we have a working performance benchmark at the moment?

Unfortunately, the old benchmarks are badly outdated and need a serious revision, so for now they have been removed from astminer. The 30-50 percent figure was measured on my (not the most powerful 😄) computer, so the numbers may differ slightly on other machines.

@illided
Contributor Author

illided commented Aug 4, 2021

For some reason, using anything other than the extension function chunked to split files between threads results in about 20% longer running time. If anyone has an idea why this happens, please write it down here.
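
For readers following along, the chunk-per-thread approach discussed here can be sketched roughly as below. This is not the PR's code: `processInThreads` and its parameters are hypothetical names, and only the `chunked(size / numOfThreads + 1)` split and the `synchronized(results)` block mirror the fragments quoted in the review.

```kotlin
import kotlin.concurrent.thread

// Hypothetical helper: applies `action` to every item using at most `numOfThreads` threads.
fun <A, T> processInThreads(items: List<A>, numOfThreads: Int, action: (A) -> T): List<T> {
    require(numOfThreads > 0) { "numOfThreads must be positive" }
    if (items.isEmpty()) return emptyList()
    val results = mutableListOf<T>()
    // Same chunk size expression as in the PR; it is always at least 1.
    val chunkSize = items.size / numOfThreads + 1
    val workers = items.chunked(chunkSize).map { chunk ->
        thread {
            // Do the expensive work outside the lock, then publish the chunk's results at once.
            val local = chunk.map(action)
            synchronized(results) { results.addAll(local) }
        }
    }
    // Wait for all workers; join() also makes their writes visible to this thread.
    workers.forEach { it.join() }
    return results
}
```

Note that the order of the returned results depends on which thread finishes first, so callers should not rely on it.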

@illided illided marked this pull request as ready for review August 5, 2021 13:19

-val normalizedToken: String by lazy {
+val normalizedToken: String = run {
Contributor

Why do you use run here? I think you can call originalToken?.let {} directly.
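
To make the three options in this thread concrete, here is a small sketch. `SketchNode`, `normalize`, and the `"EMPTY"` fallback are hypothetical stand-ins, since the body of the real property is not shown in the diff.

```kotlin
// Hypothetical stand-in for the real normalization logic.
fun normalize(token: String): String = token.trim().lowercase()

class SketchNode(val originalToken: String?) {
    // Before the PR: computed on first access; `lazy` is synchronized by default.
    val lazyToken: String by lazy { originalToken?.let { normalize(it) } ?: "EMPTY" }

    // In the PR: computed eagerly in the initializer, wrapped in `run`.
    val runToken: String = run { originalToken?.let { normalize(it) } ?: "EMPTY" }

    // Reviewer's suggestion: the same eager computation without `run`.
    val directToken: String = originalToken?.let { normalize(it) } ?: "EMPTY"
}
```

All three properties hold the same value; they differ only in when the value is computed and in how much wrapping surrounds the expression.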


private val logger = KotlinLogging.logger("HandlerFactory")
private const val NUM_OF_THREADS = 16
Contributor

It's better to make this a configuration parameter, so users can easily change the number of threads.

-fun <T> parseFiles(files: List<File>, action: (ParsingResult<out Node>) -> T) =
+fun <T> parseFiles(
+    files: List<File>,
+    progressBar: ProgressBar? = null,
Contributor

I don't like passing the progress bar into this function. Maybe it's better to update the progress bar at the end of the action function, which is defined inside the pipeline. The pipeline is the right place to work with the progress bar.

return results
}

fun <T> parseFilesAsync(files: List<File>, action: (ParsingResult<out Node>) -> T): List<T?> {
Contributor

It's multithreaded, not async.

parsingResultFactory.parseFilesAsync(files) { parseResult ->
for (labeledResult in branch.process(parseResult)) {
storage.store(labeledResult)
}
Contributor

AFAIK, you can update the progress bar here.
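
A rough sketch of this suggestion, assuming the parsing function only accepts an action and the pipeline owns the progress reporting. `ProgressCounter`, `parseAll`, and `runPipeline` are hypothetical names; the counter merely stands in for whatever progress bar the PR uses.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a thread-safe progress bar.
class ProgressCounter(val total: Int) {
    private val done = AtomicInteger(0)
    fun step(): Int = done.incrementAndGet()
    val current: Int get() = done.get()
}

// Hypothetical parser entry point that knows nothing about progress reporting.
fun <A, T> parseAll(items: List<A>, action: (A) -> T): List<T> = items.map(action)

// The pipeline creates the progress bar and advances it at the end of the action.
fun runPipeline(files: List<String>): ProgressCounter {
    val progress = ProgressCounter(files.size)
    parseAll(files) { file ->
        // ... process `file` and store the labeled results here ...
        progress.step()
    }
    return progress
}
```

With this split, the parser's signature stays free of UI concerns, and an atomic counter keeps the update safe if the action runs on several threads.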

import java.io.FileReader
import kotlin.test.assertEquals

class PipelineAsyncStressTest {
Contributor

Once again, you use multithreading, not asynchronous execution.
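
As an illustration of what a multithreaded stress test of a synchronized critical section can look like, here is a minimal sketch. It is not the test from this PR: `CountingStorage` and `stressStore` are hypothetical, and a real test would exercise the actual pipeline and storage.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical storage whose critical section is guarded by @Synchronized.
class CountingStorage {
    private val items = mutableListOf<Int>()
    @Synchronized fun store(item: Int) { items.add(item) }
    @Synchronized fun size(): Int = items.size
}

// Hammers the storage from several threads and returns how many items were stored.
fun stressStore(numOfThreads: Int, itemsPerThread: Int): Int {
    val storage = CountingStorage()
    val pool = Executors.newFixedThreadPool(numOfThreads)
    repeat(numOfThreads) {
        pool.execute { repeat(itemsPerThread) { i -> storage.store(i) } }
    }
    pool.shutdown()
    check(pool.awaitTermination(1, TimeUnit.MINUTES)) { "workers did not finish in time" }
    return storage.size()
}
```

If the `store` method were not synchronized, concurrent `add` calls on the plain list could lose items or throw, so the returned count would not reliably equal `numOfThreads * itemsPerThread`.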


synchronized(results) {
-files.chunked(files.size / NUM_OF_THREADS + 1).filter { it.isNotEmpty() }
+files.chunked(files.size / numOfThreads + 1).filter { it.isNotEmpty() }
Contributor

Let's use ceil(files.size / numOfThreads). It is more accurate (it handles the case where the size divides evenly) and more readable (I had to stop and think about why you add +1 here).
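
One caveat worth spelling out: in Kotlin, `files.size / numOfThreads` is integer division, so the value has to be converted to a floating-point number before `ceil` can round anything up. A small sketch of the three variants, with hypothetical function names:

```kotlin
import kotlin.math.ceil

// Chunk size used in the PR: one larger than needed when the size divides evenly.
fun chunkSizePlusOne(size: Int, numOfThreads: Int): Int = size / numOfThreads + 1

// Ceiling via floating point: convert first, otherwise the division truncates.
fun chunkSizeCeil(size: Int, numOfThreads: Int): Int =
    ceil(size.toDouble() / numOfThreads).toInt()

// Ceiling in pure integer arithmetic (valid for size >= 0 and numOfThreads > 0).
fun chunkSizeIntCeil(size: Int, numOfThreads: Int): Int =
    (size + numOfThreads - 1) / numOfThreads
```

For 32 files and 16 threads, the `+ 1` variant gives chunks of 3 and therefore only 11 chunks, while the ceiling gives chunks of 2 and uses all 16 threads. With the ceiling, an empty file list produces a chunk size of 0, which `chunked` rejects, so that case needs an explicit guard.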

@@ -14,5 +14,6 @@ data class PipelineConfig(
     val parser: ParserConfig,
     val filters: List<FilterConfig> = emptyList(),
     @SerialName("label") val labelExtractor: LabelExtractorConfig,
-    val storage: StorageConfig
+    val storage: StorageConfig,
+    val performance: PerformanceConfig = defaultPerformanceConfig
Contributor

Should we add yet another config object? Maybe we can set the number of threads directly in this config?
Also, I suggest making the number of threads nullable. Null means no threads at all.
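
A sketch of what the nullable option could look like, under the assumption that the thread count lives directly in the pipeline config. `SketchPipelineConfig`, `inputDir`, and `describeExecution` are hypothetical; the real `PipelineConfig` has other fields and serialization annotations that are omitted here.

```kotlin
// Hypothetical, simplified config; not the real PipelineConfig.
data class SketchPipelineConfig(
    val inputDir: String,
    val numOfThreads: Int? = null // null means: do not spawn worker threads
)

// Describes how the work would be scheduled for a given config.
fun describeExecution(config: SketchPipelineConfig): String {
    val threads = config.numOfThreads
    return if (threads == null) {
        "sequential"
    } else {
        require(threads > 0) { "numOfThreads must be positive, got $threads" }
        "parallel with $threads threads"
    }
}
```

Defaulting to `null` keeps existing configs valid, and users opt into parallelism by setting a single field.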

@illided illided requested a review from SpirinEgor August 6, 2021 11:02
@SpirinEgor SpirinEgor merged commit 7aefe8e into JetBrains-Research:master-dev Aug 6, 2021

3 participants