Better performance and feedback #171

illided · 2021-08-02T18:43:03Z

This PR aims to improve the user experience when working with astminer.

Better performance. Astminer is slow and runs in a single thread. Adding parallelism and some optimizations (for example, precompiling regexes) made astminer 30-50% faster. This PR uses ordinary Java threads; if you have a better idea of how to improve astminer's performance with coroutines, please share it here. It just doesn't work that well when I try it 😔
Status bar and minor feedback improvements. Right now astminer gives no feedback about what it is doing or whether it is actually working. This PR adds some feedback and a cool-looking status bar that is thread-safe and doesn't load the CPU much.
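A minimal sketch of the regex optimization mentioned above. The `splitToken` functions and the pattern are hypothetical, not astminer's actual tokenizer: the point is only that compiling a `Regex` once and reusing it avoids recompiling the pattern on every call, which adds up when it runs for every token of every file.

```kotlin
// Hypothetical example: names and pattern are illustrative, not astminer's real code.

// Slower: a new Regex is compiled on every call.
fun splitTokenSlow(token: String): List<String> =
    token.split(Regex("[^a-zA-Z0-9]+")).filter { it.isNotEmpty() }

// Faster: the pattern is compiled once and reused.
// Regex wraps an immutable java.util.regex.Pattern, so sharing it across threads is safe.
private val NON_ALPHANUMERIC = Regex("[^a-zA-Z0-9]+")

fun splitToken(token: String): List<String> =
    token.split(NON_ALPHANUMERIC).filter { it.isNotEmpty() }

fun main() {
    println(splitToken("get_user-name"))  // [get, user, name]
}
```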

illided · 2021-08-02T18:53:25Z

Will be ready for review after adding some stress tests to check that the critical section is properly synchronized.

vovak · 2021-08-02T20:37:35Z

Thank you for taking care of this! Do we have a working performance benchmark at the moment?


illided · 2021-08-03T12:14:06Z

> Do we have a working performance benchmark at the moment?

Unfortunately, the old benchmarks are very outdated and need a serious revision, so for now they have been removed from astminer. The 30-50% figure was measured on my (not the most powerful 😄) computer, so the numbers may differ on other machines.

illided · 2021-08-04T12:19:12Z

For some reason, using anything other than the `chunked` extension function to split files across threads results in about 20% longer running time. If anyone has an idea why this happens, please share it here.
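For reference, the `chunked`-based splitting can be sketched as follows. `processFilesInParallel` is a hypothetical helper, not the PR's actual `parseFiles` code, and the sketch does not explain the 20% difference. It splits the files into at most `numOfThreads` contiguous chunks, runs each chunk on its own Java thread, keeps the heavy work outside the lock, and synchronizes only the append to the shared result list.

```kotlin
import java.io.File
import java.util.Collections
import kotlin.concurrent.thread

// Hypothetical helper: one plain Java thread per contiguous chunk of files.
fun <T> processFilesInParallel(
    files: List<File>,
    numOfThreads: Int,
    action: (File) -> T
): List<T> {
    require(numOfThreads > 0) { "numOfThreads must be positive" }
    if (files.isEmpty()) return emptyList()
    // Ceiling division so that at most numOfThreads chunks are produced.
    val chunkSize = (files.size + numOfThreads - 1) / numOfThreads
    val results = Collections.synchronizedList(mutableListOf<T>())
    val workers = files.chunked(chunkSize).map { chunk ->
        thread {
            for (file in chunk) {
                val result = action(file)  // heavy work runs without holding any lock
                results.add(result)        // only the append is synchronized
            }
        }
    }
    workers.forEach { it.join() }  // wait for every worker before reading results
    return results.toList()
}

fun main() {
    val files = (1..10).map { File("file$it.java") }  // files are not read from disk here
    val names = processFilesInParallel(files, numOfThreads = 4) { it.name }
    println(names.size)  // 10
}
```

Result order depends on thread scheduling, so callers should not rely on it.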

SpirinEgor · 2021-08-05T13:45:36Z

src/main/kotlin/astminer/common/model/ParsingModel.kt


-    val normalizedToken: String by lazy {
+    val normalizedToken: String = run {


Why do you use `run`? I think you can call `originalToken?.let {}` directly.
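A small sketch contrasting the two versions in the diff with the suggested `?.let`. `TokenNode`, `normalize`, and `EMPTY_TOKEN` are hypothetical stand-ins, not astminer's real `Node` or normalization logic. `by lazy` defers the work to first access (and uses a synchronized initializer by default), while the eager property computes the value once in the constructor; `?.let { } ?: default` does that without an extra `run` block.

```kotlin
// Hypothetical node class; the normalization is a simplified stand-in.
class TokenNode(val originalToken: String?) {
    // Computed on first access; the default lazy mode (SYNCHRONIZED) adds locking overhead.
    val lazyNormalizedToken: String by lazy {
        originalToken?.let { normalize(it) } ?: EMPTY_TOKEN
    }

    // Computed eagerly in the constructor; `?.let` replaces the `run { ... }` block.
    val normalizedToken: String = originalToken?.let { normalize(it) } ?: EMPTY_TOKEN

    companion object {
        const val EMPTY_TOKEN = "EMPTY"  // hypothetical placeholder value
        private val NON_LETTERS = Regex("[^a-z]")

        // Hypothetical normalization: lowercase and keep only letters.
        fun normalize(token: String): String =
            token.lowercase().replace(NON_LETTERS, "").ifEmpty { EMPTY_TOKEN }
    }
}

fun main() {
    println(TokenNode("Get_User").normalizedToken)  // getuser
    println(TokenNode(null).normalizedToken)        // EMPTY
}
```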

SpirinEgor · 2021-08-05T13:46:24Z

src/main/kotlin/astminer/common/model/ParsingResultModel.kt


 private val logger = KotlinLogging.logger("HandlerFactory")
+private const val NUM_OF_THREADS = 16


It's better to make this a configuration parameter, so users can easily change the number of threads.

SpirinEgor · 2021-08-05T13:51:07Z

src/main/kotlin/astminer/common/model/ParsingResultModel.kt

-    fun <T> parseFiles(files: List<File>, action: (ParsingResult<out Node>) -> T) =
+    fun <T> parseFiles(
+        files: List<File>,
+        progressBar: ProgressBar? = null,


I don't like passing the progress bar into this function. It may be better to update the progress bar at the end of the `action` function, which is defined inside the pipeline. The pipeline is the right place to work with the progress bar.

SpirinEgor · 2021-08-05T13:51:28Z

src/main/kotlin/astminer/common/model/ParsingResultModel.kt

+        return results
+    }
+
+    fun <T> parseFilesAsync(files: List<File>, action: (ParsingResult<out Node>) -> T): List<T?> {


It's multithreaded, not async.

SpirinEgor · 2021-08-05T13:53:48Z

src/main/kotlin/astminer/pipeline/Pipeline.kt

+                    parsingResultFactory.parseFilesAsync(files) { parseResult ->
+                        for (labeledResult in branch.process(parseResult)) {
+                            storage.store(labeledResult)
+                        }


AFAIK, you can update the progress bar here.
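A minimal sketch of what updating progress from inside the pipeline's `action` could look like. `SimpleProgress` is a hypothetical, thread-safe counter, not the progress bar used in the PR: workers call `step()` after storing a result, the count is an `AtomicInteger`, and output is printed only when the integer percentage grows, so frequent updates stay cheap. Under contention an individual percentage can occasionally be skipped, which is acceptable for a status display.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlin.concurrent.thread

// Hypothetical progress tracker, safe to call from many worker threads.
class SimpleProgress(private val total: Int) {
    private val done = AtomicInteger(0)
    private val lastPrinted = AtomicInteger(-1)

    fun step() {
        val current = done.incrementAndGet()
        val percent = if (total == 0) 100 else current * 100 / total
        val previous = lastPrinted.get()
        // Only the thread that wins the compare-and-set prints this percentage.
        if (percent > previous && lastPrinted.compareAndSet(previous, percent)) {
            print("\rProcessed $current/$total files ($percent%)")
        }
    }

    val processed: Int get() = done.get()
}

fun main() {
    val files = List(250) { "File$it.java" }
    val progress = SimpleProgress(files.size)
    val workers = files.chunked(63).map { chunk ->
        thread {
            for (file in chunk) {
                // Parsing, processing and storing `file` would happen here.
                progress.step()  // progress is updated by the pipeline, not by parseFiles
            }
        }
    }
    workers.forEach { it.join() }
    println()
    println(progress.processed)  // 250
}
```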

SpirinEgor · 2021-08-05T13:54:36Z

src/test/kotlin/astminer/pipeline/PipelineAsyncStressTest.kt

+import java.io.FileReader
+import kotlin.test.assertEquals
+
+class PipelineAsyncStressTest {


Once again, you use multithreading, not asynchronous execution.
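A sketch of the kind of stress test discussed here, checking that a shared critical section is synchronized. `CountingStorage` and `stressTestStorage` are hypothetical; the PR's tests are JUnit classes using `kotlin.test.assertEquals`, while this version returns a `Boolean` to stay self-contained. Many threads store distinct items concurrently, and the test checks that nothing was lost or duplicated; without the lock, the underlying `ArrayList` could lose items or throw.

```kotlin
import kotlin.concurrent.thread

// Hypothetical storage whose `store` is the critical section shared by all workers.
class CountingStorage {
    private val lines = mutableListOf<String>()
    private val lock = Any()

    fun store(line: String) {
        synchronized(lock) { lines.add(line) }
    }

    fun snapshot(): List<String> = synchronized(lock) { lines.toList() }
}

// Stress test: every thread stores `itemsPerThread` unique items concurrently.
fun stressTestStorage(numOfThreads: Int = 16, itemsPerThread: Int = 10_000): Boolean {
    val storage = CountingStorage()
    val workers = List(numOfThreads) { threadId ->
        thread {
            repeat(itemsPerThread) { i -> storage.store("$threadId:$i") }
        }
    }
    workers.forEach { it.join() }
    val stored = storage.snapshot()
    // All items must be present exactly once.
    return stored.size == numOfThreads * itemsPerThread && stored.toSet().size == stored.size
}

fun main() {
    println(stressTestStorage())  // true
}
```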

SpirinEgor · 2021-08-06T08:27:51Z

src/main/kotlin/astminer/common/model/ParsingResultModel.kt


        synchronized(results) {
-            files.chunked(files.size / NUM_OF_THREADS + 1).filter { it.isNotEmpty() }
+            files.chunked(files.size / numOfThreads + 1).filter { it.isNotEmpty() }


Let's use `ceil(files.size / numOfThreads)`. It is more accurate (it handles the case where the number of files divides evenly by the number of threads) and more readable (I had to stop and figure out why you add +1 here).
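A quick comparison of the two chunk-size formulas, with hypothetical function names. When 32 files are split for 16 threads, `size / numOfThreads + 1` gives chunks of 3, so `chunked` produces only 11 chunks and 5 threads stay idle; ceiling division gives chunks of 2 and exactly 16 chunks. Note that `ceil(files.size / numOfThreads)` with two `Int`s does not compile as written in Kotlin (`kotlin.math.ceil` takes a `Double`), and converting the already-truncated quotient would not round up, so the conversion has to happen before dividing. The integer form `(size + numOfThreads - 1) / numOfThreads` is an equivalent alternative for positive values.

```kotlin
import kotlin.math.ceil

// The expression from the diff.
fun chunkSizePlusOne(size: Int, numOfThreads: Int): Int = size / numOfThreads + 1

// Ceiling division as suggested; divide as Double so the result can round up.
// `chunked` requires a positive size, so an empty file list still yields 1.
fun chunkSizeCeil(size: Int, numOfThreads: Int): Int =
    ceil(size.toDouble() / numOfThreads).toInt().coerceAtLeast(1)

fun main() {
    val files = List(32) { "File$it.java" }
    val numOfThreads = 16
    println(files.chunked(chunkSizePlusOne(files.size, numOfThreads)).size)  // 11
    println(files.chunked(chunkSizeCeil(files.size, numOfThreads)).size)     // 16
}
```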

SpirinEgor · 2021-08-06T08:31:21Z

src/main/kotlin/astminer/config/PipelineConfig.kt

@@ -14,5 +14,6 @@ data class PipelineConfig(
    val parser: ParserConfig,
    val filters: List<FilterConfig> = emptyList(),
    @SerialName("label") val labelExtractor: LabelExtractorConfig,
-    val storage: StorageConfig
+    val storage: StorageConfig,
+    val performance: PerformanceConfig = defaultPerformanceConfig


Do we need yet another config object? Maybe we can set the number of threads directly in this config.
Also, I suggest making the number of threads nullable, where null means no multithreading at all.
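A sketch of the nullable-threads suggestion, under stated assumptions. `SimplePipelineConfig` and `processFiles` are hypothetical: the real `PipelineConfig` also holds parser, filter, label and storage configs and is deserialized with kotlinx.serialization, which is omitted here. A `null` `numOfThreads` processes files sequentially in the calling thread; otherwise the files are handed to a fixed thread pool from `java.util.concurrent`, an alternative to the manually created threads used in the PR, which also keeps results in input order.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical, simplified config with the thread count stored directly in it.
data class SimplePipelineConfig(
    val inputDir: String,
    val numOfThreads: Int? = null  // null: no multithreading at all
) {
    init {
        require(numOfThreads == null || numOfThreads > 0) { "numOfThreads must be positive or null" }
    }
}

fun <T> processFiles(config: SimplePipelineConfig, files: List<String>, action: (String) -> T): List<T> {
    // null: run everything sequentially in the calling thread.
    val numOfThreads = config.numOfThreads ?: return files.map(action)
    val pool = Executors.newFixedThreadPool(numOfThreads)
    try {
        val tasks = files.map { file -> Callable { action(file) } }
        // invokeAll blocks until all tasks finish; futures follow the input order.
        return pool.invokeAll(tasks).map { it.get() }
    } finally {
        pool.shutdown()
    }
}

fun main() {
    val files = listOf("A.java", "B.java", "C.java")
    val sequential = processFiles(SimplePipelineConfig("src"), files) { it.length }
    val parallel = processFiles(SimplePipelineConfig("src", numOfThreads = 2), files) { it.length }
    println(sequential == parallel)  // true
}
```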

illided added 4 commits July 30, 2021 19:08

status bar prototype added

d7f2978

some performance improvement

3177259

calculation heavy part moved from critical section

30f1c19

small improvements and rollback to normalized token

23bdd3d

illided marked this pull request as draft August 2, 2021 18:52

progress bar close added

c86986d

illided added 6 commits August 3, 2021 19:26

json AST stress tests added

9e36101

code style issues fixed

e4e44a3

file distribution fix

9fdc80e

normalization moved to critical section again

05c9bc8

lazy stayed in Node but look ahead calculation for antlr was added

306a6a3

code style fixes

a027ce3

illided added 3 commits August 5, 2021 15:31

path stress tests added

1699388

look ahead calculation removed because it had no effect

3bf52db

unused import removed

942ad49

illided marked this pull request as ready for review August 5, 2021 13:19

SpirinEgor suggested changes Aug 5, 2021


illided added 3 commits August 5, 2021 21:52

config and multiple tweaks added

8fbeb39

code style fixes

08048e6

performance config added

10cd055

SpirinEgor suggested changes Aug 6, 2021


illided added 3 commits August 6, 2021 13:29

some little improvements

e2000da

num of thread usage added to configs

6c8ba43

final new lines added

56b6929

illided requested a review from SpirinEgor August 6, 2021 11:02

SpirinEgor approved these changes Aug 6, 2021


SpirinEgor merged commit 7aefe8e into JetBrains-Research:master-dev Aug 6, 2021
