
(URGENT) Critical performance problem caused by recent refactoring #159

Open
ronaldtse opened this issue Jan 28, 2025 · 23 comments

@ronaldtse
Contributor

ronaldtse commented Jan 28, 2025

As reported by @ReesePlews :

Previously, the 001-v5 document (sources/001-v5/document.adoc) compiled in around 1 hour.

Now the document takes over 3 hours without even any error message (and does not complete).

I suspect that the recent XML refactoring has introduced critical inefficiencies that do not work well with big documents.

The delivery date for Plateau is Jan 30 and this should be addressed by then.

@ronaldtse ronaldtse added the bug Something isn't working label Jan 28, 2025
@github-project-automation github-project-automation bot moved this to 🆕 New in Metanorma Jan 28, 2025
@ronaldtse ronaldtse moved this from 🆕 New to 🌋 Urgent in Metanorma Jan 28, 2025
@ronaldtse
Contributor Author

@kwkwan do you think this has anything to do with the recent xmi gem update?

@ronaldtse
Contributor Author

The most recent docker release still works.

However, the local build is messed up. I only have metanorma-cli in my Gemfile, so I am using all released gems.

@ReesePlews

thanks for checking this

Now the document takes over 3 hours without even any error message (and does not complete).

just to clarify here:
the plateau document (sources/001-v5) took 3 hr 30 min to generate locally, and it did complete.

@opoudjis
Contributor

Since the entirety of mn-samples-plateau, INCLUDING 001-v5, ran last week with the gems of that metanorma release on docker in 48 mins

https://github.com/metanorma/metanorma-cli/actions/runs/12876911452/job/35964616845

the "caused by recent refactoring" ranges from unlikely to nonsensical as having anything to do with metanorma.

I am now leaving this running on my side to find out what is going on, but I had no issues at all my side last week. If this turns out not to do with metanorma but with lutaml.

I will report back on my findings.

@opoudjis
Contributor

If this turns out to be lutaml (and if you're hearing nothing from metanorma, that is my suspicion), it would be useful for the metanorma lutaml plugin to introduce some sort of progress reporting, as I had done for AsciiMath processing...
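
For illustration, a minimal sketch of the kind of per-block progress reporting meant here, written as an Asciidoctor block macro; the class name, macro name and counter are hypothetical and not taken from metanorma-plugin-lutaml:

# Hypothetical sketch only -- not the plugin's actual code.
require "asciidoctor"
require "asciidoctor/extensions"

class KlassTableProgressMacro < Asciidoctor::Extensions::BlockMacroProcessor
  use_dsl
  named :klass_table_demo

  @counter = 0
  class << self
    attr_accessor :counter
  end

  def process(parent, target, attrs)
    started = Time.now
    self.class.counter += 1
    # ... build and return the table block here, as the real macro does ...
    warn "klass_table ##{self.class.counter}: #{target} processed in #{(Time.now - started).round(1)}s"
    nil
  end
end

A running block count plus per-block timing written to stderr would have made the 1.5-hour Lutaml phase visible much earlier.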

@opoudjis
Contributor

After 1.5 hours, I get:

asciidoctor: ERROR: <stdin>: line 18762: dropping cells from incomplete row detected end of table

That is still Asciidoctor processing, not Metanorma, which pretty much tells me straight that the delay is in Lutaml, since that is embedded within Asciidoc preprocessing. I'm going to restart and monitor metanorma-plugin-lutaml.

@opoudjis
Contributor

The Lutaml preprocessors are all pretty fast:

LutamlPreprocessor START
LutamlPreprocessor END
LutamlUmlClassPreprocessor START
LutamlUmlClassPreprocessor END
LutamlEaXmiBase START
LutamlEaXmiBase END
LutamlEaXmiBase START
LutamlEaXmiBase END
LutamlUmlClassPreprocessor START
LutamlUmlClassPreprocessor END

Each instance of LutamlKlassTableBlockMacro is taking something like half a minute to run. Will leave it running for a half hour to see how far it gets....

@opoudjis
Contributor

20 minutes in, it's chugging along, alternating runs of LutamlKlassTableBlockMacro and runs of LutamlGmlDictionaryBlockMacro. It has processed 150 such blocks so far.

I have left a warning in for when Metanorma processing starts, and I am going out in 10 mins, but I will leave it running for a few hours to see what is going on...

@opoudjis
Contributor

30 minutes, 230 blocks processed. Whether or not it hangs, the slowdown is clearly in Lutaml.

I'll leave it running, though I will be out for a few hours...

@opoudjis
Contributor

45 mins, 300 blocks. I'm going out...

@opoudjis
Contributor

Got back: it finished Lutaml processing after 1.5 hours, and then it crashed because of a bug in the warning I had added for when Metanorma processing starts. I'm now restarting it, and I'll see what happens next.

I think there is a real slowdown in Lutaml compared to before that needs to be addressed; I just want to confirm that there isn't an issue downstream in Metanorma.

@opoudjis opoudjis assigned kwkwan and unassigned opoudjis Jan 28, 2025
@opoudjis
Contributor

On my computer:

Lutaml took 1.5 hrs to run

Metanorma took 1 minute and compiled successfully.

I suspect that the recent XML refactoring has introduced critical inefficiencies that do not work well with big documents.

Wrong. This is not my problem. It's @kwkwan's.

@ReesePlews

hello @opoudjis thank you for investigating this and providing the details. i will follow up with @ronaldtse and @kwkwan.

@ronaldtse
Contributor Author

This may be a lutaml-model performance issue, since that's the only change in the "xmi" gem: it migrated from Shale to lutaml-model.

Ping @HassanAkbar

@HassanAkbar
Member

@ronaldtse I have investigated this performance issue thoroughly, and here are the findings:

  • It is confirmed that the issue was introduced when the xmi gem migrated from Shale to lutaml-model.
  • The issue is in the from_xml method of lutaml-model: it is slow because we first generate an intermediate doc hash and then deserialize that hash into the model instance.

Here are the benchmarking results from a sample spec of the xmi gem.

Benchmarking of the from_xml method of lutaml-model:
[benchmark screenshot]

As you can see, it takes ~3 seconds to deserialize the XML from xmi.

  • Shale, on the other hand, shows much better performance and deserializes the XML in less than 1 second.
    Here are the benchmark results for Shale:
[benchmark screenshot]

Conclusion:
We need to optimize lutaml-model to deserialize XML faster. We may have to remove the conversion to the doc hash and deserialize the adapter object directly into the instance to improve performance; I am currently looking into that.
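
For reference, a rough way to reproduce this kind of measurement locally is sketched below; the fixture path and entry-point class are placeholders rather than the actual spec being benchmarked:

require "benchmark"
require "xmi"

# Placeholder fixture and entry point -- substitute a real sample XMI file and
# the model class exercised by the xmi gem spec in question.
xml = File.read("spec/fixtures/sample.xmi")
model_class = Xmi::Sparx::SparxRoot # assumed entry point; adjust as needed

Benchmark.bm(10) do |bench|
  bench.report("from_xml") { model_class.from_xml(xml) }
end

# Run once against xmi v0.3.11 (Shale-based) and once against a later,
# lutaml-model-based release to obtain the comparison described above.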

@ronaldtse
Contributor Author

ronaldtse commented Jan 29, 2025

Thanks @HassanAkbar for pinpointing the problem. I agree with the next step of improving performance by eliminating the doc_hash. My next question is: why are we calling to_xml at all? In this usage no XML is generated through lutaml-model... what I mean is that there could be another source of inefficiency that is not to_xml.

@ronaldtse
Contributor Author

Technically, this document generation only uses "to_liquid" in the "xmi" gem. I wonder if there is a performance issue there as well. Ping @kwkwan
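
One way to check whether the time goes into from_xml or into to_liquid would be to time the two calls separately; a sketch with placeholder names follows (the actual input file and entry-point class would need to be substituted):

require "benchmark"
require "xmi"

# Placeholders -- substitute the real XMI input and the class the document
# generation actually deserializes.
xml   = File.read("spec/fixtures/sample.xmi")
model = nil

parse  = Benchmark.measure { model = Xmi::Sparx::SparxRoot.from_xml(xml) } # deserialization
liquid = Benchmark.measure { model.to_liquid }                             # Liquid conversion

puts format("from_xml:  %.2fs", parse.real)
puts format("to_liquid: %.2fs", liquid.real)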

@ReesePlews

hello @HassanAkbar thank you for looking into this.

i generated the document locally and it completed successfully, but it took 3 hrs and 40 min.

over the past two weeks the client and i were discussing and testing liquid template updates with @kwkwan for the sorting order of the autogenerated UML tables in the Plateau project (mn-samples-plateau). late last week i requested that the updated template PRs be merged to the main branch. i did not know there were other, non-liquid updates being made. previously the document regularly took about 1 hr 30 min to run. there were updates to lutaml in Nov 2024 that solved some speed issues at that time. after that i believed the only changes were mostly to the liquid templates, which are document specific in the mn-samples-plateau repo. these templates are used for the autogenerated tables in sources/001-v5.

do you think there is some type of looping/evaluation issue in the liquid templates that could be causing this slowdown? i believe the liquid templates look at the data coming from the xmi and consider it by row, or possibly by cell, in each table. with a couple hundred tables that could cause a slowdown if there was a problem with the templates.

however, the document compile does complete successfully. the client is now checking whether those sorting-order updates to the UML tables are still correct.

not sure if this is helpful information or not. thanks for checking into this.

@HassanAkbar
Member

My next question is: why are we calling to_xml? In this usage

@ronaldtse, actually the issue here is not with to_xml but with from_xml.
From what I figured out, the xmi gem reads the XML files and performs these steps during deserialization with from_xml:

  1. It reads XML to an adapter instance
  2. It makes a doc hash
  3. It makes an instance from doc hash

I am working to eliminate step 2 so that the instance is created directly from the adapter instance, improving performance and saving time.
Please let me know if I am missing something here, or if there is any other flow that we need to look at.
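
To make the proposed change concrete, here is a purely illustrative, self-contained toy of the two flows (using REXML and a Struct; this is not lutaml-model's actual code):

require "rexml/document"

# Toy illustration of the two deserialization flows (NOT lutaml-model internals).
Package = Struct.new(:name, :version)

XML = '<package version="1.0"><name>xmi</name></package>'

# Current flow: XML -> parsed document -> intermediate hash -> instance
def from_xml_via_doc_hash(xml)
  doc      = REXML::Document.new(xml)                        # step 1: read XML into a document
  doc_hash = { "name"    => doc.root.elements["name"].text,
               "version" => doc.root.attributes["version"] } # step 2: build the doc hash
  Package.new(doc_hash["name"], doc_hash["version"])         # step 3: instance from the hash
end

# Proposed flow: XML -> parsed document -> instance (intermediate hash removed)
def from_xml_direct(xml)
  doc = REXML::Document.new(xml)
  Package.new(doc.root.elements["name"].text, doc.root.attributes["version"])
end

p from_xml_via_doc_hash(XML) # => #<struct Package name="xmi", version="1.0">
p from_xml_direct(XML)       # => same result, one fewer intermediate structure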

@ReesePlews

hello @HassanAkbar is there anything to report on the status of this issue? do let us know when you have feedback to discuss. thank you for working on this.

@kwkwan

kwkwan commented Feb 3, 2025

Technically, this document generation only uses "to_liquid" in the "xmi" gem. I wonder if there is a performance issue there as well. Ping @kwkwan

Hi @ronaldtse, the main difference between v0.3.11 and later versions of the xmi gem is just the switch from Shale to lutaml-model. I think there is not much room to improve performance there.

I compared the time taken to run bundle exec rake. In v0.3.11 it took 10 s; in v0.3.13 it took 1 min (about 6× slower). I think @HassanAkbar's finding is correct: the performance of from_xml in lutaml-model should be improved.

In the meantime, we can pin the xmi gem to v0.3.11 in the mn-samples-plateau project.
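
For example, in the project's Gemfile, assuming the usual metanorma-cli setup and that the dependency constraints allow the pin:

# Gemfile (sketch): pin xmi to the last Shale-based release until the
# lutaml-model from_xml performance regression is resolved.
source "https://rubygems.org"

gem "metanorma-cli"
gem "xmi", "0.3.11"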

@kwkwan

kwkwan commented Feb 3, 2025

FYI, it takes 33 min to compile sources/001-v5/document.adoc on my local machine after pinning the xmi gem to v0.3.11.

@ReesePlews

wow, that is a great improvement! trying to test now based on the info from the other issue. thank you.
