Experimental formatter is slower than current formatter with large XML file #1368

Closed
angelozerr opened this issue Nov 14, 2022 · 7 comments · Fixed by #1372

@angelozerr
Contributor

angelozerr commented Nov 14, 2022

There are interesting large TEI XML files at https://www.wwp.northeastern.edu/outreach/seminars/_current/handouts/tei_samples/index.html

If you download

And you try to format the XML with the experimental formatter, it takes some time, although with the current formatter it is very quick.

We need to investigate the performance issue with this sample.

  • with the experimental formatter, it takes around 1323ms.
  • with the current formatter, it takes around 58ms.

When grammar aware formatting is disabled, it takes around 148ms.

@angelozerr angelozerr added formatting This issue or enhancement is related to formatting support performance This issue or enhancement is related to performance concerns labels Nov 14, 2022
@angelozerr angelozerr changed the title Experimental formatter is slower than current formatter Experimental formatter is slower than current formatter with large XML file Nov 14, 2022
@fbricon
Contributor

fbricon commented Nov 14, 2022

It also throws some exceptions:

[Info  - 4:18:57 PM] Nov 14, 2022 04:18:57 org.eclipse.lemminx.XMLLanguageServer initialize()
Message: Initializing XML Language server
LemMinX Server info:
 - Version : 0.22.0
 - Java : /Users/fbricon/.sdkman/candidates/java/11.0.13-tem
 - VM Version : 11.0.13
 - Git 36acd60 - [maven-release-plugin] prepare release 0.22.0
[Error - 4:19:39 PM] Nov 14, 2022 04:19:39 org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument internalEntityDecl()
Message: Error while extracting information for the internal entity '%model.measureLike'
java.lang.ArrayIndexOutOfBoundsException: Index 2054 out of bounds for length 2048
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument$ScannedDTDEntityDecl.getEntityNameStartColumnNumber(CMDTDDocument.java:207)
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument$ScannedDTDEntityDecl.createNameParameter(CMDTDDocument.java:167)
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument$ScannedDTDEntityDecl.<init>(CMDTDDocument.java:130)
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument$ScannedDTDEntityDecl.<init>(CMDTDDocument.java:116)
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument.internalEntityDecl(CMDTDDocument.java:318)
	at org.apache.xerces.impl.XMLDTDScannerImpl.scanEntityDecl(Unknown Source)
	at org.apache.xerces.impl.XMLDTDScannerImpl.scanDecls(Unknown Source)
	at org.apache.xerces.impl.XMLDTDScannerImpl.scanDTDExternalSubset(Unknown Source)
	at org.apache.xerces.impl.dtd.XMLDTDLoader.loadGrammar(Unknown Source)
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument.loadGrammar(CMDTDDocument.java:391)
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDContentModelProvider.createCMDocument(CMDTDContentModelProvider.java:81)
	at org.eclipse.lemminx.extensions.contentmodel.model.ContentModelManager.findCMDocument(ContentModelManager.java:333)
	at org.eclipse.lemminx.extensions.contentmodel.model.ContentModelManager.findCMDocument(ContentModelManager.java:153)
	at org.eclipse.lemminx.extensions.contentmodel.model.ContentModelManager.findCMDocument(ContentModelManager.java:103)
	at org.eclipse.lemminx.extensions.contentmodel.model.ContentModelManager.findCMDocument(ContentModelManager.java:86)
	at org.eclipse.lemminx.extensions.contentmodel.participants.ContentModelHoverParticipant.onTag(ContentModelHoverParticipant.java:51)
	at org.eclipse.lemminx.services.XMLHover.getTagHover(XMLHover.java:119)
	at org.eclipse.lemminx.services.XMLHover.doHover(XMLHover.java:75)
	at org.eclipse.lemminx.services.XMLLanguageService.doHover(XMLLanguageService.java:168)
	at org.eclipse.lemminx.XMLTextDocumentService.lambda$hover$5(XMLTextDocumentService.java:261)
	at org.eclipse.lemminx.commons.ModelTextDocuments.lambda$computeModelAsync$0(ModelTextDocuments.java:118)
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
	at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

[Error - 4:19:39 PM] Nov 14, 2022 04:19:39 org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument internalEntityDecl()
Message: Error while extracting information for the internal entity '%model.pPart.data_sequenceOptional'
java.lang.ArrayIndexOutOfBoundsException: Index 2053 out of bounds for length 2048
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument$ScannedDTDEntityDecl.getEntityNameStartColumnNumber(CMDTDDocument.java:207)
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument$ScannedDTDEntityDecl.createNameParameter(CMDTDDocument.java:167)
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument$ScannedDTDEntityDecl.<init>(CMDTDDocument.java:130)
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument$ScannedDTDEntityDecl.<init>(CMDTDDocument.java:116)
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument.internalEntityDecl(CMDTDDocument.java:318)
	at org.apache.xerces.impl.XMLDTDScannerImpl.scanEntityDecl(Unknown Source)
	at org.apache.xerces.impl.XMLDTDScannerImpl.scanDecls(Unknown Source)
	at org.apache.xerces.impl.XMLDTDScannerImpl.scanDTDExternalSubset(Unknown Source)
	at org.apache.xerces.impl.dtd.XMLDTDLoader.loadGrammar(Unknown Source)
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDDocument.loadGrammar(CMDTDDocument.java:391)
	at org.eclipse.lemminx.extensions.dtd.contentmodel.CMDTDContentModelProvider.createCMDocument(CMDTDContentModelProvider.java:81)
	at org.eclipse.lemminx.extensions.contentmodel.model.ContentModelManager.findCMDocument(ContentModelManager.java:333)
	at org.eclipse.lemminx.extensions.contentmodel.model.ContentModelManager.findCMDocument(ContentModelManager.java:153)
	at org.eclipse.lemminx.extensions.contentmodel.model.ContentModelManager.findCMDocument(ContentModelManager.java:103)
	at org.eclipse.lemminx.extensions.contentmodel.model.ContentModelManager.findCMDocument(ContentModelManager.java:86)
	at org.eclipse.lemminx.extensions.contentmodel.participants.ContentModelHoverParticipant.onTag(ContentModelHoverParticipant.java:51)
	at org.eclipse.lemminx.services.XMLHover.getTagHover(XMLHover.java:119)
	at org.eclipse.lemminx.services.XMLHover.doHover(XMLHover.java:75)
	at org.eclipse.lemminx.services.XMLLanguageService.doHover(XMLLanguageService.java:168)
	at org.eclipse.lemminx.XMLTextDocumentService.lambda$hover$5(XMLTextDocumentService.java:261)
	at org.eclipse.lemminx.commons.ModelTextDocuments.lambda$computeModelAsync$0(ModelTextDocuments.java:118)
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
	at java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:479)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

[Warn  - 4:19:43 PM] Nov 14, 2022 04:19:43 org.eclipse.lsp4j.jsonrpc.RemoteEndpoint handleCancellation()
Message: Unmatched cancel notification for request id 18

@fbricon
Contributor

fbricon commented Nov 14, 2022

On my Mac, with the experimental formatter, I see:

Received response 'textDocument/formatting - (140)' in 600ms.

without grammar aware formatting:

Received response 'textDocument/formatting - (161)' in 230ms.

With the legacy formatter:

Received response 'textDocument/formatting - (150)' in 76ms.

Not as bad as on your machine, but still way worse than the old formatter

@angelozerr
Contributor Author

The main problem comes from https://github.com/eclipse/lemminx/blob/09b3f498f2aa13f6a6608f8a60dfbd1bf797608c/org.eclipse.lemminx/src/main/java/org/eclipse/lemminx/extensions/contentmodel/participants/ContentModelFormatterParticipant.java#L55, which is executed for each element. We need to cache this result, for instance in a Map initialized by the experimental formatter.
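To illustrate the caching idea, here is a minimal sketch of a per-format-run cache. The class name `FormatRunCache`, the `loader` function, and the `"tei.dtd"` key are hypothetical and are not part of the LemMinX API; the loader merely stands in for whatever expensive lookup currently runs once per element.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache created once per formatting run; not LemMinX's actual API. */
public class FormatRunCache<K, V> {

    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader; // stands in for the expensive per-element lookup
    private int loads = 0;

    public FormatRunCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for the key, invoking the loader only on the first request. */
    public V get(K key) {
        // containsKey is used so that a null result ("nothing found") is cached too.
        if (cache.containsKey(key)) {
            return cache.get(key);
        }
        V value = loader.apply(key);
        loads++;
        cache.put(key, value);
        return value;
    }

    /** Number of times the loader was actually invoked. */
    public int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        FormatRunCache<String, String> grammars =
                new FormatRunCache<>(uri -> "grammar for " + uri);
        // Simulate formatting 10,000 elements that all share the same grammar.
        for (int i = 0; i < 10_000; i++) {
            grammars.get("tei.dtd");
        }
        System.out.println(grammars.loadCount()); // prints 1
    }
}
```

The sketch uses an explicit `containsKey` check rather than `Map.computeIfAbsent`, because `computeIfAbsent` records no mapping when the function returns null, so a "no grammar found" result would be looked up again for every element. The sketch is not thread-safe; it assumes one cache instance per formatting request.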

@angelozerr
Contributor Author

Thanks @fbricon for sharing this information! We need to fix both the error in your stack trace and the performance.

@datho7561
Contributor

I'm interested in how much Jessica's PR helped with your performance issues, Angelo, since while it was a noticeable improvement for me, it wasn't a 5x improvement.

@angelozerr
Contributor Author

> I'm interested in how much Jessica's PR helped with your performance issues, Angelo, since while it was a noticeable improvement for me, it wasn't a 5x improvement.

Indeed, we need to investigate how we could further improve the experimental formatter's performance.

@datho7561
Contributor

Okay. Since the main issue with the grammar aware formatting has been resolved, let's track any further investigation and improvements in #1379.
