Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache on the file system, XML Schema from http, ftp before loading it #159

Closed
angelozerr opened this issue Oct 10, 2018 · 21 comments
Closed
Assignees
Labels
performance This issue or enhancement is related to performance concerns
Milestone

Comments

@angelozerr
Copy link
Contributor

There are again a big problem of performance (the first time) with some XML Schema (completion and validation can take 1 minute to load the XML Schema).

An example is the load of schema https://www.w3.org/2009/01/xml.xsd (with namespace http://www.w3.org/XML/1998/namespace.html). The consumming time is the http loading stream (not the tree creation structure of XML Schema). It seems that w3c website is slow when Java client tries to get the XML Schema?

The WTP XML Editor doesn't have this performance problem. After studying the code, it uses a cache strategy :

<extension
		point="org.eclipse.wst.xml.core.catalogContributions">
		<catalogContribution id="default">
			<uri
				name="http://www.w3.org/2001/XMLSchema"
				uri="platform:/plugin/org.eclipse.xsd/cache/www.w3.org/2001/XMLSchema.xsd" />
			<system
				systemId="http://www.w3.org/2001/xml.xsd"
				uri="platform:/plugin/org.eclipse.xsd/cache/www.w3.org/2001/xml.xsd"/>				
		</catalogContribution>

See https://github.com/eclipse/webtools.common/tree/master/plugins/org.eclipse.wst.internet.cache/src/org/eclipse/wst/internet/cache/internal

Foe the moment, user can define an XML catalog which defines xml.xsd and XMLSchema.xsd, but it should be better to provide CacheURIResolverExtension feature that which could be enabled/disabled with settings.

@angelozerr
Copy link
Contributor Author

@NikolasKomonen I'm working on this issue. I have commited just a new settings useCache, that you could start to integrate in vscode-xml. This settings is to activate cache.

@NikolasKomonen
Copy link
Contributor

@angelozerr alright great, ill check it out.

angelozerr added a commit that referenced this issue Oct 12, 2018
@angelozerr
Copy link
Contributor Author

angelozerr commented Oct 12, 2018

@NikolasKomonen I have committed my work, but it must be cleaned and impoved again.

You can see teh following demo which takes a lot of time to download the FIRST time all XML Schema:

cacheloading

You can see in the hover (and completion) which XMl Schema is loading. I'm not sure it's a good idea?

@fbricon @NikolasKomonen any feedback are welcome. @fbricon my code is very not cleaned, I must do it, but I wanted push something in order we can discuss about this feature.

Once all XML Schema is loaded, it is tored in the .lsp4xml/cache folder. When you stop the XML Language Server and restart it, XML Schema loading is very fast (immediatly). The time is to download XMl Schema from http or ftp not to build it.

IMHO I think we should have a kind monitoring which does that. Never done and I fear it's an hard thing.

@fbricon
Copy link
Contributor

fbricon commented Oct 12, 2018

  • I haven't checked the code but, how do you ensure 2 running lsp4xml processes won't be writing on the same files concurrently?
  • It seems cache is disabled by default. It should be on by default.
  • cache directory should be configurable
  • I get absolutely nothing (no cached schema, no hover doc, no completion) when editing an xsd file like:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:complexType name="property">
        <xs:attribute name="name" type="xs:string" />
        <xs:attribute name="value" type="xs:string" />
    </xs:complexType>
    
    <xs:complexType name="resource">
        <xs:sequence>
            <xs:element name="property" type="property" minOccurs="0" maxOccurs="unbounded" />
        </xs:sequence>
        <xs:attribute name="name" type="xs:string" use="required" />
    </xs:complexType>

    <xs:element name="resources">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="resource" type="resource" minOccurs="0" maxOccurs="unbounded" />
            </xs:sequence>
            <xs:attribute name="variant" type="xs:string" use="required"/>
        </xs:complexType>
    </xs:element>
</xs:schema>

@fbricon
Copy link
Contributor

fbricon commented Oct 12, 2018

I think lsp4xml should embed some default schemas, to avoid that initial performance penalty for, say, xsd or xsl documents

@angelozerr
Copy link
Contributor Author

I haven't checked the code but, how do you ensure 2 running lsp4xml processes won't be writing on the same files concurrently?

It should work because before deplyoying XML Schema, DTD file in cache, a check is done if resources is loading (with resourcesLoading list) at https://github.com/angelozerr/lsp4xml/blob/master/org.eclipse.lsp4xml/src/main/java/org/eclipse/lsp4xml/uriresolver/CacheResourcesManager.java#L42

In this case, the exception CacheResourceLoadingException is thrown.

  • It seems cache is disabled by default. It should be on by default.

I agree with you, but as I'm not confident about this cache feature, I would like to disable it for the moment. I think we must wait for feedback users to enable it by default. My fear is about proxy user that I have not managed.

  • cache directory should be configurable

Yes sure, I will see that.

  • I get absolutely nothing (no cached schema, no hover doc, no completion) when editing an xsd file like:

Cache is done only with files coming from http or ftp. No need to redeploy an XML Schema or DTD file coming from the file system. Loading in the demo, means "downloading" XML Schema, DTD, perhaps we should replace "loading" by "downloading"?

I think lsp4xml should embed some default schemas, to avoid that initial performance penalty for, say, xsd or xsl documents

WTP does that with https://github.com/eclipse/webtools.sourceediting/tree/master/xml/bundles/org.eclipse.wst.standard.schemas/xsd But I'm surprised that it doesn't include xml.xsd which takes so many time to download it.

I think it's an another issue, please create it.

@angelozerr angelozerr added the performance This issue or enhancement is related to performance concerns label Oct 15, 2018
@fbricon
Copy link
Contributor

fbricon commented Oct 15, 2018

It should work because before deplyoying XML Schema, DTD file in cache, a check is done if resources is loading (with resourcesLoadinglist) at https://github.com/angelozerr/lsp4xml/blob/master/org.eclipse.lsp4xml/src/main/java/org/eclipse/lsp4xml/uriresolver/CacheResourcesManager.java#L42

That's if 2 threads from the same process try to load the same resource. My concern is about 2 different processes, i.e 2 JVMs trying to cache the same resource in a common directory. Think 2 instances of VS Code running.

  • It seems cache is disabled by default. It should be on by default.

I agree with you, but as I'm not confident about this cache feature, I would like to disable it for the moment. I think we must wait for feedback users to enable it by default. My fear is about proxy user that I have not managed.

Still think it's better to have it enabled by default. For one, we'll get faster feedback, second, this doesn't prevent proxy issues, since the resources would still need to be downloaded anyway. On the contrary, if proxy was an issue, having a cache version of the schemas would allow users to work more efficiently when proxy is on.

Cache is done only with files coming from httpor ftp. No need to redeploy an XML Schema or DTD file coming from the file system. Loading in the demo, means "downloading" XML Schema, DTD, perhaps we should replace "loading" by "downloading"?

I see no schema being cached for:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified"
           elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.w3.org/2001/XMLSchema
                               http://www.w3.org/2001/XMLSchema.xsd">
</xs:schema>

The Maven schema's not cached either

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
</project>

@angelozerr
Copy link
Contributor Author

That's if 2 threads from the same process try to load the same resource. My concern is about 2 different processes, i.e 2 JVMs trying to cache the same resource in a common directory. Think 2 instances of VS Code running.

Good catch! How to manage that?

Still think it's better to have it enabled by default. For one, we'll get faster feedback, second, this doesn't prevent proxy issues, since the resources would still need to be downloaded anyway. On the contrary, if proxy was an issue, having a cache version of the schemas would allow users to work more efficiently when proxy is on.

Good catch again :) My fear was I have had implemented that quickly. I prefer enable the cache when all issues (like 2 processes case) will be fixed.

I see no schema being cached for:

Even if cache is enabled? If you have activated cache, is it possible to debug it?

@fbricon
Copy link
Contributor

fbricon commented Oct 15, 2018

Damn, that was because I had to manually enable the server cache in my worspace preferences. We definitely need it on by default

@fbricon
Copy link
Contributor

fbricon commented Oct 15, 2018

That's if 2 threads from the same process try to load the same resource. My concern is about 2 different processes, i.e 2 JVMs trying to cache the same resource in a common directory. Think 2 instances of VS Code running.

Good catch! How to manage that?

In Maven, aether adds a file marker for other processes to know there's a download in progress.
You could also download in a tmp file and rename/move it to the its final destination once download is complete

@apupier
Copy link
Contributor

apupier commented Oct 16, 2018

In Maven, aether adds a file marker for other processes to know there's a download in progress.

I'm surprised that Aether is taking care of it. I thought that it was the reason the workspace got corrupted when several jobs were running in parallel. See https://github.com/eclipse/aether-core/pull/4

angelozerr added a commit that referenced this issue Oct 19, 2018
validation is done and XML Schema is downloading (see #159)
@angelozerr
Copy link
Contributor Author

@fbricon @apupier I have improved the cache feature:

  • I'm using file temp to download XML Schema/DTD
  • I mark the XML document element as warning to tell that the XML Schema takes too time to download it. The validation is redone once the XML Schema is reload it.

As it's a complex feature and I'm not sure that there are bugs, I have not enabled the cache by default. Please play with this feature with web.xml for instance which requires several XMl Schema which takes time to download. Remove the cache folder at hand, and retry it, etc.

I'm waiting for your feedback to enable the cache by default.

@fbricon fbricon added this to the v0.0.2 milestone Oct 28, 2018
@fbricon
Copy link
Contributor

fbricon commented Oct 28, 2018

I've tried opening 2 web.xml files simultaneously from the same vscode instance (with useCache turned on), each containing an error. I didn't see any corrupt files. The only weird thing is that only one of them got a diagnostic stating schema was being downloaded. That was once validated once downloads were completed. The other one was showing the download in progress message on hover, but no validation once everything was complete. Modifying the file triggered proper validation.

Still think it's valuable to get the cache turned on by default.

@fbricon
Copy link
Contributor

fbricon commented Oct 28, 2018

While the warning in hover brings a better UX, the warning as completion item is a bad idea. It makes it insertable and you end up with <Cannot process XML Schema completion: The resource 'http://www.w3.org/2001/xml.xsd' is downloading. in the file, if you press tab or enter. The completion query should just wait until ready to respond. The client (at least vscode) will display a loading message until then.

@angelozerr
Copy link
Contributor Author

the warning as completion item is a bad idea

Fixed with 8348cc6

angelozerr added a commit that referenced this issue Oct 29, 2018
@angelozerr
Copy link
Contributor Author

@fbricon can we close this issue?

@fbricon
Copy link
Contributor

fbricon commented Nov 2, 2018

The only thing left is to ensure the default useCache value is true. When the json settings are deserialized into java, no value is turned into false, which overrides the value set in the constructor.

@angelozerr
Copy link
Contributor Author

Change useCache to Boolean should work, no?

@angelozerr
Copy link
Contributor Author

The only thing left is to ensure the default useCache value is true. When the json settings are deserialized into java, no value is turned into false, which overrides the value set in the constructor.

77a8011 should fix that

@fbricon
Copy link
Contributor

fbricon commented Nov 2, 2018

Ok that works now. Thanks!

@fbricon fbricon closed this as completed Nov 2, 2018
@angelozerr
Copy link
Contributor Author

Ok that works now. Thanks!

Thank's @fbricon for your great feedback, I think we start having a robust cache feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance This issue or enhancement is related to performance concerns
Projects
None yet
Development

No branches or pull requests

4 participants