Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NC keys: When copying page nested content does not update properly #5913

Closed
deMD opened this issue Jul 16, 2019 · 55 comments
Closed

NC keys: When copying page nested content does not update properly #5913

deMD opened this issue Jul 16, 2019 · 55 comments
Labels

Comments

@deMD
Copy link
Contributor

deMD commented Jul 16, 2019

This issue description has been updated with a propsed solution and has been marked as up for grabs. Please read below about the original issue and the proposed solution. You can also scroll way down to read the entire discussion and history.

Proposed solution

Taking some inspiration from the media tracking branch and looking at what we already have in core:

  • Nested content (and any other property editor) can handle the IDataValueEditor.FromEditor to re-populate any missing keys. Nested content already overrides this method so it would be just adding the logic to check for empty keys. Empty keys may be sent up if the client side clears them when copying them with the clipboard service. Nested content calls FromEditor recursively so if there's an NC inside an NC this will still work.
  • Create a new interface: IDataValueCopying which can be implemented by any IDataValueEditor (like like IDataValueReference for media tracking). It could have one method like: object Copy(object input);
  • We create an external component to handle ContentService.Copying, the event handler will use the current IContent along with the PropertyEditorCollection to figure out if any of the IDataValueEditors being used for any of the properties implement IDataValueCopying and if so, will invoke the Copy method and set the new property value to the result.
  • the NestedContentPropertyValueEditor then implements this interface and it will have to deal with having nested nested contents but that shouldn't be too difficult.

Original submitted issue

When we copy pages with Nested Content and edit a published element the copied page still shows values from the original content. For example an image block where we changed the image still gives us the original image when checking the strongly typed model. The source value of the property does update properly, as demonstrated in the image below:

image

Reproduction

  • Create a page with nested content
  • copy the page
  • change a property of a block in the nested content
  • check the return values.

This bug appears in Umbraco version 8.0.2


This item has been added to our backlog AB#6684

@kjac
Copy link
Contributor

kjac commented Jul 16, 2019

@deMD I'm not sure I follow. Would you expect the elements on the copy to update automatically when you update elements on the original?

@deMD
Copy link
Contributor Author

deMD commented Jul 16, 2019

No.

Here is a screenshot of the current environment:
image

As you can see you see one image a lot of times, however, that is the image of the original NestedContent image block. In all the other pages they changed the image, but as you can see the original image is used.

This seems to only happen when resolving the strongly typed model, when checking the source value of two different blocks you clearly see them referencing different images. Which you can also see in the backoffice:

original:
image

copy:
image

@deMD deMD closed this as completed Jul 16, 2019
@deMD deMD reopened this Jul 16, 2019
@deMD
Copy link
Contributor Author

deMD commented Jul 16, 2019

Oops did not mean to close the issue, sorry!

@deMD
Copy link
Contributor Author

deMD commented Jul 16, 2019

Update on this issue, just to make it even stranger. This problem seems to only occur when fetching multiple items:

var docs = _publishedContentQuery.Content(ids);

When getting just a single item var docs = _publishedContentQuery.Content(id); I do get the correct image.

And when debugging, if I first resolve the third item in my watch window, I get that image in all copies + original

I made a video where you can see what happens: https://youtu.be/_duHGDfH8o0

@stevemegson
Copy link
Contributor

The problem is that the copied items have the same key. When you access the Image property, the value is cached with a key based on the element's key and the property alias, so each of the pages ends up looking at the same cache key. Whichever page you access first will populate the cache, then you'll get that cached value back for all other pages.

The cache is per-request, so it's fine if you only fetch a single item per request.

@deMD
Copy link
Contributor Author

deMD commented Jul 16, 2019

Yeah seems like that is the cause. Pretty nasty though that that happens for elements in nested content.

@ronaldbarendse
Copy link
Contributor

This probably needs a fix in the caching layer, as making the keys unique (on copy) would require support for every property editor that uses IPublishedElements.

As the example shows and @stevemegson explains, this bug will only surface when nested elements are fetched (and converted to an object/instance) from copied content within the same request.

@kjac
Copy link
Contributor

kjac commented Jul 19, 2019

I can reproduce this as well. I'll have a look into it.

@kjac
Copy link
Contributor

kjac commented Jul 19, 2019

PR in #5961

@deMD
Copy link
Contributor Author

deMD commented Oct 1, 2019

Is the PR for this fix still in view to be reviewed?

@Baasker
Copy link

Baasker commented Oct 1, 2019

We also like to get any update on this issue, since we also still struggling with it.

@stevemegson
Copy link
Contributor

I don't think we ever reached a conclusion on how this should be fixed. I believe the suggestions were:

  1. Ignore the key stored in the JSON and create the nested elements with a new random key every time. This gives the correct caching behaviour for properties on the nested elements, but the Key property of the nested elements becomes fairly meaningless since it changes any time the parent element is re-cached.

  2. Have PublishedElementPropertyBase.GetCacheLevels never return Snapshot. This gives less caching than we ideally want. If the same nested element is accessed multiple times in the same request, its property values won't be cached.

  3. Modify the implementation of PublishedElementPropertyBase.ValuesCacheKey, so we get different cache keys without having to mess with the Key property. This seems like the "correct" solution, but the problem is that a PublishedElement has no idea which other element(s) it is nested in, so it can't really create a better cache key without some significant changes.

Thinking about it again now, ConvertToElement does have access to the parent element (or at least it could easily be given access). Rather than generating a new random Guid as the key for each nested element, it could xor owner.Key and key to produce a new Guid. Like option 1, we'd have keys which don't match those in the JSON, but they wouldn't change unpredictably.

@ronaldbarendse
Copy link
Contributor

ronaldbarendse commented Oct 1, 2019

IMHO option 3 is the only real fix. Combining 2 GUIDs into 1 (by XORing the bytes) might still result in duplicate IDs, make debugging harder and introduce other bugs/unexpected outcomes (see #5961 (comment)).

Passing/setting a cache prefix would probably be the easiest change, although that would probably be a breaking one and should preferably be done before 8.2.0 is released!

@stevemegson
Copy link
Contributor

Why would XORing two GUIDs make duplicate IDs more likely?

The point about sites relying on the key to identify a nested element is a fair one, but in that case they may already be broken by duplicate keys. If we have a choice between always exposing the key that's in the JSON or exposing some other GUID which avoids collisions, I'm not sure which is better.

I agree that option 3 is the best fix in theory, but I'm not sure what it looks like in practice.

@JoseMarcenaro
Copy link
Contributor

More info on this same issue:
When a Nested Content property is part of a Composition, then every instance of the same composed type gets the same cached value. No matter whether you duplicated existing items or used the Create... action to create them individually.

That makes the bug more frequent than the "duplicate" case.
Any suggestions on how to work around this?

@ronaldbarendse
Copy link
Contributor

@JoseMarcenaro If the NC property is part of a composition that is added to a Document Type, the data is stored on the content node of that Document Type (not the composition and therefore not shared between other nodes), so I'm not sure how that would result in duplicate element keys...

When using Content Templates however, you can add default NC elements that have the same key when new nodes are created using the template. This is basically the same as copying an existing node, but another way to have duplicate keys nonetheless.

@JoseMarcenaro
Copy link
Contributor

@ronaldbarendse - thanks for your thoughts.
Because I'm experiencing exactly the same symptoms in a Composition / Nested Content situation, maybe I rushed into assuming it is the same problem. I will try to document the steps to reproduce the problem, and open a separate issue.

@JoseMarcenaro
Copy link
Contributor

@ronaldbarendse - after some tests, I conclude you are totally right.
The reason I was having the same issue is that the items had been duplicated too - no relation with the Nested Content or Composition structure.
So disregard my previous comments on this ticket.
Thanks!

@stevemegson
Copy link
Contributor

While we decide on a permanent solution, this is a possible workaround.

It replaces the standard property value converters for nested content with versions which disable caching. That will obviously have a performance impact if you're making heavy use of nested content, but I think it'll be acceptable as a temporary solution in most cases.

@Shazwazza
Copy link
Contributor

Hi all, there's a lot of information both here and in the original issue #5961. For now, I'll keep the conversation here.

IMO the GUIDs for each element should be unique since that is what we would expect it to be. They should be both unique and not change per element, for example, if one day we can persist elements to the DB, they will have their own unique GUID that won't change. Even though we are currently storing this data in JSON, it should be the same principal. I'm not a fan of having new random keys or non-consistent keys per request. Each of the stored keys should just be unique and consistent.

To solve this issue a custom event handler for ContentService.Copying will be needed to handle this type of situation which will re-generate new GUIDs per element.

I do understand there's a complexity here. Currently nested content is the only editor that has unique keys per element that we'd need to take into account but the challenge is that nested content can be nested within other complex editors. Originally i was thinking of just being pragmatic and creating a ContentService.Copying handler within the nested content property editor and just re-generating the keys for each element. This would be quite easy to do but doesn't take into account situations like if there is nested content inside the grid or some other nested editor.

It might be possible to detect nested content json within a json structure, and check if there's a GUID stored for any key and re-generate that as a more generic solution. I think this is a bit ugly but could work for the interim. Else we would just need handlers for the grid and nested content to deal with detecting nested content items and handle the key regeneration. I don't think this would be too difficult and would solve the issue for most cases. I understand it doesn't solve the problem if NC is nested in 3rd party complex editors, but until we streamline how elements are stored and how complex editors are supposed to work (i.e. that is coming with the standardization of the block editor) we can't magically just make this work 100% of the time but i think we can easily solve it for the majority. For the edge cases we can have some re-usable methods for devs to use to handle ContentService.Copying in case they support having NC nested inside their own complex editors.

What do you think?

@kjac
Copy link
Contributor

kjac commented Dec 23, 2019

@Shazwazza sounds like the most pragmatic approach for the time being. It might have a slight performance hit for bulk copy operations though. Also it needs to be able to handle Nested Content within Nested Content, as this is a quite common usecase.

Also this has to be handled client side as well, since we can copy elements between content using the clipboard service. I believe Niels L already did some work towards achieving this.

On the server side, a different approach could be to introduce a "you're being copied" event system on property level and let Nested Content and the like do their own thing. This would probably scale better for 3rd party editors. But it would probably also end up being a breaking change.

@clausjensen clausjensen added the state/sprint-candidate We're trying to get this in a sprint at HQ in the next few weeks label May 14, 2020
@azure-devops-sync azure-devops-sync bot changed the title When copying page nested content does not update properly NC keys: When copying page nested content does not update properly May 14, 2020
@mjbarlow
Copy link
Contributor

I agree with @enkelmedia would be great for Core to prioritise this fix in-house especially since @Shazwazza has a great handle on what needs to be done.

I've noticed the same issue when trying to copy nested content on one of my projects. Happy to offer my own Angular JS skills to help out. btw, you guys at HQ are doing a great job!

@madsoulswe
Copy link
Contributor

madsoulswe commented May 17, 2020

What has been done so far? It's along thread.

  • Does the Contentservice handle the issue.
  • Does the ClipboardService handle the issue.

If they have been fixed, I will make a patch for all of our clients to fix all duplicates.

@creativesuspects
Copy link

creativesuspects commented Jun 2, 2020

@madsoulswe I think the ClipboardService has been fixed in #7166. And I believe #8177 also resolves duplicate key issues when copying nodes that contain nested content. Both PR's were included in the 8.6.2 release.

However, this still leaves us with duplicate keys that were saved before these fixes were applied. If you come up with an idea to reset the nested content keys for all existing content, please let me know. I'm not concerned about Deploy/uSync/etc. scenarios. I wonder if it's even possible to reset keys when these scenarios are involved.

@creativesuspects
Copy link

I’ve found a way to replace all existing keys in the umbracoPropertyData table for all nested content data types and only for current content versions, using the logic that was included in #8177. But these changes are only reflected after each node is republished. Where does Umbraco fetch its property data from when loading a page? Is it the cmsContentNu table? Is there an easy way to rebuild the cache after manually updating values in the umbracoPropertyData table without having to republish the affected nodes.

@nul800sebastiaan nul800sebastiaan removed community/up-for-grabs state/sprint-candidate We're trying to get this in a sprint at HQ in the next few weeks labels Jun 3, 2020
@rbottema
Copy link
Contributor

rbottema commented Jun 3, 2020

@creativesuspects Can you share the code you used to update the keys? You can rebuild the cmsContentNu table from Umbraco: Go to Settings -> Published Status and click to Rebuild Database Cache. You probably also need to Rebuild Memory Cache afterwards for the values to be reflected in the site cache.

@nul800sebastiaan
Copy link
Member

@creativesuspects It's getting it from (predictably) UmbracoPropertyData 😉

I don't think there's a great way to fix the keys at the moment, other than going through each row in UmbracoPropertyData, find the data for the NC datatypes, deserialize it, give everything a unique key and save it back into this table. Then you need to regenerate NuCache from that, which should probably work like @rbottema said, but I'm entirely sure it's a full rebuild, but I believe it should be.

@creativesuspects
Copy link

creativesuspects commented Jun 3, 2020

@nul800sebastiaan That's exactly what I did:

Fetch all records from umbracoPropertyData that contain nested content data types for current content version. And then loop through each record and replace the keys:

using (var scope = _scopeProvider.CreateScope(autoComplete: true))
{
    var selectSql = scope.SqlContext.Sql("SELECT id, textValue FROM umbracoPropertyData WHERE versionId IN (SELECT [id] FROM [umbraco8].[dbo].[umbracoContentVersion] WHERE [current] = 1) AND propertyTypeId IN (SELECT [id] FROM [umbraco8].[dbo].[cmsPropertyType] WHERE dataTypeId in (SELECT [nodeId] FROM [umbraco8].[dbo].[umbracoDataType] WHERE propertyEditorAlias = 'Umbraco.NestedContent'))");
    var records = scope.Database.Fetch<dynamic>(selectSql);

    foreach (var record in records)
    {
        var r = (PocoExpando)record;
        var id = Convert.ToInt32(r.Values.First());
        var text = r.Values.Last().ToString();

        if (text.HasValue())
        {
            var updatedText = CreateNestedContentKeys(text, false);
            scope.Database.Execute("UPDATE umbracoPropertyData SET textValue = @0 WHERE id = @1", new object[] { updatedText, id });
        }
    }
}

The logic for CreateNestedContentKeys(string rawJson, bool onlyMissingKeys, Func<Guid> createGuid = null) is copied from the #8177 PR. The 2nd parameter is set to false to make sure all keys are replaced, not just the empty ones. And this works like a charm. The nested content keys in umbracoPropertyData are being replaced.

However, the cmsContentNu table still contains the old property collection for each affected page. And regenerating the NuCache doesn't seem to do anything. I've also tried clearing out /App_Data/TEMP/ followed by an application pool recycle and what not. For some reason the new property data only becomes visible when republishing the affected page.

Am I missing something?

Edit: It was late last night / this morning. So I'm going to retrace my steps later today to double check that rebuilding database and memory cache actually doesn't work in this case.

@nul800sebastiaan
Copy link
Member

After rebuilding NuCache (from the dashboard in the developer section), you still need to read the new cache into memory, I could only make that happen with an app pool recycle.

Your mistake is that you're using [current] = 1, which means the version that is being edited right now and is not published yet.
I think you can just leave off that part of the query and update all of the keys, as far as I understand it doesn't matter exactly what the key is as long as it's unique, right?

@nul800sebastiaan
Copy link
Member

This is the SQL that gives you all of the results for all versions:

SELECT id, textValue FROM umbracoPropertyData WHERE propertyTypeId IN (
	SELECT id FROM cmsPropertyType WHERE dataTypeId IN (
		SELECT nodeId FROM umbracoDataType WHERE propertyEditorAlias = 'Umbraco.NestedContent'
	)
)

@creativesuspects
Copy link

creativesuspects commented Jun 3, 2020

I think you can just leave off that part of the query and update all of the keys, as far as I understand it doesn't matter exactly what the key is as long as it's unique, right?

Yep, as long as each key is unique it's fine. I will give that a try and let you know what happens.

@creativesuspects
Copy link

creativesuspects commented Jun 3, 2020

@nul800sebastiaan @rbottema Okay, so that seems to work. Also, I just realized I had previously been reloading Memory Cache before reloading the Database Cache, so that might have been an issue as well. Like I said, it was very late.

I have only tested it on a very small site with only a few content nodes, but I'm going to do some more testing on a larger site with multiple language variants to make sure it works.

So to sum it all up, here's what I did:

  1. I've confirmed that there were duplicate nested content keys in the database. These nested content values are showing the same output when loaded onto a page even though the content is different. But obviously that's because Umbraco caches the content based on the keys which are duplicates.
  2. I've created a simple API controller that replaces all the nested content keys in the umbracoPropertyData table using the logic from PR Adds in NC Property Component to deal with keys that were duplicated #8177, see code below. So you'd only have to execute this once: /umbraco/backoffice/api/ResetNestedContentKeys/GetItDone
  3. Settings -> Published Status -> Rebuild Database Cache
  4. Settings -> Published Status -> Rebuild Memory Cache
  5. Reload affected page and you should see unique content for each nested content item
using System;
using System.Linq;
using System.Web.Mvc;
using Newtonsoft.Json.Linq;
using NPoco;
using Umbraco.Core;
using Umbraco.Core.Persistence;
using Umbraco.Core.Scoping;
using Umbraco.Web.WebApi;

namespace Application.Core.Controllers
{
    public class ResetNestedContentKeysController : UmbracoAuthorizedApiController
    {
        private readonly IScopeProvider _scopeProvider;

        public ResetNestedContentKeysController(IScopeProvider scopeProvider)
        {
            _scopeProvider = scopeProvider;
        }

        [HttpGet]
        public HttpStatusCodeResult GetItDone()
        {
            using (var scope = _scopeProvider.CreateScope(autoComplete: true))
            {
                var selectSql = scope.SqlContext.Sql(@"
                    SELECT id, textValue FROM umbracoPropertyData WHERE propertyTypeId IN (
                        SELECT id FROM cmsPropertyType WHERE dataTypeId IN (
                            SELECT nodeId FROM umbracoDataType WHERE propertyEditorAlias = 'Umbraco.NestedContent'
                        )
                    )
                ");
                var records = scope.Database.Fetch<dynamic>(selectSql);

                foreach (var record in records)
                {
                    var r = (PocoExpando)record;
                    var id = Convert.ToInt32(r.Values.First());
                    var text = r.Values.Last().ToString();

                    if (!string.IsNullOrEmpty(text))
                    {
                        var updatedText = CreateNestedContentKeys(text, false);
                        scope.Database.Execute("UPDATE umbracoPropertyData SET textValue = @0 WHERE id = @1", new object[] { updatedText, id });
                    }
                }
            }

            return new HttpStatusCodeResult(200);
        }

        private string CreateNestedContentKeys(string rawJson, bool onlyMissingKeys, Func<Guid> createGuid = null)
        {
            // used so we can test nicely
            if (createGuid == null)
                createGuid = () => Guid.NewGuid();

            if (string.IsNullOrWhiteSpace(rawJson) || !rawJson.DetectIsJson())
                return rawJson;

            // Parse JSON
            var complexEditorValue = JToken.Parse(rawJson);

            UpdateNestedContentKeysRecursively(complexEditorValue, onlyMissingKeys, createGuid);

            return complexEditorValue.ToString();
        }

        private void UpdateNestedContentKeysRecursively(JToken json, bool onlyMissingKeys, Func<Guid> createGuid)
        {
            // check if this is NC
            var isNestedContent = json.SelectTokens($"$..['ncContentTypeAlias']", false).Any();

            // select all values (flatten)
            var allProperties = json.SelectTokens("$..*").OfType<JValue>().Select(x => x.Parent as JProperty).WhereNotNull().ToList();
            foreach (var prop in allProperties)
            {
                if (prop.Name == "ncContentTypeAlias")
                {
                    // get it's sibling 'key' property
                    var ncKeyVal = prop.Parent["key"] as JValue;
                    // TODO: This bool seems odd, if the key is null, shouldn't we fill it in regardless of onlyMissingKeys?
                    if ((onlyMissingKeys && ncKeyVal == null) || (!onlyMissingKeys && ncKeyVal != null))
                    {
                        // create or replace
                        prop.Parent["key"] = createGuid().ToString();
                    }
                }
                else if (!isNestedContent || prop.Name != "key")
                {
                    // this is an arbitrary property that could contain a nested complex editor
                    var propVal = prop.Value?.ToString();
                    // check if this might contain a nested NC
                    if (!propVal.IsNullOrWhiteSpace() && propVal.DetectIsJson() && propVal.InvariantContains("ncContentTypeAlias"))
                    {
                        // recurse
                        var parsed = JToken.Parse(propVal);
                        UpdateNestedContentKeysRecursively(parsed, onlyMissingKeys, createGuid);
                        // set the value to the updated one
                        prop.Value = parsed.ToString();
                    }
                }
            }
        }
    }
}

@nul800sebastiaan
Copy link
Member

I just realized I had previously been reloading Memory Cache before reloading the Database Cache, so that might have been an issue as well

Duh me too, the right way around works indeed 👍

@creativesuspects
Copy link

@nul800sebastiaan So are you sure we couldn't include the WHERE [current] = 1 from my original query for performance? On the other hand, it's just a one-time fix, so performance might not be a huge deal.

@nul800sebastiaan
Copy link
Member

It's not going to perform well in any case on large sites, but with a lot of version it will be pretty slow. I am not sure how to know which version is actually published, but current is the one that is currently being edited. Whenever you publish a version, a new version gets created and marked as current.

Aha, hang on you can use this one instead:

SELECT id, textValue FROM umbracoPropertyData WHERE versionId IN (
 SELECT id FROM umbracoDocumentVersion WHERE published = 1
) 
AND propertyTypeId IN (
 SELECT id FROM cmsPropertyType WHERE dataTypeId IN (
  SELECT nodeId FROM umbracoDataType WHERE propertyEditorAlias = 'Umbraco.NestedContent'
 )
)

umbracoDocumentVersion knows which one is currently published so you wont need to update all the versions.

Ah but you'll still need to update [current] = 1 as well because that's the one the backoffice will display! So when you don't update that one and they hit the publish button it might not have unique keys.

@nul800sebastiaan
Copy link
Member

This seems to do the trick for both:

SELECT id, textValue FROM umbracoPropertyData WHERE (versionId IN (
 SELECT id FROM umbracoDocumentVersion WHERE published = 1
) OR versionId IN (
  SELECT id FROM umbracoContentVersion WHERE [current] = 1
 ))
AND propertyTypeId IN (
 SELECT id FROM cmsPropertyType WHERE dataTypeId IN (
  SELECT nodeId FROM umbracoDataType WHERE propertyEditorAlias = 'Umbraco.NestedContent'
 )
)

@creativesuspects
Copy link

On second thought it's probably best to just replace ALL records in case someone decides to do a rollback on a page, because that would also bring back those duplicate keys.

@nul800sebastiaan
Copy link
Member

That's true!

@creativesuspects
Copy link

Again, I haven't tested this fix on a large(r) site. But do you have any idea if it could be applied to a site with hundreds or thousands of content nodes, or would that just take forever?

@nul800sebastiaan
Copy link
Member

You can let me know! ;-)

@creativesuspects
Copy link

creativesuspects commented Jun 3, 2020

Just tested it on a relatively new site which heavily depends on nested content with a few hundred content nodes and two language variants: 3398 records updated in 21.430 seconds.

It's still not a huge site, but the fix executed fairly quickly. Of course when you have a lot more content nodes and a lot more versions per content node running the fix may eventually become a problem.

Note 1: I've been testing on a remote SQL server instance, so there's probably some network latency involved as well.

Note 2: I've just tested it on a couple of production sites (SQL server instance and IIS are on the same server):

3392 records updated in 3.444 seconds
2033 records updated in 1.449 seconds
2286 records updated in 3.506 seconds

@kimschurmann
Copy link

So this issue has been open for almost a year and still no solution? :(

If the copy-paste function does not work properly, then why not disable it? Editors will be very confused with this experience?

Please do not tamper with the Guids for the nested block elements - we really need them on our project as a reference!

@nul800sebastiaan
Copy link
Member

@kimschurmann As of v8.6.2 this works now, with one caveat, something went wrong with the release and there's a manual fix for that available at the moment: #8223 - that will be fixed in 8.6.3.

I will close this issue now as a "duplicate" of #7133 which was marked as fixed and released in 8.6.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.