Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node.IsCurrentNode is case sensitive #249

Closed
rblaettler opened this issue Nov 25, 2013 · 14 comments
Closed

Node.IsCurrentNode is case sensitive #249

rblaettler opened this issue Nov 25, 2013 · 14 comments
Assignees

Comments

@rblaettler
Copy link

There is already a discussion about this topic in Stackoverflow:
http://stackoverflow.com/questions/20150440/mvcsitemapprovider-iscurrentnode-case-sensitive

But the basic issue is, that URLS might not always be in the same case, specially if a user enters them directly or he comes in via an URL from another domain. So if IsCurrentNode is case sensitive, the mapping will often not work.

Maybe this could be enabled/disabled with a property? Personally, I don't see a reason for this to be case sensitive at all.

@ghost ghost assigned NightOwl888 Nov 26, 2013
@NightOwl888
Copy link
Collaborator

My initial thought was that because the URLs are stored in a dictionary, and because that dictionary is compared in a case sensitive way before the MVC routes are checked that it was failing for that reason. I didn't consider that if you get a miss on the URL table, it will still fall back on the MVC routes. When I experimented with this in the MvcMusicStore demo, I was unable to reproduce your problem.

Therefore, there must be something misconfigured in your application and that is the reason you are not getting a hit when doing the case-insensitive MVC route match. Please post your node configuration and routes - perhaps the problem is there.

Lowercasing URLs

Just so we are clear, I suggested this as a workaround for your problem, but it is also a best practice to follow in general.

Search engines MUST index the URLs in a case-sensitive manner otherwise they would be making unacceptable assumptions about the way web applications work. If any character in the URL is different, it is considered a different URL. This can have serious implications if your application serves the same content on various cases of a URL because the search engines won't know which one is the canonical URL.

Therefore, your MVC application must enforce case sensitivity of URLs and the most logical thing to do is to make them lowercase. In MVC, this means there are 2 details to take care of.

  1. Fix the routing infrastructure so the same case is always used.
  2. Do a 301 redirect when the wrong case is used in an external URL.

The first part I already mentioned. There is an option in the routing framework RouteCollection.LowercaseUrls that can be used in .NET 4.5.

If not using .NET 4.5, there is an open source project LowerccaseRoutesMVC.

To ensure your incoming URLs are redirected to the right place, you can either use URL rewriting or build a custom route to do it. Here is a custom route that I am using in one of my projects.

public class LowercaseRedirectRoute
    : RouteBase
{
    public override RouteData GetRouteData(HttpContextBase httpContext)
    {
        var request = httpContext.Request;
        if (request.Path.ContainsUpper() || request.Url.Host.ContainsUpper())
        {
            var builder = new UriBuilder(request.Url)
            {
                Path = request.Path.ToLowerInvariant(),
                Host = request.Url.Host.ToLowerInvariant()
            };
            var destinationUrl = builder.Uri.ToString();
            var routeData = new RouteData(this, new MvcRouteHandler());
            var response = httpContext.Response;

            // Do a 301 redirect
            response.CacheControl = "no-cache";
            response.StatusCode = (int)HttpStatusCode.MovedPermanently;
            response.RedirectLocation = destinationUrl;

            // Optional: Specify a controller action in case the end user's browser doesn't respect the 301 redirect
            routeData.Values["controller"] = "System";
            routeData.Values["action"] = "Status301";
            routeData.Values["url"] = destinationUrl;

            return routeData;
        }
        return null;
    }

    public override VirtualPathData GetVirtualPath(RequestContext requestContext, RouteValueDictionary values)
    {
        return null;
    }
}

I created a helper method for strings that tests whether there is an upper case character:

    public static class StringExtensions
    {
        public static bool ContainsUpper(this string text)
        {
            foreach (char c in text)
            {
                if (char.IsUpper(c)) return true;
            }
            return false;
        }
}

And here is the system controller action.

//
// GET: /System/Status301/?url=(some url)

public ActionResult Status301(string url)
{
    Response.CacheControl = "no-cache";
    Response.StatusCode = (int)HttpStatusCode.MovedPermanently;
    Response.RedirectLocation = url;
    ViewBag.DestinationUrl = url;

    return View();
}

Finally, here is the view that I use to display to the user if their browser doesn't respond to HTTP 301. I have found that not all browsers do, but was unable to get any information as to which ones and/or what configuration would cause this. The fallback is to send both a JavaScript and Meta Refresh redirect to the browser to force it to do a 302 redirect. If all else fails, there is a hyperlink to the page.

@{
    ViewBag.Title = "Page Moved";
}
@section MetaRefresh {
    <meta http-equiv="refresh" content="5;@ViewBag.DestinationUrl" />
}

<h2 class="error">@ViewBag.Title</h2>

The page has moved. Please update your bookmarks. Click <a href="@ViewBag.DestinationUrl">@ViewBag.DestinationUrl</a> if your browser doesn't automatically redirect you in 5 seconds.

<script>
    //<!--
    setTimeout(function () {
        window.location = "@ViewBag.DestinationUrl";
    }, 5000);
    //-->
</script>

Last but not least, configure the LowerCaseRedirectRoute in your route collection. You will most likely want to place this before all of your other routes.

public class RouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        routes.LowercaseUrls = true;

        routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

        routes.Add("LowercaseRedirect", new LowercaseRedirectRoute());

        routes.MapRoute(
            name: "Default",
            url: "{controller}/{action}/{id}",
            defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional }
        );
    }
}

@rblaettler
Copy link
Author

You were right. routes.LowercaseUrls = true; did help. The problem was with the also case sensitive URL before.

I totally agree with your point that all URLs should be lowercase. Unfortunately we can't enforce that currently in our application for various reasons. We are working on it, but that will take a bit of time.

Nevertheless, I still think the IsCurrentNode should at least have the option to not be case sensitive at all.

@NightOwl888
Copy link
Collaborator

Nevertheless, I still think the IsCurrentNode should at least have the option to not be case sensitive at all.

Actually, as you have discovered the URLs are compared in a case-insensitive way when using MVC routing. It is only when you set the URL property explicitly that the URL comparison is case sensitive.

<mvcSiteMapNode title="The Node" url="/SomeController/SomeAction"> <!-- case-sensitive -->
<mvcSiteMapNode title="The Node" controller="SomeController" action="SomeAction"> <!-- case-insensitive -->

If you are not setting the URL property (and instead using controller, action, and area, etc), then I suspect that there is something else wrong with your configuration as I was unable to reproduce the behavior in this case. The fact that this is the only complaint about case sensitivity we have had out of over 12,000 downloads supports this theory.

As you can see item 6 in this list, there are plans to split the routing into a separate dictionary with a key based on the route instead of the resulting URL, which will sidestep the problem almost entirely. The way it is currently done is based on Microsoft's original design which is also uses case-sensitive URLs. I am not sure that making the WHOLE URL case insensitive is the right way to go about this. Some applications may need the querystring values to be case-sensitive to function, while the rest of the URL could be case-insensitive.

Doing 301 redirects to the proper case seems like a much more performant and scalable way to fix this because the majority of the traffic will never take the hit. Perhaps it would make more sense to add an (optional) custom route based on the SiteMap's URL dictionary to the package that automatically does the redirects to the correct case. This will also fix the inherent SEO canonical URL problem with using a case sensitive URL scheme.

@kojiishi
Copy link

I hit this too; in my case, my site relies on FindSiteMapNode(string rawUrl), and it fails if the mvcSiteMapNode is url-based because it uses urlTable, which is case sensitive.

While all responses from NightOwl888 make sense to me, I also agree with rblaettler that having an option would be great. As long as we're on .NET world, it's very likely that files are on NTFS, which is case insensitive.

It looks like urlTable and keyTable are the only case where IGenericDictionaryFactory.Create is called with TKey = string, so simple fix such as:

        public virtual IDictionary<TKey, TValue> Create<TKey, TValue>()
        {
            if (typeof(TKey) == typeof(string))
                return (IDictionary<TKey, TValue>)new Dictionary<string, TValue>(StringComparer.OrdinalIgnoreCase);
            return new Dictionary<TKey, TValue>();
        }

can do with little harm. Probably one of the following are better: 1) adding an argument, 2) a separate method for URL, or 3) separate IGenericDictionaryFactory in SiteMapChildStateFactory. My preference is 2 or 3.

URL comparison is always a little special than other cases since it can also involve Unicode normalizations. Would you think it's a bad idea to add this option?

@kojiishi
Copy link

After some more thoughts, adding IUrlPath.CreateDictionary() might be better place.

@NightOwl888
Copy link
Collaborator

I had previously made the change you suggested in a prototype and it worked great. However, there is one major drawback to this - that is the entire URL is not necessarily case insensitive. To see what I mean, lets break down the parts of the URL.

  1. Scheme (http or ftp) - case insensitive.
  2. Scheme delimiter (://) - doesn't matter.
  3. UserName/Password (http://username:[email protected]/) - case sensitive.
  4. Host (www.mysite.com) - case insensitive.
  5. Port (:80) - doesn't matter (must be colon + numeric).
  6. Path (/some-path/some-resource) - case insensitive.
  7. Query (?name=MyName&search=Value) - can be case sensitive.

So, clearly we can't just make the whole URL case insensitive (especially for things like passwords and querystring parameters) and it is clear Microsoft avoided this issue in the original design by making the entire URL case sensitive.

So in order to realistically make the URL case insensitive (which actually only means you want the path segment of the URL to be case insensitive), the URL would have to be broken down into its segments and then each segment compared individually. Putting this string parsing directly in the path of every single request is going to be a significant drain on performance, especially for large SiteMaps. This is why I previously suggested the most logical workaround - to use 301 redirects on any incoming URLs with invalid case so only the offending URLs will see any impact on performance.

In the future, we could fix this by taking several steps to minimize the impact (some of these are already planned for v5):

  1. Create a separate dictionary to compare the MVC routes and remove those node entries from the urlTable.
  2. Break the URLs into their various segments during the construction of the SiteMap so this doesn't have to be done on every single request.
  3. Store the URL segments into a data structure that can quickly evaluate case sensitive and case insensitive comparisons of the various segments.
  4. Use the data structure in item 3 as the key of the urlTable.

There is also another problem that we need to overcome in v5 - the current design stores all of the nodes in RAM which limits its scalability. So, I am considering modeling the new version after a search engine with a high performance index that can be stored on disk, which will also need to take case-sensitive field matching into consideration. But as you can see, this is nowhere near as cut and dried as your proposed solution.

@kojiishi
Copy link

Thank you for the response and the great write up. Yeah, I understand that; I once wrote URI comparison by myself and it was really complex. But in this case, all we care is urlTable, which -- if I understand correctly -- only contains path and query. Do I misunderstand url attribute can contain only path and query?

@NightOwl888
Copy link
Collaborator

The URL attribute/property can contain any relative or absolute URL, including FTP URLs.

It is possible you can limit this in your own application to relative path and query (or just path), in which case you could make this change and inject your own SiteMap object.

BTW - the OP found that he had a configuration problem that was causing the issue. The case insensitive match works on nodes that are configured with controller and action, the only place it doesn't work is when you set the URL property explicitly. The latest version now checks for invalid characters in the controller and action properties to help avoid a configuration problem, but it could still happen if you misspell a controller name or action name.

@kojiishi
Copy link

Yeah, I understand that. I'm porting classic ASP.NET to MVC, and the old site had its own XML-based sitemap, so it was easier for me to use URL than controller and action for now. I might make gradual transition though.

I'm not sure if I understand what you meant by "make this change and inject your own," I guess you won't welcome such pull request but recommend to use DI to replace ISiteMapChildStateFactory or IGenericDictionaryFactory?

@kojiishi
Copy link

I understand you're not very positive to support this feature, but since the patch is quite small that I went ahead to create, some testing, and sent a pull request. It's off by-default, and one can turn it on in web.config (or through DI though that part isn't tested.)

My site has a lot of static content that are handled by single action, so even after the full upgrade to MVC, I will need URL in the sitemap unless I misunderstand something -- well, I'm still quite new to MVC so I might though.

I was back and forth between using IGenericDictionryFactory and adding a method to IUrlPath. The latter seemed more natural fit, but since you mentioned that scaling and memory pressure is one of the issues you want to work on, I guess IGenericDictionryFactory gives more flexibility.

Hope you're ok with this patch.

@NightOwl888
Copy link
Collaborator

Thanks for the contribution, however I maintain that this approach isn't the right way to fix this problem.

The fact of the matter is, if you are migrating from ASP.NET then this isn't a problem because the case sensitive URL match works exactly the same in ASP.NET as it does in MvcSiteMapProvider.

Furthermore, if you look at this from a purely SEO standpoint, this is the wrong approach. Search engines always use case-sensitive URLs and each unique URL variant is considered a completely different page. Making the application respond to multiple cases of the URL without a 301 redirect to the canonical one virtually guarantees lower search engine rankings.

For example, a search engine would consider the following 2 URLs to be different pages and give each one its own ranking score:

http://www.mysite.com/some-controller/some-action
http://www.mysite.com/some-controller/some-actioN

My question is, why would you want to make the search engine consider these 2 URLs to be separate pages if they point to the exact same resource on the server? The only thing this accomplishes is to 'split the vote' of any incoming links between the 2 URLs and make them both rank lower overall than they could. A better solution would be to make the URL http://www.mysite.com/some-controller/some-actioN do a 301 redirect to http://www.mysite.com/some-controller/some-action so the search engines only score the last URL and would transfer any ranking points from the first one to the last one.

Even if you are not concerned with search engines, this approach still fixes the perceived problem by redirecting the user to the correct URL casing, which effectively side-steps the original 'problem' by guaranteeing a matching record will exist in the urlTable.

Therefore, if you would like to make a contribution, I suggest you instead add an IncorrectCaseRedirectRoute similar to what I posted here that does a 301 redirect when the case doesn't match the URL that is configured in the SiteMap. This would certainly make more sense and would also ensure better search engine rankings by guaranteeing a canonical URL for each page. It will also ensure the letter casing that is sent to the search engine in the /sitemap.xml endpoint is the same as what the application responds to as the canonical URL.

I suggest making a read-only property on the ISiteMap interface that returns a string array of URLs, read from the key of the urlTable that can be used to do a quick comparison from the IncorrectCaseRedirectRoute. To minimize the impact, the IncorrectCaseRedirectRoute.GetRouteData() method should first do a case-sensitive match and return null immediately if the case matches. Then, during a second pass it should do a case insensitive match on the URL and if a hit is found it should initiate the 301 redirect to the URL with the correct casing. That way the performance impact only hits the users that are coming in under the wrong case of the URL.

Keep in mind that we can have multiple SiteMap objects in the same application, so the IncorrectCaseRedirectRoute should have a way to specify the siteMapCacheKey to pull up a specific SiteMap, and if not specified, pull up the default SiteMap.

Also keep in mind that we want to do this in a DI-friendly way so dependencies can be swapped, if needed. I suggest injecting ISiteMapLoader instead of accessing the SiteMap object through the static SiteMaps object. Have a look at the XmlSiteMapResult and XmlSiteMapResultFactory for an example of how this can be done.

@kojiishi
Copy link

Thank you for the suggestion.

A few answers to your questions to make sure my intention is clear. First, ASP.NET, both MVC and classic, ignores cases for path part, so I don't understand what you mean by MvcSiteMapProvider works exactly the same way as ASP.NET. I'm not a native English speaker, maybe I misunderstand what you said?

Second. In order for our writers to add content without adding controllers/actions for each page, our controller serves different views for different URLs using one shared controller/action, so I'll need URLs in the sitemap even after the migration is completed.

I agree with you on the search engine part, but I'm looking at the problem from different point of view. Since ASP.NET, both classic and MVC, serves the page for incorrectly-cased URLs, only navigations are broken. Not only it doesn't show current node correctly, but also whole navigation is gone because my site determines which part of navigation to be rendered from the current node. I hope these arguments make sense to you and understand the needs to handle incorrect casing somehow.

Well, for "how" part, I understand your suggestion, and agree that redirecting to the correctly-cased URL is better than just fixing the navigations. It's just a bit more work. I'll give some time to think about it.

And thank you for the link to the contribution guide, I relied Google to find it but without luck. That was helpful.

@NightOwl888
Copy link
Collaborator

First, ASP.NET, both MVC and classic, ignores cases for path part, so I don't understand what you mean by MvcSiteMapProvider works exactly the same way as ASP.NET. I'm not a native English speaker, maybe I misunderstand what you said?

I stand corrected. I took another look at the Microsoft implementation of the urlTable and here is what it looks like:

internal IDictionary UrlTable
{
    get
    {
        if (this._urlTable == null)
        {
            lock (base._lock)
            {
                if (this._urlTable == null)
                {
                    this._urlTable = new Hashtable(StringComparer.OrdinalIgnoreCase);
                }
            }
        }
        return this._urlTable;
    }
}

So indeed, the URL matching is case insensitive in the original implementation and I was wrong about what I said before. All of the other dictionaries in the StaticSiteMapProvider (Microsoft's implementation) are case sensitive.

Second. In order for our writers to add content without adding controllers/actions for each page, our controller serves different views for different URLs using one shared controller/action, so I'll need URLs in the sitemap even after the migration is completed.

This is fine. v3 didn't support using URLs, but I have ensured that in v4 using URLs instead of controllers and actions now works. This is the same thing I am doing in my application.

I agree with you on the search engine part, but I'm looking at the problem from different point of view. Since ASP.NET, both classic and MVC, serves the page for incorrectly-cased URLs, only navigations are broken. Not only it doesn't show current node correctly, but also whole navigation is gone because my site determines which part of navigation to be rendered from the current node. I hope these arguments make sense to you and understand the needs to handle incorrect casing somehow.

While I understand where you are coming from, it is unfortunate that Microsoft decided to implement it like this because it encourages development of web sites that respond on non-canonical URLs, which is bad for SEO. This is also broken for anyone who requires case sensitive matching of querystring parameters. Furthermore, this makes the bigger problem (lack of 301 redirects) more difficult to spot.

However, being that this is the way Microsoft made it originally, I agree that we should do it the same way (this is a bug) since the urlTable is a carryover from their design anyway.

The approach I recommend is to make each "table" in the SiteMap into its own interface (inheriting from IDictionary<key, value>), to make each "table" have its own specialized factory (each with its own interface), and to make only the SiteMapUrlTable case insensitive (using OrdinalIgnoreCase the same way Microsoft did). Note that this is the same approach used in SiteMapNode to provide its child dictionaries and collections. This was the direction I was originally going in to fix this problem when I realized it wasn't a complete fix - unfortunately, I seem to have deleted the code without committing it to Git.

This also means the GenericDictionaryFactory can be eliminated and the SiteMapChildStateFactory will need to be modified with the additional overloads to create the different types of dictionaries for the "tables" in the SiteMap object.

The approach you made to add the UrlIgnoreCase setting was not quite correct, though. All settings need to be injected via DI. So, we need to create both a case insensitive and case sensitive ISiteMapUrlTableFactory, and inject one or the other in the SiteMapLoaderContainer depending on the value of the configuration setting. The external DI container modules will need a modification as well, but let me put this together because I need to break all DI changes into their own commit so they can be more easily copied and pasted by end users when upgrading.

SEO Problem

That said, we really should also provide a RouteBase override that can be used to 301 redirect when the casing doesn't match because that will be required for anyone that requires SEO support (basically, any web site that is internet facing). I am thinking by default that it should not use a view, but give the option to add a view that is external to MvcSiteMapProvider by passing area, controller, and action settings into the constructor. That way the view and controller can exist somewhere else in the application.

The internal DI container should probably just contain an option to enable the route and it should be inserted as the first route in the RouteCollection automatically. For external DI, the DI container should be passed into the RouteConfig.RegisterRoutes() method so the IIncorrectCaseRedirectRoute can be resolved with all of its dependencies.

While you don't necessarily have to provide this for your pull request to be accepted, I really wouldn't consider this issue closed until we have provided a way to do the 301 redirects. Since the source of the URLs is the MvcSiteMapProvider, it seems that including this piece should fall on our shoulders. Its usage should be optional, to make everybody happy.

@kojiishi
Copy link

Looks great, thank you for taking time on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants