RFC(plugin-sitemap): What would you like to see fixed or changed? #22757
Replies: 30 comments
-
Suggested new API,examples given would be defaults. API ChangesLAST UPDATED: 2020/04/16 // gatsby-config.js
module.exports = {
siteMetadata: {
siteUrl: `https://www.example.com`,
},
plugins: [
{
resolve: `gatsby-plugin-sitemap`,
options: {
output: `/sitemap.xml`, //UNCHANGED
// UNCHANGED
// Exclude specific pages or groups of pages using glob parameters
// See: https://github.com/isaacs/minimatch
// anny added categoies would be appended to these
exclude: [
`/dev-404-page`,
`/404`,
`/404.html`,
`/offline-plugin-app-shell-fallback`,
],
createLinkInHead: true, //UNCHANGED
sitemapSize: Infinity, //UNCHANGED
//UNCHANGED
query: `
{
site {
siteMetadata {
siteUrl
}
}
allSitePage {
nodes {
path
}
}
}`
,
/**
* @name resolveSiteUrl
*
* UNCHANGED
* @param {Object} data - results of the GraphQL query
* @returns {string}
*/
resolveSiteUrl: (data) => data.site.siteMetadata.siteUrl,
/**
* @name resolvePagePath
*
* NEW - This allows passing any data, it just has to end up in an array being
* returned with "path" as the object property the uri.
*
* if you don't want to place the URI in "path" then resolvePagePath
* are needed.
*
* @param {Object|string} page - Array Item returned from reolvePages
* @returns {string}
*/
resolvePagePath: (page) => page.path,
/**
* @name resolvePages
*
* NEW - This allows custom resolution of the array of pages.
* This also where user's could merge multiple sources into
* a single array if needed.
*
* @param {Object} data - results of the GraphQL query
* @returns {Array}
*/
resolvePages: (data) => data.allSitePage.nodes,
/**
* @name filterPages
*
* NEW - This allows filtering any data in any way.
*
* This Function is executed via allPages.map((page) => !excludes.some((excludedRoute) => thisFunc(page, ecludedRoute, tools)))
* allPages is the results of the resolvePages
*
* @param {Object} page
* @param {string} excludedRoute - Array from plugin config `options.exclude`
* @param {Object} tools - contains required tools for filtering
*
* @returns {Array}
*/
filterPages: (page, excludedRoute, { minimatch, withoutTrailingSlash, resolvePagePath }) =>
minimatch(
withoutTrailingSlash(resolvePagePath(page),
withoutTrailingSlash(excludedRoute)
)
),
/**
* @name serialize
*
* MODIFIED - the idea of passing specific siteUrl, and allPages values
* instead of the rawQuery data is that devs would not need to rewrite
* seralize, if the data structures changed. They would only need to change the
* resolution functions and not how the data is being seralized.
*
* Hopefully this allow slower ejection from the defaults.
*
* This funciton would be exectuted after the array is filtered with the default or custom filter
*
* This funciton is executed by allPages.map(page => thisFunc(page, siteUrl, tools))
* allpages is the result of the filter process
*
* @param {Object} page - results of the resolvePages function
* @param {string} siteUrl - results of the resolveSiteUrl function
* @param {Object} tools - contains tools for serializing
*
*/
serialize: (page, siteUrl, { resolvePagePath }) => {
return {
url: `${siteUrl}${resolvePagePath(page)}`,
changefreq: `daily`,
priority: 0.7,
}
},
}
}
]
} Other Improvements
|
Beta Was this translation helpful? Give feedback.
-
@pieh not sure if you have any input but you had requested a well thought out design process. FYI: I'm starting to wonder if this will REQUIRE a major version bump. Changing the default |
Beta Was this translation helpful? Give feedback.
-
TBF, I don't have a lot of context and insight into
Comment like this just shows value in making research and design ;) We might not need major version bump in the end (tho this was never a blocker, just frequent breaking changes is just bad for DX and we should do everything we can to minimize those) |
Beta Was this translation helpful? Give feedback.
-
Writing this down before I forget...no guarantee of logical thought. the custom resolvers are important but if someone is querying additional data for the serializer, and using that in a custom serializer...there is a good chance they don't want to bother with separately parsing the pages...they can probably just do it once in the custom resolvers? but If not in the resolvers...the filter still has to work. So maybe they have to do it twice or somehow merge the data from the additional query... Well if they don't write a custom resolver...but to write a custom filter, then they don't need the custom The flexibility of this design allows for several solutions...so maybe we simply document clearly how and when each config gets executed...maybe a nice little flow chart. And devs can do what they think is best. This probably is not a code issue but just why docs need to be very clear. |
Beta Was this translation helpful? Give feedback.
-
If it would be possible to maintain current signature and just expand it (so users would only need to change their |
Beta Was this translation helpful? Give feedback.
-
I do like the look of those new APIs, but I have some questions: In what scenarios I think you started writing it out in jsdoc descriptions for /**
* NEW - This allows custom resolution of
*
* @param {Object} data - results of the GraphQL query
* @returns {Array}
*/
resolvePages: (data) => data.allSitePage.nodes, |
Beta Was this translation helpful? Give feedback.
-
Additional thought: What if instead of (this might cause issues with backward compat tho - not sure - might depends on how much default options would change?) Why? Users wouldn't have to call |
Beta Was this translation helpful? Give feedback.
-
Originally, the issue that was into and hackily fixed recently was some people querying
I'm not sure if Iunderstand this question...but the reason I added these was so if a user wrote a custom query that broke defaults, they could override the resolvers without rewriting the entire serializer. Without the custom resolvers if you write a breaking query you would have to replace the filter function and the serialize function even if your output from those and their behavior is the same as the defaults. having the custom resolvers means you only have to customize the filter/serializer if you need to change their behavior, and not just the data structure.
Mostly because that doesn't fix the filter issue. Filter still has to be able to resolve the page path so you'd still need the custom resolver. What you suggest though makes me wonder if the TODO:
|
Beta Was this translation helpful? Give feedback.
-
I just realized it may be unnecessary to make the entire filter function be replicated. The custom filter could maybe be simply and only the function inside of |
Beta Was this translation helpful? Give feedback.
-
I would also want you to consider one use case. When you're migrating into gatsby from another service you might be in situation where some of pages are server by gatsby while some are server from wordpress/anything else. Currently you can only query for site and allSitePage. This is what I believe |
Beta Was this translation helpful? Give feedback.
-
@masives Thanks for the comment, I hadn't thought of this kind of situation. While Since we're ok with a major version bump I think maybe just changing the serialize: (page, siteUrl, {resolvePagePath}) => {
return {
url: `${siteUrl}${resolvePagePath(page)}`,
changefreq: `daily`,
priority: 0.7,
}
}, The This would be inline with my comment above
Making the filterPages: filterPages: (page, excludes, { minimatch, withoutTrailingSlash, resolvePagePath }) =>
!excludes.some((excludedRoute) =>
minimatch(
withoutTrailingSlash(resolvePagePath(page)),
withoutTrailingSlash(excludedRoute)
)
)
), |
Beta Was this translation helpful? Give feedback.
-
I've updated the proposal to include the above discussed changes. I like that it draws clearer lines between the different custom config options and each has a specific purpose as opposed to having lots of overlap with each other. |
Beta Was this translation helpful? Give feedback.
-
I just realized, if Gatsby prefixes paths devs may need to prefix paths from non gatsby sources. If a dev is pulling paths from outside Gatsby and also doing path prefixes they may need to access that in serialize: (page, { resolvePagePath, prependUrlRoot }) => {
return {
url: prefixUrlRoot(resolvePagePath(page)),
changefreq: `daily`,
priority: 0.7,
}
},
|
Beta Was this translation helpful? Give feedback.
This comment has been hidden.
This comment has been hidden.
-
Just came here from #25019 which led to me take a look at the From reading this RFC, it's not quite clear to me why we'd need |
Beta Was this translation helpful? Give feedback.
-
@janosh It's an escape hatch. If where you're fetching data from always stores the path on the Notice that |
Beta Was this translation helpful? Give feedback.
-
I see, thanks for taking the time to explain! |
Beta Was this translation helpful? Give feedback.
-
I'm hoping to start work on this soon. Any final comments or input? Then only thing I'm unsure on is how to handle |
Beta Was this translation helpful? Give feedback.
-
Looking forward to the update! 👍 |
Beta Was this translation helpful? Give feedback.
-
I've created a canary version based on @moonmeister PR. Please try it out by running one of the two commands:
|
Beta Was this translation helpful? Give feedback.
-
@moonmeister @wardpeet Thanks for your work! Just tried out
|
Beta Was this translation helpful? Give feedback.
-
Hi @janosh! Sorry to hear you're running into an issue. To help us best begin debugging the underlying cause, it is incredibly helpful if you're able to create a minimal reproduction. This is a simplified example of the issue that makes it clear and obvious what the issue is and how we can begin to debug it. If you're up for it, we'd very much appreciate if you could provide a minimal reproduction and we'll be able to take another look. Thanks for using Gatsby! 💜 |
Beta Was this translation helpful? Give feedback.
-
@moonmeister Thanks for the quick reply! It's difficult to create a minimal repro as I'm unsure how to use the sitemap plugin. The serialize: ({ path, modifiedGmt }) => {
return {
url: path,
lastmod: modifiedGmt,
}
}, whereas the list of options says it's |
Beta Was this translation helpful? Give feedback.
-
Hey good catch. So "the default is said to be" is not the defaults, it's an example and I'm using de-structuring to extract those fields from the page object. and not using the |
Beta Was this translation helpful? Give feedback.
-
@moonmeister Thanks for the clarification. I read through the
Also, small typo "definately" -> definitely. Regarding the minimal repro, if I create a new barebones project gatsby new bug-repro https://github.com/gatsbyjs/gatsby-starter-hello-world and just module.exports = {
siteMetadata: {
siteUrl: `https://example.com`,
},
plugins: [`gatsby-plugin-sitemap`],
} it builds as expected. On my blog, however, the same setup throws the above mentioned error,
even after deleting and reinstalling rm -rf node_modules && rm yarn.lock && yarn Not sure how to debug that. |
Beta Was this translation helpful? Give feedback.
-
So we don't use |
Beta Was this translation helpful? Give feedback.
-
Sorry, I wasn't able to get the current version of the sitemap plugin to work with my blog either as explained in #25019. But if I remove the |
Beta Was this translation helpful? Give feedback.
-
@janosh Okay, I'll try and look into it but this is OSS work for me and I'm pretty busy atm...so not sure when I'll get to it. |
Beta Was this translation helpful? Give feedback.
-
No problem. I didn't mean to shove this off on to you. Will look into it more as well when I get the chance. |
Beta Was this translation helpful? Give feedback.
-
To his has been merged and published under |
Beta Was this translation helpful? Give feedback.
-
Summary
I recently did some work on
gatsby-plugin-sitemap
to resolve some edge case issues(#20692) caused by assumptions made in its code. This work fixed some issues and revealed some more. I made a note in the PR(#21948) that it might be time for a rewrite with some API changes to get all this smoothed out and to avoid some serious "jank" in the code.Proposal
Basically, I have time since the world is on pause and might just get this done.
To start this process, I wanted to see if anyone out there had any suggestions on features to add or bugs to resolved in a new version(I'm expecting their to be breaking changes currently, but we'll see what happens)?
I'll do some thinking and design work for the API in the coming days and add what I'm thinking and incorporate any feedback I get. Thanks for any input!
Beta Was this translation helpful? Give feedback.
All reactions