Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sitemap): add ignorePatterns option #6979

Merged
merged 7 commits into from
Apr 6, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,22 @@ describe('createSitemap', () => {
expect(sitemap).not.toContain('404');
});

it('exclusion of ignored paths', async () => {
const sitemap = await createSitemap(
{
url: 'https://example.com',
} as DocusaurusConfig,
['/', '/search', '/tags'],
{
changefreq: EnumChangefreq.DAILY,
priority: 0.7,
ignorePatterns: [/^\/search/, /^\/tags/],
},
);
expect(sitemap).not.toContain('/search');
expect(sitemap).not.toContain('/tags');
});

it('keep trailing slash unchanged', async () => {
const sitemap = await createSitemap(
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ describe('validateOptions', () => {
const userOptions = {
changefreq: 'yearly',
priority: 0.9,
ignorePatterns: [/^\/search/],
};
expect(testValidate(userOptions)).toEqual({
...defaultOptions,
Expand All @@ -49,4 +50,17 @@ describe('validateOptions', () => {
`"\\"changefreq\\" must be one of [daily, monthly, always, hourly, weekly, yearly, never]"`,
);
});

it('rejects bad ignorePatterns inputs', () => {
expect(() =>
testValidate({ignorePatterns: /^search/}),
).toThrowErrorMatchingInlineSnapshot(
`"\\"ignorePatterns\\" must be an array"`,
);
expect(() =>
testValidate({ignorePatterns: ['search']}),
).toThrowErrorMatchingInlineSnapshot(
`"\\"ignorePatterns[0]\\" must be of type object"`,
Josh-Cena marked this conversation as resolved.
Show resolved Hide resolved
);
});
});
9 changes: 7 additions & 2 deletions packages/docusaurus-plugin-sitemap/src/createSitemap.ts
Original file line number Diff line number Diff line change
Expand Up @@ -19,12 +19,17 @@ export default async function createSitemap(
if (!hostname) {
throw new Error('URL in docusaurus.config.js cannot be empty/undefined.');
}
const {changefreq, priority} = options;
const {changefreq, priority, ignorePatterns} = options;

const sitemapStream = new SitemapStream({hostname});

routesPaths
.filter((route) => !route.endsWith('404.html'))
.filter(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we apply trailingSlash before the filtering? I don't know what is best in this case 🤷‍♂️

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

404s should not be applied trailing slash anyways, I think this is fine. The odds that someone gets tricked by this seems low to me, and they can always check their glob.

(route) =>
!route.endsWith('404.html') &&
(!ignorePatterns ||
Josh-Cena marked this conversation as resolved.
Show resolved Hide resolved
!ignorePatterns.some((pattern) => pattern.test(route))),
)
.forEach((routePath) =>
sitemapStream.write({
url: applyTrailingSlash(routePath, {
Expand Down
4 changes: 4 additions & 0 deletions packages/docusaurus-plugin-sitemap/src/options.ts
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ import type {OptionValidationContext} from '@docusaurus/types';
export const DEFAULT_OPTIONS: Options = {
changefreq: EnumChangefreq.WEEKLY,
priority: 0.5,
ignorePatterns: [],
};

const PluginOptionSchema = Joi.object({
Expand All @@ -24,6 +25,9 @@ const PluginOptionSchema = Joi.object({
.valid(...Object.values(EnumChangefreq))
.default(DEFAULT_OPTIONS.changefreq),
priority: Joi.number().min(0).max(1).default(DEFAULT_OPTIONS.priority),
ignorePatterns: Joi.array()
.items(Joi.object().instance(RegExp))
Copy link
Collaborator

@Josh-Cena Josh-Cena Mar 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we allow strings as well, which will be glob patterns like /tags/**? We have a utils.createMatcher for that.

Copy link
Collaborator

@slorber slorber Mar 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this runs in Node side so why not expose a callback as well (and thus rename to simply "ignore"?)

a callback is always the most flexible option

now I'm not against providing support for RegExp/globs too but all this can be implemented in userland with a callback and this is a quite niche feature 🤷‍♂️

Copy link
Collaborator

@Josh-Cena Josh-Cena Mar 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like callback-based APIs unless we can't implement a good solution otherwise—Webpack loaders don't use callbacks everywhere only because they can. I do prefer serializable APIs if they provide the right level of abstraction. While ignorePatterns: ["/tags/**"] is probably as useful as ignore: (path) => path.startsWith("/tags/"), the former is more well-understood by everyone and more approachable for those not well-versed with JS.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷‍♂️ I don't know, it's also more ambiguous because some edge cases might depend on the underlying glob library etc... A callback always behaves explicitly, other apis are just shortcuts.

Not saying that we shouldn't provide shortcuts, but this use-case seems niche enough that a lower-level but more flexible API might be enough

Copy link
Contributor Author

@ApsarasX ApsarasX Mar 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that glob patterns are better than regular expressions, and I've used utils.createMatcher instead of regular matching.

In addition, I don't quite agree to use the callback based API, because it will make docusaurus.config.js looks disorganized. And not all docusaurus users are professional programmers.

.default(DEFAULT_OPTIONS.ignorePatterns),
trailingSlash: Joi.forbidden().messages({
'any.unknown':
'Please use the new Docusaurus global trailingSlash config instead, and the sitemaps plugin will use it.',
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,4 +13,5 @@ export type Options = {
changefreq?: EnumChangefreq;
/** @see https://www.sitemaps.org/protocol.html#xmlTagDefinitions */
priority?: number;
ignorePatterns?: RegExp[];
};