Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the crawler-preference spec. #560

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 49 additions & 0 deletions extensions/crawler-preference.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
---
title: Crawler Preference
layout: spec
work-in-progress: true
copyrights:
-
name: "Sadie Powell"
email: "[email protected]"
period: "2024"
---


## Notes for implementing work-in-progress version

This is a work-in-progress specification.

## Introduction

An IRC crawler is a bot that connects to IRC networks to create a publicly available list of servers and channels. This specification allows IRC networks to express a preference to about whether they can be indexed by a well-behaving IRC crawler.

## The `CRAWLER` ISUPPORT token

Servers can express a crawler preference by adertising the the `CRAWLER`[ISUPPORT][] token. Servers MUST advertise this token with a value. If a crawler encounters the token without a value then it MUST ignore the token.

The value of the `CRAWLER` token is a comma (`,`) (0x2C) separated list of data. Each datum consists of a key which might have a value attached. If there is a value attached, the value is separated from the key by a colon (`:`) (0x3A). That is, `<key>[:<value>][,<key2>[:<value2>][,<keyN>[:<valueN>]]]`. Keys specified in this document MUST only occur at most once.

[ISUPPORT]: https://modern.ircdocs.horse/#feature-advertisement

### The `index` key

The `index` key specifies whether a crawler is allowed to index the network. This key MUST be specified.

The value of this key MUST be either `allow` to allow crawlers to index the network or `deny` to prohibit crawlers from indexing the network.

### The `cooldown` key

The `cooldown` key specifies the time period before a crawler is allowed to reindex a network.

The value of this key MUST be a duration in seconds. If not set this key defaults to 3600 seconds (one hour).

## Examples

Example of a network which wishes to be crawled once per hour:

S: :irc.example.com 005 alice CRAWLER=index:allow,cooldown:3600

Example of a network which does not wish to be crawled (recheck yearly):

S: :irc.example.com 005 bob CRAWLER=index:deny,cooldown:31536000