Because of the behaviour of the relation embedder (see also #2256), the same record can be ingested multiple times during a reindex, or even during normal running when there is a particularly dendritic archive record (see Slack).
We can partly guard against this by converting the ingestor queue into a FIFO queue with content-based deduplication.
This will not catch every duplicate: in some cases, the record will have been completely processed before the duplicate message arrives, and it will still be processed again.
What a FIFO queue does guard against is a message to process a record being placed on the queue several times in quick succession, e.g. when the relation embedder processes the record in two adjacent batches.
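A minimal in-memory sketch of how content-based deduplication behaves, assuming the ingestor queue is an Amazon SQS FIFO queue. With content-based deduplication enabled, SQS derives the deduplication ID from a SHA-256 hash of the message body and suppresses repeats within a 5-minute deduplication interval. `DedupFifoQueue`, `send` and `receive` are hypothetical names; this is a toy model of the behaviour, not the SQS implementation or the pipeline's code.

```python
import hashlib
from collections import deque

# Assumption: mirrors the 5-minute deduplication interval of SQS FIFO queues.
DEDUP_WINDOW_SECONDS = 300


class DedupFifoQueue:
    """Hypothetical toy model of a FIFO queue with content-based deduplication."""

    def __init__(self, window=DEDUP_WINDOW_SECONDS):
        self.window = window
        self.messages = deque()  # preserves first-in, first-out order
        self.first_sent = {}     # dedup id -> time the first copy was accepted

    def send(self, body, now):
        """Enqueue body unless an identical body was accepted within the window.

        Returns True if the message will be delivered, False if it is dropped.
        """
        # Content-based dedup: identical bodies produce identical ids.
        dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
        first = self.first_sent.get(dedup_id)
        if first is not None and now - first < self.window:
            # A real SQS send would still succeed, but the copy is not delivered.
            return False
        self.first_sent[dedup_id] = now
        self.messages.append(body)
        return True

    def receive(self):
        """Pop the oldest message, or return None if the queue is empty."""
        return self.messages.popleft() if self.messages else None


queue = DedupFifoQueue()
queue.send("record-123", now=0)    # True: first copy is enqueued
queue.send("record-123", now=2)    # False: adjacent-batch duplicate is dropped
queue.send("record-123", now=400)  # True: outside the window, delivered again
```

The last call shows the limitation described above: a duplicate that turns up long after the first copy is delivered and processed again. Only duplicates arriving close together are absorbed.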
It may also be wise to deduplicate elsewhere, but it is in the nature of the relation embedder to flood the ingestor with duplicates. Duplication further upstream is more likely to come from several successive changes to the source data arriving faster than the pipeline can process them, and such rapid changes are less common than duplicate messages from the relation embedder.