resolving merge
petersilva committed Feb 14, 2024
2 parents b20313c + 96c62d8 commit 4921fb1
Showing 37 changed files with 758 additions and 164 deletions.
15 changes: 15 additions & 0 deletions debian/changelog
@@ -1,3 +1,18 @@
metpx-sr3 (3.00.52rc1) unstable; urgency=medium

* nodupe_fileAgeX (X=Min or Max) name change to fileAgeX replacing inflight.
* fix #907 declare exchange in admin.conf wasn't working.
* fix #912 enable v2 style retry processing with new --retry_refilter flag.
* fix #918 plugin for AM to name files with incomplete headers properly.
* fix #920 fix inflight mtime not working in watch.
* fix #920 replaced nodupe_fileAge(Min and Max) with fileAge(Min and Max)
* fix #920 inflight mtime merged with fileMinAge
* fix #922 sr3 showing wrong exchange on status/show.
* processing of rename field in messages suppressed when download=False
* (required for common processing of messages.)

-- peter <[email protected]> Wed, 14 Feb 2024 09:00:18 -0500

metpx-sr3 (3.00.51) unstable; urgency=medium

* PR #910 detect v02 messages without content_type header.
14 changes: 8 additions & 6 deletions docs/source/Explanation/CommandLineGuide.rst
@@ -333,7 +333,7 @@ View all configuration settings (the result of all parsing... what the flow comp
'message_ttl': 0,
'mirror': True,
'msg_total_interval': '0',
'nodupe_fileAgeMax': 0,
'fileAgeMax': 0,
'nodupe_ttl': 0,
'overwrite': True,
'permCopy': True,
@@ -954,8 +954,10 @@ These options set what files the user wants to be notified for and where
- **accept <regexp pattern> [rename=] (must be set)**
- **reject <regexp pattern> (optional)**
- **permDefault <integer> (default: 0o400)**
- **nodupe_fileAgeMax <duration> (default 30d)**
- **fileAgeMax <duration> (default 30d)**

fileAgeMax should be less than nodupe_ttl when using duplicate suppression,
to avoid re-ingesting of files that have aged out of the nodupe cache.

The option *filename* can be used to set a global rename to the products.
Ex.:
@@ -1019,12 +1021,12 @@ other nodes participating in the poll, when they don't have the vip,
will subscribe to the output of the poll to keep their duplicate suppression
caches current.

files that are more than nodupe_fileAgeMax are ignored. However, this
files that are more than fileAgeMax are ignored. However, this
can be modified to any specified time limit in the configurations by using
the option *nodupe_fileAgeMax <duration>*. By default in components
the option *fileAgeMax <duration>*. By default in components
other than poll, it is disabled by being set to zero (0). As it is a
duration option, units are in seconds by default, but minutes, hours,
days, and weeks, are available. In the poll component, nodupe_fileAgeMax
days, and weeks, are available. In the poll component, fileAgeMax
defaults to 30 days.

Advanced Polling
@@ -1116,7 +1118,7 @@ The notification protocol is defined here `sr_post(7) <../Reference/sr_post.7.ht
**poll** connects to a *broker*. Every *sleep* seconds, it connects to
a *pollUrl* (sftp, ftp, ftps). For each of the *directory* defined, it lists
the contents. Polling is only intended to be used for recently modified
files. The *nodupe_fileAgeMax* option eliminates files that are too old
files. The *fileAgeMax* option eliminates files that are too old
from consideration. When a file is found that matches a pattern given
by *accept*, **poll** builds a notification message for that product.

4 changes: 2 additions & 2 deletions docs/source/Explanation/DetectFileReady.rst
@@ -88,12 +88,12 @@ File Detection Strategy Table
|sr_watch with| | |
|inflight |Minimum age (modification time) |Last choice, guarantees delay only if |
|number |of the file before it is considered |no other method works. |
|(mtime) |complete. | |
|(mtime) |complete. (aka: fileAgeMin) | |
| | |Receiving from uncooperative |
|Alternate | - Adds delay in every transfer. |sources. |
|setting | - Vulnerable to network failures. | |
| | - Vulnerable to clock skew. |(ok choice with PDS) |
|nodupe\_\ | | |
| | | |
|fileAgeMin | |If a process is re-writing a file |
| | |often, can use mtime to smooth out |
| | |the i/o pattern, by slowing posts. |
7 changes: 6 additions & 1 deletion docs/source/Explanation/DuplicateSuppression.rst
@@ -170,13 +170,18 @@ Files That are Too Old

in the configuration file::

nodupe_fileAgeMax 600
fileAgeMax 600

Files which are older than 600 seconds (10 minutes) will not be considered for transfer.

This is usually used with polls that have very long lasting directories on a remote
server. example: a remote server has a permanent database of remote files.

It is often the case that nodupe_ttl should be greater than fileAgeMax to prevent
files from aging out of the cache before they are considered "too old" and then being
(erroneously) re-ingested. A warning message is emitted if this is the case in a poll
on startup.
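
The relationship between the two settings can be sketched as a small check. This is an illustration only: `duration_to_seconds` and `check_poll_ages` are hypothetical helpers, not the package's own parser or startup check, and the unit suffixes (s, m, h, d, w) are assumed from the duration examples used in these documents.

```python
# Hypothetical helpers for illustration; not part of sarracenia.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def duration_to_seconds(text):
    """Convert a duration such as '600', '7h' or '30d' to seconds."""
    text = str(text).strip().lower()
    if text and text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)  # bare numbers are taken as seconds


def check_poll_ages(file_age_max, nodupe_ttl):
    """Return a warning string when nodupe_ttl is not greater than fileAgeMax."""
    age_max = duration_to_seconds(file_age_max)
    ttl = duration_to_seconds(nodupe_ttl)
    # This sketch skips the comparison when either setting is 0 (deactivated).
    if age_max == 0 or ttl == 0:
        return None
    if ttl <= age_max:
        return (f"nodupe_ttl ({ttl:.0f}s) should be greater than "
                f"fileAgeMax ({age_max:.0f}s)")
    return None
```

With the poll values quoted in the reference (7 hours and 8 hours), `check_poll_ages("7h", "8h")` returns nothing, while `check_poll_ages("30d", "8h")` returns a warning string.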


Roll Your Own
-------------
4 changes: 2 additions & 2 deletions docs/source/Explanation/FileCompletion.rst
@@ -49,7 +49,7 @@ Inflight Table
| number |of the file before it is considered |guaranteed delay added |
| (mtime) |complete. | |
| | |Receiving from uncooperative |
| |Adds delay in every transfer. |sources. |
| fileAgeMin |Adds delay in every transfer. |sources. |
| |Vulnerable to network failures. | |
| |Vulnerable to clock skew. |(ok choice with PDS) |
+-------------+---------------------------------------+--------------------------------------+
@@ -62,7 +62,7 @@ NOTES:

On versions of sr_sender prior to 2.18, the default was NONE, but was documented as '.tmp'
To ensure compatibility with later versions, it is likely better to explicitly write
the *inflight* setting.
the *inflight* setting. The numeric variant is the same as setting *fileAgeMin*.

*inflight* was renamed from the old *lock* option in January 2017. For compatibility with
older versions, can use *lock*, but name is deprecated.
21 changes: 19 additions & 2 deletions docs/source/How2Guides/UPGRADING.rst
@@ -39,6 +39,23 @@ Installation Instructions
git
---

3.0.52
------

*CHANGE*: Additional messageCountMax argument to flowcb.gather() entry point.
When implementing flow callbacks for scheduled flows, or poll overrides, the
gather entry point now takes one additional argument indicating the maximum
number of messages that the routine should return.

To be compatible with previous versions, one can establish a default value
on the gather::

def gather(self, messageCountMax=None):

With the default value, plugins are downward compatible. (earlier versions
will call with only self as an argument.)
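
A minimal sketch of the compatibility pattern described above. The class is a stand-in, not a real sarracenia plugin: `ExampleGather`, its `pending` list, and the plain-list return value are hypothetical, and a real callback would subclass the package's flow callback base class. The only point illustrated is the default value, which lets the method be called with or without the extra argument.

```python
class ExampleGather:
    """Hypothetical stand-in for a flow callback; not the sarracenia base class."""

    def __init__(self, pending):
        # pending: messages waiting to be handed to the flow (illustrative data)
        self.pending = list(pending)

    def gather(self, messageCountMax=None):
        # Older callers pass no argument: return everything that is pending.
        if messageCountMax is None:
            batch, self.pending = self.pending, []
            return batch
        # Newer callers pass a limit: return at most that many messages.
        batch = self.pending[:messageCountMax]
        self.pending = self.pending[messageCountMax:]
        return batch
```

Calling `gather(2)` on three pending messages returns the first two and leaves the third for the next call; calling `gather()` with no argument drains whatever remains.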


3.0.51
-------

@@ -309,7 +326,7 @@ V2 to Sr3
all participants in a vip update ls_files nodes subscribe to the output exchange
poll builds strings to describe files poll builds stat(2) like paramiko.SftpAttributes()
participants rely on their ls_files for state poll uses flowcb.nodupe module like rest of sr3
file_time_limit to ignore older files nodupe_fileAgeMax
file_time_limit to ignore older files fileAgeMax
*destination* gives where to poll *pollUrl*
*directory* gives remote directory to list *path* used like in *post* and *watch*
need *accept* per *directory* need only one *accept*
@@ -367,7 +384,7 @@ V2 to Sr3
e fileEvents
events fileEvents
exchange_split exchangeSplit
file_time_limit nodupe_fileAgeMax
file_time_limit fileAgeMax
hb_memory_baseline_file MemoryBaseLineFile
hb_memory_max MemoryMax
hb_memory_multiplier MemoryMultiplier
8 changes: 4 additions & 4 deletions docs/source/How2Guides/subscriber.rst
@@ -30,10 +30,10 @@ The usual intent is to automatically download the data
wanted to a directory on a subscriber
machine where other software can process it. Please note:

- the tool is entirely command line driven (there is no GUI) More accurately, it is mostly configuration file driven.
most of the *interface* involves using a text editor to modify configuration files.
- while written to be compatible with other environments, the focus is on Linux usage.
- the tool can be used as either an end-user tool, or a system-wide transfer engine.
- The tool is entirely command line driven (there is no GUI). More accurately, it is mostly configuration file driven.
Most of the *interface* involves using a text editor to modify configuration files.
- While written to be compatible with other environments, the focus is on Linux usage.
- The tool can be used as either an end-user tool, or a system-wide transfer engine.
This guide is focused on the end-user case.
- All documentation of the package is available
at https://metpx.github.io/sarracenia
57 changes: 51 additions & 6 deletions docs/source/Reference/sr3_options.7.rst
@@ -503,6 +503,12 @@ Once connected to an AMQP broker, the user needs to bind a queue
to exchanges and topics to determine the notification messages of interest.


bufsize <size> (default: 1MB)
-----------------------------

Files are copied in *bufsize*-byte blocks. This setting is used by the transfer protocols.


byteRateMax <size> (default: 0)
--------------------------------

@@ -906,6 +912,7 @@ can also be specified as a time interval, for example, 10 for 10 seconds.
When set to a time interval, file posting process ensures that it waits until
the file has not been modified in that interval. So a file will
not be processed until it has stayed the same for at least 10 seconds.
This is the same as setting **fileAgeMin**.

Lastly, **inflight** can be set to *NONE*, which case the file is written directly
with the final name, where the recipient will wait to receive a post notifying it
@@ -941,7 +948,17 @@ will not be posted.
inlineByteMax <size>
--------------------

the maximums size of messages to inline.
The maximum size of messages to inline.

inlineEncoding text|binary|guess (default: guess)
-------------------------------------------------

When inlining file content, what sort of encoding should be done? Three choices:

* text: the file content is assumed to be utf-8 text and encoded as such.
* binary: the file content is unconditionally converted to base64 binary encoding.
* guess: try making text, if that fails fall back to binary.
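
The three choices can be sketched as follows. This is an illustration under assumptions, not the package's implementation: `encode_inline` is a hypothetical helper, the `encoding`/`value` keys mirror the "content" field described in sr_post(7), and raising on undecodable data in *text* mode is a choice made for this sketch.

```python
import base64


def encode_inline(data: bytes, inline_encoding: str = "guess") -> dict:
    """Build a 'content'-style dict for file bytes (hypothetical helper)."""
    if inline_encoding not in ("text", "binary", "guess"):
        raise ValueError(f"unknown inlineEncoding: {inline_encoding!r}")
    if inline_encoding != "binary":
        try:
            # text and guess both start by trying utf-8
            return {"encoding": "utf-8", "value": data.decode("utf-8")}
        except UnicodeDecodeError:
            if inline_encoding == "text":
                raise  # text was requested explicitly, so do not hide the failure
    # binary, or guess after a failed utf-8 decode
    return {"encoding": "base64", "value": base64.b64encode(data).decode("ascii")}
```

In *guess* mode, `encode_inline(b"hello")` yields utf-8 text, while bytes that are not valid utf-8 fall back to base64.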


inlineOnly
----------
@@ -1153,14 +1170,18 @@ or:

More information: `Duplicate Suppression <../Explanation/DuplicateSuppression.html>`_

nodupe_fileAgeMax
-----------------
fileAgeMax
----------

If files are older than this setting (default: 30d in poll, 0 in other components),
If files are older than this setting (default: 7h in poll, 0 in other components),
then ignore them, they are too old to post. 0 deactivates the setting.

nodupe_fileAgeMin
-----------------
In a poll:

* default is 7 hours. It should be less than nodupe_ttl to prevent re-ingest of duplicate data.
  (full discussion here: https://github.com/MetPX/sarracenia/issues/904 )

fileAgeMin
----------

If files are newer than this setting (default: 0), then ignore them, they are too
new to post. 0 deactivates the setting.
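
A minimal sketch of how the two age bounds could be applied to a file's modification time. `age_allows_posting` is a hypothetical helper, not the package's code; both bounds are in seconds and 0 deactivates a bound, as described above.

```python
import os
import time


def age_allows_posting(path, file_age_min=0, file_age_max=0, now=None):
    """Return True when the file's mtime age is within the configured bounds."""
    now = time.time() if now is None else now
    age = now - os.stat(path).st_mtime
    if file_age_min > 0 and age < file_age_min:
        return False  # too new: the file may still be being written
    if file_age_max > 0 and age > file_age_max:
        return False  # too old: ignored
    return True
```

The `now` parameter exists only to make the check easy to exercise with fixed timestamps.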
@@ -1182,6 +1203,9 @@ dropped, so duplicates separated by a large enough interval will get through.
A value of 1d (day) or 1w (week) can be appropriate. Setting the option without specifying
a time will result in 300 seconds (or 5 minutes) being the expiry interval.

Default value in a poll is 8 hours. It should be longer than fileAgeMax to prevent
re-ingesting files that have aged out of the duplicate suppression cache.

**Use of the cache is incompatible with the default *parts 0* strategy**, one must specify an
alternate strategy. One must use either a fixed blocksize, or never partition files.
One must avoid the dynamic algorithm that will change the partition size used as a file grows.
@@ -1528,6 +1552,19 @@ retryEmptyBeforeExit: <boolean> (default: False)
Used for sr_insects flow tests. Prevents Sarracenia from exiting while there are messages remaining in the retry queue(s). By default, a post will cleanly exit once it has created and attempted to publish messages for all files in the specified directory. If any messages are not successfully published, they will be saved to disk to retry later. If a post is only run once, as in the flow tests, these messages will never be retried unless retryEmptyBeforeExit is set to True.


retry_refilter <boolean> (default: False)
-----------------------------------------

The **retry_refilter** option alters how messages are reloaded when they are retrieved from
a retry queue. The default way (value: False) is to repeat the transfer using exactly
the same message as before. If **retry_refilter** is set (value: True) then all the
message's calculated fields will be discarded, and the processing re-started from the gather
phase (accept/reject processing will be repeated, destinations re-calculated.)

The normal retry behaviour is used when the remote has had a failure and the transfer
needs to be re-sent later, while the retry_refilter option is used when recovering from
configuration file errors, where some messages had incorrect selection or destination criteria.

retry_ttl <duration> (default: same as expire)
----------------------------------------------

@@ -1759,6 +1796,14 @@ The **timeout** option, sets the number of seconds to wait before aborting a
connection or download transfer (applied per buffer during transfer).


timezone <string> (default: UTC)
--------------------------------

Interpret listings from an FTP server as being in the given timezone, as per `pytz <https://pypi.org/project/pytz>`_.
Examples: Canada/Pacific, Pacific/Nauru, Canada/Eastern, Europe/Paris.
Has no effect except when polling an FTP server.


tlsRigour (default: medium)
---------------------------

7 changes: 5 additions & 2 deletions docs/source/Reference/sr_post.7.rst
@@ -129,9 +129,12 @@ The headers are an array of name:value pairs::

"content" - for smaller files, the content may be embedded.
{
"encoding" : "utf-8" | "base64" ,
"encoding" : "utf-8" | "base64" | "iso-8859-1" ,
"value" : "encoded file content"
}
}
Note that the iso-8859-1 encoding is only an allowance for legacy data flows.
It should not normally be used.

"contentType" : "string" - MIME-type information referring to the data.

For "v03.report" topic notification messages the following additional
18 changes: 17 additions & 1 deletion docs/source/fr/CommentFaire/MiseANiveau.rst
@@ -38,6 +38,22 @@ Instructions d’installation
git
---

3.0.52
------

*CHANGEMENT* : argument messageCountMax supplémentaire au point d'entrée flowcb.gather().
Lors de la mise en œuvre de rappels de flow pour les flux cédulés ou de remplacements de poll,
le point d'entrée de *gather* prend désormais un argument supplémentaire indiquant le nombre
maximum de messages que la routine doit retourner.

Pour être compatible avec les versions précédentes, on peut établir une valeur par défaut
sur le rassemblement ::

def gather(self, messageCountMax=None):

Avec la valeur par défaut, les plugins sont compatibles avec les versions précédentes.


3.0.51
------

@@ -295,7 +311,7 @@ V2 to Sr3
e fileEvents
events fileEvents
exchange_split exchangeSplit
file_time_limit nodupe_fileAgeMax
file_time_limit fileAgeMax
hb_memory_baseline_file MemoryBaseLineFile
hb_memory_max MemoryMax
hb_memory_multiplier MemoryMultiplier
4 changes: 2 additions & 2 deletions docs/source/fr/Explication/AssurerLaLivraison.rst
@@ -49,9 +49,9 @@ Tableau de Inflight
+-------------+---------------------------------------+--------------------------------------+
| |Âge minimum (temps de modification) |Dernier choix, ne garantit un délai |
| entier |du fichier avant que le transfer soit |que si aucun autre moyen peut servir |
| (mtime) |considéré Complèté. | |
| (mtime) |considéré Complèté. (fileAgeMin pareil)| |
| | |Réception de ceux qui ne coopèrent pas|
| |Retard tous les avis | |
| fileAgeMin |Retarde tous les avis | |
| |Vulnérable aux pannes de réseau. | (choix acceptable pour PDS) |
| |Vulnérable aux horloges en désaccord | |
+-------------+---------------------------------------+--------------------------------------+
14 changes: 8 additions & 6 deletions docs/source/fr/Explication/GuideLigneDeCommande.rst
@@ -334,7 +334,7 @@ Afficher tous les paramètres de configuration (le résultat de toutes les analy
'message_ttl': 0,
'mirror': True,
'msg_total_interval': '0',
'nodupe_fileAgeMax': 0,
'fileAgeMax': 0,
'nodupe_ttl': 0,
'overwrite': True,
'permCopy': True,
@@ -960,8 +960,10 @@ Ces options définissent les fichiers pour lesquels l’utilisateur souhaite êt
- **accept <regexp pattern> [rename=] (doit être défini)**
- **reject <regexp pattern> (facultatif)**
- **permDefault <integer> (par défaut: 0o400)**
- **nodupe_fileAgeMax <duration> (par défaut 30d)**
- **fileAgeMax <duration> (par défaut 30d)**

fileAgeMax doit être inférieur à nodupe_ttl lors de l'utilisation de la suppression des doublons,
pour éviter la réingestion de fichiers obsolètes une fois partie du cache nodupe.

L’option *filename* peut être utilisée pour définir un changement de nom global pour les produits.
Ex.:
@@ -1024,12 +1026,12 @@ Comme pour tous les autres composants, l’option **vip** peut être utilisée p
qu’un poll doit être actif sur seulement un seul nœud d’un cluster. Notez que quand
d’autres nœuds participant au poll et qu’ils n’ont pas le vip, ils

les fichiers qui sont plus vieux que nodupe_fileAgeMax sont ignorés. Cela
les fichiers qui sont plus vieux que fileAgeMax sont ignorés. Cela
peut être modifié à n’importe quelle limite de temps spécifiée dans les configurations en utilisant
l’option *nodupe_fileAgeMax <duration>*. Par défaut, dans les composants
l’option *fileAgeMax <duration>*. Par défaut, dans les composants
autre que poll, cette option est désactivé en étant défini à zéro (0). Comme il s’agit d’une
option de durée, les unités sont en secondes par défaut, mais il est possible de definir l'option
en utilisant des minutes, heures, jours ou des semaines. Dans la composante de poll, nodupe_fileAgeMax
en utilisant des minutes, heures, jours ou des semaines. Dans la composante de poll, fileAgeMax
est défini à 30 jours par défaut.

Sondage avancé (Advanced Polling)
@@ -1118,7 +1120,7 @@ Le protocle de notification est défini ici `sr3_post(7) <../Reference/sr3_post.
**poll** se connecte à un *broker*. À toutes les secondes de *sleep*, il se connecte à
une *pollUrl* (sftp, ftp, ftps). Pour chacun des *directory* définis, les contenus sont listés.
Le poll est seulement destinée à être utilisée pour les fichiers récemment modifiés.
L’option *nodupe_fileAgeMax* élimine les fichiers trop anciens. Lorsqu’un fichier correspondant
L’option *fileAgeMax* élimine les fichiers trop anciens. Lorsqu’un fichier correspondant
à un modèle donné est trouvé by *accept*, **poll** crée un message de notification pour ce produit.

Le message est ensuite verifié dans la cache dupliqué (limité en temps par l'option