remove rstr dependency #755

Merged
merged 4 commits into from
Feb 10, 2025
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -67,7 +67,7 @@ jobs:
- name: lint changed and added files
if: steps.changed-files.outputs.all_changed_files
run: |
-pylint --fail-under 9.5 ${{ steps.changed-files.outputs.all_changed_files }}
+pylint ${{ steps.changed-files.outputs.all_changed_files }}
- name: Run tests and collect coverage
run: pytest tests/unit --cov=logprep --cov-report=xml
- name: Upload coverage reports to Codecov with GitHub Action
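Before this change the lint step passed `--fail-under 9.5`, which makes pylint exit successfully as long as the overall score is at least 9.5. With the flag removed, the threshold comes from pylint's own configuration or default, and a failing run's exit status encodes which message categories were emitted. A minimal sketch for reading that status in a CI log, assuming pylint's documented bit-mask exit codes (the helper name `decode_pylint_status` is hypothetical):

```python
from typing import List

# Exit-status bits as documented by pylint; a run ORs together the bits of
# every message category it emitted.
PYLINT_EXIT_BITS = {
    1: "fatal",
    2: "error",
    4: "warning",
    8: "refactor",
    16: "convention",
    32: "usage error",
}


def decode_pylint_status(status: int) -> List[str]:
    """Return the names of the categories whose bits are set in `status`."""
    return [name for bit, name in PYLINT_EXIT_BITS.items() if status & bit]


# 20 == 4 | 16: the run emitted warnings and convention messages.
print(decode_pylint_status(20))  # ['warning', 'convention']
```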
9 changes: 9 additions & 0 deletions .pre-commit-config.yaml
@@ -11,6 +11,15 @@ repos:
- id: check-toml
- id: debug-statements
- id: no-commit-to-branch
+- repo: local
+hooks:
+- id: pylint
+name: pylint
+entry: pylint
+language: system
+types: [python]
+require_serial: true
+args: ["--rcfile=./pyproject.toml"]
- repo: https://github.com/psf/black
rev: 25.1.0
hooks:
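The new local hook runs the `pylint` found on the developer's `PATH` (`language: system`), passes `--rcfile=./pyproject.toml`, and appends the staged files that pre-commit classifies as Python. A rough sketch of the resulting command line; `build_hook_command` is a hypothetical helper, and matching on the `.py` suffix only approximates pre-commit's `types: [python]` filter, which also recognises Python scripts by their shebang:

```python
from pathlib import PurePath
from typing import Iterable, List

ENTRY = "pylint"                      # `entry:` of the local hook
ARGS = ["--rcfile=./pyproject.toml"]  # `args:` of the local hook


def build_hook_command(staged_files: Iterable[str]) -> List[str]:
    """Approximate the argv pre-commit builds for the hook (hypothetical helper).

    Returns an empty list when no staged file matches, because pre-commit
    skips a hook that has no files to check.
    """
    python_files = [path for path in staged_files if PurePath(path).suffix == ".py"]
    if not python_files:
        return []
    return [ENTRY, *ARGS, *python_files]


print(build_hook_command(["logprep/util/rstr/__init__.py", "CHANGELOG.md"]))
# ['pylint', '--rcfile=./pyproject.toml', 'logprep/util/rstr/__init__.py']
```

`require_serial: true` additionally tells pre-commit to run the hook in a single process instead of spreading the files across parallel pylint invocations.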
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@
### Improvements
* removes `colorama` dependency
* reimplemented the rule loading mechanic
+* removes `rstr` dependency

### Bugfix
* fixes a bug with lucene regex and parentheses
16 changes: 10 additions & 6 deletions logprep/connector/http/input.py
@@ -56,7 +56,8 @@
.. security-best-practice::
:title: Http Input Connector - Authentication
-When using basic auth with the http input connector the following points should be taken into account:
+When using basic auth with the http input connector
+the following points should be taken into account:
- basic auth must only be used with strong passwords
- basic auth must only be used with TLS encryption
- avoid to reveal your plaintext secrets in public repositories
@@ -90,7 +91,6 @@
import falcon.asgi
import msgspec
import requests
-import rstr
from attrs import define, field, validators
from falcon import ( # pylint: disable=no-name-in-module
HTTP_200,
@@ -103,7 +103,7 @@

from logprep.abc.input import FatalInputError, Input
from logprep.metrics.metrics import CounterMetric, GaugeMetric
-from logprep.util import http
+from logprep.util import http, rstr
from logprep.util.credentials import CredentialsFactory

logger = logging.getLogger("HTTPInput")
@@ -212,6 +212,7 @@ class HttpEndpoint(ABC):
Includes authentication credentials, if unset auth is disabled
"""

+# pylint: disable=too-many-arguments,too-many-positional-arguments
def __init__(
self,
messages: mp.Queue,
@@ -352,9 +353,11 @@ class Config(Input.Config):
.. security-best-practice::
:title: Uvicorn Webserver Configuration
:location: uvicorn_config
-:suggested-value: uvicorn_config.access_log: true, uvicorn_config.server_header: false, uvicorn_config.data_header: false
+:suggested-value: uvicorn_config.access_log: true,
+uvicorn_config.server_header: false, uvicorn_config.data_header: false
-Additionally to the below it is recommended to configure `ssl on the metrics server endpoint
+Additionally to the below it is recommended to configure
+`ssl` on the metrics server endpoint
<https://www.uvicorn.org/settings/#https>`_
.. code-block:: yaml
@@ -497,7 +500,8 @@ def shut_down(self):
def health_endpoints(self) -> List[str]:
"""Returns a list of endpoints for internal healthcheck
the endpoints are examples to match against the configured regex enabled
-endpoints. The endpoints are normalized to match the regex patterns and this ensures that the endpoints should not be too long
+endpoints. The endpoints are normalized to match the regex patterns and
+this ensures that the endpoints should not be too long
"""
normalized_endpoints = (endpoint.replace(".*", "b") for endpoint in self._config.endpoints)
normalized_endpoints = (endpoint.replace(".+", "b") for endpoint in normalized_endpoints)
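The two visible lines of `health_endpoints` replace the unbounded wildcards `.*` and `.+` with the single literal character `b`, so wildcard segments collapse to one character and the resulting string stays short while the original pattern still accepts it. A standalone sketch of just those two steps; the endpoint patterns are hypothetical, and the rest of the method is not part of this hunk:

```python
import re
from typing import Iterable, List


def normalize_endpoints(endpoints: Iterable[str]) -> List[str]:
    """Apply the two replacements shown in the diff, in the same order."""
    normalized = (endpoint.replace(".*", "b") for endpoint in endpoints)
    normalized = (endpoint.replace(".+", "b") for endpoint in normalized)
    return list(normalized)


# Hypothetical endpoint patterns, not taken from a real logprep configuration.
patterns = ["/json", "/lab/.+/json", "/auth-.*"]
for pattern, example in zip(patterns, normalize_endpoints(patterns)):
    # Every normalized string here is a concrete path the original regex matches.
    assert re.fullmatch(pattern, example), (pattern, example)
    print(f"{pattern} -> {example}")
```

Patterns that contain other regular-expression syntax, such as character classes or alternation, are still regexes after this step rather than literal paths.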
60 changes: 60 additions & 0 deletions logprep/util/rstr/__init__.py
@@ -0,0 +1,60 @@
"""rstr - Generate random strings from regular expressions."""

# Copyright (c) 2011, Leapfrog Direct Response, LLC
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of the Leapfrog Direct Response, LLC, including
# its subsidiaries and affiliates nor the names of its
# contributors, may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL LEAPFROG DIRECT
# RESPONSE, LLC, INCLUDING ITS SUBSIDIARIES AND AFFILIATES, BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
# BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
# OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
# IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# source: https://github.com/leapfrogonline/rstr

from logprep.util.rstr.xeger import Xeger

Rstr = Xeger
_default_instance = Rstr()

rstr = _default_instance.rstr
xeger = _default_instance.xeger


# This allows convenience methods from rstr to be accessed at the package
# level, without requiring the user to instantiate an Rstr() object.
printable = _default_instance.printable
letters = _default_instance.letters
uppercase = _default_instance.uppercase
lowercase = _default_instance.lowercase
digits = _default_instance.digits
punctuation = _default_instance.punctuation
nondigits = _default_instance.nondigits
nonletters = _default_instance.nonletters
whitespace = _default_instance.whitespace
nonwhitespace = _default_instance.nonwhitespace
normal = _default_instance.normal
word = _default_instance.word
nonword = _default_instance.nonword
unambiguous = _default_instance.unambiguous
postalsafe = _default_instance.postalsafe
urlsafe = _default_instance.urlsafe
domainsafe = _default_instance.domainsafe
193 changes: 193 additions & 0 deletions logprep/util/rstr/rstr_base.py
@@ -0,0 +1,193 @@
"""This module provides the RstrBase class for generating random strings
from various alphabets.
"""

# Copyright (c) 2011, Leapfrog Direct Response, LLC
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of the Leapfrog Direct Response, LLC, including
# its subsidiaries and affiliates nor the names of its
# contributors, may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL LEAPFROG DIRECT
# RESPONSE, LLC, INCLUDING ITS SUBSIDIARIES AND AFFILIATES, BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
# BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
# OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
# IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# source: https://github.com/leapfrogonline/rstr

import itertools
import string
import typing
from functools import partial
from typing import Iterable, List, Mapping, Optional, Sequence, TypeVar

_T = TypeVar("_T")


if typing.TYPE_CHECKING:
    from random import Random
    from typing import Protocol

    class _PartialRstrFunc(Protocol):

        def __call__(
            self,
            start_range: Optional[int] = ...,
            end_range: Optional[int] = ...,
            include: str = ...,
            exclude: str = ...,
        ) -> str: ...


ALPHABETS: Mapping[str, str] = {
    "printable": string.printable,
    "letters": string.ascii_letters,
    "uppercase": string.ascii_uppercase,
    "lowercase": string.ascii_lowercase,
    "digits": string.digits,
    "punctuation": string.punctuation,
    "nondigits": string.ascii_letters + string.punctuation,
    "nonletters": string.digits + string.punctuation,
    "whitespace": string.whitespace,
    "nonwhitespace": string.printable.strip(),
    "normal": string.ascii_letters + string.digits + " ",
    "word": string.ascii_letters + string.digits + "_",
    "nonword": "".join(
        set(string.printable).difference(string.ascii_letters + string.digits + "_")
    ),
    "unambiguous": "".join(set(string.ascii_letters + string.digits).difference("0O1lI")),
    "postalsafe": string.ascii_letters + string.digits + " .-#/",
    "urlsafe": string.ascii_letters + string.digits + "-._~",
    "domainsafe": string.ascii_letters + string.digits + "-",
}


class RstrBase:
    """Create random strings from a variety of alphabets.
    The alphabets for printable(), digits(), and punctuation() are equivalent
    to the constants by those same names in the standard library string
    module; uppercase() and lowercase() use string.ascii_uppercase and
    string.ascii_lowercase.
    nondigits() uses an alphabet of string.ascii_letters + string.punctuation
    nonletters() uses an alphabet of string.digits + string.punctuation
    nonwhitespace() uses an alphabet of string.printable.strip()
    normal() uses an alphabet of string.ascii_letters + string.digits + ' '
    (the space character)
    postalsafe() is based on USPS Publication 28 - Postal Addressing Standards:
    http://pe.usps.com/text/pub28/pub28c2.html
    The characters allowed in postal addresses are letters and digits, periods,
    slashes, the pound sign, and the hyphen.
    urlsafe() uses an alphabet of unreserved characters safe for use in URLs.
    From section 2.3 of RFC 3986: "Characters that are allowed in a URI but
    do not have a reserved purpose are called unreserved. These include
    uppercase and lowercase letters, decimal digits, hyphen, period,
    underscore, and tilde."
    domainsafe() uses an alphabet of characters allowed in hostnames, and
    consequently, in internet domains: letters, digits, and the hyphen.
    """

    def __init__(self, _random: "Random", **custom_alphabets: str) -> None:
        super().__init__()
        self._random = _random
        self._alphabets = dict(ALPHABETS)
        for alpha_name, alphabet in custom_alphabets.items():
            self.add_alphabet(alpha_name, alphabet)

    def add_alphabet(self, alpha_name: str, characters: str) -> None:
        """Add an additional alphabet to an Rstr instance and make it available
        via method calls.
        """
        self._alphabets[alpha_name] = characters

    def __getattr__(self, attr: str) -> "_PartialRstrFunc":
        if attr in self._alphabets:
            return partial(self.rstr, self._alphabets[attr])
        message = f"Rstr instance has no attribute: {attr}"
        raise AttributeError(message)

    def sample_wr(self, population: Sequence[str], k: int) -> List[str]:
        """Samples k random elements (with replacement) from a population"""
        return [self._random.choice(population) for i in itertools.repeat(None, k)]

    def rstr(
        self,
        alphabet: Iterable[str],
        start_range: Optional[int] = None,
        end_range: Optional[int] = None,
        include: Sequence[str] = "",
        exclude: Sequence[str] = "",
    ) -> str:
        """Generate a random string containing elements from 'alphabet'
        By default, rstr() will return a string between 1 and 10 characters.
        You can specify a second argument to get an exact length of string.
        If you want a string in a range of lengths, specify the start and end
        of that range as the second and third arguments.
        If you want to make certain that particular characters appear in the
        generated string, specify them as "include".
        If you want to *prevent* certain characters from appearing, pass them
        as 'exclude'.
        """
        k = None
        same_characters = set(include).intersection(exclude)
        if same_characters:
            message = (
                "include and exclude parameters contain "
                f"same character{'s' if len(same_characters) > 1 else ''} "
                f"({', '.join(same_characters)})"
            )
            raise SameCharacterError(message)

        popul = [char for char in list(alphabet) if char not in list(exclude)]

        if end_range is None:
            if start_range is None:
                start_range, end_range = (1, 10)
            else:
                k = start_range
        elif start_range is None:
            start_range = 1

        if end_range:
            k = self._random.randint(start_range, end_range)
        # Make sure we don't generate too long a string
        # when adding 'include' to it:
        k = k - len(include)

        result = self.sample_wr(popul, k) + list(include)
        self._random.shuffle(result)
        return "".join(result)


class SameCharacterError(ValueError):
    """Raised when include and exclude parameters contain the same character"""
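`rstr()` settles the target length before sampling: with no bounds it picks 1 to 10 characters, with only `start_range` it uses that exact length, and with both bounds it picks from the inclusive range; characters passed in `include` count towards that length, and characters in `exclude` are dropped from the alphabet. A compact sketch of those rules, assuming, as the docstring's "exact length" promise implies, that the `k = k - len(include)` adjustment applies in every branch. `random_string` is a hypothetical name, and it draws with `Random.choices` instead of `sample_wr`, so the same seed will not reproduce the vendored output:

```python
import random
import string
from typing import Optional


class SameCharacterError(ValueError):
    """Raised when include and exclude share characters."""


def random_string(
    rng: random.Random,
    alphabet: str,
    start_range: Optional[int] = None,
    end_range: Optional[int] = None,
    include: str = "",
    exclude: str = "",
) -> str:
    """Hypothetical re-statement of the length rules of RstrBase.rstr()."""
    clash = set(include) & set(exclude)
    if clash:
        raise SameCharacterError(f"include and exclude share {sorted(clash)}")
    population = [char for char in alphabet if char not in exclude]

    if start_range is None and end_range is None:
        length = rng.randint(1, 10)  # no bounds: 1 to 10 characters
    elif end_range is None:
        length = start_range  # one bound: exact length
    else:
        low = 1 if start_range is None else start_range
        length = rng.randint(low, end_range)  # two bounds: inclusive range

    # `include` counts towards the length; if it is longer, nothing else is drawn.
    result = rng.choices(population, k=max(length - len(include), 0)) + list(include)
    rng.shuffle(result)
    return "".join(result)


print(random_string(random.Random(0), string.ascii_lowercase, 6, include="XY"))
```

Both versions sample with replacement, so characters can repeat and an alphabet shorter than the requested length is not an error.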