Add writing and updating JSON database #7

Merged
merged 56 commits into from
Oct 20, 2021
56 commits
741e6ed
chore: :see_no_evil: Return ignoring 'dist' folder after deleting old…
hotenov Aug 4, 2021
8da9247
chore: Add blank files for future code structure
hotenov Aug 11, 2021
3aa7c68
chore: :heavy_plus_sign: Add requests, beautifulsoup4, lxml for parsi…
hotenov Aug 15, 2021
df183a7
chore: :heavy_plus_sign: Add 'requests-file' to dev deps for future u…
hotenov Sep 1, 2021
a5ea064
chore: :heavy_plus_sign: Add requests-mock 1.9.3 to dev dependencies
hotenov Sep 12, 2021
2632ddf
feat(parser): :sparkles: Add function to parse archive page
hotenov Sep 13, 2021
046d2fd
test(parser): :white_check_mark: Add test for checking mocked respons…
hotenov Sep 13, 2021
82c6595
chore: Add test HTML files for several episodes
hotenov Sep 17, 2021
2779b33
refactor(parser): :construction: Update two functions for parsing all…
hotenov Sep 17, 2021
35880be
chore: :wrench: Add mapping dict for 4 links and their text
hotenov Sep 17, 2021
bb6cc74
test: :construction: Add PoC test for mocking several episode pages
hotenov Sep 17, 2021
1fc6157
chore: :wrench: Update mypy settings in pyproject.toml
hotenov Sep 20, 2021
06a9fdc
chore: :wrench: Rename Tuple with irrelevant links
hotenov Sep 20, 2021
9a804b7
chore: :recycle: Improve typings and function names
hotenov Sep 20, 2021
15b4c02
test(parser): :white_check_mark: Add several general tests for parsin…
hotenov Sep 20, 2021
18c091f
ci: :wrench: Add installing of 'requests_mock' into 'tests' Nox session
hotenov Sep 20, 2021
a1acae0
ci: :wrench: Exclude HTML files from 'pre-commit' hooks
hotenov Sep 20, 2021
9b02dea
style: :art: Commit changes which were modified by 'pre-commit' hooks
hotenov Sep 20, 2021
1fafe55
style: :art: Fix flake8 errors
hotenov Sep 20, 2021
b32e2ed
chore: :heavy_plus_sign: Add flake8 plugins to dev deps
hotenov Sep 20, 2021
f727871
chore: :wrench: Change flake8 config: max-line-length = 120 and ignor…
hotenov Sep 20, 2021
6207855
ci: :wrench: Add installation of 'requests_mock' into 'typeguard' Nox…
hotenov Sep 20, 2021
375a777
chore: :heavy_minus_sign: Remove unused 'requests-file' from dev-deps
hotenov Sep 20, 2021
65b5ff9
fix(parser): Make links texts safe for windows path
hotenov Sep 22, 2021
6fd5b85
test(parser): :white_check_mark: Update test with getting link text b…
hotenov Sep 22, 2021
add8793
chore: :heavy_plus_sign: Add 'rope' package in dev deps for refactori…
hotenov Sep 25, 2021
10a5126
feat(parser): :sparkles: Parse episode page (date and episode number)
hotenov Sep 25, 2021
99942e2
refactor(parser): :recycle: Add returning of URL final location durin…
hotenov Sep 26, 2021
7e55d60
test(parser): :white_check_mark: Add tests to check final location af…
hotenov Sep 26, 2021
eae8379
feat(parser): :sparkles: Add index generating for post URL
hotenov Sep 26, 2021
e1b6617
test(parser): :white_check_mark: Add two tests to check index generation
hotenov Sep 26, 2021
3d31f30
feat: Add 'admin_note' attribute to LepEpisode class
hotenov Sep 27, 2021
ad7ec11
feat(parser): Add logic for bad response of parsing page
hotenov Sep 27, 2021
0f61349
test(parser): :white_check_mark: Update tests taking into account res…
hotenov Sep 27, 2021
70f3a3b
style(parser): :label: Fix 'mypy' and 'pre-commit' errors
hotenov Sep 27, 2021
76def0f
feat(parser): :sparkles: Add 'parsing_utc' attribute for LepEpisode c…
hotenov Sep 28, 2021
2eaaf24
test(parser): :white_check_mark: Update test to check parsing all lin…
hotenov Sep 28, 2021
271f646
feat(parser): :sparkles: Add function to parsing links to episode aud…
hotenov Sep 29, 2021
f487755
test(parser): :white_check_mark: Add minimum sufficient tests (to sat…
hotenov Sep 29, 2021
5ad3fe7
refactor(parser): :recycle: Unify parsing part of archive page (tag <…
hotenov Sep 29, 2021
07e3202
style: :art: Fix 'pre-commit' errors for imports order
hotenov Sep 29, 2021
0a5e435
refactor: :recycle: Change default value for 'audios' attribute to None
hotenov Sep 29, 2021
64c8602
perf(parser): :zap: Change algorithm to extract episode links and the…
hotenov Oct 1, 2021
80e4ea9
test(parser): :white_check_mark: Update tests according to new archiv…
hotenov Oct 1, 2021
2769f42
test(parser): :white_check_mark: Add two tests to check parsing mp3 l…
hotenov Oct 6, 2021
48e3674
style: :pencil2: Fix wrong writing of 'non-episode' word
hotenov Oct 6, 2021
9b32d97
feat(parser): :sparkles: Add function for descending sorting of parse…
hotenov Oct 7, 2021
bb27150
test(parser): :white_check_mark: Add test to check episodes sorting
hotenov Oct 7, 2021
aea5b00
fix(parser): :bug: Change secondary key sorting to 'index'
hotenov Oct 7, 2021
3d65b69
chore: :wrench: Add JSON_DB_URL configuration parameter
hotenov Oct 19, 2021
1f45c2b
feat: :label: Add 'LepJsonEncoder' class for json dump operations
hotenov Oct 19, 2021
6863731
feat(parser): :sparkles: Add rough implementation of 'main' method wi…
hotenov Oct 19, 2021
580fe6b
test(parser): :white_check_mark: Add tests for writing and updating J…
hotenov Oct 19, 2021
b7d092d
style: :art: Fix imports by pre-commit
hotenov Oct 19, 2021
a49c741
Merge branch 'pre-release' into version-3
hotenov Oct 20, 2021
79bec07
Merge branch 'version-3' of github.com:hotenov/LEP-downloader into ve…
hotenov Oct 20, 2021
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -35,7 +35,7 @@ repos:
         language: system
         types: [text]
         stages: [commit, push, manual]
-        exclude_types: [html]
+        exclude_types: [html, json]
       - id: flake8
         name: flake8
         entry: flake8
@@ -54,9 +54,9 @@ repos:
         language: system
         types: [text]
         stages: [commit, push, manual]
-        exclude_types: [html]
+        exclude_types: [html, json]
   - repo: https://github.com/pre-commit/mirrors-prettier
     rev: v2.3.0
     hooks:
       - id: prettier
-        exclude_types: [html]
+        exclude_types: [html, json]
32 changes: 23 additions & 9 deletions poetry.lock

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion pyproject.toml
@@ -50,6 +50,7 @@ Pygments = "^2.9.0"
 requests-mock = "^1.9.3"
 flake8-black = "^0.2.3"
 flake8-import-order = "^0.18.1"
+rope = "^0.20.1"

 [tool.poetry.scripts]
 lep-downloader = "lep_downloader.__main__:main"
@@ -63,7 +64,7 @@ source = ["lep_downloader"]

 [tool.coverage.report]
 show_missing = true
-fail_under = 85
+fail_under = 100

 [tool.mypy]
 strict = true
9 changes: 3 additions & 6 deletions src/lep_downloader/config.py
@@ -3,6 +3,8 @@

 ARCHIVE_URL = "https://hotenov.com"

+JSON_DB_URL = "https://hotenov.com/some_json.json"
+
 LOCAL_ARCHIVE_HTML = "2021-08-10_lep-archive-page-content-pretty.html"

 SHORT_LINKS_MAPPING_DICT = {
@@ -23,9 +25,4 @@

 EPISODE_LINK_RE = r"https?://((?P<short>wp\.me/p4IuUx-[\w-]+)|(teacherluke\.(co\.uk|wordpress\.com)/(?P<date>\d{4}/\d{2}/\d{2})/))"

-LINK_TEXTS_MAPPING = {
-    "https://teacherluke.co.uk/2018/04/18/522-learning-english-at-summer-school-in-the-uk-a-rambling-chat-with-raphael-miller/": "522. Learning English at Summer School in the UK (A Rambling Chat with Raphael Miller)",
-    "https://teacherluke.co.uk/2017/08/14/website-content-lukes-criminal-past-zep-episode-185/": "[Website content] Luke’s Criminal Past (ZEP Episode 185)",
-    "https://teacherluke.co.uk/2017/05/26/i-was-invited-onto-the-english-across-the-pond-podcast/": "[Website content] I was invited onto the “English Across The Pond” Podcast",
-    "https://teacherluke.co.uk/2016/03/20/i-was-invited-onto-craig-wealands-weekly-blab-and-we-talked-about-comedy-video/": "[VIDEO] I was invited onto Craig Wealand’s weekly Blab, and we talked about comedy",
-}
+INVALID_PATH_CHARS_RE = r"[<>:\"/\\\\|?*]"
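
For readers who want to see what the two patterns in this file match, here is a minimal sketch. The regular expressions are copied from the diff above; the helper names `classify_link` and `make_path_safe`, the `"_"` replacement character, and the sample URLs are assumptions made for illustration and are not taken from the parser code in this PR.

```python
import re

# Patterns copied verbatim from the config.py diff.
EPISODE_LINK_RE = r"https?://((?P<short>wp\.me/p4IuUx-[\w-]+)|(teacherluke\.(co\.uk|wordpress\.com)/(?P<date>\d{4}/\d{2}/\d{2})/))"
INVALID_PATH_CHARS_RE = r"[<>:\"/\\\\|?*]"


def classify_link(url: str) -> str:
    """Hypothetical helper: report which branch of EPISODE_LINK_RE matched."""
    match = re.match(EPISODE_LINK_RE, url)
    if match is None:
        return "other"
    if match.group("short"):
        # Short wp.me link; it carries no date in the URL itself.
        return "short"
    # Full post URL; the named group holds the YYYY/MM/DD part.
    return "dated:" + match.group("date")


def make_path_safe(text: str, replacement: str = "_") -> str:
    """Hypothetical helper: replace characters that Windows forbids in file names.

    The replacement character is an assumption; the PR may use another one.
    """
    return re.sub(INVALID_PATH_CHARS_RE, replacement, text)


if __name__ == "__main__":
    print(classify_link("https://teacherluke.co.uk/2018/04/18/522-learning-english/"))
    print(classify_link("https://wp.me/p4IuUx-abc"))
    print(make_path_safe('What? A "test": a/b'))
```

The doubled `\\\\` inside the character class is redundant (one escaped backslash would be enough), but it is harmless: the class still matches a single backslash.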
52 changes: 52 additions & 0 deletions src/lep_downloader/lep.py
@@ -1 +1,53 @@
 """LEP module for general logic and classes."""
+import json
+import typing as t
+
+
+class LepEpisode(object):
+    """LEP episode class."""
+
+    def __init__(
+        self,
+        episode: int = 0,
+        date: str = "2000-01-01T00:00:00+00:00",
+        url: str = "",
+        post_title: str = "",
+        post_type: str = "",
+        parsing_utc: str = "",
+        index: int = 0,
+        audios: t.Optional[t.List[t.List[str]]] = None,
+        admin_note: str = "",
+    ) -> None:
+        """Default instance of LepEpisode.
+
+        Args:
+            episode (int): Episode number.
+            date (str): Post datetime (default 2000-01-01T00:00:00+00:00).
+            url (str): Final location of post URL.
+            post_title (str): Post title, extracted from tag <a> and safe for windows path.
+            post_type (str): Post type ("AUDIO", "TEXT", etc.).
+            audios (list): List of links lists (for multi-part episodes).
+            parsing_utc (str): Parsing datetime in UTC timezone (with microseconds).
+            index (int): Parsing index: concatenation of URL date and increment (for several posts).
+            admin_note (str): Note for administrator and storing error message (for bad response)
+        """
+        self.episode = episode
+        self.date = date
+        self.url = url
+        self.post_title = post_title
+        self.post_type = post_type
+        self.audios = audios
+        self.parsing_utc = parsing_utc
+        self.index = index
+        self.admin_note = admin_note
+
+
+class LepJsonEncoder(json.JSONEncoder):
+    """Custom JSONEncoder for LepEpisode objects."""
+
+    def default(self, obj: t.Any) -> t.Any:
+        """Override 'default' method for encoding JSON objects."""
+        if isinstance(obj, LepEpisode):
+            return obj.__dict__
+        # Let the base class default method raise the TypeError
+        return json.JSONEncoder.default(self, obj)
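
As a rough illustration of how an encoder like `LepJsonEncoder` is typically passed to `json.dumps`, here is a self-contained sketch. The `LepEpisode` class below is a trimmed stand-in with only four of the attributes from the diff, and the `as_lep_episode` decoding hook is a hypothetical helper that does not appear in this PR; the sample episode values are illustrative.

```python
import json
import typing as t


class LepEpisode:
    """Trimmed stand-in for the class in the diff (only four attributes)."""

    def __init__(
        self,
        episode: int = 0,
        url: str = "",
        post_title: str = "",
        index: int = 0,
    ) -> None:
        self.episode = episode
        self.url = url
        self.post_title = post_title
        self.index = index


class LepJsonEncoder(json.JSONEncoder):
    """Same idea as in the diff: serialize LepEpisode via its __dict__."""

    def default(self, obj: t.Any) -> t.Any:
        if isinstance(obj, LepEpisode):
            return obj.__dict__
        # Anything else falls through to the base class, which raises TypeError.
        return super().default(obj)


def as_lep_episode(dct: t.Dict[str, t.Any]) -> LepEpisode:
    """Hypothetical object_hook: rebuild an episode from a decoded JSON object.

    Assumes every JSON object in the document is an episode.
    """
    return LepEpisode(**dct)


episodes = [
    LepEpisode(
        episode=522,
        url="https://teacherluke.co.uk/2018/04/18/522-learning-english/",
        post_title="522. Learning English at Summer School in the UK",
        index=2018041801,
    )
]

# 'cls' tells json.dumps to call LepJsonEncoder.default for unknown types.
dumped = json.dumps(episodes, cls=LepJsonEncoder, indent=4)
restored = json.loads(dumped, object_hook=as_lep_episode)
```

Returning `obj.__dict__` keeps the encoder short, at the cost of writing every instance attribute to the file; a decoder along the lines of `as_lep_episode` only works while the JSON keys stay in step with the constructor arguments.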