Introduces external depth (#74) & a few fixes (incl. #69) #146
Great work. Some nitpicks here and there. Thanks a lot.
You need to fix the coding style (use rustfmt).
Please rebase on top of
526a5c5 to 1be1f85
Please rebase your branch on top of master and squash your commits into one or two commits.
@marchellodev Can you still work on this? If not, I can rebase and add tests for you.
@Skallwar I would really appreciate it! :) I tried to do that a few days ago, but I kept stumbling upon errors. I'm kinda new to git, especially to those sophisticated operations. Thanks again!
Introduces --edepth flag for external domain depth

* Transforms urls that start with `//` into full https links for more accurate detection of external links
* Fixes panic when url starts with `///`
* Readme: Added explanation of the `--edepth` flag
* --edepth -> --ext-depth
* Refactors code to avoid repetition (normalizing urls)
* Improved code documentation (url normalization)
* README.md upd: --ext-depth
* Minor docs changes (--depth)
* rustfmt coding style
* Updated `--ext_depth` command docs
* Small code documentation improvements

Co-authored-by: CohenArthur <[email protected]>
Co-authored-by: Esteban Blanc <[email protected]>
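To make the URL-normalization change easier to review, here is a minimal, hypothetical sketch of what the commit message describes: a scheme-relative link such as `//en.wikipedia.org/...` becomes a full `https://` link, and a host-less link yields `None` instead of a panic. The function name `normalize_link` and the choice to collapse extra leading slashes (so `///tools.wmflabs.org/` is treated like `//tools.wmflabs.org/`) are assumptions, not SuckIT's actual code.

```rust
/// Hypothetical helper, not SuckIT's real implementation.
/// Rewrites a scheme-relative link (`//host/path`) into an absolute
/// `https://` URL. Other links are returned unchanged. Returns `None`
/// when a scheme-relative link has no host, so the caller can skip it
/// instead of panicking.
fn normalize_link(link: &str) -> Option<String> {
    if !link.starts_with("//") {
        // Relative paths and absolute URLs are not touched here.
        return Some(link.to_string());
    }
    // Assumption: extra leading slashes are collapsed, so `///host/`
    // is handled like `//host/`.
    let rest = link.trim_start_matches('/');
    // The host ends at the first '/', '?' or '#' (or at the end of the string).
    let host_len = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    if host_len == 0 {
        // Nothing usable as a host, e.g. "///" or "//?q=1".
        return None;
    }
    Some(format!("https://{}", rest))
}

fn main() {
    let links = [
        "//en.wikipedia.org/wiki/Rust",
        "///tools.wmflabs.org/",
        "///",
        "images/logo.png",
    ];
    for link in links.iter() {
        println!("{:?} -> {:?}", link, normalize_link(link));
    }
}
```

Returning `Option` rather than calling `unwrap()` on a parse result is what lets the crawler skip a malformed link like `///` and keep going.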
Codecov Report
@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
+ Coverage   62.54%   62.62%   +0.07%
==========================================
  Files          16       17       +1
  Lines         558      610      +52
==========================================
+ Hits          349      382      +33
- Misses        209      228      +19
@CohenArthur are you ok with this?
Looks good :D Thanks a lot @marchellodev
@marchellodev Thanks again, excellent work!
Motivation
A lot of modern websites rely on external domains (usually CDN domains) for their CSS, JS, images, and other resources. Since SuckIT does not yet support downloading data from external domains (apart from a bug, which I fixed, where `//en.wikipedia.org` was treated as a relative path), it is impossible to properly download big and complex websites (#74).
Also, this patch fixes a panic when trying to parse URLs like `///tools.wmflabs.org/`, which returns an `Empty host` error. I encountered this while trying to download Wikipedia. So, I think this PR should also close #69.
Notes
I have almost no experience with Rust, and I haven't implemented tests for the changes yet (I'm not sure what the best way to do this is). So, please review the code with extra scrutiny :). However, I have tested it on a few websites, and everything seems to work properly.
Also, `--edepth` (external depth) does not have a short flag, since `-e` is already used for the exclude pattern. I'm not sure how this parameter should be renamed so that a short flag can exist.
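For reviewers, here is a minimal sketch of one way a separate external-depth budget could be enforced. The `Hops` struct, the `next_hops` function, the exact-string host comparison (which treats subdomains as external), and the rule that only hops onto other hosts count against the external budget are all hypothetical assumptions, not the logic this PR actually implements.

```rust
/// Hypothetical per-page hop counters (not SuckIT's real types).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Hops {
    depth: u32,     // hops taken within the start domain
    ext_depth: u32, // hops taken onto external domains
}

/// Decides whether a link found on a page with counters `current` should be
/// downloaded. Links to the start host spend the normal depth budget;
/// links to any other host spend the external budget instead.
/// Returns the counters for the linked page, or `None` if a budget is exceeded.
fn next_hops(
    start_host: &str,
    link_host: &str,
    current: Hops,
    max_depth: u32,
    max_ext_depth: u32,
) -> Option<Hops> {
    // Assumption: hosts are compared by exact string equality.
    let next = if link_host == start_host {
        Hops { depth: current.depth + 1, ..current }
    } else {
        Hops { ext_depth: current.ext_depth + 1, ..current }
    };
    if next.depth <= max_depth && next.ext_depth <= max_ext_depth {
        Some(next)
    } else {
        None
    }
}

fn main() {
    let start = Hops { depth: 0, ext_depth: 0 };
    // A stylesheet hosted on a CDN, one external hop from the start page.
    println!("{:?}", next_hops("example.com", "cdn.example.net", start, 2, 1));
    // With an external budget of 0, the same link is skipped.
    println!("{:?}", next_hops("example.com", "cdn.example.net", start, 2, 0));
}
```

Keeping two independent counters means that with an external budget of 0 this model never leaves the start domain, matching SuckIT's behaviour before this change as described under Motivation.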