This repository has been archived by the owner on Nov 6, 2023. It is now read-only.

Various issues in Python package metadata URLs #18867

Closed
jayvdb opened this issue Feb 8, 2020 · 4 comments

Comments

@jayvdb
Contributor

jayvdb commented Feb 8, 2020

I've done a crawl of a significant subset of Python-related packages and their metadata, and checked the HTTPS status of their websites. I am interested in which of these are problems suitable for fixing with https-everywhere.

Higher-than-average technical skill can be assumed, as these domains are/were used by software producers, and I intend to inform some of the developers. However, many are old, unmaintained packages that are still heavily used, where I doubt any fix would be prompt, so opinions on how long to wait for upstream fixes before adding https-everywhere rules would be appreciated. I don't see much advice about the process for additions/modifications in the docs. At what point do users' needs outweigh trying to get upstream to fix the problem? E.g. one of the items below is packages.python.org, which is quite a prominent site; I've reported it upstream but have not seen any response yet (I will try to fast-track it by reporting it closer to the admins).

  • parked domains with HTTPS problems
  • HTTPS very slow while HTTP OK
  • nothing on HTTPS (I guess these are out of scope for HTTPS Everywhere?)
  • wrong cert
  • GitHub Pages
  • readthedocs.io
  • self-signed
  • expired
  • strict cert verify failure: OK in Firefox, not OK with python requests in secure mode
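Those categories came out of a crawl roughly along these lines; a minimal sketch using only the Python standard library (the helper names and category strings are mine, not from any existing tool):

```python
# Sketch of the crawl described above: take each project URL from package
# metadata, try the HTTPS variant, and bucket the failure mode.
import ssl
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit

def https_variant(url: str) -> str:
    """Rewrite an http:// URL to its https:// counterpart."""
    parts = urlsplit(url)
    return urlunsplit(("https",) + tuple(parts[1:]))

def classify(url: str, timeout: float = 10.0) -> str:
    """Fetch the HTTPS variant of `url` and return a rough category."""
    try:
        urllib.request.urlopen(https_variant(url), timeout=timeout)
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, ssl.SSLCertVerificationError):
            return "wrong/expired/self-signed cert"
        if isinstance(exc.reason, TimeoutError):
            return "https very slow"
        return "nothing on https"
    except TimeoutError:
        return "https very slow"
    return "https ok"
```

Distinguishing "parked domain" from a live site would need an extra content check on the HTTP response, which is omitted here.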

@jayvdb jayvdb changed the title Various URLs issues in Python package metadata Various issues in Python package metadata URLs Feb 8, 2020
@jayvdb
Contributor Author

jayvdb commented Feb 8, 2020

A bit of analysis indicates https-everywhere doesn't like explicitly downgrading from HTTPS to HTTP, even when necessary. I am drawing that conclusion mostly from the fact that:

  • where the checker has identified errors on HTTPS, the entire ruleset has been disabled rather than adding rules to downgrade from HTTPS to HTTP, and
  • there are very few https->http rules.

There are zero enabled rules which match from="https:... AFAICS; the following are all disabled or false positives, I believe.

> git grep -n '\(from="^https\|to="http:\)'
Cato-Institute.xml:44:  <!-- <rule from="^https://www\.cato\.org/([^/]+/?(?:[^/]+/?)?)?$"
Cato-Institute.xml:45:          to="http://www.cato.org/$1" downgrade="1" /> -->
Epson.xml:71:   <rule from="^https://(?:www\.)?epson\.com/((?:[a-zA-Z][a-zA-Z\d]+){1})$"
Fasthosts.xml:34:               <rule from="^https://www\.fasthosts\.co\.uk/js/track\.js"
Tesco.xml:177:  <!--rule from="^https://secure\.tesco\.com/"
Tesco.xml:178:          to="http://www.tesco.com/" downgrade="1" /-->
Yandex.xml:641: <!--rule from="^https://static-maps\.yandex\.ru/"
Yandex.xml:642:         to="http://static-maps.yandex.ru/" downgrade="1" /-->

There are 3518 exclusion patterns across 24931 ruleset files, so there is a decent attempt at defining HTTP-only resources, but it seems that mostly occurs where other parts of the same domain have http->https rules.

> git grep 'exclusion pattern="^http:' | wc -l
3518
> ls | wc -l
24931
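For context, the exclusion mechanism counted above looks roughly like this (a hypothetical ruleset; the domain and path are invented for illustration):

```xml
<ruleset name="Example Site">
  <target host="example.com" />
  <target host="www.example.com" />

  <!-- This path is only available over plain HTTP, so leave it alone -->
  <exclusion pattern="^http://example\.com/legacy/" />

  <rule from="^http://(?:www\.)?example\.com/"
        to="https://www.example.com/" />
</ruleset>
```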

Am I deriving too much policy/process guidance from the existing dataset?

@pipboy96
Contributor

@jayvdb We try to never touch HTTPS requests if at all possible, and explicitly disallowed downgrading HTTPS to HTTP quite a long time ago.

@jayvdb
Contributor Author

jayvdb commented Feb 11, 2020

OK, good to know that downgrading rules are not permitted; not surprising. That eliminates one set of policy questions. In my next run of the job, I'll pay more attention to where my logic achieves the http->https transition successfully, and try to get those transitions into the rulesets where they are missing.

Many of the items I listed above can be fixed by replacing http/https at the custom domain with https at github.io, readthedocs.io, or another custom domain, and I see this happening frequently in existing rules. So there is still the policy/process question of how long to wait for 'upstream' to fix their website before adding rules here to work around the problems.
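A rule of that kind, rewriting a broken custom domain to the working github.io host behind it, might look like this (hypothetical project name, invented for illustration):

```xml
<ruleset name="Example Project">
  <target host="example-project.org" />

  <!-- The custom domain serves a wrong/expired cert; the GitHub Pages
       host behind it has a valid one, so redirect there instead -->
  <rule from="^http://example-project\.org/"
        to="https://example-project.github.io/" />
</ruleset>
```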

@cjwelborn

I closed the issue related to my domain because I was able to fix it. The combination of GoDaddy and GitHub Pages was a pain for a long time when trying to enable HTTPS. The DNS management interface is minimal, and they expect you to be a DNS expert. Fortunately, there are better tutorials, bug reports, and help topics on the issue now.

4 participants