set GET request header cookie #336

Merged: 1 commit merged into flathunters:main on Mar 20, 2023


BarisYazici

This PR lets users set the cookie header on GET requests, which enables them to bypass the bot detection script used by immoscout24 (#302). The cookie should be copied from a user's logged-in session. Once the cookie is compromised, a new one should be created.
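As a rough illustration of the idea, here is a minimal sketch of attaching a copied cookie string to a GET request. It is not the code from this PR: `build_get_request`, `DEFAULT_USER_AGENT` and the example URL are hypothetical names chosen for the sketch, and it uses only the standard library rather than whatever HTTP client flathunter uses.

```python
from typing import Optional
from urllib.request import Request

# Placeholder value; a real crawler would likely send a full browser string.
DEFAULT_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_get_request(url: str, cookie: Optional[str] = None) -> Request:
    """Build (but do not send) a GET request, optionally with a Cookie header.

    Hypothetical helper for illustration only.
    """
    headers = {"User-Agent": DEFAULT_USER_AGENT}
    if cookie:
        # The whole cookie string copied from the browser goes into one header.
        headers["Cookie"] = cookie
    return Request(url, headers=headers, method="GET")


# Example: the cookie value would come from the configuration file.
request = build_get_request("https://example.com/search", cookie="name=value; other=value")
```

Leaving the cookie out simply produces a request without a `Cookie` header, so the crawler would behave as it did before when no cookie is configured.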

@BarisYazici BarisYazici temporarily deployed to test March 14, 2023 08:41 — with GitHub Actions Inactive
@codders

codders commented Mar 14, 2023

Hey! Thanks so much for this - this looks great. A couple of questions I have.

  1. You say 'Copy the ImmoScout cookie' - can you say which cookie is important? Looking at the code, it seems like you copy the whole cookie header to the YAML file. Maybe you could be a bit more explicit in the comments about what is required.
  2. Have you tried logging in with the headless Chrome webdriver? Does that also hit the bot detection? I imagine it might be more sustainable to store the Immoscout account username and password in the config and have flathunter log in, if the bot detection doesn't mess that up. But maybe that doesn't work, or maybe it's a bunch of effort to find out.

Thanks again for the contribution!

@codecov

codecov bot commented Mar 14, 2023

Codecov Report

Merging #336 (54c58d4) into main (fb62871) will increase coverage by 0.05%.
The diff coverage is 40.00%.

❗ Current head 54c58d4 differs from pull request most recent head 74ca7e4. Consider uploading reports for the commit 74ca7e4 to get more accurate results

@@            Coverage Diff             @@
##             main     #336      +/-   ##
==========================================
+ Coverage   64.77%   64.82%   +0.05%     
==========================================
  Files          37       37              
  Lines        1953     1959       +6     
  Branches      266      268       +2     
==========================================
+ Hits         1265     1270       +5     
  Misses        623      623              
- Partials       65       66       +1     
Impacted Files                         Coverage Δ
flathunter/crawl_immobilienscout.py    25.39% <ø> (ø)
flathunter/abstract_crawler.py         39.07% <40.00%> (-0.39%) ⬇️

... and 3 files with indirect coverage changes


@BarisYazici
Author

You are welcome, hopefully it will be helpful.
I didn't test all the cases. For the username/password login idea, I implemented something like that as well by simulating the browser clicks with Playwright. It worked fine for me. But I don't know how stable it will be; the bot detection might also block the login page.
For the first question, as you pointed out, I copied the whole cookie to the config.yaml. I am not sure which cookie corresponds to the user ID. I could look into that as well. But I should say that this solution does not work 100% of the time either. It still gets detected as a bot in some cases, but not in others.

@codders

codders commented Mar 15, 2023

Okay. Would be great if you could update the comments / documentation to make that a little clearer.

The PR is currently blocked because of a couple of linter issues:

************* Module flathunter.crawl_wggesucht
flathunter/crawl_wggesucht.py:94:4: W0221: Number of parameters was 6 in 'Crawler.get_soup_from_url' and is now 5 in overriding 'CrawlWgGesucht.get_soup_from_url' method (arguments-differ)
************* Module flathunter.abstract_crawler
flathunter/abstract_crawler.py:66:0: C0301: Line too long (107/100) (line-too-long)
flathunter/abstract_crawler.py:66:4: R0913: Too many arguments (6/5) (too-many-arguments)

Do you want to take a look at those, or should I?
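For readers unfamiliar with these warnings, here is one possible way to address W0221 and R0913 together, sketched under assumptions: the cookie is stored on the instance instead of being passed as an extra method parameter, and the override keeps the same parameters as the base method. The classes below borrow the names from the lint output but are simplified stand-ins with invented signatures, and this is not necessarily how the PR resolved the issues.

```python
import inspect
from typing import Dict, Optional


class Crawler:
    """Simplified stand-in for a base crawler (signatures are invented)."""

    def __init__(self, cookie: Optional[str] = None) -> None:
        # Keeping the cookie on the instance avoids adding a method parameter,
        # which is what pushes a signature over pylint's argument limit (R0913).
        self.cookie = cookie

    def request_headers(self) -> Dict[str, str]:
        """Return the headers to send, including the cookie if one is set."""
        headers = {"User-Agent": "example-agent"}
        if self.cookie:
            headers["Cookie"] = self.cookie
        return headers

    def get_soup_from_url(self, url: str, driver=None) -> str:
        """Describe the request instead of performing it (sketch only)."""
        return f"GET {url} with headers {sorted(self.request_headers())}"


class CrawlWgGesucht(Crawler):
    """Stand-in subclass whose override matches the base signature."""

    def get_soup_from_url(self, url: str, driver=None) -> str:
        # Same parameters as the base method, so arguments-differ (W0221)
        # has nothing to complain about.
        return super().get_soup_from_url(url, driver)


# The two signatures compare equal, which is the property W0221 checks for.
signatures_match = inspect.signature(Crawler.get_soup_from_url) == inspect.signature(
    CrawlWgGesucht.get_soup_from_url
)
```

The trade-off is that the cookie becomes part of the crawler's state rather than of each call, which fits a value read once from the configuration.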

@BarisYazici BarisYazici temporarily deployed to test March 20, 2023 18:40 — with GitHub Actions Inactive
@codders

codders commented Mar 20, 2023

Awesome - thanks so much for the fixes! 🎉

@codders codders merged commit 9788996 into flathunters:main Mar 20, 2023
@prismspecs

May I ask how the cookie should be formatted? As in,

immoscout_cookie: "2:pJP9F ... OU4Q="
or without quotes? Or should it have "value:" preceding it?

@BarisYazici
Author

May I ask how the cookie should be formatted? As in,
immoscout_cookie: "2:pJP9F ... OU4Q="
or without quotes? Or should it have "value:" preceding it?

With the quotes. You don't need the "value:" prefix.

@infctr

infctr commented Mar 25, 2023

Hi! Would it be possible to add IS24 cookie to env variables so that it can be used with gcloud? 🙏
config.yaml seems to be ignored with gcloud builds

@codders

codders commented Mar 30, 2023

@infctr Is this request now redundant?

@infctr

infctr commented Mar 30, 2023

You're right! Got it working with gcloud

@BarisYazici
Author

@infctr I kind of forced Google Cloud to use the config.yaml by removing it from the dockerignore. I guess you could also add immoscout_cookie as an environment variable in the Env class in the config.py file.
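The environment-variable idea could look roughly like the sketch below. It makes several assumptions: the variable name `FLATHUNTER_IS24_COOKIE` and the helper `resolve_cookie` are hypothetical, the config is treated as a plain mapping, and nothing here reflects the real structure of the Env class in config.py.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name, chosen only for this sketch.
COOKIE_ENV_VAR = "FLATHUNTER_IS24_COOKIE"


def resolve_cookie(
    config: Mapping[str, object],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the cookie from the environment, else from the config mapping.

    `environ` defaults to os.environ; it is a parameter so the lookup can be
    exercised without touching the real process environment.
    """
    env = os.environ if environ is None else environ
    from_env = env.get(COOKIE_ENV_VAR)
    if from_env:
        # A non-empty environment value takes precedence over config.yaml.
        return from_env
    from_config = config.get("immoscout_cookie")
    return str(from_config) if from_config else None


# Example: the config file supplies a value and the environment overrides it.
cookie = resolve_cookie(
    {"immoscout_cookie": "from-yaml"},
    {COOKIE_ENV_VAR: "from-env"},
)
```

Giving the environment precedence means a deployment can rotate the cookie without rebuilding an image that bundles config.yaml.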
