"NoneType" in title_el.get("href") when scraping kleinanzeigen #515

Open
alvarnydev opened this issue Jan 17, 2024 · 5 comments
alvarnydev commented Jan 17, 2024

Hi you lovely people!

I'm currently running into an issue when scraping kleinanzeigen: the bot sometimes fails to get the link from the listing it is parsing. It works for a while and then eventually breaks. It looks like this for me:

[Image: Screenshot 2024-01-17 at 13 06 53]

I looked through the existing and past issues and didn't find anything similar. Nevertheless, have you guys maybe seen this before? I run the Docker image of flathunter using docker compose on an Ubuntu 22 machine with the following config:

wghunter:
    build: .
    platform: linux/amd64
    command: python flathunt.py
    restart: always
    environment:
      - FLATHUNTER_TARGET_URLS=https://www.kleinanzeigen.de/s-frankfurt-am-main/wg/k0l4292;https://www.wg-gesucht.de/wg-zimmer-in-Frankfurt-am-Main.41.0.1.0.html?csrf_token=bf76f1e4c7392fd9aeadc109872a8fb14038151b&offer_filter=1&city_id=41&sort_order=0&noDeact=1&categories%5B%5D=0&rMax=1000&wgMxT=3&wgAge=28&wgSmo=2&exc=2&img_only=1
      - FLATHUNTER_DATABASE_LOCATION=./dbs/wgs/
      - FLATHUNTER_LOOP_PERIOD_SECONDS=120
      - FLATHUNTER_MESSAGE_FORMAT={title} \#CR# > Zimmer {rooms} \#CR# > Größe {size} \#CR# > Preis {price} \#CR# > Ort {address} \#CR# > Link {url}
      - FLATHUNTER_NOTIFIERS=telegram
      - FLATHUNTER_TELEGRAM_BOT_TOKEN=<...>
      - FLATHUNTER_TELEGRAM_RECEIVER_IDS=<...>
      - FLATHUNTER_HEADLESS_BROWSER=yes
    volumes:
      - ./:/usr/src/app

codders commented Jan 17, 2024

Hi @alvarnydev,

I've not seen that before, no. It seems to pick out the title elements (at least on my crawls) without complaining. It looks like the search for title_el fails, so title_el is None by the time .get("href") is called on it. What does the HTML look like there?
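
To illustrate the failure mode described above, here is a minimal sketch of a None guard around the href lookup. The helper name `safe_href` is hypothetical and is not part of flathunter; it only assumes that the title element, when present, offers a `.get("href")` method, as in the issue title.

```python
from typing import Optional


def safe_href(title_el) -> Optional[str]:
    """Return the href of a title element, or None if the element is missing.

    Hypothetical helper for illustration only; not flathunter's actual code.
    """
    if title_el is None:
        # The element search found nothing, so there is no link to read.
        return None
    return title_el.get("href")


# Calling .get on None is what produces the "NoneType" error from the issue.
try:
    None.get("href")
except AttributeError as err:
    print(f"unguarded lookup fails: {err}")

# With the guard, a missing element yields None instead of raising.
print(safe_href(None))
```

A caller could then skip any listing for which `safe_href` returns None instead of letting the whole crawl crash and restart.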

alvarnydev (Author) commented

Hey, thanks for the comment. I haven't really looked into it much further, because the docker compose config just restarts the container and it works fine from there until it eventually crashes again, in perpetuum. When I have the time, I'll look into it more.

PlanetDyna commented

Unfortunately, I've got the same problem.

zahnech commented Sep 26, 2024

Same issue.
[Image]

jukoson commented Sep 26, 2024

Here's how I parse Kleinanzeigen. Maybe it helps in providing a fix:

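# soup is the BeautifulSoup object for the search results page
expose_ids = soup.find_all("article", class_="aditem")
for x, expose in enumerate(expose_ids):
    # the listing title is the element with class "ellipsis"
    title = expose.find(class_="ellipsis")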
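
The snippet above could be extended into a loop that skips listings without a usable title link rather than crashing. The following is a rough sketch under stated assumptions: the sample HTML is invented for illustration and is not real Kleinanzeigen markup, `extract_listings` is a hypothetical function name, and only the `aditem` and `ellipsis` class names come from the snippet above.

```python
from bs4 import BeautifulSoup

# Invented sample markup for illustration; real pages will differ.
SAMPLE_HTML = """
<article class="aditem"><a class="ellipsis" href="/s-anzeige/zimmer-1/111">Zimmer 1</a></article>
<article class="aditem"><span class="ellipsis">Ad without link</span></article>
<article class="aditem"></article>
"""


def extract_listings(html):
    """Collect title and URL for each listing, skipping incomplete entries."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for expose in soup.find_all("article", class_="aditem"):
        title_el = expose.find(class_="ellipsis")
        if title_el is None:
            # No title element at all: this is the case that raises the
            # "NoneType" error if .get("href") is called unguarded.
            continue
        href = title_el.get("href")
        if href is None:
            # A title element exists but carries no link.
            continue
        listings.append({"title": title_el.get_text(strip=True), "url": href})
    return listings


print(extract_listings(SAMPLE_HTML))
```

Skipping is only one possible choice. Logging the offending HTML before continuing would also help answer the earlier question of what the markup looks like when the lookup fails.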
