"NoneType" in title_el.get("href") when scraping kleinanzeigen #515

alvarnydev · 2024-01-17T12:12:21Z

Hi you lovely people!

I'm currently running into an issue when scraping kleinanzeigen: the bot sometimes fails to get the link from the listing it is parsing. It works for a while and eventually breaks. Looks like this for me:

I looked through the existing and past issues and didn't find anything similar. Nevertheless, have you guys maybe seen this before? I run the Docker image of flathunter using docker compose on an Ubuntu 22 machine with the following config:

wghunter:
    build: .
    platform: linux/amd64
    command: python flathunt.py
    restart: always
    environment:
      - FLATHUNTER_TARGET_URLS=https://www.kleinanzeigen.de/s-frankfurt-am-main/wg/k0l4292;https://www.wg-gesucht.de/wg-zimmer-in-Frankfurt-am-Main.41.0.1.0.html?csrf_token=bf76f1e4c7392fd9aeadc109872a8fb14038151b&offer_filter=1&city_id=41&sort_order=0&noDeact=1&categories%5B%5D=0&rMax=1000&wgMxT=3&wgAge=28&wgSmo=2&exc=2&img_only=1
      - FLATHUNTER_DATABASE_LOCATION=./dbs/wgs/
      - FLATHUNTER_LOOP_PERIOD_SECONDS=120
      - FLATHUNTER_MESSAGE_FORMAT={title} \#CR# > Zimmer {rooms} \#CR# > Größe {size} \#CR# > Preis {price} \#CR# > Ort {address} \#CR# > Link {url}
      - FLATHUNTER_NOTIFIERS=telegram
      - FLATHUNTER_TELEGRAM_BOT_TOKEN=<...>
      - FLATHUNTER_TELEGRAM_RECEIVER_IDS=<...>
      - FLATHUNTER_HEADLESS_BROWSER=yes
    volumes:
      - ./:/usr/src/app

codders · 2024-01-17T16:20:47Z

Hi @alvarnydev,

I've not seen that before, no. It seems to pick out the title elements (at least on my crawls) without complaining. Looks like the search for the title_el fails. How does the HTML look there?
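For anyone reading along: the pattern described here is that the element search returns None and the following .get("href") call then fails. Below is a minimal sketch of a guard, not flathunter's actual code. The helper name `safe_href` and its `context_html` parameter are hypothetical, and it assumes the element behaves like a BeautifulSoup Tag (anything with a `.get` method). It also logs a snippet of the HTML, which might help answer the question of what the markup looks like when the lookup fails.

```python
import logging

logger = logging.getLogger(__name__)


def safe_href(title_el, context_html=""):
    """Return the element's href, or None if the lookup found nothing.

    Hypothetical helper for illustration; not part of flathunter.
    `title_el` is whatever the element search returned: a Tag-like
    object with .get, or None when nothing matched.
    """
    if title_el is None:
        # Log a snippet of the HTML so the failing listing can be inspected.
        logger.warning("No title element found in: %s", context_html[:500])
        return None
    return title_el.get("href")
```

With a guard like this, a listing without a title element would be skipped and logged rather than crashing the whole crawl.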

alvarnydev · 2024-01-17T16:23:36Z

Hey, thanks for the comment. I haven't really looked into it much further because the docker compose config just restarts the container and it works fine from there, until it eventually crashes again, in perpetuum. When I have the time I'll look into it more.

PlanetDyna · 2024-03-01T10:39:25Z

Unfortunately, I got the same problem.

zahnech · 2024-09-26T09:18:16Z

Same issue.

jukoson · 2024-09-26T09:39:48Z

Here's how I parse Kleinanzeigen. Maybe it helps in providing a fix:

expose_ids = soup.find_all("article", class_="aditem")
for x, expose in enumerate(expose_ids):
    title = expose.find(class_="ellipsis")
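To make the snippet above easier to try out, here is a self-contained sketch built on the same two selectors ("article" with class "aditem", and class "ellipsis"). It requires the beautifulsoup4 package. The sample markup is made up for illustration, the function name `parse_listings` is hypothetical, and the assumption that the "ellipsis" element carries the href may not match the real Kleinanzeigen pages.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; the real Kleinanzeigen HTML may differ.
SAMPLE_HTML = """
<article class="aditem"><a class="ellipsis" href="/s-anzeige/zimmer-1/111">Zimmer 1</a></article>
<article class="aditem"><span>Item without a title link</span></article>
<article class="aditem"><a class="ellipsis" href="/s-anzeige/zimmer-2/222">Zimmer 2</a></article>
"""


def parse_listings(html):
    """Collect title and href for each listing, skipping items without a title."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for expose in soup.find_all("article", class_="aditem"):
        title = expose.find(class_="ellipsis")
        if title is None:
            # find() returns None when nothing matches; skip instead of crashing.
            continue
        listings.append(
            {"title": title.get_text(strip=True), "url": title.get("href")}
        )
    return listings


listings = parse_listings(SAMPLE_HTML)
```

The second sample item has no matching element, so it is skipped instead of raising the "NoneType" error from the issue title.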
