Request to add Wycombe / Chiltern Council #154

Closed
adamcarter81 opened this issue Jan 6, 2023 · 16 comments
Labels: council request (A new council request)

Comments

@adamcarter81

adamcarter81 commented Jan 6, 2023

Name of Council

Chiltern Council

Example Postcode

HP10 9TX

Additional Information

First page: https://chiltern.gov.uk/collection-dates requires you to enter a postcode, press Submit, then choose a house number and press Submit again.
The next page gives the results.

The first page calls this URL after the postcode is entered:
https://chiltern.gov.uk/apiserver/postcode?callback=jQuery2240753028558035197_1673034357951&jsonrpc={"id":+25266114,"method":+"postcodeSearch","params":+{"provider":+"",++"postcode":+"hp10+9tx"}}&_=1673034357952
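
A minimal sketch of replaying that lookup with requests might look like the following (the endpoint, method name and payload come straight from the URL above; the JSONP handling and the response structure are my assumptions, and whether the endpoint answers a plain script at all is another matter):

    import json
    import requests

    # Sketch only: mirror the postcodeSearch call captured above
    payload = {
        "id": 25266114,
        "method": "postcodeSearch",
        "params": {"provider": "", "postcode": "hp10 9tx"},
    }
    resp = requests.get(
        "https://chiltern.gov.uk/apiserver/postcode",
        params={"callback": "jQuery123", "jsonrpc": json.dumps(payload)},
    )

    # The response is JSONP, e.g. jQuery123({...}); strip the callback wrapper
    # before parsing (assumption about the response format)
    body = resp.text
    result = json.loads(body[body.find("(") + 1 : body.rfind(")")])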

It seems to set a session ID after choosing a postcode/house number; this is then set as a cookie and passed to the next page, which displays the results for the next bin collection days (there is no calendar that I could find).

PS: This plugin is awesome! I wish the councils would standardise the data and have a public API!

Edit: #38 looks to be the same website

dp247 added the council request label on Jan 7, 2023
@dp247
Collaborator

dp247 commented Jan 7, 2023

Ah yes, Chiltern. Otherwise known as the bane of my Python skills 😆

@adamcarter81
Author

I've been getting nowhere for a while trying to get this to work in HA, so I was really happy when I found this project! Only to realise others have been here before :/

@dp247
Collaborator

dp247 commented Jan 7, 2023

It's been a while since I looked at it, so I'll give it another go 😁

@preator67
Contributor

preator67 commented Jan 27, 2023

@dp247 I might have a working implementation of this, utilising Selenium. It needs a little bit of work to bring it into line with the rest of the package, but I'm happy to do this if you don't already have something in the pipeline?

@dp247
Collaborator

dp247 commented Jan 30, 2023

@preator67 I've delegated it to @robbrad to look at. I'm not entirely against the idea, but I'm not sure how Selenium's dependencies would change/interfere with the project.

@robbrad
Owner

robbrad commented Jan 30, 2023

I'll take a look at this tomorrow night 👍

@robbrad
Owner

robbrad commented Jan 31, 2023

I'm not having the best fun with this one - the following should work as far as I can see, but no result is returned - they must have some rate limiting set up.

  # Requires (not shown in this snippet): import requests;
  # from bs4 import BeautifulSoup; from urllib.parse import urlparse, parse_qs
  def parse_data(self, page: str, **kwargs) -> dict:
      # Set up our session with browser-like headers
      s = requests.Session()
      headers = {'Host': 'chiltern.gov.uk',
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0',
          'Accept': 'text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01',
          'Accept-Language': 'en-GB,en;q=0.5',
          'Accept-Encoding': 'gzip, deflate, br',
          'X-Requested-With': 'XMLHttpRequest',
          'Connection': 'keep-alive',
          'Sec-Fetch-Dest': 'empty',
          'Sec-Fetch-Mode': 'cors',
          'Sec-Fetch-Site': 'same-origin'}

      s.headers.update(headers)

      first_load = s.get('https://chiltern.gov.uk/collection-dates')

      soup = BeautifulSoup(first_load.text, features="html.parser")
      soup.prettify()
      action = soup.find('form', id='COPYOFECHOCOLLECTIONDATES_FORM').get('action')
      parsed_url = urlparse(action)
      captured_values = parse_qs(parsed_url.query)

      form_data = {
          'COPYOFECHOCOLLECTIONDATES_PAGESESSIONID': captured_values['pageSessionId'][0],
          'COPYOFECHOCOLLECTIONDATES_SESSIONID': captured_values['fsid'][0],
          'COPYOFECHOCOLLECTIONDATES_NONCE': captured_values['fsn'][0],
          'COPYOFECHOCOLLECTIONDATES_VARIABLES': 'e30=',
          'COPYOFECHOCOLLECTIONDATES_PAGENAME': 'ADDRESSSELECTION',
          'COPYOFECHOCOLLECTIONDATES_PAGEINSTANCE': '0',
          'COPYOFECHOCOLLECTIONDATES_ADDRESSSELECTION_ADDRESSSELECTIONPOSTCODE': 'HP10 9TX',
          'COPYOFECHOCOLLECTIONDATES_ADDRESSSELECTION_ADDRESSSELECTIONADDRESS': 14,
          'COPYOFECHOCOLLECTIONDATES_ADDRESSSELECTION_SELECTEDADDRESS':'31 STATION ROAD\nLOUDWATER\nHIGH WYCOMBE\nHP10 9TX',
          'COPYOFECHOCOLLECTIONDATES_ADDRESSSELECTION_UPRN':'100081167425',
          'COPYOFECHOCOLLECTIONDATES_FORMACTION_NEXT': 'COPYOFECHOCOLLECTIONDATES_ADDRESSSELECTION_NAV1'
          }
      inter_request = s.post(action, data=form_data)
      bin_data_page = s.get(f"https://chiltern.gov.uk/collection-dates?pageSessionId={captured_values['pageSessionId'][0]}&fsn={captured_values['fsn'][0]}")

      data = {"bins": []}

@preator67 - are you using https://selenium-python.readthedocs.io/ ? As long as:

  1. It's in Python
  2. It has an integration test
  3. It has unit test coverage
  4. And above all produces JSON (see the sketch below)

I'm more than happy to have Selenium do the heavy lifting - do you want to submit a PR?
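
For reference, the dict the parsers in this thread build (which then gets serialised to JSON) looks roughly like this - the bin type names and dates below are made up, only the structure matters:

    # Illustrative only - type names and dates are invented
    data = {
        "bins": [
            {"type": "Domestic Waste", "collectionDate": "17/02/2023"},
            {"type": "Recycling", "collectionDate": "24/02/2023"},
        ]
    }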

@preator67
Contributor

@robbrad yes, that’s the one. What I have working ticks the boxes on 1 and 4. Give me a little time to fully address/check 2 and 3, and I’ll submit a PR

@robbrad
Owner

robbrad commented Feb 4, 2023

I can help on 2/3 if you need it or get stuck 👍

Check contributing.md if you need guidance, or look at the Cheshire East council and previous PRs.

I'm actually quite excited to see some Selenium in action.

@preator67
Contributor

Sorry to be slow on this. I'm struggling with the integration/unit tests, but this may be because I've had to go about things abnormally. @robbrad I would therefore appreciate some input on whether this approach is workable with the existing framework.

Firstly, I've had to execute the file as:
python collect_data.py Chilterns https://chiltern.gov.uk/collection-dates -p "HP14 4LA" -n "HUGHENDEN MANOR, MANOR ROAD, HUGHENDEN VALLEY, HIGH WYCOMBE" -s SKIP_GET_URL

The number needs to be the address as it appears on the council website, which isn't ideal. I've also had to add the SKIP_GET_URL argument so I can use a custom get_data function. This initially returns an error:
UnboundLocalError: local variable 'bin_data_dict' referenced before assignment
but this can be solved by assigning the else statement to bin_data_dict on line 60 of get_bin_data.py (roughly as sketched below) - although I'm not sure if this is an acceptable change?
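
To illustrate what I mean, this is only a hypothetical reproduction of the pattern, not the actual code in get_bin_data.py:

    # Hypothetical sketch of the error pattern, not the real get_bin_data.py code
    def run(council, url, skip_get_url, **kwargs):
        if not skip_get_url:
            page = council.get_data(url)
            bin_data_dict = council.parse_data(page, **kwargs)
        else:
            # Without an assignment here, the return below raises
            # UnboundLocalError whenever -s SKIP_GET_URL is passed
            bin_data_dict = council.parse_data("", **kwargs)
        return bin_data_dict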

The following then produces a JSON in the correct format:

import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.options import Options
from uk_bin_collection.uk_bin_collection.common import *
from uk_bin_collection.uk_bin_collection.get_bin_data import AbstractGetBinDataClass


class CouncilClass(AbstractGetBinDataClass):
    """
    Concrete classes have to implement all abstract operations of the
    base class. They can also override some operations with a default
    implementation.
    """

    def get_data(self, df) -> dict:

        # Create dictionary of data to be returned
        data = {"bins": []}

        # Output collection data into dictionary
        for i, row in df.iterrows():
            dict_data = {
                "type": row['Collection Name'],
                "collectionDate": row['Next Collection Due'],
                        }
    
            data["bins"].append(dict_data)

        return data


    def parse_data(self, page: str, **kwargs) -> dict:

        page = 'https://chiltern.gov.uk/collection-dates'

        # Assign user info
        user_postcode = kwargs.get("postcode")
        user_paon = kwargs.get("paon")
        
        # Set up Selenium to run 'headless'
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        options.add_argument('--no-sandbox')
        options.add_argument('--disable-gpu')
        options.add_argument('--disable-dev-shm-usage')

        # Create Selenium webdriver
        driver = webdriver.Chrome(options=options)
        driver.get(page)

        # Enter postcode in text box and wait
        inputElement_pc = driver.find_element(
            By.ID, "COPYOFECHOCOLLECTIONDATES_ADDRESSSELECTION_ADDRESSSELECTIONPOSTCODE")
        inputElement_pc.send_keys(user_postcode)
        inputElement_pc.send_keys(Keys.ENTER)

        time.sleep(4)

        # Select address from dropdown and wait
        inputElement_ad = Select(driver.find_element(
            By.ID,"COPYOFECHOCOLLECTIONDATES_ADDRESSSELECTION_ADDRESSSELECTIONADDRESS"))

        inputElement_ad.select_by_visible_text(user_paon)
        
        time.sleep(4)

        # Submit address information and wait
        inputElement_bn = driver.find_element(
            By.ID, "COPYOFECHOCOLLECTIONDATES_ADDRESSSELECTION_NAV1_NEXT").click()
        
        time.sleep(4)
       
        # Read next collection information into Pandas
        table = driver.find_element(By.ID, "COPYOFECHOCOLLECTIONDATES_PAGE1_DATES2").get_attribute('outerHTML')
        df = pd.read_html(table, header=[1])
        df = df[0]

        # Quit the headless browser now that the table HTML has been captured
        driver.quit()

        # Parse data into dict
        data = self.get_data(df)

        return data

I believe it is then failing the integration test because it is trying to execute without the -s option, but I'm unclear on how I can add this to the input.json. For instance, I tried:

"Chilterns": { "url": "https://chiltern.gov.uk/collection-dates", "postcode": "HP14 4LA", "house_number": "HUGHENDEN MANOR, MANOR ROAD, HUGHENDEN VALLEY, HIGH WYCOMBE", "SKIP_GET_URL": "SKIP_GET_URL" },

but this did not seem to have any effect?

@robbrad
Owner

robbrad commented Feb 13, 2023

Looks really good @preator67 - to add the ability to take it from the input JSON, you need to add a small change to https://github.com/robbrad/UKBinCollectionData/blob/master/uk_bin_collection/tests/features/steps/validate_council.py#L25

This will then use the switch as specified - if you want to share a repo where you have this, I could try it out?
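
Something along these lines should do it (a rough sketch only - the key name and variable names are just to illustrate the idea, not the exact contents of validate_council.py):

    # Rough illustration only - key names mirror the input.json entry above;
    # the surrounding step code in validate_council.py is not quoted here
    metadata = {
        "url": "https://chiltern.gov.uk/collection-dates",
        "postcode": "HP14 4LA",
        "house_number": "HUGHENDEN MANOR, MANOR ROAD, HUGHENDEN VALLEY, HIGH WYCOMBE",
        "skip_get_url": "SKIP_GET_URL",
    }
    args = [metadata["url"], "-p", metadata["postcode"], "-n", metadata["house_number"]]
    if metadata.get("skip_get_url"):
        args.extend(["-s", "SKIP_GET_URL"])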

@preator67
Contributor

preator67 commented Feb 18, 2023

Thanks for the pointer @robbrad - makes sense when you know where to look. I've submitted a PR. If it's all OK, I can write some info for the Wiki - as this requires a bit more info from the user than other councils.

@robbrad
Owner

robbrad commented Feb 18, 2023

Fantastic work @preator67

@robbrad
Owner

robbrad commented Feb 18, 2023

@preator67 - the integration tests came back saying lxml was missing - does this need adding to the Poetry.toml?

https://robbrad.github.io/UKBinCollectionData/3.9/449/#categories/66e5ec8c5c97ebb7160e51a452a5e3ba/54003f6f2df25442/

@preator67
Contributor

@robbrad Yes, it does - apologies, not sure how that got missed off.
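
(For anyone following along: assuming Poetry manages the dependencies here, running poetry add lxml should add it to pyproject.toml and the lock file.)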

@OliverCullimore
Collaborator

This one looks to be all implemented, closing.
