Realtor.com - element changes #1

Open
crushingbear opened this issue May 15, 2019 · 10 comments

Comments

@crushingbear

Hi there,

Can you give me some input here? I am trying to parse the data-url attributes of elements within the pages.

    soup = bs(requests.get(requestUrl + "/pg-" + str(page), headers=userAgent).text, "html.parser")
    #print(soup)
    pagedList = soup.findAll("div", {"class": "data-wrap"})
    #pagedList = soup.findAll('div')
    numberrec = len(pagedList)
    print(numberrec)
    for propertyTag in pagedList:
        try:
            print(urls.append(a.attrs['data-url']))

            #print(propertyTag.find("data-url")['href'].text)
            #print(propertyTag.find(attrs={'data-url':'href'}).text)
            #print("Start tag:", tag)
        except:
            print("error")

@pangrr
Owner

pangrr commented May 15, 2019

Hi, I didn't expect anyone to be interested in this ancient repo, so this is a surprise.

I haven't used Python for years, so I don't remember the implementation.

Would it help if I solved your problem in JavaScript?

@crushingbear
Author

I'm mainly looking for a solution in Python, but I should be able to translate from JavaScript back to Python.

@pangrr
Owner

pangrr commented May 15, 2019

What exactly do you want as output?

@crushingbear
Author

crushingbear commented May 15, 2019 via email

@pangrr
Owner

pangrr commented May 15, 2019

data-url is an attribute that appears on multiple elements in a page.
Do you want the output to be those elements, or the data-url values?
Which pages are you interested in?

@crushingbear
Author

@pangrr
Owner

pangrr commented May 15, 2019

Do you want the output to be those elements, or the data-url values?

@crushingbear
Author

crushingbear commented May 15, 2019 via email

@crushingbear
Author

I solved my problem. I guess taking a break from it helps. I had to reference the attribute name as unicode, directly on each tag, to obtain the data.

    soup = bs(requests.get(requestUrl + "/pg-" + str(page), headers=userAgent).text, "html.parser")
    #print(soup)
    pagedList = soup.findAll("div", {"class": "data-wrap"})
    #pagedList = soup.findAll('div')
    numberrec = len(pagedList)
    #print(numberrec)
    for propertyTag in pagedList:
        try:
            # Index the tag itself to read its data-url attribute
            dataurl = propertyTag[u'data-url']
            print(dataurl)
            #print(propertyTag.attrs)
            #print(u'data-url'.text)
            #print(propertyTag.find(u"data-url").text)
            #print(propertyTag.find("data-url")['href'].text)
            #print(propertyTag.find(attrs={'data-url':'href'}).text)
            #print("Start tag:", tag)
        except KeyError:
            # Raised when this div has no data-url attribute
            print("error")
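
For anyone landing here later, the same idea can be tried without hitting the site. This is only a minimal sketch, not the repo's code: the `SAMPLE_HTML` markup, the `/listing/...` paths, and the `extract_data_urls` helper are made up for illustration, and the real page structure may differ. It reads `data-url` straight off each `div` with class `data-wrap` and uses `tag.get()` so a missing attribute is skipped instead of raising.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one results page (an assumption,
# not copied from the real site).
SAMPLE_HTML = """
<div class="data-wrap" data-url="/listing/example-1">A</div>
<div class="data-wrap">no data-url on this one</div>
<div class="data-wrap" data-url="/listing/example-2">B</div>
<div class="other" data-url="/listing/ignored">C</div>
"""


def extract_data_urls(html):
    """Return the data-url value of every div.data-wrap that has one."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for tag in soup.find_all("div", class_="data-wrap"):
        # data-url is an attribute of the div itself, so read it from the
        # tag; .get() returns None when the attribute is absent.
        url = tag.get("data-url")
        if url is not None:
            urls.append(url)
    return urls


if __name__ == "__main__":
    for url in extract_data_urls(SAMPLE_HTML):
        print(url)
```

In the loop from the comment above, `propertyTag.get('data-url')` would play the same role as `propertyTag[u'data-url']`, just without needing the try/except.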

@pangrr
Owner

pangrr commented May 15, 2019 via email
