serpscrap.SerpScrap() returns None for some keywords #52

Open
GefenPuravida opened this issue Sep 22, 2019 · 0 comments
GefenPuravida commented Sep 22, 2019

Hi,
Does anybody have an idea why I get data for some keywords but not for others?

For example, with the keyword "dog food":

import serpscrap

keywords = ['dog food']

config = serpscrap.Config()
config.set('scrape_urls', True)

scrap = serpscrap.SerpScrap()
scrap.init(config=config.get(), keywords=keywords)
scrap.as_csv('/tmp/output')
2019-09-22 11:55:14,988 - root - INFO - 
                Going to scrape 2 keywords with 1
                proxies by using 1 threads.
2019-09-22 11:55:14,990 - scrapcore.scraping - INFO - 
        [+] SelScrape[localhost][search-type:normal][https://www.google.com/search?] using search engine "google".
        Num keywords=1, num pages for keyword=[1]
        
2019-09-22 11:55:24,286 - scrapcore.scraper.selenium - INFO - https://www.google.com/search?
2019-09-22 11:55:55,364 - scrapcore.scraping - INFO - 
            [google]SelScrape localhost - Keyword: "dog food" with [1, 2] pages,
            slept 22 seconds before scraping. 1/1 already scraped
            
2019-09-22 11:55:56,767 - scrapcore.scraper.selenium - INFO - Requesting the next page
2/2 keywords processed.
2019-09-22 11:56:01,961 - root - INFO - Scraping URL: https://www.mypetneedsthat.com/best-dry-dog-foods-guide/
2019-09-22 11:56:02,681 - root - INFO - Scraping URL: https://www.businessinsider.com/best-dog-food
2019-09-22 11:56:02,686 - root - INFO - Scraping URL: https://www.akc.org/expert-advice/nutrition/best-dog-food-choosing-whats-right-for-your-dog/
2019-09-22 11:56:02,689 - root - INFO - Scraping URL: https://www.amazon.com/Best-Sellers-Pet-Supplies-Dry-Dog-Food/zgbs/pet-supplies/2975360011
2019-09-22 11:56:02,690 - root - INFO - Scraping URL: https://www.chewy.com/b/food-332
2019-09-22 11:56:26,122 - root - INFO - Scraping URL: https://www.petco.com/shop/en/petcostore/category/dog/dog-food
2019-09-22 11:56:26,123 - root - INFO - Scraping URL: https://www.petflow.com/dog/food
2019-09-22 11:56:26,843 - root - INFO - Scraping URL: https://www.dogfoodadvisor.com/
2019-09-22 11:56:27,735 - root - INFO - Scraping URL: https://www.petsmart.com/dog/food/dry-food/
2019-09-22 11:56:27,737 - root - INFO - Scraping URL: https://www.petsmart.com/dog/food/
2019-09-22 11:56:27,738 - root - INFO - Scraping URL: https://www.purina.com/dogs/dog-food
2019-09-22 11:56:28,635 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=fBABfWqSN2I
2019-09-22 11:56:31,757 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=7P85BMCCboI
2019-09-22 11:56:36,807 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=az0ktsWYydw
2019-09-22 11:56:39,645 - root - INFO - Scraping URL: https://www.youtube.com/watch?v=njJ99wPByy4
2019-09-22 11:56:42,571 - root - INFO - Scraping URL: https://nypost.com/video/homeless-man-and-his-dog-reuniting-is-pure-joy/
2019-09-22 11:56:45,156 - root - INFO - Scraping URL: /aclk?sa=l&ai=DChcSEwjRyYG5h-TkAhUM1WQKHSiFASYYABAAGgJwag&sig=AOD64_2IRYpCakgEzR3BK1oqeuLCVa3mjA&adurl=&rct=j&q=
2019-09-22 11:56:45,157 - root - INFO - Scraping URL: https://www.purina.com/dogs/dog-food
2019-09-22 11:56:45,867 - root - INFO - Scraping URL: https://en.wikipedia.org/wiki/Dog_food
2019-09-22 11:56:45,872 - root - INFO - Scraping URL: https://www.hillspet.com/dog-food
2019-09-22 11:56:45,876 - root - INFO - Scraping URL: https://www.smithsfoodanddrug.com/pl/dog-food/11103
2019-09-22 11:57:10,321 - root - INFO - Scraping URL: https://www.canidae.com/dog-food/
2019-09-22 11:57:10,325 - root - INFO - Scraping URL: https://www.petcarerx.com/dog/food-nutrition
2019-09-22 11:57:11,222 - root - INFO - Scraping URL: https://www.businessinsider.com/best-dog-food
2019-09-22 11:57:11,223 - root - INFO - Scraping URL: https://www.tractorsupply.com/tsc/catalog/dog-food
2019-09-22 11:57:12,249 - root - INFO - Scraping URL: https://www.thehonestkitchen.com/dog-food
2019-09-22 11:57:12,253 - root - INFO - Scraping URL: https://www.boxed.com/products/category/418/dog-food
2019-09-22 11:57:13,171 - root - INFO - Scraping URL: https://lifesabundance.com/category/dogfood.aspx
2019-09-22 11:57:13,174 - root - INFO - Scraping URL: //www.googleadservices.com/pagead/aclk?sa=L&ai=DChcSEwj5_NHFh-TkAhWTr-wKHSgSDVMYABAAGgJwag&ohost=www.google.com&cid=CAASEuRoai4G0R8MNbToVnZKzozmNA&sig=AOD64_10tA_ESFCwAHTPgPUTDsInBgYwEQ&adurl=&rct=j&q=
2019-09-22 11:57:13,178 - root - INFO - Scraping URL: https://freshpet.com/why-freshpet/
2019-09-22 11:57:13,901 - root - INFO - Scraping URL: https://pet-food.thecomparizone.com/?var1=82002114870&var2=381760664839&var4&var5=b&var7=1234567890&utm_source=google&utm_medium=cpc
None
Traceback (most recent call last):
  File "C:\Users\rot\Anaconda3\lib\site-packages\serpscrap\csv_writer.py", line 14, in write
    w.writerow(row)
  File "C:\Users\rot\Anaconda3\lib\csv.py", line 155, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "C:\Users\rot\Anaconda3\lib\csv.py", line 151, in _dict_to_list
    + ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: 'url', 'encoding', 'meta_robots', 'meta_title', 'text_raw', 'last_modified', 'status'
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\Anaconda3\lib\site-packages\serpscrap\csv_writer.py in write(self, file_name, my_dict)
     13                 for row in my_dict[0:]:
---> 14                     w.writerow(row)
     15         except Exception:

~\Anaconda3\lib\csv.py in writerow(self, rowdict)
    154     def writerow(self, rowdict):
--> 155         return self.writer.writerow(self._dict_to_list(rowdict))
    156 

~\Anaconda3\lib\csv.py in _dict_to_list(self, rowdict)
    150                 raise ValueError("dict contains fields not in fieldnames: "
--> 151                                  + ", ".join([repr(x) for x in wrong_fields]))
    152         return (rowdict.get(key, self.restval) for key in self.fieldnames)

ValueError: dict contains fields not in fieldnames: 'url', 'encoding', 'meta_robots', 'meta_title', 'text_raw', 'last_modified', 'status'

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
<ipython-input-16-3f66e8511348> in <module>
      8 scrap = serpscrap.SerpScrap()
      9 scrap.init(config=config.get(), keywords=keywords)
---> 10 scrap.as_csv('/tmp/output')

~\Anaconda3\lib\site-packages\serpscrap\serpscrap.py in as_csv(self, file_path)
    146         writer = CsvWriter()
    147         self.results = self.run()
--> 148         writer.write(file_path + '.csv', self.results)
    149 
    150     def scrap_serps(self):

~\Anaconda3\lib\site-packages\serpscrap\csv_writer.py in write(self, file_name, my_dict)
     15         except Exception:
     16             print(traceback.print_exc())
---> 17             raise Exception

Exception: 

Many thanks!
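
For reference, the ValueError in the traceback is the one csv.DictWriter raises when a row contains keys that are not in its fieldnames. The sketch below is independent of SerpScrap: it reproduces that error with two made-up rows and shows one possible way to write rows with differing keys, by building the header from the union of all keys. The row contents and the helper name write_rows are hypothetical, and whether serpscrap's csv_writer really derives its fieldnames from only part of the results is an assumption, not something the logs confirm.

```python
import csv
import io


def write_rows(rows):
    """Write dicts with differing keys to CSV text (hypothetical helper)."""
    # Build the header from the union of all keys, in first-seen order.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval fills the columns that a given row does not provide.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows: one search-result row and one scraped-URL row with other keys.
rows = [
    {"query": "dog food", "rank": 1},
    {"url": "https://en.wikipedia.org/wiki/Dog_food", "status": "200"},
]

# Reproduce the error: the header is taken from the first row only.
try:
    strict = csv.DictWriter(io.StringIO(), fieldnames=list(rows[0]))
    strict.writerow(rows[1])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

csv_text = write_rows(rows)
```

Passing extrasaction="ignore" to csv.DictWriter would also avoid the exception, at the cost of silently dropping the extra columns. Separately, the lone "None" printed before the traceback most likely comes from the print(traceback.print_exc()) call visible in csv_writer.py: traceback.print_exc() prints the traceback itself and returns None.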
