Remove self links from the links to scrape (i.e. all_links), then build a dictionary of the scraped links. For each scraped link, set active to '1' and also save its engines.
Convert all_data into a dictionary keyed by 'link' for fast lookup.
Then use that dictionary to check each link:
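The first two steps could be sketched roughly as below. This is only an illustration: the helper names drop_self_links and index_by_link, the source_url parameter, and the sample records are hypothetical, and it assumes that a "self link" means a link on the same host as the page the links came from.

```python
from urllib.parse import urlparse


def drop_self_links(all_links, source_url):
    """Return all_links without links that point back to the source host.

    Assumption: a "self link" is any link whose host matches source_url's host.
    """
    source_host = urlparse(source_url).netloc.lower()
    return [
        link for link in all_links
        if urlparse(link).netloc.lower() != source_host
    ]


def index_by_link(all_data):
    """Convert a list of records into {link: record} for constant-time lookup."""
    return {entry["link"]: entry for entry in all_data}


# Hypothetical sample data, only to show the shapes involved.
all_links = [
    "https://example.com/about",
    "https://other.org/post",
    "https://news.site/item",
]
all_data = [
    {"link": "https://other.org/post", "engines": ["common-crawl"], "cc-text": "cached text"},
    {"link": "https://news.site/item", "engines": ["bing"]},
]

links_to_scrape = drop_self_links(all_links, "https://example.com/")
data_by_link = index_by_link(all_data)
```

If two records share the same link, the later one overwrites the earlier one in this sketch; whether that is acceptable depends on how all_data is produced.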
if 'common-crawl' in engines:
    save 'cc-text'
    save engines
    check whether the link for each entry is active or not:
    if it is in active then active: 1, else active: 0
else:
    scrape with trafilatura and filter the result
Then get the info for that link and check whether its category is commoncrawl and whether it has a snippet.
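One possible reading of the branching above, as a minimal sketch rather than the intended implementation. The names process_link, is_commoncrawl_with_snippet, active_links and min_chars are hypothetical, as are the 'category', 'snippet' and 'text' field names. It assumes "filter" means dropping empty or very short extractions, and that a successfully scraped link counts as active. The only trafilatura calls used are fetch_url and extract; the scraper is passed in as a parameter so the logic can be tried without network access.

```python
def scrape_with_trafilatura(link):
    """Download a page and extract its main text, or return None on failure."""
    import trafilatura  # third-party package, imported lazily

    downloaded = trafilatura.fetch_url(link)
    return trafilatura.extract(downloaded) if downloaded else None


def is_commoncrawl_with_snippet(entry):
    """Assumed field names: 'category' and 'snippet'."""
    return entry.get("category") == "commoncrawl" and bool(entry.get("snippet"))


def process_link(link, data_by_link, active_links,
                 scrape=scrape_with_trafilatura, min_chars=20):
    """Build the saved record for one link, or None if it is filtered out."""
    entry = data_by_link.get(link, {})
    engines = entry.get("engines", [])
    if "common-crawl" in engines:
        # Reuse the stored Common Crawl text instead of scraping again.
        return {
            "link": link,
            "engines": engines,
            "cc-text": entry.get("cc-text", ""),
            "active": 1 if link in active_links else 0,
        }
    text = scrape(link)
    if not text or len(text) < min_chars:
        return None  # assumed filter: discard empty or very short pages
    return {"link": link, "engines": engines, "text": text, "active": 1}


# Hypothetical data and a stub scraper standing in for trafilatura.
data_by_link = {
    "https://other.org/post": {
        "engines": ["common-crawl"],
        "cc-text": "cached text",
        "category": "commoncrawl",
        "snippet": "short preview",
    },
    "https://news.site/item": {"engines": ["bing"]},
}
active_links = {"https://other.org/post"}


def fake_scrape(link):
    return "Body text extracted from " + link


results = {
    link: process_link(link, data_by_link, active_links, scrape=fake_scrape)
    for link in data_by_link
}
```

Leaving the scrape argument at its default makes the same function use trafilatura for real pages.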