
Athletics scraper - merge events from different campuses that are on the same date? #66

Closed
qasim opened this issue Apr 27, 2016 · 3 comments

@qasim
Member

qasim commented Apr 27, 2016

As it currently stands, the athletics scraper uses the date as its top-level key. However, across campuses the data still lives in two separate files (e.g. 01M and 01SC). I think it would make more sense to concatenate the two and use a schema like the following:

{
  "date":String,
  "events":[{
    "title":String,
    "location":String,
    "building_id":String,
    "campus":String,
    "start_time":String,
    "end_time":String
  }]
}

Looking at how we lay out scrapers, this may actually prove to be non-trivial. Any opinions on this change, and if we were to implement it, how to go about doing so?

@qasim qasim added the question label Apr 27, 2016
@kashav
Member

kashav commented Apr 27, 2016

I like this idea – the data will be a lot cleaner and we won't be repeating ids each month.

It shouldn't be hard to implement either. If we want to preserve the ability to scrape each campus separately, we can add a Boolean parameter to each scrape method that decides whether we save the data or return it. Then in exams.__init__ we can merge the sets and save them.

from collections import OrderedDict

# Scrape each campus without saving, so we get the data back instead.
utm = UTMExams.scrape(location, save=False)
utsc = UTSCExams.scrape(location, save=False)

# Merge the two result sets, keyed by date.
docs = OrderedDict()
for campus in utm, utsc:
    for date in campus:
        if date not in docs:
            docs[date] = OrderedDict([
                ('date', date),
                ('events', [])
            ])
        docs[date]['events'].extend(campus[date]['events'])

# Save one merged document per date.
for date, doc in docs.items():
    Scraper.save_json(doc, location, date)

There might be a better solution, since this requires each campus scraper to have that same schema.
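For reference, the merge step can be exercised in isolation with stub data. This is a minimal sketch assuming each campus scraper returns a dict keyed by date in the schema above; `merge_campuses` and the sample data are illustrative names, not part of the real scraper API:

```python
from collections import OrderedDict

def merge_campuses(*campuses):
    """Merge per-campus {date: doc} dicts into one document per date."""
    docs = OrderedDict()
    for campus in campuses:
        for date, doc in campus.items():
            if date not in docs:
                docs[date] = OrderedDict([('date', date), ('events', [])])
            # Concatenate this campus's events onto the shared date document.
            docs[date]['events'].extend(doc['events'])
    return docs

# Stub scrape results for two campuses on the same date.
utm = {'2016-04-27': {'date': '2016-04-27',
                      'events': [{'title': 'Soccer', 'campus': 'UTM'}]}}
utsc = {'2016-04-27': {'date': '2016-04-27',
                       'events': [{'title': 'Swimming', 'campus': 'UTSC'}]}}

merged = merge_campuses(utm, utsc)
# merged['2016-04-27']['events'] now holds both campuses' events
```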

@qasim
Member Author

qasim commented Apr 27, 2016

@kshvmdn that makes sense to me, better than what I was thinking :)

@qasim
Member Author

qasim commented Apr 28, 2016

Awesome, the JSON files look even better now, since they match things like the shuttles scraper. Consistency is key!

Thanks again ^_^

@qasim qasim closed this as completed Apr 28, 2016