
Athletics scraper - merge events from different campuses that are on the same date? #66

Closed
qasim opened this issue Apr 27, 2016 · 3 comments

@qasim
Member

qasim commented Apr 27, 2016

As it currently stands, the athletics scraper uses the date as its top-level key. However, across campuses the data still lives in two separate files (e.g. 01M and 01SC). I think it would make more sense to concatenate the two and use a schema like the following:

{
  "date":String,
  "events":[{
    "title":String,
    "location":String,
    "building_id":String,
    "campus":String,
    "start_time":String,
    "end_time":String
  }]
}

Looking at how we lay out scrapers, this may actually prove to be non-trivial. Any opinions on this change, and if we were to implement it, how to go about doing so?

@qasim qasim added the question label Apr 27, 2016
@kashav
Member

kashav commented Apr 27, 2016

I like this idea – the data will be a lot cleaner and we won't be repeating ids each month.

It shouldn't be hard to implement either. If we want to preserve the ability to scrape each campus separately, we can add a Boolean parameter to each scrape method that decides whether we save the data or return it. Then in exams.__init__ we can merge the sets and save them.

from collections import OrderedDict

# Scrape each campus without saving, so we get the data back instead.
utm = UTMExams.scrape(location, save=False)
utsc = UTSCExams.scrape(location, save=False)

# Merge the two result sets, keyed by date.
docs = OrderedDict()
for campus in utm, utsc:
    for date in campus:
        if date not in docs:
            docs[date] = OrderedDict([
                ('date', date),
                ('events', [])
            ])
        docs[date]['events'].extend(campus[date]['events'])

# Save one merged document per date.
for date, doc in docs.items():
    Scraper.save_json(doc, location, date)

There might be a better solution, since this requires each campus scraper to have that same schema.
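For reference, the merge step can be exercised in isolation with stub data. This is a minimal sketch assuming each campus scraper returns a dict keyed by date in the schema above; `merge_campuses` and the sample data are illustrative names, not part of the real scraper API:

```python
from collections import OrderedDict

def merge_campuses(*campuses):
    """Merge per-campus {date: doc} dicts into one document per date."""
    docs = OrderedDict()
    for campus in campuses:
        for date, doc in campus.items():
            if date not in docs:
                docs[date] = OrderedDict([('date', date), ('events', [])])
            # Concatenate this campus's events onto the shared date document.
            docs[date]['events'].extend(doc['events'])
    return docs

# Stub scrape results for two campuses on the same date.
utm = {'2016-04-27': {'date': '2016-04-27',
                      'events': [{'title': 'Soccer', 'campus': 'UTM'}]}}
utsc = {'2016-04-27': {'date': '2016-04-27',
                       'events': [{'title': 'Swimming', 'campus': 'UTSC'}]}}

merged = merge_campuses(utm, utsc)
# merged['2016-04-27']['events'] now holds both campuses' events
```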

@qasim
Member Author

qasim commented Apr 27, 2016

@kshvmdn that makes sense to me, better than what I was thinking :)

@qasim
Member Author

qasim commented Apr 28, 2016

Awesome, the JSON files look even better now, since they match things like the shuttles scraper. Consistency is key!

Thanks again ^_^

@qasim qasim closed this as completed Apr 28, 2016