-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add generic GSoC 2017 import app #67
Conversation
Preview of what is extracted (org info) |
gsoc/admin.py
Outdated
@@ -0,0 +1,3 @@ | |||
from django.contrib import admin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we cant use admin in a django-distill system
gsoc/config.py
Outdated
|
||
def load_cache(filename): | ||
with open(os.path.join(DATA_DIR, filename), 'r') as f: | ||
return ruamel.yaml.load(f, Loader=ruamel.yaml.Loader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EOL at EOF
gsoc/config.py
Outdated
import ruamel.yaml | ||
import os | ||
|
||
DATA_DIR = os.path.join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a global setting for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get this. I need more help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
git grep _site
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I searched but could not find any related setting other than STATIC_ROOT mentioned in community.settings as _site
. I actually found this snippet of from gci app.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that is it.
gsoc/tests.py
Outdated
@@ -0,0 +1,3 @@ | |||
from django.test import TestCase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete this file until there are tests
templates/gsoc.html
Outdated
<body> | ||
|
||
{% if error %} | ||
<h3> {{error}} </h3> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after the scraper is moved into django commands, the error log will appear at /log
created in #63 , so you dont need to do error presentation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not actually error log. It appears in case when there are no data for a org. I am caching "no data present" also because it was stated in issue that we need to cache the fact that there is not data present for a org. Probably i should rename it? :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears in case when there are no data for a org.
That is an error. It means the GSOC app needs to be disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have fixed this part. If related yaml file is not found then we show error. (file is not created if data is not found)
That is an error. It means the GSOC app needs to be disabled.
I do not delete existing yaml files on new query. Should i delete existing yaml files so that if query is made for another year(for which there is no data) so it does not render data from existing files. If there is no data no file gets created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this! Do not render errors here. This app should not exist if there was no gsoc data. it isnt a gsoc org. no app. no yaml. no page. using logging
to record the gsoc app was disabled. no other visible trace.
gsocscrape/spiders/gsoc.py
Outdated
technology['%s'% str(count)] = tech | ||
count = count + 1 | ||
|
||
item = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double space.
gsocscrape/spiders/gsoc.py
Outdated
project_data[int(item['id'])] = item | ||
project_data = OrderedDict(sorted(project_data.items(), key=lambda t: t[0])) | ||
|
||
with open(os.path.join('_site', 'project_info.yaml'), 'a') as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefix fix with gsoc
gsocscrape/spiders/gsoc.py
Outdated
|
||
def parse_project(self, response): | ||
mentors = [] | ||
org_url = "https://summerofcode.withgoogle.com/archive/2017/organizations/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
year
should be a parameter, as we'll also want to import other years, and other orgs will have many years of data to import.
gsocscrape/spiders/gsoc.py
Outdated
yield response.follow(follow_link, self.parse_org) | ||
|
||
def parse_org(self, response): | ||
project_url = "https://summerofcode.withgoogle.com/archive/2017/projects/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
year ; see below
gsoc/views.py
Outdated
name = organisation[1].get('name') | ||
return render(request, 'gsoc.html', {'error': name}) | ||
|
||
id = organisation[1].get('id') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why index of 1
? that is a bit odd. the yaml should only have one org in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got this running this way only. I think that's how it inteprets yaml. Although i will reconsider it and try it via key, value type object. Or it would be good if i get some suggestions :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you dont need an array for one org. the Google json data structure that you fetch should not be the same as the coala org yaml data structure that you write to disk.
templates/gsoc_projects.html
Outdated
</head> | ||
|
||
<body> | ||
<h3 style="text-align: center;"> Coala </h3> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line contains the keyword 'Coala'.
Origin: KeywordBear, Section: generalization
.
templates/gsoc_projects.html
Outdated
</head> | ||
|
||
<body> | ||
<h3 style="text-align: center;"> coala </h3> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line contains the keyword 'coala'.
Origin: KeywordBear, Section: generalization
.
gsoc/config.py
Outdated
return ruamel.yaml.load(f, Loader=ruamel.yaml.Loader) | ||
|
||
def get_year(): | ||
return "2017" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EOL at EOF.
but more importantly, returning 2017 here is inherently wrong. the year is now 2018. The GSoC 2018 program has now started.
And you want 2018 - 1
gsocscrape/items.py
Outdated
@@ -0,0 +1,14 @@ | |||
# -*- coding: utf-8 -*- | |||
|
|||
# Define here the models for your scraped items |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is junk added by the generator to guide someone writing their code. It doesnt need to be in our repository
gsocscrape/spiders/gsoc.py
Outdated
class GsocSpider(scrapy.Spider): | ||
name = "gsoc" | ||
start_urls = [ | ||
"https://summerofcode.withgoogle.com/archive/{year}/organizations".format(year = year), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use pycodestyle
on your code
gsoc/config.py
Outdated
import ruamel.yaml | ||
import os | ||
|
||
DATA_DIR = os.path.join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that is it.
gsoc/config.py
Outdated
|
||
DATA_DIR = os.path.join( | ||
os.path.dirname(__file__), '..', | ||
'_site' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use STATIC_ROOT
here. (also create a newcomer issue to fix this in gci app)
gsoc/views.py
Outdated
for technology in org.get(key).get('technologies').values(): | ||
tech.append(technology) | ||
|
||
return render(request, 'gsoc.html', {'id': id, 'name': name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move the substitution parameters to the next line
gsoc/views.py
Outdated
logger.info('GSoC data not available') | ||
error = 'No GSoC data available for {name} year {year}'.format( | ||
name=org_name, year=year) | ||
return render(request, 'gsoc.html', {'error': error}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, stop rendering an error. use logging
to record the error. and then re-raise the exception here.
gsoc/views.py
Outdated
} | ||
projects_list.append(item) | ||
return render(request, 'gsoc_projects.html', | ||
{'project_list': projects_list, 'org_name': name}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lay out the these substitution variables so that you can add another one without modifying the existing lines.
templates/gsoc.html
Outdated
<body> | ||
|
||
{% if error %} | ||
<h3> {{error}} </h3> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this! Do not render errors here. This app should not exist if there was no gsoc data. it isnt a gsoc org. no app. no yaml. no page. using logging
to record the gsoc app was disabled. no other visible trace.
|
||
class GsocSpider(scrapy.Spider): | ||
name = 'gsoc' | ||
start_urls = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not use the preferred quotation marks.
Origin: QuotesBear, Section: all.python
.
The issue can be fixed by applying the following patch:
--- a/tmp/tmpvhnur60c/gsocscrape/spiders/gsoc.py
+++ b/tmp/tmpvhnur60c/gsocscrape/spiders/gsoc.py
@@ -33,7 +33,7 @@
if(organization_link):
organization_link = organization_link[0].extract().split('/')[4]
else:
- logger.info("Organisation {} does not exist in GSoC for {}".format(
+ logger.info('Organisation {} does not exist in GSoC for {}'.format(
org_name, year
))
return
tagline = org.get(key).get('tagline') | ||
description = org.get(key).get('description') | ||
tech = [] | ||
for technology in org.get(key).get('technologies').values(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code does not comply to PEP8.
Origin: PEP8Bear, Section: all.autopep8
.
The issue can be fixed by applying the following patch:
--- a/tmp/tmpvhnur60c/gsoc/views.py
+++ b/tmp/tmpvhnur60c/gsoc/views.py
@@ -33,7 +33,7 @@
'tagline': tagline,
'description': description,
'tech': tech
- })
+ })
def projects(request):
for key in org.keys(): | ||
name = org.get(key).get('name') | ||
projects = get_projects_data() | ||
projects_list = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code does not comply to PEP8.
Origin: PEP8Bear, Section: all.autopep8
.
The issue can be fixed by applying the following patch:
--- a/tmp/tmpvhnur60c/gsoc/views.py
+++ b/tmp/tmpvhnur60c/gsoc/views.py
@@ -61,6 +61,6 @@
'mentors': mentors
}
projects_list.append(item)
- return render(request, 'gsoc_projects.html',{'project_list': projects_list,
- 'org_name': name
- })
+ return render(request, 'gsoc_projects.html', {'project_list': projects_list,
+ 'org_name': name
+ })
tagline = org.get(key).get('tagline') | ||
description = org.get(key).get('description') | ||
tech = [] | ||
for technology in org.get(key).get('technologies').values(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E124 closing bracket does not match visual indentation
Origin: PycodestyleBear (E124), Section: all.autopep8
.
for key in org.keys(): | ||
name = org.get(key).get('name') | ||
projects = get_projects_data() | ||
projects_list = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E231 missing whitespace after ','
Origin: PycodestyleBear (E231), Section: all.autopep8
.
for key in org.keys(): | ||
name = org.get(key).get('name') | ||
projects = get_projects_data() | ||
projects_list = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E501 line too long (83 > 80 characters)
Origin: PycodestyleBear (E501), Section: all.autopep8
.
projects = get_projects_data() | ||
projects_list = [] | ||
for key in projects.keys(): | ||
mentors = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E124 closing bracket does not match visual indentation
Origin: PycodestyleBear (E124), Section: all.autopep8
.
for key in org.keys(): | ||
name = org.get(key).get('name') | ||
projects = get_projects_data() | ||
projects_list = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line is longer than allowed. (83 > 80)
Origin: LineLengthBear, Section: all.linelength
.
No file gets created if there is no data. I raised 404 exception as i didn't get any way to disable an app. Is there any way to disable the app for a condition. |
raising 404 is good enough. Note that there is no problem with the GCI app. master builds on cron work every night. see https://travis-ci.org/coala/community/builds . Something is wrong in your changes. |
@jayvdb I will update it. Thanks. |
@pradeepgangwar @jayvdb I tried to clone the forked repo and tried to run the Please help me in setting up. |
Please tell me if I am missing some step |
Closing this due to inactivity. I am not able to take it to completion. |
Please review and let me know changes it needs.
Related issue: #59