Add citation suggestion to README.md #2503
Comments
Haha! I tweeted Quincy about this just yesterday and already put in a pull request for one particular article I found. This is how I did it: https://github.com/freeCodeCamp/guides/pull/2337/files?short_path=230a905#diff-230a9052be3f27a5607aea2debfbf534
I like it, @bryanchapel. Good work! We need to write it into the README, assuming others agree. Would you like to do that once everyone has had a chance to comment?
Can do!
Update README.md with content attribution policy per issue freeCodeCamp#2503
Since I'm kind of new to this, how long should you usually wait before making a decision on an issue? I made a change to the README and referenced this issue in the commit above to help the discussion along. Let me know how it looks. :)
I agree with all you said! Only one note: we should also pay attention to licenses and terms of service. For example, someone opened a pull request with text copied and pasted from Quora, which has a long ToS. I don't know if this is legal, but one thing is sure: it's not ethical!
I found another PR, "Algorithm: Add AVL Tree Article", with content copied from tutorialspoint. It's a sad situation, and it's difficult not to discourage contributors, but we are trying to create community content, not a copy-paste website.
Not sure if this conversation is still open, but I'm throwing in my thoughts. I wrote my first article last night. I was also an English teacher in my last incarnation, so I struggled with how to write concise content without it sounding exactly like my favorite resources. I opened several sources on the same topic and read them all, then closed them and wrote down all my thoughts without looking at any of them. That might be a good suggestion for your README. I did find that my own voice came out very similar to the favorite I mentioned, but I made sure to change up the examples and insert my own. Hope this helps.
College student perspective: my university classes are all governed by very strict no-plagiarism rules. I think the guides should reflect similar values: other content creators work hard to produce their work, and if we don't give them proper citation, that is extremely unfair. We could ask everyone to use something like APA citations (it's already a scientific standard). This would not only make the guide more professional but also provide valuable SEO backlinks to creators.
@Ethan-Arrowood Agreed. Didn't even think of the SEO benefits of this, haha! @lvcoulter I think you're highlighting the difference between paraphrasing what you've researched and learned, and quoting directly. I think that's totally appropriate. My only suggestion would be to collect some of those references and stick them in a
@davcri I think the most important piece of Quora's ToS is this:
Also a good point I didn't think of. Users should be aware of a resource's ToS and the concept of fair use. I'll add these suggestions to the README changes I'm proposing in the commit referenced above. Good discussion, all!
On APA vs. MLA formatting for citations, I agree that we could use APA, since it's typically the scientific standard. I do like that MLA specifies a "Date Accessed" for the citation. I think that's really helpful for spotting references that may have changed or gone out of date since a topic was added to the guide, so we can amend or update them as needed. Also, listing the link in the citation, as APA recommends, is a bit redundant, since we should be creating a link to the resource itself in the markdown. I think that convention applies more to print/non-web citations. I think the best format for our purposes would look something like this:
And in the markdown it would look like:
Maybe we could do a bit of both? Thoughts?
@QuincyLarson @Bouncey @HKuz @timo (I'm tagging top contributors, since they are more experienced than me): can you please give feedback? I think this is a delicate subject.
I think direct copy/paste should generally be discouraged unless the text is directly quoted and integral to the article. Paraphrasing, which is what @lvcoulter is doing, is a-okay by me, assuming we cite sources. As mentioned, I like @bryanchapel's approach, as it's similar to Wikipedia's. We could even model Wikipedia's citation format if we wanted. I'm not sure it truly matters whether it's APA or MLA, as long as we give credit where credit is due.
@dhcodes Sorry I'm so late to this thread. Here are my thoughts on this: by forcing contributors to abide by a style guide, we're making it harder to contribute. Such a style guide should instead be enforced through an automated script. Just like we use ESLint for our JavaScript, we should use a style checker for our citations. And we should tackle plagiarism the same way: by running a build script. That way, if the build task detects what might be plagiarism, a human can look at it and make sure it's properly attributed. Here's a library that does this. It hasn't been touched in a couple of years, but we might be able to make it work. It's in Python, so @Ethan-Arrowood might be a good candidate for testing it out and seeing if we can get it running and incorporated into Travis CI: https://github.com/architshukla/Plagiarism-Checker Again, my sentiment is that we should put up as few rules and as few impediments to contributing as absolutely necessary. And those rules should be enforced at the CI level, in a way that's transparent and consistent.
@davcri Thanks for spotting that case of clear plagiarism. I've closed that contributor's pull request and also reverted another PR of theirs that I spotted which contained plagiarism. I gave him a stern one-time warning (the notion of plagiarism is less familiar in some parts of the world, and I gave him the benefit of the doubt). If we spot people plagiarizing, we should give them a one-time warning that they will be banned from contributing to the freeCodeCamp GitHub organization if they're caught again, and we should refer them to the Academic Honesty Policy: https://www.freecodecamp.org/academic-honesty
I like the idea of a plagiarism checker. The Python module @QuincyLarson linked is now broken because the Google API it used has been deprecated. Furthermore, it would be easier to run a Node script through the Travis build anyway. So I propose we add a plagiarism-checker Node.js script as a down-the-road feature. However, at the moment I am way too busy to start this project. I have a lot on my plate, including interview prep, university work, and personal projects (I started my own OS project this week). If no one else wants to take the lead on this, I can create a blank repo and begin work in a few months once my life calms down a bit. In the meantime, I think the best course of action would be to write a CONTRIBUTING.md that highlights the basics of contributing, as well as some additional details such as our stance on plagiarism and citations. Here is a good resource (includes examples) on how to properly set up a CONTRIBUTING.md file.
I agree about the checker script as well. There is a Node version by Copyleaks (https://www.npmjs.com/package/plagiarism-checker) that we might be able to use. I also think that writing one from scratch might not be that hard. You could use the request-promise and cheerio libraries to send chunks of the committed text to Google, then parse the first 10 or so results and check each text chunk against them for a fuzzy match. If the similarity is above some threshold, say 60%, the PR gets flagged as needing review. Every case I've found so far was the first hit returned by Google when I copied and pasted parts of the article. See this article on unconscious bias as an example. This user might also need a warning, as outlined by @QuincyLarson above? I already put in a PR to fix their citation issue. At any rate, I added a note about the Academic Honesty Policy to my README commit, in addition to what I added about proper attribution. Let me know how this looks, or whether it should be pulled out into a separate CONTRIBUTING file, as @Ethan-Arrowood suggests. It might even be best to mention it in both places, just so it's clear and people don't have an excuse to say "I didn't see that guideline".
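A minimal sketch of the matching step described above, assuming the text of the top search results for each chunk has already been fetched (the Google request and the request-promise/cheerio parsing are left out and stood in for by a hypothetical `fetch_results` callable). As its fuzzy-match score it uses word 3-gram "shingle containment", the fraction of a chunk's three-word sequences that also appear in a result page. The 40-word chunk size, k=3, and the 0.6 threshold are illustrative assumptions, and Python is used for brevity even though the thread leans toward a Node script.

```python
from __future__ import annotations

import re
from collections.abc import Callable


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(chunk: str, page: str, k: int = 3) -> float:
    """Fraction of the chunk's shingles that also occur in the page (0.0 to 1.0)."""
    chunk_sh = shingles(chunk, k)
    if not chunk_sh:
        return 0.0
    return len(chunk_sh & shingles(page, k)) / len(chunk_sh)


def split_into_chunks(text: str, words_per_chunk: int = 40) -> list[str]:
    """Split text into consecutive chunks of up to words_per_chunk words."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]


def flag_chunks(article: str,
                fetch_results: Callable[[str], list[str]],
                threshold: float = 0.6) -> list[tuple[str, float]]:
    """Return (chunk, best_score) for every chunk whose best match meets the threshold.

    fetch_results is hypothetical: given a chunk, it should return the text
    of the top ~10 search hits for that chunk.
    """
    flagged = []
    for chunk in split_into_chunks(article):
        best = max((containment(chunk, page) for page in fetch_results(chunk)),
                   default=0.0)
        if best >= threshold:
            flagged.append((chunk, best))
    return flagged
```

Containment is used instead of plain Jaccard similarity because a short chunk copied verbatim from a long page should still score close to 1.0; Jaccard would be dragged down by the rest of the page.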
@bryanchapel did you already make a PR with your updated README? If not, can you make one? That way we can discuss it (for me it's almost all right; I only have a doubt about using HTML tags vs. markdown). I vote for writing about the Honesty Policy in both the README and the CONTRIBUTING files. I also opened a new issue to discuss adding a plagiarism check to the Travis build: #3315
Just made the PR: #3371. This is just for the README; I didn't do anything for CONTRIBUTING.
I added some small edits.
@bryanchapel @dhcodes I've merged your edits! Thanks! We should mirror this in CONTRIBUTING to make sure people see it. Then I believe we can close this issue.
@bryanchapel Nice find on the plagiarism checker! Yes, we would absolutely love your help implementing this. Since @Ethan-Arrowood is a bit busy at the moment and has determined that the Python library definitely doesn't work, you're now our only hope on this.
I've looked into this a bit more, and assuming we use a comparison search via a search engine (Google or Bing), we may need to limit the test to only the files changed in the PR, since the free plan for Google Custom Search now limits you to 100 queries/day. I've looked for alternatives, but there aren't many; Bing has also removed its free plan. I know Jest can run tests only on changed files, but I'm looking at alternatives as well. I'm not sure if this is a setting on Travis. Still researching.
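A minimal sketch of limiting the check to a PR's changed files while staying under a daily query quota. It assumes the PR branch is compared against `origin/master` with `git diff --name-only` (the base ref is an assumption and may need fetching on CI), that only Markdown files need checking, and that each text chunk costs one search query. The round-robin order is just one illustrative way to spend a small budget fairly across files.

```python
from __future__ import annotations

import subprocess

# The thread cites a 100 queries/day free tier; treat this as an assumption.
DAILY_QUERY_LIMIT = 100


def markdown_only(paths: list[str]) -> list[str]:
    """Keep only Markdown files (assumed to be where guide articles live)."""
    return [p for p in paths if p.endswith(".md")]


def changed_markdown_files(base: str = "origin/master") -> list[str]:
    """Markdown files added or modified on this branch relative to base."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=AM", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return markdown_only(result.stdout.splitlines())


def plan_queries(chunks_by_file: dict[str, list[str]],
                 budget: int = DAILY_QUERY_LIMIT) -> list[tuple[str, str]]:
    """Choose at most `budget` (file, chunk) queries, round-robin across files,
    so the first chunk of every file is queried before any file's second chunk."""
    plan: list[tuple[str, str]] = []
    longest = max((len(chunks) for chunks in chunks_by_file.values()), default=0)
    for i in range(longest):
        for path, chunks in chunks_by_file.items():
            if i < len(chunks):
                if len(plan) == budget:
                    return plan
                plan.append((path, chunks[i]))
    return plan
```

The three-dot `base...HEAD` form diffs against the merge base, so files changed on the target branch after the PR was opened are not counted as part of the PR.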
@dhcodes Yes, I agree. We should only test files changed in the PR. @Ethan-Arrowood pointed out that we might want this to be part of our pre-commit step, so that we can point out possible plagiarism to the contributor before it even gets committed. Then, if the contributor thinks there's a false positive, they could run the commit task again with --not-plagiarism, and it would skip this step but add a note to the commit description like "plagiarism check skipped", so we'd know to eyeball their contribution for anything suspicious before accepting the PR.
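`git commit` has no `--not-plagiarism` option, so a local version of the skip-and-annotate idea needs some other signal. A minimal sketch, swapping in a hypothetical `SKIP_PLAGIARISM_CHECK=1` environment variable: a `prepare-commit-msg` hook (Git passes it the path of the commit message file as its first argument) appends the note "plagiarism check skipped" so reviewers know to look more closely. The variable name and the note text are assumptions.

```python
from __future__ import annotations

import os
import sys
from collections.abc import Mapping

# Hypothetical variable name: `git commit` has no --not-plagiarism flag.
SKIP_VAR = "SKIP_PLAGIARISM_CHECK"
SKIP_NOTE = "plagiarism check skipped"


def add_skip_note(message_path: str, env: Mapping[str, str] | None = None) -> bool:
    """Append SKIP_NOTE to the commit message file when SKIP_VAR is "1".

    Returns True if the note was added, False if the check was not skipped
    or the note is already present.
    """
    env = os.environ if env is None else env
    if env.get(SKIP_VAR) != "1":
        return False
    with open(message_path, encoding="utf-8") as f:
        message = f.read()
    if SKIP_NOTE in message:
        return False
    with open(message_path, "a", encoding="utf-8") as f:
        if message and not message.endswith("\n"):
            f.write("\n")
        # Blank line, then the note, so it reads as part of the commit body.
        f.write("\n" + SKIP_NOTE + "\n")
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Git runs prepare-commit-msg with the message file path as the first argument.
    add_skip_note(sys.argv[1])
```

A contributor would then commit with `SKIP_PLAGIARISM_CHECK=1 git commit ...`. As noted later in the thread, this only helps for local commits, not for PRs made through the GitHub web interface.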
@dhcodes Travis can run any script you give it; we just have to write it. If any check we make fails you can
@QuincyLarson This check would be better as a Travis check because of the number of PRs coming in via the GitHub GUI. Pre-commit hooks only work when committing locally.
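Travis marks a build as failed when a command in its `script` phase exits with a non-zero status, so a CI check only needs to turn its findings into an exit code. A minimal sketch, assuming an earlier step has already produced hypothetical (file path, similarity score) pairs; the script's entry point would end with `sys.exit(report(flagged))`.

```python
from __future__ import annotations


def report(flagged: list[tuple[str, float]], threshold: float = 0.6) -> int:
    """Print a summary for the CI log and return the exit status to use.

    flagged holds hypothetical (file path, similarity score) pairs produced
    by an earlier similarity check.
    """
    if not flagged:
        print("Plagiarism check passed: nothing above the threshold.")
        return 0
    print(f"Possible plagiarism (similarity >= {threshold:.0%}); please review:")
    for path, score in flagged:
        print(f"  {path}: {score:.0%}")
    return 1
```

A failing build here is meant as a signal for a human reviewer, in line with the thread's goal of flagging PRs rather than rejecting them automatically.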
@Bouncey Yeah, I think based on what everyone has said, it may be best to go the PR bot route. I'm currently working on making one with Probot, but I'm slow, so if someone else wants to give it a go, by all means, go for it.
I probably wasn't clear enough in my last post. If anyone wants to work on this, consider it open. There's no guarantee that I'll get anything working, and there are many other skilled programmers out there who could probably whip something up faster than I could.
Just to note, there are some PRs that reference this issue, but in the interest of maintaining positive contributions, I am marking the PRs that have the majority of their content copied and pasted from external websites as
To restate what @davcri has said and what I ultimately agree with: "we are trying to create community content, not a copy-paste website".
I worry some of the new content directly plagiarizes other sites. IMHO, we should work on a recommended way to cite other sources and discourage unattributed copying/pasting.
Thoughts?