Construct novice R lesson. #124
Comments
For scientists whose understanding of data analysis consists entirely of Excel and/or a GUI stats program (e.g. SPSS, Minitab, etc.), I see the main goals of a SWC R bootcamp as the following:
And then time permitting, other R topics could be covered:
I also think the only other topic covered should be the shell. I have taught version control to many beginners, and I have not found it overly fruitful since we are providing them a solution to a problem they do not yet have.

What do we currently have?

The most complete set of novice R materials we currently have are those in misc-r, which were developed by @jennybc. I think the best material in this set includes the introduction to the R working environment and RStudio and the explanation of data types. However, it still needs lots of attention before it is useful for R bootcamps. Its current limitations include the following:
Going forward I think we should use @jennybc's introductory lesson as the first lesson for the novice bootcamp. I also think we should adapt her lecture notes on R data types (currently in PDF format) into a flat text format. However, from there I suggest we start from scratch and develop R materials with more exposition on core SWC concepts. We need to decide on what that core content will be and also the format that should be used (please debate the format at #92). |
Minor comment; don't stress loops (formal ones, I'd urge you not to focus the environment on RStudio; it's great and users should be encouraged to use it if they want, but I'd prefer to teach command-line usage (in the Windows or MacOS GUIs or the Linux shell) and supplement that with some RStudio-specific materials. You'd want students happy and familiar at the command line and then they can use any R interface, including RStudio. |
+1 for not really getting into version control for beginners (will elaborate later; basically, a gentle intro is possible using GitHub mostly only as a consumer)
+1 for using RStudio (-1 for not using it?....)
+1 for fewer loops and more proper data aggregation; I vote for
+1 for overview of topics in vs out by @jdblischak

My material is in better form, in particular R Markdown vs PDF, in my STAT 545A course materials.
|
Just to clarify my RStudio point: the majority of the materials should be R IDE-agnostic; a student ought to be able to use whatever IDE they want and still make progress. E.g. teach package management via the R functions, not the GUI in RStudio. Etc. Fine to have a set of slides on working with RStudio but keep the other materials agnostic. I'm +1 for requiring students on bootcamps to use RStudio as that gives a common interface for the bc, which if nothing else helps instructors. -1 for plyr, at least as the only data aggregation interface. plyr is a useful tool, but it makes you think about the aggregation issue in a very particular way - i.e. Hadley's way. There's also a huge pile of material out there that doesn't use Hadley's ecosystem. So I'd vote for teaching some of the basic R tools for this ( One reason for not focussing solely on plyr is that it is awfully slow; so slow you'd never want to use it in production functions. But if you never learn about the tools in base R, which are much faster but more quirky, then plyr is the only tool you reach for. This may change in the future of course with dplyr but that isn't going to be on CRAN for a while. |
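For readers unfamiliar with the base R tools under discussion, here is a minimal sketch of a split-apply-combine aggregation using only base R. The data frame, its column names, and the values are invented for illustration; they do not come from any of the lesson materials mentioned in this thread.

```r
# Hypothetical toy data: column names and values are invented for this sketch
dat <- data.frame(
  site   = c("a", "a", "b", "b", "b"),
  weight = c(1.0, 3.0, 2.0, 4.0, 6.0)
)

# Split: one vector of weights per site
pieces <- split(dat$weight, dat$site)

# Apply: compute the mean of each piece
means <- sapply(pieces, mean)

# Combine: reassemble the results into a data frame
result <- data.frame(site = names(means), mean_weight = unname(means))

# The same aggregation in a single call with two other base R helpers
means_tapply <- tapply(dat$weight, dat$site, mean)
means_agg <- aggregate(weight ~ site, data = dat, FUN = mean)
```

Writing out the three steps explicitly may help learners see what a one-line call such as `tapply()` or `aggregate()` is doing for them.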
We have to use RStudio with beginners. The only material that will be RStudio specific is the intro lesson orienting the students to the IDE, searching for help, and using knitr. I am also +1 on plyr. The main bottleneck for me as a scientist is the initial exploration of the data, the formalization of my analysis into a reusable script, and understanding the code when I return days/weeks/months later when I need to change something. The runtime of the code is extremely negligible in comparison. plyr provides a useful framework that I can keep in working memory while I code. I foresaw this flame war about teaching Hadley vs. base coming. I cast my vote for Hadley. |
Thanks for the inflammatory response John - since when did voicing an opinion count as flaming? I have no problem at all with Hadley's ecosystem of packages - I use many of them daily myself as part of my research and teach them regularly. However, I'm also not keen on equipping useRs with only an opinionated approach to working with and using R (Hadley's word "opinionated", not mine). Look, if you are producing novice material and intermediate R materials you are clearly expecting that at least some useRs will progress from one to the other. If at the intermediate level and beyond you are introducing package development etc you don't want to be relying on plyr or some other packages that are more user-oriented. So then you start introducing some of the other tools in base R instead of focussing on package development etc. All I was suggesting is that you include both; explain how to do it with the raw tools R gives you (understanding split-apply-combine is quite informative if you get students to actually I see the same issues with plotting: would you suggest we only use/teach ggplot2 (or lattice) because they make data vis so much easier? At the expense of base graphics? |
@gavinsimpson - Welcome to the Software Carpentry repository! I don't have any direct contributions in this thread, but I wanted to quickly jump in before things get derailed. I'm pretty sure that @jdblischak did not mean to imply that you were flaming, just that we were approaching a topic where instructors may be sharply divided. Let's try to assume good faith on here, and if you're feeling personally attacked, please email Greg or me immediately so we can sort things out. Thanks! |
I apologize, @gavinsimpson. I used the term "flame wars" as a succinct way to describe the decision we were trying to make, but I completely see why you understood it to be inflammatory and regret having used it. And thanks to @ahmadia for helping mediate the situation. As for the solution, it looks like the best course forward from here is to create some beginning materials using base R aggregation functions as @gavinsimpson suggested. This lesson could go before @jennybc's lesson using plyr. Since we seem to agree that the specifics of version control can be left out of a beginner boot camp, that should leave us more time for R lessons like this. |
Also, thanks to @jennybc for the links. These are much easier to follow than what is currently in the |
Apologies for jumping into the middle of the conversation. I haven't been following this issue since I was interpreting it as being exclusively about the R portion of bootcamps. It seems to me that a discussion related to removing version control from Intro bootcamps that teach R (and therefore presumably also from Intro bootcamps more generally) is a larger conversation that would require its own issue for discussion, especially since version control material for Intro bootcamps is specifically being developed in #146. |
Sorry to disagree again John, but a good argument could be made to keep version control, especially for R scripts, as it allows the tracking of changes to the codes underlying a paper or thesis etc. Using VC speaks to reproducibility of the scientific process and probably should be held up as something important for bootcamp attendees to buy into. |
Adding to @ethanwhite's comment, version control is an absolutely core part of Software Carpentry's curriculum. I don't personally think that removing it from any boot camp is even up for discussion, but if someone wants to bring that up it definitely deserves its own thread since it's not related to R at all. |
Hi everyone, |
A general point I also made on #129 that may get at why we're struggling to design novice vs intermediate R bootcamps following a more Python-y set of SWC principles: People use Python because they need a general purpose scripting language and/or they need to analyze data. People learn R because they need to analyze and visualize data, especially if they need statistical tools of at least moderate sophistication. Once they're in deep enough, they might also use it for general programming/scripting stuff, because it's convenient to keep dependencies and interfaces to a minimum. So the motivation and priorities for the typical boot camper (and instructor, for that matter!) are rather different. I break R usage into three modes
I think a beginner R SWC bootcamp should emphasize topics in 1. and touch on some in 2. The intermediate can revisit topics in 1. but push really hard on topics in 2. and 3. Reproducibility is a theme that cuts across both boot camp levels. For beginners, the goal is to get them saving scripts, not using the mouse for anything mission critical, and an intro to dynamic report generation. Maybe writing a pseudo-Makefile = a master R script to run the other? I believe unit testing in R belongs in an intermediate bootcamp, probably in the context of package development. I have found it awkward to fit version control into R beginners boot camp. Learners are usually desperate to learn more about visualization or |
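As a rough illustration of the "master R script" idea, the sketch below runs a sequence of analysis scripts in a fixed order with `source()`. The script names and their contents are hypothetical, and they are written to a temporary directory only so that the example is self-contained.

```r
# Hypothetical step scripts, created in a temporary directory so the
# sketch can be run as-is; in a real project these files would already exist
work_dir <- tempfile("analysis_")
dir.create(work_dir)
writeLines("clean <- data.frame(x = 1:5, y = (1:5)^2)",
           file.path(work_dir, "01_clean.R"))
writeLines("fit <- lm(y ~ x, data = clean)",
           file.path(work_dir, "02_model.R"))

# The "master" part: run every step in order, sharing one environment
steps <- c("01_clean.R", "02_model.R")
env <- new.env()
for (step in steps) {
  message("Running ", step)
  source(file.path(work_dir, step), local = env)
}
```

Rerunning this one file regenerates every intermediate object, which is the main property a Makefile would provide, although unlike `make` it reruns every step each time.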
@jennybc - Thanks for the comparison, that's really insightful. |
Proposed policy re: RStudio: we use it in bootcamps because learners will probably want to use it later and it eliminates great heaps of OS-specific issues with editors, etc. We do NOT rely on RStudio buttons, etc. to accomplish anything mission critical. Exception: we might use RStudio buttons to get a certain product quickly (e.g. Knit HTML or Compile Notebook). With the desired output in hand, learners are then receptive when we show how to emulate at the command line (e.g. |
+1 on the proposed policy re RStudio, Jenny. |
My view re: I apply this to visualization. Yes, I practically ignore base graphics; for years, I emphasized I apply this to data aggregation, especially for beginners. I made the conversion from base R to |
@jennybc - thanks also for the general points comparing the ideas behind Python vs R bootcamps. I disagree a bit about the emphasis on analysis in 1. For domain-specific courses analysis will be an important component, but for a SWC bootcamp? For that I feel that understanding the language is more important, with touching on programming (writing your own functions etc.) also being an important SWC bootcamp (BC) topic, but not necessarily an important domain-specific component at the novice level. For a BC, I have to agree with other comments on keeping/including version control. Again, there'd be other more important topics if I were giving a domain-specific workshop or course, but for a BC and the things that SWC aims to convey? For me it is essential. As for them not having anything to version control, what about the script(s) they are writing for examples as they go through the early material? If they are writing new scripts for each Section/Part of the BC, all in a single folder, you'd have things to version control at that point and something to work with when you introduced version control topics. I wonder if part of the issue here is not Python vs R, but lessons vs bootcamps? Lessons that people might want to take and adapt for their own domain-specific classes probably don't need to include version control, but a lesson in a BC probably should. |
@jennybc - regarding your plyr comment, and I don't want to prolong that particular discussion, but I do wonder if this also doesn't boil down to the conflict between domain-specific courses/workshops and SWC bootcamps (BC). In a domain-specific course I'd have no issue with adopting plyr or ggplot2 or whatever higher-level packages were needed. But my understanding of BCs is that they are a bit different to this: regardless of which programming language is being taught, the aim is to convey some basic, critical principles of using your computer effectively. In that situation, I would expect more recourse to the core language and some common key skills (which I mention in an earlier comment). I disagree about the plotting example; I think it is arguably harder to get a student to do what you propose and understand it in ggplot than in base graphics, which probably only requires 3 lines of code (probably comparable in length but of lower complexity than ggplot). You're also cutting the student off from the majority of plotting code examples available in books, on websites, and used in other packages. If all you want the students to do is make their own plots, then fine, but I suspect those students will probably end up working with code that uses base graphics and then wondering what to do. Your point about |
Even in a typical Python boot camp we teach very little about doing data analysis with Python. That's not because the students are not going to do that, they absolutely are (at least we hope). It's because that's not the actual goal of Software Carpentry. We teach scripting in boot camps because we want to move people toward one-button reproducible workflows, but that's only one aspect of what Software Carpentry is about. If they are going to write code, even just a little, in any language, the tools and methods of software engineering are going to help them and they will have the easiest time adopting those if we start them early. Teaching R is great, but if that's all you do it isn't a Software Carpentry boot camp. It's an R boot camp. |
@jiffyclub Good point. Helps me refine what I find awkward. Maybe it's not R vs Python. I think it's hard to teach these "meta" issues about good programming practices to folks who aren't yet able to do cool and useful things with whatever language we're talking about. Chicken. Egg. If learners don't have that base competence, the few glimpses you give them of doing useful things -- great example: visualization with I didn't mean for this to get so existential. 😕 |
On Thu, Nov 14, 2013 at 05:04:09PM -0800, Jennifer (Jenny) Bryan wrote:
Going from “no basic competence” to “does useful things with |
I feel the same way that @gvwilson, @jiffyclub, and @gavinsimpson feel about version control and the general identity of a SWC Bootcamp. For a little historical perspective I think it's also helpful to take a look at the summary of our discussion about adding R as a language being taught by SWC [1]. |
I mistakenly thought that the creation of separate novice and beginner bootcamps included the discussion not only of the depth to cover each topic but also which topics to be covered. I apologize for my clear misunderstanding of the goals of the recent reorganization, and I'll strive to be more diligent in my reading in the future. Please disregard my previous uninformed contributions to this thread. |
Hi John, |
@jdblischak - I'm in agreement with Greg. There's no need to apologize, there is always tension in determining the proper course for Software Carpentry materials, and I think you've been a great help these past few months. |
Thanks @ethanwhite for that link. That is very helpful. I feel like we -- or at least I -- needed this discussion to sort of think through and rediscover a lot of those points for myself. I actually feel the skeleton of the May NC R bootcamps is fairly close in spirit to what a beginner R boot camp should be. I can't revisit this today but am happy to lay out a proposed outline (bullet points) for a beginner bootcamp. Maybe later today or over the weekend. It'll be an evolution of what @jdblischak put at the very beginning of this issue. I better understand why so many think version control must be in and can get on board with that. I do have serious misgivings about formal unit testing in a beginner R boot camp. I think that may truly be an issue where the different nature of R and R users suggests we need to implement SWC principles differently than Python. I think it's intermediate. |
+1 for no need to apologize @jdblischak. The work you're doing is awesome. Keep it up. @jennybc I'm glad it was helpful. I think having more discussion of the scope of a beginner camp is great and appreciate all of the conversation in this thread. @gvwilson's been providing some guidance through the material he's developing, but I think everyone's still wrapping their head around this. For example, I personally think that discussing whether or not to include formal unit testing for beginners would be valuable (in fact Greg and I were just chatting about this this morning). I'd recommend starting a new issue with an appropriately broad title so that anyone who is interested will notice it. |
Revision of @jdblischak 's initial proposal at beginning of this issue. I'm trying to sum up this whole thread. Could we close this issue and reopen discussion on a new issue with this as starting point? And plan actual sessions and lesson there? Important historical background: Assume typical learner currently works mostly in Excel or with a GUI stats program. Day 1 (implement core SWC principles in the context of R):
Day 2 (...SWC...):
The deal with RStudio: we will use it in bootcamps because, long-term, it does support sustainable workflows. Importantly, it practically eliminates many OS- and editor-related headaches. However, we will discourage over-reliance on, e.g., RStudio buttons and menus for mission-critical pieces of a workflow. Illustration: we might use RStudio buttons to get a certain product quickly (e.g. Knit HTML or Compile Notebook). Then, with the desired output in hand, learners are receptive when we show how to emulate at the command line (e.g.
Open loops:
Settled questions (?):
What are we starting with? Course materials from @jennybc cover many of these topics. A version got pulled into a SWC repo back in May/June, prematurely. Look here for material that is improved and easier to navigate:
@karthik has submitted a large pull request based on his recent Australian bootcamps. See #91 for a standalone |
wow, @jennybc, blown away by your fantastic summary. 👍 that we close the issue and split everything up. |
@karthik Agree re: I don't think we can trim anything out of Day 1. Therefore, I believe addressing visualization would mean cutting something from Day 2. Which brings us back to the question of emphasis on programming vs data analysis? I think I can make peace with either vision, but we have to pick one. I'd be happy to see an outline that works |
Great summary and revision @jennybc!
I'm personally comfortable with teaching informal testing instead of unit testing in novice bootcamps (e.g., create a simple test dataset and run the code on it to make sure that it gives the right answers), but this is something we should get @gvwilson's feedback on and perhaps have a more general (language agnostic) discussion about if necessary. To me the most important thing is that we should be making decisions about "beginner bootcamps", not "beginner R bootcamps". They are all SWC bootcamps. The programming component for some is taught in R, some in Python, just like in some cases we teach version control in Mercurial instead of in Git.
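A minimal sketch of what such informal testing could look like in R: build a tiny dataset whose correct answers can be worked out by hand, run the code on it, and check the result. The function `mean_by_group()` and the test values are hypothetical and serve only to illustrate the idea.

```r
# Hypothetical function under test: mean of `values` within each group
mean_by_group <- function(values, groups) {
  vapply(split(values, groups), mean, numeric(1))
}

# A tiny dataset where the right answers are obvious by hand:
# group "a" has mean (1 + 3) / 2 = 2, group "b" has mean (10 + 20) / 2 = 15
test_values <- c(1, 3, 10, 20)
test_groups <- c("a", "a", "b", "b")

out <- mean_by_group(test_values, test_groups)

# Informal check: stop with an error if the answers are not as expected
stopifnot(out[["a"]] == 2, out[["b"]] == 15)
```

This uses nothing beyond `stopifnot()` from base R, so it does not require introducing a unit testing framework to novices.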
I don't think focusing on analysis (using code) is at all in contrast to the core principles. Everything I typically teach is analysis based and @gvwilson's new novice Python lessons are largely along these lines. For example: http://nbviewer.ipython.org/urls/raw.github.com/swcarpentry/bc/master/python/novice/01-numpy.ipynb
Couldn't agree more. I'm +1 for closing this and moving to a new issue(s). If @jdblischak is OK with that I think that either he or @jennybc should go ahead and make the shift. |
I am +1 on starting a new issue. @jennybc, you can start the new issue. |
@jennybc @BernhardKonrad @jduckles I hope the R bootcamp at Miami went well. Do you think some of your materials could be used as the start for the |
Should be closed soon by #396. |
Construct a lesson on R for novices in r/novice.