This repository has been archived by the owner on Jan 3, 2018. It is now read-only.

Construct intermediate Git lesson. #127

Closed
gvwilson opened this issue Nov 2, 2013 · 20 comments

@gvwilson
Contributor

gvwilson commented Nov 2, 2013

Construct lesson on Git for intermediates in git/intermediate.

@ghost ghost assigned ethanwhite Nov 2, 2013
@wking
Contributor

wking commented Nov 2, 2013

On Sat, Nov 02, 2013 at 09:01:57AM -0700, Greg Wilson wrote:

> Construct lesson on Git for intermediates in git/intermediate.

I've got @jiffyclub's instructor-notes for Git-on-GitHub along with my
motivational blurb and cheat-sheet in
git://tremily.us/swc-version-control-git.git if you're interested.

@jamespjh

@ethanwhite
Contributor

Based on where @gvwilson is planning on stopping with the Intro material I'm thinking of the following as being the core of the Intermediate Git lesson:

  • Local Branching
  • Github Flow (Fork, Branch, Edit, Push, PR)
  • Tags & Releases

Basically we would assume that folks are generally familiar with working solo, without using branches, but using remotes. We would then teach them how to branch, work collaboratively, and do some level of release work (i.e., via simple tagging and relying on GitHub's release mechanism). Is there other stuff we should consider including for half a day on Intermediate Git?

There are certainly lots of good existing resources for this kind of material, and we should choose one of the available options and adapt it to our needs if possible.

Since we're at the intermediate level I think it would be best to use code examples. If we do this then we need to develop code examples that are easily swappable between R and Python, since there will be workshops teaching each of them for the programming component.

@ahmadia
Contributor

ahmadia commented Nov 22, 2013

I think we need to rewrite the entire approach to Git from the ground up. Recent experience teaching Git (and answering questions on our mailing list and elsewhere) is that we need to teach branching much, much earlier.

I agree that advanced GitHub flow goes here, as well as more of the etiquette and norms for contributing to and working with open source projects.

I'll ping back when I have done some more work on this, please keep me in the loop when developing these lessons and I'll try to keep my eyes on this.

@jdblischak
Contributor

Two quick thoughts:

  • Are you still planning on covering the basics of Git, albeit at an accelerated pace? If we are assuming that participants in an intermediate bootcamp have not attended a novice bootcamp, I do not think it is safe to assume that they have any experience using Git. There are many scientists that are comfortable working in the shell and writing loops/functions, but have no idea what version control is (which is why SWC is so important). And if they have used version control before, it could have been SVN or another system.
  • One option to avoid tying the Git material to a specific programming language is to use a text document as the example. From my experience, bootcamp attendees are intrigued by the idea of a system for collaborating on writing documents that frees them from Track Changes in Microsoft Word.

@ethanwhite
Contributor

> I think we need to rewrite the entire approach to Git from the ground up. Recent experience teaching Git (and answering questions on our mailing list and elsewhere) is that we need to teach branching much, much earlier.

I'm curious to hear your thoughts on this. My general feeling is that with the average room of version control novices you can cover the basics and then either branching or remotes in half a day. In #146 @gvwilson has taken the approach of teaching remotes, which makes it possible to teach conflicts. In this context I guess you're suggesting that we teach local branching instead of remotes/conflicts in the Novice lesson?

> I'll ping back when I have done some more work on this, please keep me in the loop when developing these lessons and I'll try to keep my eyes on this.

Definitely. In fact I was planning on seeing if you'd be interested in writing some of them :)

@ethanwhite
Contributor

> Are you still planning on covering the basics of Git, albeit at an accelerated pace? If we are assuming that participants in an intermediate bootcamp have not attended a novice bootcamp, I do not think it is safe to assume that they have any experience using Git. There are many scientists that are comfortable working in the shell and writing loops/functions, but have no idea what version control is (which is why SWC is so important). And if they have used version control before, it could have been SVN or another system.

I believe the idea here is that each bootcamp is not Novice or Intermediate, but that each section can be selected as Novice or Intermediate. So the host would basically be saying that all the participants will have some background in the topic if we do an Intermediate version. That said, I definitely think that any Intermediate lesson should re-introduce the basics at an accelerated pace to make sure everyone is up to speed.

> One option to avoid tying the Git material to a specific programming language is to use a text document as the example. From my experience, bootcamp attendees are intrigued by the idea of a system for collaborating on writing documents that frees them from Track Changes in Microsoft Word.

This certainly works great for a first introduction. I guess the concern is that we're trying to show them how to use this for code, so at the intermediate level we should actually show it to them with code so they get a feel for the natural process. That said, I'm certainly open to doing something different.

@ahmadia
Contributor

ahmadia commented Nov 22, 2013

Thanks for pointing me to #146, somehow I missed that PR.

Yes, here's the rough overview of how I want to do things differently (across both beginner and intermediate):

  • Start with a working example repository and inspect history in it.
    • Rationale: when using existing history, I can speak about commits by their ids. This is impossible if the students are creating their own commits.
  • Define the concept of commit (snapshot)
  • Define the concept of history, working tree, and the SHA1 hash (DNA is not quite the right metaphor for SHA1, I've just given up so far and briefly explained the concept of a one-way hash)
    • It's important to just cover these three. There are some great "introduction to Git for computer scientists" tutorials that belabor all the excruciating details. We have to distill out the really important details while leaving the technicals of tips, trees, and blobs mostly out of the discussion.
  • Get students comfortable with the detached HEAD state.
    • Rationale: Most people have an irrational fear of having a detached HEAD. This is exactly the state you want to be in when doing anything besides developing. Going from detached HEAD to development leads us to:
  • Define the concept of branch and HEAD (movable labels for commits)
  • Introduce checkout
  • Introduce remote branches (how else do you explain origin/master?)
  • Introduce feature branches
  • Introduce git add
  • Introduce the staging area
  • Introduce git diff and git diff --cached
  • Create first commit
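
A minimal sketch of the detached-HEAD-to-branch step in the outline above, assuming a throwaway repository in a temporary directory. The file name, commit messages, branch name `experiment`, and identity variables are placeholders for illustration; none of them come from the lesson or the Etherpad.

```shell
#!/bin/sh
set -eu

# Hypothetical identity and scratch directory, used only for this sketch.
export GIT_AUTHOR_NAME="Example Learner" GIT_AUTHOR_EMAIL="learner@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch explicitly

# Two snapshots, so there is some history to inspect.
echo "step one" > notes.txt
git add notes.txt
git commit -q -m "First snapshot"
first=$(git rev-parse HEAD)

echo "step two" >> notes.txt
git commit -q -a -m "Second snapshot"

# Check out the first commit by its id: HEAD now points at a commit, not a branch.
git checkout -q "$first"
detached_state=$(git symbolic-ref -q HEAD || echo "detached")

# Going from detached HEAD to development: attach a new movable label here.
git checkout -q -b experiment
current_branch=$(git symbolic-ref --short HEAD)

echo "HEAD after checkout by id: $detached_state"
echo "HEAD after checkout -b:    $current_branch"
```

Nothing is lost while HEAD is detached; `master` still points at the second snapshot, and `experiment` is simply a second label on the first one.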

If beginners get here, they just need a little more (pulling changes, push) before they're fully functional and hopefully can read StackOverflow questions and other documentation to get more help. I've helped one too many people who didn't understand what branches were to skip them in the first 1.5 hours of Git instruction. Also, understanding the feature branch workflow early will, I believe, help the learners in the long run.

Almost every piece of knowledge learners have about previous version control systems eventually hurts them when working with Git. I doubt we are going to be teaching any intermediate Git courses any time soon, so I'll be focusing my efforts on writing a ground-up lesson.

Update: A dump of the order I go through commands is available from this Etherpad, but I need to write the lesson up in longform.

@wking
Contributor

wking commented Nov 22, 2013

On Fri, Nov 22, 2013 at 12:58:41PM -0800, Aron Ahmadia wrote:

> Most people have an irrational fear of having a detached HEAD.

This may be due to the alarming phrasing ;).

>   • Introduce remote branches (how else do you explain origin/master?)
>   • Introduce feature branches

I'd flip the order here, and talk about local branches first.
Namespaced remote branches are just a special case of local branches.

> If beginners get here, they just need a little more (pulling
> changes, push)

I'm not sure where merging and conflict resolution land in your
layout, but that's certainly a key issue if they'll be maintaining a
project (instead of just contributing to an existing project).

I agree that embracing branches as a movable commit nickname as early
as possible is a good idea. I'd drop:

  • a feature branch that has already been merged and
  • a feature branch that was abandoned as a dead end

into the training repository to help clarify the idea before students
start creating their own commits.
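
One way such a training repository might be seeded, as a rough sketch. The branch names `add-plotting` and `try-caching`, the file names, and the identity variables are made up for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical identity and scratch directory, used only for this sketch.
export GIT_AUTHOR_NAME="Example Instructor" GIT_AUTHOR_EMAIL="instructor@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "load data" > pipeline.txt
git add pipeline.txt
git commit -q -m "Start pipeline"

# A feature branch that has already been merged.
git checkout -q -b add-plotting
echo "plot results" >> pipeline.txt
git commit -q -a -m "Add plotting step"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'add-plotting'" add-plotting

# A feature branch that was abandoned as a dead end (never merged).
git checkout -q -b try-caching
echo "cache everything" > cache.txt
git add cache.txt
git commit -q -m "Try caching intermediate files"
git checkout -q master

# Branch names, with the leading spaces and the current-branch marker stripped.
merged=$(git branch --merged master | tr -d ' *')
unmerged=$(git branch --no-merged master | tr -d ' *')

# The picture students would see before making any commits of their own.
git log --graph --oneline --decorate --branches
```

The `--no-ff` merge keeps the merged branch visible as its own line in the graph, which makes it easier to point at during a lesson.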

@ethanwhite
Contributor

> Yes, here's the rough overview of how I want to do things differently (across both beginner and intermediate):

It's an interesting idea of putting some of the more core theory of git earlier in the process. But I'm sceptical that most novice scientists will respond well to things like talking about SHA1 hash, HEAD, detached HEAD, etc. before getting further into using version control. I know it's important for understanding what's going on, but putting this stuff in the first half a day is going to have a lot of scientists thinking "why do I need to know all of this?" and (more problematically) "this is all too complicated I guess I'm not smart/computery enough to use version control." The fact that you need to use add and the idea of a stage seems too esoteric to a lot of our learners so explaining the theory behind git seems like a lot for a half day introduction.

That said, I think this would be great material in an intermediate lesson that assumes previous use of version control. In fact, since the intermediate material should start with a basic review of git (which also serves as an introduction for those who use something else), that introduction of the basics would be a natural place to teach these core concepts that people need to really understand what is going on.

@wking
Contributor

wking commented Nov 30, 2013

On Fri, Nov 22, 2013 at 12:58:41PM -0800, Aron Ahmadia wrote:

> Update: A dump of the order I go through commands is available from
> this Etherpad [1], but I need to write the lesson up in longform.
>
> [1]: https://etherpad.mozilla.org/swc-2013-07-washington

Looking over this, the meaning of git push (without further
arguments) is not clear from the Etherpad notes. I'm not sure we
should be teaching the more implicit forms of push, because the
default semantics are going to change with Git 2.0 [1]. I imagine
that will lead to confusion if students are referring to notes that
haven't been future-proofed by either:

a. explaining the change in push semantics, or
b. only using explicit forms (git push $REMOTE $BRANCH)

In the interest of simplicity (and following PEP 20 [2]), I'm in favor
of option b. Then students don't need to internalize the default
remote- and branch-selection rules.
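
A small sketch of the explicit form from option b, assuming a local bare repository stands in for the hosted remote. The paths, file name, and identity variables are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical identity and scratch directory, used only for this sketch.
export GIT_AUTHOR_NAME="Example Learner" GIT_AUTHOR_EMAIL="learner@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
work=$(mktemp -d)

git init -q --bare "$work/shared.git"     # stand-in for a hosted remote
git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/shared.git"

echo "analysis notes" > README.txt
git add README.txt
git commit -q -m "Add README"

# Explicit push: both the remote and the branch are named, so the result
# does not depend on the configured default push behaviour.
REMOTE=origin
BRANCH=master
git push -q "$REMOTE" "$BRANCH"

# Ask the remote which commit its copy of the branch points at.
pushed=$(git ls-remote "$REMOTE" "refs/heads/$BRANCH" | cut -f1)
echo "remote $REMOTE has $BRANCH at $pushed"
```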

@ethanwhite
Contributor

> In the interest of simplicity (and following PEP 20 [2]), I'm in favor
> of option b. Then students don't need to internalize the default
> remote- and branch-selection rules.

+1

@ahmadia
Contributor

ahmadia commented Dec 2, 2013

@wking
Contributor

wking commented Dec 7, 2013

On Sun, Dec 01, 2013 at 07:50:06PM -0800, Aron Ahmadia wrote:

> Some progress here:
> https://github.com/ahmadia/bc/compare/swcarpentry:master...git-intermediate?expand=1

I'm concerned that there are over 300 lines of content and you've only
covered:

  • git config --global user.name
  • git config --global user.email
  • git config --global color.ui
  • git config --global core.editor
  • git clone https://github.com/ahmadia/bio-pipeline.git
  • git log
  • git log -n 1
  • git log -n 1 -p

From your Etherpad notes [1], you still want to cover:

  • git add changed_file
  • git branch
  • git branch -v
  • git checkout
  • git checkout -b branch_name
  • git checkout hash_id
  • git checkout master
  • git checkout python_pipeline.py
  • git commit
  • git commit -a
  • git commit -m "commit message"
  • git diff
  • git diff --cached
  • git fetch upstream
  • git log --branches --graph --oneline
  • git log --oneline
  • git log --oneline --decorate
  • git log --pretty=oneline
  • git log --stat
  • git merge upstream/master
  • git mv
  • git push origin master
  • git push origin master:remote_branch
  • git remote
  • git remote -v
  • git reset changed_file # discards changes
  • git rm
  • git status

Besides the risk of running over on time, I think culling the content
to the bare minimum number of concepts will be better for students.
Things I think can be cut from your current dd9da96 (Added first part
of Conversational Git section, 2013-12-01):

  • The “why does Git want your email address” section. Git doesn't
    care, but your collaborators might. Is this an early thrust to
    oppose pushback you've gotten in earlier boot camps?
  • git log -n 1. I always use git show when I want to see a single
    commit. For students, I think it's fine to just use git log and
    drop an ellipsis (…) after the output you care about.

I also think the list of version-control motivators:

  • Unlimited "undo" button
  • Record of who made changes when
  • Avoid overwriting changes
  • Distributed checkpoints

Should be simplified to:

  • Who made changes, when, and why. This makes it easy to:
    • Revert changes (unlimited "undo")
    • Automatically integrate most parallel changes
    • Figure out what the author of a particular line was thinking

I like the conversational-Git approach, and I think one benefit is
that we can explicitly focus on defining new terms. I'd like to see a
stripped-down version of Pro Git's branching introduction [2],
covering:

  • Directory/file trees. I'm fine if you don't want to call these
    “trees”. Intermediate students (and basically everyone who's ever
    used a computer before) should already have internalized this
    idea, even if they don't agree on all the names (e.g. “folder”
    vs. “directory”).
  • Commits as metadata tagging a specific tree, recording author
    information, a commit message, and parent commits.
  • Branches as commit pointers that move forward automatically.
  • Repositories as containers that hold branches.
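
To make the commit and branch definitions above concrete, here is a hedged sketch that inspects a tiny throwaway repository. The file name, commit messages, and identity variables are placeholders.

```shell
#!/bin/sh
set -eu

# Hypothetical identity and scratch directory, used only for this sketch.
export GIT_AUTHOR_NAME="Example Learner" GIT_AUTHOR_EMAIL="learner@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "raw counts" > data.txt
git add data.txt
git commit -q -m "Add raw counts"
before=$(git rev-parse master)      # the commit the branch points at now

echo "cleaned counts" >> data.txt
git commit -q -a -m "Clean the counts"
after=$(git rev-parse master)       # the branch moved forward on its own

# A commit is a small record: tree id, parent id(s), author, committer, message.
git cat-file -p HEAD

tree_id=$(git rev-parse 'HEAD^{tree}')
parent_id=$(git rev-parse 'HEAD^')

# The tree is the directory listing that the commit tags.
git ls-tree "$tree_id"
```

The `parent` line in the `git cat-file -p HEAD` output carries the same id that `master` held before the second commit, which is the "pointer that moves forward" idea in miniature.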

I'd also throw in figure 1-5 and its “snapshot” view for extra
clarity [3]. Those four concepts give you enough to start digging
into the history of your cloned project. Then I'd talk about checking
out earlier revisions and introduce:

  • The working tree as a tree unpacked on your filesystem.
  • HEAD as a pointer to the currently-checked-out branch/commit.

Then I'd introduce commits and the staging area using a graphic like
[4]. That gets you to local development with seven new ideas (tree,
commit, branch, repository, working tree, HEAD, staging area).

I'd introduce the concepts with graphics before I mentioned the Git
commands that manipulate them (e.g. I'd get through the repository
explanation before doing anything, and then start in with clone,
log, log -p, and log --graph --oneline --decorate. Then I'd go
through the HEAD explanation, after which we'd play with
checkout $HASH. Then I'd go through the staging area explanation,
after which we'd play with add, rm, status, diff, diff --cached,
and commit -v).
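
A rough sketch of the staging-area play session described above, assuming a scratch repository. The file name and identity variables are placeholders, and the commit uses `-m` instead of `-v` so the script runs without opening an editor.

```shell
#!/bin/sh
set -eu

# Hypothetical identity and scratch directory, used only for this sketch.
export GIT_AUTHOR_NAME="Example Learner" GIT_AUTHOR_EMAIL="learner@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "alpha" > data.txt
git add data.txt
git commit -q -m "Add data"

# Edit the file: the change exists only in the working tree.
echo "beta" >> data.txt
unstaged_before=$(git diff --name-only)            # working tree vs. staging area
staged_before=$(git diff --cached --name-only)     # staging area vs. last commit

# Stage the change: it moves from one diff to the other.
git add data.txt
unstaged_after=$(git diff --name-only)
staged_after=$(git diff --cached --name-only)
git status --short

# Commit what is staged; afterwards the staging area matches HEAD again.
git commit -q -m "Append beta"
staged_final=$(git diff --cached --name-only)
```

In a live session, `git commit -v` would open the editor with the staged diff shown below the message template, which pairs well with the `diff --cached` step.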

I can write this up as long-form notes if you'd like, or you can
consider it advice for your own notes, and incorporate it (or not) as
you see fit ;).

@ahmadia
Contributor

ahmadia commented Dec 7, 2013

Thanks. This feedback is super helpful. Give me a couple of days to digest
and think about it :)

@ethanwhite
Contributor

> On Sun, Dec 01, 2013 at 07:50:06PM -0800, Aron Ahmadia wrote:
> Some progress here: https://github.com/ahmadia/bc/compare/swcarpentry:master...git-intermediate?expand=1

Looks like a great start! Thanks for tackling this, and sorry I've been so slow in looking at it. I should be much more available the next couple of weeks.

I agree with @wking that the introduction could be tightened up and am generally +1 on his suggestions. Since the target audience is folks who already have some experience with version control, I think you can shorten the intro substantially. We should still very briefly motivate version control, but these folks have already drunk the Kool-Aid, so I don't think we need an introductory story or a lot of detail on why they should use version control.

We aren't assuming that they've used distributed version control so a few sentences in the intro about the big picture differences/benefits of distributed vs. centralized version control might be useful.

> I'd introduce the concepts with graphics before I mentioned the Git commands that manipulate them

+1

@ethanwhite
Contributor

@ahmadia is this something you'll have some bandwidth to work on in the next month or so? I ask because I'm getting ready to put out a call for help developing the intermediate material later this week, and I'll either flag this as being something you're handling or ask for help, depending on whether you'd like it or not. Clearly I'm quite understanding of limited time given my slow progress on the Python material; I just wanted to see what you'd like me to do when advertising for help.

@ahmadia
Contributor

ahmadia commented Jan 15, 2014

This should be done within the next week, actually :)

Thanks for the ping.

@ethanwhite
Contributor

Awesome!

@wking
Contributor

wking commented Apr 19, 2014

This landed in #243, so I'm closing this issue.

@wking wking closed this as completed Apr 19, 2014

6 participants