Several concerns on Ep 11 (new assembly workflow) #71

Open
tbooth opened this issue Aug 29, 2024 · 3 comments
Labels
reviewer Issues arising from comments on https://github.com/carpentries-lab/reviews/issues/17

Comments

@tbooth
Collaborator

tbooth commented Aug 29, 2024

From review by @cmeesters:

Chapter 11 - Designing a new workflow
This chapter needs a major revision:

  • the assembly part comes out of the blue and is unrelated to everything before. If you want it, you need additional material, describing the background. Best put it into a separate chapter (or several), then.
  • genome assembly is an intricate challenge; recommending a relatively outdated tool like velvet is dangerous, as there are numerous follow-up implementations tailored for various genome types.
  • the design phase is ok, but does not mention the template from the Snakemake workflow catalogue. However, standardizing and contributing(!) a workflow has an enormous impact on the deployment and portability of workflows, and thereby on the whole ecosystem of Snakemake. Not mentioning the catalogue and how to contribute to it is a major flaw.
  • for the whole community it would be better if people did not re-invent the wheel (e.g. new workflows for existing solutions), but were able to contribute to existing workflows and fix issues. This, however, requires a bit more documentation in Snakemake. A basic intro to git (pull, fork, commit, create PRs) might be helpful, though it is beyond the scope of this intro. Yet, perhaps a pointer to the catalogue and snakedeploy might be a good idea after all.
  • the separation of workflow and data is not taught (unless I overlooked it). Please introduce the --directory flag and the recommendation to separate workflow and data, which enables new users to apply the workflow to several different datasets.
@tbooth
Collaborator Author

tbooth commented Aug 29, 2024

Regarding the last three points, I completely agree that workflow re-use and maintainability are vital topics, and I'm currently preparing (and being funded to prepare!) additional material on them. The whole idea of this episode is to test the skills that the learners have acquired in the previous chapters by presenting a new challenge, so I don't want to introduce any new technical concepts here.

So I'm in agreement with the reviewer that all these things are important, but I don't think they can be added to this chapter.

tbooth added the reviewer label Aug 29, 2024
@tbooth
Collaborator Author

tbooth commented Aug 29, 2024

genome assembly is an intricate challenge; recommending a relatively outdated tool like velvet is
dangerous, as there are numerous follow-up implementations tailored for various genome types.

I totally disagree with the comment that my course is "dangerous". There is no recommendation here to use Velvet in real research. Rather, Velvet is chosen as a simple tool to illustrate an assembly-centric workflow. Likewise, "finding the longest contig" is no way to judge the quality of your assembly, but serves as a useful and easy-to-understand proxy for this exercise.

I will modify the text of the episode to make this extra clear.

@tbooth
Collaborator Author

tbooth commented Aug 29, 2024

I have added clarification to the text regarding what the workflow is doing and the choice of tools. I have also added a section "Biology and bioinformatics" to instructor/prereqs.html. The main intended audience for this course would be familiar with the terms "de-novo assembly", "contigs" and "adapters", but for those not coming from a biology background this new section provides some pointers to background reading.

the assembly part comes out of the blue and is unrelated to everything before.
If you want it, you need additional material, describing the background.
Best put it into a separate chapter (or several), then.

This is the entire point: to present a "whole new workflow", so that the learners can practise applying what they now know about Snakemake to a fresh challenge. Secondarily, but no less importantly, we want learners to practise debugging (TTT notes how neglected this is in general). It's hard to teach debugging by presenting learners with pre-written "deliberate mistakes", because they can't grasp what the intention of the broken code was in the first place or why they would make that mistake themselves. Having this chapter allows the learners to make their own mistakes and debug them.

In practical experience of teaching this, the time taken from presenting the original script to learners having their working Snakemake code is much longer than a Carpentries episode should be. Two hours is a reasonable time estimate. But I don't see how to break it down, as most of that time is spent on the extended exercise. In practice, the tutor can break up the session by sharing debugging sessions with the class as and when learners ask for help. Not only does this help people who are stuck on similar problems, but the class will engage with the process of trying to spot and rectify the problems when they see their fellow learners are stuck.

One could argue that presenting an extended exercise is "not the Carpentries way": everything should be bite-sized, and everything should be done in lockstep. But I just don't think the process of conceptualising, writing out, and debugging a Snakemake workflow (or learning programming in general) can be reduced like this. Learners need to gain the confidence that all the tools they have been given up to this point really can be applied to a fresh problem.
