diff --git a/docs/R-IOC/02_IOC_R_week_01.md b/docs/R-IOC/02_IOC_R_week_01.md
index c929bd7a1..b53ae061c 100644
--- a/docs/R-IOC/02_IOC_R_week_01.md
+++ b/docs/R-IOC/02_IOC_R_week_01.md
@@ -61,15 +61,15 @@ To avoid this inconvenience, you need to add the `.txt` extension to make your f
 ![](images/toolbox-do-it-yourself.png){: style="width:75px"} **Do it yourself!**
-- [x] Create a variable called `my_var` that contain your favorite color.
-- [x] Create a variable called `surname` with the string _Marilyn Monroe_.
-- [x] Create a variable with the number 9.
-- [x] What's its type?
-- [x] Change it to character.
-- [x] Calculate the Ln, log in base 2 and log in base 10 of the value 1.
-- [x] Round the fraction 9/7 with 2 and then 4 decimal numbers.
-- [x] Create a function that takes a value and substract the number 4.
-- [x] Test your function for the values : 12, 5.6 and 0.
+- [x] 1. Create a variable called `my_var` that contain your favorite color.
+- [x] 2. Create a variable called `surname` with the string _Marilyn Monroe_.
+- [x] 3. Create a variable with the number 9.
+- [x] 4. What's its type?
+- [x] 5. Change it to character.
+- [x] 6. Calculate the Ln, log in base 2 and log in base 10 of the value 1.
+- [x] 7. Round the fraction 9/7 with 2 and then 4 decimal numbers.
+- [x] 8. Create a function that takes a value and substract the number 4.
+- [x] 9. Test your function for the values : 12, 5.6 and 0.
 Please be aware of the best practices for your Rscript, we will be attentive to them!
diff --git a/docs/R-IOC/03_IOC_R_week_02.md b/docs/R-IOC/03_IOC_R_week_02.md
index 70d385b56..ee2b87a05 100644
--- a/docs/R-IOC/03_IOC_R_week_02.md
+++ b/docs/R-IOC/03_IOC_R_week_02.md
@@ -34,25 +34,25 @@ To avoid this inconvenience, you need to add the `.txt` extension to make your f
 ![](images/toolbox-do-it-yourself.png){ style="width:75px"} **Do it yourself!**
-- [x] Create a factor for exam grades "A", "B", "C", "D". What is the current reference level?
-- [x] Now set the grade "B" as the reference level.
-- [x] The grade "D" is no longer used in exam grades, please delete it from the vector and drop this unused level.
-- [x] How to check if there are same elements in `v1` (`v1 <- c(1, 2, 3, 4, 5)`) and `v2` (`v2 <- c(8, 3, 7, 9)`)
-- [x] Are all elements in `v1` greater than 3?
-- [x] Is any element in `v1` greater than 8 AND is any element in `v2` greater than 8?
-- [x] Try `c(TRUE, FALSE) & TRUE`, `c(TRUE, FALSE) & c(TRUE, FALSE)`, `c(TRUE, FALSE) & c(TRUE, FALSE, TRUE)`, `FALSE && TRUE`, `c(TRUE, FALSE) && TRUE` and
+- [x] 1. Create a factor for exam grades "A", "B", "C", "D". What is the current reference level?
+- [x] 2. Now set the grade "B" as the reference level.
+- [x] 3. The grade "D" is no longer used in exam grades, please delete it from the vector and drop this unused level.
+- [x] 4. How to check if there are same elements in `v1` (`v1 <- c(1, 2, 3, 4, 5)`) and `v2` (`v2 <- c(8, 3, 7, 9)`)
+- [x] 5. Are all elements in `v1` greater than 3?
+- [x] 6. Is any element in `v1` greater than 8 AND is any element in `v2` greater than 8?
+- [x] 7. Try `c(TRUE, FALSE) & TRUE`, `c(TRUE, FALSE) & c(TRUE, FALSE)`, `c(TRUE, FALSE) & c(TRUE, FALSE, TRUE)`, `FALSE && TRUE`, `c(TRUE, FALSE) && TRUE` and
 `c(TRUE, FALSE) && c(TRUE, FALSE)` in the R terminal, can you tell how to use properly `&` and `&&`?
-- [x] Download the data in any folder of your choice using this url:
+- [x] 8. Download the data in any folder of your choice using this url:
 https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000841/ScoringFiles/PGS000841.txt.gz
-- [x] Read (*i.e.*, import into R) the downloaded file and observe what you got.
-- [x] How many lines of comment (also called metadata) should we skip to get the data?
-- [x] Re read the file again with appropriate parameters of `read.delim()`.
-- [x] Save the readed table in `.csv` format and in Excel `.xlsx` format.
-- [x] The comment lines are sometime useful, in this example we can get the information of the downloaded polygenic score (PGS). Try to read only the comment lines in R and transforme it into a `data.frame`.
-- [x] Save the PGS information table in an `.RDS`.
-- [x] Save both PGS score table and the information table in an `.RData`.
-- [x] Save both PGS score table and the information table in a single Excel `.xlsx` file.
-- [x] Read the cells A8 to C10 of the first sheet of the previous saved Excel file.
+- [x] 9. Read (*i.e.*, import into R) the downloaded file and observe what you got.
+- [x] 10. How many lines of comment (also called metadata) should we skip to get the data?
+- [x] 11. Re read the file again with appropriate parameters of `read.delim()`.
+- [x] 12. Save the readed table in `.csv` format and in Excel `.xlsx` format.
+- [x] 13. The comment lines are sometime useful, in this example we can get the information of the downloaded polygenic score (PGS). Try to read only the comment lines in R and transforme it into a `data.frame`.
+- [x] 14. Save the PGS information table in an `.RDS`.
+- [x] 15. Save both PGS score table and the information table in an `.RData`.
+- [x] 16. Save both PGS score table and the information table in a single Excel `.xlsx` file.
+- [x] 17. Read the cells A8 to C10 of the first sheet of the previous saved Excel file.
 Please be aware of the best practices for your Rscript, we will be attentive to them!
diff --git a/docs/R-IOC/04_IOC_R_week_03.md b/docs/R-IOC/04_IOC_R_week_03.md
index dd2d03a08..f1e56e5cd 100644
--- a/docs/R-IOC/04_IOC_R_week_03.md
+++ b/docs/R-IOC/04_IOC_R_week_03.md
@@ -16,22 +16,22 @@ To avoid this inconvenience, you need to add the `.txt` extension to make your f
 ![](images/toolbox-do-it-yourself.png){: style="width:75px"} **Do it yourself!**
-- [x] Create a list with vectors (numeric, character and logical) of length 15, 8 and 10 respectively.
Don't hesitate to use R functions to create them without having to write them in hard copy (like : `hardcopy_vec <- c("it's", "not", "very", "effective", "that", "way")`).
+- [x] 1. Create a list with vectors (numeric, character and logical) of length 15, 8 and 10 respectively. Don't hesitate to use R functions to create them without having to write them in hard copy (like : `hardcopy_vec <- c("it's", "not", "very", "effective", "that", "way")`).
     - [x] The numeric vector must follow a binomial distribution
     - [x] The character vector is the last 8 letters of the alphabet in capital
     - [x] The logical vector is composed as many true values as false in the order of your like but remember not written in hard copy!
-- [x] Add names for each element of your list.
-- [x] Retrieve the character vector from your list.
-- [x] Retrieve the 4th value of the logical vector from your list.
-- [x] Remove positive elements of the numerical vector from your list.
-- [x] Filter to keep only false value of the logical vector from your list.
-- [x] Create a function that generate a random DNA sequence of a specified length (example, for a length 7 you must obtain : `ATCGATC`)
-- [x] Create a list of 4 random DNA sequences with a **random** number between 10 and 200 bases
+- [x] 2. Add names for each element of your list.
+- [x] 3. Retrieve the character vector from your list.
+- [x] 4. Retrieve the 4th value of the logical vector from your list.
+- [x] 5. Remove positive elements of the numerical vector from your list.
+- [x] 6. Filter to keep only false value of the logical vector from your list.
+- [x] 7. Create a function that generate a random DNA sequence of a specified length (example, for a length 7 you must obtain : `ATCGATC`)
+- [x] 8.
Create a list of 4 random DNA sequences with a **random** number between 10 and 200 bases
 (don't hard copy the length) called human, mouse, chicken, fly
-- [x] Compute the number of bases of each sequences
-- [x] Test of many sequences had more than 50 nucleotides
-- [x] Filter the list to keep only non mammals sequences
+- [x] 9. Compute the number of bases of each sequences
+- [x] 10. Test of many sequences had more than 50 nucleotides
+- [x] 11. Filter the list to keep only non mammals sequences
 Please be aware of the best practices for your Rscript, we will be attentive to them!
diff --git a/docs/R-IOC/05_IOC_R_week_04.md b/docs/R-IOC/05_IOC_R_week_04.md
index 4f6793a27..902d38156 100644
--- a/docs/R-IOC/05_IOC_R_week_04.md
+++ b/docs/R-IOC/05_IOC_R_week_04.md
@@ -29,23 +29,23 @@ To avoid this inconvenience, you need to add the `.txt` extension to make your f
 ![](images/toolbox-do-it-yourself.png){: style="width:75px"} **Do it yourself!**
-- [x] Create a matrix object named `my_mat` with 3 rows and 4 columns,
+- [x] 1. Create a matrix object named `my_mat` with 3 rows and 4 columns,
 fill with numbers 1 to 12 by row, name the rows with "r1", "r2", "r3" and the columns with "c1", "c2", "c3", "c4".
-- [x] Extract the 2nd row of `my_mat`.
-- [x] Extract the 2nd row of `my_mat` but keep it in matrix format.
-- [x] Extract the 2nd row of `my_mat` using a logical vector.
-- [x] What are the positions for the numbers that are multiples of 3 in `my_mat`?
-- [x] Based on `my_mat`, add a column "c5" containing the values "a", "b", "c". What happens after this add?
-- [x] Now delete the added column of `my_mat` and convert the matrix to numeric mode.
-- [x] Replace the element bigger than 10 by 99 in `my_mat`.
-- [x] Transforme the matrix `my_mat` to a `data.frame` named `my_df`.
-- [x] Use the rownames to create a new column "id" for `my_df`.
-- [x] Which row(s) has(have) duplicated values in `my_df`?
-- [x] Create a new column named "total" in `my_df`, which calculates the sum of column "c1" to "c4" by row.
-- [x] Change the column order to put the "id" in the first column in `my_df`.
-- [x] Remove the rownames of `my_df`.
-- [x] Add a new row in `my_df` which contains the sum of each column (except the "id" column, put `NA` in the new row for this column).
+- [x] 2. Extract the 2nd row of `my_mat`.
+- [x] 3. Extract the 2nd row of `my_mat` but keep it in matrix format.
+- [x] 4. Extract the 2nd row of `my_mat` using a logical vector.
+- [x] 5. What are the positions for the numbers that are multiples of 3 in `my_mat`?
+- [x] 6. Based on `my_mat`, add a column "c5" containing the values "a", "b", "c". What happens after this add?
+- [x] 7. Now delete the added column of `my_mat` and convert the matrix to numeric mode.
+- [x] 8. Replace the element bigger than 10 by 99 in `my_mat`.
+- [x] 9. Transforme the matrix `my_mat` to a `data.frame` named `my_df`.
+- [x] 10. Use the rownames to create a new column "id" for `my_df`.
+- [x] 11. Which row(s) has(have) duplicated values in `my_df`?
+- [x] 12. Create a new column named "total" in `my_df`, which calculates the sum of column "c1" to "c4" by row.
+- [x] 13. Change the column order to put the "id" in the first column in `my_df`.
+- [x] 14. Remove the rownames of `my_df`.
+- [x] 15. Add a new row in `my_df` which contains the sum of each column (except the "id" column, put `NA` in the new row for this column).
 Please be aware of the best practices for your Rscript, we will be attentive to them!
diff --git a/docs/R-IOC/06_IOC_R_week_05.md b/docs/R-IOC/06_IOC_R_week_05.md
index 2b4037ec9..d671cf5c7 100644
--- a/docs/R-IOC/06_IOC_R_week_05.md
+++ b/docs/R-IOC/06_IOC_R_week_05.md
@@ -44,23 +44,23 @@ To avoid this inconvenience, you need to add the `.txt` extension to make your f
 ![](images/toolbox-do-it-yourself.png){: style="width:75px"} **Do it yourself!**
-- [x] Create a matrix with several columns of numeric values (use `rnorm` for example) and use the apply function to calculate the max of each column.
-- [x] Use the apply function to find the minimun value in each row of your matrix.
-- [x] Write a function that takes a DNA sequence as input and checks if it contains any invalid characters (i.e., characters other than A, T, C, or G). If it does, print an error message, otherwise, print "Valid DNA sequence".
-- [x] Write an update version of the function created in week 3 to also created false DNA sequences thanks to another parameter where when true it takes `nucleotides <- c("A", "T", "C", "G")`and otherwise `nucleotides <- c("A", "T", "C", "G", "X")`
-- [x] By using apply and its subfonctions, create a list with 4 sequences where you select :
-    - [x] The length the same way as question 8 of week 3
-    - [x] The veracity randomly (no hard copy!)
-- [x] Test the validity of your sequences using apply and its subfunctions.
-- [x] Create a function called `which_season` that takes the month (integer) and returns the season
-- [x] Create the variable `my_airquality` from available dataframe `airquality`.
-- [x] Add the column `season` to `my_airquality` thanks to `which_season`
-- [x] Compute the number of rows for each season
-- [x] With `lapply`, create a list of numeric vectors (use `rnorm` for example) and calculate the sum of each vector.
-- [x] For each element of this list, plot a simple histogram where you add a vertical line that represente the mean of the distribution.
-- [x] Create two numeric vectors of equal length and use mapply to calculate the element-wise product of the two vectors.
-- [x] Create a custom function that takes to argument the day and month and determinate more accurately the season. Apply it for your first dataset.
-- [x] Compare the result with those from `which_season`. If the result is equal, change the value in the column `season` to be in uppercase, otherwise don't change the value (or change to be in lowercase if it's already in uppercase).
+- [x] 1. Create a matrix with several columns of numeric values (use `rnorm` for example) and use the apply function to calculate the max of each column.
+- [x] 2. Use the apply function to find the minimun value in each row of your matrix.
+- [x] 3. Write a function that takes a DNA sequence as input and checks if it contains any invalid characters (i.e., characters other than A, T, C, or G). If it does, print an error message, otherwise, print "Valid DNA sequence".
+- [x] 4. Write an update version of the function created in week 3 to also created false DNA sequences thanks to another parameter where when true it takes `nucleotides <- c("A", "T", "C", "G")`and otherwise `nucleotides <- c("A", "T", "C", "G", "X")`
+- [x] 5. By using apply and its subfonctions, create a list with 4 sequences where you select :
+    - [x] a. The length the same way as question 8 of week 3
+    - [x] b. The veracity randomly (no hard copy!)
+- [x] 6. Test the validity of your sequences using apply and its subfunctions.
+- [x] 7. Create a function called `which_season` that takes the month (integer) and returns the season
+- [x] 8. Create the variable `my_airquality` from available dataframe `airquality`.
+- [x] 9. Add the column `season` to `my_airquality` thanks to `which_season`
+- [x] 10. Compute the number of rows for each season
+- [x] 11. With `lapply`, create a list of numeric vectors (use `rnorm` for example) and calculate the sum of each vector.
+- [x] 12.
For each element of this list, plot a simple histogram where you add a vertical line that represente the mean of the distribution.
+- [x] 13. Create two numeric vectors of equal length and use mapply to calculate the element-wise product of the two vectors.
+- [x] 14. Create a custom function that takes to argument the day and month and determinate more accurately the season. Apply it for your first dataset.
+- [x] 15. Compare the result with those from `which_season`. If the result is equal, change the value in the column `season` to be in uppercase, otherwise don't change the value (or change to be in lowercase if it's already in uppercase).
 Please be aware of the best practices for your Rscript, we will be attentive to them!
diff --git a/docs/R-IOC/07_IOC_R_week_06.md b/docs/R-IOC/07_IOC_R_week_06.md
index 93743443c..57ec726f6 100644
--- a/docs/R-IOC/07_IOC_R_week_06.md
+++ b/docs/R-IOC/07_IOC_R_week_06.md
@@ -14,18 +14,18 @@ To avoid this inconvenience, you need to add the `.txt` extension to make your f
 ![](images/toolbox-do-it-yourself.png){: style="width:75px"} **Do it yourself!**
-- [x] Create a tibble `my_phones` from the available data.frame `WorldPhones`. Beware of the rownames ! We don't want to lose them
-- [x] Make it tidy and write it in `my_phones`
-- [x] Filter the tibble to retrieve only the European (don't write it in `myphones`).
-- [x] Select the tibble to retrieve only the region (don't write it in `myphones`).
-- [x] Replace "." by underscore in region name
-- [x] Replace truncated region names (`Amer`) by the full continent name using stringr's pattern matching functions.
-- [x] Group the data based on the region
-- [x] Compute the mean of number of telephones per region
-- [x] Add a column with a normalized number of telephone per region (Reminder : `norm_val = val - mean(val) / sd(val)`)
-- [x] Resume the information to check if the mean equal 0 and the sd equal 1. What do you get ?
-- [x] Do the same for the last 4 questions but by grouping on the year you can change the created column names
-- [x] Resume the information to retrieve the Year with the most phones for each region
+- [x] 1. Create a tibble `my_phones` from the available data.frame `WorldPhones`. Beware of the rownames ! We don't want to lose them
+- [x] 2. Make it tidy and write it in `my_phones`
+- [x] 3. Filter the tibble to retrieve only the European (don't write it in `myphones`).
+- [x] 4. Select the tibble to retrieve only the region (don't write it in `myphones`).
+- [x] 5. Replace "." by underscore in region name
+- [x] 6. Replace truncated region names (`Amer`) by the full continent name using stringr's pattern matching functions.
+- [x] 7. Group the data based on the region
+- [x] 8. Compute the mean of number of telephones per region
+- [x] 9. Add a column with a normalized number of telephone per region (Reminder : `norm_val = val - mean(val) / sd(val)`)
+- [x] 10. Resume the information to check if the mean equal 0 and the sd equal 1. What do you get ?
+- [x] 11. Do the same for the last 4 questions but by grouping on the year you can change the created column names
+- [x] 12. Resume the information to retrieve the Year with the most phones for each region
 Please be aware of the best practices for your Rscript, we will be attentive to them!
diff --git a/docs/R-IOC/08_IOC_R_week_07.md b/docs/R-IOC/08_IOC_R_week_07.md
index 7b1f0943c..2f2aabf47 100644
--- a/docs/R-IOC/08_IOC_R_week_07.md
+++ b/docs/R-IOC/08_IOC_R_week_07.md
@@ -26,18 +26,18 @@ Let's play with the dataset `diamonds` provided in the `ggplot2` package,
 it contains prices of more than 50,000 round cut diamonds, with 10 variables.
 Use `?diamonds` to get the full description and `str(diamonds)` to have a glimpse of the data structure.
-- [x] Create a plot to visualize the `price` and the `carat`, colored by the quality of the `cut`.
-- [x] Change the shape and the size of the points.
-- [x] Create a histogram of `price` by the diamonds' `color`.
-- [x] Make the bars in histogram side by side.
-- [x] Do the same figure but only for diamonds with prices higher than 10,000$.
-- [x] Draw a density plot of prices by group of `clarity`.
-- [x] Visualize the diamonds' `carat` and width (`y`), colored by `clarity` and use `color` as facet.
-- [x] Add a 2nd facet for the `cut`, make the scales vary across both columns and rows.
+- [x] 1. Create a plot to visualize the `price` and the `carat`, colored by the quality of the `cut`.
+- [x] 2. Change the shape and the size of the points.
+- [x] 3. Create a histogram of `price` by the diamonds' `color`.
+- [x] 4. Make the bars in histogram side by side.
+- [x] 5. Do the same figure but only for diamonds with prices higher than 10,000$.
+- [x] 6. Draw a density plot of prices by group of `clarity`.
+- [x] 7. Visualize the diamonds' `carat` and width (`y`), colored by `clarity` and use `color` as facet.
+- [x] 8. Add a 2nd facet for the `cut`, make the scales vary across both columns and rows.
 #### Bonus for heatmap
-- [x] Use the previously built `p_heatmap` from the [ggplot2 reference](r09_viz_ggplot2.md), try to add clustering tree (dendrogram) on the figure.
+- [x] 9. Use the previously built `p_heatmap` from the [ggplot2 reference](r09_viz_ggplot2.md), try to add clustering tree (dendrogram) on the figure.
 !!! tip "Hints"
     - We first need data for dendrogram: think about what you will use to build the dendrogram?