Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a "labels with" button to the rename dialog #9295

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

Vitalis95
Copy link
Contributor

Fixes #9280
@rdstern @N-thony @lilyclements , added Labels With button to Rename dialog

@rdstern
Copy link
Collaborator

rdstern commented Dec 4, 2024

@Vitalis95 and @lilyclements excellent and seems to work fine. But in the example data I started by copying the names into labels and this now gives an error. I'm sure it ised to work and I'll check now.

image

I used the data given in the survey, included in the issue. The task above works fine in Version 0.8.0

@lilyclements
Copy link
Contributor

@Vitalis95 the R code from @rdstern's error shows that this "Multiple" tab is running type = "rename_labels". On this tab we don't need any changes from how it was before. It should run the following:

data_book$rename_column_in_data(data_name="Responses", type="multiple", new_labels_df=data.frame(cols=c( <columns> ), index=c(2,1,3,4,5,6,7,8,9,10,11,12,13,14)))

@Vitalis95
Copy link
Contributor Author

@rdstern , have a look at it

@rdstern
Copy link
Collaborator

rdstern commented Dec 8, 2024

@Vitalis95 and @lilyclements generally the dialog is now working well as a whole. I have been using the survey data from the recent e-siac that has the questions as the names of the columns.
a) So I made them into the labels. That's working fine now.
b) I made a mess of the editing the labels - explained below. No problem, I was able to copy the names again and paste the original labels.
c) Now I tried to abbreviate the names - that's an option on the Names with. They are terrible abbraviations, but no worry, I was now able to paste the labels back as the names!

This is all good, and the dialog is working well when I am mixing the rop radio buttons. Now I tried to get rid of the dot, between each word in the labels. (there are lots of them in most of the labels.

I tried Contains and replace dot by space. Instead it replaced the first letter of each label. Humpf, that like a regex thingy! But I thought Contains didn't do that.

So I tried \\. instead and it did nothing. That's understandable, that it didn't find that phrase. (I still don't understand why it didn't replace at least the first dot.)

Next I tried Replace with \. because that does understand regex. And it correctly did replace the dot, but just the first. I suggest the default here should be the Replace All options. Maybe there is a checkbox - default unchecked - to replace just the first?

But then I wondered about Contains, but on the names rather than the labels. So I tried to replace dot by space in the names - of course space isn't allowed in a name! And I got this error, that crashed me out.

image

I tried again, replacing dot by underscore and got the same error.

And, @Vitalis95 maybe you could also include our regex keyboard when there is the replace option. You may need to check also with @N-thony.

This is still close - I hope.

@N-thony
Copy link
Collaborator

N-thony commented Dec 9, 2024

And, @Vitalis95 maybe you could also include our regex keyboard when there is the replace option. You may need to check also with @N-thony.

@Vitalis95 you can copy the regex control from Prepare > Column: Text > Find/Replace dialogue.

@lilyclements
Copy link
Contributor

@Vitalis95 the "Replace all" option is just to use stringr::str_replace_all if this checkbox is checked (otherwise stringr::str_replace, which is the current default)

@lilyclements
Copy link
Contributor

lilyclements commented Dec 9, 2024

of course space isn't allowed in a name! And I got this error, that crashed me out.

@rdstern I was being crashed out for unallowed names whne investigating this the other day (see point 2. here). Here, you suggested about us "catching" this error to say something more useful (and to not run the R code so it doesn't crash you out!). I think your example is another case making it much more needed to have this sort of "catch"

@rdstern
Copy link
Collaborator

rdstern commented Dec 10, 2024

@Vitalis95 and @lilyclements this has all become more interesting!
@Vitalis95 I hope you can soon make the amendments so it works ok, except for the major problem that it could crash out in the rename with option, if the formula produces illegal names.

I have been checking what happens in the other options of the rename dialog:
a) In the single column (first button) it traps and doesn't allow a name with a space. I also tried starting with a number and it trapped that too.
b) In the Multiple option it just seems to ignore changes when I introduced spaces in a name. It didn't make that change. But when I started a name with a number it has given me the dreaded exception error, and I think I need to crash out. So it isn't just in the rename with! option.
c) So I wonder what we should do? What happens in RStudio? I think it might be complicated to catch everything, but we could do a backup before making the changes, so we can always return easily to the databook on a restart.

@N-thony
Copy link
Collaborator

N-thony commented Dec 10, 2024

@Vitalis95 and @lilyclements this has all become more interesting! @Vitalis95 I hope you can soon make the amendments so it works ok, except for the major problem that it could crash out in the rename with option, if the formula produces illegal names.

I have been checking what happens in the other options of the rename dialog: a) In the single column (first button) it traps and doesn't allow a name with a space. I also tried starting with a number and it trapped that too. b) In the Multiple option it just seems to ignore changes when I introduced spaces in a name. It didn't make that change. But when I started a name with a number it has given me the dreaded exception error, and I think I need to crash out. So it isn't just in the rename with! option. c) So I wonder what we should do? What happens in RStudio? I think it might be complicated to catch everything, but we could do a backup before making the changes, so we can always return easily to the databook on a restart.

@Vitalis95 following your question on Skype, below is how it should look like when introducing regex feature.

data_book$rename_column_in_data(data_name="survey", type="rename_with", .fn=stringr::str_replace, .cols=tidyselect::starts_with("f"), pattern=stringr::regex("^f", FALSE, multiline=FALSE, comments=FALSE), replacement="g")

or
we need to ignore .cols as we are using Regex

data_book$rename_column_in_data(
  data_name = "survey", 
  type = "rename_with", 
  .fn = stringr::str_replace,  # Function to apply for renaming columns
  pattern = stringr::regex("^f", FALSE, multiline=FALSE, comments=FALSE),  # Regex pattern for replacement
  replacement = "g"  # Replace "f" with "g"
)

image

but the following doesn't work because the tidyselect::starts_with() function does not accept a regex expression, I'm wondering why by default the tidyselect functions... by default get the pattern value. @lilyclements can clarify about this.
data_book$rename_column_in_data(data_name="survey", type="rename_with", .fn=stringr::str_replace, .cols=tidyselect::starts_with("^f"), pattern=stringr::regex("^f", FALSE, multiline=FALSE, comments=FALSE), replacement="g")

@lilyclements
Copy link
Contributor

@rdstern to c), it can work in RStudio, but can be very frustrating-

# e.g., 1: replace "Sepal." with "Petal." so we have duplicated column names
data(iris)

names(iris) <- stringr::str_replace_all(names(iris), "Sepal.", "Petal.")

names(iris)
# [1] "Petal.Length" "Petal.Width"  "Petal.Length" "Petal.Width"  "Species"   

# our column names are duplicated, but we can't actually access the duplicated ones! I can only get one of the `Petal.Length`'s

head(iris)
#Petal.Length Petal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2  setosa
#2          4.9         3.0          1.4         0.2  setosa


# E.g., 2, replacing with an "illegal" character works fine, you just have to access that variable by putting a "`" around it:

data(iris)

names(iris) <- stringr::str_replace_all(names(iris), "S", "2")
names(iris)
# [1] "2epal.Length" "2epal.Width"  "Petal.Length" "Petal.Width"  "2pecies"  

iris$`2epal.Length`

@N-thony why are we looking at stringr::regex? What was wrong with the initial approach (or is this on the string text dialog, not the rename dialog).

You've said "we need to ignore .cols as we are using Regex", this isn't true is it? I found on the rename columns/labels dialog that it would perform the pattern replacement to the columns which were defined under ".cols", which fit that pattern.

E.g., if we have

Sepal.Length, Sepal.Width, Species, Petal.Length, Petal.Width

I could set .cols to be any starting with "S"
And set my pattern to be any full stops, and to replace that with a "_"
This would change our names to be:

Sepal_Length, Sepal_Width, Species, Petal.Length, Petal.Width

i.e., only apply the change to our columns starting with an "S"

@Vitalis95
Copy link
Contributor Author

@lilyclements, Roger suggested we add regex keyboard when there is the replace option

@Vitalis95
Copy link
Contributor Author

@N-thony , have a look at it

@rdstern
Copy link
Collaborator

rdstern commented Dec 11, 2024

a) @Vitalis95 and @lilyclements in both the Rename with and Label with, the contains is still not working properly. I think I can see the problem, but don't know the correction. When I use Contains it still generates this code,:

data_book$rename_column_in_data(data_name="Responses", type="rename_labels", .fn=stringr::str_replace, .cols=tidyselect::contains("."), pattern=".", replacement=" ", new_labels_df=data.frame(cols=c("Response.number","Which.province.do.you.work.in.","Which.is.the.name.of.your.duty.station.","How.many.people.work.with.you.at.your.duty.station.office.","Have.you.participated.in.an.online.course.before.","This.course.requires.you.to.have.access.to.a.computer..How.do.you.plan.to.access.one.for.the.duration.of.the.course.","Do.copies.of.paper.records.exist.at.your.duty.station","If.yes..what.year.do.they.start.","Does.a.copy.of.the.paper.records.exist.at.the.provincial.office.","Is.there.a.list.of.stations.including.volunteer.weather.stations.for.your.province.","What.is.your.level.of.experience.in.using.spreadsheets.e.g..Excel.to.process.columns.of.data.","Have.you.been.to.an.R.Instat.training.before.","If.yes..would.be.prepared.to.support.your.colleagues.in.this.course.","Is.there.anything.else.you.would.like.to.share.or.comment.about.as.we.begin.this.course"), index=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14)))
So it includes the contains, but that's after str_replace above, so it still uses regex. That's why when I ask it (in contains) to replace the dot with a space, it just gets rid of the first letter instead.

b) Lily, as you say, with the Matches option it replaces just the first option when I use \\., because it is using str_replace. When I changed that (in the script) to str_replace_all it was fine.

Could we have an extra option in the pull down perhaps, so we have Matches as it is now and also Matches All which would use str_replace_all.

Maybe also Contains and Contains_All?

c) @Vitalis95 the Include regular expression checkbox seems to work fine. Could it be disabled unless the Matches, or Matches All options in the pull down is selected?

@Vitalis95
Copy link
Contributor Author

@rdstern , have a look at it

@lilyclements
Copy link
Contributor

@rdstern @Vitalis95 @N-thony there is a difference between .cols and pattern:

  • .cols needs it in regex form if we have matches, but not if we have contains.
  • but, pattern always needs it in regex.

We currently set it up that pattern reads the same parameter value as .cols.

I know in stringr, if you read in fixed(".") then it'll do that literal ".".
Perhaps if you read in contains, then we need to have pattern= stringr::fixed(" <VALUE IN INPUT> "). Thoughts?

Copy link
Collaborator

@rdstern rdstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vitalis95 almost there now. The Matches and Matches all both work fine. It just needs Contains and Contains All to be literal and not use regex. Does the suggestion by @lilyclements above achieve that?

@Vitalis95
Copy link
Contributor Author

@rdstern , check it again

Copy link
Collaborator

@rdstern rdstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vitalis95 lovely - many thanks. Also thanks @lilyclements
@N-thony it will be great to merge this. Then I wonder about a plan to trap errors in names and institute your undo, if so. I'll put this in a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a "labels with" button to the rename dialog?
4 participants