Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lpad, rpad use textwidth/char count incoherently #25016

Closed
StefanKarpinski opened this issue Dec 10, 2017 · 10 comments
Closed

lpad, rpad use textwidth/char count incoherently #25016

StefanKarpinski opened this issue Dec 10, 2017 · 10 comments
Labels
strings "Strings!"

Comments

@StefanKarpinski
Copy link
Member

Here's a slightly cleaned up version of lpad:

function lpad(s::AbstractString, n::Integer, p::AbstractString=" ")
    m = n - textwidth(s)
    m  0 && return s
    l = textwidth(p)
    q, r = divrem(m, l)
    string(p^q, first(p, r), s)
end

The quotient and remainder are computed in terms of textwidth, while the remainder is then used to select a number of characters, not a width of text. We should either revert the change I made in 8be4acc, or compute the number of characters for fractional padding by taking as many leading characters as fit into the allowed text width.

@StefanKarpinski
Copy link
Member Author

cc @jiahao, @stevengj, @nalimilan

@nalimilan
Copy link
Member

I'd just @assert all(c -> textwidth(c) == 1, p), and leave it up to somebody who actually needs a more complex behavior to implement it. Anyway if some characters have a width higher than unity, exact padding may not be possible.

@StefanKarpinski
Copy link
Member Author

In that case why not just implement the character version of this since that's what the assumption that textwidth(c) == 1 is equivalent to.

Another option is to make lpad and rpad parametric and and pass a string width function to them, defaulting to length but also allowing textwidth. Something like this:

function fractional_padding(::typeof(textwidth), p::AbstractString, r::Integer)
    w = 0
    for (i, c) in enumerate(p)
        w += textwidth(c)
        w  r || return first(p, i-1)
    end
    return p
end
fractional_padding(::typeof(length), p::AbstractString, r::Integer) = first(p, r)

function lpad(length::Function, s::AbstractString, n::Integer, p::AbstractString=" ")
    m = n - length(s)
    m  0 && return s
    l = length(p)
    q, r = divrem(m, l)
    string(p^q, fractional_padding(length, p, r), s)
end
lpad(s::AbstractString, n::Integer, p::AbstractString=" ") = lpad(textwidth, s, n, p)

@nalimilan
Copy link
Member

In that case why not just implement the character version of this since that's what the assumption that textwidth(c) == 1 is equivalent to.

That's not the same thing: the string can still contain several characters.

I feel like in general you want to pad with ASCII characters, so while we should keep the possibility of extending the function later, I don't think it's worth wasting your precious time right now on it.

@StefanKarpinski
Copy link
Member Author

That's not the same thing: the string can still contain several characters.

Yes, but asserting that is simply saying that we don't handle padding unless the characters in the padding string are all single-column width. At that point, why not just implement the behavior that treats all characters as having unit width – i.e. padding by character count rather that textwidth. Then the textwidth version can be implemented externally.

@stevengj
Copy link
Member

Didn’t we discuss this in a previous issue?

@StefanKarpinski
Copy link
Member Author

StefanKarpinski commented Dec 10, 2017

Yes, but this was spurred by the fact that I just noticed that our implementation is still incorrect and arguably it's not possible to implement fully correct exact padding based on textwidth since it's impossible to take half of a width-two character (for example). Here's the previous issue: #10825.

@StefanKarpinski
Copy link
Member Author

#25021 made me realize something important: we want the behavior of base to be independent of what particular version of the Unicode standard one uses. Moving the Unicode package out of Base is a good step in that direction, but now any use of the internal Base.Unicode module is suspect – in particular, this one since it currently depends on textwidth which is only provided by Unicode.

@oscardssmith
Copy link
Member

Should textwidth throw an error if you try to pad an odd amount with a double width character?

@StefanKarpinski
Copy link
Member Author

No, we shouldn't be using textwidth in Base at all because it's behavior depends on the version of Unicode that you're using. My point above is that Base should be Unicode-version-independent. You should be able to choose your version of Unicode by choosing which version of the Unicode stdlib package you use. Accordingly, Base should not use the Unicode package at all.

StefanKarpinski added a commit that referenced this issue Dec 13, 2017
The general principle here is that we should avoid having behavior
in Base that depends on which version of Unicode you're using. It's
fine for an external package to provide versions of lpad and rpad
which use textwidth and thereby depend on Unicode details, but the
functions in Base should not depend on these.
StefanKarpinski added a commit that referenced this issue Dec 14, 2017
The general principle here is that we should avoid having behavior
in Base that depends on which version of Unicode you're using. It's
fine for an external package to provide versions of lpad and rpad
which use textwidth and thereby depend on Unicode details, but the
functions in Base should not depend on these.
StefanKarpinski added a commit that referenced this issue Dec 14, 2017
The general principle here is that we should avoid having behavior
in Base that depends on which version of Unicode you're using. It's
fine for an external package to provide versions of lpad and rpad
which use textwidth and thereby depend on Unicode details, but the
functions in Base should not depend on these.
StefanKarpinski added a commit that referenced this issue Dec 14, 2017
The general principle here is that we should avoid having behavior
in Base that depends on which version of Unicode you're using. It's
fine for an external package to provide versions of lpad and rpad
which use textwidth and thereby depend on Unicode details, but the
functions in Base should not depend on these.
StefanKarpinski added a commit that referenced this issue Dec 14, 2017
The general principle here is that we should avoid having behavior
in Base that depends on which version of Unicode you're using. It's
fine for an external package to provide versions of lpad and rpad
which use textwidth and thereby depend on Unicode details, but the
functions in Base should not depend on these.
StefanKarpinski added a commit that referenced this issue Dec 14, 2017
The general principle here is that we should avoid having behavior
in Base that depends on which version of Unicode you're using. It's
fine for an external package to provide versions of lpad and rpad
which use textwidth and thereby depend on Unicode details, but the
functions in Base should not depend on these.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
strings "Strings!"
Projects
None yet
Development

No branches or pull requests

4 participants