lpad, rpad use textwidth/char count incoherently #25016

StefanKarpinski · 2017-12-10T18:30:33Z

Here's a slightly cleaned up version of lpad:

function lpad(s::AbstractString, n::Integer, p::AbstractString=" ")
    m = n - textwidth(s)
    m ≤ 0 && return s
    l = textwidth(p)
    q, r = divrem(m, l)
    string(p^q, first(p, r), s)
end

The quotient and remainder are computed in terms of textwidth, while the remainder is then used to select a number of characters, not a width of text. We should either revert the change I made in 8be4acc, or compute the number of characters for fractional padding by taking as many leading characters as fit into the allowed text width.
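Here is a minimal Julia reproduction of the mismatch. The `lpad` above is copied under the hypothetical name `lpad_as_quoted`, renamed only so it does not clash with `Base.lpad`, and comments mark where columns and characters get mixed. The sketch assumes a Julia version that has `first(s, n)` and a `textwidth` that reports 2 columns for each of the hiragana characters あ and い.

```julia
# Hypothetical name: the lpad quoted above, renamed to avoid clashing with Base.lpad.
function lpad_as_quoted(s::AbstractString, n::Integer, p::AbstractString=" ")
    m = n - textwidth(s)         # missing width, measured in columns
    m ≤ 0 && return s
    l = textwidth(p)             # width of one copy of the padding string, in columns
    q, r = divrem(m, l)          # r is a remainder in columns...
    string(p^q, first(p, r), s)  # ...but first(p, r) takes r characters
end

# ASCII padding: columns and characters agree, so the result is exact.
println(lpad_as_quoted("ab", 5, "*"))              # ***ab

# Wide padding (assumes textwidth('あ') == textwidth('い') == 2).
padded = lpad_as_quoted("x", 4, "あい")
println(padded, " has width ", textwidth(padded))  # あいx has width 5
```

Here `r = 3` is a column count, but `first(p, 3)` returns up to three characters, in this case all of `"あい"` (4 columns). The result is therefore one column wider than the requested 4.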


StefanKarpinski · 2017-12-10T18:30:55Z

cc @jiahao, @stevengj, @nalimilan

nalimilan · 2017-12-10T18:38:44Z

I'd just @assert all(c -> textwidth(c) == 1, p), and leave it up to somebody who actually needs a more complex behavior to implement it. Anyway, if some characters are wider than one column, exact padding may not be possible.
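Below is a sketch of that suggestion under the hypothetical name `lpad_unitwidth`. Once every padding character is asserted to be one column wide, `textwidth(p) == length(p)`, so the remainder means the same thing whether it is read as columns or as characters. The Julia docs warn that `@assert` may be disabled at some optimization levels, so a production version would more likely throw an `ArgumentError`.

```julia
# Hypothetical name: the lpad quoted above, with the suggested assertion added.
function lpad_unitwidth(s::AbstractString, n::Integer, p::AbstractString=" ")
    # Reject padding strings containing characters that are not exactly one column wide.
    @assert all(c -> textwidth(c) == 1, p) "padding characters must be one column wide"
    m = n - textwidth(s)
    m ≤ 0 && return s
    # With the assertion in place, textwidth(p) == length(p), so the remainder r is
    # both a column count and a character count, and first(p, r) is consistent.
    q, r = divrem(m, textwidth(p))
    string(p^q, first(p, r), s)
end

println(lpad_unitwidth("x", 4, "ab"))  # abax
```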

StefanKarpinski · 2017-12-10T19:46:03Z

In that case why not just implement the character version of this since that's what the assumption that textwidth(c) == 1 is equivalent to.

Another option is to make lpad and rpad parametric and pass a string-width function to them, defaulting to length but also allowing textwidth. Something like this:

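function fractional_padding(::typeof(textwidth), p::AbstractString, r::Integer)
    # take the longest prefix of p whose total textwidth does not exceed r
    w = 0
    for (i, c) in enumerate(p)  # i counts characters, not byte indices
        w += textwidth(c)
        w ≤ r || return first(p, i-1)
    end
    return p
end
# when measuring by character count, r is already a number of characters
fractional_padding(::typeof(length), p::AbstractString, r::Integer) = first(p, r)

# `length` here is the width function passed in (e.g. Base.length or textwidth);
# it shadows Base.length inside the body
function lpad(length::Function, s::AbstractString, n::Integer, p::AbstractString=" ")
    m = n - length(s)
    m ≤ 0 && return s
    l = length(p)
    q, r = divrem(m, l)
    string(p^q, fractional_padding(length, p, r), s)
end
lpad(s::AbstractString, n::Integer, p::AbstractString=" ") = lpad(textwidth, s, n, p)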

nalimilan · 2017-12-10T20:20:10Z

> In that case why not just implement the character version of this since that's what the assumption that textwidth(c) == 1 is equivalent to.

That's not the same thing: the string can still contain several characters.

I feel like in general you want to pad with ASCII characters, so while we should keep the possibility of extending the function later, I don't think it's worth wasting your precious time right now on it.

StefanKarpinski · 2017-12-10T21:22:42Z

> That's not the same thing: the string can still contain several characters.

Yes, but asserting that is simply saying that we don't handle padding unless the characters in the padding string are all single-column width. At that point, why not just implement the behavior that treats all characters as having unit width – i.e. padding by character count rather than by textwidth? Then the textwidth version can be implemented externally.
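Here is a sketch of the character-count behavior described above, using the hypothetical names `lpad_chars` and `rpad_chars`. Every measurement uses `length`, so the remainder passed to `first(p, r)` is always a character count, and nothing depends on Unicode width tables.

```julia
# Hypothetical name: pads by character count, treating every character as one unit wide.
function lpad_chars(s::AbstractString, n::Integer, p::AbstractString=" ")
    m = n - length(s)            # missing characters
    m ≤ 0 && return s
    q, r = divrem(m, length(p))  # whole copies of p, plus r leftover characters
    string(p^q, first(p, r), s)  # r is a character count, matching first(p, r)
end

# Hypothetical right-padding counterpart.
function rpad_chars(s::AbstractString, n::Integer, p::AbstractString=" ")
    m = n - length(s)
    m ≤ 0 && return s
    q, r = divrem(m, length(p))
    string(s, p^q, first(p, r))
end

padded = lpad_chars("x", 4, "あい")
println(padded, " ", length(padded))  # あいあx 4
println(rpad_chars("x", 4, "ab"))     # xaba
```

The result is exact in characters (4), even though it occupies 7 columns in a terminal that renders あ and い double-width. Column-accurate padding is left to code outside these functions.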

stevengj · 2017-12-10T21:41:05Z

Didn’t we discuss this in a previous issue?

StefanKarpinski · 2017-12-10T21:48:37Z

Yes, but this was prompted by my noticing that our implementation is still incorrect, and arguably it's not possible to implement fully correct exact padding based on textwidth, since it's impossible to take half of a width-two character, for example. Here's the previous issue: #10825.
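The following sketch shows why exact width-based padding is not always achievable. It uses the hypothetical names `fit_prefix` and `lpad_fit` and follows the `fractional_padding` idea from earlier in the thread: measure everything with `textwidth` and take only as many leading padding characters as fit. It assumes `textwidth` reports 2 for あ and い.

```julia
# Hypothetical helper: the longest prefix of p whose total textwidth is at most r.
function fit_prefix(p::AbstractString, r::Integer)
    w = 0
    for (i, c) in enumerate(p)  # i counts characters, not byte indices
        w += textwidth(c)
        w ≤ r || return first(p, i - 1)
    end
    return p
end

# Hypothetical lpad that measures in columns and never exceeds the requested width.
function lpad_fit(s::AbstractString, n::Integer, p::AbstractString=" ")
    m = n - textwidth(s)
    m ≤ 0 && return s
    q, r = divrem(m, textwidth(p))
    string(p^q, fit_prefix(p, r), s)
end

padded = lpad_fit("x", 4, "あい")
println(padded, " has width ", textwidth(padded))  # あx has width 3
```

Three columns are missing. One copy of `"あい"` needs four, and `あ` alone fills only two. The last column could only be filled by half of `い`, so the result is one column short of the requested 4.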

StefanKarpinski · 2017-12-13T22:10:03Z

#25021 made me realize something important: we want the behavior of Base to be independent of which version of the Unicode standard one uses. Moving the Unicode package out of Base is a good step in that direction, but now any use of the internal Base.Unicode module is suspect – in particular this one, since it currently depends on textwidth, which is only provided by Unicode.

oscardssmith · 2017-12-13T22:18:39Z

Should textwidth throw an error if you try to pad an odd amount with a double width character?
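One possible reading of this suggestion is sketched below under the hypothetical name `lpad_exact`. It measures in columns as before, but when the leading padding characters cannot fill the remainder exactly, it throws instead of silently over- or under-padding. The choice of `ArgumentError` and its message are assumptions for illustration.

```julia
# Hypothetical strict variant: throws rather than returning padding of the wrong width.
function lpad_exact(s::AbstractString, n::Integer, p::AbstractString=" ")
    m = n - textwidth(s)
    m ≤ 0 && return s
    q, r = divrem(m, textwidth(p))
    w = 0  # columns covered by the leading characters of p taken so far
    k = 0  # number of leading characters of p taken so far
    for c in p
        w + textwidth(c) > r && break
        w += textwidth(c)
        k += 1
    end
    # an odd remainder with only double-width padding characters ends up here
    w == r || throw(ArgumentError("cannot pad to exactly $n columns with $(repr(p))"))
    string(p^q, first(p, k), s)
end

println(lpad_exact("x", 5, "あい"))  # あいx
```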

StefanKarpinski · 2017-12-13T22:22:05Z

No, we shouldn't be using textwidth in Base at all, because its behavior depends on the version of Unicode that you're using. My point above is that Base should be Unicode-version-independent. You should be able to choose your version of Unicode by choosing which version of the Unicode stdlib package you use. Accordingly, Base should not use the Unicode package at all.

The general principle here is that we should avoid having behavior in Base that depends on which version of Unicode you're using. It's fine for an external package to provide versions of lpad and rpad which use textwidth and thereby depend on Unicode details, but the functions in Base should not depend on these.
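Here is a sketch of how that split might look, using the hypothetical names `prefix_within`, `pad_left`, and `pad_right`. The width function is a keyword argument that defaults to `length`, a character count with no dependence on Unicode width tables. Code that wants column-based padding, such as an external package, passes `textwidth` explicitly. This is a variation on the parametric approach proposed earlier in the thread, not a description of what Base ended up doing.

```julia
# Hypothetical helper: longest prefix of p whose total width, measured with `width`,
# is at most r. Works for both `length` (each Char counts as 1) and `textwidth`.
function prefix_within(width::Function, p::AbstractString, r::Integer)
    w = 0
    for (i, c) in enumerate(p)
        w += width(c)
        w ≤ r || return first(p, i - 1)
    end
    return p
end

# Hypothetical left padding; the default `width=length` never consults Unicode width data.
function pad_left(s::AbstractString, n::Integer, p::AbstractString=" "; width::Function=length)
    m = n - width(s)
    m ≤ 0 && return s
    q, r = divrem(m, width(p))
    string(p^q, prefix_within(width, p, r), s)
end

# Hypothetical right padding with the same width convention.
function pad_right(s::AbstractString, n::Integer, p::AbstractString=" "; width::Function=length)
    m = n - width(s)
    m ≤ 0 && return s
    q, r = divrem(m, width(p))
    string(s, p^q, prefix_within(width, p, r))
end

println(pad_left("x", 4, "あい"))                   # あいあx (counts characters)
println(pad_left("x", 4, "あい"; width=textwidth))  # あx (counts columns)
```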

StefanKarpinski added the strings "Strings!" label Dec 10, 2017

StefanKarpinski mentioned this issue Dec 10, 2017 in string overhaul #24999 (merged)

StefanKarpinski closed this as completed in 29151a0 Dec 14, 2017
