DOC: update str.cat example #23723
Just realised while double-checking the code locally that there's … In other words, the doc build here will have a warning (=fail?) until #23725 is merged.
OK, the build didn't fail; it's just a flaky hypothesis failure.
doc/source/text.rst
u
s.str.cat([u.values,
u.index.astype(str).values], na_rep='-')
s2 = s.set_axis(['a', 'b', 'c', 'd'], inplace=False)
This is not obvious at all; directly constructing the Series would be much clearer.
doc/source/text.rst
u.index.astype(str).values], na_rep='-')
s2 = s.set_axis(['a', 'b', 'c', 'd'], inplace=False)
s2
u2 = u.set_axis(['b', 'd', 'a', 'c'], inplace=False)
same here
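A minimal sketch of the suggestion above, assuming the string labels used in the diff: build the string-indexed Series directly instead of relabelling with set_axis. The names s2 and u2 follow the snippet; the data are illustrative.

```python
import pandas as pd

# Construct the string-indexed Series directly instead of via set_axis.
s2 = pd.Series(['a', 'b', 'c', 'd'], index=['a', 'b', 'c', 'd'])
u2 = pd.Series(['b', 'd', 'a', 'c'], index=['b', 'd', 'a', 'c'])

# With join='left', u2 is first aligned on s2's labels, so rows with the
# same label are paired regardless of the order in which u2 was built.
result = s2.str.cat(u2, join='left')
print(result)
```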
doc/source/text.rst

.. ipython:: python

v
s.str.cat([u, v], join='outer', na_rep='-')
s.str.cat([v, u, u.values], join='outer', na_rep='-')
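To make the outer join concrete, here is a small sketch. The definitions of s, u and v are not shown in this excerpt, so the values below are assumptions chosen for illustration. Labels missing from any piece (including the caller) are filled with na_rep.

```python
import pandas as pd

# Assumed example data; only the shapes/labels matter for the illustration.
s = pd.Series(['a', 'b', 'c', 'd'])
u = pd.Series(['b', 'd', 'a', 'c'], index=[1, 3, 0, 2])
v = pd.Series(['z', 'a', 'b', 'd', 'e'], index=[-1, 0, 1, 3, 4])

# join='outer' keeps the union of all labels; na_rep replaces missing parts.
result = s.str.cat([u, v], join='outer', na_rep='-')
print(result)
```

Comparing by label rather than by position avoids depending on how the union index happens to be ordered.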
Instead of accessing .values, it would also be preferable to use the container here, assuming you build one to instantiate a Series on the next update.
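A sketch of what using the container could look like, assuming a hypothetical list called values that is used both to build a Series and, converted to a 1-d ndarray, as a second item. The ndarray is created explicitly because, as far as I can tell, a plain nested list is not one of the accepted item types.

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'])

# Hypothetical container, reused instead of reaching for w.values.
values = ['w', 'x', 'y', 'z']
w = pd.Series(values, index=[3, 2, 1, 0])

# w has an index, so it is aligned on labels; the ndarray has none,
# so it is concatenated by position.
result = s.str.cat([w, np.array(values, dtype=object)], join='left')
print(result)
```

The contrast between 'z' (aligned from label 0) and 'w' (taken from position 0) in the first row is the point of the example.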
Codecov Report
@@ Coverage Diff @@
## master #23723 +/- ##
=======================================
Coverage 92.38% 92.38%
=======================================
Files 166 166
Lines 52490 52490
=======================================
Hits 48493 48493
Misses 3997 3997
doc/source/text.rst
u
s.str.cat([u.values,
u.index.astype(str).values], na_rep='-')
s_values = np.array(['a', 'b', 'c', 'd'], dtype=object)  # same as s.values
I'd remove the comments here. They're rather harmless with the example provided, but I don't know if that comment will hold universally for all types (thinking of EAs in particular), so I don't want to give users that impression without context.
Hm, I thought this was clearly referencing the actual series s (as I wanted to motivate the variable name), rather than making a general statement about the relationship between np.array and .values.
I agree, the comment is just confusing.
doc/source/text.rst
@@ -306,21 +306,26 @@ The same alignment can be used when ``others`` is a ``DataFrame``:
Concatenating a Series and many objects into a Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All one-dimensional list-likes can be combined in a list-like container (including iterators, ``dict``-views, etc.):
Several items can be combined a list-like container (including iterators, ``dict``-views, etc.), which may contain ``Series``, ``Index`` and ``np.ndarray``.
Note that ``Index`` will align as well, so we change the indexes of ``s`` and ``u`` to strings for the purpose of this example:
Not sure what this last statement about Index actually means. Can you reword?
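The running text says iterators and dict-views are accepted as the outer container. A sketch under that reading; the dict name pieces is hypothetical, and its values are a Series and a 1-d ndarray (an Index item is left out here to keep the example independent of how Index elements are aligned).

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'])

# Hypothetical mapping whose values are allowed item types.
pieces = {
    'digits': pd.Series(['1', '2', '3', '4']),
    'letters': np.array(['w', 'x', 'y', 'z'], dtype=object),
}

# A dict-values view as the outer list-like container.
from_view = s.str.cat(pieces.values(), join='left')

# A one-shot iterator over the same items.
from_iter = s.str.cat(iter([pieces['digits'], pieces['letters']]), join='left')
print(from_view)
```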
This example is still a bit complicated. Can you simplify it at all?
I don't like the example too much either, especially initialising the ndarray directly. After all, what I want to do is change the index of …
>>> s2 = s.set_index(s.values)
>>> s2
a    a
b    b
c    c
d    d
dtype: object
>>> u2 = u.set_index(u.values)
>>> u2
b    b
d    d
a    a
c    c
dtype: object
>>> idx = pd.Index(['d', 'c', 'b', 'a'])
>>> idx
Index(['d', 'c', 'b', 'a'], dtype='object')
>>> u_values = u.values
>>> u_values
array(['b', 'd', 'a', 'c'], dtype=object)
>>> s2.str.cat([u2, idx, u_values], join='left')
a    aaab
b    bbbd
c    ccca
d    dddc
dtype: object
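For reference, a sketch of the same idea with the existing API. As far as I know, Series has no set_index method (the snippet above proposes one), so the relabelling is done by constructing each Series directly. The definition of u is an assumption, and the Index item is left out because, to my understanding, whether an Index element aligns on its values has differed between pandas versions.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'])
u = pd.Series(['b', 'd', 'a', 'c'], index=[1, 3, 0, 2])  # assumed definition

# Relabel each Series by its own values (stand-in for the proposed set_index).
s2 = pd.Series(s.to_numpy(), index=s.to_numpy())
u2 = pd.Series(u.to_numpy(), index=u.to_numpy())
u_values = u.to_numpy()

# u2 aligns on labels; u_values has no index and is used by position.
result = s2.str.cat([u2, u_values], join='left')
print(result)
```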
Can you merge master? Then I will look.
doc/source/text.rst
u.index.astype(str).values], na_rep='-')
s_values = np.array(['a', 'b', 'c', 'd'], dtype=object)
s2 = pd.Series(s_values, index=s_values)
s2
Add this in multiple blocks, as it's too complicated here.
This whole example needs to be simpler. Maybe just leave the index as integers to avoid confusion; IOW, focus less on the join in str.cat and more on the list-like things that are going on.
@WillAyd
doc/source/text.rst
@@ -303,23 +303,23 @@ The same alignment can be used when ``others`` is a ``DataFrame``:
Concatenating a Series and many objects into a Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All one-dimensional list-likes can be combined in a list-like container (including iterators, ``dict``-views, etc.):
Several items can be combined a list-like container (including iterators, ``dict``-views, etc.), which may contain ``Series``, ``Index``, ``PandasArray`` and ``np.ndarray``.
Think it was a typo to remove the word "in" here.
fixed, thanks
Also some nits around verbiage, but I think it would be easier to keep the Series, Index, etc. mentions closer to "Several items"; as is, I had to read it a few times to truly understand what was meant after the word "which".
Maybe even just `Several items (ex: Series, Index, ...)`
Done. I'm making an explicit list though, as only those types are allowed within list-likes
doc/source/text.rst
All elements must match in length to the calling ``Series`` (or ``Index``), except those having an index if ``join`` is not None:
All elements without an index (e.g. ``PandasArray`` and ``np.ndarray``) within the passed list-like must match in length to the calling ``Series`` (or ``Index``),
Might have missed this, but what's the reason for bringing up PandasArray? Not really something the end user would be using directly (at least in its current form).
PandasArray is very user-facing:
>>> s = pd.Series(['a', 'b', 'c', 'd'])
>>> s.array
<PandasArray>
['a', 'b', 'c', 'd']
Length: 4, dtype: object
and the current example was recently changed to use .array instead of .values. I think this should be documented clearly.
OK thanks. I have been somewhat on the sidelines for that conversation so I'll defer to @jreback specifically on this piece
You don't need to mention PandasArray here; it's not very interesting, nor relevant. Just say array-likes, and remove u.array; u.to_numpy() is the correct idiom here.
@jreback
There is a big difference between u.array and u.to_numpy():
>>> s = pd.Series(['a', 'b', 'c', 'd'])
>>> s.array
<PandasArray>
['a', 'b', 'c', 'd']
Length: 4, dtype: object
>>> s.to_numpy()
array(['a', 'b', 'c', 'd'], dtype=object)
I'm guessing .array will eventually replace .values-usage (e.g. to get rid of the index for .str.cat), since it is by design better suited for pandas-internal dtypes, and so the distinction above is not just an irrelevant detail IMO.
I want to show here the explicitly allowed item types to pass into a list-like, which have to pass this check:
nxt = others.pop(0)
[...]
if not (isinstance(nxt, (Series, Index))
        or (isinstance(nxt, np.ndarray) and nxt.ndim == 1)):
    raise ValueError(...)  # currently just a DeprecationWarning
Long story short, I want to show a list-like containing an np.ndarray, a PandasArray (to reiterate, this example was already changed by @TomAugspurger to use .array instead of .values in #23623), and a Series. (Including Index would be nice-to-have, but too complicated absent #22225.)
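The quoted check can be sketched as a standalone predicate to see which items it admits. The helper name is_allowed_item is hypothetical and not pandas API; it only mirrors the condition in the excerpt above.

```python
import numpy as np
import pandas as pd


def is_allowed_item(nxt):
    """Hypothetical helper mirroring the quoted check: an element of a
    list-like `others` must be a Series, an Index, or a 1-d ndarray."""
    return isinstance(nxt, (pd.Series, pd.Index)) or (
        isinstance(nxt, np.ndarray) and nxt.ndim == 1
    )


candidates = {
    'Series': pd.Series(['x']),
    'Index': pd.Index(['x']),
    '1-d ndarray': np.array(['x'], dtype=object),
    '2-d ndarray': np.array([['x']], dtype=object),
    'list': ['x'],
}
allowed = {name: is_allowed_item(obj) for name, obj in candidates.items()}
print(allowed)
```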
@h-vetinari .to_numpy() replaces .values; .array is user-accessible, but generally is not visible to the user. It's not necessary here and is just noise.
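A quick sketch of the difference being discussed, assuming a plain object-dtype Series: .to_numpy() returns a NumPy ndarray, while .array returns a pandas ExtensionArray whose concrete class name has changed across pandas versions (PandasArray at the time of this PR). Only the ndarray is used in the str.cat call, matching the idiom suggested above.

```python
import numpy as np
import pandas as pd

u = pd.Series(['b', 'd', 'a', 'c'])

# .to_numpy() gives a NumPy ndarray; .array gives a pandas ExtensionArray.
as_numpy = u.to_numpy()
as_array = u.array

print(type(as_numpy).__name__)  # ndarray
print(type(as_array).__name__)  # exact class name depends on the pandas version

s = pd.Series(['a', 'b', 'c', 'd'])
# The ndarray has no index, so it is concatenated by position.
result = s.str.cat([as_numpy], join='left')
print(result)
```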
".to_numpy() replaces .values;"
As I did in the last few commits...
".array is user-accessible, but generally is not visible to the user. It's not necessary here and is just noise."
Will remove, but pinging @TomAugspurger since he added the current u.array in this example in #23623 (although likely "just" for replacing u.values?).
Thanks for review; pushed new commit
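The length rule from the diff above (items without an index must match the caller's length, items with an index are aligned) can be illustrated with a short sketch. The names short_series and mismatch_error are hypothetical; the exact error message differs between pandas versions, so only the exception type is relied on.

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'])

# A shorter Series has an index, so it is aligned (join='left') and the
# labels it lacks are filled with na_rep.
short_series = pd.Series(['x', 'y'], index=[0, 1])
aligned = s.str.cat([short_series], join='left', na_rep='-')
print(aligned)

# A shorter ndarray has no index, so its length must match the caller.
try:
    s.str.cat([np.array(['x', 'y'], dtype=object)], join='left')
except ValueError as exc:
    mismatch_error = str(exc)
else:
    mismatch_error = None
print(mismatch_error)
```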
LGTM otherwise
PTAL
Thanks @h-vetinari
There have been a couple of changes recently due to the whole docstring validation effort (e.g. #22838), which got rid of some warnings emitted by the sample code.
As it stands, the code samples don't really reflect the running text anymore, and with the changes from #22264, it makes sense to update the running text as well.