Deploy: 8ecbea3
dataframe-api-bot committed Dec 7, 2023
1 parent 08d8f58 commit 32174c6
Showing 7 changed files with 28 additions and 27 deletions.
26 changes: 16 additions & 10 deletions draft/API_specification/dataframe_object.html
@@ -505,6 +505,8 @@
</li>
<li class="md-nav__item"><a href="#dataframe_api.DataFrame.is_null" class="md-nav__link"><code class="docutils literal notranslate"><span class="pre">DataFrame.is_null()</span></code></a>
</li>
<li class="md-nav__item"><a href="#dataframe_api.DataFrame.iter_columns" class="md-nav__link"><code class="docutils literal notranslate"><span class="pre">DataFrame.iter_columns()</span></code></a>
</li>
<li class="md-nav__item"><a href="#dataframe_api.DataFrame.join" class="md-nav__link"><code class="docutils literal notranslate"><span class="pre">DataFrame.join()</span></code></a>
</li>
<li class="md-nav__item"><a href="#dataframe_api.DataFrame.max" class="md-nav__link"><code class="docutils literal notranslate"><span class="pre">DataFrame.max()</span></code></a>
@@ -695,6 +697,8 @@
</li>
<li class="md-nav__item"><a href="#dataframe_api.DataFrame.is_null" class="md-nav__link"><code class="docutils literal notranslate"><span class="pre">DataFrame.is_null()</span></code></a>
</li>
<li class="md-nav__item"><a href="#dataframe_api.DataFrame.iter_columns" class="md-nav__link"><code class="docutils literal notranslate"><span class="pre">DataFrame.iter_columns()</span></code></a>
</li>
<li class="md-nav__item"><a href="#dataframe_api.DataFrame.join" class="md-nav__link"><code class="docutils literal notranslate"><span class="pre">DataFrame.join()</span></code></a>
</li>
<li class="md-nav__item"><a href="#dataframe_api.DataFrame.max" class="md-nav__link"><code class="docutils literal notranslate"><span class="pre">DataFrame.max()</span></code></a>
@@ -1331,6 +1335,11 @@
but note that the Standard makes no guarantees about them.</p>
</dd></dl>
<dl class="py method">
<dt class="sig sig-object py" id="dataframe_api.DataFrame.iter_columns">
<span class="sig-name descname"><span class="pre">iter_columns</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon"></span> <span class="sig-return-typehint"><span class="pre">Iterator</span><span class="p"><span class="pre">[</span></span><a class="reference internal" href="column_object.html#dataframe_api.Column" title="dataframe_api.Column"><span class="pre">Column</span></a><span class="p"><span class="pre">]</span></span></span></span><a class="headerlink" href="#dataframe_api.DataFrame.iter_columns" title="Permalink to this definition"></a></dt>
<dd><p>Return iterator over columns.</p>
</dd></dl>
<dl class="py method">
<dt class="sig sig-object py" id="dataframe_api.DataFrame.join">
<span class="sig-name descname"><span class="pre">join</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">other</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Self</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">*</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">how</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">Literal</span><span class="p"><span class="pre">[</span></span><span class="s"><span class="pre">'left'</span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="s"><span class="pre">'inner'</span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="s"><span class="pre">'outer'</span></span><span class="p"><span class="pre">]</span></span></span></em>, <em class="sig-param"><span class="n"><span class="pre">left_on</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span><span class="w"> </span><span class="p"><span class="pre">|</span></span><span class="w"> </span><span class="pre">list</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">]</span></span></span></em>, <em class="sig-param"><span class="n"><span class="pre">right_on</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span><span class="w"> </span><span class="p"><span class="pre">|</span></span><span class="w"> </span><span class="pre">list</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">]</span></span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span 
class="sig-return-icon"></span> <span class="sig-return-typehint"><span class="pre">Self</span></span></span><a class="headerlink" href="#dataframe_api.DataFrame.join" title="Permalink to this definition"></a></dt>
<dd><p>Join with other dataframe.</p>
@@ -1393,22 +1402,19 @@
at most once per dataframe, and as late as possible in the pipeline.</p>
<p>For example, do this</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">df</span><span class="p">:</span> <span class="n">DataFrame</span>
<span class="n">features</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">std</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mi">0</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">persist</span><span class="p">()</span>
<span class="k">for</span> <span class="n">column_name</span> <span class="ow">in</span> <span class="n">df</span><span class="o">.</span><span class="n">column_names</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">result</span><span class="o">.</span><span class="n">col</span><span class="p">(</span><span class="n">column_name</span><span class="p">)</span><span class="o">.</span><span class="n">get_value</span><span class="p">(</span><span class="mi">0</span><span class="p">):</span>
        <span class="n">features</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">column_name</span><span class="p">)</span>
<span class="n">features</span> <span class="o">=</span> <span class="p">[</span><span class="n">col</span><span class="o">.</span><span class="n">name</span> <span class="k">for</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">result</span><span class="o">.</span><span class="n">iter_columns</span><span class="p">()</span> <span class="k">if</span> <span class="n">col</span><span class="o">.</span><span class="n">get_value</span><span class="p">(</span><span class="mi">0</span><span class="p">)]</span>
</pre></div>
</div>
<p>instead of this:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">df</span><span class="p">:</span> <span class="n">DataFrame</span>
<span class="n">features</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">column_name</span> <span class="ow">in</span> <span class="n">df</span><span class="o">.</span><span class="n">column_names</span><span class="p">:</span>
    <span class="c1"># Do NOT call `persist` on a `DataFrame` within a for-loop!</span>
    <span class="c1"># This may re-trigger the same computation multiple times</span>
    <span class="k">if</span> <span class="n">df</span><span class="o">.</span><span class="n">persist</span><span class="p">()</span><span class="o">.</span><span class="n">col</span><span class="p">(</span><span class="n">column_name</span><span class="p">)</span><span class="o">.</span><span class="n">std</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">features</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">column_name</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">std</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mi">0</span>
<span class="n">features</span> <span class="o">=</span> <span class="p">[</span>
    <span class="c1"># Do NOT do this! This will trigger execution of the entire</span>
    <span class="c1"># pipeline for each element in the for-loop!</span>
    <span class="n">col</span><span class="o">.</span><span class="n">name</span> <span class="k">for</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">result</span><span class="o">.</span><span class="n">iter_columns</span><span class="p">()</span> <span class="k">if</span> <span class="n">col</span><span class="o">.</span><span class="n">get_value</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">persist</span><span class="p">()</span>
<span class="p">]</span>
</pre></div>
</div>
</div>
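The guidance in this hunk (call `persist` once, as late as possible, then iterate over the materialised result with `iter_columns`) can be illustrated with a small toy model. This is a sketch under stated assumptions, not an implementation of the Standard: `ToyLazyFrame`, `ToyEagerFrame`, `ToyColumn`, the `executions` counter and both `features_*` helpers are hypothetical names invented here, and "running the pipeline" is simulated by a per-column `std() > 0` check that increments a counter.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Iterator


@dataclass
class ToyColumn:
    """Hypothetical eager column: a name plus already-computed values."""

    name: str
    values: list

    def get_value(self, row: int):
        return self.values[row]


@dataclass
class ToyEagerFrame:
    """Hypothetical materialised result of a lazy pipeline."""

    data: dict

    def iter_columns(self) -> Iterator[ToyColumn]:
        # Yield one column object per column, in insertion order.
        for name, values in self.data.items():
            yield ToyColumn(name, values)


@dataclass
class ToyLazyFrame:
    """Hypothetical lazy frame that counts how often its pipeline runs."""

    data: dict
    executions: int = 0

    def persist(self) -> ToyEagerFrame:
        # Stand-in for executing the whole lazy plan: `std() > 0` per column.
        self.executions += 1
        return ToyEagerFrame(
            {name: [pstdev(vals) > 0] for name, vals in self.data.items()}
        )


def features_persist_once(df: ToyLazyFrame) -> list:
    # Recommended: materialise once, then iterate over the result eagerly.
    result = df.persist()
    return [col.name for col in result.iter_columns() if col.get_value(0)]


def features_persist_per_column(df: ToyLazyFrame) -> list:
    # Anti-pattern: the whole pipeline is re-executed for every column.
    return [name for name in df.data if df.persist().data[name][0]]
```

For a frame with a constant column `a` and non-constant columns `b` and `c`, both helpers return `['b', 'c']`, but `executions` ends at 1 for `features_persist_once` and at 3 for `features_persist_per_column`.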
1 change: 1 addition & 0 deletions draft/API_specification/index.html
@@ -597,6 +597,7 @@
<li class="toctree-l3"><a class="reference internal" href="dataframe_object.html#dataframe_api.DataFrame.group_by"><code class="docutils literal notranslate"><span class="pre">DataFrame.group_by()</span></code></a></li>
<li class="toctree-l3"><a class="reference internal" href="dataframe_object.html#dataframe_api.DataFrame.is_nan"><code class="docutils literal notranslate"><span class="pre">DataFrame.is_nan()</span></code></a></li>
<li class="toctree-l3"><a class="reference internal" href="dataframe_object.html#dataframe_api.DataFrame.is_null"><code class="docutils literal notranslate"><span class="pre">DataFrame.is_null()</span></code></a></li>
<li class="toctree-l3"><a class="reference internal" href="dataframe_object.html#dataframe_api.DataFrame.iter_columns"><code class="docutils literal notranslate"><span class="pre">DataFrame.iter_columns()</span></code></a></li>
<li class="toctree-l3"><a class="reference internal" href="dataframe_object.html#dataframe_api.DataFrame.join"><code class="docutils literal notranslate"><span class="pre">DataFrame.join()</span></code></a></li>
<li class="toctree-l3"><a class="reference internal" href="dataframe_object.html#dataframe_api.DataFrame.max"><code class="docutils literal notranslate"><span class="pre">DataFrame.max()</span></code></a></li>
<li class="toctree-l3"><a class="reference internal" href="dataframe_object.html#dataframe_api.DataFrame.mean"><code class="docutils literal notranslate"><span class="pre">DataFrame.mean()</span></code></a></li>
12 changes: 4 additions & 8 deletions draft/_sources/design_topics/execution_model.md.txt
@@ -11,17 +11,13 @@ not be supported in some cases.
For example, let's consider the following:
```python
df: DataFrame
features = []
for column_name in df.column_names:
    if df.col(column_name).std() > 0:
        features.append(column_name)
return features
features = [col.name for col in df.iter_columns() if col.std() > 0]
```
If `df` is a lazy dataframe, then the call `df.col(column_name).std() > 0` returns
If `df` is a lazy dataframe, then the call `col.std() > 0` returns
a (duck-typed) Python boolean scalar. No issues so far. The problem is,
what happens when `if df.col(column_name).std() > 0` is called?
what happens when `if col.std() > 0` is called?

Under the hood, Python will call `(df.col(column_name).std() > 0).__bool__()` in
Under the hood, Python will call `(col.std() > 0).__bool__()` in
order to extract a Python boolean. This is a problem for "lazy" implementations,
as the laziness needs breaking in order to evaluate the above.
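The `__bool__` problem described above can be reproduced with a toy lazy scalar. This is a minimal sketch, assuming hypothetical `LazyScalar` and `LazyColumn` classes (not part of the Standard, Dask or Polars): comparisons only build a deferred computation, and the laziness is broken exactly when Python calls `__bool__` for the `if` inside the comprehension.

```python
from __future__ import annotations

from statistics import pstdev
from typing import Callable


class LazyScalar:
    """Hypothetical lazy scalar: wraps a deferred computation, not a value."""

    evaluations = 0  # how many times laziness has been broken (all instances)

    def __init__(self, thunk: Callable[[], object]) -> None:
        self._thunk = thunk

    def __gt__(self, other: object) -> LazyScalar:
        # Comparing stays lazy: it only wraps a larger deferred computation.
        return LazyScalar(lambda: self._thunk() > other)

    def __bool__(self) -> bool:
        # Python calls this for `if x:`, so the value must be computed now.
        LazyScalar.evaluations += 1
        return bool(self._thunk())


class LazyColumn:
    """Hypothetical lazy column whose reductions return LazyScalar."""

    def __init__(self, name: str, values: list[float]) -> None:
        self.name = name
        self._values = values

    def std(self) -> LazyScalar:
        # Nothing is computed here; pstdev runs only when forced.
        return LazyScalar(lambda: pstdev(self._values))


columns = [LazyColumn("a", [1.0, 1.0]), LazyColumn("b", [1.0, 2.0])]
pending = columns[1].std() > 0  # builds a LazyScalar; no computation yet
features = [col.name for col in columns if col.std() > 0]  # forces each one
```

Each `if` in the comprehension calls `__bool__` once per column, so `LazyScalar.evaluations` ends at 2 and `features` is `['b']`, while `pending` is still unevaluated. That forced call is the point at which a real lazy backend would have to execute its plan.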

12 changes: 4 additions & 8 deletions draft/design_topics/execution_model.html
@@ -338,17 +338,13 @@ <h2 id="scope">Scope<a class="headerlink" href="#scope" title="Permalink to this
not be supported in some cases.</p>
<p>For example, let’s consider the following:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">df</span><span class="p">:</span> <span class="n">DataFrame</span>
<span class="n">features</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">column_name</span> <span class="ow">in</span> <span class="n">df</span><span class="o">.</span><span class="n">column_names</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">df</span><span class="o">.</span><span class="n">col</span><span class="p">(</span><span class="n">column_name</span><span class="p">)</span><span class="o">.</span><span class="n">std</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">features</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">column_name</span><span class="p">)</span>
<span class="k">return</span> <span class="n">features</span>
<span class="n">features</span> <span class="o">=</span> <span class="p">[</span><span class="n">col</span><span class="o">.</span><span class="n">name</span> <span class="k">for</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">df</span><span class="o">.</span><span class="n">iter_columns</span><span class="p">()</span> <span class="k">if</span> <span class="n">col</span><span class="o">.</span><span class="n">std</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">]</span>
</pre></div>
</div>
<p>If <code class="docutils literal notranslate"><span class="pre">df</span></code> is a lazy dataframe, then the call <code class="docutils literal notranslate"><span class="pre">df.col(column_name).std()</span> <span class="pre">&gt;</span> <span class="pre">0</span></code> returns
<p>If <code class="docutils literal notranslate"><span class="pre">df</span></code> is a lazy dataframe, then the call <code class="docutils literal notranslate"><span class="pre">col.std()</span> <span class="pre">&gt;</span> <span class="pre">0</span></code> returns
a (duck-typed) Python boolean scalar. No issues so far. The problem is,
what happens when <code class="docutils literal notranslate"><span class="pre">if</span> <span class="pre">df.col(column_name).std()</span> <span class="pre">&gt;</span> <span class="pre">0</span></code> is called?</p>
<p>Under the hood, Python will call <code class="docutils literal notranslate"><span class="pre">(df.col(column_name).std()</span> <span class="pre">&gt;</span> <span class="pre">0).__bool__()</span></code> in
what happens when <code class="docutils literal notranslate"><span class="pre">if</span> <span class="pre">col.std()</span> <span class="pre">&gt;</span> <span class="pre">0</span></code> is called?</p>
<p>Under the hood, Python will call <code class="docutils literal notranslate"><span class="pre">(col.std()</span> <span class="pre">&gt;</span> <span class="pre">0).__bool__()</span></code> in
order to extract a Python boolean. This is a problem for “lazy” implementations,
as the laziness needs breaking in order to evaluate the above.</p>
<p>Dask and Polars both require that <code class="docutils literal notranslate"><span class="pre">.compute</span></code> (resp. <code class="docutils literal notranslate"><span class="pre">.collect</span></code>) be called beforehand
2 changes: 2 additions & 0 deletions draft/genindex.html
@@ -692,6 +692,8 @@ <h2 id="I">I</h2>
</li>
</ul></li>
<li><a href="API_specification/column_object.html#dataframe_api.Column.iso_weekday">iso_weekday() (Column method)</a>
</li>
<li><a href="API_specification/dataframe_object.html#dataframe_api.DataFrame.iter_columns">iter_columns() (DataFrame method)</a>
</li>
</ul></td>
</tr></table>
Binary file modified draft/objects.inv
2 changes: 1 addition & 1 deletion draft/searchindex.js