Hi, I noticed that despite #22925 being merged, DeferredDataFrame.to_csv() still doesn't respect the num_shards argument. Minimal test case:
from typing import NamedTuple

import apache_beam as beam
from apache_beam.dataframe import convert


class Row(NamedTuple):
    x: int


with beam.Pipeline('DirectRunner') as p:
    c = (p | beam.Create([Row(x=i) for i in range(1000000)]))
    df = convert.to_dataframe(c)
    df.to_csv('/tmp/apache_beam_test.csv', index=False, num_shards=2)
Running this with apache_beam 2.50.0 results in a single shard being written.
What would you like to happen?
I would like to_csv() to honor num_shards; other sinks, such as TextIO and FileSink, already allow this customization.
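For context, here is a minimal sketch of the interim workaround I am using: convert the deferred DataFrame back to a PCollection with convert.to_pcollection() and write it with beam.io.WriteToText, which does honor num_shards. The output path and the hand-rolled CSV formatting for this one-column schema are illustrative, not a proposed API.

from typing import NamedTuple

import apache_beam as beam
from apache_beam.dataframe import convert


class Row(NamedTuple):
    x: int


with beam.Pipeline('DirectRunner') as p:
    c = (p | beam.Create([Row(x=i) for i in range(1000000)]))
    df = convert.to_dataframe(c)
    # Round-trip back to a schema'd PCollection (named-tuple rows) and let
    # WriteToText control the shard count, since it respects num_shards.
    _ = (
        convert.to_pcollection(df)
        | beam.Map(lambda row: str(row.x))  # format each row as a CSV line
        | beam.io.WriteToText(
            '/tmp/apache_beam_test',
            file_name_suffix='.csv',
            header='x',
            num_shards=2))

This works, but it loses the convenience of the pandas-style API, which is why it would be nice for to_csv() itself to respect num_shards.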
Issue Priority
Priority: 2
Issue Component
Component: dsl-dataframe