Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV Reader Bounds Incorrectly Handles Header #3364

Closed
tustvold opened this issue Dec 18, 2022 · 1 comment · Fixed by #3365
Closed

CSV Reader Bounds Incorrectly Handles Header #3364

tustvold opened this issue Dec 18, 2022 · 1 comment · Fixed by #3365
Assignees
Labels
arrow Changes to the arrow crate bug

Comments

@tustvold
Copy link
Contributor

Describe the bug

The following table illustrates the current behaviour

range has_header num_rows
0..4 false 4
0..4 true 3
1..4 false 3
1..4 true 2

The first two could be justified if the start offset included the header, but this is inconsistent with the two subsequent results

To Reproduce

#[test]
    fn test_header_bounds() {
        let csv = "a,b\na,b\na,b\na,b\na,b\n";
        let tests = [
            (None, false, 5),
            (None, true, 4),
            (Some((0, 4)), false, 4),
            (Some((1, 4)), false, 3),
            (Some((0, 4)), true, 4),
            (Some((1, 4)), true, 3),
        ];

        for (bounds, has_header, expected) in tests {
            let mut reader = ReaderBuilder::new().has_header(has_header);
            if let Some((start, end)) = bounds {
                reader = reader.with_bounds(start, end);
            }
            let b = reader
                .build(Cursor::new(csv.as_bytes()))
                .unwrap()
                .next()
                .unwrap()
                .unwrap();
            assert_eq!(b.num_rows(), expected);
        }
    }

Expected behavior

The bounds shouldn't include the header, i.e. an offset of 1 will return the 2nd row of data irrespective of the presence or not of a header.

Additional context

@tustvold tustvold added the bug label Dec 18, 2022
@tustvold tustvold self-assigned this Dec 18, 2022
@alamb alamb added the arrow Changes to the arrow crate label Dec 29, 2022
@alamb
Copy link
Contributor

alamb commented Dec 29, 2022

label_issue.py automatically added labels {'arrow'} from #3365

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants