Custom window frame logic (support ROWS, RANGE, PRECEDING and FOLLOWING for window functions) #3570

Merged
merged 11 commits into from
Sep 30, 2022
1 change: 1 addition & 0 deletions .github/workflows/rust.yml
@@ -119,6 +119,7 @@ jobs:
env:
POSTGRES_PASSWORD: postgres
POSTGRES_DB: db_test
POSTGRES_INITDB_ARGS: --encoding=UTF-8 --lc-collate=C --lc-ctype=C
ports:
- 5432/tcp
options: >-
113 changes: 113 additions & 0 deletions datafusion/common/src/bisect.rs
@@ -0,0 +1,113 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

//! This module provides the bisect function, which implements binary search.

use crate::{DataFusionError, Result, ScalarValue};
use arrow::array::ArrayRef;

/// This function implements both `bisect_left` and `bisect_right`. Because the two
/// share most of their code, they are implemented as a single function, and the
/// side is selected at compile time through the const generic parameter `SIDE`.
/// To use `bisect_left`, pass `true` as the const generic argument.
/// To use `bisect_right`, pass `false` as the const generic argument.
pub fn bisect<const SIDE: bool>(
Contributor

While reading this I can't help but wonder if the same logic could be found using https://doc.rust-lang.org/std/primitive.slice.html#method.binary_search

Though the subtlety of left and right would need some finagling

item_columns: &[ArrayRef],
target: &[ScalarValue],
) -> Result<usize> {
Member

If this returns an index, isn't it a binary search rather than a bisect?

let mut low: usize = 0;
let mut high: usize = item_columns
.get(0)
.ok_or_else(|| {
DataFusionError::Internal("Column array shouldn't be empty".to_string())
})?
.len();
while low < high {
let mid = ((high - low) / 2) + low;
let val = item_columns
.iter()
.map(|arr| ScalarValue::try_from_array(arr, mid))
.collect::<Result<Vec<ScalarValue>>>()?;

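// `SIDE == true` selects bisect_left and `false` selects bisect_right;
// `flag` is true when the insertion point lies to the right of `mid`.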
let flag = if SIDE {
val[..] < *target
} else {
val[..] <= *target
};
if flag {
low = mid + 1;
} else {
high = mid;
}
}
Ok(low)
}
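
For readers without the Arrow types at hand, the same left/right distinction can be sketched on plain slices using only the standard library. This is an illustrative sketch, not code from the PR: `bisect_rows`, `bisect_left_pp` and `bisect_right_pp` are hypothetical names, and rows are assumed to be `Vec<i64>` tuples already sorted lexicographically. The two `partition_point` variants show one way the reviewer's suggestion of reusing the slice search methods could look.

```rust
/// Hypothetical stand-alone bisect over rows sorted in lexicographic order.
/// `LEFT == true` returns the first index whose row is >= `target` (bisect_left);
/// `LEFT == false` returns the first index whose row is > `target` (bisect_right).
pub fn bisect_rows<const LEFT: bool>(rows: &[Vec<i64>], target: &[i64]) -> usize {
    let mut low = 0;
    let mut high = rows.len();
    while low < high {
        // Midpoint computed without risking overflow of `low + high`.
        let mid = low + (high - low) / 2;
        // Slices compare lexicographically, element by element.
        let go_right = if LEFT {
            rows[mid].as_slice() < target
        } else {
            rows[mid].as_slice() <= target
        };
        if go_right {
            low = mid + 1;
        } else {
            high = mid;
        }
    }
    low
}

/// The same two searches expressed with `slice::partition_point`,
/// which returns the index of the first element failing the predicate.
pub fn bisect_left_pp(rows: &[Vec<i64>], target: &[i64]) -> usize {
    rows.partition_point(|row| row.as_slice() < target)
}

pub fn bisect_right_pp(rows: &[Vec<i64>], target: &[i64]) -> usize {
    rows.partition_point(|row| row.as_slice() <= target)
}
```

With rows mirroring the test data in this file, both styles agree: the left search lands on the matching row and the right search lands just past it.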

#[cfg(test)]
mod tests {
use arrow::array::Float64Array;
use std::sync::Arc;

use crate::from_slice::FromSlice;
use crate::ScalarValue;
use crate::ScalarValue::Null;

use super::*;

#[test]
fn test_bisect_left_and_right() {
let arrays: Vec<ArrayRef> = vec![
Arc::new(Float64Array::from_slice(&[5.0, 7.0, 8.0, 9., 10.])),
Arc::new(Float64Array::from_slice(&[2.0, 3.0, 3.0, 4.0, 0.0])),
Arc::new(Float64Array::from_slice(&[5.0, 7.0, 8.0, 10., 0.0])),
Arc::new(Float64Array::from_slice(&[5.0, 7.0, 8.0, 10., 0.0])),
];
let search_tuple: Vec<ScalarValue> = vec![
ScalarValue::Float64(Some(8.0)),
ScalarValue::Float64(Some(3.0)),
ScalarValue::Float64(Some(8.0)),
ScalarValue::Float64(Some(8.0)),
];
let res: usize = bisect::<true>(&arrays, &search_tuple).unwrap();
assert_eq!(res, 2);
let res: usize = bisect::<false>(&arrays, &search_tuple).unwrap();
assert_eq!(res, 3);
}

#[test]
fn vector_ord() {
assert!(vec![1, 0, 0, 0, 0, 0, 0, 1] < vec![1, 0, 0, 0, 0, 0, 0, 2]);
assert!(vec![1, 0, 0, 0, 0, 0, 1, 1] > vec![1, 0, 0, 0, 0, 0, 0, 2]);
assert!(
vec![
ScalarValue::Int32(Some(2)),
Null,
ScalarValue::Int32(Some(0)),
] < vec![
ScalarValue::Int32(Some(2)),
Null,
ScalarValue::Int32(Some(1)),
]
);
}

#[test]
fn ord_same_type() {
assert!((ScalarValue::Int32(Some(2)) < ScalarValue::Int32(Some(3))));
}
}
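
As a rough illustration of why both sides of the search matter for window frames, the sketch below computes the row range of a `RANGE BETWEEN x PRECEDING AND y FOLLOWING` frame over a single ascending ORDER BY column. It is an assumption-laden sketch rather than the PR's frame logic: `range_frame`, `lower_bound` and `upper_bound` are hypothetical helpers, the column is a plain `&[i64]`, and null handling and descending order are ignored.

```rust
/// First index `i` with `sorted[i] >= x` (the bisect_left position).
fn lower_bound(sorted: &[i64], x: i64) -> usize {
    sorted.partition_point(|v| *v < x)
}

/// First index `i` with `sorted[i] > x` (the bisect_right position).
fn upper_bound(sorted: &[i64], x: i64) -> usize {
    sorted.partition_point(|v| *v <= x)
}

/// Hypothetical helper: half-open row range `[start, end)` covered by
/// `RANGE BETWEEN preceding PRECEDING AND following FOLLOWING`
/// for the row at index `row` of an ascending column.
pub fn range_frame(
    sorted: &[i64],
    row: usize,
    preceding: i64,
    following: i64,
) -> (usize, usize) {
    let current = sorted[row];
    // The left search finds the first row inside the lower value bound.
    let start = lower_bound(sorted, current - preceding);
    // The right search finds the first row past the upper value bound,
    // so rows tied with the bound (peers) stay inside the frame.
    let end = upper_bound(sorted, current + following);
    (start, end)
}
```

Using the left search for the start and the right search for the end keeps rows with equal ORDER BY values together, which is the behaviour expected of a RANGE frame.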
1 change: 1 addition & 0 deletions datafusion/common/src/lib.rs
@@ -15,6 +15,7 @@
// specific language governing permissions and limitations
// under the License.

pub mod bisect;
mod column;
mod dfschema;
mod error;