Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RowIterator implementation #64

Merged
merged 2 commits into from
Aug 5, 2022

Conversation

Gor027
Copy link
Contributor

@Gor027 Gor027 commented Jul 8, 2022

This PR is dependent upon #63

As the C++ driver interface requires functions cass_iterator_from_result and cass_iterator_from_row to return CassIterator, this PR modifies already existing CassIterator struct to be enum and to contain CassResultIterator and CassRowIterator.
Additionally, some necessary functions were implemented in order to have at least 1 passing integration test. So, it should pass a basic test called BasicsTests.Integration_Cassandra_RowsInRowsOut.

@Gor027 Gor027 force-pushed the basic_tests_rows_in_rows_out branch 3 times, most recently from 758b7d8 to a1f4817 Compare July 20, 2022 08:01
@Gor027 Gor027 mentioned this pull request Jul 20, 2022
Comment on lines 51 to 54
let new_pos: usize = match result_iterator.position {
Some(prev_pos) => prev_pos + 1,
None => 0,
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip: you can use map_or to make is shorter:

let new_pos = result_iterator.map_or(0, |x| x + 1);

};

Box::into_raw(Box::new(iterator))
#[repr(C)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repr(C) is not needed here. It's only important if you have a corresponding definition in a C header and you want both Rust and C to use the same struct. If you return an opaque pointer to C then repr doesn't matter and it's better to leave it as default.

#[no_mangle]
pub unsafe extern "C" fn cass_result_free(result_raw: *mut CassResult) {
free_arced(result_raw);
#[repr(C)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for repr(C).

pub unsafe extern "C" fn cass_result_has_more_pages(result: *const CassResult) -> cass_bool_t {
let result = ptr_to_ref(result);
result.paging_state.is_some() as cass_bool_t
#[repr(C)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for repr(C).


iter.position = Some(new_pos);
let column: &CassValue = row_iterator.row.columns.get(iter_position).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this panic here intentional? If the iterator goes past the last column, there will be a panic here. How does the original cpp driver code behave?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the iterator goes past the last column then cass_iterator_next should return false and cass_iterator_get_column will not get executed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing prevents users from calling cass_iterator_get_column after cass_iterator_next returns false. In such case we should behave as the original driver does.

Besides, you already implemented a similar function cass_iterator_get_row and there you return std::ptr::null() when it is called on an exhausted iterator. We have two similar functions which behave inconsistently. Please check how the original driver behaves and align those functions with this behavior (I'd expect that they return null).

_ => CassDataType::Value(CassValueType::CASS_VALUE_TYPE_UNKNOWN),
};

Arc::into_raw(Arc::new(cass_data_type))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring next to cass_value_data_type contains the following text:

 * @return Returns a reference to the data type of the value.
 * Do not free this reference as it is bound to the lifetime of the value.

We cannot allocate and return new objects because users won't be freeing them and this will lead to memory leaks.

You need to store a CassDataType along with CassValue somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way I could think of is to change the type CassValue to either a tuple or a struct to contain CassDataType, but that would cause a lot more other changes. Perhaps, you have other suggestions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid changing CassValue to a struct that contains CassDataType is the simplest solution that I can think of.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really fix it. OK, you added value_type field to CassValue, but you are still using Arc::new() which allocates a new object. You will still have memory leaks because users won't free this object.

Once again - the docstring explains that the function is supposed to return a CassDataType which lives as long as the CassValue it is associated with. Therefore, you just need to return a pointer to the value_type.

Comment on lines 64 to 67
let new_pos: usize = match row_iterator.position {
Some(prev_pos) => prev_pos + 1,
None => 0,
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use map_or here, too.

@Gor027 Gor027 force-pushed the basic_tests_rows_in_rows_out branch 3 times, most recently from 094382f to d282cba Compare July 26, 2022 16:34
@piodul
Copy link
Collaborator

piodul commented Jul 26, 2022

Please avoid addressing review comments by adding commits that fix bugs introduced in previous commits. In particular, the first commit still has the issue with the lifetime of the value returned from cass_value_data_type. You should first change the CassValue to store CassDataType so that you can correctly implement cass_value_data_type in the next commit.

Moreover, please add commit descriptions. Each description should explain what is changed in the commit and why.

@Gor027 Gor027 force-pushed the basic_tests_rows_in_rows_out branch from d282cba to 34719b4 Compare July 27, 2022 09:35
@@ -276,13 +416,13 @@ pub unsafe extern "C" fn cass_value_get_string(
output_size: *mut size_t,
) -> CassError {
let val: &CassValue = ptr_to_ref(value);
match val {
match val.value.clone() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why .clone()? You aren't moving out anything of the value you are matching on, so probably match &val.value will suffice.

cass_data_type_type(&value_from_raw.value_type)
}

pub fn get_type_from_value(value: Option<CqlValue>) -> CassDataType {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CqlValue might potentially be a big object. There is no reason for this function to receive it by value, it only forces the caller to .clone() it. Please change the type of value to &Option<CqlValue>.

let value_type = get_type_from_value(value.clone());
let column = CassValue { value, value_type };

return Box::into_raw(Box::new(column)) as *const CassValue;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does not allocate a new value in the original implementation. It just returns a reference to an existing CassValue.

You need to make bigger changes in order to make it work. More specifically, You need to change CassRow so that it keeps a Vec<CassValue>. In turn, you probably need to modify the definition of CassResult, too. I suggest doing those changes first, in a separate commit, and then continuing with the rest.

@Gor027 Gor027 force-pushed the basic_tests_rows_in_rows_out branch 4 times, most recently from 6452ec7 to 3f03efb Compare July 28, 2022 10:29
@Gor027 Gor027 requested a review from piodul July 29, 2022 07:39
@@ -79,7 +91,7 @@ pub unsafe extern "C" fn cass_iterator_get_row(iterator: *const CassIterator) ->
None => return std::ptr::null(),
};

let row: &Row = match iter
let row = match iter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: please avoid doing irrelevant changes like this. They introduce unnecessary noise in the diff.

If you want to do some cleanups (and have a good reason to do so, e.g. it is related to the PR) then I recommend putting them in a separate commit. In this case it's better to drop it altogether IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the type of row was changed to CassRow, so I just removed the explicit type annotation. Maybe it is better to annotate it as CassRow.

pub type CassRow = Row;
pub struct CassRow {
pub columns: Vec<CassValue>,
pub result_: Arc<CassResultData>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there an underscore at the end of result_?

Besides, wouldn't a name like metadata be better, like in CassResult::metadata? I don't understand why it is inconsistent now.

Comment on lines 113 to 135
match rows {
Some(rs) => {
let cass_rows = rs
.iter()
.map(|r| CassRow {
columns: r.columns.clone(),
result_: metadata.clone(),
})
.collect();

Some(cass_rows)
}
None => None,
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip: no need to write such a big match block. The populate_cass_rows_from_rows function returns an Option, so you can just use the ? operator:

let rows = rows.as_ref()?;
let cass_rows = /* ... */

Err(err) => Ok(CassResultValue::QueryError(Arc::new(err))),
}
})
}

fn populate_cass_rows_from_rows(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: usually, populate means filling an already existing structure with data. This function creates a completely new Option<Vec<CassRow>>, so how about using create instead of populate?

Box::into_raw(Box::new(CassIterator::CassRowIterator(iterator)))
}

// This was const for some reason, seems like a mistake in cpp driver
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like free_arced takes a const pointer as an argument. Shouldn't this function work with *const CassResult?

return CassError::CASS_ERROR_LIB_INDEX_OUT_OF_BOUNDS;
}

let column_spec: &ColumnSpec = result_from_raw.metadata.col_specs.get(index_usize).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the original function behave when index_usize is larger than the last column index? I doubt that we should crash here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check it here, and in case it's larger than the last column index, just returns a CassError::CASS_ERROR_LIB_INDEX_OUT_OF_BOUNDS. The implementation is the same as in the C++ driver.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I missed it, thanks.

Err(err) => Ok(CassResultValue::QueryError(Arc::new(err))),
}
})
}

fn populate_cass_rows_from_rows(
rows: &Option<Vec<Row>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Takes rows by value, not by reference. This way you can later use rs.into_iter() instead of rs.iter() and avoid copying.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the type of rows should be made in the first commit.

@@ -126,6 +127,16 @@ fn populate_cass_rows_from_rows(
}
}

fn populate_cass_row_columns(row: &Row) -> Vec<CassValue> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you address my comment about populate_cass_rows_from_rows, then you will be able to change this function so that it takes row by value.

Also, please change populate to create in the function's name.

@@ -128,6 +129,40 @@ impl CassDataType {
}
}

pub fn get_type_from_value(value: &Option<CqlValue>) -> CassDataType {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot get a complete type just from the value alone. It doesn't contain full information about complex data types, i.e.: lists, sets, maps and UDTs. You must get this information from the metadata in CassResultData.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CassResultData.col_specs has enough information to retrieve a complete type, however, it's not accessible. The field typ: ColumnType of ColumnSpec is not public in the Rust driver. I think the only way of getting column-specific metadata is the QueryResult.col_specs, so we are obliged to make some changes in the Rust driver side: making the field typ: ColumnType to be public. Perhaps we should do that in another PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: sorry, I looked at the forked driver version, in the original repo the mentioned ColumnSepc.typ is public. So, after merging #67, the function get_type_from_value can be fixed to return complete type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piodul This is fixed now, could you please review it once again?

@Gor027 Gor027 force-pushed the basic_tests_rows_in_rows_out branch 3 times, most recently from 6b38bc1 to 7ea8aaf Compare August 1, 2022 15:03
@@ -128,6 +129,61 @@ impl CassDataType {
}
}

pub fn get_column_type(column_type: Option<ColumnType>) -> CassDataType {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ColumnType can be potentially heavy when describing compound types such as collections or UDTs. This function doesn't have to take column_type by value, you can take it by reference.

Err(err) => Ok(CassResultValue::QueryError(Arc::new(err))),
}
})
}

fn populate_cass_rows_from_rows(
rows: &Option<Vec<Row>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the type of rows should be made in the first commit.

Comment on lines 127 to 130
row.columns
.into_iter()
.enumerate()
.map(|(index, col)| CassValue {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip: the length of col_specs and row.columns should always be the same, so you should be able to just use zip:

row.columns
    .iter()
    .zip(metadata.col_specs.iter())
    .map(|(col, spec)| {
        // ...
    })
    .collect();

@Gor027 Gor027 force-pushed the basic_tests_rows_in_rows_out branch from 7ea8aaf to fb8d65f Compare August 2, 2022 13:42
@Gor027 Gor027 requested a review from piodul August 3, 2022 10:09
rows: Option<Vec<Row>>,
metadata: &Arc<CassResultData>,
) -> Option<Vec<CassRow>> {
let rows = rows.as_ref()?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop as_ref() here and you won't have to clone in line 117. Rows contain user data so they can be potentially large and we would like to avoid copies.

Gor027 added 2 commits August 3, 2022 12:25
Added query result data about paging_state and col_specs
to CassResult.

Changed CassRow to have reference on query result metadata
about columns.
CassIterator struct modified to simulate iterator hierarchy in C++
dirver.

Added RowIterator implementation with necessary functions to get
column value and type.
@Gor027 Gor027 force-pushed the basic_tests_rows_in_rows_out branch from fb8d65f to e1688bc Compare August 3, 2022 10:27
@Gor027 Gor027 requested a review from avelanarius August 3, 2022 10:49
@Gor027 Gor027 merged commit 0740d23 into scylladb:master Aug 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants