Filterx format csv #132

Merged (3 commits) on Jun 5, 2024

bshifter (Member):

Adds a CSV formatter function, format_csv(), to filterx. It helps re-format CSV data previously parsed by parse_csv().

  • Accepts json_array and json_object as input, which are the possible output types of the parse_csv() filter function.
  • Supports an optional named argument for the delimiter character.
  • Supports an optional columns argument for filtering and ordering columns when the input is a json_object.
  • Ignores column names when the input is a json_array.
  • Uses the json_object's key order when no column names are set (which seems to be the correct order).
  • Automatically double-quotes column values that contain the delimiter character.
  • Automatically escapes double quotes inside a double-quoted column value.
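A minimal plain-C sketch of the quoting rules listed above, for illustration only. It is not the PR's code, which appends to a GString and calls append_unsafe_utf8_as_escaped_binary(). The helper names push_char(), append_csv_field() and format_csv_row() are hypothetical, a fixed-size buffer stands in for GString, and backslash-escaping of embedded quotes is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append c to out, keeping room for the terminating NUL.
 * Returns false if out (capacity cap) is full. */
static bool
push_char(char *out, size_t cap, size_t *len, char c)
{
  if (*len + 1 >= cap)
    return false;
  out[(*len)++] = c;
  out[*len] = '\0';
  return true;
}

/* Append one field: wrap it in double quotes only if it contains the
 * delimiter, and in that case backslash-escape embedded double quotes
 * (the escaping style is an assumption of this sketch). */
static bool
append_csv_field(char *out, size_t cap, size_t *len, const char *value, char delimiter)
{
  bool quote = strchr(value, delimiter) != NULL;

  if (quote && !push_char(out, cap, len, '"'))
    return false;
  for (const char *p = value; *p; p++)
    {
      if (quote && *p == '"' && !push_char(out, cap, len, '\\'))
        return false;
      if (!push_char(out, cap, len, *p))
        return false;
    }
  return !quote || push_char(out, cap, len, '"');
}

/* Join n fields (e.g. the elements of a json_array) with the delimiter.
 * Returns false if the output does not fit into cap bytes. */
static bool
format_csv_row(char *out, size_t cap, const char *const *fields, size_t n, char delimiter)
{
  size_t len = 0;

  if (cap == 0)
    return false;
  out[0] = '\0';
  for (size_t i = 0; i < n; i++)
    {
      if (i > 0 && !push_char(out, cap, &len, delimiter))
        return false;
      if (!append_csv_field(out, cap, &len, fields[i], delimiter))
        return false;
    }
  return true;
}
```

For example, the fields foo, b,ar and say "hi", ok joined with ',' come out as foo,"b,ar","say \"hi\", ok". A value containing a quote but no delimiter, such as a"b, is emitted unquoted.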

Examples:

filterx {
...
  # array to unnamed cols
  x = parse_csv($MSG); # returns json_array as '["foo","bar","baz"]'
  $MSG = format_csv(x, delimiter=","); # custom delimiter
...
}
filterx {
...
  # dict to unnamed cols
  cols = json_array(["future_use1","receive_time","serial_number","type"]);
  x = parse_csv($MSG, columns=cols); # returns json_object as '{"future_use1":"foo"...}'
  $MSG = format_csv(x); # uses json_object's key ordering
...
}
filterx {
...
  # dict to named cols
  cols = json_array(["future_use1","receive_time","serial_number","type"]);
  x = parse_csv($MSG, columns=cols);
  cols2 = json_array(["type", "receive_time"]);
  $MSG = format_csv(x, delimiter=" ", columns=cols2); # filters and orders columns by cols2 
...
}
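The columns= behaviour from the examples above (filter and reorder a json_object's values, or fall back to its own key order) can be sketched in plain C as follows. KeyValue, dict_get() and select_columns() are hypothetical stand-ins for the filterx dict API, and turning a missing key into an empty field is an assumption, not something this PR states.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one json_object entry. */
typedef struct
{
  const char *key;
  const char *value;
} KeyValue;

/* Look up key in the dict; returns NULL if it is absent. */
static const char *
dict_get(const KeyValue *dict, size_t n, const char *key)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp(dict[i].key, key) == 0)
      return dict[i].value;
  return NULL;
}

/* Pick the values to format. With columns == NULL the dict's own key order
 * is used; otherwise only the listed columns are emitted, in list order.
 * Missing keys become empty fields (an assumption of this sketch).
 * Returns the number of values written to out (at most out_cap). */
static size_t
select_columns(const KeyValue *dict, size_t n,
               const char *const *columns, size_t ncols,
               const char **out, size_t out_cap)
{
  size_t count = 0;

  if (!columns)
    {
      for (size_t i = 0; i < n && count < out_cap; i++)
        out[count++] = dict[i].value;
      return count;
    }

  for (size_t i = 0; i < ncols && count < out_cap; i++)
    {
      const char *v = dict_get(dict, n, columns[i]);
      out[count++] = v ? v : "";
    }
  return count;
}
```

With the four column names from the examples and cols2 = ["type", "receive_time"], only the type and receive_time values are selected, in that order; the selected values would then be joined with the delimiter.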

bshifter force-pushed the filterx-format-csv branch from 0bc0e91 to 0830d8a on May 31, 2024 18:37

self->super.super.eval = _eval;
self->super.super.free_fn = _free;
self->delimiter = ',';
Review comment (Member):

The default delimiter of the parse_csv() function is a space, so I think we should match that here.

{
guint64 size;
if (!filterx_object_len(csv_data, &size))
return FALSE;
bazsi (Member), Jun 2, 2024:

We should return NULL here.

FilterXObject *csv_data = filterx_expr_eval_typed(self->input);
if (!csv_data)
{
filterx_eval_push_error("Failed to evaluate input. " FILTERX_FUNC_FORMAT_CSV_USAGE, s, NULL);
Review comment (Member):

I don't think we need to push the error here, as filterx_expr_eval() already sets it.

success = _append_to_buffer(NULL, elt, user_data);
filterx_object_unref(elt);
}
}
Review comment (Member):

[optional] I'd extract the type-specific specializations into separate functions to make this easier to follow.

filterx_object_unref(col);
filterx_object_unref(elt);
}
}
bazsi (Member), Jun 2, 2024:

We should handle the case where columns is not a list (return NULL with an error).

Review comment (Member):

Now I can see that in that case success is FALSE, so that is already what we do.

I think extracting the type-specific specializations would have avoided my misunderstanding.

else
{
filterx_eval_push_error("input must be a dict or list. " FILTERX_FUNC_FORMAT_CSV_USAGE, s, csv_data);
filterx_object_unref(csv_data);
Review comment (Member):

We do the same in the epilogue, as success is FALSE in this case.
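A minimal sketch of the single-exit pattern this comment refers to. Obj and obj_unref() are hypothetical stand-ins for FilterXObject and filterx_object_unref(), and the refcount is only decremented, never freed, so the result can be checked. The wrong-type branch only leaves success as false and falls through, so the epilogue is the one place that releases csv_data.

```c
#include <stdbool.h>

/* Hypothetical refcounted stand-in for FilterXObject. */
typedef struct
{
  int ref_cnt;
  bool is_dict_or_list;
} Obj;

/* The real unref would free the object at zero; here we only count. */
static void
obj_unref(Obj *o)
{
  if (o)
    o->ref_cnt--;
}

/* Single-exit style: the error branch does not unref or return on its own,
 * it just leaves success == false and falls through to the epilogue. */
static bool
format_input(Obj *csv_data)
{
  bool success = false;

  if (csv_data->is_dict_or_list)
    {
      /* ... format the dict or list into the output buffer ... */
      success = true;
    }
  else
    {
      /* push the "input must be a dict or list" error here; no unref */
    }

  /* epilogue: the only place csv_data is released */
  obj_unref(csv_data);
  return success;
}
```

Either way, csv_data ends up released exactly once, so the extra unref in the error branch is unnecessary.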

g_string_append_c(buffer, '"');
append_unsafe_utf8_as_escaped_binary(buffer, value_buffer->str, value_buffer->len, "\"");
g_string_append_c(buffer, '"');

Review comment (Member):

This is kind of the "dialect" parameter we already support on the parse_csv() side. The dialect would determine what kind of quotation we produce on the output side.
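To illustrate how a dialect could drive output quoting, here is a hypothetical sketch with two escaping styles: backslash-escaping embedded quotes, or doubling them as RFC 4180 does. CsvDialect, put() and quote_field() are names made up for this example; they are not parse_csv()'s actual dialect identifiers or any axosyslog API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical escaping dialects; names are illustrative only. */
typedef enum
{
  CSV_ESCAPE_BACKSLASH,   /* embedded " becomes \" */
  CSV_ESCAPE_DOUBLE_CHAR, /* embedded " becomes "" (RFC 4180 style) */
} CsvDialect;

/* Append c, keeping room for the terminating NUL. */
static bool
put(char *out, size_t cap, size_t *len, char c)
{
  if (*len + 1 >= cap)
    return false;
  out[(*len)++] = c;
  return true;
}

/* Write value into out as a double-quoted field, escaping embedded double
 * quotes according to the dialect. Returns false if out is too small, in
 * which case the contents of out are unspecified. */
static bool
quote_field(char *out, size_t cap, const char *value, CsvDialect dialect)
{
  size_t len = 0;
  char escape = (dialect == CSV_ESCAPE_BACKSLASH) ? '\\' : '"';

  if (!put(out, cap, &len, '"'))
    return false;
  for (const char *p = value; *p; p++)
    {
      if (*p == '"' && !put(out, cap, &len, escape))
        return false;
      if (!put(out, cap, &len, *p))
        return false;
    }
  if (!put(out, cap, &len, '"'))
    return false;
  out[len] = '\0';
  return true;
}
```

For the value a "b", the backslash dialect produces "a \"b\"" and the double-char dialect produces "a ""b""".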

bazsi (Member) left a review:

This is looking good. I had a few comments, some of which would be worth addressing before merging this.

jszigetvari (Contributor):

@bshifter Just an idea:
If we format a set of fields to CSV and, at a certain point, all the remaining fields are empty strings, I think it would be safe not to print any more fields or separators from that point on. That way we could stop formatting the message right there.
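A sketch of the proposed trimming, assuming the fields are already plain C strings. fields_to_emit() is a hypothetical helper: it returns how many leading fields to print so that trailing empty fields, and the separators in front of them, are skipped. The idea was later dropped after discussion. One visible consequence in the sketch is that the number of emitted columns would then vary from message to message.

```c
#include <stddef.h>

/* Number of leading fields to print when trailing empty strings (and the
 * separators in front of them) are left out. Empty fields in the middle
 * are kept, because later non-empty fields still need their position. */
static size_t
fields_to_emit(const char *const *fields, size_t n)
{
  while (n > 0 && fields[n - 1][0] == '\0')
    n--;
  return n;
}
```

For the fields a, "", b, "", "" this yields 3, so the row would be printed as a,,b instead of a,,b,,.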

bshifter force-pushed the filterx-format-csv branch from 0830d8a to 1dd210c on June 3, 2024 13:38
jszigetvari (Contributor):


After discussing this with @bshifter, it seems better to let this approach go.

alltilla (Member) left a review:

This is only a partial review. I will continue the review tomorrow.

{
guint64 size;
if (!filterx_object_len(cols, &size))
return FALSE;
Review comment (Member):

This leaks cols.
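A minimal sketch of plugging that leak while keeping the early-return style, with a hypothetical refcounted Obj standing in for FilterXObject. obj_ref(), obj_unref() and obj_len() are not real axosyslog APIs, and the refcount is only counted, never freed. Every return path releases cols.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted stand-in for FilterXObject. */
typedef struct
{
  int ref_cnt;
  bool has_len;
  size_t len;
} Obj;

static Obj *
obj_ref(Obj *o)
{
  o->ref_cnt++;
  return o;
}

/* The real unref would free the object at zero; here we only count. */
static void
obj_unref(Obj *o)
{
  if (o)
    o->ref_cnt--;
}

static bool
obj_len(const Obj *o, size_t *len)
{
  if (!o->has_len)
    return false;
  *len = o->len;
  return true;
}

/* Early-return style that still releases cols on every path. */
static bool
handle_columns(Obj *columns_value)
{
  Obj *cols = obj_ref(columns_value); /* like evaluating the columns expr */
  size_t size;

  if (!obj_len(cols, &size))
    {
      obj_unref(cols); /* the reviewed code returned FALSE without this */
      return false;
    }

  for (size_t i = 0; i < size; i++)
    {
      /* ... format column i ... */
    }

  obj_unref(cols);
  return true;
}
```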

}

static gboolean
_handle_dict_input(FilterXFunctionFormatCSV *self, FilterXObject *csv_data, GString *formatted)
Review comment (Member):

[optional] Functions in axosyslog are usually written with early returns, as that is more readable. Can you reorganize this function? Thanks!

bshifter force-pushed the filterx-format-csv branch from 1dd210c to 23908cb on June 4, 2024 16:47
bazsi merged commit 4b5f9b6 into axoflow:main on Jun 5, 2024
22 checks passed
4 participants