Filterx format csv #132

bshifter · 2024-05-31T17:55:09Z

Adds a format_csv() function to filterx. It re-formats CSV data previously parsed by parse_csv().

accepts json_array and json_object as input, the two possible output types of the parse_csv() filterx function
supports an optional named argument for the delimiter character
supports an optional columns argument with json_object input, for column filtering/ordering
ignores column names when the input is a json_array
uses the json_object's key ordering when column names are not set (this seems to be the correct order)
automatically double-quotes column values that contain the delimiter character
automatically escapes double quotes when double-quoting a column value
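
A minimal sketch of the quoting rule from the last two points, in plain C rather than the GLib-based code of this PR. The helper name append_csv_field() and its buffer-based signature are hypothetical. Escaping an embedded double quote with a backslash is an assumption: the PR delegates that to append_unsafe_utf8_as_escaped_binary(), and the exact output of that call is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the PR's code: appends one CSV field to the
 * NUL-terminated buffer `out` of capacity `cap`.
 * Returns 0 on success, -1 if the field does not fit (out is left unchanged). */
static int
append_csv_field(char *out, size_t cap, const char *value, char delimiter)
{
  assert(out != NULL && value != NULL && delimiter != '\0');

  size_t len = strlen(out);
  /* Quote only when the value contains the delimiter. */
  int needs_quotes = strchr(value, delimiter) != NULL;

  /* Work out the exact space needed before writing anything. */
  size_t needed = strlen(value) + 1;        /* payload + terminating NUL */
  if (needs_quotes)
    {
      needed += 2;                          /* opening and closing quote */
      for (const char *p = value; *p; p++)
        if (*p == '"')
          needed++;                         /* one escape character each */
    }
  if (len + needed > cap)
    return -1;

  char *dst = out + len;
  if (needs_quotes)
    *dst++ = '"';
  for (const char *p = value; *p; p++)
    {
      /* Assumption: embedded quotes are backslash-escaped. */
      if (needs_quotes && *p == '"')
        *dst++ = '\\';
      *dst++ = *p;
    }
  if (needs_quotes)
    *dst++ = '"';
  *dst = '\0';
  return 0;
}
```

With a comma delimiter, a value without a comma is appended unchanged, while a value containing a comma is wrapped in double quotes and any double quotes inside it are escaped.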

examples:

filterx {
...
  # array to unnamed cols
  x = parse_csv($MSG); # returns json_array as '["foo","bar","baz"]'
  $MSG = format_csv(x, delimiter=","); # custom delimiter
...
}

filterx {
...
  # dict to unnamed cols
  cols = json_array(["future_use1","receive_time","serial_number","type"]);
  x = parse_csv($MSG, columns=cols); # returns json_object as '{"future_use1":"foo"...}'
  $MSG = format_csv(x); # uses json_object's key ordering
...
}

filterx {
...
  # dict to named cols
  cols = json_array(["future_use1","receive_time","serial_number","type"]);
  x = parse_csv($MSG, columns=cols);
  cols2 = json_array(["type", "receive_time"]);
  $MSG = format_csv(x, delimiter=" ", columns=cols2); # filters and orders columns by cols2
...
}
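
The column filtering/ordering in the last example can be sketched in plain C as follows. This is an illustration under stated assumptions, not the PR's implementation: the Field type and the lookup() and format_selected() helpers are hypothetical, a column missing from the input is assumed to produce an empty field, and quoting is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one key/value entry of a parsed json_object. */
typedef struct
{
  const char *key;
  const char *value;
} Field;

/* Linear search by key; returns NULL when the key is absent. */
static const char *
lookup(const Field *fields, size_t n_fields, const char *key)
{
  for (size_t i = 0; i < n_fields; i++)
    if (strcmp(fields[i].key, key) == 0)
      return fields[i].value;
  return NULL;
}

/* Joins the values named in `columns`, in that order, with `delimiter`.
 * Assumption: a column missing from the input becomes an empty field.
 * Returns 0 on success, -1 if `out` is too small. */
static int
format_selected(char *out, size_t cap, const Field *fields, size_t n_fields,
                const char *const *columns, size_t n_columns, char delimiter)
{
  assert(out != NULL && cap > 0);

  const char sep[2] = { delimiter, '\0' };
  size_t used = 0;
  out[0] = '\0';

  for (size_t i = 0; i < n_columns; i++)
    {
      const char *value = lookup(fields, n_fields, columns[i]);
      int written = snprintf(out + used, cap - used, "%s%s",
                             i > 0 ? sep : "", value ? value : "");
      if (written < 0 || (size_t) written >= cap - used)
        return -1;
      used += (size_t) written;
    }
  return 0;
}
```

The output order follows the columns list, not the input's key order, and keys that are not listed are dropped.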

bazsi · 2024-06-02T09:27:04Z

modules/csvparser/filterx-func-format-csv.c

+
+  self->super.super.eval = _eval;
+  self->super.super.free_fn = _free;
+  self->delimiter = ',';


the default delimiter for the parse_csv() function is a space, so I think we should match that here.

bazsi · 2024-06-02T09:30:11Z

modules/csvparser/filterx-func-format-csv.c

+    {
+      guint64 size;
+      if (!filterx_object_len(csv_data, &size))
+        return FALSE;


we should return NULL here.

bazsi · 2024-06-02T09:31:08Z

modules/csvparser/filterx-func-format-csv.c

+  FilterXObject *csv_data = filterx_expr_eval_typed(self->input);
+  if (!csv_data)
+    {
+      filterx_eval_push_error("Failed to evaluate input. " FILTERX_FUNC_FORMAT_CSV_USAGE, s, NULL);


I think we don't need to push the error here as filterx_expr_eval() would set that already.

bazsi · 2024-06-02T09:32:38Z

modules/csvparser/filterx-func-format-csv.c

+          success = _append_to_buffer(NULL, elt, user_data);
+          filterx_object_unref(elt);
+        }
+    }


[optional] I'd extract the specializations to separate functions to make this a bit easier to follow.

bazsi · 2024-06-02T09:34:05Z

modules/csvparser/filterx-func-format-csv.c

+                  filterx_object_unref(col);
+                  filterx_object_unref(elt);
+                }
+            }


we should handle if columns is not a list (return NULL with an error)

now, I can see that in that case, success is FALSE, so that's what we would do.

I think extracting the type specific specializations would have avoided my misunderstanding.

bazsi · 2024-06-02T09:37:09Z

modules/csvparser/filterx-func-format-csv.c

+  else
+    {
+      filterx_eval_push_error("input must be a dict or list. " FILTERX_FUNC_FORMAT_CSV_USAGE, s, csv_data);
+      filterx_object_unref(csv_data);


we do the same in the epilogue part, as success is FALSE in this case.

bazsi · 2024-06-02T09:39:54Z

modules/csvparser/filterx-func-format-csv.c

+      g_string_append_c(buffer, '"');
+      append_unsafe_utf8_as_escaped_binary(buffer, value_buffer->str, value_buffer->len, "\"");
+      g_string_append_c(buffer, '"');
+


this is kind of the "dialect" parameter we already support on the parse_csv() side. the dialect would determine what kind of quotation we produce on the output side.

bazsi

This is looking good, I had a few comments, some of which would be worth addressing before merging this.

jszigetvari · 2024-06-03T10:30:40Z

@bshifter Just an idea:
If we try to format a set of fields to csv, and at a certain point all the remaining fields would be empty strings, then I think it would be safe to not print any more fields or separators from that point on. Thus we could stop the formatting of that message right at that place.

jszigetvari · 2024-06-03T14:43:50Z

@bshifter Just an idea: If we try to format a set of fields to csv, and at a certain point all the remaining fields would be empty strings, then I think it would be safe to not print any more fields or separators from that point on. Thus we could stop the formatting of that message right at that place.

After discussing this with @bshifter , it seems better to let this approach go.

alltilla

This is only a partial review. I will continue the review tomorrow.

alltilla · 2024-06-04T15:00:32Z

modules/csvparser/filterx-func-format-csv.c

+        {
+          guint64 size;
+          if (!filterx_object_len(cols, &size))
+            return FALSE;


This leaks cols.
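
One common way to avoid this kind of leak is to route every exit through a single cleanup label. The sketch below uses a hypothetical reference-counted Obj type and helpers (obj_new, obj_unref, obj_len) as stand-ins for FilterXObject and its API; it does not reproduce the PR's code, and the live_objects counter exists only so the leak can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for FilterXObject. */
typedef struct
{
  int ref_cnt;
  bool has_len;
  size_t len;
} Obj;

static int live_objects = 0;   /* lets a caller observe leaks */

static Obj *
obj_new(bool has_len, size_t len)
{
  Obj *o = malloc(sizeof(*o));
  if (!o)
    return NULL;
  o->ref_cnt = 1;
  o->has_len = has_len;
  o->len = len;
  live_objects++;
  return o;
}

static void
obj_unref(Obj *o)
{
  if (o && --o->ref_cnt == 0)
    {
      free(o);
      live_objects--;
    }
}

/* Fails for objects that have no length, like a non-list value would. */
static bool
obj_len(const Obj *o, size_t *len)
{
  if (!o->has_len)
    return false;
  *len = o->len;
  return true;
}

static bool
count_columns(bool has_len, size_t *n_columns)
{
  bool success = false;
  Obj *cols = obj_new(has_len, 3);   /* we own one reference from here on */
  if (!cols)
    return false;                    /* nothing acquired yet, plain return is fine */

  if (!obj_len(cols, n_columns))
    goto cleanup;                    /* a bare `return false` here would leak cols */

  success = true;

cleanup:
  obj_unref(cols);                   /* dropped on both the success and failure path */
  return success;
}
```

After any number of calls to count_columns(), successful or not, live_objects is back at zero.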

alltilla · 2024-06-04T15:01:24Z

modules/csvparser/filterx-func-format-csv.c

+}
+
+static gboolean
+_handle_dict_input(FilterXFunctionFormatCSV *self, FilterXObject *csv_data, GString *formatted)


[optional]

Functions in axosyslog are usually written with early returns, as that is more readable. Can you reorganize this function? Thanks!
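
To illustrate the difference between the two styles, here is a small sketch with a hypothetical validation function; the InputKind enum and both functions are made up for the example and do not reflect the actual logic of _handle_dict_input().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical input classification, for illustration only. */
typedef enum { INPUT_LIST, INPUT_DICT, INPUT_OTHER } InputKind;

/* Nested style: the success path is buried inside the conditions. */
static bool
validate_nested(InputKind kind, size_t len)
{
  bool ok = false;
  if (kind == INPUT_DICT)
    {
      if (len > 0)
        {
          ok = true;
        }
    }
  return ok;
}

/* Early-return style: each failure case exits at once, so the happy path
 * stays at the outermost indentation level. */
static bool
validate_early(InputKind kind, size_t len)
{
  if (kind != INPUT_DICT)
    return false;
  if (len == 0)
    return false;
  return true;
}

/* Checks that both styles agree on a small set of inputs. */
static void
check_equivalent(void)
{
  const InputKind kinds[] = { INPUT_LIST, INPUT_DICT, INPUT_OTHER };
  for (size_t i = 0; i < sizeof(kinds) / sizeof(kinds[0]); i++)
    for (size_t len = 0; len < 3; len++)
      assert(validate_nested(kinds[i], len) == validate_early(kinds[i], len));
}
```

When a function holds references that must be released, an early return has to release them first or jump to a shared cleanup label.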

Signed-off-by: shifter <[email protected]>

bshifter force-pushed the filterx-format-csv branch from 0bc0e91 to 0830d8a Compare May 31, 2024 18:37

bazsi reviewed Jun 2, 2024

bshifter force-pushed the filterx-format-csv branch from 0830d8a to 1dd210c Compare June 3, 2024 13:38

alltilla reviewed Jun 4, 2024

bshifter added 3 commits June 4, 2024 18:44

csvparser: add filterx-func-format-csv function

23adff7

Signed-off-by: shifter <[email protected]>

csvparser: filter-func-format-csv unit tests

08e51bd

Signed-off-by: shifter <[email protected]>

csvparser: copyright policy update

23908cb

Signed-off-by: shifter <[email protected]>

bshifter force-pushed the filterx-format-csv branch from 1dd210c to 23908cb Compare June 4, 2024 16:47

bazsi approved these changes Jun 5, 2024

bazsi merged commit 4b5f9b6 into axoflow:main Jun 5, 2024
22 checks passed
