Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: nested struct shall be considered too #4

Open
0cv opened this issue Jan 24, 2023 · 13 comments
Open

Feature: nested struct shall be considered too #4

0cv opened this issue Jan 24, 2023 · 13 comments

Comments

@0cv
Copy link

0cv commented Jan 24, 2023

given

#[derive(FieldNamesAsArray)]
struct StructA {
   a: String,
   b: StructB
}

struct StructB {
   c: String,
   d: String
}

We shall have this:
assert_eq!(StructA::FIELD_NAMES_AS_ARRAY, ["a", "b.c", "b.d"]);

But at the moment we have:
assert_eq!(StructA::FIELD_NAMES_AS_ARRAY, ["a", "b"]);

@jofas
Copy link
Owner

jofas commented Jan 26, 2023

Hi @0cv,

thanks for the feature request. Yes, flattened field names would be a killer feature for sure. I myself use this crate mostly for constructing queries for document databases like mongodb and elasticsearch. Both allow nested field updates with the "foo.bar.baz" syntax.

Unfortunately I kind of shot myself in the foot when I decided to make FIELD_NAMES_AS_ARRAY an associated constant instead of something more complex done at runtime, like what serde does (which supports flattening like that). So I dropped this feature. Today though I picked the topic back, up given your interest and I think with the advancements in rust's const capabilities, we should be able to pull this off! It will require some major restructuring (including moving the procedural macro into a new struct-field-names-as-array-derive crate) and nightly features though (including #![feature(generic_const_exprs)], which isn't even complete as of today).

My vision would be something like this:

#[derive(FieldNamesAsArray)]
struct StructA {
   a: String,
   #[field_names_as_array(flatten)]
   b: StructB
}

#[derive(FieldNamesAsArray)]
struct StructB {
   c: String,
   d: String
}

fn main() {
   assert_eq!(StructA::FIELD_NAMES_AS_ARRAY, ["a", "b.c", "b.d"]);
}

@0cv
Copy link
Author

0cv commented Jan 26, 2023

There is also a similar issue and some options discussed here https://stackoverflow.com/questions/71175052/how-to-parse-nested-struct-member-variables-in-rust-proc-macro

@jofas
Copy link
Owner

jofas commented Jan 29, 2023

I've started working on enabling nesting, but unfortunately I can't get around a problem I did not foresee. Right now it is impossible to iterate over a constant array, something we need to map the field names of StructB from ["c", "d"] to ["b.c", "b.d"]. I hope once this PR for the #![feature(const_iter)] feature lands, we will be able to proceed.

@0cv
Copy link
Author

0cv commented Mar 20, 2023

Some updates here. I've a solution (for my needs), but it has some breaking changes, also some other features of that crate would not work anymore, so I'm just posting this solution here in case it's helpful to anyone:

Replace derive_field_names_as_array of that crate by the following function

#[proc_macro_derive(
  FieldNamesAsArray,
  attributes(field_names_as_array)
)]
pub fn derive_field_names_as_array(
  input: TokenStream,
) -> TokenStream {
  let input = parse_macro_input!(input as DeriveInput);

  let struct_name = &input.ident;
  let fields = match &input.data {
    syn::Data::Struct(data) => &data.fields,
    _ => panic!("FieldNamesAsArray can only be derived for structs"),
  };

  let mut field_exprs = Vec::new();

  for field in fields.iter() {
    let field_name = field.ident.as_ref().unwrap().to_string();

    if let Some(attr) = field
      .attrs
      .iter()
      .find(|a| a.path.is_ident("field_names_as_array"))
    {
      let nested_struct = &field.ty;
      let flatten = attr.tokens.to_string().contains("flatten");
      if flatten {
        field_exprs.push(quote! { <#nested_struct>::field_names_as_array().iter().map(|s| format!("{}.{}", #field_name, s)).collect::<Vec<_>>() });
      } else {
        field_exprs.push(quote! { vec![#field_name.to_string()] });
      }
    } else {
      field_exprs.push(quote! { vec![#field_name.to_string()] });
    }
  }

  let output = quote! {
      impl #struct_name {
          pub fn field_names_as_array() -> Vec<String> {
              let mut field_names = Vec::new();
              #( field_names.extend(#field_exprs); )*
              field_names
          }
      }
  };

  output.into()
}

Then the previous example works:

#[derive(FieldNamesAsArray)]
struct StructA {
   a: String,
   #[field_names_as_array(flatten)]
   b: StructB
}

#[derive(FieldNamesAsArray)]
struct StructB {
   c: String,
   d: String
}

fn main() {
   assert_eq!(StructA::field_names_as_array(), ["a", "b.c", "b.d"]);
}

Please note, that it's now a function, so it's a little bit different ...

@jofas
Copy link
Owner

jofas commented Mar 20, 2023

Thanks for your solution! It is what I meant by:

Unfortunately I kind of shot myself in the foot when I decided to make FIELD_NAMES_AS_ARRAY an associated constant instead of something more complex done at runtime

We can do the flattening at runtime, but not in a constant context or constant function. I really want FIELD_NAMES_AS_ARRAY to be a constant and not a function at runtime, which feels like too much overhead for such a small feature. Maybe we could release a "bridge" version that accommodates both FIELD_NAMES_AS_ARRAY and field_names_as_array() until Rust's constant evaluation capabilities become sufficient to do flattening with FIELD_NAMES_AS_ARRAY alone.

@0cv
Copy link
Author

0cv commented Mar 20, 2023

Ok, I understand what you mean. I suppose a PR in that crate does not make sense, it's indeed not as good.

In my use case, I've 100s (if not 1000s) of fields which far outweigh the little bit of overhead, and it's a cron job whose performance is very much secondary.

If anyone finds this code useful, then that will be here anyway

@0cv
Copy link
Author

0cv commented Mar 20, 2023

I will stop the spam here, but here is the final (fixed) version to support an Optional struct. Not sure whether there is not something better, because suddenly this was definitely not straightforward anymore

#[derive(FieldNamesAsArray)]
struct StructA {
   a: String,
   #[field_names_as_array(flatten)]
   b: Option<StructB>,
}

#[derive(FieldNamesAsArray)]
struct StructB {
   c: String,
   d: String,
}

fn main() {
   assert_eq!(StructA::field_names_as_array(), ["a", "b.c", "b.d"]);
}
#[proc_macro_derive(
  FieldNamesAsArray,
  attributes(field_names_as_array)
)]
pub fn derive_field_names_as_array(
  input: TokenStream,
) -> TokenStream {
  let input = parse_macro_input!(input as DeriveInput);

  let struct_name = &input.ident;
  let fields = match &input.data {
    syn::Data::Struct(data) => &data.fields,
    _ => panic!("FieldNamesAsArray can only be derived for structs"),
  };

  let mut field_exprs = Vec::new();

  for field in fields.iter() {
    let field_name = field.ident.as_ref().unwrap().to_string();

    if let Some(attr) = field
      .attrs
      .iter()
      .find(|a| a.path.is_ident("field_names_as_array"))
    {
      let field_type = &field.ty;
      let nested_struct = if let Type::Path(type_path) = field_type {
        if type_path.path.segments.last().unwrap().ident == "Option" {
          if let PathArguments::AngleBracketed(arguments) =
            &type_path.path.segments.last().unwrap().arguments
          {
            if let Some(generic_arg) = arguments.args.first() {
              if let syn::GenericArgument::Type(inner_type) =
                generic_arg
              {
                Some(inner_type)
              } else {
                None
              }
            } else {
              None
            }
          } else {
            None
          }
        } else {
          Some(field_type)
        }
      } else {
        None
      };

      let flatten = attr.tokens.to_string().contains("flatten");
      if flatten {
        if let Some(nested_struct) = nested_struct {
          field_exprs.push(quote! { <#nested_struct>::field_names_as_array().iter().map(|s| format!("{}.{}", #field_name, s)).collect::<Vec<_>>() });
        }
      } else {
        field_exprs.push(quote! { vec![#field_name.to_string()] });
      }
    } else {
      field_exprs.push(quote! { vec![#field_name.to_string()] });
    }
  }

  let output = quote! {
      impl #struct_name {
          pub fn field_names_as_array() -> Vec<String> {
              let mut field_names = Vec::new();
              #( field_names.extend(#field_exprs); )*
              field_names
          }
      }
  };

  output.into()
}

@jofas
Copy link
Owner

jofas commented Mar 20, 2023

Again, thanks for sharing your code freely. Much appreciated!

This is my personal opinion, but I think once we start considering generic type parameters like T in Option<T>, we leave the scope of this crate (and the problem it tries to solve) and actually enter XY problem territory. For example, I think that Option<T> should not automatically have all the field names of T. And what were we to do with other generic types like Result<T, E>? Should we consider the field names of T or E?

I think once we want to perform powerful operations like flattening types that are generic parameters and/or are wrapped in an enum, we need to fully commit to the idea that the field names are computed at runtime and for instances, not types. For example, I believe that this code is better at semantically expressing what the field names of Option<T> are:

#[derive(FieldNamesAsArray)]
struct StructA {
   a: String,
   #[field_names_as_array(flatten)]
   b: Option<StructB>,
}

#[derive(FieldNamesAsArray)]
struct StructB {
   c: String,
   d: String,
}

fn main() {
    let foo = StructA {
        a: "hello".to_owned(),
        b: Some(StructB {
            c: "world".to_owned(),
            d: "!".to_owned(),
        }),
   };
   
   let bar = StructA { a: "hello".to_owned(), b: None };

   assert_eq!(foo.field_names_as_array(), ["a", "b.c", "b.d"]);
   assert_eq!(bar.field_names_as_array(), ["a"]); // b is None and None has no fields
}

If we were to do field name computation at runtime like this I actually believe that we can and probably should implement this functionality in a serde Serializer, rather than develop a stand-alone crate. This would allow us to re-use the extensive capabilities of serde and integrate with the greater Rust ecosystem. This could be a cool project. But like I said, I believe this is fundamentally different from what this crate (currently) tries to accomplish.

@0cv
Copy link
Author

0cv commented Mar 20, 2023

I see, that makes sense for serializing

So to give a bit of background as to why I need(ed) this, I use this during the Deserialization phase. Actually, I'm querying a database (Salesforce) which may return null fields and except some ID and system fields, almost all other fields are potentially null, including the fields of (non system) relationship struct - hence Option, also for the struct.

@jofas
Copy link
Owner

jofas commented Mar 20, 2023

Interesting. May I ask how exactly you are using the field names to deserialize a struct? Like I said above, I originally developed this crate to create database queries. I'd love to hear more about your use-case to better understand the possible applications of this crate.

@0cv
Copy link
Author

0cv commented Mar 20, 2023

Actually it's pretty simple. I use the same struct for building my query and retrieving the results:

#[allow(non_snake_case)]
#[derive(Deserialize, Debug, FieldNamesAsArray)]
pub struct CreatedBy {
    pub id: String,
    pub field_random__c: Option<String>,
}

#[allow(non_snake_case)]
#[derive(Deserialize, Debug, FieldNamesAsArray)]
pub struct DcrFromPepSfdc {
    pub id: String,
    pub match_id_mvn__c: Option<String>,
    pub match_criteria_mvn__c: Option<String>,
    #[field_names_as_array(flatten)]
    pub createdby: Option<CreatedBy>,
}

fn main() {
    let fields = DcrFromPepSfdc::field_names_as_array().join(",");
    let query = format!("SELECT {} FROM Account", fields);

    // using Salesforce crate https://github.com/0cv/rust_sync_force/
    let records: QueryResponse<DcrFromPepSfdc> = sfdc_client.query(&query)?;
}

I want to query all fields, but there is usually no guarantee on whether fields are non null, incl. relationship.

Also, I've simplified various aspects, but that's at high level roughly how it works.

@jofas
Copy link
Owner

jofas commented Mar 21, 2023

Thanks for explaining your use-case to me.

That is pretty close to my original use-case. Only difference I see is that you use the fields to retrieve data from an SQL database whereas I originally used the fields array to update documents in a document database.

The more I think about it, the more I'm wondering whether there is a better way to construct queries like this statically from Rust types. struct-field-names-as-array solves only two real problems of mapping a struct to a database query, namely (I) the problem of adding a field to a struct and forgetting to add it to the query, which was the original concern why I build this procedural macro and (II) the problem of writing humongous queries for big structs.

But what about support for multiple queries with different fields from the same struct? And at which point would this crate be yet another query builder? And what about completely different use-cases beyond query construction? I'm not sure how other people use this crate and what their expectations are. All I know is that I like the simplicity. It is a small crate solving a small problem, namely some sort of weak runtime introspection capabilities into the field names of a named struct. While I love the idea of supporting nested field names, I'm not so sure about handling special cases like Option<T> and how this would fit in with the size of struct-field-names-as-array. Maybe it would make more sense to build something around this crate that is more focused on query building?

@0cv
Copy link
Author

0cv commented Mar 21, 2023

Yes, that's clear that one crate can't support all kind of different systems how they retrieve, and update records. For example on Salesforce, there is no such thing as a SQL update statement to update records. Instead it's a REST API with a JSON payload, so in that situation I simply use the Serde Serializer and it's just right.

That crate is ideal if one doesn't need nested structs. I've published the code I've written above in another crate. If you find a magical way to support (optional) nested struct on compilation, I will deprecate my crate but .. it's probably not a high priority, I think my crate is just solving an edge case of an edge case 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants