Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Struct update dsl #83

Closed
wants to merge 3 commits into from
Closed

Conversation

tjcsrc16
Copy link
Contributor

Issue #, if available:

Description of changes: This adds a builder DSL for copying StructElements while changing / adding new fields. Dropping fields is currently not implemented, but can be added. See test/com/amazon/ionelement/demos/kotlin/StructUpdateDemo.kt for example usage.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov
Copy link

codecov bot commented Jun 29, 2023

Codecov Report

❗ No coverage uploaded for pull request base (master@247c128). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@            Coverage Diff            @@
##             master      #83   +/-   ##
=========================================
  Coverage          ?   88.96%           
  Complexity        ?      459           
=========================================
  Files             ?       31           
  Lines             ?     1015           
  Branches          ?      150           
=========================================
  Hits              ?      903           
  Misses            ?       68           
  Partials          ?       44           

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@popematt popematt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some good ideas for enhancements here.

I think there are actually two distinct features you are proposing—path-based accessors and a convenient way to update ContainerElements. I think they are sufficiently distinct that they can be done in separate PRs, and I think it will be easier to make incremental progress if we split them up.

Some of my comments might seem like big changes. Don't assume that it has to be done my way. I'm really just trying to get a discussion going to make sure that we've considered alternatives and their tradeoffs.

public data class Index(val index: Int) : IonPathElement()

/**
* Fetch an element from a nested IonElement using the [path] to traverse struct fields and [SeqElement] indices
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, let me start by saying that this is a useful enhancement—based on my prior experience, it's clearly something that people want.

Can you document what the caller should expect for failure cases (i.e. an element is the wrong type, a field name does not exist, or an index is out of bounds)?

What happens if there is more than one element matching the path? E.g.:

// Given a path of ["foo", 0] we could find 1, a, or both 1 and a
{
  foo: [1, 2, 3],
  bar: true,
  foo: (a + b),
}

Would you consider adding a getPathOrNull function—similar to how the Kotlin Standard Library first() and firstOrNull()?

What would you think of adding a type parameter to the function so that you can get a specific type of element from the given path? For example:

val mySymbol = someElement.getPath<SymbolElement>(somePath)
val anyElement = someElement.getPath<AnyElement>(someOtherPath)

Copy link
Contributor Author

@tjcsrc16 tjcsrc16 Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getPath was left over from some other experiments I was doing and I think it makes sense to move to another PR.

What happens if there is more than one element matching the path?

There should be separate functions for getting the first value and getting all values, similar to the existing API for StructElement.

Would you consider adding a getPathOrNull function—similar to how the Kotlin Standard Library first() and firstOrNull()?

Yes, makes sense.

What would you think of adding a type parameter to the function so that you can get a specific type of element from the given path?

Great idea, this seems idiomatic. If the result does not match up I suppose an IonElementConstraintException should be thrown similar to the as* functions.

Comment on lines +8 to +29
/**
* An [IonPath] represents a traversal into a nested IonElement made of [StructElement]s and [SeqElement]s
*/
public class IonPath(public val elements: List<IonPathElement>) : Collection<IonPathElement> by elements {
public constructor(vararg elements: IonPathElement) : this(elements.asList())

public fun tail(): IonPath {
return IonPath(elements.drop(1))
}
}

public sealed class IonPathElement

/**
* Traverse into a nested [StructElement] by its field name.
*/
public data class Field(val fieldName: String) : IonPathElement()

/**
* Traverse into a nested [SeqElement] by its numerical index.
*/
public data class Index(val index: Int) : IonPathElement()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, let me say that this is a good enhancement idea. I have seen similar things in multiple applications I've worked on in the past, so clearly this is something that Ion users want. Thank you!

Here are some thoughts, in no particular order.

There is already an ion-java-path-extraction library. It would be a good idea if we could come up with a name other than IonPath. I don't think there's any particular reason it needs to have Ion in the name, so something like TreePath or ElementPath could work.

There is nothing about your IonPath that is inherently tied to ion-element-kotlin—would you consider contributing it to ion-java instead? The ion-java repo is base for all of our JVM libraries, so having the path in ion-java would make it easier to write things in the future that use these paths and are interoperable with each other.

Have you considered any other ways of modeling it? If so, what made you choose this particular way?
For example, I've tried to consider other possible ways of modeling it, and came up with this.

sealed class TreeLocation {
    data class Path(public val components: List<TreeLocation>): TreeLocation()
    data class Field(public val fieldName: String): TreeLocation()
    data class Index(public val index: Int): TreeLocation()
}

I think the benefit of something like this is that it allows users to interchangeably provide a single index, a single field name, or a path. However, it would make the logic to interpret the path a little more complicated. (Just to be clear, I don't think that this is necessarily a better or worse way to model the path. I'm just trying to stimulate some discussion about the design decisions.)

Alternately, we could try to reuse the ion-java-path-extraction path syntax. That could be a really nice feature because it allows you to select multiple matching things, or match conditionally if a certain annotation is present.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, I've tried to consider other possible ways of modeling it, and came up with this.

sealed class TreeLocation {
    data class Path(public val components: List<TreeLocation>): TreeLocation()
    data class Field(public val fieldName: String): TreeLocation()
    data class Index(public val index: Int): TreeLocation()
}

I was planning on changing it to this pattern because I don't think there is a need to wrap the list of path elements into another class at the moment. One reason to wrap it would be to encapsulate path manipulation functions, but I haven't thought that far ahead yet and they could always be added as extension functions to List<TreeLocation>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not aware of ion-java-path-extraction (I had already planned on extending this representation to sexps too). So are you suggesting to allow fetching from an IonElement using a PathExtractor? I think that makes sense if there is already a well known way to represent paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So are you suggesting to allow fetching from an IonElement using a PathExtractor?

Yes—more or less. It would be nice to see if we can have some similar/familiar APIs across multiple libraries.

I think that the path syntax/model defined in that library could be used to fetch child IonElements from a container. That would fulfill the same use case that you are trying to solve with this PR.

It would also be interesting to be able to extract IonElements directly off of an IonReader (as in ion-java-path-extraction), but I am not suggesting that you should do that. (It would be a big scope increase.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...but just to be clear, I don't think ion-element-kotlin should take a dependency on ion-java-path-extraction. If we want to try to have a unified path extractor API, the classes to model the path should probably be implemented in ion-java.

Comment on lines +44 to +49
/**
* Implement [] operator for fetching a nested element
*/
public operator fun IonElement.get(path: IonPath): IonElement {
return this.getPath(path)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am hesitant to include an overloaded operator for this particular case because I think people will expect it to behave like accessing a List or Map—which is tricky because a Map will return null if a key is not found, but a List will throw an exception if the index is out of bounds. My personal opinion is just to pass on the operator fun rather than implement something that could be surprising.

Copy link
Contributor Author

@tjcsrc16 tjcsrc16 Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I don't have a strong opinion about this.

Comment on lines +64 to +68
// Determines whether new fields are added to the struct or replace existing fields of the same name
public enum class FieldUpdateMode {
REPLACE,
ADD
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the idea of having a "default mode". Since there's only two modes here, I think we're better off just having separate functions for the two cases to be explicit about them. Maybe set() and add()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only issue is that the this["foo"] = ... syntax in the builder body will have to choose to be add or replace.

// Gets a field, including fields that may have been added by current builder
public operator fun get(fieldName: String): AnyElement {
return fieldUpdates[fieldName]?.first()
?: constraintError(sourceStruct, "Required struct field '$fieldName' missing")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to throw an exception here, then we must include a function to check whether a field name is present. Also, given that this is a builder (i.e. the thing we're building is expected to be incomplete), I'm not sure whether it's a good idea to throw an exception in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mimics the behavior of the get operator for structs. Are you saying this should return null, or are you suggesting there should be some sort of lazy evaluation that happens at build time?

Even for lazy evaluation this would still be an error (or null) unless we do not preserve the order of operations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's consistent with structs, then we can keep it the way it is. However, we must include a contains function so that you can check if a field exists before you try to get it.

Or maybe I'm misunderstanding how this builder is supposed to work?

import kotlin.test.assertEquals
import org.junit.jupiter.api.Test

class StructUpdateDemo {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a Java demo as well so we can make sure the API works for Java?

}

/**
* A [StructUpdateBuilder] builder implements a typesafe builder that allows updating a [StructElement] by adding
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to typesafe builder or just the generic concept of a builder that is typesafe in some way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was referring to that link, basically the concept of using lambda with receiver to make a builder body. Maybe, I am misusing the terminology.

Comment on lines +59 to +62
private val fieldUpdates: MutableMap<String, MutableList<AnyElement>> =
sourceStruct.fields.groupBy({ it.name }, { it.value })
.mapValues { it.value.toMutableList() }
.toMutableMap()
Copy link
Contributor

@popematt popematt Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's a multimap behind the scenes here, I have to ask—how much value do you think there is in having a StructUpdateBuilder class vs. just using a multimap (or multimap-like class)?

Or, instead of a traditional builder, what do you think of something like this?

public interface MutableStructFields: Iterable<StructField> {
    public operator fun set(name: String, value: IonElement)
    public fun add(name: String, value: IonElement)
    public operator fun get(key: String): MutableCollection<AnyElement>
    public fun remove(key: String)
    public operator fun contains(key: String): Boolean
    public fun compute(name: String, operation: (AnyElement?) -> IonElement)

    // At a later date, add things like these:
    // public fun putIfAbsent(name: String, value: IonElement): Boolean
    // public operator fun plusAssign(field: StructField)
    // public operator fun plusAssign(field: Pair<String, IonElement>)
    // public fun removeIf(predicate: (String, AnyElement) -> Boolean)
    // public fun getOrPut(name: String, provider: () -> IonElement): List<AnyElement>
    // public fun getOrPut(name: String, default: IonElement): List<AnyElement>
}

Because it implements Iterable<StructField>, it can be used with the ionStruct factory function, and it could be used with a copy function as well. E.g.:

val newStruct = myStruct.copy(myStruct.mutableFields.apply {
    compute("documentRevision") { it + 1 }
    set("lastUpdatedDate", Instant.now().toTimestampElement())
    // ...
})

I think this might also be a little more Java-friendly than having an update DSL.

Copy link
Contributor Author

@tjcsrc16 tjcsrc16 Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting, I will have to play around with it bit. I would want to make sure that within a group of operations, only single copy is made with all of the changes, instead of a copy for each change, which the current builder ensures.

I was also thinking, that since StructElement and SeqElement already use PersistentMap and PersistentList we should be able to expose those to leverage copy-on-write semantics, again as long as multiple updates only create a single copy. I just wasn't sure how receptive you would be to changing the internals too much.

Copy link
Contributor

@popematt popematt Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would want to make sure that within a group of operations, only single copy is made with all of the changes, instead of a copy for each change, which the current builder ensures.

I can elaborate a bit more. The implementation of MutableStructFields could use a MutableMap or a MutableList behind the scenes, so we would create one shallow copy when we create the MutableStructFieldsImpl instance, and then one shallow copy when we pass the instance to the constructor or copy() function.

This is fundamentally similar to what I suggested for SeqElement. The mutableStructFields could return a MutableList<StructField>, but that would make it more difficult for users to do simple things like update an existing field. It could return a MutableMap<String, MutableList<IonElement>>, but that's awkward to have to deal with the list of values. The MutableStructFields is just a wrapper around a mutable collection that makes it easier to perform useful operations on the fields.

I was also thinking, that since StructElement and SeqElement already use PersistentMap and PersistentList we should be able to expose those to leverage copy-on-write semantics, again as long as multiple updates only create a single copy. I just wasn't sure how receptive you would be to changing the internals too much.

I'm open to changing the internals, but I'm hesitant to expose the persistent collections because that has the potential to limit our ability to change implementation details in the future. We would want to be really sure before we commit to changing the persistent collections from being an implementation detail to being part of the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while, but after coming back, I really like the mutableFields approach. The interface exposes the data in a writeable way and doesn't leak the internals. Is someone currently working on implementing this?

Copy link
Contributor

@popematt popematt Sep 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, no one is implementing it right now. Since it's a fairly thin layer built on top of ion-java, generally ion-element-kotlin does not get feature development unless someone requests/contributes a feature.

Edit—just to be clear, ion-element-kotlin does benefit from many of the features and improvements in ion-java, and in no way am I suggesting that ion-element-kotlin is unmaintained.

@@ -362,6 +362,8 @@ public interface SeqElement : ContainerElement {
override fun withMetas(additionalMetas: MetaContainer): SeqElement
override fun withMeta(key: String, value: Any): SeqElement
override fun withoutMetas(): SeqElement

public fun withValueAtIndex(index: Int, value: IonElement): SeqElement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your thoughts on just adding an overload of copy() instead of this function? It would be slightly more verbose, but it means that you get all of the operations of MutableList without having to rewrite them for ListElement.

E.g.:

public fun copy(
    values: Iterable<IonElement>, 
    annotations: List<String> = this.annotations, 
    metas: MetaContainer = this.metas
)

Usage:

val newList = myList.copy(myList.valuesAsMutable.apply { 
    set(1, myOtherElement) 
    removeAt(0)
})

Or in Java:

List<AnyElement> values = myList.getValuesAsMutable();
values.set(1, myOtherElement);
values.removeAt(0);
ListElement newList = myList.copy(values);

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would prefer to move this behavior to copy.

@@ -450,4 +452,7 @@ public interface StructElement : ContainerElement {
override fun withMetas(additionalMetas: MetaContainer): StructElement
override fun withMeta(key: String, value: Any): StructElement
override fun withoutMetas(): StructElement

/** Copy the struct, replacing all occurrences of fields with names that match the provided fields **/
public fun withFields(vararg fields: StructField): StructElement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. Would you consider overloading the copy() function?

@popematt
Copy link
Contributor

Closing this since it was mostly superseded by #84. If you still want to contribute path-based accessors, you can re-open this or (probably easier) start a new PR.

@popematt popematt closed this Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants