Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SynonymMap model should take IEnumerable<string> for synonyms #10725

Closed
heaths opened this issue Mar 19, 2020 · 14 comments · Fixed by #11394
Closed

SynonymMap model should take IEnumerable<string> for synonyms #10725

heaths opened this issue Mar 19, 2020 · 14 comments · Fixed by #11394
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. Search
Milestone

Comments

@heaths
Copy link
Member

heaths commented Mar 19, 2020

The generated model for SynonymMap takes a single string for synonyms, which the user would then have to pass with newlines embedded. This "magic formula" approach is something we try to avoid and just make it obvious and idiomatic. Guidelines recommend instead taking an IEnumerable<string> for the parameter and perhaps define the property as an IList<string> to allow adding to it. The de/serializer should handle splitting and joining with newlines accordingly.

/cc @brjohnstmsft

@heaths heaths added Search Client This issue points to a problem in the data-plane of the library. labels Mar 19, 2020
@heaths heaths added this to the [2020] April milestone Mar 19, 2020
@heaths heaths self-assigned this Mar 19, 2020
@brjohnstmsft
Copy link
Member

This sounds reasonable to me.

@Yahnoosh, can you think of any reason why we might need to keep the synonym map body represented as a single string on the client?

@heaths
Copy link
Member Author

heaths commented Mar 20, 2020

We could provide the transformed string as wel (though I see little value until we need it) since we'll have to eventually de/serialize it anyway. Making callers do that themselves is not very dev-friendly nor idiomatic.

@brjohnstmsft
Copy link
Member

My only concern is that we might add synonym map formats in the future that aren’t line-oriented. I’m not the SME for synonyms, so I’ll check and get back to you.

@heaths
Copy link
Member Author

heaths commented Mar 20, 2020

That would be a REST breaking change, since they are already newline delimited. I would suggest a new model type, per our REST guidelines, which we'd need to support in our SDK then anyway.

@brjohnstmsft
Copy link
Member

There's a format parameter that currently only accepts one value. The docs say the map itself is newline-delimited regardless, but I'm not sure if that's accurate. If we add a new format that uses a different delimiter, it wouldn't technically be a breaking change.

@heaths
Copy link
Member Author

heaths commented Mar 20, 2020

I wouldn't add a newline for a single value, so that would still work fine.

Changing implementation without changing the API is still a breaking change. If customers had been using newlines and that delimiter changed, their code would break, right? You'd need a new model or, at the very least, change the api-version - either of which we can change the delimiter in code.

/cc @johanste who's been working with service teams to identify and mitigate breaking changes.

@johanste
Copy link
Member

Can you add a link to the specific model/REST API that we are considering changing?

@johanste
Copy link
Member

In general, if we are being additive for inputs only, then it is generally safer than additive for outputs. But details matter.

@brjohnstmsft
Copy link
Member

If the delimiter depends on the value given for the format property, then adding a new format with a different delimiter is not a breaking change because customers would have to opt in to the new format.

There’s also the possibility that customers have existing synonym maps as text files and may want to upload them as-is instead of constructing them programmatically line by line.

The bottom line is, none of us are SMEs on this topic, and further speculation is pointless until I have a chance to sync with more people on my team. 🙂

@brjohnstmsft
Copy link
Member

Talked with @Yahnoosh about this. Here's the summary:

  • Synonym maps are usually authored as text files, and often imported from another system.
  • The REST API documentation is misleading about newline always being the separator -- that actually depends on the format. For now, we only support the SOLR format, and newline is the delimiter for that format.
  • Stream makes sense as a way to ingest synonym map files on Create/CreateOrUpdate. However, the single-string property in the SynonymMap object should also be an option, if for no other reasons that demos/samples and the Get/List APIs.

Please let me know if you have any questions.

@johanste
Copy link
Member

If the delimiter depends on the value given for the format property, then adding a new format with a different delimiter is not a breaking change because customers would have to opt in to the new format.

Based on the fact that the synonym map can be returned to older clients (per https://github.com/Azure/azure-rest-api-specs/blob/8919f8072121d11f0bb61f1c5d02f2f4063d4f48/specification/search/data-plane/Microsoft.Azure.Search.Service/preview/2016-09-01-preview/searchservice.json#L704), it is very likely a change that requires a new API version per Azure's versioning rules.

Let's follow up ones we have more specifics on the proposed change.

@brjohnstmsft
Copy link
Member

@johanste True, I forgot about the round-tripping rule.

@heaths
Copy link
Member Author

heaths commented Mar 20, 2020

What about taking a Stream + format, or either a string or array of strings with SOLR assumed? Looking through a few examples, there are some that pass \n like https://docs.microsoft.com/en-us/rest/api/searchservice/create-synonym-map#-examples--.

public class SynonymMap
{
  public SynonymMap(string format, Stream stream, Encoding encoding = null);
  public SynonymMap(string value) : this(new[] { value });
  public SynonymMap(IEnumerable<string> value);

  public string Name { get; }
  public string Format { get; }
  public string Value { get; }
  public EncryptionKey EncryptionKey { get; set; }
  public string ETag { get; set; }
}

Optionally, if text is always expected (safe assumption, or not?), we could take in a TextReader and not have to worry about encoding directly (the TextReader implementation would).

Either way, the content would be decoded into a string for Value and sending across the wire. I do question whether there's much value in the second constructor above, though.

@brjohnstmsft
Copy link
Member

@heaths Having a constructor that takes a Stream or TextReader makes sense. I think it's a pretty safe assumption that it will always be text, so TextReader is probably fine.

Taking format as a parameter is problematic right now because there is only one allowable value (it should be an enum and not a string BTW). We hide it in Track 1 because it's pointless for users to have to pass the same value all the time. If we can add it as an additional parameter (or constructor overload) later, IMO we can leave it out for now.

I would actually question the value of the third constructor rather than the second. If the map isn't coming from a file, it's probably being pasted from somewhere else. Constructing it programmatically line-by-line is useful for samples, but that's about it. CC @Yahnoosh who can back me up on this.

heaths added a commit to heaths/azure-sdk-for-net that referenced this issue Mar 26, 2020
Also prototypes SynonymMap changes discussed in Azure#10725
heaths added a commit to heaths/azure-sdk-for-net that referenced this issue Mar 31, 2020
Also prototypes SynonymMap changes discussed in Azure#10725
heaths added a commit that referenced this issue Apr 1, 2020
* Initial SearchIndex creation model changes

Also prototypes SynonymMap changes discussed in #10725

* More model changes and tests for create/get index

* Resolve PR feedback

* Update Search APIs
@heaths heaths modified the milestones: [2020] April, [2020] May Apr 6, 2020
heaths added a commit to heaths/azure-sdk-for-net that referenced this issue Apr 17, 2020
heaths added a commit that referenced this issue Apr 17, 2020
* Remove IEnumerable<string> ctor from SynonynMap

Fixes #10725

* Used shared index to reduce resource utilization

* Updated APIs and CHANGELOG
@github-actions github-actions bot locked and limited conversation to collaborators Mar 28, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Client This issue points to a problem in the data-plane of the library. Search
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants