Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Search][BUG] Wrong dynamic for empty array of string #8699

Closed
fabiomaulo opened this issue Nov 9, 2019 · 13 comments
Closed

[Search][BUG] Wrong dynamic for empty array of string #8699

fabiomaulo opened this issue Nov 9, 2019 · 13 comments
Assignees
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Search
Milestone

Comments

@fabiomaulo
Copy link

fabiomaulo commented Nov 9, 2019

Having two fields declared as

new Field("tags", DataType.Collection(DataType.String)) {IsRetrievable=true, IsFilterable=true, IsSortable=false, IsFacetable=false, IsSearchable=true, Analyzer= AnalyzerName.EsMicrosoft },
new Field("tagsModelos", DataType.Collection(DataType.String)) {IsRetrievable=true, IsFilterable=true, IsSortable=false, IsFacetable=false, IsSearchable=true, Analyzer= AnalyzerName.EsMicrosoft },

And in Azure correctly created
image

The search result returs an array of string when the array has some elements and an empty array of object when the array is empty.
image

The bug was found in version 10.1.0 actualizing from 3.x so... I don't know which is the exact version where the bug was introduced.

@triage-new-issues triage-new-issues bot added the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Nov 9, 2019
@fabiomaulo
Copy link
Author

fabiomaulo commented Nov 9, 2019

The applied workaround is

// workaround BUG introduced by AzureSearch 10 package. 
// The empty array of strings is returned as array of objects
// reported: https://github.com/Azure/azure-sdk-for-net/issues/8699
var t = ((source.tags as IEnumerable) ?? new object[0])
		.Cast<object>().Where(x=> x!=null).Select(x=> x.ToString()).ToArray();
var tm = ((source.tagsModelos as IEnumerable) ?? new object[0])
		.Cast<object>().Where(x=> x!=null).Select(x=> x.ToString()).ToArray();

@fabiomaulo fabiomaulo changed the title [BUG] Wrong dynamic for empty array of string [Search][BUG] Wrong dynamic for empty array of string Nov 9, 2019
@brjohnstmsft
Copy link
Member

This behavior is by design.

In the dynamic case, the SDK has no information about what data type is contained within the collection. In previous versions, the only collection type that was supported was Collection(Edm.String), but now that collections of arbitrary type are supported, there is no way to know what the correct type should be for an empty collection. It is less likely to violate the Principle of Least Surprise to use the most expansive type (object[]) rather than a type that could be incorrect (string[], which would be incorrect if the field is an integer collection, for example).

If your application is aware of the types of these fields, then I'd recommend using your own model type as this will guide deserialization and always result in the correct .NET type (assuming you've defined your model type correctly).

We do not plan to change this behavior, especially since there is a workaround, so I'm closing this issue.

@fabiomaulo
Copy link
Author

"by design", really?
I'm curious to see the tests where you are checking the "design".
Inside those tests you should define an index with an array of strings, then you should store an empty array of string and, in the assert, you should check that the response of the document contains an empty array of objects (object[]).

Why you are constraining a user to define a type of a collection to then check the type when the user store something but, in the response, you are returning an aubergine?

In OSS "by design" is not a magic word to avoid bug fix.

BTW make what your professionalism say you.
Be the force with you!

@brjohnstmsft brjohnstmsft reopened this Nov 13, 2019
@brjohnstmsft
Copy link
Member

@fabiomaulo My apologies, I should not have closed the issue without further discussion. I've re-opened it until we can reach a consensus.

I'm curious to see the tests where you are checking the "design".

The test for this case is here:

There are other tests in the same file that pin other behaviors that similarly aren't ideal and are caused by the lack of type information.

This behavior is documented here.

Why you are constraining a user to define a type of a collection to then check the type when the user store something but, in the response, you are returning an aubergine?

I will explain in more detail why this doesn't work the way you expect. I will also list the alternatives we considered.

In the .NET SDK, there are two sets of data retrieval methods: Those that take a generic type parameter, and those that don't. Both sets of methods use JSON.NET to deserialize the JSON responses returned by Azure Cognitive Search. In the case of the generically-typed methods, the type you pass is supposed to correspond to your index definition, and it's used to deserialize each JSON property in the response to its correct type. However, for the methods that don't take a type parameter, there is no type information available to determine the actual type of the data in the response. This is because the REST API does not return fully self-describing responses. For this reason, the implementation currently has to make some assumptions. For empty collections, the assumption is that it's less surprising to receive object[] for say, Collection(Edm.Int32), than it is to receive string[] in that case.

This is not ideal behavior, so we considered some ways to fix it:

  1. Change the REST API to include type information that would disambiguate situations like this. This is an expensive change for us to make, and to date this hasn't been a priority for our customers, so we've focused on other work to improve the service instead.
  2. Use Index instances on the client-side to get type information for deserialization. This could be done either by requiring the Index as a parameter, which would be inconvenient, or by transparently fetching and caching index definitions on the client, which has a few problems:
    1. Performance: On cache misses, it could greatly increase latency. Generally it's a design goal of the Azure SDKs to have predictable performance, and not hide multiple calls to the REST API within a single method unless there's a very compelling reason.
    2. Consistency: It's possible the index definition will change (for example, if the index is deleted and recreated between queries, something a surprising number of our customers do periodically), making the client type information incorrect anyway. It's hard to detect this case without expensive polling.

So while it's possible to fix this, none of the fixes are easy and they all have their own drawbacks. Also, workarounds are possible.

If you have an idea for fixing this that is simpler and more robust than what I've mentioned, we would be happy to consider it.

@brjohnstmsft brjohnstmsft self-assigned this Nov 13, 2019
@triage-new-issues triage-new-issues bot removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Nov 13, 2019
@fabiomaulo
Copy link
Author

Hi Bruce.
The force with you is.
Thanks for the feedback.

The problem here is something I saw even in others SDKs (IIRC docDb) and, IMO, is a conceptual error: don't touch the dynamic result ;).

You give us a way to work with strongly-typed DTO/entities and with dynamic-objects;
when we use the dynamic object (and here I should explain to you the reason but it is for a call) you should avoid obscuring the real nature of the underlining object returned by the API. This mean that you should return the result o the JSON.NET deserialization or a transformation of the deserialization maintaining just a part of the JSON.NET result. You shouldn't try to transform it into another object.
In the Search-SDK case, you shouldn't return a DocumentSearchResult<Document>, instead you should return a DocumentSearchResult<dynamic>. This way you are giving us a way to understand what the underlying object is and so a way to work with its power.
Receiving a real dynamic I can see that it is a JSON.NET and I can deal with it:

class MyArrays
{
	public int[] Int { get; set; } = null;
	public DateTime[] Dates { get; set; } = new DateTime[0];
	public string[] Strings { get; set; } = new string[0];
}
static void Main(string[] args)
{
	var sedialized = Newtonsoft.Json.JsonConvert.SerializeObject(new MyArrays());
	dynamic deserialized = Newtonsoft.Json.JsonConvert.DeserializeObject(sedialized);
	var mypropAsStrings = deserialized.Dates.ToObject<string[]>();
	var mypropAsInt = deserialized.Strings.ToObject<int[]>();
	var mynullarray = deserialized.Int?.ToObject<int[]>();
	Console.WriteLine($"{mypropAsStrings.GetType()} {mypropAsStrings.Length}");
	Console.WriteLine($"{mypropAsInt.GetType()} {mypropAsInt.Length}");
	Console.WriteLine($"{mynullarray?.GetType()} {mynullarray?.Length}");
	Console.ReadLine();
}

In practice: don't try to transform the dynamic to a Dictionary<string, object> when what a need/want is dynamic. Give us the dynamic you have in your hands and that is all ;) .

I'll happy to read your opinion about DocumentSearchResult<dynamic> and/or to talk with you if you need some explication about the reason I'm using a dynamic instead a strongly-typed DTO to read/write the index.

P.D. just something to transform the dynamic without know what it contains: https://gist.github.com/fabiomaulo/d03700a8ea1ec99d8725058cc67b10a7

@brjohnstmsft
Copy link
Member

@fabiomaulo Thanks for the explanation -- this makes sense now. The fact that you can convert an empty JArray to any type you want seems to be the missing piece for your scenario.

This feedback comes at a good time, because we're starting to think about the next generation of the .NET SDK. Several months ago, new Azure API Guidelines were announced, and we plan to re-engineer our .NET SDK to be compliant with these guidelines (although there is no timeline for this yet).

One complicating factor is that JSON.NET is being deprecated in favor of the new System.Text.Json library. So far it looks very low-level in comparison; it remains to be seen whether it will move toward feature parity with JSON.NET.

In the meantime, I'll keep this issue open to track this scenario and I'll run it by the Azure API review board. Please don't expect any short-term updates though -- this is very much a long-term issue.

@fabiomaulo
Copy link
Author

Thanks.
I really appreciate your attitude: the real point behind an issue is to recognize and fix the problem and not close the ticket.

@AlexGhiondea AlexGhiondea added the Client This issue points to a problem in the data-plane of the library. label Feb 11, 2020
@AlexGhiondea AlexGhiondea assigned tg-msft and unassigned AlexGhiondea Feb 11, 2020
@AlexGhiondea AlexGhiondea added this to the [2020] March milestone Feb 18, 2020
@tg-msft tg-msft modified the milestones: [2020] April, [2020] May Apr 6, 2020
@tg-msft tg-msft modified the milestones: [2020] May, [2020] June May 8, 2020
@tg-msft
Copy link
Member

tg-msft commented Jun 8, 2020

For Azure.Search.Documents, we're keeping everything as object[] until you ask for it as another type of array. It's the same as keeping "NaN" a string until you ask for it as a double. Once we have proper dynamic support in Azure.Core.Experimental, you should be able to cast to whatever you want and it'll "just work." Moving this to the backlog for now and we'll close it out once we've got the end-to-end story in place.

@tg-msft tg-msft removed this from the [2020] June milestone Jun 8, 2020
@tg-msft tg-msft added this to the Backlog milestone Jun 8, 2020
@azure-sdk azure-sdk added the customer-reported Issues that are reported by GitHub users external to the Azure organization. label Sep 24, 2020
@jsquire jsquire added question The issue doesn't require a change to the product in order to be resolved. Most issues start as that bug This issue requires a change to an existing behavior in the product in order to be resolved. and removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jan 25, 2021
@AlexGhiondea
Copy link
Contributor

We have support for this now. @Mohit-Chakraborty let's try and get a small sample written for this.

@Mohit-Chakraborty
Copy link
Contributor

Mohit-Chakraborty commented Dec 6, 2021

We have an example of extracting properties by name and casting them to custom types here.

@Mohit-Chakraborty
Copy link
Contributor

Closing this issue. Please let us know if you need further assistance or want to continue the discussion.

@fabiomaulo
Copy link
Author

@Mohit-Chakraborty thanks I'll try it hopefully soon. We are in the process of converting all our prjs from net472+net22+net31 -> NET6 (obviously the problem are public-webs prjs).

@fabiomaulo
Copy link
Author

Since the SearchDocument has methods such as GetBoolean, GetBooleanCollection, ad so on I have, at least, a way to ask for correct conversion.
You can close the issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue requires a change to an existing behavior in the product in order to be resolved. Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Search
Projects
None yet
Development

No branches or pull requests

8 participants
@fabiomaulo @jsquire @tg-msft @brjohnstmsft @Mohit-Chakraborty @AlexGhiondea @azure-sdk and others