Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CONCAT of LIKE with percentage (%) literal fails for null values #3041

Closed
SimSonic opened this issue Jun 27, 2023 · 27 comments
Closed

CONCAT of LIKE with percentage (%) literal fails for null values #3041

SimSonic opened this issue Jun 27, 2023 · 27 comments
Assignees
Labels
in: repository Repositories abstraction type: bug A general bug

Comments

@SimSonic
Copy link

SimSonic commented Jun 27, 2023

Hello! I have such repository method (simplified):

@Query("""
        SELECT pe AS profile,
               device AS profileDevice
        FROM ProfileEntity AS pe
        LEFT JOIN ProfileDeviceEntity AS device
            ON device.profile = pe
        WHERE (pe.lastSeen              IS NULL OR (pe.lastSeen >= :#{#filter.dateStart} AND pe.lastSeen <= :#{#filter.dateEnd}))
          AND ( :#{#filter.id}          IS NULL OR pe.id              = :#{#filter.id} )
          AND ( :#{#filter.login}       IS NULL OR pe.login        LIKE :#{#filter.login}% )
          AND ( :#{#filter.preset}      IS NULL OR pe.preset       LIKE :#{#filter.preset}% )
          AND ( :#{#filter.lastName}    IS NULL OR pe.lastName        = :#{#filter.lastName} )
          AND ( :#{#filter.firstName}   IS NULL OR pe.firstName       = :#{#filter.firstName} )
          AND ( :#{#filter.application} IS NULL OR device.application = :#{#filter.application} )
          AND ( :#{#filter.platform}    IS NULL OR device.platform    = :#{#filter.platform} )
          AND ( device.lastSeen         IS NULL OR device.lastDevice IS TRUE )
        ORDER BY pe.id DESC
        """)
Page<AdminProfileProjection> profilesForJournal(Pageable pageable, ProfileJournalFilter filter);

Columns are defined simply:

image

image

After I upgraded Spring Boot from 2.7.12 to 2.7.13 I've received failing test which calls this method. The error:

Caused by: org.postgresql.util.PSQLException: ERROR: operator does not exist: character varying ~~ bytea 
  Hint: No operator matches the given name and argument types. You might need to add explicit type casts. 
  Position: 1805 
 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse([QueryExecutorImpl.java:2713](https://queryexecutorimpl.java:2713/))
...

I've discovered that this is caused by this feature: #2939

How can I fix query? Or this is implementation bug?

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jun 27, 2023
@mp911de
Copy link
Member

mp911de commented Jun 27, 2023

Can you please attach the SQL that is generated from the query?

@SimSonic
Copy link
Author

Can you please attach the SQL that is generated from the query?

A little bit cropped SQL:

select profileent0_."id"                   as id1_64_0_,
       profiledev1_."id"                   as id1_68_1_,
       profileent0_."blocked"              as blocked5_64_0_,
       profileent0_."coordinates_timezone" as coordina6_64_0_,
       profileent0_."coordinates_updated"  as coordina7_64_0_,
       profileent0_."login"                as dess_log9_64_0_,
       profileent0_."first_name"           as first_n11_64_0_,
       profileent0_."gps_timezone"         as gps_tim14_64_0_,
       profileent0_."last_name"            as last_na15_64_0_,
       profileent0_."last_seen"            as last_se16_64_0_,
       profileent0_."latitude"             as latitud17_64_0_,
       profileent0_."longitude"            as longitu18_64_0_,
       profileent0_."preset"               as preset20_64_0_,
       profiledev1_."application"          as applicat2_68_1_,
       profiledev1_."version"              as version3_68_1_,
       profiledev1_."auth_domain_type"     as auth_dom4_68_1_,
       profiledev1_."created"              as created5_68_1_,
       profiledev1_."installment_id"       as installm6_68_1_,
       profiledev1_."last_device"          as last_dev7_68_1_,
       profiledev1_."last_seen"            as last_see8_68_1_,
       profiledev1_."mobile_platform"      as mobile_p9_68_1_,
       profiledev1_."platform_version"     as platfor10_68_1_,
       profiledev1_."profile_id"           as profile11_68_1_
from "e2e_public"."profiles" profileent0_
         left outer join "e2e_public"."profiles_devices" profiledev1_ on (profiledev1_."profile_id" = profileent0_."id")
where (profileent0_."last_seen">=? and profileent0_."last_seen"<=? or profileent0_."last_seen" is null)
  and (? is null or profileent0_."id"=?)
  and (? is null or profileent0_."login" like (?||'%'))
  and (? is null or profileent0_."preset" like (?||'%'))
  and (? is null or profileent0_."last_name"=?)
  and (? is null or profileent0_."first_name"=?)
  and (? is null or profiledev1_."application"=?)
  and (? is null or profiledev1_."mobile_platform"=?)
  and (profiledev1_."last_seen" is null or profiledev1_."last_device" = true)
order by profileent0_."id" DESC
limit ?

so this lines are under question:

  and (? is null or profileent0_."login" like (?||'%'))
  and (? is null or profileent0_."preset" like (?||'%'))

@SimSonic
Copy link
Author

2.7.12 generates this:

image

@mp911de
Copy link
Member

mp911de commented Jun 27, 2023

Thanks a lot. #{…}% are two parameters; therefore we went for concatenation. https://www.postgresql.org/docs/15/sql-syntax-calling-funcs.html calls out that parametrized concatenation works. Let us investigate a bit further to check what causes that the value is sent as bytea.

@gregturn
Copy link
Contributor

Just to be clear, we used to take the wildcard and alter the input parameter by applying it.

Thus, LIKE :#{#filter.preset}% would be expanded by SpEL into LIKE :preset% (I'm not sure what your SpEL expression ACTUALLY expands to, but I'm just assuming preset), which would then be turned into LIKE :preset with the actual argument going from foo to foo%.

#2939 fixes the risk you face if you had :preset more than once in the query, with the last instance getting/not getting that wildcard.

Now we do LIKE :#{#filter.preset}% -> LIKE :preset% -> LIKE :preset || '%', with the actual input untouched. As @mp911de said, Postgres documentation shows this as supported, so something else is at play that is making it seem to transform things into a bytea operation.

I don't know if you can share you complete entity type as well as the arguments fed to that query. And I see you're using a projection, so that would be useful to know as well.

@gregturn
Copy link
Contributor

Something else that could be at play is usage of Cyrillic alphabet as I see in the screenshot. I don't know if that's an issue at all or not, but if the data doesn't quite fit strings, the database could be trying to "help" by migrating to raw binary strings, and throwing everything for a loop. That would also be fitting in its warning to use a cast operation.

@gregturn gregturn added the status: waiting-for-feedback We need additional information before we can continue label Jun 27, 2023
@SimSonic
Copy link
Author

SimSonic commented Jun 28, 2023

I don't know if you can share you complete entity type as well as the arguments fed to that query. And I see you're using a projection, so that would be useful to know as well.

Nothing interesting here.

Entity:

@Entity
@Table(name = "profiles")
@Getter
@Setter
@ToString
public class ProfileEntity {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id", nullable = false, unique = true)
    private Long id;

    @Column(name = "login")
    private String login;

    @Column(name = "phone")
    private String phone;

    @Column(name = "email")
    private String email;

    @Column(name = "last_name")
    private String lastName;

    @Column(name = "first_name")
    private String firstName;

    @Column(name = "preset")
    private String preset;

    @Column(name = "last_seen")
    private Instant lastSeen;

    @Column(name = "latitude")
    private double latitude;

    @Column(name = "longitude")
    private double longitude;

    @Column(name = "coordinates_updated")
    private Instant coordinatesUpdated;

    @Column(name = "fis_timezone")
    private Integer fisTimezone;

    @Column(name = "coordinates_timezone")
    private Integer coordinatesTimezone;

    @Column(name = "gps_timezone")
    private Integer gpsTimezone;

    @Column(name = "blocked", nullable = false)
    private boolean blocked;
}

Filtering fields DTO:

@lombok.Value
@lombok.Builder
@lombok.Jacksonized
@JsonInclude(JsonInclude.Include.NON_NULL)
public class ProfileJournalFilter {

    Long id; // nullable...

    String login; // nullable!

    String preset; // nullable!

    @NotNull
    @Builder.Default
    Instant dateStart = TimeUtils.FILTER_DATE_TIME_START;

    @NotNull
    @Builder.Default
    Instant dateEnd = TimeUtils.FILTER_DATE_TIME_END;

    String lastName; // nullable!

    String firstName; // nullable!

    MobileApplication application; // it's enum, nullable

    MobilePlatform platform; // // it's enum, nullable
}

Projection:

public interface AdminProfileProjection {

    ProfileEntity getProfile();

    ProfileDeviceEntity getProfileDevice();
}

THe root of a problem is nullability of string fields in DTO. Before #2939 if preset was null it generated code like

... AND NULL IS NULL OR presetr LIKE NULL ...

but now it is:

... AND NULL IS NULL OR preset LIKE NULL || '%'

so PostgreSQL cannot concatenate NULL with '%' because interprets it as bytea.

Something else that could be at play is usage of Cyrillic alphabet as I see in the screenshot. I don't know if that's an issue at all or not, but if the data doesn't quite fit strings, the database could be trying to "help" by migrating to raw binary strings, and throwing everything for a loop.

No, it is not the case. All strings are utf-8 so no problems with ascii or cyrillic chars or even multi-byte emojies.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jun 28, 2023
@mp911de
Copy link
Member

mp911de commented Jun 28, 2023

so PostgreSQL cannot concatenate NULL with '%' because interprets it as bytea.

Thanks for the insight. That raises the general question how LIKE :#{#filter.login}% is supposed to work in the case of null as null cannot be combined with % (regardless of the underlying system). I suggest that you provide a property for SpEL usage that renders login + % on your side if the login field is not null.

@SimSonic
Copy link
Author

No any other logic on my side. I think implementation before #2939 just did that (unexpectedly) :)

@mp911de
Copy link
Member

mp911de commented Jun 28, 2023

Our code previously made an assumption regarding representing null values in the context of LIKE with percentage. We cannot rewrite the bound parameter value because if the parameter is used multiple times, the bound value no longer represents the actual value but the rewritten variant.

I wonder whether we could isolate LIKE with % literals so that we create a derived binding value and fall back to the previous approach @gregturn.

@mp911de mp911de changed the title Query stop working after implementing #2939 CONCAT of LIKE with percentage (%) literal fails for null values Jun 28, 2023
@gregturn
Copy link
Contributor

I wonder whether we could isolate LIKE with % literals so that we create a derived binding value and fall back to the previous approach @gregturn.

I can look into this. I'm not quite certain where this happens, but it's worth a shot.

gregturn added a commit that referenced this issue Jun 28, 2023
To handle potential NULL values when LIKE is combined with wildcards, insert a COALESCE function.

See #3041
@gregturn gregturn self-assigned this Jun 28, 2023
@gregturn
Copy link
Contributor

gregturn commented Jun 28, 2023

Looks like transforming a LIKE with wildcards into a CONCAT function happens at a different point in time then when the arguments are fed in. Essentially, we can't tell that in advance if there will be null.

However, JPA has the COALESCE function that can wrap the argument, with a fallback value of an empty string. This provides a succinct, spec-compliant way to handle potential nulls.

To illustrate:

@Query("select e from EmployeeWithName e where e.name like %:partialName%")
List<EmployeeWithName> customQueryWithNullableParam(@Nullable @Param("partialName") String partialName);

This query, when a partial name is provided, will wrap it in wildcards. If null, the COALESCE function will collapse to '', which being CONCAT'd with wildcards, would match everything, which sounds correct.

wdyt @mp911de ?

@SimSonic
Copy link
Author

SimSonic commented Jun 29, 2023

@gregturn I found this question on SO and it makes me doubt that simple ... LIKE COALESCE (:param, '') || '%' will be enough.

Could it be solution: ... LIKE ( COALESCE ( CAST (:param AS TEXT) ) || '%' ) ?

@SimSonic
Copy link
Author

SimSonic commented Jun 29, 2023

For my query I've found a solution:

AND ( :#{#filter.preset} IS NULL OR pe.preset LIKE CONCAT(COALESCE(CAST(:#{#filter.preset} AS string), ''), '%'))

or even

AND (pe.preset LIKE CONCAT(COALESCE(CAST(:#{#filter.preset} AS string), ''), '%'))

because right part began to support nullish values in filter (and in case preset is not nullable in db). The only thing that worries me is that will postgresql planner collapse LIKE '%' into TRUE or not.

@gregturn
Copy link
Contributor

I'll try the CAST operation.

But it's true that if you have a null value combined with a wildcard, it will simply collapse into the wildcard and thus match everything. If you are wanting a different behavior, you may have to rethink usage of a wildcard. What DOES a null combine with a wildcard mean to YOU when it comes to LIKE operations?

gregturn added a commit that referenced this issue Jun 29, 2023
To handle potential NULL values when LIKE is combined with wildcards, insert a COALESCE function.

See #3041
gregturn added a commit that referenced this issue Jun 29, 2023
To handle potential NULL values when LIKE is combined with wildcards, insert a COALESCE function.

See #3041
@gregturn
Copy link
Contributor

Okay, it looks like CAST( :parm as string) is purely Hibernate and will fail with EclipseLink, but CAST(:parm as text) seems to be more universal.

@SimSonic
Copy link
Author

What DOES a null combine with a wildcard mean to YOU when it comes to LIKE operations?

It is a hard question =)

My opinion is that it should be consistent with all implementations of the JPA and RDBMSes and work the same and documented. In fact, two options of concatenating null with '%' — casting to NULL and casting to '%' — are acceptable. But the fall of the request with an exception is undesirable...

@SimSonic
Copy link
Author

Okay, it looks like CAST( :parm as string) is purely Hibernate and will fail with EclipseLink, but CAST(:parm as text) seems to be more universal.

Hm, CAST (:param AS TEXT) failed for me ... That's why a wrote string.

@gregturn
Copy link
Contributor

gregturn commented Jul 4, 2023

Hm, CAST (:param AS TEXT) failed for me ... That's why a wrote string.

This implies we need a provider-oriented function that decides WHICH variant to apply. Basically, Hibernate -> CAST(:param as string) while perhaps everyone else does CAST(:param as text). Integration testing would be in order.

My opinion is that it should be consistent with all implementations of the JPA and RDBMSes and work the same and documented. In fact, two options of concatenating null with '%' — casting to NULL and casting to '%' — are acceptable. But the fall of the request with an exception is undesirable...

If you have null with wildcards on either end, we could then collapse that to an empty string using COALESCE, do the CAST, and then apply the wildcards.

If we had a data set of ["Frodo Baggins", "Bilbo Baggins"], and a query method like:

@Query("select e from Employee e where e.name LIKE '%:name%'")
List<Employee> findMatch(String name);

that sort of rule should yield:

findMatch("Frodo") => ["Frodo Baggins"]
findMatch("Bilbo") => ["Bilbo Baggins"]
findMatch("Baggins") => ["Frodo Baggins", "Bilbo Baggins"]
findMatch("") => ["Frodo Baggins", "Bilbo Baggins"]
findMatch(null) => ["Frodo Baggins", "Bilbo Baggins"]

Does that sound fitting?

@SimSonic
Copy link
Author

SimSonic commented Jul 4, 2023

For me it's an open question what is the most reasonable result for findMatch(null) and I'm not able to choose between query LIKE '%%' (which will result into ["Frodo Baggins", "Bilbo Baggins"]) or LIKE NULL (results in []).

First variant looks like backward-compatible, second is a little bit more intuitive.

I hope for your informed decision.

gregturn added a commit that referenced this issue Jul 7, 2023
To handle potential NULL values when LIKE is combined with wildcards, insert a COALESCE function.

See #3041
gregturn added a commit that referenced this issue Jul 12, 2023
To handle potential NULL values when LIKE is combined with wildcards, insert a COALESCE function. Also, apply a provider-specific CAST to parameters to ensure it doesn't transform into binary formats.

See #3041
@mp911de mp911de added type: bug A general bug and removed type: enhancement A general enhancement labels Jul 21, 2023
@mp911de mp911de added this to the 3.0.9 (2022.0.9) milestone Jul 21, 2023
gregturn pushed a commit that referenced this issue Jul 21, 2023
We now replace LIKE expressions according to their type with individual bindings if an existing binding cannot be used because of how the bound value is being transformed.

WHERE foo like %:name or bar like :name becomes
WHERE foo like :name (i.e. '%' + :name) or bar like :name_1 (i.e. :name)

See #3041
gregturn pushed a commit that referenced this issue Jul 21, 2023
We now distinguish between the binding parameter target and its origin. The parameter target represents how the binding is bound to the query, the origin points to where the binding parameter comes from (method invocation argument or an expression).

The revised design removes the assumption that binding parameters must match their indices/names of the method call to introduce synthetic parameters for different binding variants while using the same underlying invocation parameters.

See #3041
gregturn pushed a commit that referenced this issue Jul 21, 2023
We now verify all bindings to ensure that a like-parameter doesn't mix up with plain bindings (e.g. firstname = %:myparam or lastname = :myparam). We also create unique synthetic parameters for positional bindings.

Also, fix Regex to properly detect named, anonymous and positional bind markers.

See #3041
gregturn added a commit that referenced this issue Jul 21, 2023
@gregturn
Copy link
Contributor

Resolved. Merged to main.

@gregturn gregturn added in: repository Repositories abstraction and removed status: feedback-provided Feedback has been provided labels Jul 21, 2023
gregturn pushed a commit that referenced this issue Jul 21, 2023
We now replace LIKE expressions according to their type with individual bindings if an existing binding cannot be used because of how the bound value is being transformed.

WHERE foo like %:name or bar like :name becomes
WHERE foo like :name (i.e. '%' + :name) or bar like :name_1 (i.e. :name)

See #3041
gregturn pushed a commit that referenced this issue Jul 21, 2023
We now distinguish between the binding parameter target and its origin. The parameter target represents how the binding is bound to the query, the origin points to where the binding parameter comes from (method invocation argument or an expression).

The revised design removes the assumption that binding parameters must match their indices/names of the method call to introduce synthetic parameters for different binding variants while using the same underlying invocation parameters.

See #3041
gregturn pushed a commit that referenced this issue Jul 21, 2023
We now verify all bindings to ensure that a like-parameter doesn't mix up with plain bindings (e.g. firstname = %:myparam or lastname = :myparam). We also create unique synthetic parameters for positional bindings.

Also, fix Regex to properly detect named, anonymous and positional bind markers.

See #3041
gregturn added a commit that referenced this issue Jul 21, 2023
gregturn pushed a commit that referenced this issue Jul 21, 2023
We now replace LIKE expressions according to their type with individual bindings if an existing binding cannot be used because of how the bound value is being transformed.

WHERE foo like %:name or bar like :name becomes
WHERE foo like :name (i.e. '%' + :name) or bar like :name_1 (i.e. :name)

See #3041
gregturn pushed a commit that referenced this issue Jul 21, 2023
We now distinguish between the binding parameter target and its origin. The parameter target represents how the binding is bound to the query, the origin points to where the binding parameter comes from (method invocation argument or an expression).

The revised design removes the assumption that binding parameters must match their indices/names of the method call to introduce synthetic parameters for different binding variants while using the same underlying invocation parameters.

See #3041
gregturn pushed a commit that referenced this issue Jul 21, 2023
We now verify all bindings to ensure that a like-parameter doesn't mix up with plain bindings (e.g. firstname = %:myparam or lastname = :myparam). We also create unique synthetic parameters for positional bindings.

Also, fix Regex to properly detect named, anonymous and positional bind markers.

See #3041
gregturn added a commit that referenced this issue Jul 21, 2023
@gregturn
Copy link
Contributor

Backported to 3.1.x and 3.0.x.

@SimSonic
Copy link
Author

@gregturn hi, will it be backported to 2.7.15?

@gregturn
Copy link
Contributor

gregturn commented Aug 7, 2023

@SimSonic I'm afraid not. There are simply too many differences between Spring Data JPA 2.7.x and 3.x to backport it that far.

@mp911de
Copy link
Member

mp911de commented Aug 8, 2023

@gregturn how about us rolling back the conflicting change in Spring Data JPA 2.7.x? There are workarounds to omit named parameter reuse and judging from the impact of CONCAT, the previous variant seems less harmful to existing applications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: repository Repositories abstraction type: bug A general bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants