Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Select after complex GroupJoin leads to unpredictable results #4858

Closed
yyjdelete opened this issue Mar 19, 2016 · 7 comments
Closed
Assignees
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-bug
Milestone

Comments

@yyjdelete
Copy link

Database: MSSQL
EntityFramework: EntityFramework.Core:7.0.0-rc1-final and Microsoft.EntityFrameworkCore:1.0.0-rc2-20261(aspnetcidev)

ctx.DeviceMsgUps
     .GroupJoin(ctx.UserDeviceMaps.Where(udp => udp.UserId == 1025 && udp.Status == 0 && udp.DeviceId != null), dmu => dmu.DeviceId, udp => udp.DeviceId, (key, vals) => new { key, vals })
     .Where(pair => pair.vals.Any())
     .Select(pair=>pair.key)
     .Select(key=>new { key.Format })
     .Count()

Get 3 result(the same with ToList).


ctx.DeviceMsgUps
     .GroupJoin(ctx.UserDeviceMaps.Where(udp => udp.UserId == 1025 && udp.Status == 0 && udp.DeviceId != null), dmu => dmu.DeviceId, udp => udp.DeviceId, (key, vals) => new { key, vals })
     .Where(pair => pair.vals.Any())
     .Select(pair=>pair.key)
     //.Select(key=>new { key.Format })
     .Count()

Get 57 result(the same with ToList).


ctx.DeviceMsgUps
     .GroupJoin(ctx.UserDeviceMaps.Where(udp => udp.UserId == 1025 && udp.Status == 0 && udp.DeviceId != null), dmu => dmu.DeviceId, udp => udp.DeviceId, (key, vals) => new { key, vals })
     .Where(pair => pair.vals.Any())
     //.Select(pair=>pair.key)
     //.Select(key=>new { key.Format })
     .Count()

Get 1 result(the same with ToList).


Model

    [System.CodeDom.Compiler.GeneratedCodeAttribute("EF.Reverse.POCO.Generator", "2.18.1.0")]
    public class DeviceMsgUp
    {
        public int MessageId { get; set; } // Message_ID (Primary key)
        public int DeviceId { get; set; } // Device_ID
        public byte Format { get; set; } // Format
        public byte Action { get; set; } // Action
        public string Data { get; set; } // Data (length: 1000)
        public System.DateTime? ExpireTime { get; set; } // Expire_Time
        public System.DateTime Update { get; set; } // Update
        public byte ResultStatus { get; set; } // Result_Status
        public string ResultData { get; set; } // Result_Data (length: 1000)
    }

    [System.CodeDom.Compiler.GeneratedCodeAttribute("EF.Reverse.POCO.Generator", "2.18.1.0")]
    public partial class UserDeviceMap
    {
        public int UdMapId { get; set; } // UDMap_ID (Primary key)
        public int UserId { get; set; } // User_ID
        public int? DeviceId { get; set; } // Device_ID
        public int? StudentId { get; set; } // Student_ID
        public int? ClassId { get; set; } // Class_ID
        public string NickName { get; set; } // Nick_Name (length: 20)
        public string Image { get; set; } // Image (length: 50)
        public bool? Gender { get; set; } // Gender
        public System.DateTime? BirthDate { get; set; } // Birth_Date
        public System.DateTime? BindDate { get; set; } // Bind_Date
        public byte Status { get; set; } // Status
        public int Creator { get; set; } // Creator
        public System.DateTime Upload { get; set; } // Upload
        public int Login { get; set; } // Login
        public System.DateTime Update { get; set; } // Update
        public string CellPhoneNum { get; set; } // CellPhoneNum (length: 11)

        // Reverse navigation
        public virtual System.Collections.Generic.ICollection<UserWalkTarget> UserWalkTargets { get; set; } // Many to many mapping

        // Foreign keys
        public virtual Class Class { get; set; } // FK_User_Device_Map_Class_ID_Class_Class_ID
        public virtual Device Device { get; set; } // FK_User_Device_Map_Device_ID_Device_Device_ID
        public virtual Role Role { get; set; } // FK_User_Device_Map_User_ID_Role_Role_ID
        public virtual Student Student { get; set; } // FK_User_Device_Map_Student_ID_Student_Student_ID

        public UserDeviceMap()
        {
            UserWalkTargets = new System.Collections.Generic.List<UserWalkTarget>();
        }
    }
@yyjdelete
Copy link
Author

I paste an test code for both at #4860 (comment)

@mikary
Copy link
Contributor

mikary commented Apr 12, 2016

This appears to be a bug (or multiple bugs) with the shape of the result from GroupJoin. I restructured the sample to something closer to what we use in QueryTestBase to compare the EF results with linq-to-objects. Note that the reference implementation has two results for each query.

    public class ComplexGroupJoin
    {
        [Fact]
        public void Group_join_without_projection()
        {
            AssertQuery(
                (dmus, udms) => dmus
                    .GroupJoin(
                        udms.Where(udp => udp.UserId == 1025 && udp.DeviceId != null),
                        dmu => dmu.DeviceId,
                        udp => udp.DeviceId,
                        (key, vals) => new { key, vals })
                    .Where(pair => pair.vals.Any()),
                asserter:
                    (l2oResults, efResults) =>
                        {
                            var l2oObjects
                                = l2oResults
                                    .OfType<object[]>()
                                    .Select(o => o.Select(o2 =>
                                        new
                                        {
                                            Key = GetProperty<DeviceMsgUp>(o2, "key"),
                                            Vals = string.Join(", ", GetProperty<IEnumerable<UserDeviceMap>>(o2, "vals")
                                                .Select(dum => dum.UdMapId.ToString()))
                                        }));

                            var efObjects
                                = efResults
                                    .OfType<object[]>()
                                    .Select(o => o.Select(o2 =>
                                        new
                                        {
                                            Key = GetProperty<DeviceMsgUp>(o2, "key"),
                                            Vals = string.Join(", ", GetProperty<IEnumerable<UserDeviceMap>>(o2, "vals")
                                                .Select(dum => dum.UdMapId.ToString()))
                                        }));

                            Assert.Equal(l2oObjects, efObjects);
                        });

            // Expected: WhereSelectEnumerableIterator<Object[], IEnumerable<<>f__AnonymousType10<DeviceMsgUp, String>>> [[{ Key = DeviceMsgUp 1, Vals = 1 }, { Key = DeviceMsgUp 2, Vals = 1 }]]
            // Actual:   WhereSelectEnumerableIterator<Object[], IEnumerable<<>f__AnonymousType10<DeviceMsgUp, String>>> [[{ Key = DeviceMsgUp 1, Vals = 1, 1 }]]
        }

        [Fact]
        public void Group_join_with_key_projection()
        {
            AssertQuery(
                (dmus, udms) => dmus
                    .GroupJoin(
                        udms.Where(udp => udp.UserId == 1025 && udp.DeviceId != null),
                        dmu => dmu.DeviceId,
                        udp => udp.DeviceId,
                        (key, vals) => new { key, vals })
                    .Where(pair => pair.vals.Any())
                    .Select(pair => pair.key),
                asserter:
                    (l2oResults, efResults) =>
                        {
                            var l2oObjects
                                = l2oResults
                                    .OfType<object[]>()
                                    .Select(o => o.Select(o2 => (DeviceMsgUp)o2));

                            var efObjects
                                = efResults
                                    .OfType<object[]>()
                                    .Select(o => o.Select(o2 => (DeviceMsgUp)o2));

                            Assert.Equal(l2oObjects, efObjects);
                        });
            // Expected: WhereSelectEnumerableIterator<Object[], IEnumerable<DeviceMsgUp>> [[DeviceMsgUp 1, DeviceMsgUp 2]]
            // Actual:   WhereSelectEnumerableIterator<Object[], IEnumerable<DeviceMsgUp>> [[DeviceMsgUp 1, DeviceMsgUp 2, DeviceMsgUp 3]]
        }

        [Fact]
        public void Group_join_with_key_action_projection()
        {
            AssertQuery(
                (dmus, udms) => dmus
                    .GroupJoin(
                        udms.Where(udp => udp.UserId == 1025 && udp.DeviceId != null),
                        dmu => dmu.DeviceId,
                        udp => udp.DeviceId,
                        (key, vals) => new { key, vals })
                    .Where(pair => pair.vals.Any())
                    .Select(pair => pair.key)
                    .Select(key => new { key.Action }),
                asserter:
                    (l2oResults, efResults) =>
                        {
                            var l2oObjects
                                = l2oResults
                                    .OfType<object[]>()
                                    .Select(o => o.Select(o2 => GetProperty<byte>(o2, "Action")));

                            var efObjects
                                = efResults
                                    .OfType<object[]>()
                                    .Select(o => o.Select(o2 => GetProperty<byte>(o2, "Action")));

                            Assert.Equal(l2oObjects, efObjects);
                        });
            // Passes (Probably by coincidence.)
        }

        #region test_setup

        public ComplexGroupJoin()
        {
            using (var context = new MyDbContext())
            {
                context.Database.EnsureDeleted();
                context.Database.EnsureCreated();

                context.UserDeviceMaps.AddRange(_userDeviceMaps);
                context.DeviceMsgUps.AddRange(_deviceMsgUps);

                context.SaveChanges();
            }
        }

        private readonly List<DeviceMsgUp> _deviceMsgUps
            = new List<DeviceMsgUp>
            {
                new DeviceMsgUp
                {
                    MessageId = 1,
                    DeviceId = 1,
                    Action = 27,
                },
                new DeviceMsgUp
                {
                    MessageId = 2,
                    DeviceId = 1,
                    Action = 27,
                },
                new DeviceMsgUp
                {
                    MessageId = 3,
                    DeviceId = 2,
                    Action = 27,
                }
            };

        private readonly List<UserDeviceMap> _userDeviceMaps
            = new List<UserDeviceMap>
            {
                new UserDeviceMap
                {
                    UdMapId = 1,
                    UserId = 1025,
                    DeviceId = 1
                }
            };

        private static T GetProperty<T>(object o, string propertyName)
            => (T)o.GetType().GetProperty(propertyName).GetValue(o);

        private void AssertQuery(
            Func<IQueryable<DeviceMsgUp>, IQueryable<UserDeviceMap>, IQueryable<object>> query,
            bool assertOrder = false,
            Action<IList<object>, IList<object>> asserter = null)
        {
            using (var context = new MyDbContext())
            {
                TestHelpers.AssertResults(
                    new[] { query(_deviceMsgUps.AsQueryable(), _userDeviceMaps.AsQueryable()).ToArray() },
                    new[] { query(context.Set<DeviceMsgUp>(), context.Set<UserDeviceMap>()).ToArray() },
                    assertOrder,
                    asserter);
            }
        }

        #endregion

        #region model

        public class MyDbContext : DbContext
        {
            public DbSet<DeviceMsgUp> DeviceMsgUps { get; set; }
            public DbSet<UserDeviceMap> UserDeviceMaps { get; set; }

            protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
            {
                base.OnConfiguring(optionsBuilder);
                optionsBuilder.UseSqlServer("Data Source=(localdb)\\MSSQLLocalDB;Initial Catalog=ConfusingErrorTest;Integrated Security=True");
            }

            protected override void OnModelCreating(ModelBuilder modelBuilder)
            {
                modelBuilder.Entity<UserDeviceMap>()
                    .HasKey(udm => udm.UdMapId);
                modelBuilder.Entity<UserDeviceMap>()
                    .Property(udm => udm.UdMapId).ValueGeneratedNever();

                modelBuilder.Entity<DeviceMsgUp>()
                    .HasKey(dmu => dmu.MessageId);
                modelBuilder.Entity<DeviceMsgUp>()
                    .Property(dmu => dmu.MessageId).ValueGeneratedNever();
            }
        }

        public class DeviceMsgUp
        {
            public int MessageId { get; set; } //(Primary key)
            public int DeviceId { get; set; }
            public byte Action { get; set; }

            protected bool Equals(DeviceMsgUp other) => string.Equals(MessageId, other.MessageId);

            public override bool Equals(object obj)
            {
                if (ReferenceEquals(null, obj))
                {
                    return false;
                }
                if (ReferenceEquals(this, obj))
                {
                    return true;
                }

                return obj.GetType() == GetType()
                       && Equals((DeviceMsgUp)obj);
            }

            public override int GetHashCode() => MessageId.GetHashCode();

            public override string ToString() => "DeviceMsgUp " + MessageId;
        }

        public class UserDeviceMap
        {
            public int UdMapId { get; set; } //(Primary key)
            public int UserId { get; set; }
            public int? DeviceId { get; set; }

            protected bool Equals(UserDeviceMap other) => string.Equals(UdMapId, other.UdMapId);

            public override bool Equals(object obj)
            {
                if (ReferenceEquals(null, obj))
                {
                    return false;
                }
                if (ReferenceEquals(this, obj))
                {
                    return true;
                }

                return obj.GetType() == GetType()
                       && Equals((UserDeviceMap)obj);
            }

            public override int GetHashCode() => UdMapId.GetHashCode();

            public override string ToString() => "Customer " + UdMapId;
        }

        #endregion
    }

@mikary
Copy link
Contributor

mikary commented Apr 12, 2016

Clearing milestone for triage

@mikary mikary removed this from the 1.0.0 milestone Apr 12, 2016
@rowanmiller rowanmiller added pri1 and removed pri0 labels Apr 15, 2016
@rowanmiller rowanmiller added this to the 1.0.0 milestone Apr 15, 2016
@rowanmiller rowanmiller assigned maumar and unassigned mikary Apr 15, 2016
@rowanmiller rowanmiller modified the milestones: 1.0.0, 1.0.1 May 13, 2016
@rowanmiller rowanmiller removed the pri1 label Jul 1, 2016
@maumar
Copy link
Contributor

maumar commented Jul 20, 2016

Currently 1st and 3rd queries pass and only 2nd query fails. We produce the following query plan:

(QueryContext queryContext) => IEnumerable<DeviceMsgUp> _Select(
    source: IEnumerable<TransparentIdentifier<DeviceMsgUp, IEnumerable<ValueBuffer>>> _Where(
        source: IEnumerable<TransparentIdentifier<DeviceMsgUp, IEnumerable<ValueBuffer>>> _GroupJoin(
            queryContext: (RelationalQueryContext) queryContext, 
            source: IEnumerable<ValueBuffer> _Query(
                queryContext: queryContext, 
                shaperCommandContext: SelectExpression: 
                    SELECT [dmu].[MessageId], [dmu].[Action], [dmu].[DeviceId], [t].[DeviceId]
                    FROM [DeviceMsgUps] AS [dmu]
                    LEFT JOIN (
                        SELECT [udp0].*
                        FROM [UserDeviceMaps] AS [udp0]
                        WHERE ([udp0].[UserId] = 1025) AND [udp0].[DeviceId] IS NOT NULL
                    ) AS [t] ON [dmu].[DeviceId] = [t].[DeviceId]
                    ORDER BY [dmu].[DeviceId]
                , 
                queryIndex: default(System.Nullable`1[System.Int32])
            )
            , 
            outerShaper: BufferedEntityShaper<DeviceMsgUp>, 
            innerShaper: ValueBufferShaper, 
            innerKeySelector: (ValueBuffer udp) => (Nullable<int>) object udp.get_Item(0), 
            resultSelector: (DeviceMsgUp dmu | IEnumerable<ValueBuffer> vals) => TransparentIdentifier<DeviceMsgUp, IEnumerable<ValueBuffer>> CreateTransparentIdentifier(
                outer: dmu, 
                inner: vals
            )
            , 
            outerGroupJoinInclude: default(Internal.GroupJoinInclude), 
            innerGroupJoinInclude: default(Internal.GroupJoinInclude)
        )
        , 
        predicate: (TransparentIdentifier<DeviceMsgUp, IEnumerable<ValueBuffer>> t0) => bool Any(
            source: t0.Inner
        )
    )
    , 
    selector: (TransparentIdentifier<DeviceMsgUp, IEnumerable<ValueBuffer>> t0) => t0.Outer
)

The problem is that GroupJoin's inner elements is IEnumerable, which gets created even for rows that don't have a Device (as opposed to case where we materialize Inner as Entity - then we correctly make it null, and those rows are filtered correctly)

@maumar maumar removed this from the 1.1.0 milestone Aug 24, 2016
@maumar
Copy link
Contributor

maumar commented Aug 24, 2016

Clearing milestone for triage, we might want to take this for 1.0.1 since it can cause data corruption @divega

maumar added a commit that referenced this issue Aug 24, 2016
…esult

Problem exists for complex query with groupjoin, where the inner element is a subquery, then we apply collection operator on the inner collection, but project only the outer element like so:

context.LevelOne
  .GroupJoin(
    context.LevelTwo.Where(l2 => l2.Name != "L2 01"),
    l1 => l1.Id,
    l2 => l2.Level1_Optional_Id,
    (l1, l2s) => new { l1, l2s })
  .Where(r => r.l2s.Any())
  Select(r => r.l1);

What happens is that we look at which query sources we need to materialize (based on final projection), deduce that we don't need to materialize the elements of the inner collection, so it's being represented as value buffer. Problem is that currently value buffer can't represent a null-value element, so Any() always returns true (even for empty collection), which leads to incorrect results.

Fix is to force materialization of inner collection elements as entities. We already did that, but missed the case where the inner collection was a subquery.
@divega divega changed the title Select after complex GroupJoin lead to unpredictable result? Select after complex GroupJoin lead to unpredictable result Aug 24, 2016
@divega divega added this to the 1.0.1 milestone Aug 25, 2016
@Eilon
Copy link
Member

Eilon commented Aug 26, 2016

Approved for 1.0.1.

maumar added a commit that referenced this issue Aug 27, 2016
…esult

Problem exists for complex query with groupjoin, where the inner element is a subquery, then we apply collection operator on the inner collection, but project only the outer element like so:

context.LevelOne
.GroupJoin(
context.LevelTwo.Where(l2 => l2.Name != "L2 01"),
l1 => l1.Id,
l2 => l2.Level1_Optional_Id,
(l1, l2s) => new { l1, l2s })
.Where(r => r.l2s.Any())
Select(r => r.l1);

What happens is that we look at which query sources we need to materialize (based on final projection), deduce that we don't need to materialize the elements of the inner collection, so it's being represented as value buffer. Problem is that currently value buffer can't represent a null-value element, so Any() always returns true (even for empty collection), which leads to incorrect results.

Fix is to force materialization of inner collection elements as entities. We already did that, but missed the case where the inner collection was a subquery.
@maumar
Copy link
Contributor

maumar commented Aug 27, 2016

Fixed in: e514b49 (1.0.1) 1a46a54 (dev)

@maumar maumar closed this as completed Aug 27, 2016
@maumar maumar added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Aug 27, 2016
@rowanmiller rowanmiller changed the title Select after complex GroupJoin lead to unpredictable result Query: Select after complex GroupJoin lead to unpredictable result Sep 13, 2016
@divega divega changed the title Query: Select after complex GroupJoin lead to unpredictable result Query: Select after complex GroupJoin leads to unpredictable results Sep 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-bug
Projects
None yet
Development

No branches or pull requests

6 participants