Match nub* functions with Array #179

Merged on Jan 25, 2021 (24 commits)
Changes from 20 commits
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -5,8 +5,10 @@ Notable changes to this project are documented in this file. The format is based
## [Unreleased]

Breaking changes:
- Convert `nub`/`nubBy` to use ordering, rather than equality (#179)

New features:
- Add `nubEq`/`nubByEq` (#179)
Contributor Author

Could have alternatively written this as:

Breaking changes:

New features:

Contributor

I like the way you've formulated it in the changelog currently.


Bugfixes:

72 changes: 59 additions & 13 deletions src/Data/List.purs
@@ -75,6 +75,8 @@ module Data.List

, nub
, nubBy
, nubEq
, nubByEq
, union
, unionBy
, delete
@@ -101,22 +103,20 @@ import Control.Alt ((<|>))
import Control.Alternative (class Alternative)
import Control.Lazy (class Lazy, defer)
import Control.Monad.Rec.Class (class MonadRec, Step(..), tailRecM, tailRecM2)

import Data.Bifunctor (bimap)
import Data.Foldable (class Foldable, foldr, any, foldl)
import Data.Foldable (foldl, foldr, foldMap, fold, intercalate, elem, notElem, find, findMap, any, all) as Exports
import Data.FunctorWithIndex (mapWithIndex) as FWI
import Data.List.Internal (emptySet, insertAndLookupBy)
import Data.List.Types (List(..), (:))
import Data.List.Types (NonEmptyList(..)) as NEL
import Data.Maybe (Maybe(..))
import Data.Newtype (class Newtype)
import Data.NonEmpty ((:|))
import Data.Traversable (scanl, scanr) as Exports
import Data.Traversable (sequence)
import Data.Tuple (Tuple(..))
import Data.Unfoldable (class Unfoldable, unfoldr)

import Data.Foldable (foldl, foldr, foldMap, fold, intercalate, elem, notElem, find, findMap, any, all) as Exports
import Data.Traversable (scanl, scanr) as Exports

import Prim.TypeError (class Warn, Text)

-- | Convert a list into any unfoldable structure.
@@ -663,18 +663,64 @@ tails list@(Cons _ tl)= list : tails tl
--------------------------------------------------------------------------------

-- | Remove duplicate elements from a list.
-- | Keeps the first occurrence of each element in the input list,
-- | in the same order they appear in the input list.
-- |
-- | ```purescript
-- | nub (1:2:1:3:3:Nil) == 1:2:3:Nil
-- | ```
-- |
-- | Running time: `O(n log n)`
nub :: forall a. Ord a => List a -> List a
nub = nubBy compare

-- | Remove duplicate elements from a list based on the provided comparison function.
-- | Keeps the first occurrence of each element in the input list,
-- | in the same order they appear in the input list.
-- |
-- | ```purescript
-- | nubBy (compare `on` Array.length) ([1]:[2]:[3,4]:Nil) == [1]:[3,4]:Nil
-- | ```
-- |
-- | Running time: `O(n log n)`
nubBy :: forall a. (a -> a -> Ordering) -> List a -> List a
nubBy p = reverse <<< go emptySet Nil
where
go _ acc Nil = acc
go s acc (a : as) =
let { found, result: s' } = insertAndLookupBy p a s
in if found
then go s' acc as
else go s' (a : acc) as

-- | Remove duplicate elements from a list.
-- | Keeps the first occurrence of each element in the input list,
-- | in the same order they appear in the input list.
-- | This less efficient version of `nub` only requires an `Eq` instance.
-- |
-- | ```purescript
-- | nubEq (1:2:1:3:3:Nil) == 1:2:3:Nil
-- | ```
-- |
-- | Running time: `O(n^2)`
nub :: forall a. Eq a => List a -> List a
nub = nubBy eq
nubEq :: forall a. Eq a => List a -> List a
nubEq = nubByEq eq

-- | Remove duplicate elements from a list, using the specified
-- | function to determine equality of elements.
-- | Remove duplicate elements from a list, using the provided equivalence function.
-- | Keeps the first occurrence of each element in the input list,
-- | in the same order they appear in the input list.
-- | This less efficient version of `nubBy` only requires an equivalence
-- | function, rather than an ordering function.
-- |
-- | ```purescript
-- | mod3eq = eq `on` \n -> mod n 3
-- | nubByEq mod3eq (1:3:4:5:6:Nil) == 1:3:5:Nil
-- | ```
-- |
-- | Running time: `O(n^2)`
nubBy :: forall a. (a -> a -> Boolean) -> List a -> List a
nubBy _ Nil = Nil
nubBy eq' (x : xs) = x : nubBy eq' (filter (\y -> not (eq' x y)) xs)
nubByEq :: forall a. (a -> a -> Boolean) -> List a -> List a
nubByEq _ Nil = Nil
nubByEq eq' (x : xs) = x : nubByEq eq' (filter (\y -> not (eq' x y)) xs)
Contributor Author

This was never stack-safe. Reported in #194
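
One way the stack-safety concern could be addressed is the accumulate-and-reverse strategy that the strict `nubBy` in this PR already uses. The following is only a sketch under that assumption: the module name `NubByEqSketch` and the function name `nubByEqStackSafe` are hypothetical and are not part of this PR or of the library.

```purescript
module NubByEqSketch where

import Prelude

import Data.Foldable (any)
import Data.List (List(..), reverse, (:))

-- Hypothetical name, for illustration only; not part of this PR.
-- Same signature as `nubByEq`, but the recursion is a self tail call.
nubByEqStackSafe :: forall a. (a -> a -> Boolean) -> List a -> List a
nubByEqStackSafe eq' = go Nil
  where
  -- `kept` holds the elements retained so far, newest first.
  go kept Nil = reverse kept
  go kept (x : xs)
    -- Pass the earlier (kept) element as the first argument,
    -- matching the argument order used by `nubByEq` above.
    | any (\k -> eq' k x) kept = go kept xs
    | otherwise = go (x : kept) xs
```

The running time stays `O(n^2)`. Unlike the `filter`-based version, each element is compared only against the elements already kept, and `go` calls itself only in tail position, so the compiler should be able to turn it into a loop.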


-- | Calculate the union of two lists.
-- |
@@ -687,7 +733,7 @@ union = unionBy (==)
-- |
-- | Running time: `O(n^2)`
unionBy :: forall a. (a -> a -> Boolean) -> List a -> List a -> List a
unionBy eq xs ys = xs <> foldl (flip (deleteBy eq)) (nubBy eq ys) xs
unionBy eq xs ys = xs <> foldl (flip (deleteBy eq)) (nubByEq eq ys) xs

-- | Delete the first occurrence of an element from a list.
-- |
63 changes: 63 additions & 0 deletions src/Data/List/Internal.purs
@@ -0,0 +1,63 @@
module Data.List.Internal (Set, emptySet, insertAndLookupBy) where

import Prelude

import Data.List.Types (List(..))

data Set k
= Leaf
| Two (Set k) k (Set k)
| Three (Set k) k (Set k) k (Set k)

emptySet :: forall k. Set k
emptySet = Leaf

data TreeContext k
= TwoLeft k (Set k)
| TwoRight (Set k) k
| ThreeLeft k (Set k) k (Set k)
| ThreeMiddle (Set k) k k (Set k)
| ThreeRight (Set k) k (Set k) k

fromZipper :: forall k. List (TreeContext k) -> Set k -> Set k
fromZipper Nil tree = tree
fromZipper (Cons x ctx) tree =
case x of
TwoLeft k1 right -> fromZipper ctx (Two tree k1 right)
TwoRight left k1 -> fromZipper ctx (Two left k1 tree)
ThreeLeft k1 mid k2 right -> fromZipper ctx (Three tree k1 mid k2 right)
ThreeMiddle left k1 k2 right -> fromZipper ctx (Three left k1 tree k2 right)
ThreeRight left k1 mid k2 -> fromZipper ctx (Three left k1 mid k2 tree)

data KickUp k = KickUp (Set k) k (Set k)

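-- | Insert a key into a set, reporting whether a key that compares `EQ`
-- | under `comp` was already present. If so, the set is returned unchanged.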
insertAndLookupBy :: forall k. (k -> k -> Ordering) -> k -> Set k -> { found :: Boolean, result :: Set k }
insertAndLookupBy comp k orig = down Nil orig
where
down :: List (TreeContext k) -> Set k -> { found :: Boolean, result :: Set k }
down ctx Leaf = { found: false, result: up ctx (KickUp Leaf k Leaf) }
down ctx (Two left k1 right) =
case comp k k1 of
EQ -> { found: true, result: orig }
LT -> down (Cons (TwoLeft k1 right) ctx) left
_ -> down (Cons (TwoRight left k1) ctx) right
down ctx (Three left k1 mid k2 right) =
case comp k k1 of
EQ -> { found: true, result: orig }
c1 ->
case c1, comp k k2 of
_ , EQ -> { found: true, result: orig }
LT, _ -> down (Cons (ThreeLeft k1 mid k2 right) ctx) left
GT, LT -> down (Cons (ThreeMiddle left k1 k2 right) ctx) mid
_ , _ -> down (Cons (ThreeRight left k1 mid k2) ctx) right

up :: List (TreeContext k) -> KickUp k -> Set k
up Nil (KickUp left k' right) = Two left k' right
up (Cons x ctx) kup =
case x, kup of
TwoLeft k1 right, KickUp left k' mid -> fromZipper ctx (Three left k' mid k1 right)
TwoRight left k1, KickUp mid k' right -> fromZipper ctx (Three left k1 mid k' right)
ThreeLeft k1 c k2 d, KickUp a k' b -> up ctx (KickUp (Two a k' b) k1 (Two c k2 d))
ThreeMiddle a k1 k2 d, KickUp b k' c -> up ctx (KickUp (Two a k1 b) k' (Two c k2 d))
ThreeRight a k1 b k2, KickUp c k' d -> up ctx (KickUp (Two a k1 b) k2 (Two c k' d))
39 changes: 33 additions & 6 deletions src/Data/List/Lazy.purs
@@ -72,6 +72,8 @@ module Data.List.Lazy

, nub
, nubBy
, nubEq
, nubByEq
, union
, unionBy
, delete
@@ -103,6 +105,7 @@ import Control.Monad.Rec.Class as Rec
import Data.Foldable (class Foldable, foldr, any, foldl)
import Data.Foldable (foldl, foldr, foldMap, fold, intercalate, elem, notElem, find, findMap, any, all) as Exports
import Data.Lazy (defer)
import Data.List.Internal (emptySet, insertAndLookupBy)
import Data.List.Lazy.Types (List(..), Step(..), step, nil, cons, (:))
import Data.List.Lazy.Types (NonEmptyList(..)) as NEL
import Data.Maybe (Maybe(..), isNothing)
@@ -590,21 +593,45 @@ partition f = foldr go {yes: nil, no: nil}
-- Set-like operations ---------------------------------------------------------
--------------------------------------------------------------------------------

-- | Remove duplicate elements from a list.
-- | Keeps the first occurrence of each element in the input list,
-- | in the same order they appear in the input list.
-- |
-- | Running time: `O(n log n)`
nub :: forall a. Ord a => List a -> List a
nub = nubBy compare

-- | Remove duplicate elements from a list based on the provided comparison function.
-- | Keeps the first occurrence of each element in the input list,
-- | in the same order they appear in the input list.
-- |
-- | Running time: `O(n log n)`
nubBy :: forall a. (a -> a -> Ordering) -> List a -> List a
nubBy p = go emptySet
where
go s (List l) = List (map (goStep s) l)
goStep _ Nil = Nil
goStep s (Cons a as) =
let { found, result: s' } = insertAndLookupBy p a s
in if found
then step (go s' as)
else Cons a (go s' as)
Contributor Author

This doesn't look stack safe, but I don't know of a good way to fix it. The stack-safety strategy for strict nubBy involved reversing the final result, but that can't be done here with infinite lazy lists. Reported in #194

Contributor

I think it is mostly stack-safe, because it's the caller's responsibility to force the thunks and most of the recursive calls are deferred. However, there could potentially be enough duplicate elements in a row to blow the stack (since we do recursively force thunks in that case).
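
The failure mode described here (a long run of consecutive duplicates) might be avoided by skipping duplicates in a tail-recursive loop rather than by forcing nested thunks. The sketch below assumes the `Data.List.Internal` module added in this PR is importable; the module name `LazyNubBySketch` and the function name `nubBySkipLoop` are hypothetical and not part of this PR.

```purescript
module LazyNubBySketch where

import Prelude

import Data.Lazy (defer)
import Data.List.Internal (emptySet, insertAndLookupBy)
import Data.List.Lazy.Types (List(..), Step(..), step)

-- Hypothetical name, for illustration only; not part of this PR.
nubBySkipLoop :: forall a. (a -> a -> Ordering) -> List a -> List a
nubBySkipLoop p = go emptySet
  where
  -- Deferred, so no input is inspected until the caller forces the result.
  go s l = List (defer \_ -> goStep s (step l))

  goStep _ Nil = Nil
  goStep s (Cons a as) =
    let { found, result: s' } = insertAndLookupBy p a s
    in if found
         -- Duplicate: continue with the next cell via a self tail call
         -- instead of forcing a thunk nested inside another thunk.
         then goStep s' (step as)
         -- New element: emit it and leave the rest of the list deferred.
         else Cons a (go s' as)
```

The result is still produced lazily, one cell per force. The only behavioural difference intended here is that a run of duplicates is consumed by the `goStep` loop rather than by recursive forcing.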


-- | Remove duplicate elements from a list.
-- |
-- | Running time: `O(n^2)`
nub :: forall a. Eq a => List a -> List a
nub = nubBy eq
nubEq :: forall a. Eq a => List a -> List a
nubEq = nubByEq eq

-- | Remove duplicate elements from a list, using the specified
-- | function to determine equality of elements.
-- |
-- | Running time: `O(n^2)`
nubBy :: forall a. (a -> a -> Boolean) -> List a -> List a
nubBy eq = List <<< map go <<< unwrap
nubByEq :: forall a. (a -> a -> Boolean) -> List a -> List a
nubByEq eq = List <<< map go <<< unwrap
where
go Nil = Nil
go (Cons x xs) = Cons x (nubBy eq (filter (\y -> not (eq x y)) xs))
go (Cons x xs) = Cons x (nubByEq eq (filter (\y -> not (eq x y)) xs))

-- | Calculate the union of two lists.
-- |
@@ -617,7 +644,7 @@ union = unionBy (==)
-- |
-- | Running time: `O(n^2)`
unionBy :: forall a. (a -> a -> Boolean) -> List a -> List a -> List a
unionBy eq xs ys = xs <> foldl (flip (deleteBy eq)) (nubBy eq ys) xs
unionBy eq xs ys = xs <> foldl (flip (deleteBy eq)) (nubByEq eq ys) xs

-- | Delete the first occurrence of an element from a list.
-- |
12 changes: 10 additions & 2 deletions src/Data/List/NonEmpty.purs
@@ -48,6 +48,8 @@ module Data.List.NonEmpty
, partition
, nub
, nubBy
, nubEq
, nubByEq
, union
, unionBy
, intersect
@@ -278,12 +280,18 @@ groupAllBy = wrappedOperation "groupAllBy" <<< L.groupAllBy
partition :: forall a. (a -> Boolean) -> NonEmptyList a -> { yes :: L.List a, no :: L.List a }
partition = lift <<< L.partition

nub :: forall a. Eq a => NonEmptyList a -> NonEmptyList a
nub :: forall a. Ord a => NonEmptyList a -> NonEmptyList a
nub = wrappedOperation "nub" L.nub

nubBy :: forall a. (a -> a -> Boolean) -> NonEmptyList a -> NonEmptyList a
nubBy :: forall a. (a -> a -> Ordering) -> NonEmptyList a -> NonEmptyList a
nubBy = wrappedOperation "nubBy" <<< L.nubBy

nubEq :: forall a. Eq a => NonEmptyList a -> NonEmptyList a
nubEq = wrappedOperation "nubEq" L.nubEq

nubByEq :: forall a. (a -> a -> Boolean) -> NonEmptyList a -> NonEmptyList a
nubByEq = wrappedOperation "nubByEq" <<< L.nubByEq

union :: forall a. Eq a => NonEmptyList a -> NonEmptyList a -> NonEmptyList a
union = wrappedOperation2 "union" L.union

19 changes: 14 additions & 5 deletions test/Test/Data/List.purs
@@ -2,9 +2,11 @@ module Test.Data.List (testList) where

import Prelude

import Data.Array as Array
import Data.Foldable (foldMap, foldl)
import Data.FoldableWithIndex (foldMapWithIndex, foldlWithIndex, foldrWithIndex)
import Data.List (List(..), (..), stripPrefix, Pattern(..), length, range, foldM, unzip, zip, zipWithA, zipWith, intersectBy, intersect, (\\), deleteBy, delete, unionBy, union, nubBy, nub, group, groupAll, groupBy, groupAllBy, partition, span, dropWhile, drop, dropEnd, takeWhile, take, takeEnd, sortBy, sort, catMaybes, mapMaybe, filterM, filter, concat, concatMap, reverse, alterAt, modifyAt, updateAt, deleteAt, insertAt, findLastIndex, findIndex, elemLastIndex, elemIndex, (!!), uncons, unsnoc, init, tail, last, head, insertBy, insert, snoc, null, singleton, fromFoldable, transpose, mapWithIndex, (:))
import Data.Function (on)
import Data.List (List(..), Pattern(..), alterAt, catMaybes, concat, concatMap, delete, deleteAt, deleteBy, drop, dropEnd, dropWhile, elemIndex, elemLastIndex, filter, filterM, findIndex, findLastIndex, foldM, fromFoldable, group, groupAll, groupAllBy, groupBy, head, init, insert, insertAt, insertBy, intersect, intersectBy, last, length, mapMaybe, mapWithIndex, modifyAt, nub, nubBy, nubByEq, nubEq, null, partition, range, reverse, singleton, snoc, sort, sortBy, span, stripPrefix, tail, take, takeEnd, takeWhile, transpose, uncons, union, unionBy, unsnoc, unzip, updateAt, zip, zipWith, zipWithA, (!!), (..), (:), (\\))
import Data.List.NonEmpty as NEL
import Data.Maybe (Maybe(..), isNothing, fromJust)
import Data.Monoid.Additive (Additive(..))
@@ -37,7 +39,7 @@ testList = do
assert $ (range 0 5) == l [0, 1, 2, 3, 4, 5]
assert $ (range 2 (-3)) == l [2, 1, 0, -1, -2, -3]

log "replicate should produce an list containg an item a specified number of times"
log "replicate should produce an list containing an item a specified number of times"
assert $ replicate 3 true == l [true, true, true]
assert $ replicate 1 "foo" == l ["foo"]
assert $ replicate 0 "foo" == l []
@@ -281,12 +283,19 @@ testList = do
assert $ partitioned.yes == l [5, 3, 4]
assert $ partitioned.no == l [1, 2]

log "nub should remove duplicate elements from the list, keeping the first occurence"
log "nub should remove duplicate elements from the list, keeping the first occurrence"
assert $ nub (l [1, 2, 2, 3, 4, 1]) == l [1, 2, 3, 4]

log "nubBy should remove duplicate items from the list using a supplied predicate"
let nubPred = \x y -> if odd x then false else x == y
assert $ nubBy nubPred (l [1, 2, 2, 3, 3, 4, 4, 1]) == l [1, 2, 3, 3, 4, 1]
let nubPred = compare `on` Array.length
assert $ nubBy nubPred (l [[1],[2],[3,4]]) == l [[1],[3,4]]

log "nubEq should remove duplicate elements from the list, keeping the first occurrence"
assert $ nubEq (l [1, 2, 2, 3, 4, 1]) == l [1, 2, 3, 4]

log "nubByEq should remove duplicate items from the list using a supplied predicate"
let mod3eq = eq `on` \n -> mod n 3
assert $ nubByEq mod3eq (l [1, 3, 4, 5, 6]) == l [1, 3, 5]

log "union should produce the union of two lists"
assert $ union (l [1, 2, 3]) (l [2, 3, 4]) == l [1, 2, 3, 4]
15 changes: 12 additions & 3 deletions test/Test/Data/List/Lazy.purs
@@ -3,10 +3,12 @@ module Test.Data.List.Lazy (testListLazy) where
import Prelude

import Control.Lazy (defer)
import Data.Array as Array
import Data.FoldableWithIndex (foldMapWithIndex, foldlWithIndex, foldrWithIndex)
import Data.Function (on)
import Data.FunctorWithIndex (mapWithIndex)
import Data.Lazy as Z
import Data.List.Lazy (List, Pattern(..), alterAt, catMaybes, concat, concatMap, cons, delete, deleteAt, deleteBy, drop, dropWhile, elemIndex, elemLastIndex, filter, filterM, findIndex, findLastIndex, foldM, foldMap, foldl, foldr, foldrLazy, fromFoldable, group, groupBy, head, init, insert, insertAt, insertBy, intersect, intersectBy, iterate, last, length, mapMaybe, modifyAt, nil, nub, nubBy, null, partition, range, repeat, replicate, replicateM, reverse, scanlLazy, singleton, slice, snoc, span, stripPrefix, tail, take, takeWhile, transpose, uncons, union, unionBy, unzip, updateAt, zip, zipWith, zipWithA, (!!), (..), (:), (\\))
import Data.List.Lazy (List, Pattern(..), alterAt, catMaybes, concat, concatMap, cons, delete, deleteAt, deleteBy, drop, dropWhile, elemIndex, elemLastIndex, filter, filterM, findIndex, findLastIndex, foldM, foldMap, foldl, foldr, foldrLazy, fromFoldable, group, groupBy, head, init, insert, insertAt, insertBy, intersect, intersectBy, iterate, last, length, mapMaybe, modifyAt, nil, nub, nubBy, nubEq, nubByEq, null, partition, range, repeat, replicate, replicateM, reverse, scanlLazy, singleton, slice, snoc, span, stripPrefix, tail, take, takeWhile, transpose, uncons, union, unionBy, unzip, updateAt, zip, zipWith, zipWithA, (!!), (..), (:), (\\))
import Data.List.Lazy.NonEmpty as NEL
import Data.Maybe (Maybe(..), isNothing, fromJust)
import Data.Monoid.Additive (Additive(..))
@@ -332,8 +334,15 @@ testListLazy = do
assert $ nub (l [1, 2, 2, 3, 4, 1]) == l [1, 2, 3, 4]

log "nubBy should remove duplicate items from the list using a supplied predicate"
let nubPred = \x y -> if odd x then false else x == y
assert $ nubBy nubPred (l [1, 2, 2, 3, 3, 4, 4, 1]) == l [1, 2, 3, 3, 4, 1]
let nubPred = compare `on` Array.length
Contributor

Could you add a test which ensures that nub behaves sensibly on infinite lists please? Perhaps

log "nub should not consume more of the input list than necessary"
assert $ (take 3 $ nub $ cycle $ l [1,2,3]) == l [1,2,3]

Contributor Author

Should this test be applied to all nub* functions, or just nub? I just added the single test.

Contributor

I think it’s worth doing it for nub and nubEq but we probably don’t need it for the *By variants, as if it works for nub it’ll probably work for nubBy too.
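
As a sketch of what the suggestion in this thread might look like once applied to both `nub` and `nubEq`, the module below repeats the proposed assertion for each function. The module name `Test.NubLazySketch` and the function name `testNubLaziness` are hypothetical, and the sketch assumes the `assert` package used by the existing tests is available.

```purescript
module Test.NubLazySketch where

import Prelude

import Data.List.Lazy (List, cycle, fromFoldable, nub, nubEq, take)
import Effect (Effect)
import Effect.Console (log)
import Test.Assert (assert)

-- Hypothetical test entry point, for illustration only.
testNubLaziness :: Effect Unit
testNubLaziness = do
  let
    -- Same helper shape as `l` in the existing test modules.
    l :: Array Int -> List Int
    l = fromFoldable

  log "nub should not consume more of the input list than necessary"
  assert $ (take 3 $ nub $ cycle $ l [1, 2, 3]) == l [1, 2, 3]

  log "nubEq should not consume more of the input list than necessary"
  assert $ (take 3 $ nubEq $ cycle $ l [1, 2, 3]) == l [1, 2, 3]
```

Taking exactly three elements matters here: asking for a fourth distinct element from `cycle (l [1, 2, 3])` would never terminate, because every remaining element is a duplicate.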

assert $ nubBy nubPred (l [[1],[2],[3,4]]) == l [[1],[3,4]]

log "nubEq should remove duplicate elements from the list, keeping the first occurrence"
assert $ nubEq (l [1, 2, 2, 3, 4, 1]) == l [1, 2, 3, 4]

log "nubByEq should remove duplicate items from the list using a supplied predicate"
let mod3eq = eq `on` \n -> mod n 3
assert $ nubByEq mod3eq (l [1, 3, 4, 5, 6]) == l [1, 3, 5]

log "union should produce the union of two lists"
assert $ union (l [1, 2, 3]) (l [2, 3, 4]) == l [1, 2, 3, 4]