SFDO-Tooling · prescod · Jun 8, 2022 · Apr 1, 2022 · Apr 2, 2022 · Apr 2, 2022
@@ -596,12 +596,88 @@ The `random_reference` property creates a reference to a random, existing row fr
 
 To create a reference, `random_reference` looks for a row created in the current iteration of the recipe and matching the specified object type or nickname. In the above recipe, each `random_reference` specified in `ownedBy` will point to one of the ten `Owner` objects created in the same iteration. If you iterate over the recipe multiple times, in other words, each `Pet` object will be matched with one of the ten `Owner` objects created during the same iteration.
 
-If `random_reference` finds no matches in the current iteration, it looks in previous iterations. This can happen, for example, when you try to create a reference to an object created with the `just_once` flag.
+If `random_reference` finds no matches in the current iteration, it looks in previous iterations. This can happen, for example, when you try to create a reference to an object created with the `just_once` flag. Snowfakery cannot currently generate a `random_reference` to a row that will be created in a future iteration of a recipe.
 
-Snowfakery cannot currently generate a `random_reference` to a row that will be created in a future iteration of a recipe.
+#### Unique random references
 
-Performance tip: Tables and nicknames that are referred to by `random_reference` are indexed, which makes them slightly slower to generate than normal. This should seldom be a problem in practice, but if you experience performance problems you could switch to a normal reference to see if that improves things.
+`random_reference` accepts a `unique` parameter that ensures each target row is used at most once.
+
+```yaml
+- object: Owner
+  count: 10
+  fields:
+    name:
+      fake: Name
+- object: Pet
+  count: 10
+  fields:
+    ownedBy:
+      random_reference: 
+        to: Owner
+        unique: True
+```
+
+In the recipe above, Owners and Pets are matched one-to-one in random order. Without `unique`, the assignment would be fully random, which tends to leave some Owners with multiple Pets.
 
+In the Owner/Pet recipe, it is clear that the scope of the uniqueness is the Pets. For join tables, such as Salesforce's Campaign Member, the scope is ambiguous and must be specified with the `parent` parameter, like this:
+
+```yaml
+# examples/salesforce/campaign-member.yml
+- object: Campaign
+  count: 5
+  fields:
+    Name: Campaign ${{child_index}}
+- object: Contact
+  count: 3
+  fields:
+    FirstName:
+      fake: FirstName
+    LastName:
+      fake: LastName
+  friends:
+    - object: CampaignMember
+      count: 5
+      fields:
+        ContactId:
+          reference: Contact
+        CampaignId:
+          random_reference:
+            to: Campaign
+            parent: Contact
+            unique: True
+```
+
+The `parent` parameter specifies that uniqueness is scoped to the enclosing Contact.
+Each Contact gets CampaignMembers that point to distinct Campaigns, like
+this:
+
+```text
+Campaign(id=1, Name=Campaign 0)
+Campaign(id=2, Name=Campaign 1)
+Campaign(id=3, Name=Campaign 2)
+Campaign(id=4, Name=Campaign 3)
+Campaign(id=5, Name=Campaign 4)
+Contact(id=1, FirstName=Catherine, LastName=Hanna)
+CampaignMember(id=1, ContactId=Contact(1), CampaignId=Campaign(2))
+CampaignMember(id=2, ContactId=Contact(1), CampaignId=Campaign(5))
+CampaignMember(id=3, ContactId=Contact(1), CampaignId=Campaign(3))
+CampaignMember(id=4, ContactId=Contact(1), CampaignId=Campaign(4))
+CampaignMember(id=5, ContactId=Contact(1), CampaignId=Campaign(1))
+Contact(id=2, FirstName=Mary, LastName=Valencia)
+CampaignMember(id=6, ContactId=Contact(2), CampaignId=Campaign(1))
+CampaignMember(id=7, ContactId=Contact(2), CampaignId=Campaign(4))
+CampaignMember(id=8, ContactId=Contact(2), CampaignId=Campaign(5))
+CampaignMember(id=9, ContactId=Contact(2), CampaignId=Campaign(2))
+CampaignMember(id=10, ContactId=Contact(2), CampaignId=Campaign(3))
+Contact(id=3, FirstName=Jake, LastName=Mullen)
+CampaignMember(id=11, ContactId=Contact(3), CampaignId=Campaign(1))
+CampaignMember(id=12, ContactId=Contact(3), CampaignId=Campaign(4))
+CampaignMember(id=13, ContactId=Contact(3), CampaignId=Campaign(3))
+CampaignMember(id=14, ContactId=Contact(3), CampaignId=Campaign(5))
+CampaignMember(id=15, ContactId=Contact(3), CampaignId=Campaign(2))
+```
+
+Performance tip: Tables and nicknames that are referred to by `random_reference` are indexed, which makes them slightly slower to generate than normal. This should seldom be a problem in practice, but if you experience performance problems you could switch to a normal reference to see if that improves things.
+
 ### `fake`
 
 The `fake` function generates fake data. This function is defined further in the [Fake Data Tutorial](fakedata.md)

@@ -0,0 +1,22 @@
+- object: Campaign
+  count: 5
+  fields:
+    Name: Campaign ${{child_index}}
+- object: Contact
+  count: 3
+  fields:
+    FirstName:
+      fake: FirstName
+    LastName:
+      fake: LastName
+  friends:
+    - object: CampaignMember
+      count: 5
+      fields:
+        ContactId:
+          reference: Contact
+        CampaignId:
+          random_reference:
+            to: Campaign
+            parent: Contact
+            unique: True
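
Review note: the per-parent uniqueness that this recipe relies on can be modelled in plain Python. The sketch below is only an illustration of the intended semantics, not Snowfakery's implementation; `assign_unique_per_parent` and its parameters are hypothetical names, and it uses `random.sample` rather than the row-history machinery in this PR.

```python
import random
from typing import Dict, List, Optional, Sequence


def assign_unique_per_parent(
    parent_ids: Sequence[int],
    target_ids: Sequence[int],
    per_parent: int,
    seed: Optional[int] = None,
) -> Dict[int, List[int]]:
    """Give every parent `per_parent` distinct targets, in random order.

    Hypothetical helper: parents play the role of Contacts, targets the
    role of Campaigns. Uniqueness is scoped to each parent, so two
    different parents may share a target.
    """
    if per_parent > len(target_ids):
        # Mirrors the situation where no unused target row is left.
        raise ValueError("not enough targets for unique references")
    rng = random.Random(seed)
    # random.sample draws without replacement, so each list has no repeats.
    return {
        parent: rng.sample(list(target_ids), per_parent)
        for parent in parent_ids
    }


def demo() -> Dict[int, List[int]]:
    # 3 Contacts, 5 Campaigns, 5 CampaignMembers per Contact, as in the recipe.
    return assign_unique_per_parent([1, 2, 3], [1, 2, 3, 4, 5], per_parent=5)
```

Because each Contact has exactly as many CampaignMembers as there are Campaigns, every Contact ends up linked to all five Campaigns, each one once. Asking for a sixth member per Contact would have no unused Campaign to point to.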
@@ -505,7 +505,7 @@ class RuntimeContext:
     current_template = None
     local_vars = None
     unique_context_identifier = None
-    recalculate_every_time = False
+    recalculate_every_time = False  # by default, data is recalculated constantly
 
     def __init__(
         self,
@@ -521,6 +521,7 @@ def __init__(
         self.parent = parent_context
         if self.parent:
             self._plugin_context_vars = self.parent._plugin_context_vars.new_child()
+            # are we in a re-calculate everything context?
             self.recalculate_every_time = parent_context.recalculate_every_time
         else:
             self._plugin_context_vars = ChainMap()

@@ -6,8 +6,10 @@
 from random import randint
 
 from snowfakery import data_gen_exceptions as exc
-from snowfakery.object_rows import LazyLoadedObjectReference
+from snowfakery.object_rows import LazyLoadedObjectReference, ObjectReference, ObjectRow
+from snowfakery.plugins import PluginResultIterator
 from snowfakery.utils.pickle import restricted_dumps, restricted_loads
+from snowfakery.utils.randomized_range import UpdatableRandomRange
 
 
 class RowHistory:
@@ -64,7 +66,7 @@ def save_row(self, tablename: str, nickname: T.Optional[str], row: dict):
             (row_id, nickname, nickname_id, data),
         )
 
-    def random_row_reference(self, name: str, scope: str, unique: bool):
+    def random_row_reference(
+        self, name: str, scope: str, randint: T.Callable[[int, int], int]
+    ):
         """Find a random row and load it"""
         if scope not in ("prior-and-current-iterations", "current-iteration"):
             raise exc.DataGenError(
@@ -95,8 +97,6 @@ def random_row_reference(self, name: str, scope: str, unique: bool):
                 self.already_warned = True
             min_id = 1
         elif nickname:
-            # nickname counters are reset every loop, so 1 is the right choice
-            # OR they are just_once in which case 
             min_id = self.local_counters.get(nickname, 0) + 1
         else:
             min_id = self.local_counters.get(tablename, 0) + 1
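
Review note: replacing the `unique` flag with an injected `randint` callable lets the same lookup code serve both modes. A minimal sketch of that idea follows, assuming the callable keeps the inclusive `(low, high)` contract of `random.randint`; `pick_row_id` and `make_non_repeating` are hypothetical names and do not appear in this PR.

```python
import random
from typing import Callable, List

# Same contract as random.randint: both bounds are inclusive.
RandInt = Callable[[int, int], int]


def pick_row_id(min_id: int, max_id: int, randint: RandInt) -> int:
    """Hypothetical stand-in for the id selection step of a row lookup."""
    return randint(min_id, max_id)


def make_non_repeating(low: int, high: int) -> RandInt:
    """Build a randint-compatible callable that never repeats an id."""
    unused: List[int] = list(range(low, high + 1))
    random.shuffle(unused)

    def draw(a: int, b: int) -> int:
        # a and b are accepted only for signature compatibility; this
        # sketch assumes they match the bounds given to make_non_repeating.
        if not unused:
            raise LookupError("no unused ids left")
        return unused.pop()

    return draw
```

Passing `random.randint` gives ordinary references with possible repeats, while passing the callable returned by `make_non_repeating(1, 5)` yields each id from 1 to 5 once and then fails.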
@@ -161,3 +161,61 @@ def _make_history_table(conn, tablename):
     c.execute(
         f'CREATE UNIQUE INDEX "{tablename}_nickname_id" ON "{tablename}" (nickname, nickname_id);'
     )
+
+
+class RandomReferenceContext(PluginResultIterator):
+    # class-level default; replaced by an UpdatableRandomRange on the first unique draw
+    rng = None
+
+    def __init__(
+        self,
+        row_history: RowHistory,
+        to: str,
+        scope: str = "current-iteration",
+        unique: bool = False,
+    ):
+        self.row_history = row_history
+        self.to = to
+        self.scope = scope
+        self.unique = unique
+        if unique:
+            self.random_func = self.unique_random
+        else:
+            self.random_func = randint
+
+    def next(self) -> T.Union[ObjectReference, ObjectRow]:
+        try:
+            return self.row_history.random_row_reference(
+                self.to, self.scope, self.random_func
+            )
+        except StopIteration as e:
+            if self.random_func == self.unique_random:
+                raise exc.DataGenError(
+                    f"Cannot find an unused `{self.to}` to link to"
+                ) from e
+            else:  # pragma: no cover
+                raise e
+
+    def unique_random(self, a, b):
+        """Goal: use a uniquifying RNG until all of its values have been
+        used up, then make a new one with higher values.
+
+        e.g. random_range(1,5) then random_range(5, 10)
+
+        The parent might call it like:
+        unique_random(1,2) -> random_range(1,3) -> 2
+        unique_random(1,4) -> random_range(1,3) -> 1
+        unique_random(1,6) -> random_range(3,7) -> 5  # reset
+        unique_random(1,8) -> random_range(3,7) -> 3
+        unique_random(1,10) -> random_range(3,7) -> 4
+        unique_random(1,12) -> random_range(3,7) -> 6
+        unique_random(1,14) -> random_range(7,15) -> 13 # reset
+        ...
+        """
+        b += 1  # randint uses top-inclusive semantics,
+        # random_range uses top-exclusive semantics
+        if self.rng is None:
+            self.rng = UpdatableRandomRange(a, b)
+        else:
+            self.rng.set_new_range(a, b)
+        return next(self.rng)
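
Review note: the behaviour described in the `unique_random` docstring can be sketched without Snowfakery. `UpdatableRandomRange` is not shown in this diff, so `GrowingUniqueRange` below is a hypothetical stand-in written for illustration; it assumes a top-exclusive range and that `set_new_range` only takes effect once the current batch is used up.

```python
import random
from typing import List


class GrowingUniqueRange:
    """Hypothetical stand-in for UpdatableRandomRange (not Snowfakery code).

    Yields each integer of a top-exclusive range exactly once, in random
    order. When the batch is exhausted and a larger upper bound has been
    supplied, it continues with the values above the old bound.
    """

    def __init__(self, start: int, stop: int) -> None:
        self.stop = stop
        self._pending = self._shuffled(start, stop)

    @staticmethod
    def _shuffled(start: int, stop: int) -> List[int]:
        values = list(range(start, stop))
        random.shuffle(values)
        return values

    def set_new_range(self, start: int, stop: int) -> None:
        # Keep the current batch until it is used up; then move on to the
        # ids that appeared since the batch was built. `start` is ignored
        # because lower ids have already been handed out.
        if not self._pending and stop > self.stop:
            self._pending = self._shuffled(self.stop, stop)
            self.stop = stop

    def __iter__(self) -> "GrowingUniqueRange":
        return self

    def __next__(self) -> int:
        if not self._pending:
            raise StopIteration
        return self._pending.pop()


def simulate_docstring_calls() -> List[int]:
    """Replay the call pattern from the docstring: the top grows by 2."""
    rng = GrowingUniqueRange(1, 2 + 1)  # +1 converts inclusive top to exclusive
    seen = [next(rng)]
    for top in (4, 6, 8, 10, 12, 14):
        rng.set_new_range(1, top + 1)
        seen.append(next(rng))
    return seen
```

In this replay the first two draws come from 1 and 2, the next four from 3 to 6, and the seventh from 7 to 14, so no id is returned twice. When the upper bound stops growing and the batch is empty, `next` raises `StopIteration`, which is the case `RandomReferenceContext.next` converts into a `DataGenError`.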
@@ -1,24 +1,24 @@
-import sys
 import random
-from functools import lru_cache
+import sys
+from ast import literal_eval
 from datetime import date, datetime
+from functools import lru_cache
+from typing import Any, List, Tuple, Union
+
 import dateutil.parser
 from dateutil.relativedelta import relativedelta
-from ast import literal_eval
-
-from typing import Union, List, Tuple, Any
-
 from faker import Faker
 from faker.providers.date_time import Provider as DateProvider
 
-from .data_gen_exceptions import DataGenError
-
 import snowfakery.data_generator_runtime  # noqa
-from snowfakery.plugins import SnowfakeryPlugin, PluginContext, lazy
-from snowfakery.object_rows import ObjectReference, ObjectRow
-from snowfakery.utils.template_utils import StringGenerator
-from snowfakery.standard_plugins.UniqueId import UniqueId
 from snowfakery.fakedata.fake_data_generator import UTCAsRelDelta, _normalize_timezone
+from snowfakery.object_rows import ObjectReference
+from snowfakery.plugins import PluginContext, SnowfakeryPlugin, lazy, memorable
+from snowfakery.row_history import RandomReferenceContext
+from snowfakery.standard_plugins.UniqueId import UniqueId
+from snowfakery.utils.template_utils import StringGenerator
+
+from .data_gen_exceptions import DataGenError
 
 FieldDefinition = "snowfakery.data_generator_runtime_object_model.FieldDefinition"
 
@@ -256,13 +256,15 @@ def choice(
                 probability = parse_weight_str(self.context, probability)
             return probability or when, pick
 
+        @memorable
         def random_reference(
             self,
             to: str,
             *,
+            parent: str = None,
             scope: str = "current-iteration",
             unique: bool = False,
-        ) -> Union[ObjectReference, ObjectRow]:
+        ) -> "RandomReferenceContext":
             """Select a random, already-created row from 'sobject'
 
             - object: Owner
@@ -278,12 +280,8 @@ def random_reference(
 
             See the docs for more info.
             """
-            if unique:
-                # next feature to implement
-                raise NotImplementedError()
-
-            return self.context.interpreter.row_history.random_row_reference(
-                to, scope, unique
+            return RandomReferenceContext(
+                self.context.interpreter.row_history, to, scope, unique
             )
 
         @lazy