Unique References #684

Merged: 32 commits, Jun 8, 2022
Commits
9e9a945 WIP: deep references using SQLAlchemy (Apr 1, 2022)
a8431da WIP: deep references using SQL Lite (Apr 2, 2022)
384ef26 Start of support for nicknames (Apr 2, 2022)
cbdbdff Tests of corner cases (Apr 3, 2022)
29c9875 Lazy loading (Apr 6, 2022)
091f343 Lookup in nicknames (Apr 6, 2022)
be8f726 Recreate just_once objects after continuation (Apr 7, 2022)
2c91ae9 Dump defaultdict (Apr 7, 2022)
ad35da8 One table per sobject and lazy-loaded references to sobjects (Apr 8, 2022)
c4982d1 Indexes (Apr 8, 2022)
9461f40 Cleanups in memory feature (Apr 8, 2022)
2c74746 Docs and tests (Apr 9, 2022)
a0d9772 Add nickname_id for looking up by nickname (Apr 13, 2022)
f69955e Start roughing in unique feature (Apr 13, 2022)
66751ad Merge remote-tracking branch 'origin/main' into feature/deep-references (May 21, 2022)
95388a9 Add tests and remove comments (May 21, 2022)
83a82fe Minor refactorings (May 30, 2022)
48d72b7 Minor cleanups (May 30, 2022)
bd45550 Support serializing LazyLoadedObjectReference's. (May 30, 2022)
1cff911 Lazy load stuff even Nicknames (May 30, 2022)
6c6be64 Fix bug & benchmark (May 31, 2022)
997d957 Missing file (May 31, 2022)
dfd9dd5 Unique random_references (Jun 2, 2022)
ef63376 Merge remote-tracking branch 'origin/main' into feature/unique-refere… (Jun 3, 2022)
98d9b04 Fix tests and error messages. (Jun 3, 2022)
0384dd0 Merge remote-tracking branch 'origin/main' into feature/unique-refere… (Jun 8, 2022)
fe34a8a Fix pragma (Jun 8, 2022)
9d408bf Add a feature for scoping to parent (Jun 8, 2022)
4fe0097 Docs (Jun 8, 2022)
d6a8e84 Docs (Jun 8, 2022)
cd3508d Fix minor bugs (Jun 8, 2022)
64b30bb Update terminology (Jun 8, 2022)
61 changes: 58 additions & 3 deletions docs/index.md

@@ -596,12 +596,67 @@ The `random_reference` property creates a reference to a random, existing row fr

To create a reference, `random_reference` looks for a row created in the current iteration of the recipe and matching the specified object type or nickname. In the above recipe, each `random_reference` specified in `ownedBy` will point to one of the ten `Owner` objects created in the same iteration. In other words, if you iterate over the recipe multiple times, each `Pet` object will be matched with one of the ten `Owner` objects created during the same iteration.

If `random_reference` finds no matches in the current iteration, it looks in previous iterations. This can happen, for example, when you try to create a reference to an object created with the `just_once` flag. Snowfakery cannot currently generate a `random_reference` to a row that will be created in a future iteration of a recipe.
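
For example, in a minimal sketch like the following (hypothetical object and field names; `just_once` is the real flag), the single `Admin` row is created only in the first iteration, so `random_reference` calls in later iterations fall back to it:

```yaml
# sketch: Admin exists only in iteration 1, so later iterations
# resolve the reference via the prior-iteration fallback
- object: Admin
  just_once: True
  fields:
    name:
      fake: Name
- object: Pet
  count: 3
  fields:
    caretaker:
      random_reference:
        to: Admin
```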

#### Unique random references

`random_reference` has a `unique` parameter which ensures that each target row is used only once.

```yaml
- object: Owner
  count: 10
  fields:
    name:
      fake: Name
- object: Pet
  count: 10
  fields:
    ownedBy:
      random_reference:
        to: Owner
        unique: True
```

In the case above, the relationship between Owners and Pets will be one-to-one, in a random order, rather than a totally random distribution, which would tend to give some Owners multiple Pets.
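
For instance, with `count: 3` on both objects, one run might produce output shaped like this (names and pairings are random and will differ from run to run; the point is that each Owner appears exactly once):

```json
Owner(id=1, name=Ana Reyes)
Owner(id=2, name=Li Wei)
Owner(id=3, name=Sam Cole)
Pet(id=1, ownedBy=Owner(2))
Pet(id=2, ownedBy=Owner(3))
Pet(id=3, ownedBy=Owner(1))
```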

In the case above, it is clear that the scope of the uniqueness should be the Pets. But for join tables, such as Salesforce's Campaign Member, the scope is ambiguous and must be specified explicitly:

```yaml
# examples/salesforce/campaign-member.yml
- object: Campaign
  count: 5
  fields:
    Name: Campaign ${{child_index}}
- object: Contact
  count: 3
  fields:
    FirstName:
      fake: FirstName
    LastName:
      fake: LastName
  friends:
    - object: CampaignMember
      count: 5
      fields:
        ContactId:
          reference: Contact
        CampaignId:
          random_reference:
            to: Campaign
            parent: Contact
            unique: True
```

The `parent` parameter clarifies that the scope of the uniqueness is the local Contact.
Each of the Contacts will have CampaignMembers that point to unique campaigns, like
this:

```json
Campaign(id=1, Name=Campaign 0)
Campaign(id=2, Name=Campaign 1)
Campaign(id=3, Name=Campaign 2)
Campaign(id=4, Name=Campaign 3)
Campaign(id=5, Name=Campaign 4)
Contact(id=1, FirstName=Catherine, LastName=Hanna)
CampaignMember(id=1, ContactId=Contact(1), CampaignId=Campaign(2))
CampaignMember(id=2, ContactId=Contact(1), CampaignId=Campaign(5))
CampaignMember(id=3, ContactId=Contact(1), CampaignId=Campaign(3))
CampaignMember(id=4, ContactId=Contact(1), CampaignId=Campaign(4))
CampaignMember(id=5, ContactId=Contact(1), CampaignId=Campaign(1))
Contact(id=2, FirstName=Mary, LastName=Valencia)
CampaignMember(id=6, ContactId=Contact(2), CampaignId=Campaign(1))
CampaignMember(id=7, ContactId=Contact(2), CampaignId=Campaign(4))
CampaignMember(id=8, ContactId=Contact(2), CampaignId=Campaign(5))
CampaignMember(id=9, ContactId=Contact(2), CampaignId=Campaign(2))
CampaignMember(id=10, ContactId=Contact(2), CampaignId=Campaign(3))
Contact(id=3, FirstName=Jake, LastName=Mullen)
CampaignMember(id=11, ContactId=Contact(3), CampaignId=Campaign(1))
CampaignMember(id=12, ContactId=Contact(3), CampaignId=Campaign(4))
CampaignMember(id=13, ContactId=Contact(3), CampaignId=Campaign(3))
CampaignMember(id=14, ContactId=Contact(3), CampaignId=Campaign(5))
CampaignMember(id=15, ContactId=Contact(3), CampaignId=Campaign(2))
```

Performance tip: Tables and nicknames that are referred to by `random_reference` are indexed, which makes them slightly slower to generate than normal. This should seldom be a problem in practice, but if you experience performance problems you could switch to a normal reference to see if that improves things.
### `fake`

The `fake` function generates fake data. This function is described in more detail in the [Fake Data Tutorial](fakedata.md).
22 changes: 22 additions & 0 deletions examples/salesforce/campaign-member.yml

@@ -0,0 +1,22 @@
- object: Campaign
  count: 5
  fields:
    Name: Campaign ${{child_index}}
- object: Contact
  count: 3
  fields:
    FirstName:
      fake: FirstName
    LastName:
      fake: LastName
  friends:
    - object: CampaignMember
      count: 5
      fields:
        ContactId:
          reference: Contact
        CampaignId:
          random_reference:
            to: Campaign
            parent: Contact
            unique: True
3 changes: 2 additions & 1 deletion snowfakery/data_generator_runtime.py

@@ -505,7 +505,7 @@ class RuntimeContext:
    current_template = None
    local_vars = None
    unique_context_identifier = None
    recalculate_every_time = False  # by default, data is recalculated constantly

    def __init__(
        self,
@@ -521,6 +521,7 @@ def __init__(
        self.parent = parent_context
        if self.parent:
            self._plugin_context_vars = self.parent._plugin_context_vars.new_child()
            # are we in a re-calculate everything context?
            self.recalculate_every_time = parent_context.recalculate_every_time
        else:
            self._plugin_context_vars = ChainMap()
66 changes: 62 additions & 4 deletions snowfakery/row_history.py

@@ -6,8 +6,10 @@
from random import randint

from snowfakery import data_gen_exceptions as exc
from snowfakery.object_rows import LazyLoadedObjectReference, ObjectReference, ObjectRow
from snowfakery.plugins import PluginResultIterator
from snowfakery.utils.pickle import restricted_dumps, restricted_loads
from snowfakery.utils.randomized_range import UpdatableRandomRange


class RowHistory:
@@ -64,7 +66,7 @@ def save_row(self, tablename: str, nickname: T.Optional[str], row: dict):
            (row_id, nickname, nickname_id, data),
        )

    def random_row_reference(self, name: str, scope: str, randint: callable):
        """Find a random row and load it"""
        if scope not in ("prior-and-current-iterations", "current-iteration"):
            raise exc.DataGenError(
@@ -95,8 +97,6 @@ def random_row_reference(self, name: str, scope: str, unique: bool):
                self.already_warned = True
            min_id = 1
        elif nickname:
            min_id = self.local_counters.get(nickname, 0) + 1
        else:
            min_id = self.local_counters.get(tablename, 0) + 1
@@ -161,3 +161,61 @@ def _make_history_table(conn, tablename):
    c.execute(
        f'CREATE UNIQUE INDEX "{tablename}_nickname_id" ON "{tablename}" (nickname, nickname_id);'
    )


class RandomReferenceContext(PluginResultIterator):
    # lazily created uniquifying RNG; see unique_random below
    rng = None

    def __init__(
        self,
        row_history: RowHistory,
        to: str,
        scope: str = "current-iteration",
        unique: bool = False,
    ):
        self.row_history = row_history
        self.to = to
        self.scope = scope
        self.unique = unique
        if unique:
            self.random_func = self.unique_random
        else:
            self.random_func = randint

    def next(self) -> T.Union[ObjectReference, ObjectRow]:
        try:
            return self.row_history.random_row_reference(
                self.to, self.scope, self.random_func
            )
        except StopIteration as e:
            if self.random_func == self.unique_random:
                raise exc.DataGenError(
                    f"Cannot find an unused `{self.to}` to link to"
                ) from e
            else:  # pragma: no cover
                raise e

    def unique_random(self, a, b):
        """Goal: use a uniquifying RNG until all of its values have been
        used up, then make a new one with higher values.

        e.g. random_range(1, 5) then random_range(5, 10)

        The parent might call it like:

        unique_random(1, 2)  -> random_range(1, 3)  -> 2
        unique_random(1, 4)  -> random_range(1, 3)  -> 1
        unique_random(1, 6)  -> random_range(3, 7)  -> 5   # reset
        unique_random(1, 8)  -> random_range(3, 7)  -> 3
        unique_random(1, 10) -> random_range(3, 7)  -> 4
        unique_random(1, 12) -> random_range(3, 7)  -> 6
        unique_random(1, 14) -> random_range(7, 14) -> 13  # reset
        ...
        """
        b += 1  # randint uses top-inclusive semantics,
        # random_range uses top-exclusive semantics
        if self.rng is None:
            self.rng = UpdatableRandomRange(a, b)
        else:
            self.rng.set_new_range(a, b)
        return next(self.rng)
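
To make the growing-window behavior concrete, here is a standalone sketch of the same calling pattern (an illustration, not the class above; it assumes only that `snowfakery.utils.randomized_range` is importable as added in this PR):

```python
from snowfakery.utils.randomized_range import UpdatableRandomRange

rng = None

def unique_random(a: int, b: int) -> int:
    # same body as RandomReferenceContext.unique_random, at module level
    global rng
    b += 1  # randint is top-inclusive; random_range is top-exclusive
    if rng is None:
        rng = UpdatableRandomRange(a, b)
    else:
        rng.set_new_range(a, b)
    return next(rng)

# the table grows by two rows between references, as in the docstring trace
ids = [unique_random(1, top) for top in (2, 4, 6)]
# e.g. a later iteration: the bottom jumps past the old top
ids.append(unique_random(7, 14))
assert len(set(ids)) == len(ids)  # every id is handed out at most once
```
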
36 changes: 17 additions & 19 deletions snowfakery/template_funcs.py

@@ -1,24 +1,24 @@
import random
import sys
from ast import literal_eval
from datetime import date, datetime
from functools import lru_cache
from typing import Any, List, Tuple, Union

import dateutil.parser
from dateutil.relativedelta import relativedelta

from faker import Faker
from faker.providers.date_time import Provider as DateProvider

import snowfakery.data_generator_runtime  # noqa
from snowfakery.fakedata.fake_data_generator import UTCAsRelDelta, _normalize_timezone
from snowfakery.object_rows import ObjectReference
from snowfakery.plugins import PluginContext, SnowfakeryPlugin, lazy, memorable
from snowfakery.row_history import RandomReferenceContext
from snowfakery.standard_plugins.UniqueId import UniqueId
from snowfakery.utils.template_utils import StringGenerator

from .data_gen_exceptions import DataGenError

FieldDefinition = "snowfakery.data_generator_runtime_object_model.FieldDefinition"

@@ -256,13 +256,15 @@ def choice(
            probability = parse_weight_str(self.context, probability)
        return probability or when, pick

    @memorable
    def random_reference(
        self,
        to: str,
        *,
        parent: str = None,
        scope: str = "current-iteration",
        unique: bool = False,
    ) -> "RandomReferenceContext":
        """Select a random, already-created row from 'sobject'

        - object: Owner

@@ -278,12 +280,8 @@

        See the docs for more info.
        """
        return RandomReferenceContext(
            self.context.interpreter.row_history, to, scope, unique
        )

    @lazy
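
For reference, a hypothetical recipe fragment exercising the `scope` parameter accepted here (the two legal values, per the validation in `row_history.py`, are `current-iteration`, the default, and `prior-and-current-iterations`):

```yaml
- object: Pet
  fields:
    ownedBy:
      random_reference:
        to: Owner
        scope: prior-and-current-iterations
```
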
118 changes: 118 additions & 0 deletions snowfakery/utils/randomized_range.py

@@ -0,0 +1,118 @@
import typing as T
import random
import math


class UpdatableRandomRange:
    def __init__(self, start: int, stop: int = None):
        assert stop > start
        self.start = start
        self._set_new_range_immediately(start, stop)

    def set_new_top(self, new_top: int):
        # do not replace the RNG until the old one is exhausted
        assert new_top >= self.cur_stop
        self.cur_stop = new_top

    def set_new_range(self, new_bottom: int, new_top: int):
        """Update the range subject to constraints

        There are two modes:

        If you update the range by changing only the top value,
        the generator will finish generating the first list before
        expanding its scope.

        So if you configured it with range(0, 10) and then
        range(0, 20) you would get

        shuffle(list(range(0, 10))) + shuffle(list(range(10, 20)))

        Not:

        shuffle(list(range(0, 10)) + list(range(10, 20)))

        If you update the range by changing both values, the previous
        generator is just discarded, because you presumably don't
        want those values anymore. The new bottom must be higher
        than the old top. This preserves the rule that no value is
        ever produced twice.
        """
        if new_bottom == self.start:
            self.set_new_top(new_top)
        else:
            assert new_bottom >= self.orig_stop, (new_bottom, self.orig_stop)
            self._set_new_range_immediately(new_bottom, new_top)

    def _set_new_range_immediately(self, new_bottom: int, new_top: int):
        assert new_top > new_bottom
Review thread on this line:

Contributor: "I don't want to get caught up in semantics, but 'lower limit' and 'upper limit' are the mathematical terms. Should we stay true to this here?"

Contributor Author: "@jofsky Changed in 64b30bb"
        self.start = new_bottom
        self.orig_stop = self.cur_stop = new_top
        self.num_generator = random_range(self.start, self.orig_stop)

    def __iter__(self):
        return self

    def __next__(self):
        rv = next(self.num_generator, None)

        if rv is not None:
            return rv

        if self.cur_stop <= self.orig_stop:
            raise StopIteration()

        self.start = self.orig_stop
        self.num_generator = random_range(self.start, self.cur_stop)
        self.orig_stop = self.cur_stop
        return next(self.num_generator)


def random_range(start: int, stop: int) -> T.Generator[int, None, None]:
    """
    Return a randomized "range" using a Linear Congruential Generator
    to produce the number sequence. Parameters are the same as for
    python builtin "range".

    Memory -- storage for 8 integers, regardless of parameters.
    Compute -- at most 2*"maximum" steps required to generate sequence.

    Based on https://stackoverflow.com/a/53551417/113477

    Set default values the same way "range" does.
    """
    step = 1  # step is hard-coded to "1" because it seemed to be buggy
    # and not important for our use-case

    # Use a mapping to convert a standard range into the desired range.
    def mapping(i):
        return (i * step) + start

    # Compute the number of numbers in this range.
    maximum = (stop - start) // step

    # Seed range with a random integer.
    value = random.randint(0, maximum)
    #
    # Construct an offset, multiplier, and modulus for a linear
    # congruential generator. These generators are cyclic and
    # non-repeating when they maintain the properties:
    #
    # 1) "modulus" and "offset" are relatively prime.
    # 2) ["multiplier" - 1] is divisible by all prime factors of "modulus".
    # 3) ["multiplier" - 1] is divisible by 4 if "modulus" is divisible by 4.
    #
    offset = random.randint(0, maximum) * 2 + 1  # Pick a random odd-valued offset.
    multiplier = (
        4 * (maximum // 4) + 1
    )  # Pick a multiplier 1 greater than a multiple of 4.
    modulus = int(
        2 ** math.ceil(math.log2(maximum))
    )  # Pick a modulus just big enough to generate all numbers (power of 2).

    # Track how many random numbers have been returned.
    found = 0
    while found < maximum:
        # If this is a valid value, yield it in generator fashion.
        if value < maximum:
            found += 1
            yield mapping(value)
        # Calculate the next value in the sequence.
        value = (value * multiplier + offset) % modulus
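
A quick sanity check of the coverage property, as a sketch (assuming the module is importable as `snowfakery.utils.randomized_range`): the generator emits each value in its range exactly once, and raising the top never repeats an old value.

```python
from snowfakery.utils.randomized_range import UpdatableRandomRange, random_range

# random_range is a permutation of the underlying range
values = list(random_range(0, 10))
assert sorted(values) == list(range(10))

# raising the top finishes the current window before widening it
rng = UpdatableRandomRange(1, 6)
first = [next(rng) for _ in range(5)]  # 1..5 in random order
rng.set_new_range(1, 11)  # same bottom, higher top
more = [next(rng) for _ in range(5)]  # 6..10 in random order
assert sorted(first + more) == list(range(1, 11))
```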