Merge pull request #235 from UC-Davis-molecular-computing/dev
Dev
dave-doty authored Jul 23, 2023
2 parents 892dee4 + ed35f23 commit 3a66f82
Showing 5 changed files with 90 additions and 123 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -138,9 +138,9 @@ In more detail, there are five main types of objects you create to describe your
- `Constraint`: There are several kinds of constraint objects. Not all of them are related in the type hierarchy.

- **"hard" constraints on Domain sequences:**
-These are the strictest constraints, which do not even allow certain `Domain` sequences to be considered. They are applied by a `DomainPool` before allowing a sequence to be returned from `DomainPool.generate_sequence()`. These are of two types: `NumpyConstraint` and `SequenceConstraint`. Each of them indicates whether a DNA sequence is allowed or not; for instance a constraint forbidding 4 G's in a row would permit AGGGTT but forbid AGGGGT. The difference between them is that a `NumpyConstraint` operates on many DNA sequences at a time, representing them as a 2D numpy byte array (e.g., a 1000 × 15 array of bytes to represent 1000 sequences, each of length 15), and for operations that numpy is suited for, can evaluate these constraints *much* faster than the equivalent Python code that would loop over each sequence individually. However, if you have a constraint that is not straightforward to express using numpy operations, then a `SequenceConstraint` can be used to express it in plain Python. A `SequenceConstraint` is simply a type alias for a Python function that takes a string as input representing the DNA sequence and returns a Boolean indicating whether the sequence satisfies the constraint. Due to the speed of numpy, it is advised to use `SequenceConstraint`'s only if necessary because it cannot be expressed as a `NumpyConstraint`.
+These are the strictest constraints, which do not even allow certain `Domain` sequences to be considered, known as "filters". They are applied by a `DomainPool` before allowing a sequence to be returned from `DomainPool.generate_sequence()`, which is the method called whenever the search algorithm wants to try a new DNA sequence for a `Domain`. These are of two types of filters: `NumpyFilter` and `SequenceFilter`. Each of them indicates whether a DNA sequence is allowed or not; for instance a filter forbidding 4 G's in a row would permit AGGGTT but forbid AGGGGT. The difference between them is that a `NumpyFilter` operates on many DNA sequences at a time, representing them as a 2D numpy byte array (e.g., a 1000 × 15 array of bytes to represent 1000 sequences, each of length 15), and for operations that numpy is suited for, can evaluate these filters *much* faster than the equivalent Python code that would loop over each sequence individually. However, if you have a filter that is not straightforward to express using numpy operations, then a `SequenceFilter` can be used to express it in plain Python. A `SequenceFilter` is simply a type alias for a Python function that takes a string as input representing the DNA sequence and returns a Boolean indicating whether the sequence satisfies the filter. Due to the speed of numpy, it is advised to use `SequenceFilter`'s only if necessary because it cannot be expressed as a `NumpyFilter`.

-- **"soft" constraints:** All other constraints are subclasses of the abstract superclass `Constraint`. These constrains are "softer": sequences violating the constraints are allowed to be assigned to `Domain`'s. The sequence design algorithm steadily improves the design by changing sequences until all of these constraints are satisfied. The different subtypes of the base class `Constraint` correspond to different parts of the `Design` that are being evaluated by the `Constraint`. The types are:
+- **"soft" constraints:** All other constraints are subclasses of the abstract superclass `Constraint`. These constrains are "softer" than filters as described above: sequences violating the constraints are allowed to be assigned to `Domain`'s. The sequence design algorithm steadily improves the design by changing sequences until all of these constraints are satisfied. The different subtypes of the base class `Constraint` correspond to different parts of the `Design` that are being evaluated by the `Constraint`. The types are:

- `SingularConstraint`: This is an abstract superclass of the following concrete subclasses. The difference with the other abstract superclass `BulkConstraint` is explained in `BulkConstraint` below.

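To make the filter distinction in this hunk concrete, here is a minimal standalone sketch of the "no 4 G's in a row" example written both ways: as a plain-Python function of the kind the README calls a `SequenceFilter` (a `str -> bool` function), and as a vectorized numpy check over a 2D byte array of shape (num_seqs, length). It does not use nuad's actual `NumpyFilter` class or its API; the helper names (`no_four_gs`, `no_four_gs_numpy`, `encode`, `decode`) and the A/C/G/T to 0/1/2/3 encoding are hypothetical choices for illustration, and nuad's internal encoding may differ.

```python
from typing import Callable, List

import numpy as np

# The README describes a SequenceFilter as a function: DNA string -> bool.
SequenceFilter = Callable[[str], bool]

# Hypothetical base encoding for this sketch; nuad's internal encoding may differ.
BASE_TO_INT = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
INT_TO_BASE = 'ACGT'


def no_four_gs(seq: str) -> bool:
    """Plain-Python filter: allow the sequence only if it has no run of 4 G's."""
    return 'GGGG' not in seq


def encode(seqs: List[str]) -> np.ndarray:
    """Pack equal-length DNA strings into a (num_seqs, length) uint8 array."""
    return np.array([[BASE_TO_INT[b] for b in s] for s in seqs], dtype=np.uint8)


def decode(row: np.ndarray) -> str:
    """Convert one encoded row back into a DNA string."""
    return ''.join(INT_TO_BASE[int(b)] for b in row)


def no_four_gs_numpy(arr: np.ndarray, run: int = 4) -> np.ndarray:
    """Vectorized filter: boolean mask, True for rows with no run of `run` G's."""
    is_g = arr == BASE_TO_INT['G']  # (num_seqs, length) booleans
    has_run = np.zeros(arr.shape[0], dtype=bool)
    # Loop over window start positions only; each step checks every row at once.
    for start in range(arr.shape[1] - run + 1):
        has_run |= is_g[:, start:start + run].all(axis=1)
    return ~has_run


f: SequenceFilter = no_four_gs
print(f('AGGGTT'), f('AGGGGT'))  # True False
print(no_four_gs_numpy(encode(['AGGGTT', 'AGGGGT'])).tolist())  # [True, False]

# 1000 random sequences of length 15, matching the README's 1000 x 15 example.
rng = np.random.default_rng(0)
batch = rng.integers(0, 4, size=(1000, 15), dtype=np.uint8)
allowed = batch[no_four_gs_numpy(batch)]  # keep only rows that pass the filter
```

The plain-Python version runs one interpreted check per sequence, while the numpy version loops only over the `length - run + 1` window offsets and handles all 1000 rows in each step, which is the kind of speedup the README text attributes to numpy-based filters.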
@@ -175,7 +175,7 @@ In more detail, there are five main types of objects you create to describe your

## Constraint evaluations must be pure functions of their inputs

-For all constraints, it is critical that the `evaluate` or `evaluate_bulk` functions be *pure* functions of their inputs: the return value should depend only on the parameters passed to the function. For example, a `StrandPairConstraint` takes two strands as input, and its `(excess, summary)` return values should depend *only* on those two strands. Similarly, a `StrandsConstraint`, whose `evaluate_bulk` function takes a list of strands as input, should return a list of tuples, where each tuple represents a violation of a strand that depends only on that strand. This is required because nuad does an optimization in which constraints are only evaluated if they depend on parts of the design that contain the domain(s) that changed in the current iteration.
+For all constraints, it is critical that the `evaluate` or `evaluate_bulk` functions be *pure* functions of their inputs: the return value should depend only on the parameters passed to the function. For example, a `StrandPairConstraint` takes two strands as input, and its `Result` return values should depend *only* on those two strands. Similarly, a `StrandsConstraint`, whose `evaluate_bulk` function takes a list of strands as input, should return a list of tuples, where each tuple represents a violation of a strand that depends only on that strand. This is required because nuad does an optimization in which constraints are only evaluated if they depend on parts of the design that contain the domain(s) that changed in the current iteration.

For example, suppose there are 100 strands, but only 3 strands contain the domain `x`, and `x` is the domain whose DNA sequence is changed in the current search iteration. Then each `StrandConstraint` `s` will be evaluated only on those 3 strands, on the assumption that the other 97 strands would have the same output of the function `s.evaluate` as before.

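The purity requirement in this hunk can be illustrated with a minimal sketch of the incremental re-evaluation it describes: cache a per-strand result and, when one domain changes, re-evaluate only the strands that contain it. This is not nuad's implementation; the class `IncrementalEvaluator`, the constraint `gc_excess`, and the dict-based toy design (strand name to domain names, domain name to sequence) are all hypothetical. It mirrors the README's scenario of 100 strands where only 3 contain the changed domain `x`.

```python
from typing import Callable, Dict, List

# Hypothetical toy model: strand name -> list of domain names; domain -> sequence.
Strands = Dict[str, List[str]]
Sequences = Dict[str, str]


def gc_excess(seq: str) -> float:
    """Hypothetical pure constraint: amount by which GC content exceeds 0.6."""
    gc = sum(base in 'GC' for base in seq) / len(seq)
    return max(0.0, gc - 0.6)


class IncrementalEvaluator:
    """Caches per-strand results and re-evaluates only strands containing a changed domain."""

    def __init__(self, strands: Strands, evaluate: Callable[[str], float]) -> None:
        self.strands = strands
        self.evaluate = evaluate
        self.cache: Dict[str, float] = {}
        self.num_evaluations = 0

    def _evaluate_strand(self, name: str, sequences: Sequences) -> None:
        seq = ''.join(sequences[d] for d in self.strands[name])
        self.cache[name] = self.evaluate(seq)
        self.num_evaluations += 1

    def evaluate_all(self, sequences: Sequences) -> Dict[str, float]:
        for name in self.strands:
            self._evaluate_strand(name, sequences)
        return dict(self.cache)

    def update_domain(self, changed: str, sequences: Sequences) -> Dict[str, float]:
        # Skipping the other strands is safe only because `evaluate` is pure:
        # their inputs did not change, so their cached results are still correct.
        for name, domains in self.strands.items():
            if changed in domains:
                self._evaluate_strand(name, sequences)
        return dict(self.cache)


# 100 strands, each with its own domain; only s0, s1, s2 also contain domain x.
strands: Strands = {f's{i}': [f'd{i}'] for i in range(100)}
for i in range(3):
    strands[f's{i}'].append('x')
sequences: Sequences = {f'd{i}': 'ACGT' for i in range(100)}
sequences['x'] = 'AAAA'

evaluator = IncrementalEvaluator(strands, gc_excess)
evaluator.evaluate_all(sequences)
evaluator.num_evaluations = 0

sequences['x'] = 'GGGG'  # the search changes domain x
incremental = evaluator.update_domain('x', sequences)
full = IncrementalEvaluator(strands, gc_excess).evaluate_all(sequences)
print(evaluator.num_evaluations)  # 3
print(incremental == full)  # True
```

Only 3 of the 100 strands are re-evaluated, yet the cached results match a full recomputation. If `gc_excess` instead read shared state outside its argument (for example, the sequence of a strand it was not given), the 97 skipped strands could keep stale cached values and that equality would no longer be guaranteed.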
24 changes: 17 additions & 7 deletions notebooks/Untitled.ipynb
@@ -2,23 +2,33 @@
"cells": [
{
"cell_type": "code",
-"execution_count": 1,
+"execution_count": 20,
"id": "cf3567b8-b41b-4ce0-aa83-f8cbd4dd45b3",
"metadata": {},
"outputs": [
{
"data": {
+"image/png": "\n",
"text/plain": [
-"(0.796078431372549, 0.3764705882352941, 0.08235294117647059)"
+"<Figure size 1296x576 with 1 Axes>"
]
},
-"execution_count": 1,
-"metadata": {},
-"output_type": "execute_result"
+"metadata": {
+"needs_background": "light"
+},
+"output_type": "display_data"
}
],
"source": [
-"(203/255, 96/255, 21/255)"
+"import nuad.np as nn\n",
+"import matplotlib.pyplot as plt\n",
+"\n",
+"s = nn.DNASeqList(length=21, num_random_seqs=10**5)\n",
+"energies = s.energies(37)\n",
+"# print(f'{min(energies)=}')\n",
+"# print(f'{max(energies)=}')\n",
+"plt.figure(figsize=(18,8))\n",
+"_ = plt.hist(energies, bins=20)"
]
},
{
@@ -82,7 +92,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.13"
+"version": "3.8.16"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion nuad/__version__.py
@@ -1 +1 @@
-version = '0.4.2' # version line; WARNING: do not remove or change this line or comment
+version = '0.4.3' # version line; WARNING: do not remove or change this line or comment