Commit

added documentation for multi region overlap
Mittmich committed Mar 19, 2024
1 parent f8096d4 commit 49cb6de
Showing 1 changed file with 231 additions and 15 deletions.
246 changes: 231 additions & 15 deletions notebooks/query_engine_usage.ipynb
@@ -117,7 +117,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
@@ -135,7 +135,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
@@ -151,16 +151,16 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<spoc.query_engine.QueryPlan at 0x2177da313d0>"
"<spoc.query_engine.QueryPlan at 0x2c240ee4310>"
]
},
"execution_count": 7,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -174,12 +174,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The `.load_result` method of the `QueryResult` object can be executed using `.load_result`, which returns a `pd.DataFrame`. The resulting dataframe has additional columns that represent the regions, with which the input contacts overlapped."
"The `QueryPlan` object can be executed using `.compute()`, which returns a `pd.DataFrame`. The resulting dataframe has additional columns that represent the regions with which the input contacts overlapped."
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 6,
"metadata": {},
"outputs": [
{
@@ -278,7 +278,7 @@
"2 400 0 "
]
},
"execution_count": 8,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -298,7 +298,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 7,
"metadata": {},
"outputs": [
{
@@ -360,7 +360,7 @@
"0 400 0 "
]
},
"execution_count": 9,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -393,8 +393,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selecting a subset of contacts at multiple genomic regions\n",
"The Overlap class is also capable of selecting contacts at multiple genomic regions. Here, the behavior of `Overlap` deviates from a simple filter, because if a given contact overlaps with multiple regions, it will be returned multiple times."
"## Selecting a subset of contacts at a set of genomic regions\n",
"The Overlap class is also capable of selecting contacts at a set of genomic regions. Here, the default behavior of `Overlap` deviates from a simple filter, because if a given contact overlaps with multiple regions in the set, it will be returned multiple times."
]
},
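The duplication described above can be modeled in plain Python. This is a minimal sketch, not spoc's implementation: the `Interval` type, the `overlap_join` helper, the half-open overlap test, the meaning of `region_id` as the index within the region set, and the example coordinates are all assumptions made for illustration. Every (contact, region) pair whose anchor fragment overlaps yields one output row, so a contact that hits two regions appears twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    # Half-open overlap test; an assumption, spoc may treat boundaries differently.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def overlap_join(contacts, regions, position=1):
    """Hypothetical helper: one row per (contact, region) pair where the
    fragment at 1-based `position` overlaps the region."""
    rows = []
    for contact in contacts:
        fragment = contact[position - 1]
        for region_id, region in enumerate(regions):
            if overlaps(fragment, region):
                rows.append({
                    "contact": contact,
                    "region_chrom": region.chrom,
                    "region_start": region.start,
                    "region_end": region.end,
                    "region_id": region_id,  # assumed: index within the region set
                })
    return rows


# Hypothetical data: one contact whose first fragment overlaps both regions.
contacts = [(Interval("chr1", 100, 200), Interval("chr1", 1000, 2000))]
regions = [Interval("chr1", 110, 140), Interval("chr1", 150, 180)]
rows = overlap_join(contacts, regions)
for row in rows:
    print(row["region_id"], row["region_start"], row["region_end"])
```

The same contact is emitted once for region 0 and once for region 1, mirroring the duplicated row in the output below.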
{
@@ -406,7 +406,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
@@ -419,7 +419,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 9,
"metadata": {},
"outputs": [
{
@@ -496,7 +496,7 @@
"1 200 1 "
]
},
"execution_count": 11,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -518,13 +518,229 @@
"In this example, the contact overlapping both regions is duplicated."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If, however, we only want to filter contacts that overlap a given set of regions without returning duplicates, we can set the `add_overlap_columns` argument of the `Overlap` constructor to `False`. This returns each matching contact only once:"
]
},
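Conceptually, `add_overlap_columns=False` turns the join into a plain filter (a semi-join). The following plain-Python sketch models that behavior; the `Interval` type, the `overlap_filter` helper, the half-open overlap test and the example coordinates are hypothetical and do not come from spoc. Each contact is kept at most once and no region columns are attached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    # Half-open overlap test (assumed boundary convention).
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def overlap_filter(contacts, regions, position=1):
    """Hypothetical helper: keep each contact once if the fragment at 1-based
    `position` overlaps at least one region; input order is preserved."""
    return [
        contact
        for contact in contacts
        if any(overlaps(contact[position - 1], region) for region in regions)
    ]


contacts = [
    (Interval("chr1", 100, 200), Interval("chr1", 1000, 2000)),  # hits both regions
    (Interval("chr1", 500, 600), Interval("chr1", 3000, 4000)),  # hits neither region
]
regions = [Interval("chr1", 110, 140), Interval("chr1", 150, 180)]
kept = overlap_filter(contacts, regions)
print(len(kept))  # 1
```

Because `any()` stops at the first matching region, a contact that overlaps several regions contributes exactly one entry instead of one per region.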
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>chrom_1</th>\n",
" <th>start_1</th>\n",
" <th>end_1</th>\n",
" <th>chrom_2</th>\n",
" <th>start_2</th>\n",
" <th>end_2</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>chr1</td>\n",
" <td>100</td>\n",
" <td>200</td>\n",
" <td>chr1</td>\n",
" <td>1000</td>\n",
" <td>2000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" chrom_1 start_1 end_1 chrom_2 start_2 end_2\n",
"0 chr1 100 200 chr1 1000 2000"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_steps = [\n",
" Overlap(target_regions, anchor_mode=Anchor(fragment_mode=\"ANY\", positions=[1]),\n",
" add_overlap_columns=False)\n",
"]\n",
"Query(query_steps=query_steps)\\\n",
" .build(contacts)\\\n",
" .compute()\\\n",
" .filter(regex=r\"chrom|start|end|id\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case, only a single contact is returned, without the corresponding region columns."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same functionality is also implemented for the pixels class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selecting a subset of contacts at multiple sets of genomic regions\n",
"The `Overlap` class is capable of selecting contacts and pixels based on their overlap with multiple sets of genomic regions. Here, the `Anchor` class specifies how the different overlap possibilities are handled: the `fragment_mode` parameter specifies whether all (value `ALL`) or any (value `ANY`) of the fragments are required to overlap, and the `region_mode` parameter specifies whether the fragments are required to overlap all (`ALL`) or any (`ANY`) of the passed sets of genomic regions. As in the case of a single set of genomic regions, the `positions` parameter specifies which fragments this logic is applied to."
]
},
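One way to read the `fragment_mode`/`region_mode` combination is as nested `any`/`all` checks: `region_mode` combines the per-set results, and `fragment_mode` decides, for each set, whether any or all of the selected fragments must hit it. The sketch below is a hypothetical plain-Python model, not spoc's code; `Interval`, `overlaps`, `matches` and its default arguments are assumptions. This nesting reproduces the result of the notebook's example that follows, but the library may resolve other combinations (for example `ALL` fragments with `ANY` regions) differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    # Half-open overlap test (assumed boundary convention).
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def matches(contact, region_sets, fragment_mode="ANY", region_mode="ANY", positions=None):
    """Hypothetical model of Anchor(fragment_mode, region_mode, positions).
    The default arguments here are arbitrary, not the library's defaults."""
    if positions is None:
        positions = range(1, len(contact) + 1)  # consider every fragment
    fragments = [contact[p - 1] for p in positions]  # positions are 1-based
    combine_fragments = all if fragment_mode == "ALL" else any
    combine_regions = all if region_mode == "ALL" else any

    def set_is_hit(regions):
        # Do any/all selected fragments overlap at least one region of this set?
        return combine_fragments(
            any(overlaps(fragment, region) for region in regions)
            for fragment in fragments
        )

    # Combine the per-set results according to region_mode.
    return combine_regions(set_is_hit(regions) for regions in region_sets)


# Coordinates taken from the notebook's example below.
contact = (Interval("chr1", 100, 200), Interval("chr1", 1000, 2000))
target_regions = [Interval("chr1", 110, 140)]
target_regions_2 = [Interval("chr1", 1000, 1030)]
sets = [target_regions, target_regions_2]
print(matches(contact, sets, fragment_mode="ANY", region_mode="ALL"))  # True
print(matches(contact, sets, fragment_mode="ALL", region_mode="ALL"))  # False
print(matches(contact, sets, fragment_mode="ANY", region_mode="ALL", positions=[1]))  # False
```

With `positions=[1]`, only the first fragment is considered; it overlaps the first set but not the second, so requiring all sets rejects the contact.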
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"target_regions = pd.DataFrame({\n",
" \"chrom\": ['chr1'],\n",
" \"start\": [110],\n",
" \"end\": [140],\n",
"})\n",
"\n",
"target_regions_2 = pd.DataFrame({\n",
" \"chrom\": ['chr1'],\n",
" \"start\": [1000],\n",
" \"end\": [1030],\n",
"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this example, we require that each of the passed sets of regions is overlapped by at least one of the fragments (`fragment_mode=\"ANY\"`, `region_mode=\"ALL\"`)."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>chrom_1</th>\n",
" <th>start_1</th>\n",
" <th>end_1</th>\n",
" <th>chrom_2</th>\n",
" <th>start_2</th>\n",
" <th>end_2</th>\n",
" <th>region_chrom</th>\n",
" <th>region_start</th>\n",
" <th>region_end</th>\n",
" <th>region_id</th>\n",
" <th>region_chrom_1</th>\n",
" <th>region_start_1</th>\n",
" <th>region_end_1</th>\n",
" <th>region_id_1</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>chr1</td>\n",
" <td>100</td>\n",
" <td>200</td>\n",
" <td>chr1</td>\n",
" <td>1000</td>\n",
" <td>2000</td>\n",
" <td>chr1</td>\n",
" <td>110</td>\n",
" <td>140</td>\n",
" <td>0</td>\n",
" <td>chr1</td>\n",
" <td>1000</td>\n",
" <td>1030</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" chrom_1 start_1 end_1 chrom_2 start_2 end_2 region_chrom region_start \\\n",
"0 chr1 100 200 chr1 1000 2000 chr1 110 \n",
"\n",
" region_end region_id region_chrom_1 region_start_1 region_end_1 \\\n",
"0 140 0 chr1 1000 1030 \n",
"\n",
" region_id_1 \n",
"0 0 "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_steps = [\n",
" Overlap([target_regions, target_regions_2],\n",
" anchor_mode=Anchor(fragment_mode=\"ANY\", region_mode=\"ALL\")\n",
" )\n",
"]\n",
"Query(query_steps=query_steps)\\\n",
" .build(contacts)\\\n",
" .compute()\\\n",
" .filter(regex=r\"chrom|start|end|id\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This way, we can filter for contacts between specific target regions, e.g., loop bases, or promoters and enhancers."
]
},
{
"cell_type": "markdown",
"metadata": {},