Merge pull request #61 from STRIDES/rework-readme-structure

added tutorials dir back to accomodate ms flows emails for a few more months
STRIDES · Mar 22, 2024 · 0006ec8 · 0006ec8
2 parents bd70319 + 6ca15c4
commit 0006ec8
Show file tree

Hide file tree

Showing 21 changed files with 6,619 additions and 1 deletion.
diff --git a/docs/create_athena_database.md b/docs/create_athena_database.md
@@ -62,7 +62,7 @@ Click **Next**
 
 ## Query the SRA metadata using Athena
 
-You can query the SRA database directly in the Athena user interface or you can use the API to query via a Jupyter Notebook. We recommend the Jupyter notebook approach, and provide an example [here](/tutorials/notebooks/SRADownload), as well as [these examples](https://github.com/ncbi/ASHG-Workshop-2021) produced by the SRA team. In that GitHub repo, you can view notebook 2 and adapt it from BigQuery to Athena, and then notebook 3 is a great example or different kinds of Athena queries you can run. If you want to use the Athena console directly, we recommend learning the SQL query structure from our notebook or the SRA team ones, then using this [AWS guide](https://docs.aws.amazon.com/athena/latest/ug/getting-started.html) to how to search directly in Athena. Skip to #3 since we have already done #1-2 above. 
+You can query the SRA database directly in the Athena user interface or you can use the API to query via a Jupyter Notebook. We recommend the Jupyter notebook approach, and provide an example [here](/tutorials/notebooks/SRADownload/SRA-Download.ipynb), as well as [these examples](https://github.com/ncbi/ASHG-Workshop-2021) produced by the SRA team. In that GitHub repo, you can view notebook 2 and adapt it from BigQuery to Athena, and then notebook 3 is a great example or different kinds of Athena queries you can run. If you want to use the Athena console directly, we recommend learning the SQL query structure from our notebook or the SRA team ones, then using this [AWS guide](https://docs.aws.amazon.com/athena/latest/ug/getting-started.html) to how to search directly in Athena. Skip to #3 since we have already done #1-2 above. 
 
 
 

diff --git a/tutorials/README.md b/tutorials/README.md
diff --git a/tutorials/notebooks/ElasticBLAST/run_elastic_blast.ipynb b/tutorials/notebooks/ElasticBLAST/run_elastic_blast.ipynb
@@ -0,0 +1,202 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "8c3f3bb2",
+   "metadata": {},
+   "source": [
+    "# Run ElasticBLAST using AWS Batch"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "aee3b229",
+   "metadata": {},
+   "source": [
+    "This notebook is based on the [this tutorial](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-aws.html). Make sure you select a kernel with Python 3.7 for the Elastic BLAST install. One good option is `conda_mxnet_latest_p37`. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "38dfb579",
+   "metadata": {},
+   "source": [
+    "### 1) Install elastic blast"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d96bb988",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip3 install elastic-blast"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "684e79f6",
+   "metadata": {},
+   "source": [
+    "Test your install, it should print out a version and full help menu."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2aa11ccc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!elastic-blast --version\n",
+    "!elastic-blast --help"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "58b59cb0",
+   "metadata": {},
+   "source": [
+    "### 2) Optionally, create a bucket for this tutorial if one does not yet exist"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "319ff226",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!aws s3 mb s3://elasticblast-sagemaker"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "449d7511",
+   "metadata": {},
+   "source": [
+    "### 3) Create a config file that defines the job parameters"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b578c1ea",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!touch BDQA.ini"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a1b0a866",
+   "metadata": {},
+   "source": [
+    "Open the config file and add the following:\n",
+    "```\n",
+    "[cloud-provider]\n",
+    "aws-region = us-east-1\n",
+    "aws-vpc = vpc-0eaafe0236e351a36\n",
+    "aws-subnet = subnet-043d7614ae5dc30c9\n",
+    "aws-key-pair = cloud-lab-testing\n",
+    "\n",
+    "[cluster]\n",
+    "num-nodes = 3\n",
+    "labels = owner=ec2-user\n",
+    "\n",
+    "[blast]\n",
+    "program = blastp\n",
+    "db = refseq_protein\n",
+    "queries = s3://elasticblast-test/queries/BDQA01.1.fsa_aa\n",
+    "results = s3://elasticblast-sagemaker/results/BDQA\n",
+    "options = -task blastp-fast -evalue 0.01 -outfmt \"7 std sskingdoms ssciname\"\n",
+    "```\n",
+    "\n",
+    "You can add additional configuration values from [this guide](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/configuration.html). If you need to run this a few times, make sure you either rename the ouput folder, or delete the results folder from the S3 bucket. If you are using your own data, make sure to modify the database and the S3 queries path."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9a9f8192",
+   "metadata": {},
+   "source": [
+    "### 4) Submit the job"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "398253e8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!elastic-blast submit --cfg BDQA.ini"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9a8e7716",
+   "metadata": {},
+   "source": [
+    "### 5) Check results and troubleshoot"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "94a43c5e",
+   "metadata": {},
+   "source": [
+    "+ You can monitor the job initially by going to `CloudFormation` and viewing the events tab of the elastic blast stack. If there is an error, you should be able to pinpoint it in these event logs.\n",
+    "+ You can view the progress by going to `AWS Batch`, select the Job queue that begins with `elasticblast`, and then make sure jobs are moving from Runnable to Running to Succeeded. The number of jobs that run together will be the number of nodes you selected in the config file. To run more jobs at once, increase the `cluster` parameter `num-nodes`. \n",
+    "+ Finally, to view your outputs, look at the files in your S3 output bucket, something like `aws s3 ls s3://elasticblast-sagemaker/results/BDQA/`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "292947f1-5247-4da5-81bd-7fc8fc420ca4",
+   "metadata": {},
+   "source": [
+    "### 6) Clean up cloud resources"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e677ba64-38e0-49d4-919b-4bb51de83cdd",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!elastic-blast delete --cfg BDQA.ini"
+   ]
+  }
+ ],
+ "metadata": {
+  "environment": {
+   "kernel": "python3",
+   "name": "common-cpu.m93",
+   "type": "gcloud",
+   "uri": "gcr.io/deeplearning-platform-release/base-cpu:m93"
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
Original file line number	Diff line number	Diff line change
Expand Up		@@ -62,7 +62,7 @@ Click Next

		## Query the SRA metadata using Athena

		You can query the SRA database directly in the Athena user interface or you can use the API to query via a Jupyter Notebook. We recommend the Jupyter notebook approach, and provide an example [here](/tutorials/notebooks/SRADownload), as well as [these examples](https://github.com/ncbi/ASHG-Workshop-2021) produced by the SRA team. In that GitHub repo, you can view notebook 2 and adapt it from BigQuery to Athena, and then notebook 3 is a great example or different kinds of Athena queries you can run. If you want to use the Athena console directly, we recommend learning the SQL query structure from our notebook or the SRA team ones, then using this [AWS guide](https://docs.aws.amazon.com/athena/latest/ug/getting-started.html) to how to search directly in Athena. Skip to #3 since we have already done #1-2 above.
		You can query the SRA database directly in the Athena user interface or you can use the API to query via a Jupyter Notebook. We recommend the Jupyter notebook approach, and provide an example [here](/tutorials/notebooks/SRADownload/SRA-Download.ipynb), as well as [these examples](https://github.com/ncbi/ASHG-Workshop-2021) produced by the SRA team. In that GitHub repo, you can view notebook 2 and adapt it from BigQuery to Athena, and then notebook 3 is a great example or different kinds of Athena queries you can run. If you want to use the Athena console directly, we recommend learning the SQL query structure from our notebook or the SRA team ones, then using this [AWS guide](https://docs.aws.amazon.com/athena/latest/ug/getting-started.html) to how to search directly in Athena. Skip to #3 since we have already done #1-2 above.



Expand Down