Merge pull request #61 from STRIDES/rework-readme-structure
added tutorials dir back to accommodate ms flows emails for a few more months
zbyosufzai authored Mar 22, 2024
2 parents bd70319 + 6ca15c4 commit 0006ec8
Showing 21 changed files with 6,619 additions and 1 deletion.
2 changes: 1 addition & 1 deletion docs/create_athena_database.md
@@ -62,7 +62,7 @@ Click **Next**

## Query the SRA metadata using Athena

- You can query the SRA database directly in the Athena user interface, or you can use the API to query it from a Jupyter notebook. We recommend the Jupyter notebook approach and provide an example [here](/tutorials/notebooks/SRADownload), as well as [these examples](https://github.com/ncbi/ASHG-Workshop-2021) produced by the SRA team. In that GitHub repo, you can view notebook 2 and adapt it from BigQuery to Athena; notebook 3 is a great example of the different kinds of Athena queries you can run. If you want to use the Athena console directly, we recommend learning the SQL query structure from our notebook or the SRA team's notebooks, then following this [AWS guide](https://docs.aws.amazon.com/athena/latest/ug/getting-started.html) to learn how to search directly in Athena. Skip to step 3 of that guide, since steps 1 and 2 are already done above.
+ You can query the SRA database directly in the Athena user interface, or you can use the API to query it from a Jupyter notebook. We recommend the Jupyter notebook approach and provide an example [here](/tutorials/notebooks/SRADownload/SRA-Download.ipynb), as well as [these examples](https://github.com/ncbi/ASHG-Workshop-2021) produced by the SRA team. In that GitHub repo, you can view notebook 2 and adapt it from BigQuery to Athena; notebook 3 is a great example of the different kinds of Athena queries you can run. If you want to use the Athena console directly, we recommend learning the SQL query structure from our notebook or the SRA team's notebooks, then following this [AWS guide](https://docs.aws.amazon.com/athena/latest/ug/getting-started.html) to learn how to search directly in Athena. Skip to step 3 of that guide, since steps 1 and 2 are already done above.



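The paragraph above recommends querying the SRA metadata through the Athena API from a Jupyter notebook. As a minimal sketch of that pattern, assuming `boto3` is installed and AWS credentials are configured, the helper below submits a query with `start_query_execution`, polls `get_query_execution` until the query finishes, and pages through `get_query_results`. The function name `run_athena_query` is hypothetical, and the database name, table name, columns, and S3 output location in the usage comment are placeholders; substitute the ones you created when setting up the Athena database.

```python
import time
from typing import Any, List


def run_athena_query(client: Any, sql: str, database: str, output_location: str,
                     poll_seconds: float = 2.0) -> List[List[str]]:
    """Run an Athena query and return every result row as a list of strings.

    `client` is expected to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    # Submit the query; Athena writes its result files to the S3 output location.
    start = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = start["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        reason = status["QueryExecution"]["Status"].get("StateChangeReason", "")
        raise RuntimeError(f"Athena query {query_id} ended in state {state}: {reason}")

    # Page through the results; for a SELECT, the first row of the first page
    # holds the column names. Missing values come back without "VarCharValue".
    rows: List[List[str]] = []
    kwargs = {"QueryExecutionId": query_id}
    while True:
        page = client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            rows.append([cell.get("VarCharValue", "") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return rows


# Hypothetical usage (placeholder database, table, columns, and bucket):
# import boto3
# client = boto3.client("athena", region_name="us-east-1")
# rows = run_athena_query(client, "SELECT acc, organism FROM metadata LIMIT 10",
#                         "srametadata", "s3://YOUR-BUCKET/athena-results/")
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to reuse across regions and to exercise without touching AWS.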
96 changes: 96 additions & 0 deletions tutorials/README.md

Large diffs are not rendered by default.

202 changes: 202 additions & 0 deletions tutorials/notebooks/ElasticBLAST/run_elastic_blast.ipynb
@@ -0,0 +1,202 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "8c3f3bb2",
"metadata": {},
"source": [
"# Run ElasticBLAST using AWS Batch"
]
},
{
"cell_type": "markdown",
"id": "aee3b229",
"metadata": {},
"source": [
"This notebook is based on [this tutorial](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-aws.html). Make sure you select a kernel with Python 3.7 for the ElasticBLAST install; one good option is `conda_mxnet_latest_p37`."
]
},
{
"cell_type": "markdown",
"id": "38dfb579",
"metadata": {},
"source": [
"### 1) Install ElasticBLAST"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d96bb988",
"metadata": {},
"outputs": [],
"source": [
"!pip3 install elastic-blast"
]
},
{
"cell_type": "markdown",
"id": "684e79f6",
"metadata": {},
"source": [
"Test your install. It should print the version and the full help menu."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2aa11ccc",
"metadata": {},
"outputs": [],
"source": [
"!elastic-blast --version\n",
"!elastic-blast --help"
]
},
{
"cell_type": "markdown",
"id": "58b59cb0",
"metadata": {},
"source": [
"### 2) Optionally, create a bucket for this tutorial if one does not yet exist"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "319ff226",
"metadata": {},
"outputs": [],
"source": [
"!aws s3 mb s3://elasticblast-sagemaker"
]
},
{
"cell_type": "markdown",
"id": "449d7511",
"metadata": {},
"source": [
"### 3) Create a config file that defines the job parameters"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b578c1ea",
"metadata": {},
"outputs": [],
"source": [
"!touch BDQA.ini"
]
},
{
"cell_type": "markdown",
"id": "a1b0a866",
"metadata": {},
"source": [
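"Open the config file and add the following, replacing the `aws-vpc`, `aws-subnet`, and `aws-key-pair` values with ones from your own AWS account:\n",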
"```\n",
"[cloud-provider]\n",
"aws-region = us-east-1\n",
"aws-vpc = vpc-0eaafe0236e351a36\n",
"aws-subnet = subnet-043d7614ae5dc30c9\n",
"aws-key-pair = cloud-lab-testing\n",
"\n",
"[cluster]\n",
"num-nodes = 3\n",
"labels = owner=ec2-user\n",
"\n",
"[blast]\n",
"program = blastp\n",
"db = refseq_protein\n",
"queries = s3://elasticblast-test/queries/BDQA01.1.fsa_aa\n",
"results = s3://elasticblast-sagemaker/results/BDQA\n",
"options = -task blastp-fast -evalue 0.01 -outfmt \"7 std sskingdoms ssciname\"\n",
"```\n",
"\n",
"You can add more configuration values from [this guide](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/configuration.html). If you run this more than once, either change the `results` path or delete the existing results folder from the S3 bucket first. If you are using your own data, change the `db` and `queries` values to point to your database and S3 query file."
]
},
{
"cell_type": "markdown",
"id": "9a9f8192",
"metadata": {},
"source": [
"### 4) Submit the job"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "398253e8",
"metadata": {},
"outputs": [],
"source": [
"!elastic-blast submit --cfg BDQA.ini"
]
},
{
"cell_type": "markdown",
"id": "9a8e7716",
"metadata": {},
"source": [
"### 5) Check results and troubleshoot"
]
},
{
"cell_type": "markdown",
"id": "94a43c5e",
"metadata": {},
"source": [
"+ You can monitor the job initially by going to `CloudFormation` and viewing the Events tab of the ElasticBLAST stack. If there is an error, you should be able to pinpoint it in these event logs.\n",
"+ To view progress, go to `AWS Batch`, select the job queue whose name begins with `elasticblast`, and check that jobs move from Runnable to Running to Succeeded. The number of jobs that run at the same time equals the number of nodes you set in the config file; to run more jobs at once, increase `num-nodes` in the `[cluster]` section.\n",
"+ Finally, to view your outputs, list the files in your S3 results location, for example with `aws s3 ls s3://elasticblast-sagemaker/results/BDQA/`."
]
},
{
"cell_type": "markdown",
"id": "292947f1-5247-4da5-81bd-7fc8fc420ca4",
"metadata": {},
"source": [
"### 6) Clean up cloud resources"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e677ba64-38e0-49d4-919b-4bb51de83cdd",
"metadata": {},
"outputs": [],
"source": [
"!elastic-blast delete --cfg BDQA.ini"
]
}
],
"metadata": {
"environment": {
"kernel": "python3",
"name": "common-cpu.m93",
"type": "gcloud",
"uri": "gcr.io/deeplearning-platform-release/base-cpu:m93"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
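Step 3 of the notebook creates an empty `BDQA.ini` with `touch` and then asks you to paste the configuration in by hand. A minimal alternative sketch, assuming you would rather generate the file from Python, uses the standard-library `configparser` module to write the same sections and keys. The helper name `write_elastic_blast_config` is hypothetical, and the VPC, subnet, and key pair values below are placeholders that must be replaced with IDs from your own AWS account.

```python
import configparser
from typing import Dict


def write_elastic_blast_config(path: str, sections: Dict[str, Dict[str, str]]) -> None:
    """Write an ElasticBLAST-style INI file from a dict of {section: {key: value}}."""
    # interpolation=None so any '%' characters in values are written literally.
    config = configparser.ConfigParser(interpolation=None)
    for section, values in sections.items():
        config[section] = values
    with open(path, "w") as handle:
        config.write(handle)


# Mirrors the config shown in step 3. The aws-vpc, aws-subnet, and aws-key-pair
# values are hypothetical placeholders, not real resource IDs.
BDQA_CONFIG: Dict[str, Dict[str, str]] = {
    "cloud-provider": {
        "aws-region": "us-east-1",
        "aws-vpc": "vpc-REPLACE-ME",
        "aws-subnet": "subnet-REPLACE-ME",
        "aws-key-pair": "REPLACE-ME",
    },
    "cluster": {
        "num-nodes": "3",
        "labels": "owner=ec2-user",
    },
    "blast": {
        "program": "blastp",
        "db": "refseq_protein",
        "queries": "s3://elasticblast-test/queries/BDQA01.1.fsa_aa",
        "results": "s3://elasticblast-sagemaker/results/BDQA",
        "options": '-task blastp-fast -evalue 0.01 -outfmt "7 std sskingdoms ssciname"',
    },
}
```

Calling `write_elastic_blast_config("BDQA.ini", BDQA_CONFIG)` in a notebook cell before `!elastic-blast submit --cfg BDQA.ini` should produce a file with the same keys and values as the hand-edited one, and keeps the settings in one place if you rerun the job with a different `results` path.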
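Step 5 of the notebook points to the S3 results location. With the `-outfmt "7 std sskingdoms ssciname"` option from the config, the results are BLAST tabular output with `#` comment lines, the 12 `std` columns, and then the two extra fields. The sketch below parses such files after you copy them locally, for example with `aws s3 cp --recursive s3://elasticblast-sagemaker/results/BDQA/ results/`. It assumes the result files may be gzip-compressed and that the `std` columns carry the BLAST+ names `qaccver` and `saccver`; check the `# Fields:` header line of your own output to confirm the column order. `parse_outfmt7` and `read_results_file` are hypothetical helper names.

```python
import gzip
from typing import Dict, Iterable, Iterator, List

# Column names for -outfmt "7 std sskingdoms ssciname": the 12 "std" fields
# followed by the two extra fields requested in the config.
COLUMNS: List[str] = [
    "qaccver", "saccver", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "sskingdoms", "ssciname",
]


def parse_outfmt7(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per hit, skipping blank lines and '#' comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        # Fields are tab-separated, so scientific names with spaces stay intact.
        yield dict(zip(COLUMNS, line.split("\t")))


def read_results_file(path: str) -> List[Dict[str, str]]:
    """Parse a local copy of a result file, opening .gz files with gzip."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(parse_outfmt7(handle))
```

All values are kept as strings; convert columns such as `evalue` or `pident` with `float()` when you need to filter or sort hits.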
