describe check-transcripts step in readme/sbin/src
nvta1209 committed Dec 6, 2024
1 parent 348e5a1 commit 6fa5092
Showing 5 changed files with 20 additions and 29 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -339,6 +339,7 @@ See 2A for nuclear transcripts and 2B for mitochondrial transcripts.
```
docker compose run ncbi-download
docker compose run uta-extract
+[OPTIONAL] docker compose run uta-check-transcripts
docker compose run seqrepo-load
docker compose run uta-load
```
@@ -351,7 +352,7 @@ docker compose run uta-load
```

#### 2C. Manual splign transcripts
-To load splign-manual transcripts, the workflow expects an input txdata.yaml file and splign alignments. Define this path
+To load splign-manual transcripts, the workflow expects an input txdata.yaml file and splign alignments. Define this path
using the environment variable $UTA_SPLIGN_MANUAL_DIR. These file paths should exist:
- `$UTA_SPLIGN_MANUAL_DIR/splign-manual/txdata.yaml`
- `$UTA_SPLIGN_MANUAL_DIR/splign-manual/alignments/*.splign`
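A pre-flight check for the two paths listed above could look like the following Python sketch. The helper name `missing_splign_inputs` is hypothetical and not part of UTA. Only the directory layout is taken from the README text.

```python
from pathlib import Path
from typing import List


def missing_splign_inputs(base_dir: str) -> List[str]:
    """Describe what is missing from the splign-manual layout under base_dir.

    Hypothetical helper: it only checks the two paths named in the README.
    """
    root = Path(base_dir) / "splign-manual"
    problems = []
    # The workflow expects a single txdata.yaml file.
    if not (root / "txdata.yaml").is_file():
        problems.append(f"missing file: {root / 'txdata.yaml'}")
    # At least one *.splign alignment file should be present.
    if not any((root / "alignments").glob("*.splign")):
        problems.append(f"no *.splign files in: {root / 'alignments'}")
    return problems
```

Calling `missing_splign_inputs(os.environ["UTA_SPLIGN_MANUAL_DIR"])` before starting the workflow returns an empty list when both inputs are in place.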
23 changes: 0 additions & 23 deletions docker-compose-alt.yml

This file was deleted.

10 changes: 10 additions & 0 deletions docker-compose.yml
@@ -46,6 +46,16 @@ services:
       interval: 10s
       retries: 80
     network_mode: host
+  uta-check-transcripts:
+    image: uta-update
+    command: sbin/uta-check-transcripts ${UTA_ETL_OLD_UTA_VERSION} /uta-check-transcripts/work /uta-check-transcripts/logs
+    depends_on:
+      uta:
+        condition: service_healthy
+    volumes:
+      - ${UTA_ETL_WORK_DIR}:/uta-check-transcripts/work
+      - ${UTA_ETL_LOG_DIR}:/uta-check-transcripts/logs
+    network_mode: host
   uta-load:
     image: uta-update
     command: sbin/uta-load ${UTA_ETL_OLD_UTA_VERSION} ${UTA_ETL_NEW_UTA_VERSION} /ncbi-dir /uta-load/work /uta-load/logs
10 changes: 6 additions & 4 deletions sbin/uta-check-transcripts
@@ -1,8 +1,12 @@
#!/usr/bin/env bash

-# This script ...
+# Find transcripts in the current UTA database version which are not in the txinfo file,
+# and write those transcripts to the check-transcripts.txt file.
#
-# source_uta_v is the UTA version before the update procedure.
+# uta-extract, which produces the needed txinfo file, must run before this script.
+# Any action taken with respect to the identified transcripts is case-dependent and optional.
+#
+# source_uta_v is the current UTA database version.
# working_dir stores input and output files.
# log_dir stores log files.

@@ -20,7 +24,5 @@ fi

mkdir -p "$log_dir"

-# Report transcripts which have already loaded into UTA but are not part of the incoming transcript set.
-# These transcripts are at risk of missing updates to information such as transl_except. ???
UTA_USE_SCHEMA=false uta --conf=etc/global.conf --conf=etc/[email protected] check-transcripts --prefixes=NM,NR "$working_dir/txinfo.gz" "$source_uta_v" "$working_dir/check-transcripts.txt" 2>&1 | \
tee "$log_dir/check-transcripts.log"
3 changes: 2 additions & 1 deletion src/uta/loading.py
@@ -169,7 +169,8 @@ def analyze(session, opts, cf):

 def check_transcripts(session: Session, opts: Dict, cf: ConfigParser):
     """
+    Find transcripts in the given UTA database version which are not in the given txinfo file,
+    and write those transcripts to the specified file.
     """
     # required opts
     txinfo_file = opts['TXINFO_FILE']
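
The comparison described in the docstring amounts to a set difference. The sketch below illustrates that idea only; it is not the body of `check_transcripts`, which is collapsed in this diff. The function names, the one-accession-per-line output format, and the sample accessions in the usage note are assumptions for illustration. The default prefixes mirror the `--prefixes=NM,NR` option passed by `sbin/uta-check-transcripts`.

```python
from typing import Iterable, List, Sequence


def transcripts_missing_from_txinfo(
    db_accessions: Iterable[str],
    txinfo_accessions: Iterable[str],
    prefixes: Sequence[str] = ("NM", "NR"),
) -> List[str]:
    """Return sorted accessions found in the database but absent from txinfo.

    Hypothetical helper: only accessions starting with one of `prefixes`
    are considered.
    """
    incoming = set(txinfo_accessions)
    return sorted(
        ac
        for ac in set(db_accessions)
        if ac.startswith(tuple(prefixes)) and ac not in incoming
    )


def write_accessions(accessions: Iterable[str], path: str) -> None:
    """Write one accession per line (an assumed output format)."""
    with open(path, "w") as fh:
        for ac in accessions:
            fh.write(f"{ac}\n")
```

For example, with made-up accessions, `transcripts_missing_from_txinfo(["NM_000001.1", "XM_000003.1", "NR_000002.1"], ["NM_000001.1"])` returns `["NR_000002.1"]`: the `XM` entry is filtered out by prefix and the `NM` entry is present in the incoming set.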
