beta release
hsnguyen committed Aug 14, 2020
2 parents be18702 + 5658552 commit fe90e8b
Showing 61 changed files with 7,142 additions and 1,557 deletions.
Binary file added docs/npgraph.gif
14 changes: 5 additions & 9 deletions docs/npgraph.md
@@ -1,10 +1,8 @@
# *npGraph* - Resolve assemblgraph in real-time using nanopore data
# *npGraph* - Resolve assembly graph in real-time using nanopore data
This is another real-time scaffolder alongside [npScarf](https://github.com/mdcao/npScarf). Instead of using contig sequences as pre-assemblies, this tool works on an assembly graph (from [SPAdes](http://cab.spbu.ru/software/spades/)).
The batch algorithm has been implemented in the hybrid assembler module of [Unicycler](https://github.com/rrwick/Unicycler) and others.

<p align="center">
<img src="http://drive.google.com/uc?export=view&id=1eGn-FfDoLHPMbt4i_awFXF-DYDe36GoR" alt="npGraph"/>
</p>
![npGraph GUI](npgraph.gif)

## Introduction
*npScarf* is the real-time hybrid assembler that uses the stream of long reads to bridge the Illumina contigs together, aiming to give more complete genome sequences while the sequencing process is still ongoing. The pipeline has been applied successfully to microbial genomics and even bigger data sets. However, due to its greedy approach over noisy data, it is difficult to eliminate all mis-assemblies without further pre-processing and parameter tuning. To help prevent this issue, the assembly graph - the building-block graph structure for the contigs - should be used as the source for the bridging algorithm.
@@ -83,7 +81,7 @@ More features would be added later to the GUI but it's not the focus of this pro
All settings from the GUI can be set beforehand via commandline interface.
Without using GUI, the mandatory inputs are assembly graph file (*-si*) and long-read data (*-li*).
The assembly graph must be output from SPAdes in either FASTG or GFA format (normally *assembly_graph.fastg* or *assembly_graph.gfa*).
From new version of SPAdes, the output GFA file is *assembly_graph_with_scaffolds.gfa* which includes SPAdes path finding and scaffolding results. Sometimes, this might give additional mis-assemblies so the original graph of the building-block contigs (fastg file) is preferred.
From new version of SPAdes, the output GFA file is *assembly_graph_with_scaffolds.gfa* which includes SPAdes path finding and scaffolding results. Sometimes, this might give additional mis-assemblies so the original graph of the building-block contigs is preferred.

The long-read data will be used for bridging and can be given as DNA sequences (FASTA/FASTQ format, possible .gz) or alignment records (SAM/BAM) as mentioned above. *npGraph* will try to guess the format of the inputs based on the extensions, but sometimes you'll have to specify it yourself (e.g. when "-" is provided to read from *stdin*).
If the sequences are given, then it's mandatory to have either minimap2 (recommended) or BWA-MEM installed in your system to do the alignment between long reads and the pre-assemblies.
@@ -96,7 +94,7 @@ or if the graph file is GFA v1 we can use
```
awk '/^S/{print ">"$2; print $3;}' assembly_graph.gfa | fold > assembly_graph.fasta
```
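To see what the GFA conversion above produces, here is a minimal sketch that runs the same awk filter on a toy GFA v1 file. The wrapper name `gfa_to_fasta`, the temporary file, and the segment names and sequences are all made up for illustration; they are not part of *npGraph* or SPAdes.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk one-liner shown above.
# Each segment line (S <name> <sequence> ...) becomes one FASTA record;
# header (H) and link (L) lines are ignored.
gfa_to_fasta() {
    awk '/^S/{print ">"$2; print $3;}' "$1" | fold
}

# Toy GFA v1 input: two made-up segments joined by one link.
gfa=$(mktemp)
printf 'H\tVN:Z:1.0\nS\t1\tACGTACGT\nS\t2\tGGCC\nL\t1\t+\t2\t+\t0M\n' > "$gfa"

gfa_to_fasta "$gfa"
rm -f "$gfa"
```

Only the segment names and sequences survive the conversion, so the resulting FASTA carries no information about how the contigs connect. It is meant purely as the reference for aligning the long reads.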
Note that GFA file from SPAdes is preferred over FASTG since the former gives hint about the k-mer parameter and others, also it is becoming the standard for assembly graph that adapted by many other software.
Note that GFA format from SPAdes is preferred over FASTG since the former gives hint about the k-mer parameter and others, also it is becoming the standard for assembly graph that adapted by many other software.
And then you can generate SAM/BAM file with our recommended parameters:
```
minimap2 -t16 -k15 -w5 -a assembly_graph.fasta nnp.fastq ...
@@ -150,9 +148,7 @@ awk -F'[:;]' -v q="'" '/^>/{if(index($1,q) ==0 ) flag=1; else flag=0;} {if(flag)

Below is how it looked when running *npGraph* on a mock community of 11 species from PoreCamp.

<p align="center">
<img src="http://drive.google.com/uc?export=view&id=1c29S6cSNwEg9JpXcy28ngo8bFsuF2SXi" alt="npGraph"/>
</p>
![Metagenomics](npgraph_pc.gif)


### Note
Binary file added docs/npgraph_pc.gif
Binary file added docs/npgraph_pc_short.gif
153 changes: 118 additions & 35 deletions pom.xml
@@ -1,10 +1,11 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>np</groupId>
<artifactId>assembly</artifactId>
<version>0.1.1-SNAPSHOT</version>
<version>0.2.1-beta</version>
<packaging>jar</packaging>

<name>assembly</name>
@@ -14,6 +15,8 @@
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.release>11</maven.compiler.release>
<javafx.version>13</javafx.version>
<log4j.version>2.11.2</log4j.version>
<disruptor.version>3.4.2</disruptor.version>
</properties>

<repositories>
@@ -45,6 +48,7 @@
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
</dependency>
<!-- GraphStream library, javafx supported UI -->
<dependency>
<groupId>com.github.graphstream</groupId>
<artifactId>gs-algo</artifactId>
@@ -57,22 +61,35 @@
<groupId>com.github.graphstream</groupId>
<artifactId>gs-core</artifactId>
</dependency>
<!--<dependency>
<groupId>org.graphstream</groupId>
<artifactId>gs-ui</artifactId>
</dependency>-->

<dependency>
<groupId>javax.json</groupId>
<artifactId>javax.json-api</artifactId>
</dependency>

<!-- slf4j binding with log4j2
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
</dependency>
-->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
</dependency>
<dependency>
<groupId>com.lmax</groupId>
<artifactId>disruptor</artifactId>
</dependency>

<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
@@ -93,24 +110,57 @@
<dependency>
<groupId>org.openjfx</groupId>
<artifactId>javafx-controls</artifactId>
<version>${javafx.version}</version>
</dependency>
<dependency>
<groupId>org.openjfx</groupId>
<artifactId>javafx-fxml</artifactId>
<version>${javafx.version}</version>
</dependency>
<!-- For gRPC -->
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-netty-shaded</artifactId>
<version>1.28.0</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-protobuf</artifactId>
<version>1.28.0</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-stub</artifactId>
<version>1.28.0</version>
</dependency>

<dependency>
<groupId>javax.annotation</groupId>
<artifactId>javax.annotation-api</artifactId>
<version>1.3.2</version>
</dependency>

</dependencies>

<build>
<resources>
<resource>
<directory>resources/icons</directory>
</resource>
<resource>
<directory>resources/css</directory>
</resource>
</resources>
<resources>
<resource>
<directory>resources/icons</directory>
</resource>
<resource>
<directory>resources/css</directory>
</resource>
<resource>
<directory>resources</directory>
</resource>
</resources>

<extensions>
<extension>
<groupId>kr.motd.maven</groupId>
<artifactId>os-maven-plugin</artifactId>
<version>1.6.2</version>
</extension>
</extensions>

<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
@@ -124,6 +174,12 @@

<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>foo.bar.Generate</mainClass>
<manifestEntries>
<Multi-Release>true</Multi-Release>
</manifestEntries>
</transformer>
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>
<filters>
@@ -155,10 +211,32 @@
<release>${maven.compiler.release}</release>
</configuration>
</plugin>


<!-- Uncomment to generate gRPC code
<plugin>
<groupId>org.xolstice.maven.plugins</groupId>
<artifactId>protobuf-maven-plugin</artifactId>
<version>0.6.1</version>
<configuration>
<protocArtifact>com.google.protobuf:protoc:3.11.0:exe:${os.detected.classifier}</protocArtifact>
<pluginId>grpc-java</pluginId>
<pluginArtifact>io.grpc:protoc-gen-grpc-java:1.28.0:exe:${os.detected.classifier}</pluginArtifact>
</configuration>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>compile-custom</goal>
</goals>
</execution>
</executions>
</plugin>
-->

</plugins>

</build>

<dependencyManagement>
<dependencies>
<dependency>
@@ -202,30 +280,35 @@
<artifactId>gs-core</artifactId>
<version>2.0-alpha</version>
</dependency>
<!--<dependency>
<groupId>org.graphstream</groupId>
<artifactId>gs-ui</artifactId>
<version>1.3</version>
</dependency> -->

<dependency>
<groupId>javax.json</groupId>
<artifactId>javax.json-api</artifactId>
<version>1.0</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.25</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.7.25</version>
</dependency>

<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>${log4j.version}</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${log4j.version}</version>
</dependency>

<!-- https://logging.apache.org/log4j/2.x/manual/async.html -->
<dependency>
<groupId>com.lmax</groupId>
<artifactId>disruptor</artifactId>
<version>${disruptor.version}</version>
</dependency>

<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>18.0</version>
<version>29.0-jre</version>
</dependency>
<dependency>
<groupId>com.github.samtools</groupId>
36 changes: 36 additions & 0 deletions resources/log4j2.xml
@@ -0,0 +1,36 @@
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
<Appenders>
<Console name="LogToConsole" target="SYSTEM_OUT">
<PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
</Console>

<RollingRandomAccessFile name="LogToRollingRandomAccessFile" fileName="logs/app.log"
filePattern="logs/$${date:yyyy-MM}/app-%d{MM-dd-yyyy}-%i.log.gz">
<PatternLayout>
<Pattern>%d %p %c{1.} [%t] %m%n</Pattern>
</PatternLayout>
<Policies>
<OnStartupTriggeringPolicy/>
<SizeBasedTriggeringPolicy size="100 MB"/>
</Policies>
<DefaultRolloverStrategy max="10"/>
</RollingRandomAccessFile>

</Appenders>
<Loggers>

<!-- asynchronous loggers -->
<AsyncLogger name="org.rtassembly" level="debug" additivity="false">
<AppenderRef ref="LogToRollingRandomAccessFile"/>
<!--
<AppenderRef ref="LogToConsole"/>
-->
</AsyncLogger>

<!-- synchronous loggers -->
<Root level="info">
<AppenderRef ref="LogToConsole"/>
</Root>
</Loggers>
</Configuration>
Binary file added scripts/.fastg_to_fasta.sh.un~
Binary file not shown.
2 changes: 1 addition & 1 deletion scripts/fastg_to_fasta.sh
@@ -3,6 +3,6 @@
if [ "$#" -ne 1 ]; then
echo "Usage: $0 <input.fastg>"
else
awk -F '[:;]' -v q=\' 'BEGIN{flag=0;}/^>/{if(index($1,q)!=0) flag=0; else flag=1;}{if(flag==1) print $1;}' assembly_graph.fastg
awk -F '[:;]' -v q=\' 'BEGIN{flag=0;}/^>/{if(index($1,q)!=0) flag=0; else flag=1;}{if(flag==1) print $1;}' $1
fi
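
The script above passes `$1` to awk unquoted and exits with status 0 even when it only prints the usage message. A slightly more defensive variant might look like the sketch below. It is not the committed script: the function name `fastg_to_fasta` and the toy edge names are assumptions for illustration. The filter logic is unchanged: records whose header name contains a `'` (the reverse-complement copy of an edge) are dropped, and the name before the first `:` or `;` is kept for the rest.

```shell
#!/bin/sh
# Hypothetical hardened variant of scripts/fastg_to_fasta.sh, as a function.
fastg_to_fasta() {
    if [ "$#" -ne 1 ]; then
        # Report misuse on stderr and signal failure to the caller.
        echo "Usage: fastg_to_fasta <input.fastg>" >&2
        return 1
    fi
    # Split on ':' or ';' so $1 of a header line is the edge name only.
    # flag is 1 for forward-strand records and 0 for records whose name
    # carries a quote; sequence lines inherit the flag of their header.
    awk -F '[:;]' -v q="'" '
        /^>/ { flag = (index($1, q) == 0) }
        flag { print $1 }
    ' "$1"
}

# Toy FASTG input: one made-up edge listed in both orientations.
tmp=$(mktemp)
printf '%s\n' \
    ">EDGE_1_length_4_cov_1.0:EDGE_2_length_4_cov_1.0';" \
    "ACGT" \
    ">EDGE_1_length_4_cov_1.0':EDGE_2_length_4_cov_1.0;" \
    "ACGT" > "$tmp"

fastg_to_fasta "$tmp"
rm -f "$tmp"
```

Quoting `"$1"` keeps paths containing spaces intact, and the non-zero return status lets a calling pipeline detect a missing argument.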
