wenxi-yue/CMTNet


[TMI2023] Cascade Multi-Level Transformer Network for Surgical Workflow Analysis

Wenxi Yue, Hongen Liao, Yong Xia, Vincent Lam, Jiebo Luo, Zhiyong Wang

News | Abstract | Installation | Data | Checkpoints | Train

News

Our paper has been accepted by IEEE Transactions on Medical Imaging (2023).

Abstract

Surgical workflow analysis aims to recognise surgical phases from untrimmed surgical videos. It is an integral component for enabling context-aware computer-aided surgical operating systems. Many deep learning-based methods have been developed for this task. However, most existing works aggregate homogeneous temporal context for all frames at a single level and neglect the fact that each frame has its specific need for information at multiple levels for accurate phase prediction. To fill this gap, in this paper we propose Cascade Multi-Level Transformer Network (CMTNet) composed of cascaded Adaptive Multi-Level Context Aggregation (AMCA) modules. Each AMCA module first extracts temporal context at the frame level and the phase level and then fuses frame-specific spatial feature, frame-level temporal context, and phase-level temporal context for each frame adaptively. By cascading multiple AMCA modules, CMTNet is able to gradually enrich the representation of each frame with the multi-level semantics that it specifically requires, achieving better phase prediction in a frame-adaptive manner. In addition, we propose a novel refinement loss for CMTNet, which explicitly guides each AMCA module to focus on extracting the key context for refining the prediction of the previous stage in terms of both prediction confidence and smoothness. This further enhances the quality of the extracted context effectively. Extensive experiments on the Cholec80 and the M2CAI datasets demonstrate that CMTNet achieves state-of-the-art performance.

Figure 1: Overview of CMTNet.

Installation

  1. Clone the repository.

    git clone https://github.com/wenxi-yue/CMTNet.git
    cd CMTNet
    
  2. Create a virtual environment for CMTNet and activate it.

    conda create -n cmtnet python=3.9 -y
    conda activate cmtnet
    
  3. Install PyTorch. In our case, we used:

    pip install torch==1.11.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

     Please follow the instructions here to install the build that matches your own setup.

  4. Install other dependencies.

    pip install -r requirements.txt
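
After the steps above, a quick sanity check can confirm that the environment looks usable before training. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and checking only for `torch` is an assumption (extend `packages` with whatever `requirements.txt` lists).

```python
import importlib.util
import sys


def check_environment(min_python=(3, 9), packages=("torch",)):
    """Report whether the interpreter version and the given packages are available.

    Hypothetical helper: min_python mirrors the conda command above, and the
    package list is an assumption.
    """
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report
```

Calling `check_environment()` inside the activated `cmtnet` environment returns a dictionary of booleans; any `False` entry points at a step that may need repeating.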
    

Data

We use the Cholec80 [1] and M2CAI [2] datasets in our experiments.

We provide the pre-computed ResNet features and ground-truth annotations here.

Checkpoints

We provide the checkpoints for reproducing the results in our paper here. Please note that we repeat the experiments four times and report the average results in the paper. Accordingly, we have included the checkpoints for all four repetitions.
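
If you evaluate the four checkpoints yourself, the averaging step might look like the sketch below. It is an illustration only: the function name `mean_and_std` is hypothetical, the per-run scores must come from your own evaluation, and reporting the spread is an addition that this README does not describe.

```python
import statistics


def mean_and_std(scores):
    """Return (mean, sample standard deviation) of per-run scores.

    Hypothetical helper: `scores` holds one metric value per repetition,
    e.g. four values for the four released checkpoints.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.mean(scores), statistics.stdev(scores)
```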

File Organisation

After downloading the data, the files should be organised as follows.

CMTNet
    |__assets
    |    ...
    |__data
    |    |__cholec80
    |    |       |__groundtruth
    |    |       |__resnet_features
    |    |       |__mapping.txt
    |    |       |__test.bundle
    |    |                   
    |    |__m2cai
    |            |__groundtruth
    |            |__resnet_features
    |            |__mapping.txt
    |            |__test.bundle
    |                   
    |__data.py
    |__eval.py
    |__loss.py
    |__main.py
    |__model.py
    |__train.py
    |__ ...
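
To check that the downloaded data matches the layout above, a small script such as the following could be used. This is a sketch under assumptions: the helper name `missing_entries` is hypothetical, and treating `groundtruth` and `resnet_features` as directories and `mapping.txt` and `test.bundle` as files is inferred from their names.

```python
from pathlib import Path

# Entry names are taken from the tree above; which ones are directories
# and which are files is an assumption.
EXPECTED_DIRS = ("groundtruth", "resnet_features")
EXPECTED_FILES = ("mapping.txt", "test.bundle")


def missing_entries(repo_root, datasets=("cholec80", "m2cai")):
    """Return a sorted list of expected data paths absent under repo_root."""
    root = Path(repo_root)
    missing = []
    for dataset in datasets:
        base = root / "data" / dataset
        for name in EXPECTED_DIRS:
            if not (base / name).is_dir():
                missing.append((base / name).relative_to(root).as_posix())
        for name in EXPECTED_FILES:
            if not (base / name).is_file():
                missing.append((base / name).relative_to(root).as_posix())
    return sorted(missing)
```

Running `missing_entries(".")` from the `CMTNet` directory returns an empty list when every expected entry is in place, and otherwise lists the relative paths that still need to be downloaded or moved.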

Train

To train the model:

    cd CMTNet/
    python main.py --dataset cholec80 | tee log_cholec80.out
    python main.py --dataset m2cai | tee log_m2cai.out
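
On systems without `tee`, the same runs can be driven from Python. The sketch below is not part of the repository and the helper names are hypothetical; it only reuses the `main.py --dataset` invocation and the log file names shown above. Unlike the shell pipeline, it also captures standard error in the log.

```python
import subprocess
import sys


def train_command(dataset):
    """Build the training command for one dataset, using the current interpreter."""
    return [sys.executable, "main.py", "--dataset", dataset]


def log_name(dataset):
    """Log file name matching the shell commands above."""
    return f"log_{dataset}.out"


def run_and_tee(command, log_path):
    """Run command, copying each output line to the console and to log_path."""
    with open(log_path, "w") as log:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merged into the log; differs from `| tee`
            text=True,
        )
        for line in process.stdout:
            sys.stdout.write(line)
            log.write(line)
        return process.wait()
```

From the `CMTNet` directory, calling `run_and_tee(train_command(d), log_name(d))` for `d` in `("cholec80", "m2cai")` runs the two trainings one after the other and returns each process's exit code.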

Citing CMTNet

If you find our work helpful, please consider citing:

@article{yue_cmtnet,
  title={Cascade Multi-Level Transformer Network for Surgical Workflow Analysis},
  author={Yue, Wenxi and Liao, Hongen and Xia, Yong and Lam, Vincent and Luo, Jiebo and Wang, Zhiyong},
  journal={IEEE Transactions on Medical Imaging},
  year={2023},
  publisher={IEEE}
}

References

[1] A. P. Twinanda, S. Shehata, D. Mutter, J. Marescaux, M. de Mathelin, and N. Padoy, “EndoNet: A deep architecture for recognition tasks on laparoscopic videos,” IEEE Trans. Med. Imag., vol. 36, no. 1, pp. 86–97, Jan. 2017.

[2] A. P. Twinanda, S. Shehata, D. Mutter, J. Marescaux, M. de Mathelin, and N. Padoy. (2016). Workshop and Challenges on Modeling and Monitoring of Computer Assisted Interventions. [Online]. Available: http://camma.u-strasbg.fr/m2cai2016/

About

[TMI2023] Official implementation of CMTNet
