Enhancing open-surgery gesture recognition using 3D pose estimation
