Ahad MAR, Tan JK, Kim H, Ishikawa S (2012) Motion history image: its variants and applications. Mach Vis Appl 23:255–281. https://doi.org/10.1007/s00138-010-0298-4
Bahrami E, Francesca G, Gall J (2023) How much temporal long-term context is needed for action segmentation? In: IEEE International Conference on Computer Vision (ICCV), https://doi.org/10.1109/ICCV51070.2023.00950
Bertasius G, Wang H, Torresani L (2021) Is space-time attention all you need for video understanding? In: ICML, p 4, https://doi.org/10.48550/arXiv.2102.05095
Bkheet E, D’Angelo AL, Goldbraikh A, Laufer S (2023) Using hand pose estimation to automate open surgery training feedback. International Journal of Computer Assisted Radiology and Surgery 18(7):1279–1285
Carreira J, Zisserman A (2017) Quo vadis, action recognition? a new model and the kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6299–6308, https://doi.org/10.1109/CVPR.2017.502
Chen Y, Zhang Z, Yuan C, Li B, Deng Y, Hu W (2021) Channel-wise topology refinement graph convolution for skeleton-based action recognition. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 13359–13368, https://doi.org/10.1109/ICCV48922.2021.01311
Chi Hg, Ha MH, Chi S, Lee SW, Huang Q, Ramani K (2022) Infogcn: Representation learning for human skeleton-based action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 20186–20196, https://doi.org/10.1109/CVPR52688.2022.01955
Czempiel T, Paschali M, Keicher M, Simson W, Feussner H, Kim ST, Navab N (2020) Tecno: Surgical phase recognition with multi-stage temporal convolutional networks. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part III 23, Springer, pp 343–352, https://doi.org/10.1007/978-3-030-59716-0_33
De Rossi G, Roin S, Setti F, Muradore R (2020) A multi-modal learning system for on-line surgical action segmentation. In: 2020 International Symposium on Medical Robotics (ISMR), IEEE, pp 132–138, https://doi.org/10.1109/ISMR48331.2020.9312950
De Smedt Q, Wannous H, Vandeborre JP (2016) Skeleton-based dynamic hand gesture recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, https://doi.org/10.1109/CVPRW.2016.153
Friard O, Gamba M (2016) Boris: a free, versatile open-source event-logging software for video/audio coding and live observations. Methods Ecol Evol 7(11):1325–1330. https://doi.org/10.1111/2041-210X.12584
Gao Y, Vedula SS, Reiley CE, Ahmidi N, Varadarajan B, Lin HC, Tao L, Zappella L, Béjar B, Yuh DD, Chen C, Vidal R, Khudanpur S, Hager G (2014) Jhu-isi gesture and skill assessment working set (jigsaws): a surgical activity dataset for human motion modeling. In: MICCAI workshop: M2cai, p 3
Goldbraikh A, Avisdris N, Pugh CM, Laufer S (2022) Bounded future ms-tcn++ for surgical gesture recognition. In: European Conference on Computer Vision, Springer, pp 406–421, https://doi.org/10.1007/978-3-031-25066-8_22
Hamoud I, Srivastav V, Jamal MA, Mutter D, Mohareri O, Padoy N (2025) Multi-view video-pose pretraining for operating room surgical activity recognition. arXiv preprint arXiv:2502.13883. https://doi.org/10.48550/arXiv.2502.13883
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778, https://doi.org/10.1109/CVPR.2016.90
Lea C, Hager GD, Vidal R (2015) An improved model for segmentation and recognition of fine-grained activities with application to surgical training tasks. In: 2015 IEEE winter conference on applications of computer vision, IEEE, pp 1123–1129, https://doi.org/10.1109/WACV.2015.154
Lea C, Flynn MD, Vidal R, Reiter A, Hager GD (2017) Temporal convolutional networks for action segmentation and detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 156–165, https://doi.org/10.1109/CVPR.2017.113
Li S, Farha YA, Liu Y, Cheng MM, Gall J (2020) Ms-tcn++: multi-stage temporal convolutional network for action segmentation. IEEE Trans Pattern Anal Mach Intell 45(6):6647–6658. https://doi.org/10.1109/TPAMI.2020.3021756
Little RJ, Rubin DB (2019) Statistical analysis with missing data, vol 793. John Wiley & Sons. https://doi.org/10.1002/9781119482260
Men Y, Luo J, Zhao Z, Wu H, Zhang G, Luo F, Yu M (2024) Research on surgical gesture recognition in open surgery based on fusion of r3d and multi-head attention mechanism. Appl Sci 14(17):8021. https://doi.org/10.3390/app14178021
Özsoy E, Örnek EP, Eck U, Czempiel T, Tombari F, Navab N (2022) 4d-or: Semantic scene graphs for or domain modeling. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, pp 475–485, https://doi.org/10.48550/arXiv.2203.11937
Pavlakos G, Shan D, Radosavovic I, Kanazawa A, Fouhey D, Malik J (2024) Reconstructing hands in 3D with transformers. In: CVPR, https://doi.org/10.1109/CVPR52733.2024.00938
Saggio G, Santosuosso GL, Cavallo P, Pinto CA, Petrella M, Giannini F, Di Lorenzo N, Lazzaro A, Corona A, D’Auria F, Iezzi L, Gaspari AL (2011) Gesture recognition and classification for surgical skill assessment. In: 2011 IEEE International Symposium on Medical Measurements and Applications, IEEE, pp 662–666, https://doi.org/10.1109/MeMeA.2011.5966681
Saun TJ, Zuo KJ, Grantcharov TP (2019) Video technologies for recording open surgery: a systematic review. Surg Innov 26(5):599–612. https://doi.org/10.1177/1553350619853099
Savitzky A, Golay MJ (1964) Smoothing and differentiation of data by simplified least squares procedures. Anal Chem 36(8):1627–1639. https://doi.org/10.1021/ac60214a047
Sharghi A, Haugerud H, Oh D, Mohareri O (2020) Automatic operating room surgical activity recognition for robot-assisted surgery. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part III 23, Springer, pp 385–395, https://doi.org/10.1007/978-3-030-59716-0_37
Spektor R, Friedman T, Or I, Bolotin G, Laufer S (2025) Monocular pose estimation of articulated open surgery tools-in the wild. Medical Image Analysis. https://doi.org/10.1016/j.media.2025.103618
Tan M, Le Q (2021) Efficientnetv2: Smaller models and faster training. In: International conference on machine learning, PMLR, pp 10096–10106
Twinanda AP, Shehata S, Mutter D, Marescaux J, De Mathelin M, Padoy N (2016) Endonet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Trans Med Imaging 36(1):86–97. https://doi.org/10.1109/TMI.2016.2593957
Wei H, Xie R, Cheng H, Feng L, An B, Li Y (2022) Mitigating neural network overconfidence with logit normalization. In: International conference on machine learning, PMLR, pp 23631–23644, https://doi.org/10.48550/arXiv.2205.09310
Wilcoxon F (1992) Individual comparisons by ranking methods. In: Breakthroughs in statistics: Methodology and distribution. Springer, pp 196–202, https://doi.org/10.1007/978-1-4612-4380-9_16
Yi F, Wen H, Jiang T (2021) Asformer: Transformer for action segmentation. In: The British Machine Vision Conference (BMVC)
Zhang B, Goel B, Sarhan MH, Goel VK, Abukhalil R, Kalesan B, Stottler N, Petculescu S (2023) Surgical workflow recognition with temporal convolution and transformer for action segmentation. Int J Comput Assist Radiol Surg 18(4):785–794. https://doi.org/10.1007/s11548-022-02811-z
Zhang J, Wang Y, Tang J, Zou J, Fan S (2021) Ms-tcn: A multiscale temporal convolutional network for fault diagnosis in industrial processes. In: 2021 American Control Conference (ACC), IEEE, pp 1601–1606, https://doi.org/10.23919/ACC50511.2021.9482728