Redefining parameter efficiency in ADHD diagnosis: A lightweight attention-driven Kolmogorov-Arnold Network with reduced parameter complexity and a novel activation function

Attention Deficit Hyperactivity Disorder (ADHD) is one of the most common and complex neurodevelopmental disorders, marked by diminished focus, excessive restlessness, and spontaneous, impulsive actions. Around 5–9% of school-aged children and 2–4% of adults worldwide are affected by this disorder (Polanczyk and Jensen, 2008). ADHD is diagnosed through a comprehensive clinical assessment by an experienced child psychiatrist, drawing on insights gathered from the child’s parents and teachers. However, this method is subjective and time-intensive, so functional magnetic resonance imaging (fMRI) is gaining recognition as a more effective alternative for ADHD diagnosis (Dey et al., 2014). By measuring the functional connectivity between different brain regions, fMRI provides a more objective and efficient approach to analyzing this disorder.

In the last decade, deep learning has revolutionized the healthcare sector, driving substantial advances in medical research, clinical practice, and diagnostics. The Multilayer Perceptron (MLP) forms the foundational block of deep learning models (Haykin, 1999), and building on this foundation, numerous Convolutional Neural Networks (CNNs) and their variants have been designed to diagnose neurological disorders. A CNN consists of multiple layers of neurons, where each layer is connected to the next through weighted links that are optimized during training to improve performance. These layers also contain activation functions that introduce non-linearity (Sarkar et al., 2022), enabling the network to capture the complex, intricate patterns inherent in the data. Collectively, these elements allow CNNs to learn, adapt, and generate precise predictions, making them an indispensable tool in medical image analysis and diagnosis.

In modern healthcare, the complexity of medical data has increased significantly, especially with multi-dimensional imaging modalities such as fMRI, computed tomography, and histopathology. To handle this growing complexity, researchers have developed increasingly deep CNN architectures that extract the inherent relationships and finer details in the data. While these deep networks have shown impressive performance, they bring associated challenges as well. The increased depth leads to millions of network parameters, significantly increasing computational complexity and making CNNs resource-intensive in terms of memory and processing power (Zhao et al., 2024). Consequently, specialized hardware such as Field Programmable Gate Arrays, Coarse-Grained Reconfigurable Arrays, and Graphics Processing Units is often required to manage the extensive processing demands (Dhilleswararao et al., 2022). Reliance on such hardware escalates costs significantly and limits accessibility, creating barriers for resource-constrained researchers. Furthermore, the vast number of parameters increases the risk of overfitting, especially when training data is limited, impairing the model's generalization. It also hampers the model's interpretability, posing substantial obstacles to real-world clinical deployment.

This paper proposes a robust and parameter-efficient approach to ADHD diagnosis using the Kolmogorov-Arnold Network (KAN) (Liu et al., 2024). KAN has recently been proposed as a promising alternative to MLPs due to its unique spline-based structure. Unlike traditional MLPs, which use fixed activation functions at nodes, KAN places learnable, adaptive spline functions (Dubey et al., 2022) along its edges. This shift from a static to a dynamic mechanism allows KAN to match or outperform MLPs, achieving comparable or superior accuracy with significantly fewer parameters (Cheon, 2024; Hou, 2024).
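To make the node-versus-edge distinction concrete, the following minimal sketch implements a single KAN-style edge activation phi(x) = w_b * SiLU(x) + sum_i c_i * basis_i(x), where the coefficients c_i and w_b would be the learnable quantities. For brevity, smooth Gaussian bumps stand in for the B-spline basis used in the original KAN formulation; the class and parameter names are illustrative, not the paper's implementation.

```python
import numpy as np

def silu(x):
    # SiLU (sigmoid-weighted linear unit): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

class KANEdge:
    """One learnable edge activation in the spirit of KAN:
    phi(x) = w_b * silu(x) + sum_i c_i * basis_i(x).
    Gaussian bumps stand in for the B-spline basis for brevity."""
    def __init__(self, n_basis=8, x_min=-2.0, x_max=2.0, rng=None):
        rng = rng or np.random.default_rng(0)
        self.centers = np.linspace(x_min, x_max, n_basis)  # bump positions on the input grid
        self.width = (x_max - x_min) / n_basis             # shared bump width
        self.coeffs = 0.1 * rng.standard_normal(n_basis)   # learnable spline coefficients c_i
        self.w_b = 1.0                                     # learnable residual weight on SiLU

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        # Evaluate every bump at every input: shape (..., n_basis)
        bumps = np.exp(-((x[..., None] - self.centers) / self.width) ** 2)
        return self.w_b * silu(x) + bumps @ self.coeffs
```

Because the shape of phi itself is trained, a KAN layer with few edges can represent functions that an MLP would need many fixed-activation units to approximate, which is the source of the parameter savings the paper relies on.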

Neural networks generally require large amounts of data for effective training, and the scarcity of labeled medical data limits their generalization. To overcome this challenge, this paper uses an efficient sliding window approach (Zhu et al., 2023) to systematically generate diverse fMRI segments from the original dataset. This approach not only expands and diversifies the training data but also enhances model robustness and generalization. Widely used in recent research (Lashgari et al., 2020; Mao et al., 2019; Pei et al., 2023; Yu and Hui, 2024), the sliding window approach has proven to be a powerful strategy for improving model performance and generalization across various medical applications.
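The segmentation step can be sketched as follows: an fMRI recording of T timepoints over R regions of interest is cut into overlapping segments, each of which becomes a training sample. The window length and stride below are illustrative placeholders, not the values used in the study.

```python
import numpy as np

def sliding_windows(ts, win_len, stride):
    """Split a (T, R) time series (T timepoints, R ROIs) into
    overlapping segments of length win_len, taken every stride steps.
    Returns an array of shape (n_segments, win_len, R)."""
    T = ts.shape[0]
    starts = range(0, T - win_len + 1, stride)
    return np.stack([ts[s:s + win_len] for s in starts])

# Example: a 100-timepoint recording over 5 ROIs yields 7 overlapping
# 40-timepoint segments when the stride is 10.
ts = np.arange(100 * 5, dtype=float).reshape(100, 5)
segments = sliding_windows(ts, win_len=40, stride=10)
```

Overlap between consecutive windows is what multiplies the sample count: with stride smaller than win_len, each timepoint contributes to several segments, enriching the training set without collecting new scans.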

To further reduce computational cost, this study proposes integrating an attention mechanism (Soydaner, 2022) into the initial layers as an intelligent feature selection step that filters out irrelevant, redundant, or misleading features early in the pipeline. Many contemporary neurological studies (Bakhtyari and Mirzaei, 2022; Kim et al., 2023; Yang et al., 2021; Zhang et al., 2020) embed the attention mechanism in later stages, forcing the network to process the entire feature set, including irrelevant data. The proposed approach prevents the unnecessary propagation of irrelevant features, ensuring a more efficient and focused learning process.
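A minimal sketch of such early attention gating is shown below: each input feature is scored, the scores are normalized with a softmax, and low-scoring features are suppressed by element-wise reweighting before anything deeper runs. The weight matrix W and the single-projection scoring scheme are assumptions for illustration; the paper's attention block may be parameterized differently.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax along the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_gate(x, W):
    """Early attention gating: score each input feature with a learned
    projection, then reweight the features so that irrelevant ones are
    attenuated before they propagate into the rest of the network."""
    scores = softmax(x @ W)   # (batch, n_features), rows sum to 1
    return x * scores         # element-wise gating of the input
```

Placing this gate before the main network means downstream layers only see a sharpened version of the input, which is what allows the rest of the architecture to stay shallow and parameter-light.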

A key contribution of this study is the introduction of the Enhanced Mish activation function, designed to improve the efficiency, adaptability, and accuracy of KAN in analyzing complex brain connectivity data. Traditional KAN models rely on a composite activation function, primarily built on the Sigmoid Linear Unit (SiLU) (Le, 2017) combined with a spline function (Liu et al., 2024). While SiLU offers smooth gradients and non-monotonicity, it can suffer from gradient saturation (Zhang et al., 2023) and limited efficiency in tasks requiring enhanced feature extraction. Given the complex and dynamic nature of brain connectivity data, this can restrict the model's ability to capture crucial insights. To address this issue, the present study proposes a novel activation function, Enhanced Mish, which builds upon the Mish activation (Misra, 2019) by introducing learnable scale and shift coefficients. These coefficients are optimized during training, enabling the activation functions to adapt to evolving brain connectivity patterns and extract complex, nonlinear insights (SI, 2022) more effectively.
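Standard Mish is defined as mish(x) = x * tanh(softplus(x)). One plausible reading of "learnable scale and shift coefficients" is a * mish(x + b), sketched below; the exact placement of the coefficients in the paper's Enhanced Mish is an assumption here, and in training the scale and shift would be optimized alongside the network weights rather than fixed.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x))
    return np.logaddexp(0.0, x)

def mish(x):
    # Standard Mish (Misra, 2019): x * tanh(softplus(x))
    return x * np.tanh(softplus(x))

def enhanced_mish(x, scale=1.0, shift=0.0):
    """Sketch of Enhanced Mish: Mish with a learnable scale and shift.
    The parameterization scale * mish(x + shift) is an assumption for
    illustration; scale and shift would be trained, not fixed."""
    return scale * mish(np.asarray(x, dtype=float) + shift)
```

With scale = 1 and shift = 0 the function reduces exactly to standard Mish, so the learnable form can only match or improve on the baseline as training moves the coefficients away from their identity values.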

By amalgamating these components into a unified architecture, as illustrated in Fig. 1, the proposed model achieves a substantial reduction in network parameters and minimizes the computational complexity while maintaining optimal performance for ADHD diagnosis.

The key contributions of this paper are presented below:

(i) A highly precise and parameter-efficient model leveraging KAN is proposed for ADHD diagnosis, prioritizing computational efficiency and resource optimization.

(ii) To counter the data-hungry nature of neural networks, an effective sliding window-based data augmentation technique is integrated, generating a more diverse and enriched dataset and enhancing the model's generalization capabilities.

(iii) An advanced feature selection strategy, guided by an attention mechanism, is implemented to refine the model's focus on meaningful patterns while preventing the propagation of irrelevant data into the network, thereby significantly reducing computational overhead.

(iv) A novel activation function, Enhanced Mish, is introduced by integrating learnable scale and shift coefficients into the Mish activation. This enhancement enables the model to adapt dynamically to diverse data patterns, significantly improving its performance and adaptability.

(v) The proposed architecture reduces the parameter count to just a few thousand, compared to the millions of parameters required in existing ADHD studies, making it well suited to resource-constrained researchers and real-world clinical use.

(vi) Despite its shallow architecture, the proposed model achieves a superior accuracy of 79.25%, an F1-score of 78.75%, and a precision of 78.23%, outperforming many state-of-the-art deep learning approaches for ADHD.

The remainder of this paper is structured as follows: the second section gives a brief description of related work; the third section explains the methods and materials used in the current study; the fourth section provides details of the experimental setup and the results of the proposed model; the fifth section discusses the empirical outcomes; and the final section concludes the paper.
