In this section, a CP-SBI-DCNN-based CVD classifier that uses both ECG and PCG signals is proposed; it is depicted in Fig. 1. The workflow begins with the acquisition of the ECG and PCG signals. The ECG signals are then pre-processed to remove signal artifacts and improve the signal quality. Here, denoising, baseline wander removal, and isoelectric line correction are performed to remove the noisy components, discard the baseline deviations, and attain the exact signal amplitudes, respectively. Further, the PCG signal is pre-processed through windowing, variation evaluation, and denoising to retrieve accurate signal information. Thus, the signal discontinuities, signal point variation, and noisy factors are eliminated from the PCG signal. Subsequently, the pre-processed ECG and PCG signals are fused so that the waves, heartbeats, and signal features can be analyzed in a common time series. From the fused signal, wave detection is carried out by using PTA to analyze the ventricular depolarization of the heart. Then, the Systole, Diastole, S1, and S2 waveforms are localized by examining the signal patterns using the proposed AI-SWT. Here, the complexity of pattern analysis in the SWT is reduced by introducing the AI method along with the SWT. Afterward, the heart rate is extracted from the localized waveforms by using the Sum Slope technique to determine the signal amplitude. At the same time, the peak signal of the heart rate is identified by employing the Bayesian decision method.
Thereafter, both the extracted heart rate and the detected peak signals are given to the clustering process, which groups the normal and abnormal signal characteristics using the LK-DBSCAN algorithm. Here, the LK technique is included in DBSCAN to properly cluster similar dense signal values. Further, features are extracted from the abnormal signal cluster, and the optimal features are then selected from the extracted features by using the HeWaPBO algorithm. Here, the HeWa measure is utilized to calculate the bear movement accurately, without the scale-invariance problem, during exploration. Thus, the optimal solution is attained for feature selection. Meanwhile, the correlated signal patterns between the fused signals are identified by formulating the correlation matrix. Finally, the correlated patterns, the extracted heart rate, and the selected abnormal signal features are fed to the introduced CP-SBI-DCNN to categorize the heart disease into multiple classes, namely AV, TV, MV, PV, AF, and Ischemic heart disorder.

Fig. 1. Block diagram of the multi-class heart disease classification model by utilizing Noise Filtering (NF), Moving Average Filter (MAF), Pan Tompkins Algorithm (PTA), Algebraic Integer quantized Stationary Wavelet Transform (AI-SWT), Low-rank Kernelized Density-Based Spatial Clustering of Applications with Noise (LK-DBSCAN), Heming Wayed Polar Bear Optimization (HeWaPBO), and C squared Pool Sign BI-power-activated Deep Convolutional Neural Network (CP-SBI-DCNN) techniques.
Signal fusion
The fusion is accomplished by identifying common reference points, such as R peaks in the ECG signal and the onset of heart sounds in the PCG signal, to establish a consistent time scale for both signals followed by feature concatenation. In this phase, both pre-processed ECG \((\chi _p)\) and PCG \((\beta _p)\) signals are fused based on the time series of the signals, waves, and heartbeats of the ECG and PCG signals. Then, the fused signal \(S_F(t)\) is determined by,
$$\begin{aligned} S_F(t)=\chi _p(PQRST)+\beta _p(S_1,S_2,S_3,S_4)+\gamma (\chi _p,\beta _p) \end{aligned}$$
Here, PQRST represents the waves present in the ECG signal, \(S_1,S_2,S_3,S_4\) denote the recorded heartbeats in \((\beta _p)\), and \(\gamma\) denotes the other components present in both the \((\beta _p)\) and \((\chi _p)\) signals.
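The alignment step described above can be illustrated with a minimal sketch. It assumes both signals share the same sampling rate and that one reference index per signal (an R peak in the ECG, an S1 onset in the PCG) is already known; the function name `fuse_on_reference` and the toy samples are hypothetical and are not taken from the proposed model.

```python
def fuse_on_reference(ecg, pcg, ecg_ref, pcg_ref):
    """Align two equally sampled signals so that ecg[ecg_ref] and
    pcg[pcg_ref] fall on the same time index, then concatenate them
    sample by sample into (ecg, pcg) pairs."""
    # Number of samples available before the reference point in both signals.
    start = min(ecg_ref, pcg_ref)
    ecg_aligned = ecg[ecg_ref - start:]
    pcg_aligned = pcg[pcg_ref - start:]
    # Keep only the span covered by both signals.
    length = min(len(ecg_aligned), len(pcg_aligned))
    return [(ecg_aligned[i], pcg_aligned[i]) for i in range(length)]


# Toy data (assumed): R peak at index 2, S1 onset at index 3.
ecg = [0.0, 0.1, 1.0, 0.2, 0.0, 0.1]
pcg = [0.0, 0.0, 0.0, 0.8, 0.3, 0.0, 0.0]
fused = fuse_on_reference(ecg, pcg, ecg_ref=2, pcg_ref=3)
```

After alignment, the R peak and the S1 onset share one index of the fused series, which is the consistent time scale that the fusion step relies on.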
Wave detection
In this phase, the fused signal \(S_F(t)\) undergoes wave detection, particularly focusing on identifying the QRS complex, which signifies the ventricular depolarization in the heart, using the Pan Tompkins Algorithm (PTA). The window-integrated output \(W(I)\) is computed as follows,
$$\begin{aligned} W(I)=\frac{\lambda (T)}{NS} \end{aligned}$$
Here, \(\lambda\) denotes the windowing function, which is applied to the signal waveform feature (T), and NS represents the number of samples in the width of the integration window.
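As an illustration of the window integration, the following sketch averages the most recent NS samples at each index. It assumes a causal rectangular window and an input that has already passed the earlier Pan Tompkins stages; the name `moving_window_integration` and the sample values are hypothetical.

```python
def moving_window_integration(samples, ns):
    """Return the running mean over the last `ns` samples at each index.
    The first ns - 1 outputs behave as if the signal were zero-padded."""
    if ns <= 0:
        raise ValueError("ns must be positive")
    out = []
    running = 0.0
    for i, x in enumerate(samples):
        running += x
        if i >= ns:
            # Drop the sample that just left the window.
            running -= samples[i - ns]
        out.append(running / ns)
    return out


# Toy squared-derivative samples (assumed) with one QRS-like burst.
squared = [0.0, 0.0, 4.0, 16.0, 4.0, 0.0, 0.0]
integrated = moving_window_integration(squared, ns=3)
```

The integrated output reaches its maximum once the window covers the whole burst, which is the feature that the subsequent thresholding operates on.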
R peak detection
After the window integration, two thresholds are chosen: a higher threshold is employed to detect the R peaks, and a lower threshold is set for a search-back process. The R peak detection error rate \(\Re\), which relates the false and failed detections to the total number of R peaks, is determined by,
$$\begin{aligned} \Re =\frac{F_p+D_f}{NR} \end{aligned}$$
Here, \(F_p\) represents the false peak detection, \(D_f\) denotes the failure detection, and NR is the total number of R peaks. Finally, the R peak detected signals \((\Re _{sig})\) are obtained.
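The ratio above can be evaluated against annotated peaks as in the sketch below. The matching rule (a detection counts as correct when it lies within a fixed tolerance of an unmatched reference peak) is an assumption made for illustration, as are the function name `r_peak_error_rate` and the sample indices.

```python
def r_peak_error_rate(detected, reference, tolerance):
    """Return (false_peaks, missed_peaks, rate) where
    rate = (false_peaks + missed_peaks) / number of reference peaks."""
    if not reference:
        raise ValueError("reference must contain at least one R peak")
    matched = set()
    false_peaks = 0
    for d in detected:
        # Unmatched reference peaks close enough to this detection.
        candidates = [r for r in reference
                      if abs(r - d) <= tolerance and r not in matched]
        if candidates:
            matched.add(min(candidates, key=lambda r: abs(r - d)))
        else:
            false_peaks += 1  # F_p: detection with no reference peak nearby
    missed = len(reference) - len(matched)  # D_f: reference peaks never found
    return false_peaks, missed, (false_peaks + missed) / len(reference)


# Toy sample indices (assumed).
reference = [100, 300, 500, 700]
detected = [102, 298, 420, 705]
fp, df, rate = r_peak_error_rate(detected, reference, tolerance=10)
```

In this toy case the detection at index 420 is a false peak and the reference peak at 500 is missed, so two errors are counted against four R peaks.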
Following the wave detection process, the Systole, Diastole, S1, and S2 waveforms are precisely located in the \((\Re _{sig})\) signal by using the AI-SWT technique to identify the pattern of the signal. The AI-SWT addresses the shift-invariance and non-redundancy issues in the signal. The conventional SWT calculates the length of the signal from the lengths of both the approximation and detail coefficients at each level of analysis [32]. However, the random length calculation increases the complexity of the process. Therefore, Algebraic Integer Quantization is introduced to calculate the approximation and detail coefficients. The AI-SWT is derived by,
$$\begin{aligned} SW(\Re _{sig})=TF(\Re _{sig}*L_{sig}*WT) \end{aligned}$$
Where, \(SW(\Re _{sig})\) represents the wavelet transformation of the signal, TF denotes the transformation function, \(L_{sig}\) represents the length of the signal at each level, and WT is the applied wavelet. The length of the signal is determined by,
$$\begin{aligned} L_{sig}=A_k *d_k \end{aligned}$$
Where, \(A_k\) represents the approximation coefficient of the signal and \(d_k\) represents the detail coefficient of the signal. The heart sound localized signals \((\lambda _l)\) are then obtained and expressed by,
$$\begin{aligned} \lambda _l=(\lambda _{S1},\lambda _{S2},\lambda _{SS},\lambda _{DS}) \end{aligned}$$
Where, \(\lambda _{S1},\lambda _{S2},\lambda _{SS},\lambda _{DS}\) represent the S1, S2, Systole, and Diastole heart sound localized signals, respectively.
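To make the roles of the approximation and detail coefficients concrete, the sketch below computes one level of a plain stationary (undecimated) Haar transform. It is not the AI-SWT itself: the Algebraic Integer Quantization step is not reproduced, and the Haar filter, the periodic boundary handling, and the function names are assumptions chosen for brevity.

```python
import math


def haar_swt_level1(signal):
    """One level of an undecimated Haar transform with periodic extension.
    Both outputs keep the length of the input (no downsampling)."""
    n = len(signal)
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[(i + 1) % n]) for i in range(n)]
    detail = [s * (signal[i] - signal[(i + 1) % n]) for i in range(n)]
    return approx, detail


def haar_iswt_level1(approx, detail):
    """Invert haar_swt_level1: approx[i] + detail[i] = sqrt(2) * signal[i]."""
    s = 1.0 / math.sqrt(2.0)
    return [s * (a + d) for a, d in zip(approx, detail)]


signal = [1.0, 3.0, 2.0, 5.0]
approx, detail = haar_swt_level1(signal)
rebuilt = haar_iswt_level1(approx, detail)
```

Because nothing is decimated, each level yields as many approximation and detail coefficients as there are samples, so a localized event keeps its time index across levels.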
Consider the two valvular conditions, AV and PV. In Aortic Valve (AV) disease, increasing pressure on the left side of the heart causes ECG alterations such as left ventricular hypertrophy or left atrial enlargement. Pulmonary Valve (PV) disease affects the right side of the heart and can induce right ventricular overload and right atrial hypertrophy, which can influence the PCG signal. This physiological variance is reflected in the waveform timing of the ECG and PCG signals (e.g., the QRS complex or heart sounds like S1 and S2). The novel heartbeat localization technique, AI-SWT, is implemented to analyze these signal deviations for efficient classification using the CP-SBI-DCNN classifier.
Heart rate extraction
After localizing the heart sounds \((\lambda _l)\), the heart rate can be extracted using the Sum Slope technique. This method involves analyzing the signal to determine the rate of change in amplitude, often by measuring the slope of certain features related to the heartbeats. The heart rate \(H_{RE}\) is extracted by,
$$\begin{aligned} H_{RE}=\Gamma (\lambda _l), \qquad \Gamma =\sum \limits _{q=w}^n\Delta (\lambda _l)_q \end{aligned}$$
Where, \(\Gamma\) represents the Sum Slope Function, \(\Delta (\lambda _l)_q\) represents the slope of the signal, and w represents the window of the signal.
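A windowed slope sum of this kind can be sketched as follows. The slope is taken here as the first difference between consecutive samples, and the option to keep only rising slopes (`positive_only`) is an assumption borrowed from common slope-sum practice rather than something stated above; the function name and the sample values are hypothetical.

```python
def sum_slope(signal, w, positive_only=True):
    """For each slope index q, sum the slopes inside the last w positions."""
    slopes = [signal[i] - signal[i - 1] for i in range(1, len(signal))]
    if positive_only:
        # Assumption: only rising edges contribute to the sum.
        slopes = [max(0.0, d) for d in slopes]
    out = []
    for q in range(len(slopes)):
        start = max(0, q - w + 1)
        out.append(sum(slopes[start:q + 1]))
    return out


# Toy localized heart sound envelope (assumed) with two onsets.
localized = [0.0, 0.1, 0.9, 1.0, 0.3, 0.0, 0.0, 0.2, 1.1, 0.9]
gamma = sum_slope(localized, w=3)
total_change = sum_slope(localized, w=len(localized), positive_only=False)[-1]
```

The output rises sharply at each onset and falls back to zero in flat regions, so the spacing between its maxima reflects the beat-to-beat interval.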
Peak signal detection
The peaks of the signal are detected from \((\lambda _l)\) by employing the Bayesian decision technique. Peak detection involves identifying the positions of the peaks within a signal. The p detected peak signals \(S_{peak}\) are obtained as,
$$\begin{aligned} S_{peak}=(S_1,S_2,S_3,\dots ,S_p) \end{aligned}$$
where, \((S_1,S_2,S_3,\dots ,S_p)\) denotes the p peak signals present in the localized heartbeat sound.
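Since the decision rule is only named above, the following is one possible reading rather than the method actually used: each local maximum is assigned to a "peak" or "noise" class by comparing prior-weighted Gaussian likelihoods of its amplitude. The Gaussian class models, their parameters, the prior, and the function names are all assumptions introduced for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bayes_peaks(signal, peak_model, noise_model, prior_peak=0.5):
    """Return indices of local maxima whose posterior favors the peak class.
    peak_model and noise_model are (mean, std) pairs for the amplitude."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if not is_local_max:
            continue
        # Unnormalized posteriors: likelihood times prior for each class.
        p_peak = gaussian_pdf(signal[i], *peak_model) * prior_peak
        p_noise = gaussian_pdf(signal[i], *noise_model) * (1.0 - prior_peak)
        if p_peak > p_noise:
            peaks.append(i)
    return peaks


# Toy amplitudes (assumed): two strong peaks among small ripples.
signal = [0.0, 0.2, 0.1, 1.0, 0.2, 0.15, 0.25, 0.1, 0.9, 0.1]
s_peak = bayes_peaks(signal, peak_model=(1.0, 0.2), noise_model=(0.2, 0.1))
```

With these assumed class models, the ripples at indices 1 and 6 are attributed to noise, and only the maxima at indices 3 and 8 are retained.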
In the clustering phase, \(H_{RE}\) and \(S_{peak}\) are clustered into two distinct groups, normal and abnormal, using the Low-rank Kernelized Density-Based Spatial Clustering of Applications with Noise (LK-DBSCAN) technique. The algorithm operates by identifying regions of high data point density and extracting neighbouring points within a certain distance. This process is repeated iteratively until all the relevant neighbourhood points are grouped together, effectively forming distinct clusters [33]. The LK-DBSCAN-based clustering model initializes the data points of the signal, which are expressed by,
$$\begin{aligned} DP(H_{RE},S_{peak})=(D_1,D_2,D_3,\dots ,D_{np}) \end{aligned}$$
Where, np represents the number of data points in the signal. At first, a randomly selected data point is referred to as an unvisited data point \((DP)_\upsilon\). Then, all of its neighboring points are found by,
$$\begin{aligned} ND=\varepsilon (DP(H_{RE},S_{peak})), \qquad \varepsilon =\sum \limits _{j=1}^{np}\phi _jK_{pq} \end{aligned}$$
Here, ND represents the neighboring data points and \(\varepsilon\) represents the radius of the particular density of the data point area. \(\varepsilon\) is estimated by multiplying the weight parameter \(\phi _j\) with a low-rank kernel matrix \(K_{pq}\), which is determined by,
$$\begin{aligned} K_{pq}=K(y_i,y_j) \end{aligned}$$
Here, \(y_i,y_j\) represent the data point coordinates. Next, the minimum number of neighboring points Minpts is defined by,
$$\begin{aligned} \textit{Minpts}=\sum \limits _{j=1}^{np}\rho _j \end{aligned}$$
Where, \(\rho _j\) represents the \(j^{th}\) value of the density measured in a specific data point. The distance \((\textit{dist})\) between the neighboring points (u,v) is calculated by,
$$\begin{aligned} \textit{dist}=\sum \limits _{j=1}^{np}(u_j-v_j)^2 \end{aligned}$$
Where, \(u_j\) and \(v_j\) indicate the two neighbouring data points having the \(j^{th}\) density value in the fused signal. When the distance between two points of the fused ECG and PCG signal is less than or equal to \(\varepsilon\), these points are considered neighbours; otherwise, the point is denoted as noise. If a visited data point is not yet assigned to a cluster, a new cluster is created for it. This process is repeated until every visited data point has been assigned to a cluster or marked as noise. Finally, the data points of the signals are clustered \((DP_{cluster})\) into normal \((N_{sig})\) and abnormal \((CD_{sig})\) signals, which is expressed as,
$$\begin{aligned} DP_{cluster}=(N_{sig},CD_{sig}) \end{aligned}$$
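The visiting, neighbourhood, and noise rules described above follow the standard DBSCAN procedure, which can be sketched as below. This is plain DBSCAN with the squared distance from the dist equation and a fixed radius; the low-rank kernel estimate of \(\varepsilon\) is not reproduced, and the toy (heart rate, peak amplitude) points, `eps`, and `min_pts` values are assumptions.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.
    eps is compared against the squared distance between points."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        return [j for j in range(len(points))
                if squared_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1  # start a new cluster from this core point
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Toy (heart rate, peak amplitude) points (assumed), with one outlier.
points = [(70, 1.0), (72, 1.1), (71, 0.9), (73, 1.0),
          (128, 0.4), (130, 0.5), (131, 0.3), (129, 0.4),
          (200, 2.0)]
labels = dbscan(points, eps=10.0, min_pts=3)
```

The algorithm only separates dense groups from noise; deciding which group is the normal one and which is the abnormal one requires an additional criterion that this sketch does not include.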
The Low-rank Kernelized Density-Based Spatial Clustering of Applications with Noise (LK-DBSCAN) technique plays a pivotal role in distinguishing between normal and abnormal signals, which is critical for effective cardiovascular disease classification. Its role and impact are as follows:
Separation of Normal and Abnormal Signals: LK-DBSCAN is employed to analyze the density and spatial distribution of signal data points. By utilizing kernelized techniques, it captures non-linear relationships in the feature space, enabling more precise clustering of signals. Low-rank approximations reduce computational complexity while preserving essential features, enhancing the clustering process.
Noise Reduction and Signal Purification: This method isolates outliers and noise within the signal data, which often interfere with accurate classification. The separation ensures that only high-quality signal features are used for downstream processing.
Feature Enhancement for Classification: By segregating normal and abnormal signals, LK-DBSCAN ensures that the subsequent classification models operate on more homogeneous and well-defined data groups. This separation improves the accuracy of identifying specific disease categories, such as valvular diseases, atrial fibrillation, and coronary artery diseases.
Impact on Overall Performance: Together, these effects improve the classification accuracy, the processing efficiency, and the interpretability of the model.
Feature extraction
From the \((CD_{sig})\) signals, important features are extracted. Here, the PCG signal features, such as Mean Absolute Deviation (MAD), Inter Quartile Range (IQR), skewness, Shannon’s entropy, maximum frequency, dynamic range, total harmonic distortion, maximum amplitude, power, mean, variance, root mean square error, bandwidth, mid-frequency, average frequency, Cepstrum peak amplitude, Mel-frequency cepstral coefficients (MFCCs), and kurtosis, are extracted from the signals. These features provide valuable information about the characteristics of the phonocardiogram signals. Then, the ECG signal features, such as interval measurement, morphology features, standard deviation, mean, correlation, and temporal features, are also extracted.
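A few of the time-domain statistics listed above can be computed as in the following sketch. It covers only a subset (the spectral and cepstral features are omitted), uses population moments and the default quartile method of the Python `statistics` module, and includes a plain root mean square value, all of which are assumptions rather than the settings used in the proposed model.

```python
import math
import statistics


def basic_features(x):
    """Compute a small set of time-domain features for a list of samples."""
    n = len(x)
    mean = statistics.fmean(x)
    var = statistics.pvariance(x, mu=mean)  # population variance
    std = math.sqrt(var)
    q1, _, q3 = statistics.quantiles(x, n=4)  # quartile cut points
    return {
        "mean": mean,
        "variance": var,
        "mad": sum(abs(v - mean) for v in x) / n,
        "iqr": q3 - q1,
        "skewness": sum((v - mean) ** 3 for v in x) / (n * std ** 3),
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * std ** 4),
        "rms": math.sqrt(sum(v * v for v in x) / n),
        "dynamic_range": max(x) - min(x),
    }


features = basic_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

A constant signal has zero variance, so the skewness and kurtosis terms would need a guard before this sketch is applied to flat segments.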
Feature selection
From the extracted features \(( \vartheta )\), the important features are selected by using the HeWaPBO algorithm. The conventional PBO algorithm is based on the Polar Bear’s (PB) food searching and hunting behaviour. This algorithm selects features effectively, but the bear motion is calculated with the Euclidean distance measure, which causes sudden changes in the direction of the bear and leads to a scale-invariance problem. To address this issue, the model uses the Heming Way distance calculation instead of the Euclidean distance calculation, which resolves the sudden changes in direction. The HeWaPBO is derived as follows. The population of PBs (features \(( \vartheta )\)) is initialized, and the optimum solution is sought in the global and local search spaces. The population of PBs \((\partial B)\) is expressed by,
$$\begin{aligned} \partial B=(\partial B_1,\partial B_2,\dots ,\partial B_s) \end{aligned}$$
The PB’s nature to glide on an iceberg in the exploration of food in the global search space is expressed by,
$$\begin{aligned} \partial B_M=\partial B^{I}_{s,r}+\tau *\gamma _1+\gamma _2 \end{aligned}$$
Where, \(\partial B_M\) is the movement of the PB, \(\partial B^{I}_{s,r}\) represents the \(s^{th}\) PB in the \(r^{th}\) coordinate within the \(I^{th}\) iteration, \(\tau\) denotes the distance between the current PB and the optimum PB, and \(\gamma _1,\gamma _2\) are random numbers. The distance \(\tau\) is calculated by using the Heming Way distance calculation, and it is determined by,
$$\begin{aligned} \tau =\sum \limits _{r=1}^{n+1}\sigma (\partial B_{s,r},\partial B_{b,r}) \end{aligned}$$
Here, \(\partial B_{b,r}\) represents the best PB, \(\sigma\) represents the Heming Way function parameter, and n+1 represents the last coordinate of the position of the bear. In the local search space, the polar bears are surrounded by prey, and the search is characterized by two parameters: the distance vision parameter (v) and the angle of tumbling \((\phi )\). The distance vision radius is determined by,
$$\begin{aligned} R=4v\cos (\phi )\sin (\phi ) \end{aligned}$$
The local search space of each PB is identified by using the vision radius, and the position is updated by,
$$\begin{aligned} \partial B^{new}_s=\partial B^{actual}_s \pm \left[ \sum \limits _{g=1}^s R \sin (\phi _g)+R \cos (\phi _g)\right] \end{aligned}$$
Where, \((\partial B^{new}_s)\) represents the new position of the bear, \((\partial B^{actual}_s)\) denotes the actual position of the bear, and g indexes the bears. Then, the fitness (high classification accuracy) of the population \(\textit{fit}(\partial B)\) is defined as,
$$\begin{aligned} \textit{fit}(\partial B)=\textit{acc}_{max} \end{aligned}$$
Here, \(\textit{acc}_{max}\) denotes the maximum classification accuracy. The population growth of the bear depends on the reproduction of the best and starvation of the worst among the population. The dynamic population is controlled based on the random constant \(\psi\), and it is expressed by,
$$\begin{aligned} \left\{ \begin{array}{ll} \partial B_{\text {Reproduction}}, & \psi > 0.75 \\ \partial B_{\text {Death}}, & \psi < 0.25 \\ \end{array} \right. \end{aligned}$$
Where, \(\partial B_{\text {Reproduction}}\) denotes the reproduction rate of the bear, and \(\partial B_{\text {Death}}\) is the death rate of the bear. The newly reproduced population \(\partial B_{\text {Rep}}\) is determined by,
$$\begin{aligned} \partial B_{\text {Rep}}=\frac{ \partial B_{\text {best}}+\partial B_{\text {s}}}{2} \end{aligned}$$
Here, \(\partial B_{\text {s}}\) implies the \(s^{th}\) PB and \(\partial B_{\text {best}}\) denotes the best polar bear in the \(I^{th}\) iteration. This process is repeated until the best solution is achieved. After executing all steps, 17 features are chosen as the optimum features \((\vartheta _\textit{best})\) based on the fitness function. The pseudocode for the HeWaPBO algorithm is given in Algorithm 1.
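The individual update rules above can be sketched as small helper functions. This is a simplified illustration and not Algorithm 1: the distance \(\tau\) is read as a weighted count of differing coordinates, the random numbers are drawn uniformly from [0, 1), and the choices of which bear reproduces and which bear dies are assumptions; all function names are hypothetical.

```python
import math
import random


def hamming_distance(bear, best, weight=1.0):
    """tau: weighted count of coordinates where the two positions differ."""
    return weight * sum(1 for a, b in zip(bear, best) if a != b)


def global_move(bear, best, rng):
    """Shift each coordinate by tau * gamma_1 + gamma_2 (random gammas)."""
    tau = hamming_distance(bear, best)
    return [x + tau * rng.random() + rng.random() for x in bear]


def vision_radius(v, phi):
    """R = 4 v cos(phi) sin(phi)."""
    return 4.0 * v * math.cos(phi) * math.sin(phi)


def reproduce(best, bear):
    """Offspring as the coordinate-wise mean of the best bear and bear s."""
    return [(b + s) / 2.0 for b, s in zip(best, bear)]


def update_population(population, best, psi):
    """psi > 0.75 adds one offspring; psi < 0.25 removes one bear.
    Assumption: the list is sorted best first, so the last bear is the worst."""
    population = list(population)
    if psi > 0.75:
        population.append(reproduce(best, population[0]))
    elif psi < 0.25 and len(population) > 1:
        population.pop()
    return population


rng = random.Random(0)
best = [1, 1, 0, 1]
bear = [1, 0, 1, 1]
tau = hamming_distance(bear, best)
moved = global_move(bear, best, rng)
radius = vision_radius(0.3, math.pi / 4)
grown = update_population([bear, best], best, psi=0.9)
shrunk = update_population([bear, best], best, psi=0.1)
```

Counting differing coordinates does not depend on the magnitude of the features, which is one way to read the motivation for replacing the Euclidean distance.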

In the proposed model, the HeWaPBO algorithm uses the Hamming weighted distance instead of the Euclidean distance. The Hamming distance is scale-invariant and improves feature selection by avoiding erratic movements and local traps in the optimization process. It enhances the stability and efficiency of the model by accurately selecting critical features from the ECG and PCG signals. The key hyperparameters of PBO are: (a) population size: 50; (b) number of iterations: 100; (c) learning parameters: exploration parameter 0.8 and exploitation parameter 0.5; and (d) Hamming distance weight: 0.5.
Trial-and-error experiments were conducted to determine the optimal values for these parameters. Various population sizes (30, 40, 80), iterations (50, 75, 120), exploration parameters (0.2, 0.5, 1), exploitation parameters (0.2, 0.5, 0.8), and Hamming distance weights (0.4, 0.6, 0.8) were tested. Among the tested settings, the values listed above gave the best model performance.
Correlation matrix generation
Generating a correlation matrix from \(S_F(t)\) is a common technique to identify patterns and relationships within the data. Each entry in the correlation matrix represents the correlation coefficient between two variables. The variables are referred to as different data points or time series values within the fused signal. The correlation coefficient \(\upsilon (S_F(t))\) is determined by,
$$\begin{aligned} \upsilon (S_F(t)) = \frac{N\sum (bc) - (\sum b)(\sum c)}{\sqrt{[N\sum b^2 - (\sum b)^2][N\sum c^2 - (\sum c)^2]}} \end{aligned}$$
Where, N denotes the number of data points in \((S_F(t))\) and b, c are the data point coordinates of \((S_F(t))\). Based on \(\upsilon (S_F(t))\), the correlation matrix \((CR_{b,c})\), containing nc correlation values, is generated, and it is defined by,
$$\begin{aligned} \begin{aligned} (CR_{b,c}) = \begin{bmatrix} CR_1 & CR_2 & CR_3\\ CR_4 & CR_5 & CR_6\\ CR_7 & \dots & CR_{nc} \end{bmatrix} \end{aligned} \end{aligned}$$
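The coefficient above is the Pearson correlation written in its sum form, and the matrix collects it for every pair of variables. The sketch below implements that formula directly; treating each variable as one list of samples, as well as the function names and the toy series, are assumptions for illustration.

```python
import math


def pearson(b, c):
    """Pearson correlation using the sum form of the formula.
    A constant series makes the denominator zero and is not handled here."""
    n = len(b)
    sum_b, sum_c = sum(b), sum(c)
    sum_bc = sum(x * y for x, y in zip(b, c))
    sum_b2 = sum(x * x for x in b)
    sum_c2 = sum(y * y for y in c)
    numerator = n * sum_bc - sum_b * sum_c
    denominator = math.sqrt((n * sum_b2 - sum_b ** 2)
                            * (n * sum_c2 - sum_c ** 2))
    return numerator / denominator


def correlation_matrix(series):
    """Square matrix whose (i, j) entry correlates series i with series j."""
    return [[pearson(a, b) for b in series] for a in series]


# Toy series (assumed): the second rises with the first, the third falls.
first = [0.0, 1.0, 2.0, 3.0, 4.0]
second = [1.0, 3.0, 5.0, 7.0, 9.0]
third = [4.0, 3.0, 2.0, 1.0, 0.0]
cr = correlation_matrix([first, second, third])
```

The resulting matrix is symmetric with ones on the diagonal, so only the upper triangle carries independent information for the classifier input.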
In this section, \((\vartheta _\textit{best})\), \((CR_{b,c})\), and \((H_{RE})\) are input to the CP-SBI-DCNN classifier for classifying the types of CVDs into Valvular diseases {AV, TV, MV, PV}, Arrhythmia {AF}, and CAD {Ischemic heart disorder}. The traditional DCNN effectively minimized computation while achieving accurate disease classification within an extensive dataset [34]. However, the DCNN model required more time due to the max pool operation. To overcome this issue, the proposed model replaces the max pool layer with a Csquared pool layer and also includes the Sign BI-Power activation function to trigger the neurons more efficiently. The pseudocode for heart disease classification is given in Algorithm 2.

The input layer of the neural network receives the best features of the abnormal signal \((\vartheta _\textit{best})\), the correlation matrix of the fused ECG and PCG signal \((CR_{b,c})\), and the heart rate that is analyzed from the localized waveforms of the fused signal \((H_{RE})\). Then, the input layer transmits this input to the convolutional layer (CL). Here, the parameters, such as weights, biases, the maximum number of iterations, and the number of input layers, are initialized. The CL is responsible for learning and extracting relevant signal features through a set of convolutional filters.
Csquared pooling layer
The Csquared pooling layer performs the downsampling operation. It reduces the spatial dimensions of the feature maps, helping the neural network focus on the most important information while reducing the computational load. The Csquared pooling layer output \((CS_L)\) is defined by,
$$\begin{aligned} CS_L=\Phi [CL*\omega *bs] \end{aligned}$$
Here, \(\Phi\) represents the Csquared pooling operation. The spatial dimension is reduced by using the flattening operation \((FL_{op})\), and it is expressed by,
$$\begin{aligned} FL_{op}=FL(CS_L) \end{aligned}$$
Output layer
The output layer executes the final classification based on the Sign BI-Power activation function, and it is expressed by,
$$Op = \bar{\lambda} \left[ {FC_{L} } \right], \qquad \bar{\lambda} \left( {FC_{L} } \right) = \frac{1}{2}sig^{r} \left( {FC_{L} } \right) + \frac{1}{2}sig^{1/r} \left( {FC_{L} } \right)$$
Where, \(\bar{\lambda}\) represents the Sign BI-Power activation function, \(sig^r(x)=|x|^r\,\mathrm{sign}(x)\), and r is the power parameter. Finally, this output classifies the ECG-PCG signals into six classes: AV, TV, MV, PV, AF, and Ischemic heart disorder.
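For illustration, the activation can be sketched under the assumption that it takes the usual sign-bi-power form, in which the two half-weighted terms use the exponents r and 1/r and \(sig^r(x)=|x|^r\,\mathrm{sign}(x)\). The default value r = 0.5 and the function names are assumptions and are not taken from the proposed model.

```python
import math


def sig_pow(x, r):
    """sig^r(x) = |x|^r * sign(x), with sig^r(0) = 0."""
    if x == 0:
        return 0.0
    return math.copysign(abs(x) ** r, x)


def sign_bi_power(x, r=0.5):
    """Average of sig^r(x) and sig^(1/r)(x), assuming 0 < r < 1."""
    return 0.5 * sig_pow(x, r) + 0.5 * sig_pow(x, 1.0 / r)


outputs = [sign_bi_power(x) for x in (-4.0, 0.0, 1.0, 4.0)]
```

The function is odd and passes through the origin; the root term dominates for inputs of small magnitude and the power term dominates for inputs of large magnitude.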