Inputs are first passed through a fully connected layer and then into the two-layer residual multi-head attention block, as shown in Fig. 7. Residual networks (He et al., 2016) add shortcut (skip) connections that help mitigate exploding or vanishing gradients during training. The fully connected layers while i