That is what transformer attention does in the first place, so you would just be stacking two transformers.
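For concreteness, here is a rough sketch (PyTorch, with illustrative dimensions and layer counts) of what that amounts to: feeding one transformer encoder's output into a second one is the same kind of computation as a single encoder with more layers.

```python
# Minimal sketch, assuming a standard encoder-only setup in PyTorch.
# d_model, nhead, and num_layers are arbitrary illustrative values.
import torch
import torch.nn as nn

d_model, nhead, num_layers = 512, 8, 6

def make_encoder() -> nn.TransformerEncoder:
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)

first, second = make_encoder(), make_encoder()

x = torch.randn(2, 16, d_model)      # (batch, sequence, embedding)
stacked_out = second(first(x))       # "attention over attention": two transformers in series

# Structurally this is just one deeper transformer:
deep_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
deep = nn.TransformerEncoder(deep_layer, num_layers=2 * num_layers)
deep_out = deep(x)                   # one encoder with twice the layers
```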