This already exists in Transformer Lab and ONNX (not recommended for transformers).
You can also build a custom version of llama.cpp that writes out the ggml compute graph. What's irritating is that Hugging Face didn't add it to their GGUF file viewer.
Oh, sure, for the well-known models that are already on there.
I just wish that new research would always spell it out in full instead of using these silly block diagrams labelled with just e.g. "Cross Attention", without the exact parameters, number of heads, layer sizes, etc.
Also, some of these diagrams use a + for concatenation and some use it for addition. That's another headache to figure out; having layer sizes would make it clear.
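To illustrate why layer sizes settle it, here's a rough sketch. The function name `infer_plus` and the widths are made up for illustration, not taken from any paper or library. It assumes the usual conventions: element-wise addition keeps the feature width, while concatenation along the feature axis sums the two widths.

```python
def infer_plus(in_a: int, in_b: int, out: int) -> str:
    """Guess what a '+' node in a block diagram means from the layer widths.

    Hypothetical helper: assumes addition is element-wise (same width in and
    out) and concatenation happens along the feature axis (widths add up).
    """
    if in_a == in_b == out:
        return "addition"        # element-wise sum keeps the width
    if in_a + in_b == out:
        return "concatenation"   # widths add up along the feature axis
    return "unknown"             # e.g. broadcasting or a projection in between


print(infer_plus(512, 512, 512))    # residual-style sum
print(infer_plus(512, 512, 1024))   # two branches stacked side by side
print(infer_plus(512, 256, 768))    # unequal branches can only be concatenated
```

Without the widths on the diagram, none of these cases can be told apart, which is the whole problem.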