onnx-community/whisper-tiny · Hugging Face

Download

git clone https://huggingface.co/onnx-community/whisper-tiny

Compile the ONNX model

The compilation code looks like this:

import onnx
import tvm
from tvm import relax
from tvm.relax.frontend.onnx import from_onnx

def compile_model(onnx_path, target="llvm"):
    # 1. Load the ONNX model
    onnx_model = onnx.load(onnx_path)

    # 2. Convert to Relax IR (Relay is deprecated; use the Relax frontend)
    mod = from_onnx(onnx_model)

    # 3. Apply standard lowering/optimization passes
    #    (Sequential lives in tvm.transform, not relax.transform)
    seq = tvm.transform.Sequential([
        relax.transform.LegalizeOps(),
        relax.transform.FoldConstant(),
        relax.transform.DeadCodeElimination()
    ])
    mod = seq(mod)

    # 4. Build for the target
    ex = relax.build(mod, target)

    # 5. Save the compiled library next to the ONNX file
    output_path = onnx_path.replace(".onnx", ".so")
    ex.export_library(output_path)
    return output_path

# Compile both the encoder and the decoder
encoder_so = compile_model("encoder_model_fp16.onnx", target="llvm")
decoder_so = compile_model("decoder_with_past_model_fp16.onnx", target="llvm")

It turned out that the Gather operator is not supported:
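Since the failure comes from the frontend's operator coverage, a quick pre-check of which op types appear in the graph can save a compile attempt. A minimal sketch in pure Python; the `supported` set below is hypothetical, not the actual Relax frontend coverage list:

```python
def find_unsupported(model_op_types, supported_op_types):
    """Return op types present in the model that the frontend cannot convert."""
    return sorted(set(model_op_types) - set(supported_op_types))

# In practice, collect op types with: {node.op_type for node in onnx_model.graph.node}
model_ops = ["MatMul", "Gather", "Softmax", "Gather", "Conv"]
supported = {"MatMul", "Softmax", "Conv"}  # hypothetical coverage, for illustration only

print(find_unsupported(model_ops, supported))  # → ['Gather']
```

Running this before `from_onnx` would have flagged Gather up front instead of mid-conversion.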

Screenshot 2025-04-22 at 3.03.32 PM.png

Relay is no longer supported:

Screenshot 2025-04-22 at 2.58.45 PM.png

Inference

To-read

kvcache
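For reference while reading: the decoder_with_past model relies on caching the attention keys/values of already-generated tokens, so each decoding step only processes the newest token. A toy sketch of the mechanism in pure Python, with made-up vector sizes (real Whisper keeps one such cache per layer and per attention head):

```python
class KVCache:
    """Toy per-layer key/value cache for autoregressive decoding."""

    def __init__(self):
        self.keys = []    # one key vector per generated token
        self.values = []  # one value vector per generated token

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

cache = KVCache()
for step in range(3):        # pretend we decode 3 tokens
    k = [0.1 * step] * 4     # stand-in for the new token's key vector
    v = [0.2 * step] * 4     # stand-in for the new token's value vector
    cache.append(k, v)       # attention at the next step reads the whole cache

assert len(cache) == 3       # cache grows by one entry per decoded token
```

The point for compilation is that the cache makes the decoder's inputs grow by one entry per step, which is why its shapes are dynamic unless the cache is given a fixed maximum length.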

encoder → static shape
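The encoder can be compiled with static shapes because Whisper always runs it on a fixed 30-second window: shorter audio is zero-padded up to a fixed mel-frame count (3000 frames at Whisper's hop length, with 80 mel bins for the tiny model). A sketch of that padding step in pure Python:

```python
def pad_to_static(frames, target_len=3000, n_mels=80):
    """Zero-pad a list of mel-frame vectors up to the fixed encoder length."""
    padding = [[0.0] * n_mels for _ in range(target_len - len(frames))]
    return frames + padding

short_clip = [[1.0] * 80 for _ in range(1200)]  # e.g. a 12-second clip
padded = pad_to_static(short_clip)
assert len(padded) == 3000  # every input now matches the static encoder shape
```

In practice `WhisperFeatureExtractor` from transformers does this padding; the sketch just shows why the encoder never sees a dynamic sequence length.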

pip transformers > models > mode > mode_…py

transformer architecture

TVM MetaSchedule