IN-HOUSE EDA PART 1

VHE — Virtual Hardware Emulator

Million-Gate Simulation at Zero License Cost
"When you can't rent, you build."
WIOWIZ Technologies • January 2026

1. The Challenge

We were verifying a 1.4-million-gate NPU design. The simulation started. Six hours later, we had a 56 GB trace file, 139 billion cycles still queued, and a killed process.

This wasn't a bug. It was a fundamental limitation. Open-source simulators are excellent for small-to-medium designs, but once a design crosses into million-gate territory, run time and trace size stop being manageable.

The Reality:
Hardware emulators that handle this scale require significant investment — often beyond reach for early-stage teams. We couldn't wait for funding. We couldn't wait for subsidies. We needed to verify our chip.

So we asked a different question: What if we built our own?

2. The Gap We Saw

Gate-level simulation landscape:
  • Open-source (Verilator, Icarus Verilog): ✓ free, ✓ well-documented; ✗ struggles at 1M+ gates, ✗ large trace files, ✗ no GPU acceleration
  • Enterprise (hardware emulators, FPGA prototyping): ✓ fast, ✓ scalable; ✗ enterprise pricing, ✗ long procurement, ✗ not accessible
  • The gap: VHE, a GPU-accelerated gate-level simulator built in-house, with zero license cost, tailored to our flow

We're not claiming to replace enterprise tools. We built what we needed — a research-grade simulator for teams in similar situations.

3. What is VHE?

VHE (Virtual Hardware Emulator) is a GPU-accelerated gate-level simulator. It takes synthesized netlists and simulates them on CUDA-capable GPUs, achieving throughput that would be impractical on CPUs alone.

Core Idea:
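Within a single logic level, gate-level simulation is embarrassingly parallel. No gate in a level depends on another gate in the same level, so thousands of gates can be evaluated simultaneously. Only the levels themselves must run in order. GPUs have thousands of cores. The match is natural.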
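To make the level idea concrete, here is a minimal levelization sketch in Python. It assumes a hypothetical netlist representation: a dict mapping each gate's output net to its input nets. Any net not driven by a gate, such as a primary input or a flip-flop output, is treated as already available. The sketch uses Kahn's topological sort and gives each gate a level one deeper than its deepest driving gate, so every gate in a bucket depends only on earlier buckets. It illustrates the technique and is not VHE's implementation.

```python
from collections import defaultdict, deque


def levelize(gates):
    """Group gates into logic levels with Kahn's topological sort.

    gates: dict mapping a gate's output net -> list of its input nets.
    Nets not driven by any gate (primary inputs, flip-flop outputs) are
    treated as available at the start of the cycle.
    Returns a list of levels; each level is a sorted list of gate names.
    """
    # Number of gate-driven inputs each gate is still waiting for.
    pending = {g: sum(1 for n in ins if n in gates) for g, ins in gates.items()}
    # Reverse edges: driving gate -> gates that read its output.
    readers = defaultdict(list)
    for g, ins in gates.items():
        for n in ins:
            if n in gates:
                readers[n].append(g)

    level = {}
    ready = deque(g for g, count in pending.items() if count == 0)
    while ready:
        g = ready.popleft()
        # One deeper than the deepest driving gate; 0 if fed only by inputs/flops.
        level[g] = max((level[n] + 1 for n in gates[g] if n in gates), default=0)
        for r in readers[g]:
            pending[r] -= 1
            if pending[r] == 0:
                ready.append(r)

    if len(level) != len(gates):
        raise ValueError("combinational loop: some gates could not be levelized")

    buckets = defaultdict(list)
    for g, lvl in level.items():
        buckets[lvl].append(g)
    return [sorted(buckets[i]) for i in range(len(buckets))]


netlist = {"n1": ["a", "b"], "n2": ["b", "c"], "n3": ["n1", "n2"], "y": ["n3", "a"]}
print(levelize(netlist))  # [['n1', 'n2'], ['n3'], ['y']]
```

In the example, n1 and n2 read only primary inputs and can be evaluated together. n3 must wait for both of them, and y must wait for n3.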

Key characteristics:

  • Two-Phase Architecture: Phase 1 for zero-delay functional simulation, Phase 2 for timing
  • Levelization: Gates organized into dependency levels for parallel evaluation
  • CUDA Backend: Native GPU kernels, not emulation
  • Yosys Integration: Reads JSON netlists from Yosys synthesis
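The Phase 1 idea can be sketched as a minimal cycle-based, zero-delay simulator. It assumes a hypothetical encoding: gates are (op, output_net, input_nets) tuples that are already grouped by level, and flip-flops are (q_net, d_net) pairs on a single clock. Levels run in order. The inner per-level loop is the part a GPU backend would map to one thread per gate. The sketch is plain Python for readability and is not VHE's CUDA kernel.

```python
# Hypothetical gate encoding for this sketch: (op, output_net, input_nets).
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NOT": lambda a: a ^ 1,
}


def simulate(levels, dffs, stimuli, state):
    """Zero-delay, cycle-based simulation of a levelized netlist.

    levels:  list of levels, each a list of (op, out, ins) gate tuples.
    dffs:    list of (q_net, d_net) flip-flops on a single clock.
    stimuli: one dict of primary-input values per cycle.
    state:   initial flip-flop outputs, dict q_net -> 0/1.
    Returns one dict of settled net values per cycle.
    """
    trace = []
    for inputs in stimuli:
        values = {**state, **inputs}
        for level in levels:  # levels must run in order...
            # ...but gates within a level are independent, so a GPU backend
            # can evaluate this whole list at once (one thread per gate).
            results = [(out, OPS[op](*(values[n] for n in ins)))
                       for op, out, ins in level]
            values.update(results)
        trace.append(values)
        # Clock edge: every flip-flop samples its D input at the same time.
        state = {q: values[d] for q, d in dffs}
    return trace


# 2-bit counter: q0 toggles every cycle, q1 toggles when q0 is 1.
levels = [[("NOT", "d0", ["q0"]), ("XOR", "d1", ["q1", "q0"])]]
dffs = [("q0", "d0"), ("q1", "d1")]
trace = simulate(levels, dffs, [{}] * 4, {"q0": 0, "q1": 0})
print([v["q1"] * 2 + v["q0"] for v in trace])  # [0, 1, 2, 3]
```

Separating "settle all levels" from "clock all flops at once" is what makes the result independent of the order in which gates within a level are evaluated.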

4. Architecture

VHE architecture:
  • Input: RTL design → Yosys + KAALAIDE synthesis → JSON netlist
  • VHE frontend: netlist parser (JSON → DAG) → levelization (topological sort) → GPU preparation (data structures)
  • VHE GPU engine, Phase 1, functional (zero-delay): Level 0 → Level 1 → Level 2 → ... → Level N → output
  • VHE GPU engine, Phase 2, timing (event-driven): SDF delays, event queue, timing checks
  • Outputs: VCD, metrics, pass/fail
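Phase 2 is still in progress, so the following only sketches what event-driven timing simulation involves in general. It does not describe VHE's design. It assumes a hypothetical per-gate delay table standing in for SDF delay values. It also uses a simple transport-delay model with no inertial pulse filtering, driven by a priority queue of (time, net, value) events.

```python
import heapq

OPS = {"AND": lambda a, b: a & b, "NOT": lambda a: a ^ 1}


def event_sim(gates, delays, init, stimulus, t_end):
    """Event-driven simulation with a fixed transport delay per gate.

    gates:    dict out_net -> (op, input_nets)
    delays:   dict out_net -> propagation delay (stand-in for SDF values)
    init:     dict net -> initial 0/1 value, assumed already settled
    stimulus: list of (time, net, value) primary-input changes
    Returns the (time, net, value) of every actual value change up to t_end.
    """
    fanout = {}
    for out, (_, ins) in gates.items():
        for n in ins:
            fanout.setdefault(n, []).append(out)

    values = dict(init)
    queue = list(stimulus)
    heapq.heapify(queue)  # ordered by time (ties broken by net name)
    changes = []
    while queue and queue[0][0] <= t_end:
        t, net, v = heapq.heappop(queue)
        if values.get(net) == v:
            continue  # no change, so no fan-out gate needs re-evaluating
        values[net] = v
        changes.append((t, net, v))
        for g in fanout.get(net, []):
            op, ins = gates[g]
            new = OPS[op](*(values[n] for n in ins))
            heapq.heappush(queue, (t + delays[g], g, new))
    return changes


# y = a AND (NOT a): logically always 0, but unequal path delays cause a glitch.
gates = {"na": ("NOT", ["a"]), "y": ("AND", ["a", "na"])}
delays = {"na": 10, "y": 5}
init = {"a": 0, "na": 1, "y": 0}
print(event_sim(gates, delays, init, [(100, "a", 1)], t_end=1000))
# [(100, 'a', 1), (105, 'y', 1), (110, 'na', 0), (115, 'y', 0)]
```

In the example, the rising edge on a reaches the AND gate before the inverter output has fallen, so y pulses high from 105 to 115. A zero-delay pass would never show that glitch, which is why timing gets its own phase.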

5. The Journey — Real Numbers

We didn't start with 6.7 million gates. We started with 8,000. Here's the progression:

Design                        Gates        FFs   Levels   VHE speed (cycles/s)
PicoRV32 (RISC-V CPU)         8,000      1,300      ~50                 11,063
mor1kx (OpenRISC SoC)     1,250,000    596,000     ~200                  2,941
GEMMX (AI/ML NPU)         1,400,000     10,866      403                  1,465
WZ-NPU (16-tile NPU)      6,747,799     41,771      447                  3,444
Peak Throughput:
~23 billion gate-cycles per second
6.7M gates × 3,444 cycles/sec ≈ 23B gate-cycles/sec
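The peak-throughput figure is plain multiplication: gate count times simulated cycles per second. The short sketch below applies the same formula to every row of the table. The numbers are copied from the table; nothing here is a new measurement.

```python
# (design, gates, simulated cycles per second), copied from the table above.
RUNS = [
    ("PicoRV32", 8_000, 11_063),
    ("mor1kx", 1_250_000, 2_941),
    ("GEMMX", 1_400_000, 1_465),
    ("WZ-NPU", 6_747_799, 3_444),
]


def gate_cycles_per_sec(gates, cycles_per_sec):
    """Throughput metric: every gate counted once per simulated cycle."""
    return gates * cycles_per_sec


for name, gates, cps in RUNS:
    print(f"{name:10} {gate_cycles_per_sec(gates, cps) / 1e9:6.2f} billion gate-cycles/s")
```

The WZ-NPU row works out to about 23.2 billion gate-cycles per second. The GEMMX row works out to about 2.05 billion.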

The 6.7-million-gate WZ-NPU was the real test. Loading and levelizing it took 8 minutes (447 levels, 100 iterations). Once it was on the GPU, though, it ran at 3,444 cycles per second, so a one-million-cycle test completes in under five minutes.

📦 WZ-NPU: Open Source NPU

The 6.7M gate design verified with VHE is now open source.

16 tiles • 8,192 MACs • Gate-level verified

github.com/wiowiz-tech/wz-npu

6. Challenges & How We Solved Them

Challenge 1: Levelization at Scale

Our levelization algorithm hit its 100-iteration cap on large designs. The WZ-NPU, with 447 logic levels, was pushing those limits.

Solution: We optimized the DAG traversal and increased iteration limits dynamically based on design complexity.
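The post doesn't describe VHE's levelizer in detail, so the following shows only one plausible way an iteration cap can interact with logic depth. A fixed-point levelizer (the hypothetical `relax_levels` below) repeats passes over all gates until no level changes. If gates happen to be stored in reverse dependency order, each pass moves correct levels only one gate further. A 447-level chain then needs 447 passes, and a fixed cap of 100 fails. An acyclic netlist with N gates has at most N levels, so a size-based cap of N + 1 passes is always enough.

```python
def relax_levels(gates, max_passes):
    """Fixed-point levelization: sweep all gates until no level changes.

    gates: dict out_net -> list of input nets; nets not driven by a gate
    (primary inputs, flip-flop outputs) do not add depth.
    Returns (levels, passes_used) or raises if max_passes is exhausted.
    """
    level = {g: 0 for g in gates}
    for passes in range(1, max_passes + 1):
        changed = False
        for g, ins in gates.items():
            new = max((level[n] + 1 for n in ins if n in gates), default=0)
            if new != level[g]:
                level[g] = new
                changed = True
        if not changed:
            return level, passes
    raise RuntimeError(f"levels did not converge within {max_passes} passes")


# A 447-gate chain stored in reverse dependency order: the worst case, where
# each sweep pushes correct levels only one gate further down the chain.
chain = {f"g{i}": [f"g{i - 1}"] if i else ["in"] for i in reversed(range(447))}

try:
    relax_levels(chain, max_passes=100)  # fixed cap: fails on deep logic
except RuntimeError as err:
    print(err)

# Size-based cap: an acyclic netlist with N gates has at most N levels,
# so N + 1 passes always suffice.
levels, passes = relax_levels(chain, max_passes=len(chain) + 1)
print(max(levels.values()) + 1, "levels,", passes, "passes")  # 447 levels, 447 passes
```

A topological-sort levelizer avoids the question entirely because it visits each gate once. The relaxation form is shown because it is the variant where an iteration cap matters.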

Challenge 2: Memory Management

6.7 million gates need ~7 GB of GPU memory, and not every GPU has that much.

Solution: Batched evaluation and careful memory layout. We also document minimum requirements clearly.
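Two pieces of arithmetic sit behind this. First, ~7 GB for 6,747,799 gates averages out to roughly 1 KB of GPU memory per gate. Second, batching has to respect levels. A level can be split into chunks that fit in memory, but chunks from different levels must not be mixed, because level k + 1 reads level k's outputs. The minimal sketch below illustrates both points. `plan_batches` and the batch size are hypothetical, not VHE parameters.

```python
def bytes_per_gate(total_bytes, gates):
    """Average device memory per gate implied by a total footprint."""
    return total_bytes / gates


def plan_batches(level_sizes, max_gates_per_batch):
    """Split each logic level into chunks of at most max_gates_per_batch gates.

    Levels stay in order and chunks never mix levels, because level k + 1
    reads outputs produced by level k. Returns (level, start, end) tuples
    with half-open gate index ranges [start, end).
    """
    plan = []
    for lvl, n in enumerate(level_sizes):
        for start in range(0, n, max_gates_per_batch):
            plan.append((lvl, start, min(start + max_gates_per_batch, n)))
    return plan


print(round(bytes_per_gate(7e9, 6_747_799)))  # 1037
print(plan_batches([5, 2, 7], max_gates_per_batch=3))
# [(0, 0, 3), (0, 3, 5), (1, 0, 2), (2, 0, 3), (2, 3, 6), (2, 6, 7)]
```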

Challenge 3: SystemVerilog Parsing

Many modern designs use SystemVerilog, and Yosys's built-in Verilog frontend supports only a subset of its constructs.

Solution: We use KAALAIDE (our enhanced Yosys fork with Synlig integration) to parse SystemVerilog before synthesis and VHE simulation.

SystemVerilog flow: .sv files → KAALAIDE (Synlig + Yosys) → JSON netlist → VHE GPU sim
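The JSON netlist at the end of this flow follows the format of Yosys's `write_json` output. It contains a `modules` object whose cells each carry a `type`, `port_directions`, and `connections`. Connections are lists of integer bit IDs, or strings such as "0" and "1" for constants. Below is a minimal parser sketch. It assumes the design is flattened and mapped to single-bit gate cells such as `$_AND_` and `$_DFF_P_`. The function name and return shapes are our own illustration, not VHE's frontend, and the sample netlist is trimmed to the keys the sketch reads.

```python
import json


def parse_yosys_json(text, top):
    """Split a flattened, gate-mapped Yosys JSON netlist into gates and flops.

    Returns (gates, flops):
      gates: list of (cell_type, out_bit, in_bits) for single-output cells
      flops: list of (q_bit, d_bit) for positive-edge $_DFF_P_ cells
    Bits are Yosys bit IDs (ints) or constant strings such as "0" and "1".
    """
    module = json.loads(text)["modules"][top]
    gates, flops = [], []
    for name, cell in module["cells"].items():
        kind, conns = cell["type"], cell["connections"]
        if kind == "$_DFF_P_":
            flops.append((conns["Q"][0], conns["D"][0]))
            continue
        if "DFF" in kind:  # enable/reset flop variants are out of scope here
            raise ValueError(f"{name}: flip-flop type {kind} not handled")
        dirs = cell["port_directions"]
        outs = [p for p, d in dirs.items() if d == "output"]
        if len(outs) != 1:
            raise ValueError(f"{name}: expected one output port, got {outs}")
        # Gate-level cells have 1-bit ports, so take bit [0] of each.
        ins = [conns[p][0] for p in sorted(dirs) if dirs[p] == "input"]
        gates.append((kind, conns[outs[0]][0], ins))
    return gates, flops


NETLIST = """{"modules": {"top": {"cells": {
  "g1": {"type": "$_AND_",
         "port_directions": {"A": "input", "B": "input", "Y": "output"},
         "connections": {"A": [2], "B": [3], "Y": [4]}},
  "r1": {"type": "$_DFF_P_",
         "port_directions": {"C": "input", "D": "input", "Q": "output"},
         "connections": {"C": [5], "D": [4], "Q": [3]}}}}}}"""

print(parse_yosys_json(NETLIST, "top"))
# ([('$_AND_', 4, [2, 3])], [(3, 4)])
```

Separating flip-flops from combinational cells at parse time is what lets the later stages treat Q outputs as level-0 sources and cut the netlist into an acyclic graph.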

7. Current Status

VHE is research-grade and continuously improving.

VHE status:
  • ✓ Working: netlist parsing (JSON), levelization algorithm, Phase 1 GPU kernel, 6.7M-gate designs
  • ⏳ In progress: SDF timing annotation, SAIF power estimation, VCD waveform output, multi-GPU partitioning

Suitable for: Teams who need million-gate simulation without enterprise licensing. Research projects. Architecture exploration. Early-stage verification before committing to commercial tools.
"We couldn't afford to wait. So we built what we needed."

8. Part of the WIOWIZ EDA Stack

VHE is one component of our in-house EDA infrastructure:

WIOWIZ EDA stack:
  • AVP: spec entry and tracking
  • KAAL-AIDE: SV parsing and synthesis
  • VHE: gate-level GPU sim
  • VSP-LAB: HW/SW co-sim
  • wiowiz-intelligence: control and observability

Each tool was built because we needed it. No grand plan — just practical necessity driving development.

#semiconductor #EDA #verification #GPU #gate-level-simulation #in-house-tools #CUDA