IN-HOUSE EDA PART 1

VHE — Virtual Hardware Emulator

Million-Gate Simulation at Zero License Cost

"When you can't rent, you build."

WIOWIZ Technologies • January 2026

1. The Challenge

We were verifying a 1.4-million-gate NPU design. The simulation started. Six hours later, we had a 56 GB trace file, 139 billion cycles still queued, and a killed process.

This wasn't a bug. This was a fundamental limitation. Open-source simulators are excellent for small-to-medium designs. But when you cross into million-gate territory, the rules change.

The Reality: Hardware emulators that handle this scale require significant investment, often beyond reach for early-stage teams. We couldn't wait for funding. We couldn't wait for subsidies. We needed to verify our chip.

So we asked a different question: What if we built our own?

2. The Gap We Saw

We're not claiming to replace enterprise tools. We built what we needed — a research-grade simulator for teams in similar situations.

3. What is VHE?

VHE (Virtual Hardware Emulator) is a GPU-accelerated gate-level simulator. It takes synthesized netlists and simulates them on CUDA-capable GPUs, achieving throughput that would be impractical on CPUs alone.

Core Idea: Within a single logic level, gate evaluation is embarrassingly parallel. Gates at the same level do not depend on each other, so thousands of them can be evaluated simultaneously. GPUs have thousands of cores. The match is natural.

Key characteristics:

Two-Phase Architecture: Phase 1 for zero-delay functional simulation, Phase 2 for timing
Levelization: Gates organized into dependency levels for parallel evaluation
CUDA Backend: Native GPU kernels, not emulation
Yosys Integration: Reads JSON netlists from Yosys synthesis
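
To make the levelization idea concrete, here is a minimal Python sketch under stated assumptions. The four-gate netlist, the GATES table, and the function names levelize and evaluate are all hypothetical, and Kahn's algorithm is a standard stand-in rather than VHE's actual implementation. It groups gates into dependency levels, then evaluates one level at a time; on a GPU, the work inside each level would be a parallel kernel launch rather than a Python loop.

```python
from collections import defaultdict, deque

# Hypothetical netlist: gate name -> (operation, input nets).
# Each gate drives a net with the same name as the gate.
GATES = {
    "n1": ("AND", ["a", "b"]),
    "n2": ("XOR", ["a", "c"]),
    "n3": ("OR", ["n1", "n2"]),
    "n4": ("NOT", ["n3"]),
}

OPS = {
    "AND": lambda x: x[0] & x[1],
    "OR": lambda x: x[0] | x[1],
    "XOR": lambda x: x[0] ^ x[1],
    "NOT": lambda x: 1 - x[0],
}


def levelize(gates):
    """Group gates into levels (Kahn's algorithm): a gate's level is one
    more than the deepest gate that feeds it."""
    fanout = defaultdict(list)
    pending = {}
    for name, (_, inputs) in gates.items():
        drivers = [i for i in inputs if i in gates]  # skip primary inputs
        pending[name] = len(drivers)
        for d in drivers:
            fanout[d].append(name)

    ready = deque(n for n, p in pending.items() if p == 0)
    level = {n: 0 for n in ready}
    while ready:
        g = ready.popleft()
        for succ in fanout[g]:
            level[succ] = max(level.get(succ, 0), level[g] + 1)
            pending[succ] -= 1
            if pending[succ] == 0:
                ready.append(succ)
    if any(pending.values()):
        raise ValueError("combinational loop: netlist cannot be levelized")

    levels = [[] for _ in range(max(level.values()) + 1)]
    for name in gates:
        levels[level[name]].append(name)
    return levels


def evaluate(gates, levels, inputs):
    """Zero-delay evaluation, one level at a time."""
    values = dict(inputs)
    for lvl in levels:
        # Gates in one level are independent of each other, so this
        # comprehension is the part a GPU would run in parallel.
        new = {g: OPS[gates[g][0]]([values[i] for i in gates[g][1]]) for g in lvl}
        values.update(new)
    return values


levels = levelize(GATES)
print(levels)
print(evaluate(GATES, levels, {"a": 1, "b": 1, "c": 0}))
```

In this toy netlist, n1 and n2 land in level 0, n3 in level 1, and n4 in level 2, so the first level is the only one with any parallelism. Real designs have far wider levels, which is where the GPU pays off.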

4. Architecture

5. The Journey — Real Numbers

We didn't start with 6.7 million gates. We started with 8,000. Here's the progression:

Design	Gates	FFs	Levels	VHE Speed
PicoRV32 (RISC-V CPU)	8,000	1,300	~50	11,063 cyc/s
mor1kx (OpenRISC SoC)	1,250,000	596,000	~200	2,941 cyc/s
GEMMX (AI/ML NPU)	1,400,000	10,866	403	1,465 cyc/s
WZ-NPU (16-tile NPU)	6,747,799	41,771	447	3,444 cyc/s

Peak Throughput: about 23 billion gate-cycles per second (6.7M gates × 3,444 cycles/sec ≈ 23B gate-cycles/sec)

The 6.7-million-gate WZ-NPU was the real test. It took 8 minutes to load and levelize (447 levels, 100 iterations), but once on the GPU it ran at 3,444 cycles per second, fast enough for meaningful verification.
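
Gate-cycles per second is just gates multiplied by simulated cycles per second. The short Python sketch below applies that product to every row of the table so the designs can be compared on one scale. The gate counts and speeds are copied from the table; the names RESULTS and gate_cycles_per_second are illustrative only.

```python
# Rows copied from the table above: (design, gates, cycles per second).
RESULTS = [
    ("PicoRV32", 8_000, 11_063),
    ("mor1kx", 1_250_000, 2_941),
    ("GEMMX", 1_400_000, 1_465),
    ("WZ-NPU", 6_747_799, 3_444),
]


def gate_cycles_per_second(gates, cycles_per_second):
    """Gate evaluations performed per wall-clock second."""
    return gates * cycles_per_second


for design, gates, cps in RESULTS:
    rate = gate_cycles_per_second(gates, cps)
    print(f"{design:10s} {rate / 1e9:6.2f} billion gate-cycles/s")
```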

📦 WZ-NPU: Open Source NPU

The 6.7M gate design verified with VHE is now open source.

16 tiles • 8,192 MACs • Gate-level verified

github.com/wiowiz-tech/wz-npu

6. Challenges & How We Solved Them

Challenge 1: Levelization at Scale

Our levelization algorithm hit a 100-iteration cap on large designs. The WZ-NPU with 447 logic levels was pushing limits.

Solution: We optimized the DAG traversal and increased iteration limits dynamically based on design complexity.
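
The article does not say how the limit is chosen, so the following Python sketch is only one plausible reading. It levelizes by repeated relaxation passes and, instead of a fixed cap of 100, derives the cap from the number of gates. The function name levelize_by_relaxation, the fanins layout, and the heuristic itself are assumptions for illustration.

```python
def levelize_by_relaxation(fanins, max_iters=None):
    """Assign levels by sweeping the netlist until nothing changes.

    fanins maps each gate to the gates that feed it (primary inputs omitted).
    Returns (level_by_gate, passes_used).
    """
    if max_iters is None:
        # Assumed heuristic: an acyclic netlist with N gates is at most
        # N - 1 levels deep, so N + 1 passes are always enough.
        max_iters = len(fanins) + 1

    level = {g: 0 for g in fanins}
    for iteration in range(1, max_iters + 1):
        changed = False
        for g, srcs in fanins.items():
            want = 1 + max((level[s] for s in srcs), default=-1)
            if want != level[g]:
                level[g] = want
                changed = True
        if not changed:
            return level, iteration
    raise RuntimeError(
        f"no fixed point after {max_iters} passes: cap too low or combinational loop"
    )


# Worst case for a sweep: a 150-gate chain listed in reverse order,
# so each pass only pushes level information one gate further.
chain = {f"g{i}": ([f"g{i - 1}"] if i else []) for i in reversed(range(150))}

level, passes = levelize_by_relaxation(chain)
print(level["g149"], passes)

try:
    levelize_by_relaxation(chain, max_iters=100)
except RuntimeError as err:
    print("fixed cap of 100 fails:", err)
```

With the size-derived cap, hitting the limit on an acyclic netlist is impossible, so a failure can be reported as a genuine combinational loop rather than a tuning problem.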

Challenge 2: Memory Management

6.7 million gates need ~7 GB of GPU memory. Not all GPUs have that much.

Solution: Batched evaluation and careful memory layout. We also document minimum requirements clearly.
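
The article does not describe how batching works inside VHE. As a rough sketch of the arithmetic involved, the Python below divides the quoted ~7 GB by the 6,747,799 gates of WZ-NPU to get an average per-gate footprint, then splits a level's gate list into batches that fit a chosen memory budget. The helper names, the 4 GiB budget, and the assumption that memory scales linearly with gate count are all hypothetical.

```python
def bytes_per_gate(total_bytes, num_gates):
    """Average footprint per gate, assuming memory scales linearly with gate count."""
    return total_bytes // num_gates


def batch_size_for(budget_bytes, per_gate_bytes):
    """Largest number of gates whose data fits in the given budget."""
    size = budget_bytes // per_gate_bytes
    if size < 1:
        raise ValueError("budget is smaller than a single gate's footprint")
    return size


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Figures quoted in the article: ~7 GB for the 6,747,799-gate WZ-NPU.
per_gate = bytes_per_gate(7_000_000_000, 6_747_799)

# Hypothetical budget: a GPU with 4 GiB available for simulation state.
size = batch_size_for(4 * 1024**3, per_gate)
print(per_gate, "bytes per gate;", size, "gates per batch")

# Splitting one level's gate indices into batches.
print(list(batches(list(range(10)), 4)))
```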

Challenge 3: SystemVerilog Parsing

Many modern designs use SystemVerilog. Standard Yosys struggles with some constructs.

Solution: We use KAALAIDE (our enhanced Yosys fork with Synlig integration) to parse SystemVerilog and synthesize the netlist before it reaches VHE.
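
Whichever front end parses the RTL, what reaches the simulator is a Yosys JSON netlist. The Python sketch below walks such a file and tallies gates and flip-flops, the two counts reported in the results table. The embedded SAMPLE follows the general layout of Yosys JSON output (modules, then cells, each with a type and connections), but the module and cell names are made up, and treating any cell type containing "DFF" as a flip-flop is a rough heuristic, not VHE's rule.

```python
import json
from collections import Counter

# Tiny hand-written netlist in the general shape of Yosys JSON output.
SAMPLE = """
{"modules": {"top": {"ports": {}, "cells": {
  "g0": {"type": "$_AND_", "connections": {"A": [2], "B": [3], "Y": [4]}},
  "g1": {"type": "$_NOT_", "connections": {"A": [4], "Y": [5]}},
  "r0": {"type": "$_DFF_P_", "connections": {"C": [6], "D": [5], "Q": [7]}}
}}}}
"""


def tally_cells(netlist):
    """Count cells by type across every module in the netlist."""
    counts = Counter()
    for module in netlist["modules"].values():
        for cell in module.get("cells", {}).values():
            counts[cell["type"]] += 1
    return counts


def split_gates_and_ffs(counts):
    """Return (combinational_gates, flip_flops) using a name-based heuristic."""
    ffs = sum(n for cell_type, n in counts.items() if "DFF" in cell_type.upper())
    return sum(counts.values()) - ffs, ffs


counts = tally_cells(json.loads(SAMPLE))
gates, ffs = split_gates_and_ffs(counts)
print(dict(counts))
print(f"{gates} gates, {ffs} flip-flops")
```

For a real design, replace json.loads(SAMPLE) with json.load on the opened netlist file.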

7. Current Status

VHE is research-grade and continuously improving.

Suitable for: Teams who need million-gate simulation without enterprise licensing. Research projects. Architecture exploration. Early-stage verification before committing to commercial tools.

"We couldn't afford to wait. So we built what we needed."

8. Part of the WIOWIZ EDA Stack

VHE is one component of our in-house EDA infrastructure:

Each tool was built because we needed it. No grand plan — just practical necessity driving development.

#semiconductor #EDA #verification #GPU #gate-level-simulation #in-house-tools #CUDA