Architecture

Phases

Phase 1: Core tensor ops

  • Implemented a struct Tensor wrapping mir-algorithm's Slice!(float*, N).

  • Implemented matmul (matrix multiplication) with both a naive triple-loop implementation and a cache-friendly tiled (blocked) one.

  • SIMD support (e.g. core.simd) is planned.
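The repository itself is in D, but the difference between the two matmul variants can be sketched in C (the function names and tile size below are illustrative, not the repo's API):

```c
#include <stddef.h>

/* Naive triple-loop matmul: C = A * B, row-major,
 * A is M x K, B is K x N, C is M x N. */
void matmul_naive(const float *A, const float *B, float *C,
                  size_t M, size_t K, size_t N) {
    for (size_t i = 0; i < M; i++)
        for (size_t j = 0; j < N; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; k++)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

/* Tiled (blocked) variant: walk the matrices in T x T tiles so the
 * working set of A and B stays cache-resident across the inner loops. */
enum { T = 32 };  /* illustrative tile size, tuned per target cache */
void matmul_tiled(const float *A, const float *B, float *C,
                  size_t M, size_t K, size_t N) {
    for (size_t i = 0; i < M * N; i++)
        C[i] = 0.0f;
    for (size_t ii = 0; ii < M; ii += T)
        for (size_t kk = 0; kk < K; kk += T)
            for (size_t jj = 0; jj < N; jj += T)
                for (size_t i = ii; i < ii + T && i < M; i++)
                    for (size_t k = kk; k < kk + T && k < K; k++) {
                        const float a = A[i * K + k];
                        for (size_t j = jj; j < jj + T && j < N; j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

The tiled version performs the same arithmetic in a different loop order; it becomes the natural place to add SIMD later, since the innermost loop is a contiguous scale-and-accumulate.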

Phase 2: GGUF parsing

  • GGUF reader in source/densor/format/gguf.d.

  • Map GGUF tensor types (F16, Q4_0, Q8_0) to internal storage.

  • Dequantization kernels (Q4_0 → F32) for inference.
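A sketch of the Q4_0 → F32 step, assuming ggml's Q4_0 layout (blocks of 32 weights: one f16 scale plus 16 bytes of packed 4-bit values, each dequantized as scale * (q - 8)). C for illustration; the struct and function names are hypothetical, not the repo's API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal IEEE half -> float conversion, so the sketch is self-contained. */
float f16_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                         /* signed zero */
        } else {                                  /* subnormal: renormalize */
            exp = 127 - 15 + 1;
            while (!(mant & 0x400)) { mant <<= 1; exp--; }
            mant &= 0x3FF;
            bits = sign | (exp << 23) | (mant << 13);
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13); /* inf / NaN */
    } else {
        bits = sign | ((exp - 15 + 127) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

#define QK4_0 32

typedef struct {
    uint16_t d;             /* f16 scale for the block */
    uint8_t  qs[QK4_0 / 2]; /* 4-bit quants, two per byte */
} block_q4_0;

/* Expand nblocks Q4_0 blocks into nblocks * 32 floats.
 * Byte j holds element j in its low nibble and element j+16 in its high one. */
void dequantize_q4_0(const block_q4_0 *b, float *y, size_t nblocks) {
    for (size_t i = 0; i < nblocks; i++) {
        const float d = f16_to_f32(b[i].d);
        for (int j = 0; j < QK4_0 / 2; j++) {
            const int x0 = (b[i].qs[j] & 0x0F) - 8;
            const int x1 = (b[i].qs[j] >> 4)   - 8;
            y[i * QK4_0 + j]             = x0 * d;
            y[i * QK4_0 + j + QK4_0 / 2] = x1 * d;
        }
    }
}
```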

Phase 3: CLIP

  • Layers for CLIP (Vision Transformer): Conv2d, LayerNorm, MultiHeadAttention, MLP.

Phase 4: Validation

  • Compare outputs layer by layer against the Python transformers library (Hugging Face) with a fixed seed and input.
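One minimal way to make that comparison concrete is a max-absolute-difference check between activation buffers dumped from both implementations (a hypothetical C helper; the reference buffer would come from the Python side):

```c
#include <stddef.h>
#include <math.h>

/* Largest elementwise |a[i] - b[i]| over n floats; compare the result
 * against a per-layer tolerance (e.g. looser for dequantized weights). */
float max_abs_diff(const float *a, const float *b, size_t n) {
    float m = 0.0f;
    for (size_t i = 0; i < n; i++) {
        const float d = fabsf(a[i] - b[i]);
        if (d > m)
            m = d;
    }
    return m;
}
```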