by StepFun
Step3 is a cutting-edge multimodal reasoning model—built on a Mixture-of-Experts architecture with 321B total parameters and 38B active. It is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision–language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators.
Models with similar or better quality but different tradeoffs
Compare performance with other models from the same creator
How this model performs across different benchmarks
Compare cost efficiency across all models
Performance trends across all benchmark runs
Number of benchmark runs over time
Get started with this model using OpenRouter