Latest

OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing

80k instruction–image pairs across 11 domains and 51 subtasks, built with a systematic taxonomy. Unified multimodal models stumble when their training data underrepresents real-world complexity. OpenGPT-4o-Image answers with a large-scale dataset constructed via a hierarchical task taxonomy and an automated generation pipeline. Beyond basics like text rendering and style transfer, it …
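To make the taxonomy-driven pipeline concrete, here is a minimal Python sketch of sampling (domain, subtask) pairs from a two-level taxonomy to seed instruction generation; the `TAXONOMY` entries and the `sample_task_spec` helper are illustrative placeholders, not the paper's actual taxonomy or code.

```python
import random

# Hypothetical two-level taxonomy: domain -> subtasks. The real
# OpenGPT-4o-Image taxonomy spans 11 domains and 51 subtasks; the
# entries below are placeholders for illustration only.
TAXONOMY = {
    "text_rendering": ["poster_text", "handwriting"],
    "style_transfer": ["oil_painting", "watercolor"],
    "object_editing": ["add_object", "remove_object"],
}

def sample_task_spec(rng: random.Random) -> dict:
    """Draw a (domain, subtask) pair to seed one instruction-image example."""
    domain = rng.choice(sorted(TAXONOMY))
    subtask = rng.choice(TAXONOMY[domain])
    return {"domain": domain, "subtask": subtask}

rng = random.Random(0)
print(sample_task_spec(rng))  # e.g. {'domain': ..., 'subtask': ...}
```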

By Kari Jaaskelainen

MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech

A dual-track “brain–mouth” LLM for omnimodal understanding and low-latency, expressive speech. MGM-Omni introduces a unified Omni LLM that cleanly decouples multimodal reasoning from real-time speech generation. Its dual-track, token-based “brain–mouth” architecture enables efficient cross-modal interaction while delivering streaming, low-latency speech that preserves voice identity over long horizons. …
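As a rough illustration of the dual-track idea, the sketch below interleaves a text-token “brain” with a chunk-wise “mouth” that starts emitting audio tokens before the full reply is finished; `brain_step`, `mouth_step`, and the chunking scheme are hypothetical stand-ins, not MGM-Omni's architecture.

```python
from typing import Iterator

def brain_step(prompt: str) -> Iterator[str]:
    """Stand-in for the reasoning LLM: yields text tokens one at a time."""
    for tok in f"answer to: {prompt}".split():
        yield tok

def mouth_step(text_chunk: str) -> list[int]:
    """Stand-in for the speech decoder: maps a text chunk to audio tokens."""
    return [ord(c) % 256 for c in text_chunk]

def stream_speech(prompt: str, chunk_size: int = 2) -> Iterator[list[int]]:
    """Interleave the two tracks: speak each chunk without waiting for EOS."""
    buf: list[str] = []
    for tok in brain_step(prompt):
        buf.append(tok)
        if len(buf) == chunk_size:
            yield mouth_step(" ".join(buf))  # low latency: emit early
            buf.clear()
    if buf:
        yield mouth_step(" ".join(buf))

for audio_tokens in stream_speech("hello"):
    print(audio_tokens)
```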

By Kari Jaaskelainen

Physics-informed GNN for medium-high voltage AC power flow with edge-aware attention and line search correction operator

Edge-aware attention plus line-search correction for fast, accurate AC power flow. Power-system planners need solvers that are both fast and faithful to physics. This work advances Physics-Informed GNNs with two key innovations. First, an edge-aware attention mechanism injects line physics via per-edge biases, capturing grid anisotropy that standard MLP-based …
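A plausible reading of the per-edge bias is a Graphormer-style additive term on the attention logits, computed from line features such as admittance. The NumPy sketch below follows that reading; the shapes, the masking, and the linear weight `W` standing in for a learned edge MLP are assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, de = 4, 8, 2                      # buses, hidden dim, edge-feature dim
x = rng.normal(size=(n, d))             # node (bus) embeddings
edges = [(0, 1), (1, 2), (2, 3)]        # transmission lines
edge_feat = rng.normal(size=(len(edges), de))  # e.g. conductance, susceptance
W = rng.normal(size=(de,))              # stand-in for a learned edge MLP

q, k = x, x                             # single head, no projections, for brevity
logits = q @ k.T / np.sqrt(d)

bias = np.full((n, n), -np.inf)         # non-edges masked out
np.fill_diagonal(bias, 0.0)             # allow self-attention
for (i, j), f in zip(edges, edge_feat):
    b = f @ W                           # per-edge scalar bias from line physics
    bias[i, j] = bias[j, i] = b         # undirected line

logits = logits + bias                  # edge-aware attention scores
attn = np.exp(logits - logits.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)
out = attn @ x                          # message passing restricted to the grid
print(out.shape)                        # (4, 8)
```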

By Kari Jaaskelainen

InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models

An end-to-end FP8 recipe that feels lossless while speeding up LLM training for reasoning. Can we train reasoning-strong LLMs faster and cheaper without sacrificing accuracy? InfiR2 answers with a practical, open FP8 recipe spanning continual pretraining and supervised fine-tuning. The approach uses a fine-grained, hybrid-granularity quantization strategy to preserve numerical fidelity where it …
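For intuition on fine-grained quantization, the sketch below simulates block-wise FP8-style scaling in fp32: each 32-element block gets its own scale so its local amax maps to the e4m3 range. The block size and the integer rounding standing in for a true FP8 cast are assumptions, not InfiR2's exact recipe.

```python
import numpy as np

E4M3_MAX = 448.0  # largest value representable in FP8 e4m3

def quantize_dequantize_blockwise(w: np.ndarray, block: int = 32) -> np.ndarray:
    """Scale each block so its amax fits the FP8 range, then round and rescale.

    Assumes w.size is divisible by `block`; the round-to-integer step is a
    coarse stand-in for an actual FP8 cast.
    """
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / E4M3_MAX
    scale = np.where(scale == 0, 1.0, scale)        # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -E4M3_MAX, E4M3_MAX)
    return (q * scale).reshape(-1)

w = np.random.default_rng(0).normal(size=128).astype(np.float32)
w_hat = quantize_dequantize_blockwise(w)
print(float(np.abs(w - w_hat).max()))  # error bounded by each block's local scale
```

Per-block scales are what make the scheme "fine-grained": an outlier in one block inflates only that block's quantization step, leaving the rest of the tensor at full resolution.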

By Kari Jaaskelainen