One of the two, the Phi-4-multimodal model, has 5.6 billion parameters and can process text, images, and speech ...