We would like to inform you of an upcoming Azure OpenAI model deprecation announced by Microsoft that may impact your current setup.
Impacted Model Versions
The following model versions, currently used by some customers via custom integrations or custom prompts, will be retired on March 31, 2026:
gpt-4o — versions 2024-05-13 and 2024-08-06
gpt-4o-mini — version 2024-07-18
After this date, continued usage of these versions may lead to API call failures.
What You Need to Do
As this is a Microsoft-led deprecation, we recommend taking one of the following actions in your Azure AI Foundry portal:
Option 1 — Continue with the Same Model Family
Upgrade to:
gpt-4o — version 2024-11-20
This version will be supported until October 1, 2026, and does not require any prompt changes.
Option 2 — Upgrade to Newer Models
Move to the latest models:
gpt-5 (replacement for gpt-4o)
gpt-5-mini (replacement for gpt-4o-mini)
Important:
If you choose this option, update your prompts by replacing:
max_tokens → max_completion_tokens
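As a minimal illustration of this change, the sketch below contrasts a 4-series chat-completions request body with its 5-series equivalent. The message content and token limit are placeholder values; only the renamed parameter matters here.

```python
# Illustrative request bodies; values are placeholders.

# 4-series (gpt-4o) request: output length is capped with max_tokens.
legacy_request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "max_tokens": 512,  # 4-series output-length parameter
}

# Equivalent 5-series (gpt-5) request: the same cap is now
# expressed as max_completion_tokens.
updated_request = {
    "model": "gpt-5",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "max_completion_tokens": 512,  # replaces max_tokens in 5-series
}
```

Everything else in the request (messages, temperature, and so on) carries over unchanged.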
Recommendation
We encourage you to proactively review your deployments and plan your upgrade to avoid any service disruption. Additionally, please continue to monitor Microsoft announcements on model updates and retirements to stay informed.
We are working on providing native support for the following models in our OpenAI and Azure OpenAI integrations. They will be available by April 25 at the latest.
GPT‑5.4 as our flagship for complex, high‑value enterprise workflows
GPT‑5.3 Instant as the fast, cost‑efficient option for most day‑to‑day chatbot traffic
GPT‑5.2 to preserve a stable, mature option for legacy and behavior‑sensitive flows
GPT‑5.4 Mini as a cost‑effective mid‑tier for routing, classification, and simple FAQ
GPT‑5.4 Nano as an ultra‑cheap, ultra‑low‑latency tier for utility tasks
This selection covers the full range of capabilities required for customer service use cases. In addition, we will provide system prompts for the most commonly used features in the platform, namely Agent Node, DialogGPT, Rephrase Responses, Rephrase User Query, and Answer Generation. For the remaining features, we will provide templates that can be cloned to create custom prompts with minimal effort.
In the interim, you can add the 5-series models via the ‘Add Model’ option in the native integration. Prompts written for 4-series models can be reused as-is with 5-series models, with one change: the max_tokens request parameter is replaced by max_completion_tokens in the 5-series models.
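If you maintain request payloads in code or configuration, a small shim can apply this rename automatically when reusing 4-series requests against 5-series models. This is a hypothetical helper, not part of any SDK:

```python
def migrate_request(params: dict) -> dict:
    """Return a copy of a 4-series request dict adjusted for 5-series models.

    The only change needed is renaming max_tokens to max_completion_tokens;
    all other fields are reused as-is.
    """
    migrated = dict(params)  # shallow copy; original request is untouched
    if "max_tokens" in migrated:
        migrated["max_completion_tokens"] = migrated.pop("max_tokens")
    return migrated


# Example: an existing gpt-4o request reused for gpt-5.
old = {"model": "gpt-4o", "messages": [], "max_tokens": 256}
new = migrate_request(old)
new["model"] = "gpt-5"
```

Requests that never set max_tokens pass through unchanged.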