A practical troubleshooting note about VS Code Chat, Copilot quota, OpenAI models, utility models, and why "Auto" or "Vendor Default" can be misleading.
I recently ran into a confusing issue while using AI chat inside VS Code.
I had already reached my GitHub Copilot quota, so I wanted to use my own OpenAI API access instead. I selected OpenAI models in VS Code and expected the requests to stop using Copilot quota.
But I still kept seeing quota errors.
After checking the logs carefully, I found that the main model request was using OpenAI, but some hidden helper requests were still falling back to GitHub Copilot.
This article is my note for future reference, especially if I ever switch back to GitHub Copilot later.
The problem
The error looked something like this:
Sorry, your request failed. Please try again. Reason: Response contained no choices.
At first, this looked like an OpenAI model response problem.
But the VS Code logs showed something more useful:
Server error: 402 You have exceeded your monthly quota
quotaExceeded | gpt-4o-mini
That means some request was still going through GitHub Copilot, even though I had already selected an OpenAI model elsewhere.
What was actually happening
My setup was not purely:
VS Code → OpenAI
It was more like:
VS Code Chat UI
→ Copilot Chat extension / VS Code chat wrapper
→ OpenAI model for the main answer
→ Copilot model for helper tasks
So the main chat response could succeed using OpenAI.
But VS Code still made extra helper requests for things like:
- conversation title generation
- utility tasks
- context handling
- summarization
Those helper requests were still using Copilot models.
The main OpenAI response could succeed, but a hidden Copilot helper request could fail because the Copilot quota was already exhausted.
Main OpenAI response = success
Hidden Copilot helper request = quota exceeded
Overall VS Code chat experience = unstable
The real trap: Auto / Vendor Default
Some VS Code settings were still set to:
Auto (Vendor Default)
or:
default
This sounds harmless, but it means:
Let VS Code or the active chat provider decide which model to use.
Because Copilot Chat was still part of the VS Code chat system, some background tasks continued to use Copilot models.
So even though I manually selected OpenAI for some visible model settings, hidden utility tasks still used Copilot.
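The failure mode can be pictured with a toy model. The sketch below is an illustration in Python, not VS Code's actual routing logic: `resolve_provider`, `send`, and `simulate` are hypothetical names, and the rule that `default` resolves to Copilot simply mirrors what the logs suggested in my case.

```python
# Toy model only: this is NOT how VS Code resolves models internally.
COPILOT_QUOTA_LEFT = 0  # assumption: the Copilot quota is already exhausted

IMPLICIT_VALUES = ("default", "Auto (Vendor Default)")


def resolve_provider(setting_value: str) -> str:
    """Map a model setting to the provider that would serve it (assumed rule)."""
    if setting_value in IMPLICIT_VALUES:
        return "copilot"  # assumed: the vendor default ends up on Copilot
    return "openai"


def send(role: str, setting_value: str) -> str:
    """Simulate one request and report whether it succeeds."""
    provider = resolve_provider(setting_value)
    if provider == "copilot" and COPILOT_QUOTA_LEFT <= 0:
        return f"{role}: quotaExceeded (copilot)"
    return f"{role}: success ({provider})"


def simulate(settings: dict[str, str]) -> list[str]:
    """Run every configured request role through the toy router."""
    return [send(role, value) for role, value in settings.items()]


# The main answer is pinned to an OpenAI model, the helper is left on default.
for outcome in simulate({"main chat": "gpt-5.4-mini", "title generation": "default"}):
    print(outcome)
```

In this model the main chat request succeeds while the title request fails, which matches the mixed behaviour I saw.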
The settings that fixed it
The important settings were:
chat.utilityModel
chat.utilitySmallModel
They were originally set to:
default
I changed both to an OpenAI mini model, for example:
chat.utilityModel = gpt-5.4-mini
chat.utilitySmallModel = gpt-5.4-mini
After that, the logs stopped showing Copilot quota errors for title generation or utility tasks. Instead, I saw successful requests using OpenAI models.
Settings to check
If I want to avoid using GitHub Copilot quota, I should check these VS Code settings:
chat.utilityModel
chat.utilitySmallModel
Also check these model settings:
Chat Agent Default Model
Plan Agent Default Model
Edit Agent Default Model
Avoid leaving them as:
Auto (Vendor Default)
default
unless I intentionally want VS Code or Copilot to choose the model automatically.
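A quick way to run this check is a small script. The sketch below is a hypothetical helper, not part of VS Code: it takes settings as a plain Python dict and flags the two utility keys named above when they are missing or left on an implicit value. Treating a missing key as "default" is my assumption. Loading the dict from a real `settings.json` is left out on purpose, because that file may contain comments that a strict JSON parser would reject.

```python
# Setting keys come from this article; the checker itself is a hypothetical helper.
UTILITY_KEYS = ("chat.utilityModel", "chat.utilitySmallModel")
IMPLICIT_VALUES = {"default", "Auto (Vendor Default)"}


def find_implicit_models(settings: dict) -> list[str]:
    """Return the utility keys that are unset or left on an implicit default."""
    flagged = []
    for key in UTILITY_KEYS:
        value = settings.get(key)
        # Assumption: a missing key behaves like "default".
        if value is None or value in IMPLICIT_VALUES:
            flagged.append(key)
    return flagged


example = {
    "chat.utilityModel": "gpt-5.4-mini",
    "chat.utilitySmallModel": "default",
}
print(find_implicit_models(example))  # prints ['chat.utilitySmallModel']
```

An empty list means both utility models are pinned explicitly.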
My recommended setup
For my current workflow, I would use something like this:
Chat Agent: stronger OpenAI chat model
Plan Agent: Codex-style coding model
Edit Agent: Codex-style coding model
Utility Model: OpenAI mini model
Utility Small Model: OpenAI mini model
A more cost-conscious setup:
Chat Agent: OpenAI mini model
Plan Agent: Codex-style coding model
Edit Agent: Codex-style coding model
Utility Model: OpenAI mini model
Utility Small Model: OpenAI mini model
The key point is not the exact model name but this rule:
Do not leave utility models as default if Copilot quota is already exhausted.
How to read the logs
Some logs are mostly harmless:
Logged in
Got Copilot token
Public code references are enabled
Fetched model metadata
These do not automatically mean the request is using Copilot quota.
The important part is the actual model request result.
Good signs
success | OpenAI model name
Bad signs if Copilot quota is exhausted
quotaExceeded
gpt-4o-mini
You have exceeded your monthly quota
Failed to fetch conversation title
Different issue: OpenAI rate limit
Rate limit reached
tokens per min
TPM
This means OpenAI was actually being used, but the request was too large or too many tokens were used in a short time.
Copilot quota vs OpenAI rate limit
| Issue | Example log | Meaning | Possible fix |
|---|---|---|---|
| Copilot quota exceeded | quotaExceeded \| gpt-4o-mini | Some request is still using GitHub Copilot quota. | Change utility/helper models away from default/vendor default. |
| OpenAI rate limit | Rate limit reached / TPM | OpenAI is being used, but the request is too large or too frequent. | Use fewer files, lower effort, smaller model, or split the task. |
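The same triage can be written as a small classifier. This is a rough sketch in Python that only matches the substrings quoted in this article. The function name `classify_log_line` and the category labels are my own, and real log lines may be worded differently across extension versions.

```python
# Substrings taken from the log excerpts in this article; real logs may differ.
COPILOT_QUOTA_MARKERS = (
    "quotaExceeded",
    "exceeded your monthly quota",
    "Failed to fetch conversation title",  # can have other causes; a hint only
)
OPENAI_RATE_MARKERS = ("Rate limit reached", "tokens per min", "TPM")
HARMLESS_MARKERS = (
    "Logged in",
    "Got Copilot token",
    "Public code references are enabled",
    "Fetched model metadata",
)


def classify_log_line(line: str) -> str:
    """Sort one log line into a coarse category (hypothetical labels)."""
    if any(marker in line for marker in COPILOT_QUOTA_MARKERS):
        return "copilot-quota"
    if any(marker in line for marker in OPENAI_RATE_MARKERS):
        return "openai-rate-limit"
    if any(marker in line for marker in HARMLESS_MARKERS):
        return "harmless"
    if "success" in line:
        return "success"
    return "unknown"


for sample in (
    "Server error: 402 You have exceeded your monthly quota",
    "[code-referencing] Public code references are enabled.",
    "success | gpt-5.4-mini",
):
    print(classify_log_line(sample), "<-", sample)
```

Quota markers are checked first so that a line mentioning both a quota failure and a model name is never reported as harmless.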
What "code-referencing" means
I also saw this log:
[code-referencing] Public code references are enabled.
This was not the cause of the quota problem.
It means Copilot can check whether generated code is similar to public code and show references when needed.
It does not mean my whole project is being uploaded or that this feature is consuming my OpenAI quota.
Effort setting guide
For AI coding tools, effort should not be based only on task size.
Higher uncertainty = higher effort
More execution volume = not necessarily higher effort
| Effort | Use it for |
|---|---|
| Low | Small syntax fix, CSS change, add one CSP domain, rename variable. |
| Medium | Normal coding task, inspect a few files, small refactor, add one feature. |
| High | Multi-file refactor, unclear bug, architecture decision, security-sensitive change. |
| XHigh | Large repo analysis, legacy migration planning, full documentation generation, unclear production issue. |
For example, adding one missing domain to a CSP config usually does not need a very powerful model or very high effort.
But asking the agent to review the whole CSP strategy for a production app should use a stronger model and higher effort.
My rule for agent work
For large tasks, I should not use the highest effort for everything.
A better pattern:
XHigh = Think
High = Build
Medium = Verify
Example:
Discovery and planning: XHigh
Architecture plan: High or XHigh
Implementation: High
Small fixes: Medium
Testing and documentation update: Medium
This keeps cost and token usage under control.
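If I ever want to script this rule, a plain lookup table is enough. The sketch below is only an illustration in Python: `PHASE_EFFORT` and `effort_for` are hypothetical names, the phases are the ones from my example, and falling back to medium for an unknown phase is an assumption on my part.

```python
# Phase names follow the example in this article; the lowercase labels are my own.
PHASE_EFFORT = {
    "discovery and planning": "xhigh",
    "architecture plan": "high",  # the article allows High or XHigh here
    "implementation": "high",
    "small fixes": "medium",
    "testing and documentation update": "medium",
}


def effort_for(phase: str, default: str = "medium") -> str:
    """Look up the effort level for a phase, ignoring case and outer spaces."""
    return PHASE_EFFORT.get(phase.strip().lower(), default)


for phase in ("Discovery and planning", "Implementation", "Small fixes"):
    print(f"{phase}: {effort_for(phase)}")
```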
Why I did not fully abandon Copilot Chat
At one point, I considered moving away from Copilot Chat completely.
But that would mean changing my existing VS Code workflow, including:
- agent mode
- file and folder references
- MCP setup
- custom instructions
- tool permissions
- workspace-aware chat habits
So the better solution was not to migrate immediately.
Keep the VS Code chat workflow.
Use OpenAI models explicitly.
Change hidden utility models away from Copilot defaults.
Avoid Auto / Vendor Default while Copilot quota is exhausted.
This allowed me to keep my current workflow while avoiding accidental Copilot quota usage.
Future revert guide
If I subscribe to Copilot again or get enough quota back, I can revert these settings:
chat.utilityModel = default
chat.utilitySmallModel = default
I can also set these back to:
Chat Agent Default Model = Auto (Vendor Default)
Plan Agent Default Model = Auto (Vendor Default)
Edit Agent Default Model = Auto (Vendor Default)
But I should only do this if I intentionally want Copilot or VS Code to manage model selection again.
If I want predictable OpenAI usage and billing, I should keep explicit OpenAI model choices.
Final lesson
The biggest lesson:
Do not trust Auto or Vendor Default when debugging quota issues.
Always check the logs and confirm which model actually handled the request.
The model selected in the UI may not be the only model being used. VS Code can also call helper models for title generation, summarization, utility tasks, and context handling.
If Copilot quota is exhausted, those helper calls can fail even when the main OpenAI model succeeds.
The fix was to explicitly set the utility models to OpenAI models instead of leaving them as default.