Jun 10, 2026

Why VS Code Kept Using GitHub Copilot Quota Even After I Selected OpenAI Models

VS Code AI Setup

A practical troubleshooting note about VS Code Chat, Copilot quota, OpenAI models, utility models, and why "Auto" or "Vendor Default" can be misleading.

I recently ran into a confusing issue while using AI chat inside VS Code.

I had already reached my GitHub Copilot quota, so I wanted to use my own OpenAI API access instead. I selected OpenAI models in VS Code and expected the requests to stop using Copilot quota.

But I still kept seeing quota errors.

After checking the logs carefully, I found that the main model request was using OpenAI, but some hidden helper requests were still falling back to GitHub Copilot.

This article is my note for future reference, especially if I ever switch back to GitHub Copilot later.

The problem

The error looked something like this:

Sorry, your request failed. Please try again.

Reason: Response contained no choices.

At first, this looked like an OpenAI model response problem.

But the VS Code logs showed something more useful:

Server error: 402
You have exceeded your monthly quota
quotaExceeded | gpt-4o-mini

That means some request was still going through GitHub Copilot, even though I had already selected an OpenAI model elsewhere.

What was actually happening

My setup was not purely:

VS Code
→ OpenAI

It was more like:

VS Code Chat UI
→ Copilot Chat extension / VS Code chat wrapper
→ OpenAI model for the main answer
→ Copilot model for helper tasks

So the main chat response could succeed using OpenAI.

But VS Code still made extra helper requests for things like:

  • conversation title generation
  • utility tasks
  • context handling
  • summarization

Those helper requests were still using Copilot models.

The confusing part

The main OpenAI response could succeed, but a hidden Copilot helper request could fail because the Copilot quota was already exhausted.

Main OpenAI response = success
Hidden Copilot helper request = quota exceeded
Overall VS Code chat experience = unstable

The real trap: Auto / Vendor Default

Some VS Code settings were still set to:

Auto (Vendor Default)

or:

default

This sounds harmless, but it means:

Let VS Code or the active chat provider decide which model to use.

Because Copilot Chat was still part of the VS Code chat system, some background tasks continued to use Copilot models.

So even though I manually selected OpenAI for some visible model settings, hidden utility tasks still used Copilot.
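To make the trap concrete, here is a toy model of how a "default" value can silently route a helper task to a different provider. This is only a sketch of the behaviour I observed, not VS Code's actual resolution logic. The task labels, the `resolve_model` function, and the `copilot:` / `openai:` prefixes are all made up for illustration.

```python
# Toy model only: this is NOT how VS Code is implemented.
# It just mirrors the behaviour seen in the logs.

# Assumed fallback, based on the model named in the quota error.
VENDOR_DEFAULT = "copilot:gpt-4o-mini"

# Values that hand the decision to VS Code or the chat provider.
AUTO_VALUES = {"default", "Auto (Vendor Default)"}


def resolve_model(task: str, settings: dict[str, str]) -> str:
    """Return the model a task ends up using in this toy model."""
    chosen = settings.get(task, "default")  # unset behaves like "default"
    if chosen in AUTO_VALUES:
        return VENDOR_DEFAULT
    return chosen


# Hypothetical task labels; the model name is a placeholder.
settings = {
    "main_chat": "openai:some-chat-model",
    "utility": "default",
}

for task in ("main_chat", "utility"):
    print(task, "->", resolve_model(task, settings))
```

In this sketch the main chat goes to the explicitly chosen model, while the utility task quietly lands on the vendor default, which is the pattern the logs showed.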

The settings that fixed it

The important settings were:

chat.utilityModel
chat.utilitySmallModel

They were originally set to:

default

I changed both to an OpenAI mini model, for example:

chat.utilityModel = gpt-5.4-mini
chat.utilitySmallModel = gpt-5.4-mini

Result

After that, the logs stopped showing Copilot quota errors for title generation or utility tasks. Instead, I saw successful requests using OpenAI models.

Settings to check

If I want to avoid using GitHub Copilot quota, I should check these VS Code settings:

chat.utilityModel
chat.utilitySmallModel

Also check these model settings:

Chat Agent Default Model
Plan Agent Default Model
Edit Agent Default Model

Avoid leaving them as:

Auto (Vendor Default)
default

unless I intentionally want VS Code or Copilot to choose the model automatically.
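A quick way to run this check is a small script over the settings, already loaded as a dictionary. This is a minimal sketch. It only covers the two utility keys named above, because I only know the agent default model settings by their UI labels. Treating a missing key as "default" is my assumption. Note that a real settings.json may contain comments, so a plain JSON parser may not load it directly.

```python
# Minimal sketch: flag utility-model settings that are still automatic.
# Assumption: a missing key behaves like "default".

UTILITY_KEYS = ("chat.utilityModel", "chat.utilitySmallModel")
AUTO_VALUES = {"default", "Auto (Vendor Default)"}


def find_auto_utility_settings(settings: dict) -> list[str]:
    """Return utility-model keys that are unset or left on an automatic value."""
    return [
        key
        for key in UTILITY_KEYS
        if settings.get(key, "default") in AUTO_VALUES
    ]


# Example: one key fixed, one still left on default.
example = {
    "chat.utilityModel": "gpt-5.4-mini",
    "chat.utilitySmallModel": "default",
}
print(find_auto_utility_settings(example))  # ['chat.utilitySmallModel']
```

An empty list means both utility models are pinned explicitly.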

My recommended setup

For my current workflow, I would use something like this:

Chat Agent: stronger OpenAI chat model
Plan Agent: Codex-style coding model
Edit Agent: Codex-style coding model
Utility Model: OpenAI mini model
Utility Small Model: OpenAI mini model

A more cost-conscious setup:

Chat Agent: OpenAI mini model
Plan Agent: Codex-style coding model
Edit Agent: Codex-style coding model
Utility Model: OpenAI mini model
Utility Small Model: OpenAI mini model

The key point is not the exact model name.

Do not leave utility models as default if Copilot quota is already exhausted.

How to read the logs

Some logs are mostly harmless:

Logged in
Got Copilot token
Public code references are enabled
Fetched model metadata

These do not automatically mean the request is using Copilot quota.

The important part is the actual model request result.

Good signs

success | OpenAI model name

Bad signs if Copilot quota is exhausted

quotaExceeded
gpt-4o-mini
You have exceeded your monthly quota
Failed to fetch conversation title

Different issue: OpenAI rate limit

Rate limit reached
tokens per min
TPM

This means OpenAI was actually being used, but either the request was too large or too many tokens were sent within a short time.

Copilot quota vs OpenAI rate limit

Issue: Copilot quota exceeded
Example log: quotaExceeded | gpt-4o-mini
Meaning: Some request is still using GitHub Copilot quota.
Possible fix: Change utility/helper models away from default/vendor default.

Issue: OpenAI rate limit
Example log: Rate limit reached / TPM
Meaning: OpenAI is being used, but the request is too large or too frequent.
Possible fix: Use fewer files, lower effort, smaller model, or split the task.
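The two cases can be told apart mechanically by looking for the marker strings from my logs. The sketch below is a rough helper, not a parser for any official log format. The function name and the returned labels are made up, and the markers are only the fragments I happened to see.

```python
# Rough sketch: classify a log line by the marker strings seen in my logs.
# The labels returned here are hypothetical, chosen for this example.

COPILOT_QUOTA_MARKERS = ("quotaexceeded", "exceeded your monthly quota")
OPENAI_RATE_LIMIT_MARKERS = ("rate limit reached", "tokens per min")


def classify_log_line(line: str) -> str:
    """Return 'copilot_quota', 'openai_rate_limit', or 'other'."""
    text = line.lower()
    if any(marker in text for marker in COPILOT_QUOTA_MARKERS):
        return "copilot_quota"
    if any(marker in text for marker in OPENAI_RATE_LIMIT_MARKERS):
        return "openai_rate_limit"
    return "other"


for line in (
    "quotaExceeded | gpt-4o-mini",
    "Rate limit reached",
    "Got Copilot token",
):
    print(classify_log_line(line), "<-", line)
```

Lines such as "Got Copilot token" fall into "other", which matches the point that login and metadata logs do not by themselves mean quota is being used.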

What "code-referencing" means

I also saw this log:

[code-referencing] Public code references are enabled.

This was not the cause of the quota problem.

It means Copilot can check whether generated code is similar to public code and show references when needed.

It does not mean my whole project is being uploaded or that this feature is consuming my OpenAI quota.

Effort setting guide

For AI coding tools, the effort setting should depend on how uncertain the task is, not only on how large it is.

Higher uncertainty = higher effort
More execution volume = not necessarily higher effort

Effort levels and what to use them for:

Low: Small syntax fix, CSS change, add one CSP domain, rename variable.
Medium: Normal coding task, inspect a few files, small refactor, add one feature.
High: Multi-file refactor, unclear bug, architecture decision, security-sensitive change.
XHigh: Large repo analysis, legacy migration planning, full documentation generation, unclear production issue.

For example, adding one missing domain to a CSP config usually does not need a very powerful model or very high effort.

But asking the agent to review the whole CSP strategy for a production app should use a stronger model and higher effort.

My rule for agent work

For large tasks, I should not use the highest effort for everything.

A better pattern:

XHigh = Think
High = Build
Medium = Verify

Example:

Discovery and planning: XHigh
Architecture plan: High or XHigh
Implementation: High
Small fixes: Medium
Testing and documentation update: Medium

This keeps cost and token usage under control.
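The phase-to-effort example can be written down as a small lookup so that I do not have to decide from scratch each time. This is only a sketch of my own rule. The phase names are my labels, the architecture plan is pinned to High here although XHigh is equally valid, and falling back to Medium for unknown phases is an assumption.

```python
# Sketch of my phase-to-effort rule; not a feature of any tool.

EFFORT_BY_PHASE = {
    "discovery and planning": "XHigh",
    "architecture plan": "High",  # High or XHigh; High chosen for this sketch
    "implementation": "High",
    "small fixes": "Medium",
    "testing and documentation update": "Medium",
}


def effort_for(phase: str) -> str:
    """Look up the effort level for a phase; unknown phases default to Medium (assumption)."""
    return EFFORT_BY_PHASE.get(phase.strip().lower(), "Medium")


for phase in ("Discovery and planning", "Implementation", "Small fixes"):
    print(phase, "->", effort_for(phase))
```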

Why I did not fully abandon Copilot Chat

At one point, I considered moving away from Copilot Chat completely.

But that would mean changing my existing VS Code workflow, including:

  • agent mode
  • file and folder references
  • MCP setup
  • custom instructions
  • tool permissions
  • workspace-aware chat habits

So the better solution was not to migrate immediately.

Keep the VS Code chat workflow.
Use OpenAI models explicitly.
Change hidden utility models away from Copilot defaults.
Avoid Auto / Vendor Default while Copilot quota is exhausted.

This allowed me to keep my current workflow while avoiding accidental Copilot quota usage.

Future revert guide

If I subscribe to Copilot again or get enough quota back, I can revert these settings:

chat.utilityModel = default
chat.utilitySmallModel = default

I can also set these back to:

Chat Agent Default Model = Auto (Vendor Default)
Plan Agent Default Model = Auto (Vendor Default)
Edit Agent Default Model = Auto (Vendor Default)

But I should only do this if I intentionally want Copilot or VS Code to manage model selection again.

If I want predictable OpenAI usage and billing, I should keep explicit OpenAI model choices.

Final lesson

The biggest lesson:

Do not trust Auto or Vendor Default when debugging quota issues.

Always check the logs and confirm which model actually handled the request.

The model selected in the UI may not be the only model being used. VS Code can also call helper models for title generation, summarization, utility tasks, and context handling.

If Copilot quota is exhausted, those helper calls can fail even when the main OpenAI model succeeds.

The fix was to explicitly set the utility models to OpenAI models instead of leaving them as default.
