Copilot Chat lmTools API does not reliably use tool responses #225737

Closed
benmcmorran opened this issue Aug 15, 2024 · 3 comments
Labels: bug (Issue identified by VS Code Team member as probable bug), chat, verified (Verification succeeded)


@benmcmorran
Member

Does this issue occur when all extensions are disabled? No; it depends on Copilot Chat and on another extension that uses the lmTools API proposal.

  • VS Code Version: 1.93.0-insider (user setup)
  • OS Version: Windows_NT x64 10.0.22631
  • GitHub Copilot Chat Version: v0.20.2024081501 (pre-release)

Steps to Reproduce:

  1. Run npx --package yo --package generator-code -- yo code and accept all prompts to create a new TypeScript extension.
  2. Open the new extension folder in VS Code.
  3. Edit package.json to add these blocks:
  "enabledApiProposals": [
    "lmTools"
  ],
  "contributes": {
    <existing contribution points>,
    "languageModelTools": [
      {
          "id": "cpptools-lmtool-configuration",
          "name": "cpp",
          "displayName": "C/C++ configuration",
          "canBeInvokedManually": true,
          "userDescription": "Configuration of the active C or C++ file, like language standard version and target platform.",
          "modelDescription": "For the active C or C++ file, this tool provides: the language (C, C++, or CUDA), the language standard version (such as C++11, C++14, C++17, or C++20), the compiler (such as GCC, Clang, or MSVC), the target platform (such as x86, x64, or ARM), and the target architecture (such as 32-bit or 64-bit).",
          "icon": "$(file-code)",
          "parametersSchema": {}
      }
    ]
  },
  4. Run npx vscode-dts dev.
  5. Edit src/extension.ts to add the following:
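	// In activate(context: vscode.ExtensionContext): register the tool with a hardcoded
	// result so that the expected answer is unambiguous.
	context.subscriptions.push(vscode.lm.registerTool('cpptools-lmtool-configuration', {
		async invoke(parameters, token) {
			// The string returned by toString() is what Copilot sends to the model as the tool result.
			return {
				toString() {
					return "This C++ project uses the GCC compiler.";
				},
			};
		},
	}));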
  6. Start debugging the extension.
  7. Open a C++ file. I used a file from z3, but I don't think the exact file matters, and it shouldn't matter whether the C/C++ extension is installed or correctly configured.
  8. In Copilot Chat, type "What compiler does this C++ project use? #cpp". This is designed to be a very direct question that should be answered exactly by the hardcoded tool response.
  9. GitHub Copilot responds with a verbose, noncommittal answer instead of stating that the project uses GCC.


With verbose logging enabled, the request log shows that GitHub Copilot did run the registered tool and included the tool response in the request:

[
    {
        "role": "system",
        <system prompt content>
    },
    {
        "role": "user",
        "content": "Excerpt from active file maxcore.cpp, lines 1 to 56:\n```cpp\n/*++\r\nCopyright (c) 2014 Microsoft Corporation\r\n\r\nModule Name:\r\n\r\n    maxcore.cpp\r\n\r\nAbstract:\r\n\r\n    Core based (weighted) max-sat algorithms:\r\n\r\n    - mu:         max-sat algorithm by Nina and Bacchus, AAAI 2014.\r\n    - mus-mss:    based on dual refinement of bounds.\r\n    - binary:     binary version of maxres\r\n    - rc2:        implementation of rc2 heuristic using cardinality constraints\r\n    - rc2t:       implementation of rc2 heuristic using totalizerx\r\n    - rc2-binary: hybrid of rc2 and binary maxres. Perform one step of binary maxres. \r\n                  If there are more than 16 soft constraints create a cardinality constraint.\r\n\r\n\r\n    MaxRes is a core-guided approach to maxsat.\r\n    MusMssMaxRes extends the core-guided approach by\r\n    leveraging both cores and satisfying assignments\r\n    to make progress towards a maximal satisfying assignment.\r\n\r\n    Given a (minimal) unsatisfiable core for the soft\r\n    constraints the approach works like max-res.\r\n    Given a (maximal) satisfying subset of the soft constraints\r\n    the approach updates the upper bound if the current assignment\r\n    improves the current best assignment.\r\n    Furthermore, take the soft constraints that are complements\r\n    to the current satisfying subset.\r\n    E.g, if F are the hard constraints and\r\n    s1,...,sn, t1,..., tm are the soft clauses and\r\n    F & s1 & ... & sn is satisfiable, then the complement\r\n    of of the current satisfying subset is t1, .., tm.\r\n    Update the hard constraint:\r\n         F := F & (t1 or ... or tm)\r\n    Replace t1, .., tm by m-1 new soft clauses:\r\n         t1 & t2, t3 & (t1 or t2), t4 & (t1 or t2 or t3), ..., tn & (t1 or ... t_{n-1})\r\n    Claim:\r\n       If k of these soft clauses are satisfied, then k+1 of\r\n       the previous soft clauses are satisfied.\r\n       If k of these soft clauses are false in the satisfying assignment\r\n       for the updated F, then k of the original soft clauses are also false\r\n       under the assignment.\r\n       In summary: any assignment to the new clauses that satisfies F has the\r\n       same cost.\r\n    Claim:\r\n       If there are no satisfying assignments to F, then the current best assignment\r\n       is the optimum.\r\n\r\nAuthor:\r\n\r\n    Nikolaj Bjorner (nbjorner) 2014-20-7\n```\n\n"
    },
    {
        "role": "user",
        "content": "What compiler does this C++ project use? #cpp "
    },
    {
        "role": "function",
        "content": "This C++ project uses the GCC compiler.",
        "name": "cpptools-lmtool-configuration"
    },
    {
        "role": "user",
        "content": "Above are the results of calling some functions. The user cannot see these results, so you should explain them to the user if referencing them in your answer."
    }
]
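The logged request contains a `function` message, but no assistant message with a `function_call` precedes it. In OpenAI's "functions" chat format, a function result conventionally answers such an assistant turn. The TypeScript sketch below flags function results that lack one. The types `ChatMessage` and `FunctionCall` and the helper `findUnanchoredFunctionResults` are hypothetical illustrations; they are not part of Copilot Chat or the lmTools proposal.

```typescript
// Hypothetical types modeled on OpenAI's deprecated "functions" chat format;
// these are not Copilot Chat or VS Code APIs.
interface FunctionCall {
  name: string;
  arguments: string; // JSON-encoded arguments
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "function";
  content: string | null;
  name?: string; // function name, set on role "function" messages
  function_call?: FunctionCall; // set on assistant messages that request a call
}

// Returns the indices of "function" messages that do not directly follow an
// assistant message calling the same function.
function findUnanchoredFunctionResults(messages: ChatMessage[]): number[] {
  const unanchored: number[] = [];
  messages.forEach((msg, i) => {
    if (msg.role !== "function") {
      return;
    }
    const prev = i > 0 ? messages[i - 1] : undefined;
    const anchored =
      prev !== undefined &&
      prev.role === "assistant" &&
      prev.function_call !== undefined &&
      prev.function_call.name === msg.name;
    if (!anchored) {
      unanchored.push(i);
    }
  });
  return unanchored;
}

// Question and tool result as they appear in the verbose log.
const logged: ChatMessage[] = [
  { role: "user", content: "What compiler does this C++ project use? #cpp " },
  {
    role: "function",
    name: "cpptools-lmtool-configuration",
    content: "This C++ project uses the GCC compiler.",
  },
];

// The same exchange with the assistant turn that requested the call.
const withCall: ChatMessage[] = [
  logged[0],
  {
    role: "assistant",
    content: null,
    function_call: { name: "cpptools-lmtool-configuration", arguments: "{}" },
  },
  logged[1],
];

console.log(findUnanchoredFunctionResults(logged)); // → [1]
console.log(findUnanchoredFunctionResults(withCall)); // → []
```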

cc @isidorn @lukka @spebl @sinemakinci1 @esweet431

@roblourens
Member

I've seen similar behavior and have been experimenting with this. I think there are two things I need to do:

  • Our API doesn't let you include the assistant message that contains the function call and its arguments. We should include it so that the function result is in context for the next request to the LLM.
  • I only pass the list of functions when asking the model to use a tool. Without that list, the model seems to have trouble understanding the function result. It doesn't always ignore the result, but it often does.

I've been looking at the API part, but this morning I realized that the second point seems pretty impactful. So I will include the list of functions that have been used in the conversation thread, with function_call: 'none'; that seems to give better results.
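A minimal sketch of the request shape this plan implies, assuming OpenAI's deprecated "functions" chat-completions format. It includes the assistant turn that made the call (the first point above) and declares the already-used function while setting `function_call: "none"` (the second point). `buildFollowUpRequest`, the types, and the abbreviated tool description are hypothetical; this is not how Copilot Chat actually builds its requests.

```typescript
// Hypothetical types for a chat-completions body in OpenAI's deprecated "functions" format.
interface FunctionDef {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema
}

type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string | null; function_call?: { name: string; arguments: string } }
  | { role: "function"; name: string; content: string };

interface FunctionsRequest {
  messages: Message[];
  functions: FunctionDef[];
  function_call: "none" | "auto" | { name: string };
}

function buildFollowUpRequest(
  question: string,
  tool: FunctionDef,
  toolArgs: string,
  toolResult: string,
): FunctionsRequest {
  return {
    messages: [
      { role: "user", content: question },
      // The assistant turn that requested the call puts the result in context.
      { role: "assistant", content: null, function_call: { name: tool.name, arguments: toolArgs } },
      { role: "function", name: tool.name, content: toolResult },
    ],
    // Declare the functions used in this thread so the model can interpret the result...
    functions: [tool],
    // ...but forbid another call, since this request should produce the final answer.
    function_call: "none",
  };
}

const cppTool: FunctionDef = {
  name: "cpptools-lmtool-configuration",
  description: "Configuration of the active C or C++ file (abbreviated).",
  parameters: { type: "object", properties: {} },
};

const body = buildFollowUpRequest(
  "What compiler does this C++ project use?",
  cppTool,
  "{}",
  "This C++ project uses the GCC compiler.",
);
console.log(JSON.stringify(body, null, 2));
```

A later comment in this thread explains why the `function_call: 'none'` route was ultimately not taken.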

And yes, internally we're also still on OpenAI's deprecated "functions" API. We will move to the "tools" API soon; maybe that will help too.
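For comparison, a sketch of the same follow-up request in OpenAI's "tools" format, under the same assumptions: the types and `buildToolsFollowUp` are hypothetical, and `call_1` is an illustrative call ID. The main structural difference is that each result is tied to its call by `tool_call_id`, and one assistant turn can carry several `tool_calls`.

```typescript
// Hypothetical types for a chat-completions body in OpenAI's "tools" format.
interface ToolDef {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

type ToolsMessage =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: ToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

interface ToolsRequest {
  messages: ToolsMessage[];
  tools: ToolDef[];
  tool_choice: "none" | "auto" | "required";
}

function buildToolsFollowUp(
  question: string,
  tool: ToolDef,
  callId: string,
  toolResult: string,
): ToolsRequest {
  const call: ToolCall = {
    id: callId,
    type: "function",
    function: { name: tool.function.name, arguments: "{}" },
  };
  return {
    messages: [
      { role: "user", content: question },
      { role: "assistant", content: null, tool_calls: [call] },
      // tool_call_id ties this result to the specific call above.
      { role: "tool", tool_call_id: call.id, content: toolResult },
    ],
    tools: [tool],
    tool_choice: "none", // answer from the result; do not call the tool again
  };
}

const cppToolDef: ToolDef = {
  type: "function",
  function: {
    name: "cpptools-lmtool-configuration",
    description: "Configuration of the active C or C++ file (abbreviated).",
    parameters: { type: "object", properties: {} },
  },
};

const toolsBody = buildToolsFollowUp(
  "What compiler does this C++ project use?",
  cppToolDef,
  "call_1", // hypothetical ID; in practice the model's tool_calls response supplies it
  "This C++ project uses the GCC compiler.",
);
console.log(JSON.stringify(toolsBody, null, 2));
```

Whether the switch actually improves how the model uses tool results is not established here; the comment above only speculates that it might.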

@roblourens roblourens added the bug Issue identified by VS Code Team member as probable bug label Aug 16, 2024
@roblourens roblourens added this to the August 2024 milestone Aug 16, 2024
@roblourens
Member

I tried using a user message instead of a role=function message:

Request: [
  {
    "role": "user",
    "content": "What compiler does this C++ project use?"
  },
  {
    "role": "assistant",
    "content": "This C++ project uses the GCC compiler."
  },
  {
    "role": "user",
    "content": "This C++ project uses the GCC compiler."
  },
  {
    "role": "user",
    "content": "What compiler does this C++ project use?"
  }
]
RESPONSE:
[
  {
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "I can find out the compiler being used for this project. Please hold on for a moment while I retrieve the information.",
      "refusal": null
    },
    "logprobs": null,
    "finish_reason": "stop"
  }
]

?????

@roblourens
Member

Anyway, we already label context messages for the old variables API. I had hoped I could get away from that, but something like this is still needed:

Request: [
  {
    "role": "user",
    "content": "What compiler does this C++ project use?"
  },
  {
    "role": "assistant",
    "content": "This C++ project uses the GCC compiler."
  },
  {
    "role": "user",
    "content": "C/C++ Configuration Context:\nThis C++ project uses the GCC compiler."
  },
  {
    "role": "user",
    "content": "What compiler does this C++ project use?"
  }
]
RESPONSE:
[
  {
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Based on the provided configuration context, this C++ project uses the GCC compiler.",
      "refusal": null
    },
    "logprobs": null,
    "finish_reason": "stop"
  }
]

Another reason I won't go the function_call: 'none' route with function messages is that it leads to responses like "I can find out the compiler being used for this project. Please hold on for a moment while I retrieve the information.", which claim that the model will call a tool when that's not allowed.
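The labeling approach can be sketched as two small helpers. This is a hypothetical illustration: `PlainMessage`, `toLabeledContextMessage`, and `buildRequestWithContext` are invented names, and the `<label> Context:` format is copied from the logged request that produced a correct answer, not from any documented Copilot Chat convention.

```typescript
// Minimal message type for this sketch; not a Copilot Chat API.
interface PlainMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical helper: present a tool result as an ordinary user message whose
// first line names its source, like "C/C++ Configuration Context:".
function toLabeledContextMessage(label: string, toolResult: string): PlainMessage {
  return { role: "user", content: `${label} Context:\n${toolResult}` };
}

// Hypothetical helper: place labeled context just before the current question,
// matching the message order of the request that got a correct answer.
function buildRequestWithContext(
  history: PlainMessage[],
  context: PlainMessage[],
  question: string,
): PlainMessage[] {
  return [...history, ...context, { role: "user", content: question }];
}

const question = "What compiler does this C++ project use?";
const request = buildRequestWithContext(
  [],
  [toLabeledContextMessage("C/C++ Configuration", "This C++ project uses the GCC compiler.")],
  question,
);
console.log(request[0].content);
// C/C++ Configuration Context:
// This C++ project uses the GCC compiler.
```

Because the result arrives as plain user text, no tool declaration or `function_call` setting is involved, which sidesteps the "please hold on" responses described above.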

@rzhao271 rzhao271 added the verified Verification succeeded label Aug 28, 2024
@vs-code-engineering vs-code-engineering bot locked and limited conversation to collaborators Sep 30, 2024

5 participants