Arbitrary code execution in LLMMathChain #8363
Comments
Thanks for raising! We are working to move the LLMMathChain out of the langchain core library, for reasons such as this |
Answer generated by a 🤖
Answer
I understand your concern about the potential security vulnerability in the LLMMathChain class of LangChain. You've correctly identified that the
Here's the code snippet in question:
    def _evaluate_expression(self, expression: str) -> str:
        try:
            local_dict = {"pi": math.pi, "e": math.e}
            output = str(
                numexpr.evaluate(
                    expression.strip(),
                    global_dict={},  # restrict access to globals
                    local_dict=local_dict,  # add common mathematical functions
                )
            )
        except Exception as e:
            raise ValueError(
                f'LLMMathChain._evaluate("{expression}") raised error: {e}.'
                " Please try again with a valid numerical expression"
            )
        # Remove any leading and trailing brackets from the output
        return re.sub(r"^\[|\]$", "", output)
In this code,
This means that a malicious user could potentially execute arbitrary code on the server where LangChain is running by passing a specially crafted expression to the
However, please note that this is just one potential interpretation of the issue. If you believe this interpretation is correct and that this is a bug that could impact other users, I encourage you to make a pull request to address this issue. If your experience contradicts this interpretation, I would suggest upgrading to the latest version of LangChain to see if the issue persists.
This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. |
@hinthornw As an alternative, we could use I was testing some multi-step QA workflows inspired by a Langchain multi-step agent QA workflow, and was using the Happy to attach some examples / create a PR to scope out the replacement if people agree. It's what I've been using locally and it works great! |
That makes sense to me. I'd be happy to review a PR! Thank you for being proactive about this! |
I'd just like to point out that there still needs to be some kind of validation/protection because sympy.sympify uses eval and shouldn't be used on unsanitized input as per official docs. I think it could also be worth looking at this langchain PR that implemented sympy in RestrictedPython as an inspiration. |
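To make the validation point above concrete, here is a minimal sketch of one possible approach: parse the expression with Python's `ast` module and walk the tree against a whitelist, so that `eval` is never called. This is not what LangChain, numexpr, or SymPy do. The names `safe_eval`, `_eval_node`, and `_MAX_EXPONENT` are hypothetical, and the operator whitelist is an assumption chosen for illustration.

```python
import ast
import math
import operator

# Hypothetical whitelist: only these operators and names are accepted.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_CONSTANTS = {"pi": math.pi, "e": math.e}
_MAX_EXPONENT = 1000  # arbitrary cap (assumption) to avoid huge powers


def safe_eval(expression: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""
    tree = ast.parse(expression.strip(), mode="eval")
    return _eval_node(tree.body)


def _eval_node(node: ast.AST) -> float:
    # Numeric literals only (bool is excluded because it subclasses int).
    if isinstance(node, ast.Constant):
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
        raise ValueError(f"unsupported literal: {node.value!r}")
    # Named constants from the whitelist.
    if isinstance(node, ast.Name) and node.id in _CONSTANTS:
        return _CONSTANTS[node.id]
    # Whitelisted binary operators, evaluated recursively.
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left, right = _eval_node(node.left), _eval_node(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > _MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _BINARY_OPS[type(node.op)](left, right)
    # Whitelisted unary operators (+x, -x).
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    # Calls, attribute access, subscripts, etc. are all rejected here.
    raise ValueError(f"unsupported expression element: {type(node).__name__}")
```

With this sketch, `safe_eval("2 * (3 + 4)")` evaluates the arithmetic, while anything containing a function call or attribute access raises `ValueError` before any code runs. A whitelist is much narrower than what numexpr or SymPy accept, so it trades features for a smaller attack surface.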
Ah damn, yeah I missed that: was foolishly going off the StackOverflow thread and a memory of a still-unresolved effort within Sympify to remove eval / add a safe-mode.
|
Hi all, input sanitization has been added in numexpr version 2.8.6 for Since that is the case, I'd like to ask @hinthornw to advise on the preferable next course of action, that is whether to just bump the numexpr version, replace numexpr completely with Sympy and a secure container/environment (since it'd be vulnerable by itself) or other solutions to this. Thank you! |
Hi @hinthornw, Any ETA on resolving this issue? |
This vulnerability is getting flagged by InfoSec teams. Any idea on when the update is being released? |
@elmegatan26, @tabdunabi, @jan-kubena: numexpr is now (on master) an optional dependency. We've also added a constraint specifying that the code only works with numexpr >=2.8.6 (which has input sanitization).
|
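For anyone who wants to verify the numexpr constraint mentioned above in their own environment, a runtime check could look roughly like the sketch below. It uses only the standard library's `importlib.metadata`. The helper names `parse_release`, `numexpr_is_patched`, and `check_installed_numexpr` are hypothetical and not part of LangChain or numexpr. The threshold 2.8.6 is taken from this thread.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_NUMEXPR = (2, 8, 6)  # first release with input sanitization, per this thread


def parse_release(version_string: str) -> tuple:
    """Turn '2.8.6' into (2, 8, 6), keeping only leading digits of each part.

    Simplification (assumption): suffixes such as 'rc1' are ignored, so a
    pre-release of 2.8.6 is treated the same as 2.8.6 itself.
    """
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def numexpr_is_patched(installed: str) -> bool:
    """Compare a version string against the minimum sanitized release."""
    return parse_release(installed) >= MIN_NUMEXPR


def check_installed_numexpr() -> bool:
    """Return True only if numexpr is installed and at least 2.8.6."""
    try:
        return numexpr_is_patched(version("numexpr"))
    except PackageNotFoundError:
        return False
```

The same constraint can be pinned declaratively with the requirement specifier `numexpr>=2.8.6`. The runtime check is only useful where the installed environment is not fully controlled.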
@eyurtsev, for our use case, we do not use |
@eyurtsev We are not using LLMathChain but similar to @tabdunabi the vulnerability is automatically detected and flagged. Example: https://security.snyk.io/package/pip/langchain and https://security.snyk.io/package/pip/langchain/0.0.306 |
@eyurtsev just installed |
@tabdunabi we're taking a look. @elmegatan26 @tabdunabi Are the other CVEs a blocker for you right now? |
Yes @eyurtsev they are blockers. Interestingly, for |
@eyurtsev Yes, any CVE ranked High or Critical is a blocker. Any vulnerabilities found by most infosec teams are flagged and Devs are required to patch or prove the issue is not exploitable. |
Hi everyone, Thank you for opening the issue. Just to clarify the status: From the Snyk website, it seems no version of Langchain is completely free from critical issues. |
@tabdunabi Information about LLMMathChain should now be reflected. @dvirginz Not at the moment. We're working on addressing all the CVEs.
|
Hi!
Thank you for the update and detailed response.
We look forward to a solution, as it will help us a lot.
|
Hi @eyurtsev, first, apologies for pinging you again. We are delaying our release for the CVEs to be patched. I see https://security.snyk.io/vuln/SNYK-PYTHON-LANGCHAIN-5850009 has been patched in Any chance you can accelerate PR #5640, referenced in issue #7700 as providing a fix for https://security.snyk.io/vuln/SNYK-PYTHON-LANGCHAIN-5843727? |
Hi @tabdunabi, a realistic timeline is 1-3 weeks. Are you relying on any of the agents, i.e., the pandas agent, xorbits agent, or spark agent (which depend on the Python AST tool)? |
If we aren't using agents or LLMMathChain, is there a clear version we can already use? |
Thank you @eyurtsev for the update. Currently, we are not using LangChain agents. However, our build pipeline fails because of any security vulnerabilities in the libraries shipped with our solutions. So, even though we are not using LangChain agents, we need to go through a formal approval process with our security team, and prove the vulnerable code is not actually used by our solutions, to be able to get an exception to release. Additionally, once we publish our code on GitHub, GitHub Dependabot will flag these security vulnerabilities, and we need to address auto-cut tickets for our team. So, it would be much easier, and safer, to ship code with zero security vulnerabilities in downstream dependencies. |
Targeting end of month 10/28 (will announce in a bit with expected changes) to resolve the CVE to allow existing users to migrate. In the meantime, are you able to fork and remove affected code? |
Thank you @eyurtsev! |
#11680 -- announcement |
The Python AST tool CVE was resolved here: #12427 cc @tabdunabi / @dvirginz (The original CVE for LLMMathChain was resolved a while back -- closing this issue.) As of release: https://github.com/langchain-ai/langchain/releases/tag/v0.0.325 |
Thank you @eyurtsev !. We will immediately upgrade LangChain version, used by our code, to |
Thank you. It seems that Snyk still identifies 1 high-risk CVE in LangChain. Any thoughts? Thanks.
https://security.snyk.io/package/pip/langchain
|
@dvirginz that's the CVE that got patched with this release. It can take up to several days for the CVE information to be updated in the relevant databases. |
Can CVE-2023-39631 (http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2023-39631) be closed off with the above fix? |
That CVE was also resolved a while back: GHSA-f73w-4m7g-ch9x. Locking conversation. |
System Info
Langchain version: 0.0.244
Numexpr version: 2.8.4
Python version: 3.10.11
Who can help?
@hwchase17 @vowelparrot
Information
Related Components
Reproduction
Numexpr's evaluate function, which LangChain uses here in the LLMMathChain, is susceptible to arbitrary code execution through eval in the latest released version. See this issue, where a PoC for numexpr's evaluate is also provided.
This vulnerability allows arbitrary code execution, that is, running code and commands on the target machine, via LLMMathChain's run method with the right prompt. I'd like to ask the LangChain maintainers to confirm whether they want a full PoC with LangChain posted here publicly.
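As general background on why an `eval`-based evaluator is hard to lock down, the following harmless sketch uses Python's built-in `eval` directly. It is not the numexpr PoC and does not involve LangChain. It only illustrates, under the assumption of standard CPython behavior, that restricting the globals dictionary is not a sufficient sandbox.

```python
# Harmless illustration with the built-in eval(); this is NOT the numexpr PoC.

# 1. Passing an empty globals dict does not remove the builtins:
#    CPython inserts a __builtins__ entry into the dict automatically.
env = {}
result = eval("abs(-3)", env)
builtins_were_injected = "__builtins__" in env

# 2. Even with the builtins explicitly emptied, attribute access on a plain
#    literal still reaches `object` and, from there, every loaded class.
reachable = eval("().__class__.__base__.__subclasses__()", {"__builtins__": {}})
reachable_count = len(reachable)
```

Because the class hierarchy stays reachable from any literal, defenses typically have to reject dangerous syntax before evaluation rather than rely on the namespace passed to `eval`.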
Expected behavior
Numerical expressions should be evaluated securely so as to not allow code execution.