Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: GPT-4 doesn't support vision & file search in Jan #3520

Closed
1 task done
Tracked by #3505
imtuyethan opened this issue Sep 2, 2024 · 4 comments
Closed
1 task done
Tracked by #3505

bug: GPT-4 doesn't support vision & file search in Jan #3520

imtuyethan opened this issue Sep 2, 2024 · 4 comments
Assignees
Labels
category: multimodal Vision, audio, video, etc category: providers Local & remote inference providers category: tools RAG, web search, files, function calling move to Cortex type: bug Something isn't working
Milestone

Comments

@imtuyethan
Copy link
Contributor

imtuyethan commented Sep 2, 2024

  • I have searched the existing issues

Current behavior

https://discord.com/channels/1107178041848909847/1279700963439022090
Users are unable to use file-related features in Jan, including GPT-4 Vision capabilities (image analysis) and document upload for RAG (Retrieval-Augmented Generation), despite having added their OpenAI API key and selecting the GPT-4 model. These functionalities should be available but appear to be non-functional.

Minimum reproduction step

  1. Add OpenAI API key to Jan
  2. Select GPT-4 model
  3. Attempt to send an image for analysis
  4. Attempt to upload a document (e.g., PDF) for RAG

Expected behavior

Jan should be able to:

  • Process images using GPT-4's vision capabilities
  • Allow document uploads for RAG
  • Provide responses or allow for questions about the uploaded content

Screenshots / Logs

Screenshot 2024-09-02 at 5 44 24 PM

Jan version

v0.5.3

In which operating systems have you tested?

Operating System: Pop!_OS 22.04
KDE Plasma Version: 5.24.7
KDE Frameworks Version: 5.92.0
Qt Version: 5.15.3
Kernel Version: 6.9.3-76060903-generic (64-bit)
Graphics Platform: X11
Processors: 16 × 13th Gen Intel® Core™ i5-13500H
Memory: 15.3 GiB of RAM
Graphics Processor: Mesa Intel® Graphics

Btw, I have a dedicated Nvidia 4060 as well.

@imtuyethan imtuyethan added the type: bug Something isn't working label Sep 2, 2024
@imtuyethan imtuyethan moved this to Need Investigation in Jan & Cortex Sep 2, 2024
@imtuyethan imtuyethan changed the title bug: GPT-4 doesn't support have vision & file search in Jan bug: GPT-4 doesn't support vision & file search in Jan Sep 18, 2024
@imtuyethan imtuyethan added the category: providers Local & remote inference providers label Sep 18, 2024
@freelerobot
Copy link
Contributor

related #3505

@freelerobot freelerobot added category: tools RAG, web search, files, function calling category: multimodal Vision, audio, video, etc labels Oct 14, 2024
@imtuyethan
Copy link
Contributor Author

Close this ticket as dup?

@freelerobot
Copy link
Contributor

discussions: Remote API Extension #3505

Leave it open bc:

  1. We haven't really impl RAG yet
  2. We don't know RAG will work with remote endpoints

@imtuyethan imtuyethan added this to the v0.5.12 milestone Dec 9, 2024
@imtuyethan imtuyethan moved this from Investigating to Planning in Jan & Cortex Dec 9, 2024
@imtuyethan imtuyethan moved this from Planning to Scheduled in Jan & Cortex Dec 9, 2024
@imtuyethan imtuyethan moved this from Scheduled to Eng Review in Jan & Cortex Dec 9, 2024
@dan-menlo dan-menlo moved this from Eng Review to Scheduled in Jan & Cortex Dec 13, 2024
@dan-menlo dan-menlo moved this from Scheduled to Investigating in Jan & Cortex Dec 13, 2024
@imtuyethan imtuyethan moved this from Investigating to QA in Jan & Cortex Dec 16, 2024
@imtuyethan
Copy link
Contributor Author

OpenAI GPT-4o model’s vision capability is enabled:

Screenshot 2024-12-17 at 11 01 18 AM

@imtuyethan imtuyethan moved this from QA to Completed in Jan & Cortex Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: multimodal Vision, audio, video, etc category: providers Local & remote inference providers category: tools RAG, web search, files, function calling move to Cortex type: bug Something isn't working
Projects
Archived in project
Development

No branches or pull requests

3 participants