Add gh models view command #6

Merged
merged 33 commits into main from view-cmd on Oct 9, 2024

Conversation

cheshire137
Member

@cheshire137 cheshire137 commented Oct 8, 2024

I referenced the kind of information shown on a page like https://github.com/marketplace/models/azure-openai/gpt-4o-mini. I think there's still additional formatting we could do on the output produced, but I thought this was a good enough first pass to get some kind of details view in there.

[screenshot of ./gh-models view gpt-4o-mini in a terminal]

Sample output:

% script/build && ./gh-models view gpt-4o-mini
Building extension (GOOS= GOARCH=)
Output: gh-models
Display name:            OpenAI GPT-4o mini
Summary name:            gpt-4o-mini
Publisher:               OpenAI
Summary:                 An affordable, efficient AI solution for diverse text and image tasks.
Context:                 up to 131072 input tokens and 4096 output tokens
Rate limit tier:         low
Tags:                    multipurpose, multilingual, multimodal
Supported input types:   text, image, audio
Supported output types:  text
Supported languages:     English, Italian, Afrikaans, Spanish, German, French, Indonesian, Russian, Polish, Ukrainian, Greek, Latvian, Chinese, A...
License:                 custom
License description:     Use of Azure OpenAI Service is subject to applicable Microsoft Product Terms https://www.microsoft.com/licensing/terms/welcome/welcomepage including the Universal License Terms for Microsoft Generative AI Services and the service-specific terms for the Azure OpenAI product offering.

Description:             GPT-4o mini enables a broad range of tasks with its low cost and latency, such as applications that chain or parallelize multiple model calls (e.g., calling multiple APIs), pass a large volume of context to the model (e.g., full code base or conversation history), or interact with customers through fast, real-time text responses (e.g., customer support chatbots).

Today, GPT-4o mini supports text and vision in the API, with support for text, image, video and audio inputs and outputs coming in the future. The model has a context window of 128K tokens and knowledge up to October 2023. Thanks to the improved tokenizer shared with GPT-4o, handling non-English text is now even more cost effective.

GPT-4o mini surpasses GPT-3.5 Turbo and other small models on academic benchmarks across both textual intelligence and multimodal reasoning, and supports the same range of languages as GPT-4o. It also demonstrates strong performance in function calling, which can enable developers to build applications that fetch data or take actions with external systems, and improved long-context performance compared to GPT-3.5 Turbo.

Resources
OpenAI announcement https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/

Notes:                   Model Provider
This model is provided through the Azure OpenAI service.

Relevant documents
The following documents are applicable:

Overview of Responsible AI practices for Azure OpenAI models https://learn.microsoft.com/en-us/legal/cognitive-services/openai/overview
Transparency Note for Azure OpenAI Service https://learn.microsoft.com/en-us/legal/cognitive-services/openai/transparency-note

Acknowledgments
Leads: Jacob Menick, Kevin Lu, Shengjia Zhao, Eric Wallace, Hongyu Ren, Haitang Hu, Nick Stathas, Felipe Petroski Such

Program Lead: Mianna Chen

Contributions noted in https://openai.com/gpt-4o-contributions/

Responsible AI Considerations
Built-in safety measures - Safety is built into our models from the beginning, and reinforced at every step of our development process. In pre-training, we filter out information that we do not want our models to learn from or output, such as hate speech, adult content, sites that primarily aggregate personal information, and spam. In post-training, we align the model's behavior to our policies using techniques such as reinforcement learning with human feedback (RLHF) to improve the accuracy and reliability of the models' responses.

GPT-4o mini has the same safety mitigations built-in as GPT-4o, which we carefully assessed using both automated and human evaluations according to our Preparedness Framework and in line with our voluntary commitments. More than 70 external experts in fields like social psychology and misinformation tested GPT-4o to identify potential risks, which we have addressed and plan to share the details of in the forthcoming GPT-4o system card and Preparedness scorecard. Insights from these expert evaluations have helped improve the safety of both GPT-4o and GPT-4o mini.

Building on these learnings, our teams also worked to improve the safety of GPT-4o mini using new techniques informed by our research. GPT-4o mini in the API is the first model to apply our instruction hierarchy method, which helps to improve the model's ability to resist jailbreaks, prompt injections, and system prompt extractions. This makes the model's responses more reliable and helps make it safer to use in applications at scale.

We'll continue to monitor how GPT-4o mini is being used and improve the model's safety as we identify new risks.

Content Filtering
Prompts and completions are passed through a default configuration of Azure AI Content Safety classification models to detect and prevent the output of harmful content. Learn more about Azure AI Content Safety https://learn.microsoft.com/en-us/azure/ai-services/content-safety/overview. Additional classification models and configuration options are available when you deploy an Azure OpenAI model in production; learn more https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-filter?tabs=warning%2Cuser-prompt%2Cpython-new.

Evaluation:              GPT-4o mini surpasses GPT-3.5 Turbo and other small models on academic benchmarks across both textual intelligence and multimodal reasoning, and supports the same range of languages as GPT-4o. It also demonstrates strong performance in function calling, which can enable developers to build applications that fetch data or take actions with external systems, and improved long-context performance compared to GPT-3.5 Turbo.

GPT-4o mini has been evaluated across several key benchmarks.

Reasoning tasks: GPT-4o mini is better than other small models at reasoning tasks involving both text and vision, scoring 82.0% on MMLU, a textual intelligence and reasoning benchmark, as compared to 77.9% for Gemini Flash and 73.8% for Claude Haiku.

Math and coding proficiency: GPT-4o mini excels in mathematical reasoning and coding tasks, outperforming previous small models on the market. On MGSM, measuring math reasoning, GPT-4o mini scored 87.0%, compared to 75.5% for Gemini Flash and 71.7% for Claude Haiku. GPT-4o mini scored 87.2% on HumanEval, which measures coding performance, compared to 71.5% for Gemini Flash and 75.9% for Claude Haiku.

Multimodal reasoning: GPT-4o mini also shows strong performance on MMMU, a multimodal reasoning eval, scoring 59.4% compared to 56.1% for Gemini Flash and 50.2% for Claude Haiku.

  TASK                             | GPT-4O MINI SCORE | GEMINI FLASH SCORE | CLAUDE HAIKU SCORE
  ---------------------------------+-------------------+--------------------+--------------------
  MMLU (Reasoning Text and Vision) | 82.0%             | 77.9%              | 73.8%
  MGSM (Math Reasoning)            | 87.0%             | 75.5%              | 71.7%
  HumanEval (Coding Performance)   | 87.2%             | 71.5%              | 75.9%
  MMMU (Multimodal Reasoning)      | 59.4%             | 56.1%              | 50.2%

Source: GPT-4o mini: advancing cost-efficient intelligence https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/.
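
For readers who want a sense of how a subcommand like this is typically wired up in a gh extension, here is a minimal sketch of a cobra "view" command that fetches model details and prints aligned label/value pairs. The ModelDetails type, the getModelDetails helper, and the exact field set are illustrative assumptions, not the actual code added in this PR:

package cmd

import (
	"fmt"
	"text/tabwriter"

	"github.com/spf13/cobra"
)

// ModelDetails is an assumed shape for the catalog metadata shown above.
type ModelDetails struct {
	FriendlyName    string
	Publisher       string
	Summary         string
	MaxInputTokens  int
	MaxOutputTokens int
}

// getModelDetails stands in for whatever client call fetches model metadata.
func getModelDetails(name string) (ModelDetails, error) {
	return ModelDetails{FriendlyName: name}, nil // placeholder
}

// NewViewCommand returns a hypothetical "view" subcommand that renders the
// fields as an aligned two-column listing, similar to the sample output above.
func NewViewCommand() *cobra.Command {
	return &cobra.Command{
		Use:   "view [model]",
		Short: "View details about a model",
		Args:  cobra.ExactArgs(1),
		RunE: func(cmd *cobra.Command, args []string) error {
			details, err := getModelDetails(args[0])
			if err != nil {
				return err
			}
			w := tabwriter.NewWriter(cmd.OutOrStdout(), 0, 4, 2, ' ', 0)
			fmt.Fprintf(w, "Display name:\t%s\n", details.FriendlyName)
			fmt.Fprintf(w, "Publisher:\t%s\n", details.Publisher)
			fmt.Fprintf(w, "Summary:\t%s\n", details.Summary)
			fmt.Fprintf(w, "Context:\tup to %d input tokens and %d output tokens\n",
				details.MaxInputTokens, details.MaxOutputTokens)
			return w.Flush()
		},
	}
}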

@cheshire137 cheshire137 self-assigned this Oct 8, 2024
@cheshire137 cheshire137 changed the title Add gh models view command Add gh models view command and update run model validation Oct 8, 2024
@cheshire137 cheshire137 changed the title Add gh models view command and update run model validation Add gh models view command Oct 9, 2024
@cheshire137 cheshire137 marked this pull request as ready for review October 9, 2024 21:47
@cheshire137 cheshire137 requested a review from sgoedecke October 9, 2024 21:48
Collaborator

@sgoedecke sgoedecke left a comment

LGTM, nice work!

@@ -234,7 +234,7 @@ func NewRunCommand() *cobra.Command {
 			foundMatch := false
 			for _, model := range models {
-				if strings.EqualFold(model.FriendlyName, modelName) || strings.EqualFold(model.Name, modelName) {
+				if model.HasName(modelName) {
Collaborator

Nice
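
For context, a HasName helper of this shape might look like the sketch below, assuming it simply wraps the case-insensitive comparison it replaced; the receiver type and package name here are guesses, not the actual code in this PR:

package azuremodels // assumed package name

import "strings"

// ModelSummary stands in for whatever struct carries the two model identifiers.
type ModelSummary struct {
	Name         string
	FriendlyName string
}

// HasName reports whether the given name matches either identifier,
// case-insensitively, mirroring the condition the new call replaced.
func (m ModelSummary) HasName(name string) bool {
	return strings.EqualFold(m.Name, name) || strings.EqualFold(m.FriendlyName, name)
}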

@cheshire137 cheshire137 requested a review from a team as a code owner October 9, 2024 21:55
@cheshire137
Member Author

cheshire137 commented Oct 9, 2024

I feel like such a rebel, merging this without any passing tests. 😅 I swear it works on my machine, though!

@cheshire137 cheshire137 merged commit ec140bd into main Oct 9, 2024
3 checks passed
@cheshire137 cheshire137 deleted the view-cmd branch October 9, 2024 22:00