Monitor and test the capabilities of OpenAI models with Python (Part 1: command line)

For developers working with artificial intelligence, verifying the accuracy of AI models is crucial. This blog post introduces a Python script that automatically checks the capabilities of OpenAI models directly from the command line.

Objective of the Script

The script interfaces with the OpenAI API to fetch all available GPT models, tests each by issuing a standardized query, and records the accuracy of their responses. This process is essential for quickly determining which models perform reliably.

How the Script Operates

  1. Model Retrieval: Retrieves a comprehensive list of GPT models available under your OpenAI API subscription.
  2. Performance Testing: Sends a predetermined prompt (“What is the capital of Norway?”) to each model and evaluates the response for accuracy.
  3. Results Compilation: Compiles the outcomes, indicating whether each model’s response was correct.

Setup Requirements

  • Python Installation: Ensure Python is installed on your system.
  • OpenAI API Key: An active API key from OpenAI is required, which should be securely configured in your environment variables.
  • Python Libraries: Install pandas for data management and the openai package for making API requests.
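Since the API key comes from an environment variable, it is worth failing fast with a clear message when it is missing. A small sketch, with a hypothetical helper name:

```python
import os


def require_api_key(env) -> str:
    """Return the OpenAI API key from the given environment mapping,
    or raise a clear error if it is missing or empty."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the script."
        )
    return key


if __name__ == "__main__":
    require_api_key(os.environ)
    print("API key found.")
```

Taking the environment as a parameter (rather than reading `os.environ` directly) also makes the check easy to unit-test.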

Executing the Script

Execute the script from your command line, making sure your API key is properly set up in your system’s environment variables to avoid any authentication issues.
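Once the per-model outcomes are collected, pandas makes it easy to present them as a readable table. A sketch of that compilation step; the model names and results below are placeholders, not real measurements:

```python
import pandas as pd

# Placeholder outcomes standing in for a real run of the script.
results = [
    {"model": "gpt-model-a", "correct": True},
    {"model": "gpt-model-b", "correct": False},
]

df = pd.DataFrame(results)
df["status"] = df["correct"].map({True: "correct", False: "incorrect"})
print(df.to_string(index=False))
```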

Below is an example of the script's output:


The source code of the script can be found here: Code on GitHub

Upcoming Enhancement

In a future post, the script will be adapted for Streamlit, allowing the output to be displayed directly on a web page. This will provide a more interactive and user-friendly way to evaluate model performance.