GPT-4 Image Input: Can you give image input and how to use GPT-4V?

GPT-4 is more creative and collaborative than ever before. Beyond its creativity and reasoning, one of its most important features is its multimodal capability: using GPT-4 with image input. 

It can accept not only text as input but also images. So, how can we send images as input to the GPT-4 model, and can it read and understand them? In this article, we will thoroughly examine this topic.

Until now, OpenAI’s GPT models were pure language models, meaning they could only accept text as input and generate text as output. With the introduction of the GPT-4 model, this has changed. GPT-4, under the name GPT-4V(ision), can now accept images as input.

How can you use GPT-4 model with Images?

When OpenAI first announced that GPT-4 was capable of accepting image inputs, they also mentioned on their website that image inputs were in a research preview phase and not publicly available.

However, on September 25, 2023, which is 195 days after the announcement of the GPT-4 model, OpenAI finally introduced the GPT-4-V version, which can accept image inputs.

To expand the availability of image input capability, OpenAI collaborated closely with a partner called “Be My Eyes” to begin the process. General availability of image inputs was announced on OpenAI’s website.

GPT-4 Vision (GPT-4V)

On September 25, 2023, OpenAI announced on its website the release of GPT-4V(ision), an enhanced version of the GPT-4 model capable of processing image inputs. GPT-4V is the name of this model, which can handle image inputs and is now broadly available.

Additionally, OpenAI shared a system card document outlining the evaluations, preparation, and mitigation efforts for the GPT-4 version that can process image inputs.

Accessing and Using GPT-4 Model

Currently, there are two different ways to access and use the GPT-4 model. The first one is through OpenAI’s ChatGPT platform. 

OpenAI integrated GPT-4 into the ChatGPT platform exclusively for ChatGPT Plus members. If you are a ChatGPT Plus member, you can use the GPT-4 model through ChatGPT. 

OpenAI also stated that the image input feature coming to GPT-4 will be extended to ChatGPT.

Another way to invoke the GPT-4 model is by using the APIs provided by OpenAI. This method requires a basic level of programming knowledge. 

By purchasing and obtaining tokens from the OpenAI GPT-4 API, you can make calls to the model programmatically, sending parameters as input.

You can currently utilize only GPT-4’s text input capabilities when using the API. However, with GPT-4V, you will have the ability to send image input through the API as well.

Below is a simplified code snippet demonstrating how you can call the GPT-4 API using Python. For more detailed information, see OpenAI’s API documentation.

import requests

def call_gpt4_api(prompt):
    api_key = 'YOUR_API_KEY'
    # GPT-4 is served through the Chat Completions endpoint
    api_url = 'https://api.openai.com/v1/chat/completions'
    headers = {
        'Authorization': f'Bearer {api_key}',
        'Content-Type': 'application/json'
    }
    data = {
        'model': 'gpt-4',
        # Chat models take a list of messages rather than a raw prompt
        'messages': [{'role': 'user', 'content': prompt}],
        'max_tokens': 100,
        'temperature': 0.8,
        'n': 1
    }

    try:
        response = requests.post(api_url, headers=headers, json=data)
        response.raise_for_status()
        # Chat responses return text under choices[0].message.content
        generated_text = response.json()['choices'][0]['message']['content']
        print(generated_text)
    except requests.exceptions.RequestException as err:
        print(f"An error occurred: {err}")

# Example usage
prompt = 'Once upon a time'
call_gpt4_api(prompt)
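Once you have access to GPT-4V, the same chat endpoint accepts images alongside text. Below is a minimal sketch of building such a request, assuming the `gpt-4-vision-preview` model name and a publicly reachable image URL; the helper names `build_vision_payload` and `ask_about_image` are our own, not part of the API.

```python
import requests

def build_vision_payload(question, image_url):
    # A chat message whose content mixes a text part and an image reference.
    return {
        'model': 'gpt-4-vision-preview',
        'messages': [{
            'role': 'user',
            'content': [
                {'type': 'text', 'text': question},
                {'type': 'image_url', 'image_url': {'url': image_url}},
            ],
        }],
        'max_tokens': 300,
    }

def ask_about_image(api_key, question, image_url):
    payload = build_vision_payload(question, image_url)
    response = requests.post(
        'https://api.openai.com/v1/chat/completions',
        headers={'Authorization': f'Bearer {api_key}'},
        json=payload,
    )
    response.raise_for_status()
    return response.json()['choices'][0]['message']['content']

# Example usage (requires a valid API key and network access):
# print(ask_about_image('YOUR_API_KEY',
#                       'What is unusual about this image?',
#                       'https://example.com/photo.jpg'))
```

Images can also be sent as base64-encoded data URLs instead of public links, which is useful for local files.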

GPT-4 Model With Image Capabilities

Even before GPT-4V became broadly available, OpenAI documented the model’s capacity to accept and understand images when they announced GPT-4. 

Below, we have compiled some examples shared by OpenAI that demonstrate GPT-4’s ability to take images as input:

In the screenshot below, an image has been sent to GPT-4. The image consists of three panels; in one panel, someone attempts to charge a phone using a VGA cable. GPT-4 was asked what is funny or humorous about this image. It was able to analyze the image, understand the joke and the absurd situation depicted, and provide a suitable comment based on its understanding.

In the image below, it can be observed that the GPT-4 model can receive three files as input and read and understand the text within these files. It is capable of answering questions related to the content of these files and summarizing them.

The examples provided here have been taken from the GPT-4 research paper shared by OpenAI. If you’d like, you can directly review the paper yourself.

Can GPT-4 model generate images?

The GPT-4 model is capable of accepting images as input, but it can only generate text as output. This means that if you want to generate images, you should use other models specialized in image generation, such as OpenAI’s DALL-E 3 model.

However, there are different approaches you can try. For example, using ChatGPT or GPT-4, you can generate images in ASCII or SVG formats. Although these images may not be in the conventional sense we are familiar with, they demonstrate the models’ abilities to create images using text and code.
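To illustrate the SVG approach: you can ask GPT-4 for SVG markup and save whatever it returns to a file that a browser can render. The sketch below uses a hard-coded sample of the kind of markup the model might produce for “draw a red circle as SVG”; the `save_svg` helper and the sample markup are our own illustration, not actual model output.

```python
def save_svg(svg_markup, path):
    # Write (model-generated) SVG markup to a file browsers can render.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(svg_markup)
    return path

# The kind of markup GPT-4 might return for "draw a red circle as SVG":
example_svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
    '<circle cx="50" cy="50" r="40" fill="red"/>'
    '</svg>'
)

save_svg(example_svg, 'circle.svg')
```

Opening `circle.svg` in any browser displays the drawing, turning the model’s text output into a visible image.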

GPT-4 vs ChatGPT: What is the difference?

GPT-4 model is not the same as ChatGPT. ChatGPT is a chatbot application that specializes in interactive dialogues and is based on the GPT-3.5 model as its standard model. 

After the release of GPT-4, ChatGPT Plus members have been given the opportunity to use the GPT-4 model through the ChatGPT interface. 

You can also use ChatGPT with image input. Simply upload a photo, screenshot, or document to ChatGPT, and then ask any questions or seek information related to that specific image.

Another feature is ChatGPT’s Code Interpreter, which allows you to upload files to ChatGPT and then ask it to analyze them.

Conclusion

The GPT-4 model is multimodal, allowing you to provide images as input. 

Unlike other GPT models, GPT-4 can understand and interpret images, grasping their details and providing comments on them. This feature sets it apart from other GPT models, making it the most up-to-date and advanced language model currently released by OpenAI.
