Analyze artwork using GPT Vision and LLM
This past weekend I spent time diving deeper into OpenAI’s GPT-4 Vision capabilities. I figured this would be a good excuse to blend my interests in software development, AI/LLMs, and artwork. I wanted to see how quickly and effectively GPT-4V (gpt-4-1106-vision-preview) could analyze artwork given an uploaded image. I primarily wanted to do this for experimenting with my own artwork, but also interested in discovering what type of critique and feedback it would give on famous artist’s work.
For this project, GPT-4 is really doing all of the heavy lifting. With a carefully-crafted prompt, getting analysis feedback in a standard JSON format each time, I was able to reliably get back data that could be assembled into a very basic UI. I decided to use Python and Streamlit (https://streamlit.io/) to build a basic prototype, and released the code as open source to Github.
Check out this quick overview video:
The Art Analyzer App
Art Analyzer is an app that uses GPT Vision (See: OpenAI Platform) to identify artwork from images and AI language models like GPT-4 to provide detailed critiques of paintings, drawings, and other visual art forms. Users can upload an image of a piece for review, and the app will generate an analysis of the artwork covering composition, use of color, brushwork/texture, emotional impact, originality/creativity, and recommendations of similar artists and paintings.
This app is available to download and run locally, and also published at Streamlit – https://artanalyzer.streamlit.app/. An OpenAI API key is required to run and requires a subscription. You can obtain your own key at https://platform.openai.com/api-keys (Note: this key is not saved in *any* way, and after page reload will have to be entered again).
For quick examples, see:
Van Gogh’s – A Starry Night
The Results – Summary, Critique, Similar Artists, Similar Paintings
Along with the heavy lifting of this app coming from GPT-4 Vision APIs, the other secret sauce is the prompt used to analyze the image. Interestingly enough, the creation of this prompt was assisted by ChatGPT. What started as a question, turned into a complex prompt. Unfortunately with the complex prompt and in turn complex output, there was an easy way using this method to parse the output. So, going back to ChatGPT again to fine tune, I asked for the prompt output in JSON format.
What we effectively have here is a JSON definition of the output, with mutliple prompts included in the JSON itself. I did notice that periodically the results would have additional commentary at the end, so I included some verbiage in the prompt to ONLY include JSON as output. I think this approach is very powerful.
Can we really use AI as an art critic or expert? Yes and No. This app is a fun experiment to see how far along GPT Vision has come, and the results are generally very good and surprisingly detailed. But, if you are looking to analyze and critique artwork in a more official capacity feel free to defer to art historians, experts, and enthusiasts.
For the time being, this is a great showcase of the power of GPT Vision and LLM (Large Language Models). I can see many use cases where this sort of data and knowledge aggregation can be used to do analysis, research and incorporate this into personal workflows for creating new works of art.
This quick app was created and released as open source. Hopefully this example be expanded and also spark some new ideas for others.
StreamLit Demo – https://artanalyzer.streamlit.app/