Comparing Three AI Language Models

Published on
February 29, 2024
Tessa McDaniel
Marketing Team Lead

Comparing and contrasting ChatGPT, Copilot, and Gemini with different types of prompts.

While OpenAI's ChatGPT is the most talked about Generative AI chatbot, Microsoft Copilot and Google Gemini have come onto the scene with some force as well. Some AI language models are better suited to certain tasks than others, and I do get into debates with my friends about which we use (if any), so why not compare the three and see what's different? This article isn't intended to pick a "best" model, just to compare and contrast the responses I get from the same prompts. So let's go!

What is an AI Language Model?

Before we dive in, I want to define what an AI language model is along with some other useful terminology. These models are able to both understand and generate human language, using Natural Language Processing (NLP) to interpret the input. They can also be called "Generative AI chatbots", but that grossly simplifies their capabilities and limits them to just text, when the models can use various inputs, like images and voice. Microsoft Copilot has several different GPTs, or Generative Pre-trained Transformers, tailored towards separate kinds of queries. Microsoft notes that while Copilot can reply to any queries with depth, the GPTs are trained on specific types of information, which can help prevent hallucinations and provide more accurate responses.

Comparing and Contrasting ChatGPT, Copilot, and Gemini

I decided to compare each of the three AI language models by using two different types of prompts. The first is "Can you come up with some dinner recipes that are under 500 cal per serving?", which is something that anyone in any discipline in any country might need in their everyday lives. The second one I want to be more work-focused, but as I just wrote an article from the perspective of someone who writes for a living, I'm going a different direction with this one: "Can you write a Python function called color_print that prints text in the terminal in different colors or formatting using the colorama package and some defined ANSI escape sequences?" This is a function I've written before, so I (should) understand any errors the models generate. Let's start with the recipe one:

OpenAI ChatGPT

When I asked ChatGPT for dinner recipes under 500 calories, it instantly generated three different recipes. The "Lemon Herb Grilled Chicken" only consisted of chicken, though the last step recommended serving with grilled vegetables or a salad, which was not factored into the calorie count of 250-300 calories. It didn't ask me any clarifying questions about my preferences or dietary restrictions, which means that the next recipe, "Quinoa and Black Bean Stuffed Peppers", is not useful for me as I can't stand bell peppers and my partner doesn't like beans. The last recipe generated was "Zucchini Noodles with Turkey Meatballs." I love a good zoodle, plus this recipe is a complete meal, so out of the three recipes generated, only one is a winner for me. This was a very straightforward, short interaction.

Microsoft Copilot

A quick disclaimer: Copilot has several different GPTs, including a cooking assistant, which I used for this comparison. After asking the same starter question, the cooking assistant asked if I had any dietary restrictions "such as allergies, intolerances, or special diets." I responded, "No, but I don't like bell peppers and my partner doesn't like beans." Something to note is that the cooking assistant's latency is noticeably longer than ChatGPT's, but that could be because it's conducting a search for each generated response as well. The cooking assistant then asked me if I have any cuisine preferences, to which I answered, "Yes, I like Japanese and Thai while my partner likes Italian." And it just kept going! The cooking assistant continued to ask questions: "What's your preference for the skill level required to prepare your dinner dishes?" "Do you have a preference for a specific meal type, such as soup, salad, sandwich, casserole, stir-fry, curry, etc.?" "What’s your preference for the available ingredients that you have or want to use?" I could go on.

But the most notable difference so far is that Copilot links to websites. Because I said that I like making noodle dishes, it embedded a link to a cold noodle salad with tahini dressing. This is a huge contrast to ChatGPT, which can neither accept links as input nor provide them as output. Copilot is also very personable; it restated my input at the start of each of its responses and replied to it as well, and the use of emojis made me feel like there was a person on the other side. At the end, after Copilot asked me six clarifying questions, it provided links to six recipes (two from each of my preferred cuisines). It also provided a synopsis of the dish and a look at the nutrition facts per serving. I'm starting to get really hungry looking at these. Unfortunately, when I navigated away from the page and went back, my chat history was gone. It seems the noodle salad with tahini dressing recipe is lost to the ages...

Google Gemini

Gemini's response to my request for dinner recipes under 500 calories was by far the most simplistic. While it generated ideas, like "Shrimp Scampi with Zucchini Noodles", it didn't actually provide a recipe, either as a link or a generated recipe, even though it calculated approximate calorie counts. I decided to ask a clarifying question: "I like the Shrimp Scampi with Zucchini Noodles idea. Can you give me a recipe for that?" Then it did generate a recipe along with the estimated calories per serving (minus the parmesan, but who wants a meal without cheese??), and that opened up a whole different dimension of Gemini. 

Even with the key (below), the color-coded lines are still confusing. The green lines do provide a link, but it seems to be more of a reference as if proving that lemon juice belongs in shrimp scampi, and the orange lines seem to be a disclaimer that no evidence for the presence of white wine in shrimp scampi was found. Then there's all the plain text that isn't "intended to convey factual information." While the intent of the color-coded highlights is likely to assure users about the authenticity of the generated results, it only succeeded in confusing me.


I know I said I was going to try a second prompt, and I did! This post is getting a bit long, so I'll be posting a second part, which will be linked here when it's live. For this particular recipe prompt, Copilot was able to provide the most personal suggestions, as it asked a lot of clarifying questions, but that can likely be attributed to me using the specific cooking assistant GPT. ChatGPT only generated two complete dinner recipes, and one of them had foods I didn't care for, though part of prompt engineering is knowing exactly what to ask to get the right answer. Gemini didn't generate any recipes at first, just ideas, but when prompted further, it generated a really tasty-sounding recipe. I'm looking forward to putting these three through their paces with the Python function prompt!
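For anyone curious what the Python function prompt is actually asking for, here is a minimal sketch of one way a color_print function could look. To be clear, this is not my original function and not the output of any of the three models; the style names, the ANSI_CODES table, and the colorize helper are all assumptions made up for illustration. It also treats colorama as optional, only calling colorama.init() when the package is installed, since the ANSI escape sequences themselves do the coloring.

```python
"""Hypothetical sketch of a color_print helper (illustrative only)."""

try:
    # colorama is optional here; on Windows it helps consoles handle ANSI codes.
    import colorama
except ImportError:
    colorama = None  # fall back to raw ANSI escape sequences

# An assumed, deliberately small set of ANSI escape sequences.
ANSI_CODES = {
    "red": "\033[31m",
    "green": "\033[32m",
    "yellow": "\033[33m",
    "blue": "\033[34m",
    "bold": "\033[1m",
    "underline": "\033[4m",
}
RESET = "\033[0m"  # turns all colors and formatting back off


def colorize(text, *styles):
    """Return text wrapped in the escape sequences for the given style names."""
    unknown = [s for s in styles if s not in ANSI_CODES]
    if unknown:
        raise ValueError(f"unknown style(s): {', '.join(unknown)}")
    if not styles:
        return text  # nothing to apply, so leave the text untouched
    prefix = "".join(ANSI_CODES[s] for s in styles)
    return f"{prefix}{text}{RESET}"


def color_print(text, *styles):
    """Print text in the terminal using the given colors or formatting."""
    print(colorize(text, *styles))


if __name__ == "__main__":
    if colorama is not None:
        colorama.init()  # no-op on most non-Windows terminals
    color_print("All good!", "green", "bold")
    color_print("Something went wrong", "red")
```

Splitting the string building (colorize) from the printing (color_print) is just a design choice in this sketch: it makes the formatting easy to check without a terminal, which should also make it easier to spot where a model's generated version goes wrong.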

