M.B.A. Students vs. ChatGPT: Who Comes Up With More Innovative Ideas?
We put humans and AI to the test. The results weren’t even close.
How good is AI at generating new ideas?
The conventional wisdom has been not very good. Identifying opportunities for new ventures, generating a solution for an unmet need, or naming a new company are unstructured tasks that seem ill-suited for algorithms. Yet recent advances in AI, and specifically the advent of large language models like ChatGPT, are challenging these assumptions.
We have taught innovation, entrepreneurship and product design for many years. For the first assignment in our innovation courses at the Wharton School, we ask students to generate a dozen or so ideas for a new product or service. As a result, we have heard several thousand new venture ideas pitched by undergraduate students, M.B.A. students and seasoned executives. Some of these ideas are awesome, some are awful, and, as you would expect, most are somewhere in the middle.
The library of ideas, though, allowed us to set up a simple competition to judge who is better at generating innovative ideas: the human or the machine.
In this competition, which we ran together with our colleagues Lennart Meincke and Karan Girotra, humanity was represented by a pool of 200 randomly selected ideas from our Wharton students. The machines were represented by ChatGPT4, which we asked to generate 100 ideas using instructions otherwise identical to those given to the students: “generate an idea for a new product or service appealing to college students that could be made available for $50 or less.”
In addition to this vanilla prompt, we also asked ChatGPT for another 100 ideas after providing a handful of examples of successful ideas from past courses (in other words, a trained GPT group), providing us with a total sample of 400 ideas.
Collapsible laundry hamper, dorm-room chef kit, ergonomic cushion for hard classroom seats, and hundreds more ideas miraculously spewed from a laptop.
The academic literature on ideation postulates three dimensions of creative performance: the quantity of ideas, the average quality of ideas, and the number of truly exceptional ideas.
First, on the number of ideas per unit of time: Not surprisingly, ChatGPT easily outperforms us humans on that dimension. Generating 200 ideas the old-fashioned way requires days of human work, while ChatGPT can spit out 200 ideas with about an hour of supervision.
Next, to assess the quality of the ideas, we market tested them. Specifically, we took each of the 400 ideas and put them in front of a survey panel of customers in the target market via an online purchase-intent survey. The question we asked was: “How likely would you be to purchase based on this concept if it were available to you?” The possible responses ranged from definitely wouldn’t purchase to definitely would purchase.
The responses can be translated into a purchase probability using simple market-research techniques. The average purchase probability of a human-generated idea was 40%, that of vanilla GPT-4 was 47%, and that of GPT-4 seeded with good ideas was 49%. In short, ChatGPT isn’t only faster but also on average better at idea generation.
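As a rough illustration of how survey answers might be turned into a purchase probability, here is a minimal Python sketch. The five-point scale, the labels of the three middle answers and the linear weights are all assumptions made for this example; the article doesn't specify the scale or the weights the study used.

```python
# Hypothetical weights: a linear mapping from an assumed five-point
# purchase-intent scale to a probability. Not the study's actual weights.
ASSUMED_WEIGHTS = {
    "definitely wouldn't purchase": 0.0,
    "probably wouldn't purchase": 0.25,
    "might or might not purchase": 0.5,
    "probably would purchase": 0.75,
    "definitely would purchase": 1.0,
}


def purchase_probability(responses):
    """Average the assumed weight of every survey response for one idea."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(ASSUMED_WEIGHTS[r] for r in responses) / len(responses)


# Made-up responses for a single idea, for illustration only.
sample_responses = [
    "definitely would purchase",
    "probably would purchase",
    "might or might not purchase",
    "definitely wouldn't purchase",
]
print(f"{purchase_probability(sample_responses):.0%}")
```

Practitioners often discount stated intent more heavily than a linear mapping does, so the weights here should be read as placeholders.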
Still, when you’re looking for great ideas, averages can be misleading. In innovation, it’s the exceptional ideas that matter: Most managers would prefer one idea that is brilliant and nine ideas that are flops over 10 decent ideas, even if the average quality of the latter option might be higher. To capture this perspective, we investigated only the subset of the best ideas in our pool, specifically the top 10%. Of these 40 ideas, five were generated by students and 35 were created by ChatGPT (15 from the vanilla ChatGPT set and 20 from the pre-trained ChatGPT set). Once again, ChatGPT came out on top.
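Because the student pool was twice the size of each ChatGPT pool, it can help to restate those counts as hit rates per pool. The short Python sketch below uses only the counts reported in this article; the function name and the pool labels are ours, chosen for illustration.

```python
def top_decile_rates(top_counts, pool_sizes):
    """Share of each pool's ideas that landed in the overall top 10%."""
    return {source: top_counts[source] / pool_sizes[source] for source in pool_sizes}


# Counts as reported in the article: 400 ideas in total, top 10% = 40 ideas.
pool_sizes = {"students": 200, "vanilla ChatGPT": 100, "seeded ChatGPT": 100}
top_counts = {"students": 5, "vanilla ChatGPT": 15, "seeded ChatGPT": 20}

rates = top_decile_rates(top_counts, pool_sizes)
for source, rate in rates.items():
    print(f"{source}: {rate:.1%} of ideas reached the top 10%")
```

On these numbers, 2.5% of the student ideas reached the top tier, compared with 15% of the vanilla ChatGPT ideas and 20% of the seeded ones.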
We believe that the 35-to-5 victory of the machine in generating exceptional ideas (not to mention the dramatically lower production costs) has substantial implications for how we think about creativity and innovation.
First, generative AI has brought a new source of ideas to the world. Not using this source would be a sin. It doesn’t matter if you are working on a pitch for your local business-plan competition or if you are seeking a cure for cancer—every innovator should develop the habit of complementing his or her own ideas with the ones created by technology. Ideation will always have an element of randomness to it, and so we cannot guarantee that your idea will get an A+, but there is no excuse left if you get a C.
Second, the bottleneck for the early phases of the innovation process in organisations now shifts from generating ideas to evaluating ideas. Using a large language model, an innovator can produce a spreadsheet articulating hundreds of ideas, which likely include a few blockbusters. This abundance then demands an effective selection mechanism to find the needles in the haystack.
To date, these models appear to perform no better than any single expert in their ability to predict commercial viability. Using a sample of a dozen or so independent evaluations from potential customers in the target market—a wisdom of crowds approach—remains the best strategy. Fortunately, screening ideas using a purchase intent survey of customers in the target market is relatively fast and cheap.
Finally, rather than thinking about a competition between humans and machines, we should find a way in which the two work together. This approach in which AI takes on the role of a co-pilot has already emerged in software development. For example, our human (pilot) innovator might identify an open problem. The AI (co-pilot) might then report what is known about the problem, followed by an effort in which the human and AI independently explore possible solutions, virtually guaranteeing a thorough consideration of opportunities.
The human decision maker is likely ultimately responsible for the outcome, and so will likely make the screening and selection decisions, informed by customer research and possibly by the opinion of the AI co-pilot. We predict such a human-machine collaboration will deliver better products and services to the market, and improved solutions for whatever society needs in the future.
Christian Terwiesch and Karl Ulrich are professors of operations, information and decisions at the Wharton School of the University of Pennsylvania, where Terwiesch also co-directs the Mack Institute for Innovation Management.
The lunar flyby would be the deepest humans have traveled in space in decades.
It’s go time for the highest-stakes mission at NASA in more than 50 years.
On April 1, the agency is set to launch four astronauts around the moon, the deepest human spaceflight since the final Apollo lunar landing in 1972.
The launch window for Artemis II, as the mission is called, opens at 6:24 p.m. ET.
National Aeronautics and Space Administration teams have been preparing the vehicles to depart from Florida’s Kennedy Space Center on the planned roughly 10-day trip. Crew members have trained for years for this moment.
Reid Wiseman, the NASA astronaut serving as mission commander, said he doesn’t fear taking the voyage. A widower, he does worry at times about what he is putting his daughters through.
“I could have a very comfortable life for them,” Wiseman said in an interview last September.
“But I’m also a human, and I see the spirit in their eyes that is burning in my soul too. And so we’ve just got to never stop going.”
Wiseman’s crewmates on Artemis II are NASA’s Victor Glover and Christina Koch, as well as Canadian Space Agency astronaut Jeremy Hansen.

What are the goals for Artemis II?
The biggest one: Safely fly the crew on vehicles that have never carried astronauts before.
The towering Space Launch System rocket has the job of lofting a vehicle called Orion into space and on its way to the moon.
Orion is designed to carry the crew around the moon and back. Myriad systems on the ship—life support, communications, navigation—will be tested with the astronauts on board.
SLS and Orion don’t have much flight experience. The vehicles last flew in 2022, when the agency completed its uncrewed Artemis I mission.
How is the mission expected to unfold?
Artemis II will begin when SLS takes off from a launchpad in Florida with Orion stacked on top of it.
The so-called upper stage of SLS will later separate from the main part of the rocket with Orion attached, and use its engine to set up the latter vehicle for a push to the moon.
After Orion separates from the upper stage, it will conduct what is called a translunar injection—the engine firing that commits Orion to soaring out to the moon. It will fly to the moon over the course of a few days and travel around its far side.
Orion will face a tough return home after speeding through space. As it hits Earth’s atmosphere, Orion will be flying at 25,000 miles an hour and face temperatures of 5,000 degrees Fahrenheit as it slows down. The capsule is designed to land under parachutes in the Pacific Ocean, not far from San Diego.

Is it possible Artemis II will be delayed?
Yes.
For safety reasons, the agency won’t launch if certain tough weather conditions roll through the Cape Canaveral, Fla., area. Delays caused by technical problems are possible, too. NASA has other dates identified for the mission if it doesn’t begin April 1.
Who are the astronauts flying on Artemis II?
The crew will be led by Wiseman, a retired Navy pilot who completed military deployments before joining NASA’s astronaut corps. He traveled to the International Space Station in 2014.
Two other astronauts will represent NASA during the mission: Glover, an experienced Navy pilot, and Koch, who began her career as an electrical engineer for the agency and once spent a year at a research station at the South Pole. Both have traveled to the space station before.
Hansen is a military pilot who joined Canada’s astronaut corps in 2009. He will be making his first trip to space.
Koch’s participation in Artemis II will mark the first time a woman has flown beyond orbits near Earth. Glover and Hansen will be the first African-American and non-American astronauts, respectively, to do the same.
What will the astronauts do during the flight?
The astronauts will evaluate how Orion flies, practice emergency procedures and capture images of the far side of the moon for scientific and exploration purposes (they may become the first humans to see parts of the far side of the lunar surface). Health-tracking projects of the astronauts are designed to inform future missions.
Those efforts will play out in Orion’s crew module, which has about two minivans worth of living area.
On board, the astronauts will spend about 30 minutes a day exercising, using a device that allows them to do dead lifts, rowing and more. Sleep will come in eight-hour stretches in hammocks.
There is a custom-made warmer for meals, with beef brisket and veggie quiche on the menu.
Each astronaut is permitted two flavored beverages a day, including coffee. The crew will hold one hourlong shared meal each day.
The Universal Waste Management System—that’s the toilet—uses air flow to pull fluid and solid waste away into containers.
What happens after Artemis II?
Assuming it goes well, NASA will march on to Artemis III, scheduled for next year. During that operation, NASA plans to launch Orion with crew members on board and have the ship practice docking with lunar-lander vehicles that Elon Musk’s SpaceX and Jeff Bezos’ Blue Origin have been developing. The rendezvous operations will occur relatively close to Earth.
NASA hopes that its contractors and the agency itself are ready to attempt one or more lunar landing missions in 2028. Many current and former spaceflight officials are skeptical that timeline is feasible.