M.B.A. Students vs. ChatGPT: Who Comes Up With More Innovative Ideas?

How good is AI at generating new ideas?

The conventional wisdom has been not very good. Identifying opportunities for new ventures, generating a solution for an unmet need, or naming a new company are unstructured tasks that seem ill-suited for algorithms. Yet recent advances in AI, and specifically the advent of large language models like ChatGPT, are challenging these assumptions.

We have taught innovation, entrepreneurship and product design for many years. For the first assignment in our innovation courses at the Wharton School, we ask students to generate a dozen or so ideas for a new product or service. As a result, we have heard several thousand new venture ideas pitched by undergraduate students, M.B.A. students and seasoned executives. Some of these ideas are awesome, some are awful, and, as you would expect, most are somewhere in the middle.

The library of ideas, though, allowed us to set up a simple competition to judge who is better at generating innovative ideas: the human or the machine.

In this competition, which we ran together with our colleagues Lennart Meincke and Karan Girotra, humanity was represented by a pool of 200 randomly selected ideas from our Wharton students. The machines were represented by ChatGPT (GPT-4), which we asked to generate 100 ideas using the same instructions given to the students: “generate an idea for a new product or service appealing to college students that could be made available for $50 or less.”

In addition to this vanilla prompt, we also asked ChatGPT for another 100 ideas after providing a handful of examples of successful ideas from past courses (in other words, a trained GPT group), providing us with a total sample of 400 ideas.

Collapsible laundry hamper, dorm-room chef kit, ergonomic cushion for hard classroom seats, and hundreds more ideas miraculously spewed from a laptop.

How to compare

The academic literature on ideation postulates three dimensions of creative performance: the quantity of ideas, the average quality of ideas, and the number of truly exceptional ideas.

First, on the number of ideas per unit of time: Not surprisingly, ChatGPT easily outperforms us humans on that dimension. Generating 200 ideas the old-fashioned way requires days of human work, while ChatGPT can spit out 200 ideas with about an hour of supervision.

Next, to assess the quality of the ideas, we market tested them. Specifically, we took each of the 400 ideas and put them in front of a survey panel of customers in the target market via an online purchase-intent survey. The question we asked was: “How likely would you be to purchase based on this concept if it were available to you?” The possible responses ranged from definitely wouldn’t purchase to definitely would purchase.

The responses can be translated into a purchase probability using simple market-research techniques. The average purchase probability of a human-generated idea was 40%, that of vanilla GPT-4 was 47%, and that of GPT-4 seeded with good ideas was 49%. In short, ChatGPT isn’t only faster but also on average better at idea generation.
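As a rough illustration of that translation, the sketch below maps survey responses to a purchase probability by weighting each response category and averaging. It assumes a five-point scale. The three middle category labels and all of the weights are assumptions chosen for illustration; the article doesn't say which scale or weights were actually used.

```python
# Hypothetical weights: the share of respondents in each category assumed
# to actually buy. These numbers are illustrative, not the study's values.
ASSUMED_WEIGHTS = {
    "definitely would not purchase": 0.0,
    "probably would not purchase": 0.25,
    "might or might not purchase": 0.5,
    "probably would purchase": 0.75,
    "definitely would purchase": 1.0,
}


def purchase_probability(responses, weights=ASSUMED_WEIGHTS):
    """Average the assumed weight of each survey response for one idea."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(weights[r] for r in responses) / len(responses)


# Made-up responses for a single idea, for demonstration only.
sample = [
    "definitely would purchase",
    "probably would purchase",
    "might or might not purchase",
    "probably would not purchase",
]
print(purchase_probability(sample))
```

Averaging this score over every idea in a group would give a group-level figure comparable in form to the percentages quoted above.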

Still, when you’re looking for great ideas, averages can be misleading. In innovation, it’s the exceptional ideas that matter: Most managers would prefer one idea that is brilliant and nine ideas that are flops over 10 decent ideas, even if the average quality of the latter option might be higher. To capture this perspective, we investigated only the subset of the best ideas in our pool, specifically the top 10%. Of these 40 ideas, five were generated by students and 35 were created by ChatGPT (15 from the vanilla ChatGPT set and 20 from the pre-trained ChatGPT set). Once again, ChatGPT came out on top.
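The top-decile comparison can be sketched as follows: pool every idea with its purchase probability, keep the top 10% by score, and count how many came from each source. The scores here are randomly generated placeholders, not the study's data, and the function and label names are hypothetical.

```python
import random
from collections import Counter


def top_decile_counts(ideas):
    """ideas: list of (source, score) pairs. Count sources in the top 10% by score."""
    k = len(ideas) // 10
    best = sorted(ideas, key=lambda pair: pair[1], reverse=True)[:k]
    return Counter(source for source, _ in best)


# Placeholder pool matching the article's group sizes (200 + 100 + 100).
rng = random.Random(0)
pool = (
    [("student", rng.random()) for _ in range(200)]
    + [("gpt_vanilla", rng.random()) for _ in range(100)]
    + [("gpt_seeded", rng.random()) for _ in range(100)]
)
counts = top_decile_counts(pool)
print(counts)
```

With random placeholder scores, the split should roughly track group sizes. The point of the comparison is how far the real tally departs from that baseline.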

What it means

We believe that the 35-to-5 victory of the machine in generating exceptional ideas (not to mention the dramatically lower production costs) has substantial implications for how we think about creativity and innovation.

First, generative AI has brought a new source of ideas to the world. Not using this source would be a sin. It doesn’t matter if you are working on a pitch for your local business-plan competition or if you are seeking a cure for cancer—every innovator should develop the habit of complementing his or her own ideas with the ones created by technology. Ideation will always have an element of randomness to it, and so we cannot guarantee that your idea will get an A+, but there is no excuse left if you get a C.

Second, the bottleneck for the early phases of the innovation process in organizations now shifts from generating ideas to evaluating ideas. Using a large language model, an innovator can produce a spreadsheet articulating hundreds of ideas, which likely include a few blockbusters. This abundance then demands an effective selection mechanism to find the needles in the haystack.

To date, these models appear to perform no better than any single expert in their ability to predict commercial viability. Using a sample of a dozen or so independent evaluations from potential customers in the target market (a wisdom-of-crowds approach) remains the best strategy. Fortunately, screening ideas using a purchase-intent survey of customers in the target market is relatively fast and cheap.

Finally, rather than thinking about a competition between humans and machines, we should find a way in which the two work together. This approach in which AI takes on the role of a co-pilot has already emerged in software development. For example, our human (pilot) innovator might identify an open problem. The AI (co-pilot) might then report what is known about the problem, followed by an effort in which the human and AI independently explore possible solutions, virtually guaranteeing a thorough consideration of opportunities.

The human decision maker is likely ultimately responsible for the outcome, and so will likely make the screening and selection decisions, informed by customer research and possibly by the opinion of the AI co-pilot. We predict such a human-machine collaboration will deliver better products and services to the market, and improved solutions for whatever society needs in the future.

Christian Terwiesch and Karl Ulrich are professors of operations, information and decisions at the Wharton School of the University of Pennsylvania, where Terwiesch also co-directs the Mack Institute for Innovation Management.

As tech leaders race to bring Windows systems back online after Friday’s software update by cybersecurity company CrowdStrike crashed around 8.5 million machines worldwide, experts share with CIO Journal their takeaways for preparing for the next major information technology outage.

Be familiar with how vendors develop, test and release their software

IT leaders should hold vendors deeply integrated within IT systems, such as CrowdStrike, to a “very high standard” of development, release quality and assurance, said Neil MacDonald, a Gartner vice president.

“Any security vendor has a responsibility to do extensive regression testing on all versions of Windows before an update is rolled out,” he said.

That involves asking existing vendors to explain how they write software, what testing they do and whether customers may choose how quickly to roll out an update.

“Incidents like this remind all of us in the CIO community of the importance of ensuring availability, reliability and security by prioritizing guardrails such as deployment and testing procedures and practices,” said Amy Farrow, chief information officer of IT automation and security company Infoblox.

Re-evaluate how your firm accepts software updates from ‘trusted’ vendors

While automatically accepting software updates has become the norm—and a recommended security practice—the CrowdStrike outage is a reminder to take a pause, some CIOs said.

“We still should be doing the full testing of packages and upgrades and new features,” said Paul Davis, a field chief information security officer at software development platform maker JFrog. Though it’s not feasible to test every update, especially for as many as hundreds of software vendors, Davis said he makes it a priority to test software patches according to their potential severity and size.

Automation, and maybe even artificial intelligence-based IT tools, can help.

“Humans are not very good at catching errors in thousands of lines of code,” said Jack Hidary, chief executive of AI and quantum company SandboxAQ. “We need AI trained to look for the interdependence of new software updates with the existing stack of software.”

Develop a disaster recovery plan

An incident rendering Windows computers unusable is similar to a natural disaster with systems knocked offline, said Gartner’s MacDonald. That’s why businesses should consider natural disaster recovery plans for maintaining the resiliency of their operations.

One way to do that is to set up a “clean room,” or an environment isolated from other systems, to use to bring critical systems back online, according to Chirag Mehta, a cybersecurity analyst at Constellation Research.

Businesses should also hold tabletop exercises to simulate risk scenarios, including IT outages and potential cyber threats, Mehta said.

Companies that back up data regularly were likely less impacted by the CrowdStrike outage, according to Victor Zyamzin, chief business officer of security company Qrator Labs. “Another suggestion for companies, and we’ve been saying that again and again for decades, is that you should have some backup procedure applied, running and regularly tested,” he said.

Review vendor and insurance contracts

For any vendor with a significant impact on company operations, MacDonald said companies can review their contracts and look for clauses indicating the vendors must provide reliable and stable software.

“That’s where you may have an advantage to say, if an update causes an outage, is there a clause in the contract that would cover that?” he said.

If it doesn’t, tech leaders can aim to negotiate a discount serving as a form of compensation at renewal time, MacDonald added.

The outage also highlights the importance of insurance in providing companies with bottom-line protection against cyber risks, said Peter Halprin, a partner with law firm Haynes Boone focused on cyber insurance.

This coverage can include protection against business income losses, such as those associated with an outage, whether caused by the insured company or a service provider, Halprin said.

Weigh the advantages and disadvantages of the various platforms

The CrowdStrike update affected only devices running Microsoft Windows-based systems, prompting fresh questions over whether enterprises should rely on Windows computers.

CrowdStrike runs on Windows devices through access to the kernel, the part of an operating system containing a computer’s core functions. That’s not the same for Apple’s Mac operating system and Linux, which don’t allow the same level of access, said Mehta.

Some businesses have converted to Chromebooks, simple laptops developed by Alphabet-owned Google that run on the Chrome operating system. “Not all of them require deeper access to things,” Mehta said. “What are you doing on your laptop that actually requires Windows?”