A new AI chatbot might do your homework for you. But it's still not an A+ student

Emma Bowman

Enter a prompt into ChatGPT, and it becomes your very own virtual assistant. OpenAI/Screenshot by NPR

Why do your homework when a chatbot can do it for you? A new artificial intelligence tool called ChatGPT has thrilled the Internet with its superhuman abilities to solve math problems, churn out college essays and write research papers.

After the developer OpenAI released the text-based system to the public last month, some educators have been sounding the alarm about the potential that such AI systems have to transform academia, for better and worse.

"AI has basically ruined homework," said Ethan Mollick, a professor at the University of Pennsylvania's Wharton School of Business, on Twitter.

The tool has been an instant hit among many of his students, he told NPR in an interview on Morning Edition, with its most immediately obvious use being a way to cheat by plagiarizing the AI-written work.

Academic fraud aside, Mollick also sees its benefits as a learning companion.

He's used it as his own teacher's assistant, for help with crafting a syllabus, a lecture, an assignment and a grading rubric for MBA students.

"You can paste in entire academic papers and ask it to summarize it. You can ask it to find an error in your code and correct it and tell you why you got it wrong," he said. "It's this multiplier of ability, that I think we are not quite getting our heads around, that is absolutely stunning."

A convincing — yet untrustworthy — bot

But the superhuman virtual assistant — like any emerging AI tech — has its limitations. ChatGPT was created by humans, after all. OpenAI has trained the tool using a large dataset of real human conversations.

"The best way to think about this is you are chatting with an omniscient, eager-to-please intern who sometimes lies to you," Mollick said.

It lies with confidence, too. Despite its authoritative tone, there have been instances in which ChatGPT won't tell you when it doesn't have the answer.

That's what Teresa Kubacka, a data scientist based in Zurich, Switzerland, found when she experimented with the language model. Kubacka, who studied physics for her Ph.D., tested the tool by asking it about a made-up physical phenomenon.

"I deliberately asked it about something that I thought that I know doesn't exist so that they can judge whether it actually also has the notion of what exists and what doesn't exist," she said.

ChatGPT produced an answer so specific and plausible sounding, backed with citations, she said, that she had to investigate whether the fake phenomenon, "a cycloidal inverted electromagnon," was actually real.

When she looked closer, the alleged source material was also bogus, she said. There were names of well-known physics experts listed, but the titles of the publications they supposedly authored were non-existent.

"This is where it becomes kind of dangerous," Kubacka said. "The moment that you cannot trust the references, it also kind of erodes the trust in citing science whatsoever."

Scientists call these fake generations "hallucinations."

"There are still many cases where you ask it a question and it'll give you a very impressive-sounding answer that's just dead wrong," said Oren Etzioni, the founding CEO of the Allen Institute for AI, who ran the research nonprofit until recently. "And, of course, that's a problem if you don't carefully verify or corroborate its facts."

Users experimenting with the chatbot are warned before testing the tool that ChatGPT "may occasionally generate incorrect or misleading information." OpenAI/Screenshot by NPR

An opportunity to scrutinize AI language tools

Users experimenting with the free preview of the chatbot are warned before testing the tool that ChatGPT "may occasionally generate incorrect or misleading information," harmful instructions or biased content.

Sam Altman, OpenAI's CEO, said earlier this month it would be a mistake to rely on the tool for anything "important" in its current iteration. "It's a preview of progress," he tweeted.

The failings of another AI language model unveiled by Meta last month led to its shutdown. The company withdrew its demo for Galactica, a tool designed to help scientists, just three days after it encouraged the public to test it out, following criticism that it spewed biased and nonsensical text.

Similarly, Etzioni says ChatGPT doesn't produce good science. For all its flaws, though, he sees ChatGPT's public debut as a positive. He sees this as a moment for peer review.

"ChatGPT is just a few days old, I like to say," said Etzioni, who remains at the AI institute as a board member and advisor. It's "giving us a chance to understand what he can and cannot do and to begin in earnest the conversation of 'What are we going to do about it?' "

The alternative, which he describes as "security by obscurity," won't help improve fallible AI, he said. "What if we hide the problems? Will that be a recipe for solving them? Typically — not in the world of software — that has not worked out."

ChatGPT: A GPT-4 Turbo Upgrade and Everything Else to Know

It started as a research project. But ChatGPT has swept us away with its mind-blowing skills. Now, GPT-4 Turbo has improved in writing, math, logical reasoning and coding.

  • Shankland covered the tech industry for more than 25 years and was a science writer for five years before that. He has deep expertise in microprocessors, digital photography, computer hardware and software, internet standards, web technology, and more.

In 2022, OpenAI wowed the world when it introduced ChatGPT and showed us a chatbot with an entirely new level of power, breadth and usefulness, thanks to the generative AI technology behind it. Since then, ChatGPT has continued to evolve, including its most recent development: access to its latest GPT-4 Turbo model for paid users.

ChatGPT and generative AI aren't a novelty anymore, but keeping track of what they can do can be a challenge as new abilities arrive. Most notably, OpenAI now provides easier access to anyone who wants to use it. It also lets anyone write custom AI apps called GPTs and share them on its own app store, while on a smaller scale ChatGPT can now speak its responses to you. OpenAI has been leading the generative AI charge, but it's hotly pursued by Microsoft, Google and startups far and wide.

Generative AI still hasn't shaken a core problem -- it makes up information that sounds plausible but isn't necessarily correct. But there's no denying AI has fired the imaginations of computer scientists, loosened the purse strings of venture capitalists and caught the attention of everyone from teachers to doctors to artists and more, all wondering how AI will change their work and their lives. 

If you're trying to get a handle on ChatGPT, this FAQ is for you. Here's a look at what's up.

What is ChatGPT?

ChatGPT is an online chatbot that responds to "prompts" -- text requests that you type. ChatGPT has countless uses. You can request relationship advice, a summarized history of punk rock or an explanation of the ocean's tides. It's particularly good at writing software, and it can also handle some other technical tasks, like creating 3D models.

ChatGPT is called a generative AI because it generates these responses on its own. But it can also display more overtly creative output like screenplays, poetry, jokes and student essays. That's one of the abilities that really caught people's attention.

Much of AI has been focused on specific tasks, but ChatGPT is a general-purpose tool. This puts it more into a category like a search engine.

That breadth makes it powerful but also hard to fully control. OpenAI has many mechanisms in place to try to screen out abuse and other problems, but there's an active cat-and-mouse game afoot by researchers and others who try to get ChatGPT to do things like offer bomb-making recipes.

ChatGPT really blew people's minds when it began passing tests. For example, AnsibleHealth researchers reported in 2023 that " ChatGPT performed at or near the passing threshold " for the United States Medical Licensing Exam, suggesting that AI chatbots "may have the potential to assist with medical education, and potentially, clinical decision-making."

We're a long way from fully fledged doctor-bots you can trust, but the computing industry is investing billions of dollars to solve the problems and expand AI into new domains like visual data too. OpenAI is among those at the vanguard. So strap in, because the AI journey is going to be a sometimes terrifying, sometimes exciting thrill.

What's ChatGPT's origin?

Artificial intelligence algorithms had been ticking away for years before ChatGPT arrived. These systems were a big departure from traditional programming, which follows a rigid if-this-then-that approach. AI, in contrast, is trained to spot patterns in complex real-world data. AI has been busy for more than a decade screening out spam, identifying our friends in photos, recommending videos and translating our Alexa voice commands into computerese.

A Google technology called transformers helped propel AI to a new level, leading to a type of AI called a large language model, or LLM. These AIs are trained on enormous quantities of text, including material like books, blog posts, forum comments and news articles. The training process internalizes the relationships between words, letting a chatbot process input text and then generate what it judges to be appropriate output text.

A second phase of building an LLM is called reinforcement learning from human feedback, or RLHF. That's when people review the chatbot's responses and steer it toward good answers or away from bad ones. That significantly alters the tool's behavior and is one important mechanism for trying to stop abuse.

OpenAI's LLM is called GPT, which stands for "generative pretrained transformer." Training a new model is expensive and time consuming, typically taking weeks and requiring a data center packed with thousands of expensive AI acceleration processors. OpenAI's latest LLM is called GPT-4 Turbo. Other LLMs include Google's Gemini (formerly called Bard), Anthropic's Claude and Meta's Llama.

ChatGPT is an interface that lets you easily prompt GPT for responses. When it arrived as a free tool in November 2022, its use exploded far beyond what OpenAI expected.

When OpenAI launched ChatGPT, the company didn't even see it as a product. It was supposed to be a mere "research preview," a test that could draw some feedback from a broader audience, said ChatGPT product leader Nick Turley. Instead, it went viral, and OpenAI scrambled to just keep the service up and running under the demand.

"It was surreal," Turley said. "There was something about that release that just struck a nerve with folks in a way that we certainly did not expect. I remember distinctly coming back the day after we launched and looking at dashboards and thinking, something's broken, this couldn't be real, because we really didn't make a very big deal out of this launch."

ChatGPT, a name only engineers could love, was launched as a research project in November 2022, but quickly caught on as a consumer product.

How do I use ChatGPT?

The ChatGPT website is the most obvious method. Open it up, select the LLM version you want from the drop-down menu in the upper left corner, and type in a query.

As of April 1, OpenAI is allowing consumers to use ChatGPT without first signing up for an account. According to a blog post, the move was meant to make the tool more accessible. OpenAI also said in the post that as part of the move, it's introducing added content safeguards, blocking prompts in a wider range of categories.

However, users with accounts will be able to do more with the tool, such as save and review their history, share conversations and tap into features like voice conversations and custom instructions.

OpenAI in 2023 released a ChatGPT app for iPhones and for Android phones. In February, ChatGPT for Apple Vision Pro arrived, too, adding the chatbot's abilities to the "spatial computing" headset. Be careful to look for the genuine article, because other developers can create their own chatbot apps that link to OpenAI's GPT.

In January, OpenAI opened its GPT Store, a collection of custom AI apps that focus ChatGPT's all-purpose design to specific jobs. A lot more on that later, but in addition to finding them through the store you can invoke them with the @ symbol in a prompt, the way you might tag a friend on Instagram.

Microsoft uses GPT for its Bing search engine, which means you can also try out ChatGPT there.

ChatGPT is sprouting up in various hardware devices, including Volkswagen EVs, Humane's voice-controlled AI pin and the squarish Rabbit R1 device.

How much does ChatGPT cost?

It's free, though you have to set up an account to take advantage of all of its features.

For more capability, there's also a subscription called ChatGPT Plus that costs $20 per month and offers a variety of advantages: It responds faster, particularly during busy times when the free version is slow or sometimes tells you to try again later. It also offers access to newer AI models, including GPT-4 Turbo. OpenAI said it has improved capabilities in writing, math, logical reasoning and coding in this model.

The free ChatGPT uses the older GPT-3.5, which doesn't do as well on OpenAI's benchmark tests but which is faster to respond. The newest variation, GPT-4 Turbo, arrived in late 2023 with more up-to-date responses and an ability to ingest and output larger blocks of text.

ChatGPT is growing beyond its language roots. With ChatGPT Plus, you can upload images, for example, to ask what type of mushroom is in a photo.

Perhaps most importantly, ChatGPT Plus lets you use GPTs.

What are these GPTs?

GPTs are custom versions of ChatGPT from OpenAI, its business partners and thousands of third-party developers who created their own GPTs.

Sometimes when people encounter ChatGPT, they don't know where to start. OpenAI calls it the "empty box problem." Discovering that led the company to find a way to narrow down the choices, Turley said.

"People really benefit from the packaging of a use case -- here's a very specific thing that I can do with ChatGPT," like travel planning, cooking help or an interactive, step-by-step tool to build a website, Turley said.

OpenAI CEO Sam Altman announces custom AI apps called GPTs at a developer event in November 2023.

Think of GPTs as OpenAI trying to make the general-purpose power of ChatGPT more refined the same way smartphones have a wealth of specific tools. (And think of GPTs as OpenAI's attempt to take control over how we find, use and pay for these apps, much like Apple has a commanding role over iPhones through its App Store.)

What GPTs are available now?

OpenAI's GPT store now offers millions of GPTs, though as with smartphone apps, you'll probably not be interested in most of them. A range of GPT custom apps are available, including AllTrails personal trail recommendations, a Khan Academy programming tutor, a Canva design tool, a book recommender, a fitness trainer, the laundry buddy clothes washing label decoder, a music theory instructor, a haiku writer and the Pearl for Pets vet advice bot.

One person excited by GPTs is Daniel Kivatinos, co-founder of financial services company JustPaid. His team is building a GPT designed to take a spreadsheet of financial data as input and then let executives ask questions. How fast is a startup going through the money investors gave it? Why did that employee just file a $6,000 travel expense?

JustPaid hopes that GPTs will eventually be powerful enough to accept connections to bank accounts and financial software, which would mean a more powerful tool. For now, the developers are focusing on guardrails to avoid problems like hallucinations -- those answers that sound plausible but are actually wrong -- or making sure the GPT is answering based on the users' data, not on some general information in its AI model, Kivatinos said.

Anyone can create a GPT, at least in principle. OpenAI's GPT editor walks you through the process with a series of prompts. Just like the regular ChatGPT, your ability to craft the right prompt will generate better results.

Another notable difference from regular ChatGPT: GPTs let you upload extra data that's relevant to your particular GPT, like a collection of essays or a writing style guide.

Some of the GPTs draw on OpenAI's Dall-E tool for turning text into images, which can be useful and entertaining. For example, there is a coloring book picture creator, a logo generator and a tool that turns text prompts into diagrams like company org charts. OpenAI calls Dall-E a GPT.

How up to date is ChatGPT?

Not very, and that can be a problem. For example, a Bing search using ChatGPT to process results said OpenAI hadn't yet released its ChatGPT Android app. Search results from traditional search engines can help to "ground" AI results, and indeed that's part of the Microsoft-OpenAI partnership that can tweak ChatGPT Plus results.

GPT-4 Turbo, announced in November, is trained on data up through April 2023. But it's nothing like a search engine whose bots crawl news sites many times a day for the latest information.

Can you trust ChatGPT responses?

No. Well, sometimes, but you need to be wary.

Large language models work by stringing words together, one after another, based on what's probable each step of the way. But it turns out that an LLM works better and sounds more natural with a little spice of randomness added to the word selection recipe. That's the basic statistical nature that underlies the criticism that LLMs are mere "stochastic parrots" rather than sophisticated systems that in some way understand the world's complexity.
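To make the "probable word plus a little randomness" idea concrete, here is a minimal Python sketch of temperature-based sampling, one common way to add that randomness. It is an illustration only, not OpenAI's implementation: the candidate words, their scores and the function names are all hypothetical.

```python
import math
import random

def softmax_with_temperature(scores, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [s / temperature for s in scores.values()]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return {word: w / total for word, w in zip(scores, weights)}

def pick_next_word(scores, temperature=1.0, rng=random):
    """Sample one word in proportion to its probability."""
    probs = softmax_with_temperature(scores, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

# Hypothetical scores for the word that follows "The traffic light turned"
toy_scores = {"red": 3.0, "green": 2.5, "purple": 0.1}

rng = random.Random(0)  # fixed seed so repeated runs give the same picks
print(softmax_with_temperature(toy_scores))
print([pick_next_word(toy_scores, temperature=1.0, rng=rng) for _ in range(5)])
```

With a temperature near zero the most probable word wins almost every time; a higher temperature lets less likely words through more often, which is where both the natural-sounding variety and some of the errors come from.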

The result of this system, combined with the steering influence of the human training, is an AI that produces results that sound plausible but that aren't necessarily true. ChatGPT does better with information that's well represented in training data and undisputed -- for instance, red traffic signals mean stop, Plato was a philosopher who wrote the Allegory of the Cave, an Alaskan earthquake in 1964 was the largest in US history at magnitude 9.2.

We humans interact with AI chatbots by writing prompts -- questions or statements that seek an answer from the information stored in the chatbot's underlying large language model. 

When facts are more sparsely documented, controversial or off the beaten track of human knowledge, LLMs don't work as well. Unfortunately, they sometimes produce incorrect answers with a convincing, authoritative voice. That's what tripped up a lawyer who used ChatGPT to bolster his legal case only to be reprimanded when it emerged that ChatGPT had fabricated some cases that appeared to support his arguments. "I did not comprehend that ChatGPT could fabricate cases," he said, according to The New York Times.

Such fabrications are called hallucinations in the AI business.

That means when you're using ChatGPT, it's best to double check facts elsewhere.

But there are plenty of creative uses for ChatGPT that don't require strictly factual results.

Want to use ChatGPT to draft a cover letter for a job hunt or give you ideas for a themed birthday party? No problem. Looking for hotel suggestions in Bangladesh? ChatGPT can give useful travel itineraries, but confirm the results before booking anything.

Is the hallucination problem getting better?

Yes, but we haven't seen a breakthrough.

"Hallucinations are a fundamental limitation of the way that these models work today," Turley said. LLMs just predict the next word in a response, over and over, "which means that they return things that are likely to be true, which is not always the same as things that are true," Turley said.

But OpenAI has been making gradual progress. "With nearly every model update, we've gotten a little bit better on making the model both more factual and more self aware about what it does and doesn't know," Turley said. "If you compare ChatGPT now to the original ChatGPT, it's much better at saying, 'I don't know that' or 'I can't help you with that' versus making something up."

Hallucinations are so much a part of the zeitgeist that Dictionary.com touted the term as a new word it added to its dictionary in 2023.

Can you use ChatGPT for wicked purposes?

You can try, but lots of it will violate OpenAI's terms of use, and the company tries to block it too. The company prohibits use that involves sexual or violent material, racist caricatures, and personal information like Social Security numbers or addresses.

OpenAI works hard to prevent harmful uses. Indeed, its basic sales pitch is trying to bring the benefits of AI to the world without the drawbacks. But it acknowledges the difficulties, for example in its GPT-4 "system card" that documents its safety work.

"GPT-4 can generate potentially harmful content, such as advice on planning attacks or hate speech. It can represent various societal biases and worldviews that may not be representative of the user's intent, or of widely shared values. It can also generate code that is compromised or vulnerable," the system card says. It also can be used to try to identify individuals and could help lower the cost of cyberattacks.

Through a process called red teaming, in which experts try to find unsafe uses of its AI and bypass protections, OpenAI identified lots of problems and tried to nip them in the bud before GPT-4 launched. For example, a prompt to generate jokes mocking a Muslim boyfriend in a wheelchair was diverted so its response said, "I cannot provide jokes that may offend someone based on their religion, disability or any other personal factors. However, I'd be happy to help you come up with some light-hearted and friendly jokes that can bring laughter to the event without hurting anyone's feelings."

Researchers are still probing LLM limits. For example, Italian researchers discovered they could use ChatGPT to fabricate fake but convincing medical research data. And Google DeepMind researchers found that telling ChatGPT to repeat the same word forever eventually caused a glitch that made the chatbot blurt out training data verbatim. That's a big no-no, and OpenAI barred the approach.

LLMs are still new. Expect more problems and more patches.

And there are plenty of uses for ChatGPT that might be allowed but ill-advised. The website of Philadelphia's sheriff published more than 30 bogus news stories generated with ChatGPT.

What about ChatGPT and cheating in school?

ChatGPT is well suited to short essays on just about anything you might encounter in high school or college, to the chagrin of many educators who fear students will type in prompts instead of thinking for themselves.

Microsoft CEO Satya Nadella touted his company's partnership with OpenAI at a November 2023 event for OpenAI developers. Microsoft uses OpenAI's GPT large language model for its Bing search engine, Office productivity tools and GitHub Copilot programming assistant.

ChatGPT also can solve some math problems, explain physics phenomena, write chemistry lab reports and handle all kinds of other work students are supposed to handle on their own. Companies that sell anti-plagiarism software have pivoted to flagging text they believe an AI generated.

But not everyone is opposed. Some see it as a tool akin to Google search and Wikipedia articles that can help students.

"There was a time when using calculators on exams was a huge no-no," said Alexis Abramson, dean of Dartmouth's Thayer School of Engineering. "It's really important that our students learn how to use these tools, because 90% of them are going into jobs where they're going to be expected to use these tools. They're going to walk in the office and people will expect them, being age 22 and technologically savvy, to be able to use these tools."

ChatGPT also can help kids get past writer's block and can help kids who aren't as good at writing, perhaps because English isn't their first language, she said.

So for Abramson, using ChatGPT to write a first draft or polish their grammar is fine. But she asks her students to disclose that fact.

"Anytime you use it, I would like you to include what you did when you turn in your assignment," she said. "It's unavoidable that students will use ChatGPT, so why don't we figure out a way to help them use it responsibly?"

Is ChatGPT coming for my job?

The threat to employment is real as managers seek to replace expensive humans with cheaper automated processes. We've seen this movie before: elevator operators were replaced by buttons, bookkeepers were replaced by accounting software, welders were replaced by robots. 

ChatGPT has all sorts of potential to blitz white-collar jobs. Paralegals summarizing documents, marketers writing promotional materials, tax advisers interpreting IRS rules, even therapists offering relationship advice.

But so far, in part because of problems with things like hallucinations, AI companies present their bots as assistants and "copilots," not replacements.

And so far, sentiment is more positive than negative about chatbots, according to a survey by consulting firm PwC. Of 53,912 people surveyed around the world, 52% expressed at least one good expectation about the arrival of AI, for example that AI would increase their productivity. That compares with 35% who had at least one negative thing to say, for example that AI will replace them or require skills they're not confident they can learn.

How will ChatGPT affect programmers?

Software development is a particular area where people have found ChatGPT and its rivals useful. Trained on millions of lines of code, it internalized enough information to build websites and mobile apps. It can help programmers frame up bigger projects or fill in details.

One of the biggest fans is Microsoft's GitHub, a site where developers can host projects and invite collaboration. Nearly a third of people maintaining GitHub projects use its GPT-based assistant, called Copilot, and 92% of US developers say they're using AI tools.

"We call it the industrial revolution of software development," said GitHub Chief Product Officer Inbal Shani. "We see it lowering the barrier for entry. People who are not developers today can write software and develop applications using Copilot."

It's the next step in making programming more accessible, she said. Programmers used to have to understand bits and bytes, then higher-level languages gradually eased the difficulties. "Now you can write coding the way you talk to people," she said.

But AI programming aids still have a lot to prove. Researchers from Stanford and the University of California-San Diego found in a study of 47 programmers that those with access to an OpenAI programming aid "wrote significantly less secure code than those without access."

And they raise a variation of the cheating problem that some teachers are worried about: copying software that shouldn't be copied, which can lead to copyright problems. That's why Copyleaks, a maker of plagiarism detection software, offers a tool called the Codeleaks Source Code AI Detector designed to spot AI-generated code from ChatGPT, Google Gemini and GitHub Copilot. AIs could inadvertently copy code from other sources, and the latest version is designed to spot copied code based on its semantic structures, not just verbatim software.

At least in the next five years, Shani doesn't see AI tools like Copilot as taking humans out of programming.

"I don't think that it will replace the human in the loop. There's some capabilities that we as humanity have -- the creative thinking, the innovation, the ability to think beyond how a machine thinks in terms of putting things together in a creative way. That's something that the machine can still not do."

Editors' note: CNET used an AI engine to help create several dozen stories, which are labeled accordingly. For more, see our AI policy.

TechRepublic

OpenAI’s GPT-4 Can Autonomously Exploit 87% of One-Day Vulnerabilities, Study Finds

Image of Fiona Jackson

The GPT-4 large language model from OpenAI can exploit real-world vulnerabilities without human intervention, a new study by University of Illinois Urbana-Champaign researchers has found. Other models, including GPT-3.5 and open-source LLMs, as well as open-source vulnerability scanners, are not able to do this.

A large language model agent — an advanced system based on an LLM that can take actions via tools, reason, self-reflect and more — running on GPT-4 successfully exploited 87% of “one-day” vulnerabilities when provided with their National Institute of Standards and Technology description. One-day vulnerabilities are those that have been publicly disclosed but yet to be patched, so they are still open to exploitation.

“As LLMs have become increasingly powerful, so have the capabilities of LLM agents,” the researchers wrote in the arXiv preprint. They also speculated that the comparative failure of the other models is because they are “much worse at tool use” than GPT-4.

The findings show that GPT-4 has an “emergent capability” of autonomously detecting and exploiting one-day vulnerabilities that scanners might overlook.

Daniel Kang, assistant professor at UIUC and study author, hopes that the results of his research will be used in the defensive setting; however, he is aware that the capability could present an emerging mode of attack for cybercriminals.

He told TechRepublic in an email, “I would suspect that this would lower the barriers to exploiting one-day vulnerabilities when LLM costs go down. Previously, this was a manual process. If LLMs become cheap enough, this process will likely become more automated.”

How successful is GPT-4 at autonomously detecting and exploiting vulnerabilities?

GPT-4 can autonomously exploit one-day vulnerabilities.

The GPT-4 agent was able to autonomously exploit web and non-web one-day vulnerabilities, even those that were published on the Common Vulnerabilities and Exposures database after the model’s knowledge cutoff date of November 26, 2023, demonstrating its impressive capabilities.

“In our previous experiments, we found that GPT-4 is excellent at planning and following a plan, so we were not surprised,” Kang told TechRepublic.


Kang’s GPT-4 agent did have access to the internet and, therefore, to any publicly available information about how the vulnerabilities could be exploited. However, he explained that, without advanced AI, the information would not be enough to direct an agent through a successful exploitation.

“We use ‘autonomous’ in the sense that GPT-4 is capable of making a plan to exploit a vulnerability,” he told TechRepublic. “Many real-world vulnerabilities, such as ACIDRain — which caused over $50 million in real-world losses — have information online. Yet exploiting them is non-trivial and, for a human, requires some knowledge of computer science.”

Out of the 15 one-day vulnerabilities the GPT-4 agent was presented with, only two could not be exploited: Iris XSS and Hertzbeat RCE. The authors speculated that this was because the Iris web app is particularly difficult to navigate and the description of Hertzbeat RCE is in Chinese, which could be harder to interpret when the prompt is in English.
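The headline figure follows directly from these counts. As a minimal sketch of the arithmetic (the function and variable names are illustrative, not taken from the study), 13 successes out of 15 attempts rounds to the reported 87%:

```python
def success_rate(exploited: int, total: int) -> float:
    """Return the share of exploited vulnerabilities as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * exploited / total


TOTAL_VULNERABILITIES = 15  # size of the benchmark reported in the article
FAILED = 2                  # Iris XSS and Hertzbeat RCE

rate = success_rate(TOTAL_VULNERABILITIES - FAILED, TOTAL_VULNERABILITIES)
print(f"{rate:.1f}%")  # 86.7%, which the article reports as 87%
```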

GPT-4 cannot autonomously exploit zero-day vulnerabilities

While the GPT-4 agent had a phenomenal success rate of 87% with access to the vulnerability descriptions, the figure dropped to just 7% when it did not, showing it is not currently capable of exploiting “zero-day” vulnerabilities. The researchers wrote that this result demonstrates how the LLM is “much more capable of exploiting vulnerabilities than finding vulnerabilities.”

It’s cheaper to use GPT-4 to exploit vulnerabilities than a human hacker

The researchers determined the average cost of a successful GPT-4 exploitation to be $8.80 per vulnerability, while employing a human penetration tester would be about $25 per vulnerability if it took them half an hour.

While the LLM agent is already 2.8 times cheaper than human labour, the researchers expect the associated running costs of GPT-4 to drop further, as GPT-3.5 has become over three times cheaper in just a year. “LLM agents are also trivially scalable, in contrast to human labour,” the researchers wrote.
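The "2.8 times cheaper" figure can be checked from the two per-vulnerability costs quoted above. The sketch below is only a worked version of that arithmetic; the constant names are illustrative, and the hourly rate is simply what $25 for half an hour implies.

```python
GPT4_COST_PER_EXPLOIT = 8.80    # USD, average per successful exploit (from the article)
HUMAN_COST_PER_EXPLOIT = 25.00  # USD, assuming the task takes half an hour (from the article)
HUMAN_MINUTES_PER_EXPLOIT = 30

# Hourly rate implied by $25 for 30 minutes of work.
implied_hourly_rate = HUMAN_COST_PER_EXPLOIT * 60 / HUMAN_MINUTES_PER_EXPLOIT

# How many times more expensive the human tester is than the agent.
cost_ratio = HUMAN_COST_PER_EXPLOIT / GPT4_COST_PER_EXPLOIT

print(f"Implied human rate: ${implied_hourly_rate:.2f}/hour")
print(f"Human cost / GPT-4 cost: {cost_ratio:.1f}x")
```

The ratio is sensitive to the half-hour assumption: a tester who needs a full hour per vulnerability would double it.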

GPT-4 takes many actions to autonomously exploit a vulnerability

The researchers also found that a significant number of the vulnerabilities took many actions to exploit, some as many as 100. Surprisingly, the average number of actions differed only marginally between runs where the agent had access to the descriptions and runs where it did not, and GPT-4 actually took fewer steps in the latter, zero-day setting.

Kang speculated to TechRepublic, “I think without the CVE description, GPT-4 gives up more easily since it doesn’t know which path to take.”

How were the vulnerability exploitation capabilities of LLMs tested?

The researchers first collected a benchmark dataset of 15 real-world, one-day vulnerabilities in software from the CVE database and academic papers. These reproducible, open-source vulnerabilities consisted of website vulnerabilities, container vulnerabilities and vulnerable Python packages, and over half were categorised as either “high” or “critical” severity.

List of the 15 vulnerabilities provided to the LLM agent and their descriptions.

Next, they developed an LLM agent based on the ReAct automation framework, meaning it could reason over its next action, construct an action command, execute it with the appropriate tool and repeat in an interactive loop. The developers only needed to write 91 lines of code to create their agent, showing how simple it is to implement.
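The reason, act and observe cycle described here can be illustrated with a toy loop. This is not the researchers' 91-line agent: the "model" below is a scripted stand-in for an LLM call, the only tool is a harmless adder, and every name (`scripted_model`, `react_loop`, `TOOLS`) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def tool_add(arg: str) -> str:
    """Toy tool: add two comma-separated integers."""
    a, b = (int(x) for x in arg.split(","))
    return str(a + b)


# Hypothetical tool registry; the study's agent had a browser, terminal and more.
TOOLS: Dict[str, Callable[[str], str]] = {"add": tool_add}


def scripted_model(history: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM call: returns (thought, action, argument)."""
    prefix = "Observation: "
    observations = [h for h in history if h.startswith(prefix)]
    if not observations:
        return ("I need to compute the sum first.", "add", "2,3")
    last = observations[-1][len(prefix):]
    return (f"The tool returned {last}, so I can answer.", "finish", last)


def react_loop(model, tools, task: str, max_steps: int = 10) -> str:
    """Repeat reason -> act -> observe until the model chooses 'finish'."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        thought, action, arg = model(history)          # reason over the next action
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg
        observation = tools[action](arg)               # execute with the chosen tool
        history.append(f"Action: {action}[{arg}]")
        history.append(f"Observation: {observation}")  # feed the result back
    raise RuntimeError("step limit reached without an answer")


answer = react_loop(scripted_model, TOOLS, "What is 2 + 3?")
print(answer)  # 5
```

The step limit matters in practice: the article notes that some runs took up to 100 actions, so an agent of this shape needs an explicit bound to stop it looping indefinitely.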

System diagram of the LLM agent.

The base language model could be swapped between GPT-4 and the following open-source LLMs:

  • OpenHermes-2.5-Mistral-7B.
  • LLaMA-2 Chat (70B).
  • LLaMA-2 Chat (13B).
  • LLaMA-2 Chat (7B).
  • Mixtral-8x7B Instruct.
  • Mistral (7B) Instruct v0.2.
  • Nous Hermes-2 Yi 34B.
  • OpenChat 3.5.

The agent was equipped with the tools necessary to autonomously exploit vulnerabilities in target systems, like web browsing elements, a terminal, web search results, file creation and editing capabilities and a code interpreter. It could also access the descriptions of vulnerabilities from the CVE database to emulate the one-day setting.

Then, the researchers provided each agent with a detailed prompt that encouraged it to be creative, persistent and explore different approaches to exploiting the 15 vulnerabilities. This prompt consisted of 1,056 “tokens,” or individual units of text like words and punctuation marks.

The performance of each agent was measured based on whether it successfully exploited the vulnerabilities, the complexity of the vulnerability and the dollar cost of the endeavour, based on the number of tokens inputted and outputted and OpenAI API costs.
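A dollar cost of this kind is typically derived by multiplying token counts by per-token prices. The sketch below shows that calculation under stated assumptions: the prices and the output token count are placeholders chosen for illustration, not OpenAI's actual rates or figures from the study. Only the 1,056-token prompt length comes from the article.

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Return the USD cost of one model call from token counts and per-1K prices."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)


PROMPT_TOKENS = 1056          # prompt length reported in the article
ASSUMED_OUTPUT_TOKENS = 500   # placeholder, not from the study
ASSUMED_INPUT_PRICE = 0.01    # USD per 1K input tokens, placeholder
ASSUMED_OUTPUT_PRICE = 0.03   # USD per 1K output tokens, placeholder

single_call = api_cost(PROMPT_TOKENS, ASSUMED_OUTPUT_TOKENS,
                       ASSUMED_INPUT_PRICE, ASSUMED_OUTPUT_PRICE)
print(f"One call under these assumptions: ${single_call:.4f}")
```

Because an agent resends its growing history on every step, the cost of a full run is the sum over many such calls, which is how a per-vulnerability figure can reach several dollars.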


The experiment was also repeated without providing the agent with descriptions of the vulnerabilities, to emulate a more difficult zero-day setting. In this instance, the agent had to both discover the vulnerability and then successfully exploit it.

Alongside the agent, the same vulnerabilities were provided to the vulnerability scanners ZAP and Metasploit, both commonly used by penetration testers. The researchers wanted to compare the scanners' effectiveness in identifying and exploiting vulnerabilities with that of LLMs.

Ultimately, the researchers found that only an LLM agent based on GPT-4 could exploit one-day vulnerabilities, that is, when it had access to their CVE descriptions. All other LLMs and the two scanners had a 0% success rate and therefore were not tested in the zero-day setting.

Why did the researchers test the vulnerability exploitation capabilities of LLMs?

This study was conducted to address the gap in knowledge regarding the ability of LLMs to successfully exploit one-day vulnerabilities in computer systems without human intervention.

When vulnerabilities are disclosed in the CVE database, the entry does not always describe how it can be exploited; therefore, threat actors or penetration testers looking to exploit them must work it out themselves. The researchers sought to determine the feasibility of automating this process with existing LLMs.


The Illinois team has previously demonstrated the autonomous hacking capabilities of LLMs through “capture the flag” exercises, but not in real-world deployments. Other work has mostly focused on AI in the context of “human-uplift” in cybersecurity, for example, where hackers are assisted by a GenAI-powered chatbot.

Kang told TechRepublic, “Our lab is focused on the academic question of what are the capabilities of frontier AI methods, including agents. We have focused on cybersecurity due to its importance recently.”

OpenAI has been approached for comment.


IMAGES

  1. Homework AI

  2. Homework GPT

  3. How To Use Chat GPT For Homework Without getting CAUGHT!

  4. MBR or GPT? How to check your disk partition style on Windows 7, 8, 10

  5. GPT MBR Windows 10 ozki ru

  6. Chat GPT Homework Help

VIDEO

  1. When chat gpt does ur homework

  2. Use #ChatGPT Vision to check your Kid's Homework

  3. How to check Hard Disk GPT Or MBR in CMD

  4. Smartsolve -the most useful tools AI allowing you to get A's on tests and homework #ai #gpt

  5. MBR:Master Boot Record Explaine #shorts #kumarkeshavclasses

  6. Using ChatGPT to do homework 📚 #podcast #honesthourpodcast #viralshort #getviral

COMMENTS

  1. Free AI detector

    Paste your English text below to detect AI-generated content like ChatGPT, GPT-4, and Google Gemini. Feedback. Paste text. 0 / ... Rid your homework of AI content and let your writing shine. Work projects. The last thing you want is to turn something in to your boss that wasn't created by you. Use our AI detector to update your work product ...

  2. Free AI Detector

    Scribbr's AI and ChatGPT Detector confidently detects texts generated by the most popular tools, like ChatGPT, Gemini, and Copilot. GPT2, GPT3, and GPT3.5 are detected with high accuracy, while the detection of GPT4 is supported on an experimental basis. Note that no AI Detector can provide complete accuracy ( see our research ).

  3. ChatGPT

    Solve Your Homework. By Amir Adel. I'm here to guide you through your homework, offering step-by-step explanations to solve any problem. Begin by stating or uploading an image of the problem! Sign up to chat. Requires ChatGPT Plus.

  4. HowkGPT

    HowkGPT. HowkGPT is an experimental tool that has been specifically trained to detect AI-generated university student homework. The related research manuscript describing the ongoing research is published on arXiv. Check. Disclaimer. HowkGPT is under active development. The results generated by HowkGPT may not always accurately determine if the ...

  5. This program can tell if ChatGPT did your homework

    Jan 12, 2023 - 11.51am. A new online tool conceived by two Australians has been launched to help teachers and academics detect when homework and assignments have been churned out by the powerful ...

  6. ChatGPT

    Homework Checker. An AI-Powered Tool for Detecting and Correcting Errors in Homework Problems.

  7. A college student made an app to detect AI-written text : NPR

    Some students have been using ChatGPT, a text-based bot, to do their homework for them. Now, 22-year-old Edward Tian's new app is attracting educators working to combat AI plagiarism.

  8. A new tool helps teachers detect if AI wrote an assignment

    ChatGPT is a buzzy new AI technology that can write research papers or poems that come out sounding like a real person did the work. You can even train this bot to write the way you do. Some ...

  9. ChatGPT

    Homework Solver. By studyx.ai. Homework helper using chain of thought and tools like Python, DALL-E, and browser. Sign up to chat.

  10. Using ChatGPT for Assignments

    Using ChatGPT for Assignments | Tips & Examples. Published on February 13, 2023 by Jack Caulfield and Tobias Solis. Revised on November 16, 2023. People are still figuring out the best use cases for ChatGPT, the popular chatbot based on a powerful AI language model. This article provides some ideas for how to use ChatGPT and other AI tools to assist with your academic writing.

  11. ChatGPT could transform academia. But it's not an A+ student yet

    A new AI chatbot might do your homework for you. But it's still not an A+ student. December 19, 2022, 5:00 AM ET. Emma Bowman. Enlarge this image. Enter a prompt into ChatGPT, and it becomes your ...

  12. ChatGPT

    Access to GPT-4 (our most capable model) Chat with images, voice and create images; Use and build custom GPTs; and includes everything in Free; Do more with GPTs. You can choose from hundreds of GPTs that are customized for a single purpose—Creative Writing, Marathon Training, Trip Planning or Math Tutoring. Building a GPT doesn't require ...

  13. SchoolGPT

    Type in your question or equation, and our GPT will analyze it using advanced prompt algorithms. Our prompt can do homework, test, exercises and more! But unlike other homework apps, our prompt doesn't just give you a final answer. Instead, it breaks down every step of the solution in clear, concise language, so you can follow along and learn ...

  14. GPTZero

    Our AI detection model contains 7 components that process text to determine if it was written by AI. We utilize a multi-step approach that aims to produce predictions that reach maximum accuracy, with the least false positives. Our model specializes in detecting content from Chat GPT, GPT 3, GPT 4, Bard, and LLaMa models.

  15. MathGPT

    MathGPT. MathGPT Vision. MathGPT can solve word problems, write explanations, and provide quick responses. Drag & drop an image file here, or click to select an image. MathGPT is an AI-powered math problem solver, integral calculator, derivative calculator, polynomial calculator, and more! Try it out now and solve your math homework!

  16. How to use ChatGPT in your studies

    Input. Please write a program in Python that calculates the sum of all numbers from 1 to N, where N is entered by the user. ChatGPT result: Here is one way to write a program in Python to calculate the sum of all numbers from 1 to N: n = int (input ("Enter a positive integer: ")) sum = 0. for i in range (1, n+1):

  17. ChatGPT Homework

    Academics worry about students using AI tools to write their homework. ChatGPT can do things that previously required human judgment and analysis, like ...

  18. PDF Homework Help? ChatGPT is Poised to Disrupt Education

    version of the article, "Homework Help? ChatGPT and other AI tools are poised to disrupt education," appears in the April 8, 2023 issue of Science News — but the print version gives away the ...

  19. Can ChatGPT do my homework?

    It is quite obvious that Chat GPT can do your homework and it can do it pretty well. The issue with using ChatGPT is that it is technically academic dishonesty. Using the model's answers and passing them off as your own goes against the academic integrity rules set by your school or university.

  20. How teachers started using ChatGPT to grade assignments

    A new tool called Writable, which uses ChatGPT to help grade student writing assignments, is being offered widely to teachers in grades 3-12. Why it matters: Teachers have quietly used ChatGPT to grade papers since it first came out, but now schools are sanctioning and encouraging its use. Driving the news: Writable, which is billed as a time-saving tool for teachers, was purchased last ...

  21. Introducing ChatGPT

    In the following sample, ChatGPT asks the clarifying questions to debug code. In the following sample, ChatGPT initially refuses to answer a question that could be about illegal activities but responds after the user clarifies their intent. In the following sample, ChatGPT is able to understand the reference ("it") to the subject of the previous question ("fermat's little theorem").

  22. ChatGPT

    A homework helper providing solutions and explanations. ChatGPT Sign up Sign up

  23. How can I tell when my students use Chat GPT or other ai writers

    People are using it constantly. They will be competing against people that are using AI. So my advice to you as a teacher would be to brush up on AI, engage with it, engage your students with it, and look for other means of assessment. Next year I will have one exam where I will let them use ChatGPT in class.

  24. Free Grammar Checker

    Use QuillBot's free online grammar checker tool to perfect your writing by reviewing your text for grammar, spelling, and punctuation errors. Whenever you need to review your writing or grammar check sentences, QuillBot is here to help make the editing process painless. QuillBot's free online sentence corrector helps you avoid mistakes and ...

  25. ChatGPT: A GPT-4 Turbo Upgrade and Everything Else to Know

    The newest variation, GPT-4 Turbo, arrived in late 2023 with more up-to-date responses and an ability to ingest and output larger blocks of text. ChatGPT is growing beyond its language roots.

  26. OpenAI's GPT-4 Can Autonomously Exploit 87% of One-Day ...

    The GPT-4 large language model from OpenAI can exploit real-world vulnerabilities without human intervention, a new study by University of Illinois Urbana-Champaign researchers has found. Other ...