37

I tried some of the math exercises we will give to students, and ChatGPT does really well answering them. It excels at proofs, often giving details that were not in our example solution, and it makes some mistakes when it needs to do real calculations, but often mistakes that could plausibly be a student's.

Now ChatGPT is here, at least some students will try it, and we have to deal with it. I see no good way to reliably detect it (most automatic methods have a lot of false positives), and I don't want to be unfair to anyone who simply hands in a detailed solution. Still, it looks like it will not be possible to give exercises the way one did all the years before, as it takes two minutes to get a ChatGPT answer, remove the obvious mistakes, and change the style a bit, thereby solving an exercise without actually understanding it.

We already have the rule that students have to present two solutions per semester, to prevent them from copying from each other, but this won't prevent them from using ChatGPT without understanding the solution for all the exercises they do not have to present.

We considered different ways to change the exercises or exercise groups, but most mean a lot of work that would have to be done in a rather short timeframe. The best approach may be to prepare mandatory in-person tests for the exercise groups, but this would mean a lot of work to create new, fair tests, as we only have the exercises and their example solutions prepared.

What other ways are there to deal with people using ChatGPT?
I would not even want to prevent students from using it to understand how to solve the exercises, but in the end they should have understood the solution and possibly found the errors produced by ChatGPT.


While I think this is a good place to collect general advice, in my specific case I am talking about a math lecture that will have 100-200 students, so more personalized forms of exercises are out of the question.

We also do not have the resources to make major changes to the existing exercises or to organize a whole new exercise group structure before the semester starts in two weeks. One thing we have considered is giving in-class tests, but creating one fair test per week may also be more work than we have the resources for.

A few more details about our organisation in the last few years:

  • We give exercise sheets to be done in small groups of 2-3 students.
  • The exercises come from an existing pool of exercises, which is updated whenever someone has an idea for a good exercise.
  • The tutors get the solutions, mark them and give a few hints on what the mistakes were.
  • Each week there is an exercise group where students can present their solutions and the tutor can explain things that are still unclear.
  • Students have to get 50% of all points and present a solution 2 times, so we're sure that everyone in the group has worked on the solutions.
  • The exercises do not contribute to the course grade, but are necessary to be admitted to the exam.
  • We tend to be lenient if a few points are missing at the end but the group has worked on the exercises until the last sheet. The purpose of the exercises is not to weed out students, but to ensure that they have a good chance of passing the exam.
allo
    Comments have been moved to chat; please do not continue the discussion here. Before posting a comment below this one, please review the purposes of comments. Comments that do not request clarification or suggest improvements usually belong as an answer, on [meta], or in [chat]. Comments continuing discussion may be removed. – Xander Henderson Mar 23 '23 at 13:33
    The SE software is throwing a lot of flags here about there being too many comments. I have moved the comments below this question into chat using the automatic systems. However, because it seems that GPT and generative AI are of broader interest, I have also created a ChatGPT in Mathematics Education chatroom where we might be able to continue the discussion. – Xander Henderson Mar 23 '23 at 13:35
    Just linking this to https://matheducators.stackexchange.com/q/6/112 since I don't think ChatGPT brings anything new to this. – Andrew Stacey Mar 26 '23 at 09:53

14 Answers

83

I wouldn't bother at all. Just base your grades 90% on proctored exams where using the internet is prohibited. Those who cheat on homework and quizzes will fail miserably, and that will be their problem. Our task as teachers is to provide an opportunity to learn to everybody who is willing to learn and is capable of learning, and I usually go far out of my way and spend a lot of extra time to help such students. But we are neither policemen nor babysitters, so if somebody really wants to remain ignorant and get kicked out in the end, just let him or her do it and consider it society's problem, not yours.

fedja
    Sometime students cheat not out of laziness, but from pressure. So they qualify as both "willing to learn" and "fail miserably". – Shawn V. Wilson Mar 22 '23 at 22:12
    @ShawnV.Wilson Heed what Xander said and join the chat if you want to discuss the issue further! Since you posted here, I'll still answer once. The right way then is not to cheat, but to inform the teacher that one is overheating. Also, I always make it easy to have a few "bad days" in my grading scheme (counting the 7 best quiz scores out of 10, the 2 best midterms out of 3, especially when I use my "cold shower" approach, extra non-obligatory projects for credit). But feeling constant pressure merely means that one has miscalculated one's abilities and calls for dropping some of the courses. – fedja Mar 22 '23 at 22:33
33

I don't think ChatGPT fundamentally changes the ability to use unauthorized aids. Many other sources of help have existed for a long time (other students, copying, Chegg, paid tutors, Google, books, Mathematics Stack Exchange (MSE), Reddit, Physics Forums, etc.).

My advice is to change your view on homework and assign it as drill work, but don't grade it (even for completion). Increase the frequency of in-class tests instead (proctored and without electronic aids). Do not resent the loss of lecture time: "flipping" is motivated by the inferiority of lectures, even with great lecturers, versus practice, and tests are medium-stakes practice.

I would also recommend assigning problems that really are drill, taken from the book (which should not be too hard), and that resemble the problems they will see on tests. Not "projects", not super hard.

There's a lot of research showing that neophytes are already challenged just by a new topic and benefit from drill. The desire for more complex homework is driven by professors/researchers who find it more interesting, but who don't really have a good feel for the dynamics of gradual training (they are essentially better at math than at pedagogy). In addition, overly complex problems (if collected/graded) are also more likely to lead to cheating.

PS: Many previous questions on this site have addressed issues of cheating on outside class assignments.

guest troll
    My only comment here is that grading homework is often what is necessary to get honest but inexperienced students to do the work. See, e.g., Inclusive Teaching, Hogan & Sathy. – Opal E Mar 23 '23 at 17:16
    "essentially are better at math than at pedagogy." Wow, this was my high school chemistry teacher to a T. The guy clearly knew the material, but was really awful at actually effectively teaching students so very few of us got anything out of his class. – Mason Wheeler Mar 23 '23 at 20:42
  • "I don't think ChatGPT fundamentally changes the ability to use unauthorized aids." When I took my first postgrad math class, I struggled so hard with the first assignment that the professor asked me to talk to him after class. When I met with him, he said that he was fine with me looking up the solutions to problems and copying them! - as long as I paraphrased them myself to make sure I understood the proofs and clearly cited them. I had to do that for a few more assignments with him, but I truly did understand the material better for having looked the proofs up and digested them. – Kevin Apr 18 '23 at 19:47
20

Here are some suggested changes for dealing with ChatGPT, in no particular order. Note that these are based on our discussions in (mathematics and natural sciences) teacher education, so they are not fully adjusted for pure/applied mathematics; use your own judgment.

  1. Present a wrong proof or calculation produced by ChatGPT and ask them to find, explain, and fix the mistakes. (In general it is a good idea to vary the kinds of exercises you use: not only "prove this" and "calculate that", but also other types.) Or use peer evaluation as an exercise.
  2. Have an exercise that requires the use of both pictures and text, or where the answer is supposed to be a picture. Draw a comic about an epsilon-delta proof, or explain what kind of argument a geometric drawing might capture the idea of. This is easier in analysis, geometry or graph theory, but visualizations are often useful elsewhere, too.
  3. Have longer exercises in several steps, where students deliver something, get feedback, adjust accordingly, and deliver again.
  4. In university mathematics it is probably fairly hard to set writing exercises in which students reflect on their own experiences, but if you manage it, ChatGPT is of limited help. Self-reflection and similar skills are useful in mathematics, too.
  5. Ask students to solve a problem with several methods. For example, if it is computing an integral, design an integral that can be evaluated by substitution, by integration by parts, and by trigonometric substitution. (Contributed by Mahdi Majidi-Zolbanin.)
  6. Give students a solution and maybe a method, and ask them to create an exercise that can be solved using the given method and that has the given solution. Maybe combine with number one.
Tommi
    Upvoted for interesting and helpful thoughts, but sadly no solution to my question. The first idea is interesting, but for general teaching and not for this course. The other advice is not feasible for tutors who have to grade 20 exercise sheets and hold a 2-hour exercise group. – allo Mar 21 '23 at 17:40
  • @allo You might take up the issues with whoever is responsible for the course, and ultimately find people with interest in didactics and pedagogy and consider whether the teaching as a whole can be organized in a different manner. In any case, good luck. – Tommi Mar 22 '23 at 08:00
    As for point 2: GPT-4 already allows for images as input. I guess it's also just a matter of time before it can also produce images as output – Ivo Mar 22 '23 at 09:47
    @MahdiMajidi-Zolbanin: You can ask ChatGPT to solve a problem using a different method (e.g. "Now please solve it using integration by parts.") and it will usually generate a plausible-looking answer. It might not be a correct answer (because ChatGPT's answers rarely are, unless a correct answer for that specific problem just happens to be in the model's training data), but it will likely look like a plausible attempt to answer the problem using the requested method, because that's what AI models like GPT are trained to do: generate text that looks plausibly human-written in a given context. – Ilmari Karonen Mar 22 '23 at 12:47
    @IlmariKaronen Ok, how about a problem like the following? Can ChatGPT solve it? Assume that $\int\frac{f(x)}{1+x}dx=e^{\sin^2(x)}$. Evaluate $\int_0^1\frac{2xf(x^2)}{1+x^2}dx$. – Mahdi Majidi-Zolbanin Mar 22 '23 at 15:17
  • @MahdiMajidi-Zolbanin e-1? – eps Mar 22 '23 at 16:50
  • @eps $e^{\sin^2(1)}-1$. – Mahdi Majidi-Zolbanin Mar 22 '23 at 16:54
    @MahdiMajidi-Zolbanin the interesting thing is it got all the way to the end (came up with $[e^{\sin^2(x^2)}]_0^1$) and made a simple error on the final evaluation over 0 to 1. It did correct itself when I challenged its answer. – eps Mar 22 '23 at 17:12
    @MahdiMajidi-Zolbanin here: we'll first use the substitution u = x^2 in the integrand:

    ∫(2x * f(x^2))/(1 + x^2) dx = ∫f(u)/(1+u) du

    Next, we'll use the given assumption to evaluate the integral:

    ∫f(u)/(1+u) du = e^(sin^2(u)) + C

    Substituting back x^2 for u, we have:

    ∫(2x * f(x^2))/(1 + x^2) dx = e^(sin^2(x^2)) + C

    Now, we can evaluate the definite integral:

    ∫_0^1 (2x * f(x^2))/(1 + x^2) dx = [e^(sin^2(x^2))]_0^1

    Using the substitution v = sin^2(x^2), we can simplify the evaluation of the integral:

    [e^(sin^2(x^2))]_0^1 = [e^v]_0^1 = e - 1

    – eps Mar 22 '23 at 17:21
  • @MahdiMajidi-Zolbanin Yes, it is that powerful. – Volker Siegel Mar 22 '23 at 23:09
12

ChatGPT is stupid.

This is not an opinion, but observed behaviour. I asked ChatGPT to calculate a number with 36 divisors and this went wrong, so I decided to teach the program how to perform that calculation. While trying to do that, I saw that ChatGPT has a wrong "algorithm" for calculating the number of divisors of a number. While trying to resolve that issue, I asked ChatGPT to calculate $2^5 \cdot 3^5$, to which it answered $243 \cdot 243$. You read that correctly: it can't even calculate $2^5$!

Hereby a screenshot of the blunder:

[Screenshot of ChatGPT's answer; translation below.]

Translated: A smaller number with 36 divisors can be created, using 2 and 3 as prime numbers, like $2^5 * 3^5 = 243 * 243 = 59049$. This has indeed 36 divisors, $(6)(6)=36$ :-)

You can use this as a starting point for an exercise and let your students fall into the pit, so that they are disgusted enough by ChatGPT's stupidity to decide for themselves not to use it anymore.
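For reference, here is a short Python sketch (my own, not part of the original answer) verifying the numbers ChatGPT got wrong: $2^5 \cdot 3^5 = 7776$, which indeed has $(5+1)(5+1) = 36$ divisors, while ChatGPT's $243 \cdot 243 = 59049 = 3^{10}$ has only 11.

```python
def num_divisors(n: int) -> int:
    """Count divisors via prime factorization: for n = p1^a1 * ... * pk^ak,
    the divisor count is (a1 + 1) * ... * (ak + 1)."""
    count = 1
    d = 2
    while d * d <= n:
        exp = 0
        while n % d == 0:
            n //= d
            exp += 1
        count *= exp + 1
        d += 1
    if n > 1:  # a prime factor larger than sqrt(original n) remains
        count *= 2
    return count

print(2**5 * 3**5)                # 7776 -- not 243 * 243 = 59049
print(num_divisors(2**5 * 3**5))  # 36
print(num_divisors(243 * 243))    # 11 -- 59049 = 3^10 has 11 divisors, not 36
```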

PS: I have just asked this bogus question:

Why are there no numbers, larger than one million, which you can divide by two, three, four, five, six, seven and eight?

This is nonsense, because every multiple of 840 is such a number.
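A quick Python check of that claim (my own sketch): $\operatorname{lcm}(2,3,4,5,6,7,8) = 840$, and any multiple of 840 above one million, e.g. $840 \cdot 1200 = 1008000$, is divisible by all of 2 through 8.

```python
import math

# lcm(2..8) = 840, so every multiple of 840 is divisible by 2 through 8;
# infinitely many such multiples exceed one million.
lcm = math.lcm(2, 3, 4, 5, 6, 7, 8)  # math.lcm takes multiple args (Python 3.9+)
print(lcm)  # 840

candidate = 840 * 1200  # 1008000, larger than one million
print(all(candidate % k == 0 for k in range(2, 9)))  # True
```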

Enjoy ChatGPT's "answer":

This is actually a well-known problem in number theory called the "Harmonic Division Problem" or the "Frobenius Coin Problem". It asks to find the largest integer, called the "Frobenius number", that cannot be expressed as a non-negative integer combination of given integers. In this case, the given integers are 2, 3, 4, 5, 6, 7, and 8.
To solve this problem, we can use the fact that the Frobenius number for two numbers a and b is given by (a-1)(b-1) - 1. Therefore, the Frobenius number for the set {2, 3, 4, 5, 6, 7, 8} is (2-1)(3-1)(4-1)(5-1)(6-1)(7-1)(8-1) - 1 = 2015. So, any integer greater than 1,000,000 can be expressed as a non-negative integer combination of the numbers {2, 3, 4, 5, 6, 7, 8}. This means that there are no numbers larger than 1,000,000 which cannot be divided by 2, 3, 4, 5, 6, 7, and 8. Note that this solution assumes that we are looking for non-negative integer combinations only. If we allow negative integers, then the problem becomes more complicated and the answer may be different.

Dominique
    @XanderHenderson: an interesting discussion on the subject can be found here: https://meta.stackoverflow.com/questions/421831/temporary-policy-chatgpt-is-banned – Dominique Mar 23 '23 at 13:57
    This answer is off topic. The question is about exercises for students who use ChatGPT, not about whether it is possible to trick ChatGPT into a stupid answer. – stenci Mar 24 '23 at 21:39
7

I think that approaches like "I don't care if the students cheat on exercises; they will learn their lesson at the exam." don't work well, because dealing with a large number of students doing very badly at an exam is not nice. Either you fail a lot of them, get political pressure for being too harsh, and cause problems for others (students repeating the semester, students dropping out so that follow-up courses have very few students), or you let them pass, which means that a lot of students with little knowledge and experience enter the next courses.

In a typical German lecture with 5-10 exercise groups you could try the following:

  • You split your exercise sheet in half. One half is homework like before, the other half is done in person in a lecture hall.
  • For the in-person part, you gather all students (weekly) in a lecture hall for 2-3 hours. The 5-10 exercise group tutors can take turns supervising this, so it is not much work for anyone.
  • You don't allow phones or talking, but you don't put much effort in enforcing this.

This way you put more pressure on the students to actually try the exercises themselves, without causing too much work and trouble for anybody.

J Fabian Meier
    Some good thoughts, but eek @ "You don't allow X, but you don't put much effort in enforcing this." IME at the places I've taught this is a complete non-starter. Either X is enforced (and enforceable) or rampant and public violations will be occurring all the time. – Daniel R. Collins Mar 22 '23 at 14:36
    @DanielR.Collins I never had many problems with cheating when teaching maths students (different with other students). – J Fabian Meier Mar 22 '23 at 16:37
  • I believe you (and agree with lots of what you said in comments above), but different institutions will have very different experiences about that. – Daniel R. Collins Mar 22 '23 at 17:11
    "dealing with large number of students doing very badly at an exam is not nice" is true; a mitigation measure would be to have an in-person test on the first homework stuff early on (after, say, the first two weeks), give the ones who fail one more chance and eliminate all that fail that one as well. Obviously, announce that in the beginning, and the reason why. Failing early is much better than failing at the end. – Peter - Reinstate Monica Mar 22 '23 at 22:52
7

The Chronicle of Higher Education recently held a discussion on ChatGPT and other generative AI tools. While they were interested in the impact of these tools on academia in general, I think that the discussion is relevant to this site as well. My impression was that there were a few important takeaways:

  1. These tools are not going away. Trying to ban their use is unlikely to be productive, and we likely need to find ways to teach and learn which incorporate the existence of generative AI. In my mind, this is no different from the way in which calculators have been adopted in mathematics (when I was in school, they were seen as verboten aids for cheating), or spellchecking and grammar checking tools have been adopted in composition classes (again, when I was in school, using spellcheck was controversial, though maybe not seen as outright cheating).

    It is incumbent upon us to learn what these tools are capable of, and how they may or may not be used in our classes.

  2. These tools are getting better. I have seen many discussions (including answers to this very question) about how to write better questions which will trip up or trick GPT. This is a fool's errand. It might work in the short term, but the tools are continuing to get better, and whatever prompts are tricky today are likely to be old hat in a year or two. Trying to stay ahead of the generative AIs is not a long-term, sustainable solution.

  3. Communication is important. Students need to be told in very clear, certain terms, what an instructor's policy on generative AI is, why that policy exists, and what sanctions exist if students violate that policy. As is noted in the question, reliably detecting generative AI is hard, but being clear about expectations means that students know what risks they are running, and hopefully understand the rationale.

  4. Seek more authentic, process-based assessments. To me, this was the big one. It is more labor intensive, and requires more face-to-face time with students, but is likely the only thing which is going to ensure student learning in the long run. Instead of having students turn in a pile of completed work, have them turn in drafts, then require that they discuss those drafts with you (I already do this in most of my classes). Or, as is noted in the question itself, have them present work in class—if a student doesn't understand what they are presenting, it becomes clear pretty quickly.

    In any case, the goal is to have assessment focus on how well students can demonstrate understanding of a process. This could mean grading iterative work over time, or grading work which is done in a setting where it is easier to monitor the tools that are being used.

A recording of the event can be found via Chronicle of Higher Ed's event page. The Chronicle has posted other resources in a Google document. Finally, the following is taken from a follow-up email sent by the Chronicle after the event:

Writing and resources on ChatGPT

Xander Henderson
  • This outline takes the words right out of my mouth: 1. True, chatGPT is only as effective as its user prompts and currently merely emulates politicians (quoting user Bram28 from math.SE: "they talk a lot, and what they say sounds good enough for many people to trust and believe, but a little bit of analysis shows that they are just doing word association without any logic"). But it is by design that its current iteration is dumb at technical reasoning (i.e., has no maths-aware component or link). – ryang Mar 24 '23 at 09:32
  • Communication and Education truly are key (this aligns with Dikran's and fedja's Answers on this page): clearly thinking through the issues, having clear and comprehensive policy statements, and adopting a progressive attitude by embracing chatGPT while also being cognizant of overreliance on it (social media has already made our thinking collectively shallower, and chatGPT will only accelerate this atrophy). – ryang Mar 24 '23 at 09:44
  • In particular, chatGPT interferes with the development/honing of writing skills and, consequently, the type of deep broad analytical thinking that comes with longer-form writing (crystallising, refining, organising, revising and distilling opinions/ideas, catching inconsistencies, discovering conclusions and even altering one's own position). Writer Flannery O'Connor on writing as a discovery process: "I write because I don't know what I think until I read what I say." – ryang Mar 24 '23 at 09:44
    @ryang it now has an integration into WolframAlpha solving your request for being math aware: https://writings.stephenwolfram.com/2023/03/chatgpt-gets-its-wolfram-superpowers/ – JonathanReez Mar 24 '23 at 14:08
    @ryang the most likely outcome is that the top 10% of workers will become super productive and the rest will be fired and forced to do jobs that AI cannot such as construction or childcare or plumbing. You are entirely right that students who fail to develop critical skills will not get cushy office jobs but those critical skills now include “how to use LLMs”. There’s a balance to be had there. – JonathanReez Mar 24 '23 at 14:10