I am trying to run a multi-agent reinforcement learning project and am getting the following error:

    Traceback (most recent call last):
      File "E:\USER\Desktop\TD3p\V2\main.py", line 162, in <module>
        marl_agents.learn(memory, writer, steps_total)
      File "E:\USER\Desktop\TD3p\V2\matd3.py", line 118, in learn
        self.agents[agent_idx].actor_loss.backward()
      File "E:\anaconda3\envs\pytorch\lib\site-packages\torch\_tensor.py", line 307, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "E:\anaconda3\envs\pytorch\lib\site-packages\torch\autograd\__init__.py", line 154, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
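
As far as I understand it, this error appears when backward() traverses a graph whose saved tensors were already freed by an earlier backward() call. A minimal, self-contained example, unrelated to my project, just to show the simplest case I know of:

    import torch

    x = torch.randn(3, requires_grad=True)
    loss = (x ** 2).sum()
    loss.backward()  # first call succeeds and frees the graph's saved tensors
    loss.backward()  # raises the same RuntimeError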

My code is here:

    for agent_idx in range(self.n_agents):
        ...
        # critic loss calculation
        self.agents[agent_idx].critic_loss = F.mse_loss(current_Q1.float(), target_Q.float()) + \
                                             F.mse_loss(current_Q2.float(), target_Q.float())

        # critic optimization
        self.agents[agent_idx].critic.optimizer.zero_grad()
        self.agents[agent_idx].critic_loss.backward()
        self.agents[agent_idx].critic.optimizer.step()

        # delayed policy update (every self.freq steps, as in TD3)
        if steps_total % self.freq == 0 and steps_total > 0:
            # actor loss calculation
            self.agents[agent_idx].actor_loss = -T.mean(self.agents[agent_idx].critic.Q1(states, mu))
            # actor optimization
            self.agents[agent_idx].actor.optimizer.zero_grad()
            self.agents[agent_idx].actor_loss.backward()
            self.agents[agent_idx].actor.optimizer.step()
            self.agents[agent_idx].update_network_parameters()
The error happens at the actor optimization step, on self.agents[agent_idx].actor_loss.backward().

For each agent, I need to call this backward() function, so over the iterations it is certainly called multiple times. However, I think each agent backpropagates independently, so I should not need to set retain_graph=True: the second agent, for example, should not need to access the saved tensors of the first agent's graph.
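
To check my understanding, repeated backward() calls should be fine as long as each call gets its own freshly built graph. A toy sketch of what I mean (w just stands in for one agent's parameters):

    import torch

    w = torch.randn(3, requires_grad=True)  # stands in for one agent's parameters

    for _ in range(3):
        loss = (w ** 2).sum()  # each forward pass builds a fresh graph
        loss.backward()        # no error: every graph is traversed only once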

Secondly, this problem only happens in the actor-loss calculations, even though the critic losses and actor losses follow the same order of execution: calculate the loss, then optimize. After the previous agent finishes its calculation and optimization, the next agent executes the same code. The critics can call backward() multiple times for optimization, but the actors cannot.
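
The only situation I know of that behaves like the actor case is two losses built on top of one shared intermediate tensor, as in this toy sketch (I am only guessing that this might be related, since every agent's actor_loss above is built from the same states and mu tensors):

    import torch

    x = torch.randn(3, requires_grad=True)
    y = x ** 2         # a single intermediate shared by both losses
    loss_a = y.sum()
    loss_b = (y * 3).sum()

    loss_a.backward()  # frees the tensors saved by y's subgraph
    loss_b.backward()  # raises the same RuntimeError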

I have tried detaching the actor_loss before backpropagation. The modified code runs without the error, but the loss graphs look strange: all the critics' losses keep fluctuating, while all the actors' losses keep increasing.

The modified code is here:

    for agent_idx in range(self.n_agents):
        ...
        # critic loss calculation
        self.agents[agent_idx].critic_loss = F.mse_loss(current_Q1.float(), target_Q.float()) + \
                                             F.mse_loss(current_Q2.float(), target_Q.float())

        # critic optimization
        self.agents[agent_idx].critic.optimizer.zero_grad()
        self.agents[agent_idx].critic_loss.backward()
        self.agents[agent_idx].critic.optimizer.step()

        if steps_total % self.freq == 0 and steps_total > 0:
            # actor loss calculation
            # detach() returns a copy with no autograd history;
            # setting requires_grad = True then makes it a brand-new leaf tensor
            self.agents[agent_idx].actor_loss = self.agents[agent_idx].critic.Q1(states, mu).detach()
            self.agents[agent_idx].actor_loss.requires_grad = True
            self.agents[agent_idx].actor_loss = -T.mean(self.agents[agent_idx].actor_loss)
            # actor optimization
            self.agents[agent_idx].actor.optimizer.zero_grad()
            self.agents[agent_idx].actor_loss.backward()
            self.agents[agent_idx].actor.optimizer.step()
            self.agents[agent_idx].update_network_parameters()

I would appreciate it if anyone could give me some hints or help me fix this bug.

  • Welcome to Stack Overflow. Please read [ask] and try to ask a *specific* question - "I would appreciate some hints" [does not qualify](https://meta.stackoverflow.com/questions/284236). It is also helpful to try to reproduce the problem with as little code as possible, but make sure there is enough information that others can reproduce it (ideally, we should be able to copy and paste your code without modification, and see the problem). See [mre] for details. – Karl Knechtel May 22 '22 at 18:12
  • Finally, it is a good idea to show a [complete](https://meta.stackoverflow.com/questions/359146) error message, formatted like code (so that we can see it as it appears in your terminal; some error messages use whitespace to align text in ways that help understand the problem). – Karl Knechtel May 22 '22 at 18:16
