Using the eksctl tool, I created an EKS cluster with 5 nodes. My application's Docker images are stored in ECR registries in the same region. I deployed my application to the cluster and everything ran without problems for the past 6 weeks or so.

This morning, I came in and found 3 pods were in an ErrImagePull state. Using kubectl describe pod <pod name>, I found the error:

Failed to pull image "<ECR REGISTRY>/<IMAGE>": rpc error: code = Unknown desc = Error response from daemon: Get <ECR REGISTRY>/<IMAGE>: no basic auth credentials

My understanding of EKS and ECR is that I don't need a pull secret, and I haven't used one for any of the other running pods. My guess is that some process or Docker container on that node died, but I can't find any docs on this.

Update: I forgot all about this question. When I created the original node group, I failed to include the --ssh-access flag, which prevented me from getting onto the node to see whether a Kubernetes process had failed. I never found the actual solution; I simply added a taint to the problem node, created a new node, and went about my business. I'm still trying to find time to spin up a new node group with SSH access.

Matthew

1 Answer

Do the IAM roles attached to the EC2 instances in your EKS cluster have ECR IAM policies? If not, please update the IAM roles. Reference: https://docs.aws.amazon.com/AmazonECR/latest/userguide/ECR_on_EKS.html#:~:targetText=The%20Amazon%20EKS%20worker%20node,policy%20permissions%20for%20Amazon%20ECR.&targetText=When%20referencing%20an%20image%20from,tag%20naming%20for%20the%20image.
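
One way to check this is to look for the AWS-managed AmazonEC2ContainerRegistryReadOnly policy on the node's instance role. The following is only a rough sketch: it assumes you have saved the JSON printed by `aws iam list-attached-role-policies --role-name <NODE ROLE NAME> --output json`, and the function name and sample data are made up for illustration. It only looks at attached managed policies, so a role that gets ECR access through an inline or custom policy would show up as missing here even though pulls may work.

```python
import json

# AWS-managed policy that grants read-only (pull) access to ECR.
ECR_READ_ONLY_ARN = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"


def has_ecr_read_policy(cli_output: str) -> bool:
    """Return True if the JSON from `aws iam list-attached-role-policies`
    lists the managed ECR read-only policy (hypothetical helper)."""
    attached = json.loads(cli_output).get("AttachedPolicies", [])
    return any(p.get("PolicyArn") == ECR_READ_ONLY_ARN for p in attached)


# Hypothetical sample of the CLI output for a worker node role.
sample = json.dumps({
    "AttachedPolicies": [
        {"PolicyName": "AmazonEKSWorkerNodePolicy",
         "PolicyArn": "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"},
        {"PolicyName": "AmazonEC2ContainerRegistryReadOnly",
         "PolicyArn": ECR_READ_ONLY_ARN},
    ]
})

print(has_ecr_read_policy(sample))
```

If the check comes back False, attaching the policy with `aws iam attach-role-policy --role-name <NODE ROLE NAME> --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly` is probably the simplest fix. Note that in the question only one node started failing after weeks of working, so a missing policy may not be the whole story there.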