I have a list of filenames that I need to sort based on a section within each string. However, it only works if I make the file extension part of the entries in my sorting list. I want this to work whether the file is a .jpg or a .png, so I am trying to split on both the '_' and the '.' characters.

sorting = ['FRONT', 'BACK', 'LEFT', 'RIGHT', 'INGREDIENTS', 'INSTRUCTIONS', 'INFO', 'NUTRITION', 'PRODUCT']

filelist = ['3006345_2234661_ENG_PRODUCT.jpg', '3006345_2234661_ENG_FRONT.jpg', '3006345_2234661_ENG_LEFT.jpg', '3006345_2234661_ENG_RIGHT.jpg', '3006345_2234661_ENG_BACK.jpg', '3006345_2234661_ENG_INGREDIENTS.jpg', '3006345_2234661_ENG_NUTRITION.jpg', '3006345_2234661_ENG_INSTRUCTIONS.jpg', '3006345_2234661_ENG_INFO.jpg']

sort = sorted(filelist, key = lambda x : sorting.index(x.re.split('_|.')[3]))

print(sort)

This returns the error "AttributeError: 'str' object has no attribute 're'"

What do I need to do to split on both the _ and . when splitting out my strings for sorting? I only want to use the split for the sorting, not for re-forming the strings.

  • try `x.split` instead of `x.re.split` – malmiteria Feb 14 '20 at 21:49
  • Does this answer your question? [Split string based on a regular expression](https://stackoverflow.com/questions/10974932/split-string-based-on-a-regular-expression) – Michael Bianconi Feb 14 '20 at 21:49
  • That gives an error "IndexError: list index out of range" which I figure means that it isn't making enough splits to get to the [3] index. – Micah Edelblut Feb 14 '20 at 21:50
  • `re.split` takes the regular expression as its first positional argument and the input string as its second, so your syntax should be something like `re.split('_|.', x)[3]`, as mentioned in the answer below. – Guven Degirmenci Feb 14 '20 at 21:54
  • Please share the entire error message. – AMC Feb 14 '20 at 23:33
  • _"AttributeError: 'str' object has no attribute 're'"_ I'm not sure what kind of answer you expect. Can you be more specific about what the issue is? – AMC Feb 14 '20 at 23:33

1 Answer

Here's the fixed code:

import re
sorted_output = sorted(filelist, key=lambda x: sorting.index(re.split(r'_|\.', x)[3]))

re.split() is a function in the re module, not a string method, so you do not call it on the string; the string to split is passed as the second argument. The first argument is the regular expression itself, which you already had in the right position.

Also, you need to escape the . with a \ because the period is a special character in regular expressions: unescaped, it matches any single character (except a newline by default), so the pattern would split at every character of the filename.
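To see why the escape matters, here is a small sketch comparing the two patterns on one of the filenames from the question. The variable names `name`, `unescaped`, and `escaped` are just for illustration.

```python
import re

name = '3006345_2234661_ENG_FRONT.jpg'

# Unescaped: '.' matches every character, so each character is treated
# as a separator and only empty strings are left between them.
unescaped = re.split('_|.', name)

# Escaped: only a literal '_' or a literal '.' acts as a separator.
escaped = re.split(r'_|\.', name)

print(unescaped[3])  # an empty string, which is not in the sorting list
print(escaped)       # ['3006345', '2234661', 'ENG', 'FRONT', 'jpg']
```

With the unescaped pattern, index 3 is an empty string, so the lookup in the sorting list would fail instead of finding 'FRONT'.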

Output:

In [13]: sorted(filelist,key=lambda x: sorting.index(re.split(r'_|\.',x)[3]))                       
Out[13]: 
['3006345_2234661_ENG_FRONT.jpg',
 '3006345_2234661_ENG_BACK.jpg',
 '3006345_2234661_ENG_LEFT.jpg',
 '3006345_2234661_ENG_RIGHT.jpg',
 '3006345_2234661_ENG_INGREDIENTS.jpg',
 '3006345_2234661_ENG_INSTRUCTIONS.jpg',
 '3006345_2234661_ENG_INFO.jpg',
 '3006345_2234661_ENG_NUTRITION.jpg',
 '3006345_2234661_ENG_PRODUCT.jpg']

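Edit: as @Todd mentions in the comments, if you also want files that share the same label to be ordered by the rest of the filename (for example, by the leading numeric IDs), add the full string as a second element of the key. Keys are compared element by element, so the filename only breaks ties; note that the IDs are then compared as strings, not as numbers. Use: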

sorted(filelist,key=lambda x: [sorting.index(re.split(r'_|\.',x)[3]),x])
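As a rough illustration of the tie-breaking key, the sketch below uses a few made-up filenames (two product IDs and mixed .jpg/.png extensions, neither taken from the question's list) and a hypothetical helper named `sort_key`. It uses a tuple instead of a list for the key, which compares the same way.

```python
import re

sorting = ['FRONT', 'BACK', 'LEFT', 'RIGHT', 'INGREDIENTS',
           'INSTRUCTIONS', 'INFO', 'NUTRITION', 'PRODUCT']

# Hypothetical filenames: two product IDs, mixed extensions.
files = ['3006346_2234661_ENG_BACK.png',
         '3006345_2234661_ENG_BACK.jpg',
         '3006346_2234661_ENG_FRONT.jpg',
         '3006345_2234661_ENG_FRONT.png']

def sort_key(name):
    # The fourth piece is the label, whatever the extension is.
    label = re.split(r'_|\.', name)[3]
    # Sort by label position first, then by the full filename.
    return (sorting.index(label), name)

result = sorted(files, key=sort_key)
print(result)
```

Both FRONT files come first, ordered by their leading ID, followed by both BACK files, and the extension plays no part in finding the label.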
mechanical_meat