1

Target = price.

Suppose I have license plates in the form {letter}-{number}. For example, 'A-12345', or 'K-343'

Letter can be any letter from A to Z, and numbers from 2-5 digits long.

Here are some of the features I am extracting, but I would like some more ideas.

  1. It's common knowledge that plates with more digits tend to be more expensive, So # digits is an obvious feature.

  2. Plates with a specific pattern tend to be more expensive, '12345' or '44544' , will be more expensive than '19428'. I am not sure how to capture this. Number of unique digits divided by the number of digits, will probably capture the latter pattern but not the former.

I need more ideas basically.

  • I wonder whether the entropy of the pattern might be a useful surrogate for its human interest, with "interesting" patterns exhibiting extremely large or extremely small entropies. This wouldn't account for the interest of having digits in order, but you could handle that similarly by coding the order as a sequence of signs of the differences and compute its entropy. – whuber Dec 18 '21 at 22:14

0 Answers0