0

I am using ArcMap 10.6.

In an attribute field, I have 1000s of records.

I am wondering if there is a way to use the field calculator to get this desired result:

Species | Label
A | 1
A | 1
D | 2
D | 2
F | 3
F | 3

I could use an if-elif statement, but I have 100s of unique values and would like to assign a number to them.

Kadir Şahbaz
  • 76,800
  • 56
  • 247
  • 389

4 Answers4

5

You could use the summary statistic tool to create a temporary table to aggregate by species name. Then add a new sequential number field and populate that with field calculator. Finally join the table to the original dataset, either leave it as a temporary join or pass in the field to make it permanent (see join field tool).

Hornbydd
  • 43,380
  • 5
  • 41
  • 81
5

If you're talking about ArcMap, you could try something like this in the field calculator's Python code block (untested):

lookup = dict()

def get_id(value): global lookup try: uid = lookup[value] except KeyError: uid = lookup[value] = len(lookup) return uid

get_id(!some_field!)

Edit: annotated to explain the code:

# create a dictionary with nothing in it
lookup = dict()

define the function that will be called for every row in the table

def get_id(value): # tell the function to look for the dictionary defined outside of the function itself. # we can't define it inside of the function, otherwise it would get emptied on every row. global lookup # attempt to look up the ID associated with this value, store it in the uid variable. # it will only be present if we've already seen this value, otherwise we'll get a KeyError. try: uid = lookup[value] # in the case that we haven't yet seen the value, then set the ID to the current length # of the dictionary, which is the number of unique values we've seen so far. # also store it in the dictionary for later. except KeyError: uid = lookup[value] = len(lookup) # give the ID back to whatever called the function return uid

Edit 2: A lot of similar approaches are being floated, which are all great! But what's actually fastest? Let's see. I've added a bonus one here as well, that hasn't bee proposed yet.

import timeit
import random

random.seed(42)

field_values = [chr(random.randint(32, 127)) for _ in range(1010*6)]

def test_eafp(): lookup = dict()

def get_id(value):
    try:
        uid = lookup[value]
    except KeyError:
        uid = lookup[value] = len(lookup)
    return uid

result = [get_id(v) for v in field_values]


def test_lbyl(): lookup = dict()

def get_id(value):
    if value in lookup:
        uid = lookup[value]
    else:
        uid = lookup[value] = len(lookup)
    return uid

result = [get_id(v) for v in field_values]


def test_get_method(): lookup = dict()

def get_id(value):
    uid = lookup[value] = lookup.get(value, len(lookup))
    return uid

result = [get_id(v) for v in field_values]


def test_missing_overload(): class LookupDict(dict): def missing(self, key): value = self[key] = len(self) return value

lookup = LookupDict()
result = [lookup[v] for v in field_values]


if name == 'main': print 'Timings with {} random characters (best of 10)'.format(len(field_values)) timer = timeit.Timer(test_eafp) print 'EAFP', min(timer.repeat(repeat=10, number=1)), 'seconds' timer = timeit.Timer(test_lbyl) print 'LBYL', min(timer.repeat(repeat=10, number=1)), 'seconds' timer = timeit.Timer(test_get_method) print 'dict.get()', min(timer.repeat(repeat=10, number=1)), 'seconds' timer = timeit.Timer(test_missing_overload) print 'overload missing', min(timer.repeat(repeat=10, number=1)), 'seconds'

Timings with 10000000 random characters (best of 10)
EAFP 0.762635946274 seconds
LBYL 0.871690034866 seconds
dict.get() 1.51433086395 seconds
overload __missing__ 0.721973896027 seconds
mikewatt
  • 5,083
  • 9
  • 21
  • Omg I have no clue what this means, but it worked! Hell ya! Thank you! If you don't mind me asking (for my own understanding), can you explain how this works? How do the numbers get added if there is no counter? – Ayayayayaoh Feb 17 '21 at 23:29
  • Nice, glad it worked for ya. Feel free to mark as solved if that did the trick! I just edited the post with an explanation-- it's using the dictionary length as opposed to incrementing a variable. The length will correspond to the number of unique field values we've assigned as keys up to that point. – mikewatt Feb 17 '21 at 23:35
  • 1
    That's a nice approach, I've used similar and know it works well; using field calculator shortcuts the I/O code eliminating the need for an update cursor and python window. Are you sure the crash 'n burn (try/except) approach is the best? I've always used if value in lookup: get the value else: assign new value; I'd also sneak in a value=value.lower() on the 2nd line, I've been bitten by 'String' != 'string' too many times before. – Michael Stimson Feb 18 '21 at 00:18
  • 1
    @MichaelStimson, best how? EAFP is generally encouraged in Python over LBYL so I tend to stick with that. There's no danger to a try/except when you're specific with what you're catching and limit the try block to its minimum. There's also nothing wrong with the LBYL approach you propose, though, just preference.

    Good point re: the case-sensitive comparison, @Ayayayayaoh might want to think about changing that. I might leave the answer as-is though, since the function should work for any hashable field type, not just strings.

    – mikewatt Feb 18 '21 at 01:08
  • Good point @mikewatt, that is the python way. I come from a background of not having try/catch (try/except) to fall back on so I'm guess I'm just holding onto old prejudices and going overboard anticipating the likely failures.It is even possible a try/except is faster than the 'in' operator which is notably slow to resolve a boolean from exceptionally large lists or dictionaries. – Michael Stimson Feb 18 '21 at 07:09
  • 1
    I've never heard of these phrases EAFP and LBYL, but I guess I'm LBYL, although I do use try/excepts but probably not as best practise. Whilst this approach is very slick indeed it is using that style of coding I certainly find confusing when commands are chained together on a single line. For that reason alone I would have gone down the road that @MichaelStimson uses, but its good to see alternative approaches. Thanks. – Hornbydd Feb 18 '21 at 10:35
  • 2
    So many ways in python :) uid = lookup[value] = lookup.get(value, len(lookup)) – user2856 Feb 20 '21 at 02:04
  • @user2856 indeed, nice one-liner :p Downside to that one is that it's doing both a lookup and an assignment for every row, regardless of whether or not the value is already present. In practical terms that won't matter, but it will be technically the slowest approach out of any listed so far. I've added timings for all of these to the answer. – mikewatt Feb 22 '21 at 17:37
3

LBYL approach:

lookup = dict()
def get_id(value):
 uid = lookup.get(value,0)
 if uid ==0: uid = lookup[value] = len(lookup)+1
 return uid

But result is not sorted, unlike @Hornbydd one.

Update based on user @2856 comment to post below:

lookup = dict()
def get_id(value):
 uid = lookup[value] = lookup.get(value,len(lookup)+1)
 return uid
FelixIP
  • 22,922
  • 3
  • 29
  • 61
0

You can achieve this using a virtual layer.

In a single step, it would identify the distinct species, create a unique ID for each species and join back this ID to the original layer.

Go to the menu Layer > Add Layer > Add/Edit Virtual Layer... and enter the following query. Replace the layer name for yours (a).

WITH labels as 
  (SELECT a.species, row_number() over(order by species) as label
  FROM a
  GROUP BY a.species)
SELECT a.*, labels.label
FROM a
JOIN labels ON a.species = labels.species
JGH
  • 41,794
  • 3
  • 43
  • 89