38

I'm building a world builder out of a SQL database. I'm going to use an implementation of Markov chains to build up lists of thousands of names. I'd like those names to have some consistency, so I plan to use various mixtures of real-world names as the input data for the procedure. The intention is not to re-create names that have existed, but rather to use Markovian logic to let SQL notice phonetic/morphological patterns my English-shackled brain can't.

I intend to use the worlds I create as backstory for D&D campaigns and also a showcase for my development skills, so the fruits of this labor will be public.

For example, the names for culture X are derived from 80% Babylonian and 20% modern Lithuanian. Using a data set of 80 Babylonian names and 20 Lithuanian names will, once fed through the name generator, give me something that is close to historical but hopefully with enough flavor to not sound derivative.
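To make the 80/20 mixing idea concrete, here is a minimal sketch in Python rather than SQL. It is not the code from the project; it assumes a character-level chain of order 2, and the function names (`build_chain`, `generate_name`), the padding markers, and the tiny name lists are all illustrative assumptions.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # padding markers; assumed not to occur in any name


def build_chain(weighted_corpora, order=2):
    """Return {state: {next_char: weight}} from (names, weight) pairs."""
    chain = defaultdict(lambda: defaultdict(float))
    for names, weight in weighted_corpora:
        if not names:
            continue
        # Split the culture's weight evenly over its names so that a longer
        # list does not automatically dominate the mix.
        per_name = weight / len(names)
        for name in names:
            padded = START * order + name.lower() + END
            for i in range(len(padded) - order):
                chain[padded[i:i + order]][padded[i + order]] += per_name
    return chain


def generate_name(chain, order=2, max_len=12, rng=random):
    """Walk the chain from the start state until END or max_len letters."""
    state, letters = START * order, []
    while len(letters) < max_len:
        options = chain[state]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == END:
            break
        letters.append(nxt)
        state = (state + nxt)[-order:]
    return "".join(letters).capitalize()


# Tiny illustrative samples only; a real run needs far longer lists.
babylonian = ["Hammurabi", "Nabonidus", "Nabopolassar", "Belshazzar"]
lithuanian = ["Vytautas", "Gediminas", "Mindaugas", "Jonas"]

chain = build_chain([(babylonian, 0.8), (lithuanian, 0.2)])
rng = random.Random(42)  # seeded so a run is repeatable
names = [generate_name(chain, rng=rng) for _ in range(5)]
print(names)
```

With explicit weights, the two lists do not have to contain exactly 80 and 20 names to get an 80/20 blend. The `order` passed to `generate_name` has to match the one used in `build_chain`.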

Doing this requires large volumes of names sorted by culture. I've been unable to locate such a data source. I'd like something that requires minimal re-formatting, editing, etc.

Sites such as 20000-names.com are helpful, but I'm hoping to avoid the formatting that comes with them.

Update: I've published my results so far on GitHub. For the non-SQL speakers, I hope to have a better interface for this project in the future. For those who can SQL, clone down the repo, run each .sql file, and execute the procedure markov_Complete. I'm happy to accept pull requests, even if it's just improving the readme. Any conversation specific to the database should happen on GitHub, not here.

Seymour Guado
  • 271
  • 1
  • 8
Nate Anderson
  • 511
  • 5
  • 9
  • 3
    While there is precedent for questions seeking resources on the real world being on topic on Worldbuilding, you might also want to visit our sister site History Stack Exchange, as they may be able to help you with this. – user Sep 05 '17 at 17:29
  • 3
    There is also no need to specifically call out your edits in the text of a post. Rather, try to work your edits into the text such that the post reads as a coherent whole, not a collection of edits. The revision history is available for everyone to see if someone wants to know how the post evolved into its current form. – user Sep 05 '17 at 17:30
  • @MichaelKjörling it's a carryover from Stack Overflow, where that practice is common. – Nate Anderson Sep 05 '17 at 17:31
  • I suppose I can see the value of doing it that way on Stack Overflow, which is much faster paced. As questions here typically require some thinking and possibly research before they can be answered, along with a much smaller (but still vibrant) community, our site isn't as fast paced. (The recommendation here is also to wait at least a day before accepting an answer, even if you receive answers early. Just a heads up.) – user Sep 05 '17 at 17:32
  • 10
    Very nice idea. Any intention to make such a database publicly available? – Thorsten S. Sep 05 '17 at 18:36
  • 1
    I'd love to see the code for this once it's up and running... – Jeff Zeitlin Sep 05 '17 at 18:53
  • 12
    While it is great that you're coming to us with your question, I think you might profit more from asking on the Open Data Stack Exchange; that's pretty much what they do :)

    The mentioned site also helped me with a question so similar to yours that I will make it an answer :)

    – dot_Sp0T Sep 05 '17 at 20:48
  • 1
    Having done something similar myself, I think 100 names is definitely not enough. I started looking around the web for interesting sounding names and got some good lists from Game of Thrones, amongst other sources. I'll look around to see if I can locate, and if not I'll post them somewhere online. The other good resource I found was: https://donjon.bin.sh/ – RIanGillis Sep 05 '17 at 21:21
  • While I understand the frustration of the task you've set yourself up for I really feel this question is an invitation to effectively link-only answers, ones that Google will give you if you ask it anyway. – Ash Sep 06 '17 at 10:33
  • 1
    Since you mention Lithuanian as an example, a word of warning. In many eastern European languages, the ending of a person's last name depends on their gender. Lithuanian seems to be no exception; the endings even seem to change depending on marital status ( see ). While somebody unfamiliar with the language will not notice, you may confuse some people if your very manly barbarian is clearly identified as a married woman by his last name. – mlk Sep 06 '17 at 14:05
  • Here are a couple of lists that might be worth looking at. – Luke Sep 06 '17 at 16:35
  • 1
    Have you looked at sites like this? http://www.fakenamegenerator.com/advanced.php - You can set the nationalities just like you say. Unless you truly require the database for other purposes, it might be easier to use someone else's data so you can get on with the creative stuff – NKCampbell Sep 06 '17 at 18:38
  • @NateAnderson The same idea typically applies to StackOverflow as well. – Rob Sep 07 '17 at 01:52
  • @NateAnderson while it is good to wait for a bit before accepting an answer, you eventually should do so. If no answer seems satisfactory it is good practice to add additional detail to your question or comment on promising answers pointing out what irks :) – dot_Sp0T Sep 12 '17 at 14:17
  • I know I'm a little (or a lot) late to the party, but have you considered reducing your names down to their basic syllables and then using neural nets or markov chains to construct names from syllables? – Jakob Lovern Jan 20 '18 at 03:24
  • I did something like this before as part of an RPG I never finished. The Python code for it is here. I found Roman names and Viking names by doing web searches for those (individually). I wouldn't assume there's any single master database of all kinds of ethnic names, though. – workerjoe Sep 06 '18 at 18:33

6 Answers

40

Some time ago (about 2 years) I went looking for a huge list of names. I wanted to use that list to uniquely name objects in my game engine without having to resort to generic UIDs that are hard to tell apart. I found help on the excellent Open Data Stack Exchange.

Long story short, I present you: ftp://ftp.heise.de/pub/ct/listings/0717-182.zip

A zip-file containing about 50k human (first) names, classified by gender and popularity in each country.

dot_Sp0T
  • 12,111
  • 3
  • 54
  • 105
23

A partial answer, combining my comments on the question, plus subsequent finds:

  1. For historical names, the Society for Creative Anachronism has an administrative section, the College of Heralds, who maintain lists of registered “SCA Names”. There are rules for authenticity, and they maintain some references for acceptable names. Check their page on names at the SCA website.

  2. The Academy of St. Gabriel is an organization separate from the SCA, but who have worked closely with the SCA to assist those who seek a higher level of authenticity for their names or heraldry than the SCA requires.

  3. Wikipedia has an entire category of lists of names, both personal and family, for many cultures. Some of the lists there are of specific types of names within a culture, as well.

  4. In addition to the Lists of Names category, Wikipedia has a category Names by Culture. The pages in this category go into a little more detail about the structure and historical context of the names, rather than just being a simple list.

  5. Google, naturally, is your friend. There are innumerable baby name lists out on the web; most will be of currently popular names. You can always try to narrow it down by culture or nationality (e.g., Gujarati baby names, Romany baby names, etc.).

  6. Some countries - and some states in the United States - have restrictions on children’s names. Start with Wikipedia’s page on naming laws, or with this Google search, and if your worldbuilding is based on the culture of a country/state that has restrictions, check the references and any resources they may direct you to to find the list of approved (or disapproved) names.

(This list should by no means be considered either authoritative or exhaustive; as I come across other resources, I will update - and I encourage those with sufficient rep here in Worldbuilders to do the same.)

Jeff Zeitlin
  • 1,428
  • 1
  • 10
  • 14
  • You can mark your answer as community wiki if you like, via an edit. That will make it easier for others to update it, but also means you won't be earning reputation from upvotes. See https://worldbuilding.stackexchange.com/help/privileges/community-wiki and https://worldbuilding.stackexchange.com/help/privileges/edit-community-wiki. – user Sep 05 '17 at 18:15
  • "Names by Culture" - I checked it, and it gives the following (5 in all) "Jewish names": Axel, Haviv, Lévai, Einhorn, Yogev. Out of these, only Yogev and perhaps Haviv are really "Jewish" names. – John Donn Sep 06 '17 at 10:35
  • 4
    @JohnDonn - "This list should by no means be considered either authoritative or exhaustive..." - Wikipedia is never a final authority; at best, one should consider it a good starting place for further research. Wikipedia is also community-editable; consider updating the article in question, and include citations for your edits. – Jeff Zeitlin Sep 06 '17 at 11:28
  • @JohnDonn: What are you basing your claim on? I’m not sure about Axel, but Lévai and Einhorn certainly seem to belong on that list: Lévai is a Hungarianised form of the Hebrew-origin Levy/Levi, and Einhorn, while linguistically entirely German, seems strongly associated with German Jewish families, judging from notable Einhorns. – Peter LeFanu Lumsdaine Sep 07 '17 at 12:45
  • "What are you basing your claim on?" - personal experience. – John Donn Sep 07 '17 at 16:39
  • To strengthen @JohnDonn's warning: I'm a Romanian and reading the Wiki article on "Romanian names" I can attest that it's self-contradictory and misleading. "Spanish/Latin American names have become popular; [...] names like Mario, Antonio, Alberto, Esmeralda, Gianni, Giovanni, Alessia etc are relatively common": no, not among ethnic Romanians living in Romania; and Gi{ova,}nni is not Spanish. "Middle names (second given names) are also fairly common": no, they are impossible; it's common for the one given name to have multiple components. – AlexP Sep 07 '17 at 19:32
  • @AlexP - the warning is - and has been, from the beginning - right there in the answer. :) – Jeff Zeitlin Sep 07 '17 at 19:38
13

I have a different approach for you.

Start with an Excel column of English words that could be names. For example: Rump Cheek.

In the next column, translate that into Lithuanian via this: https://www.labnol.org/internet/google-translate-for-spreadsheets/10086/

I get "Skruosto Skruostas". Which has a ring to it!

The third column is for translating into Babylonian. I used Turkish instead because Turkey is the closest country that uses Roman letters I can read. I got "Yanak Yanak". OK, but no Skruosto.

Randomly choose by percentage which column you will use. Sometimes translation from language 2 into language 3 will not be the same as from language 1 into language 3. All good.
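For anyone doing this in a script instead of a spreadsheet, the "choose a column by percentage" step could look like the sketch below. It uses the one example row from this answer; the column keys, the `pick_name` helper, and the 20/80 split are hypothetical choices for illustration.

```python
import random

# One spreadsheet row, using the example translations from this answer.
row = {
    "english": "Rump Cheek",
    "lithuanian": "Skruosto Skruostas",
    "turkish": "Yanak Yanak",
}

# Hypothetical mix: 20% Lithuanian column, 80% Turkish column.
weights = {"lithuanian": 0.2, "turkish": 0.8}


def pick_name(row, weights, rng=random):
    """Pick one column at random, in proportion to its weight."""
    columns = list(weights)
    column = rng.choices(columns, weights=[weights[c] for c in columns])[0]
    return row[column]


print(pick_name(row, weights))
```

Run it once per row; over many rows the share of names from each column should approach the chosen percentages.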

Downside: these are not names. Probably. Probably you will not run across Skruosto when you visit Lithuania. I bet it would be a fine nickname. Keep it if you like!

Upsides: 1: Very fast to do. 2: Names sound great. 3: if any Lithuanians or Turks ever venture into your world they will soil themselves laughing.

Willk
  • 304,738
  • 59
  • 504
  • 1,237
  • 2
    It doesn't seem like the OP is looking for alternate approaches, so I'm not sure that this answers the question. – HDE 226868 Sep 05 '17 at 20:03
  • 2
    You may be right, @HDE 226868. I am willing to risk downvotes for the chance that the OP gives me that green check for the awesomeness of this approach. If the desired end result is "give me something that is close to historical but hopefully with enough flavor to not sound derivative." this method is a good answer. – Willk Sep 05 '17 at 20:08
  • 8
    Turkey may be geographically close(ish) to Iraq, but that doesn't mean that the Turkish language has much in common with Akkadian. If you're looking for a Semitic language which uses the Latin alphabet, Maltese is the obvious candidate. – Peter Taylor Sep 05 '17 at 21:57
  • 1
    @Peter Taylor - excellent! I will read up on Maltese! – Willk Sep 05 '17 at 23:05
  • 1
    @PeterTaylor, also Assyrians still exist with their own language (which can be represented in latin alphabet) and what not. – user28434 Sep 06 '17 at 08:55
  • @Will, this is a terrific idea. As I'm hoping to populate a huge world, using Markov chains with insufficient inputs will likely make the generated names pretty derivative. Once I exhaust the resources for historical names, or for old languages for which we lack resources, I'll try this approach. – Nate Anderson Sep 07 '17 at 12:47
  • I like this idea. You want names for a fantasy world? You only need a set of words in one language (it need not be English), which you then translate into many languages. But I would think of a program to do this instead of Excel. You can also decide that one of your world's countries has names that mix, say, Babylonian and Zulu words, and another Ancient Greek and Aztec words. Of course, you will have to play with this to get "good" names. – Julian Egner Oct 06 '17 at 07:24
7

It's fairly certain that by "names" you mean people's names. It is quite unclear whether you mean their "full" names, their surnames, their given names, or what.

Names have two purposes, I think: as a form of identification and as a form of address. Even if we assume the typical Western take on names (given and family), there are common issues to deal with: name changes (due to coming of age, marriage, etc.), the use of aliases/diminutives (Maria, Masha, Marusya, and Maria Vasilyevna can all be the same person), and the possibility that a person's name may depend on sex, age, or status (Lord Kelvin = Baron Kelvin = William Thomson), and not only those of the person so named but also the relationship, age, sex, and status of the speaker.

So, in your worldbuilding, perhaps you should also be able to choose among different sets of "rules" as well as the specific character strings to use. The best site I've found is: http://www.top-100-baby-names-search.com/female-chinese-names.html which gives 100/100 (m/f) names for 19 countries. (Of course, names from the USA and European countries are also extensively documented elsewhere.)

Copying and pasting those lists into an MS Excel spreadsheet would take about 30 minutes. The US SSA has .zip files of first names, both national and state-by-state, from ~1915 to the present (2016).

See https://www.ssa.gov/oact/babynames/limits.html

Wikipedia maintains a page, List of people by nationality, which then directs you to various nations' lists.
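If you would rather script the SSA data than paste it, a rough sketch follows. It assumes each line in the national files looks like `Name,F,123` (name, sex, count); check the readme inside the zip before relying on that. The `load_given_names` helper and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def load_given_names(lines, min_count=0):
    """Sum counts per (name, sex) from lines shaped like 'Name,F,123'."""
    totals = Counter()
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        name, sex, count = row
        totals[(name, sex)] += int(count)
    return {key: n for key, n in totals.items() if n >= min_count}


# Hypothetical sample rows in the assumed layout, not real SSA figures.
sample = io.StringIO("Mary,F,100\nJohn,M,90\nMary,F,5\nQuinn,M,3\n")
names = load_given_names(sample, min_count=10)
print(sorted(names))
```

For the real data, each file inside the zip can be opened with the standard `zipfile` module and wrapped in `io.TextIOWrapper` before being passed in as `lines`. The `min_count` filter is one way to drop very rare spellings.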

As far as extinct/historical names, I've no wisdom there.

Secespitus
  • 17,743
  • 9
  • 75
  • 111
  • 1
    I see you are still unregistered. Please read Why should I register my account? to see why you should register. It helps you because you can collect your reputation and get privileges, which allow you to do more on the site. For example you can start voting on questions once you reach 15 reputation and comment once you reach 50 reputation. Also markdown only makes a soft linebreak if you use two spaces before a single linebreak or it makes a paragraph if you use two linebreaks. That makes it easier to read. – Secespitus Sep 05 '17 at 20:43
  • Good point on the rules for names. Indeed, that's one of the first things I decide when I make a culture; those unimaginative two-word names do irk me so. George Lucas and Robert Jordan did some decent worldbuilding, but naming practices were not a strong point for either. – can-ned_food Sep 06 '17 at 07:10
  • Don't forget the issue of John James II, John James III, John James IV.... and similar issues. – CaM Sep 07 '17 at 13:37
1

This won't help with many cultures throughout history, but "pipe rolls" and court records from the Middle Ages are first-hand accounts of the economy, bureaucracy, law and nobility. These are routinely used by historians of every kind - though most online resources are from England - not only to identify names and genealogies, but also the day-to-day lives of historical peoples. You'll find names as (once) common as Piers and Pate and as bizarre as Roger Fuckebythenavele.

Kabob Maraca
  • 654
  • 5
  • 9
1

You want Kate Monk's Onomastikon. Although it hasn't been updated in 15 years or so, it's still a useful resource.

Keith Morrison
  • 21,416
  • 1
  • 38
  • 76