
I have several million records in my MongoDB database. Let's say all my records share some common fields, such as name and surname.

1) I need to search case-insensitively. The common answer is to use a regex, but I think that will decrease my app's performance dramatically. Is this the best way to go?

2) I should be able to get correct results in my language (Turkish). In Turkish, the uppercase of i is İ, and the uppercase of ı is I. I don't get correct results if my string contains those letters, even with a case-insensitive regex search. How can I fix this if there is no database language option in Mongo?

3) Do you think there is another NoSQL solution better suited to my purpose (case-insensitive search over millions of records with different columns and the Turkish character set)?

Thanks

  • Possible duplicate of [How do I make case-insensitive queries on Mongodb?](http://stackoverflow.com/questions/7101703/how-do-i-make-case-insensitive-queries-on-mongodb) – cenouro Oct 20 '15 at 16:52
  • Not exactly. I also have language-specific character issues, so using regex could be a bad decision for me – Aytek Ustundag Oct 20 '15 at 19:01
  • From your question, your problem is with uppercase letters. But if you save the text in lowercase as the answer I have linked suggests, it is very possible it can work. – cenouro Oct 20 '15 at 19:06
  • By the way, Mongo already uses Unicode, so your problem is probably the match pattern rather than Mongo's representation. [From the docs](https://docs.mongodb.org/manual/reference/operator/query/regex/#pcre-vs-javascript), you can even use PCRE. I suggest you try escaping the Turkish-specific letters in your regex pattern. For example, instead of 'İ', use '\u{unicode of İ}'. You will have to test that, and I still advise you to save strings in lowercase or uppercase for searching, because the index can become messy. – cenouro Oct 20 '15 at 19:19

1 Answer


MongoDB has had a text index since 2.6 (https://docs.mongodb.org/manual/reference/operator/query/text/#text-query-operator-behavior). However, the documentation does not specify how it behaves for Turkish, although the language is supported.

For the Latin alphabet, text search is case insensitive for non-diacritics, i.e. case insensitive for [A-Za-z].
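If the text index turns out not to handle İ/ı as needed, the approach suggested in the comments (store a lowercased copy and search on that) can be made Turkish-aware in application code. The following is only a sketch of that idea, not something the MongoDB docs prescribe: `turkish_lower`, `with_search_key`, and the `name_search` field are hypothetical names chosen for illustration, and only the two dotted/dotless pairs are special-cased.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish rules for the dotted/dotless i."""
    # Handle the two Turkish-specific pairs first. Plain str.lower() would
    # map "I" to "i", and "İ" to "i" followed by a combining dot (U+0307).
    return text.replace("İ", "i").replace("I", "ı").lower()


def with_search_key(record: dict, field: str = "name") -> dict:
    """Return a copy of record with a hypothetical '<field>_search' key."""
    enriched = dict(record)
    enriched[field + "_search"] = turkish_lower(record[field])
    return enriched


# Normalize once when writing the record ...
doc = with_search_key({"name": "IŞIK", "surname": "İNCE"})

# ... and normalize the user's input the same way when building the filter.
query = {"name_search": turkish_lower("Işık")}
```

Because the filter is then a plain equality match on `name_search`, an ordinary index on that field can serve it, which avoids the case-insensitive regex from point 1.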

user392486