21

I was wondering if the JSON spec defined a regular language. It seems simple enough, but I'm not sure how to prove it myself.

The reason I ask, is because I was wondering if one could use regular expressions to effectively pars JSON.

Could someone with enough rep please create the tags and for me?

Tsuyoshi Ito
  • 16,466
  • 2
  • 55
  • 106
jjnguy
  • 321
  • 2
  • 6
  • 6
    I removed the tag [json] because it does not seem to be worth a tag on TCS SE. – Tsuyoshi Ito Dec 27 '10 at 20:42
  • @Tsuy, sounds good. Obviously I'm not an avid user of the site, so I'm sure you know better. – jjnguy Dec 27 '10 at 20:46
  • 1
    Remember that regex implementations frequently match more than just regular languages. E.g. you can use lookaheads in most implementations, which will accept $a^nb^n$ correctly, solving the $[^nx]^n$ problem others mentioned below. – Xodarap Jan 12 '11 at 16:58

2 Answers2

32

No, it's not regular. Since it allows arbitrary embedding of balanced delimiters, it must be at least context-free.

For example, consider an array of arrays of arrays:

[ [ [ 1, 2], [2, 3] ] , [ [ 3, 4], [ 4, 5] ] ] 

Clearly you couldn't parse that with true regular expressions.

Marc Hamann
  • 2,904
  • 1
  • 24
  • 22
  • 8
    To obtusely split hairs, the JSON representations of all arrays of arrays of arrays of integers is regular. – Charles Stewart Dec 27 '10 at 23:36
  • 17
    Then keep adding "arrays of" recursively until you are happy. ;-) – Marc Hamann Dec 28 '10 at 00:49
  • 1
    Standard JSON is context-free, but most implementations only support unique keys. I moved my unanswered question from stackoverflow to: http://cstheory.stackexchange.com/questions/4668/which-formal-language-class-are-xml-and-json-with-unique-keys – Jakob Jan 31 '11 at 13:39
  • Note that I said "at least context-free". – Marc Hamann Jan 31 '11 at 15:18
  • Expanding on @CharlesStewart's comment, does this mean that "JSON with a strict max depth IS a regular language"? Or do other features of JSON prevent this? – jchook Nov 24 '19 at 23:04
30

Since $a^n b^n$ is not a regular language, neither is JSON, since $[^n 5 ]^n$ is valid input for any $n$. Likewise, your regular expression parser would have to properly reject any input $[^m 4 ]^n$ where $m \ne n$ which you cannot do with regular expressions.

Hence, JSON is not regular.

RegexFan
  • 316
  • 2
  • 2
  • Curious, what's the superscript/bracket notation used here? – jchook Nov 24 '19 at 23:08
  • 1
    @jchook The superscript notation means 'repeated.' So $a^n$ would be $a$ repeated $n$ times. Similarly, the notation $[^n$ means $[$ repeated $n$ times. The brackets have no special meaning. A regular language cannot 'count' how many brackets are on each side, so they cannot tell if both sides have the same number of brackets. – Nick ODell Dec 26 '19 at 23:19