
Using HTML5 (or, less preferably, JavaScript), is it possible to limit the maximum length of an input to a particular number of bytes?

I realise that I can limit it to a number of characters with:

```html
<input type="text" maxlength="4" />
```

But that's not good enough, because I can still enter four two-byte characters in it.
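To make the gap between characters and bytes concrete, here is a quick check. It assumes a runtime that provides the standard `TextEncoder` (which always encodes to UTF-8), and `utf8Bytes` is a hypothetical helper name. Each sample is four characters long, yet the UTF-8 size ranges from 4 to 12 bytes. Characters outside the Basic Multilingual Plane, such as emoji, take 4 bytes each.

```javascript
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
function utf8Bytes(str) {
  return new TextEncoder().encode(str).length;
}

// Each sample is four characters long, so maxlength="4" accepts every one of them.
const samples = ["abcd", "\u00e9".repeat(4), "\u20ac".repeat(4)]; // "abcd", "éééé", "€€€€"
for (const s of samples) {
  // Prints 4 chars for every sample, with 4, 8 and 12 bytes respectively.
  console.log(s, s.length, "chars,", utf8Bytes(s), "bytes");
}
```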

Obviously I am validating this server-side, but I would like this on the browser-side too.

Edit: Just to be clear, I do wish to be able to support UTF-8. Sorry @elclanrs.

meshy
  • Client-side + server-side AJAX? – Alessandro Gabrielli Jun 05 '13 at 21:02
  • Or maybe just limit the characters on `keydown` with something like `/[\w\s]+/`, which will all be single-byte. – elclanrs Jun 05 '13 at 21:05
  • The user can input up to four characters, each of which can be up to four bytes long when UTF-8 is used. I don’t see the point of this question. Why would you limit input to a certain amount of bytes? The data will be processed as *characters* anyway. – Jukka K. Korpela Jun 06 '13 at 03:58
  • @JukkaK.Korpela Thank you for reminding me that non-ASCII chars can be longer than two bytes. The data *will not be processed as characters* on the server. It is an unavoidable limitation that I must use a particular number of bytes. – meshy Jun 06 '13 at 12:56
  • @AlessandroGabrielli I'm not really enamoured with that solution, as I wish to keep traffic to a minimum, but it may come to that. – meshy Jun 06 '13 at 12:57
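elclanrs's "one byte per character" filter can be sketched as follows. The helper names are hypothetical. One caveat is worth checking first. In JavaScript, `\s` also matches non-ASCII whitespace such as U+00A0 (no-break space), which takes two bytes in UTF-8. An ASCII-only class like `[\x00-\x7F]` is therefore the stricter way to guarantee one byte per character. This sketch assumes the standard `TextEncoder` is available.

```javascript
// Pattern suggested in the comment: word characters and whitespace.
const wordOrSpace = /^[\w\s]*$/;
// Every ASCII character (U+0000 to U+007F) is exactly one byte in UTF-8.
const asciiOnly = /^[\x00-\x7F]*$/;

// Hypothetical helper: UTF-8 byte length via the standard TextEncoder.
function utf8Bytes(str) {
  return new TextEncoder().encode(str).length;
}

const nbsp = "\u00a0"; // no-break space: matched by \s, but not ASCII
console.log(wordOrSpace.test(nbsp), asciiOnly.test(nbsp), utf8Bytes(nbsp)); // true false 2

// Hypothetical filter: drop everything outside ASCII so each remaining char is one byte.
function keepAscii(str) {
  return str.replace(/[^\x00-\x7F]/g, "");
}
```

With this filter in place, `maxlength` and a byte limit coincide. As the question's edit notes, however, it rules out UTF-8 input altogether.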

2 Answers


This script has a couple of minor UX glitches that could be cleaned up, but it does accomplish the basic task outlined above when I tested it in Chrome:

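```html
<input id="myinp" />

<script>
  // Bind the same handler to keypress, blur and paste on the input:
  myinp.onkeypress = myinp.onblur = myinp.onpaste = function vld(e) {
    var inp = e.target;
    // Count UTF-8 bytes: encodeURIComponent emits one %XX escape per byte,
    // so collapsing each escape to a single character leaves one char per byte.
    if (encodeURIComponent(inp.value).replace(/%[A-F\d]{2}/g, 'U').length > 4) {
      // Too many bytes: cancel the event and restore the last known good value.
      e.preventDefault();
      inp.value = inp.val || '';
      return false;
    }
    // Back up the last known good value:
    inp.val = inp.value;
  };
</script>
```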
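One variant worth considering assumes the browser supports the `input` event and `TextEncoder`. An `input` handler sees the value *after* a keystroke or paste has been applied, which avoids the one-keystroke lag of checking in `keypress`. In this sketch, the helper `truncateToBytes` and the 4-byte limit are illustrative. The helper keeps whole code points from the start of the string until the next one would exceed the limit.

```javascript
// Hypothetical helper: longest prefix of `str`, in whole code points,
// whose UTF-8 encoding fits within maxBytes.
function truncateToBytes(str, maxBytes) {
  const encoder = new TextEncoder();
  let used = 0;
  let out = "";
  for (const ch of str) { // for...of iterates code points, so surrogate pairs stay together
    const size = encoder.encode(ch).length;
    if (used + size > maxBytes) break;
    used += size;
    out += ch;
  }
  return out;
}

// Browser wiring (assumes an <input id="myinp"> like the one in the script above).
if (typeof document !== "undefined") {
  const inp = document.getElementById("myinp");
  if (inp) {
    inp.addEventListener("input", () => {
      const trimmed = truncateToBytes(inp.value, 4);
      if (trimmed !== inp.value) inp.value = trimmed; // only touch the value when needed
    });
  }
}
```

Trimming from the end is simpler than rejecting the latest edit. The trade-off is that typing in the middle of a full field drops characters from the tail.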
dandavis
  • This looks great, thank you! According to this, http://stackoverflow.com/questions/5290182/how-many-bytes-takes-one-unicode-character, UTF-8 may use up to 6 bytes for a character. A colleague explained to me that this will work only with one- and two-byte chars. Do you see a way of making it work with more? – meshy Jun 06 '13 at 13:10
  • Two bytes fit all major web-used languages AFAIK (I'd love a counterexample if someone has one). I modified my answer to support 6-char escape codes, but I think 4 chars should be enough for anyone; I'm no linguist, though. – dandavis Jun 06 '13 at 15:47
  • This is prone to under-counting. For the text 'I am a 19 char text', this method counts 17 bytes, which is incorrect. You could use `encodeURIComponent(inp.value.replace(/\d/g,'X')).replace(/%[A-F\d]{2,6}/g, 'U')` instead. – jlhonora Nov 03 '16 at 23:08
  • To count bytes, this may be useful: `new Blob([str]).size`, found [here](https://stackoverflow.com/a/52254083/2628312). – Jirka Justra May 21 '22 at 21:18
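jlhonora's under-count can be reproduced directly. `encodeURIComponent` emits exactly one `%XX` escape per UTF-8 byte, so matching `%[A-F\d]{2}` (exactly two hex digits) counts bytes correctly. The greedy `{2,6}` also swallows digits or uppercase A-F letters that happen to follow an escape. The comparison below uses the standard `TextEncoder` as a reference, and the function names are illustrative. The `Blob` approach from the comment above should agree in browsers. Note that `encodeURIComponent` throws a `URIError` on a lone surrogate, whereas `TextEncoder` substitutes U+FFFD.

```javascript
// Byte count using the greedy {2,6} escape pattern: can swallow following literal characters.
function countLoose(str) {
  return encodeURIComponent(str).replace(/%[A-F\d]{2,6}/g, "U").length;
}

// Byte count matching exactly one %XX escape per UTF-8 byte.
function countStrict(str) {
  return encodeURIComponent(str).replace(/%[A-F\d]{2}/g, "U").length;
}

// Reference count using the standard TextEncoder (always UTF-8).
function countEncoder(str) {
  return new TextEncoder().encode(str).length;
}

const text = "I am a 19 char text";
// "%2019" is consumed as a single escape by the greedy pattern, losing "19".
console.log(countLoose(text), countStrict(text), countEncoder(text)); // 17 19 19
```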

If estimating isn't good enough, I'd filter out all the characters that aren't single-byte and count them separately.
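One way to carry this idea through to an exact count, sketched here as an interpretation rather than the answerer's code, is to classify each character by how many UTF-8 bytes its code point needs:

- 1 byte up to U+007F
- 2 bytes up to U+07FF
- 3 bytes up to U+FFFF
- 4 bytes above that

This needs no encoding API. The function name is hypothetical.

```javascript
// Hypothetical helper: exact UTF-8 byte count derived from code point ranges.
function utf8ByteCount(str) {
  let bytes = 0;
  for (const ch of str) {              // iterates by code point, keeping surrogate pairs together
    const cp = ch.codePointAt(0);
    if (cp <= 0x7f) bytes += 1;        // ASCII: the single-byte case
    else if (cp <= 0x7ff) bytes += 2;  // e.g. Latin accents, Greek, Cyrillic
    else if (cp <= 0xffff) bytes += 3; // rest of the BMP, e.g. CJK, "€"
    else bytes += 4;                   // supplementary planes, e.g. emoji
  }
  return bytes;
}

console.log(utf8ByteCount("a\u00e9\u20ac\ud83d\ude00")); // 1 + 2 + 3 + 4 = 10
```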

Jonathan