Limit html text input to a particular number of bytes?

Question

Using HTML5 (or, less preferably, JavaScript), is it possible to limit the maximum length of an input to a particular number of bytes?

I realise that I can limit to a number of characters with:

<input type="text" maxlength="4" />

But that's not good enough because I can input up to four two-byte chars in it.
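To see the mismatch concretely: `maxlength` is measured in UTF-16 code units (the same thing JavaScript's `String.prototype.length` counts), while the limit here is in UTF-8 bytes. The sketch below assumes `TextEncoder` is available (current browsers and Node.js). `utf8Length` is a hypothetical helper name, not part of any API.

```javascript
// Hypothetical helper: number of bytes `str` occupies when encoded as UTF-8.
function utf8Length(str) {
  return new TextEncoder().encode(str).length;
}

// Four characters each, so all three pass maxlength="4".
const samples = [
  'abcd',                     // ASCII: 1 byte per character
  '\u00e9\u00e9\u00e9\u00e9', // "éééé": 2 bytes per character
  '\u20ac\u20ac\u20ac\u20ac', // "€€€€": 3 bytes per character
];
for (const s of samples) {
  console.log(s, 'length:', s.length, 'bytes:', utf8Length(s));
}
```

All three strings have a length of 4, but they encode to 4, 8 and 12 bytes respectively.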

Obviously I am validating this server-side, but I would like this on the browser-side too.

Edit: Just to be clear, I do wish to be able to support UTF-8. Sorry @elclanrs.

Or maybe just limit the characters on `keydown` with something like `/[\w\s]+/`, which will all be 1 byte. — elclanrs, Jun 05 '13 at 21:05
The user can input up to four characters, each of which can be up to four bytes long when UTF-8 is used. I don’t see the point of this question. Why would you limit input to a certain amount of bytes? The data will be processed as *characters* anyway. — Jukka K. Korpela, Jun 06 '13 at 03:58
@JukkaK.Korpela Thank you for reminding me that non-ASCII chars can be more than 4 bytes. The data *will not be processed as characters* on the server. It is an unavoidable limitation that I must use a particular number of bytes. — meshy, Jun 06 '13 at 12:56
@AlessandroGabrielli I'm not really enamoured with that solution, as I wish to keep traffic to a minimum, but it may come to that. — meshy, Jun 06 '13 at 12:57
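elclanrs's suggestion can be sketched as follows. If only ASCII is allowed, every character encodes to exactly one UTF-8 byte, so `maxlength` effectively becomes a byte limit. This sketch filters on the `input` event rather than on `keydown`, on the assumption that pasted text should be caught too. `stripNonAscii` is a hypothetical name. This approach gives up the UTF-8 support the question asks for.

```javascript
// Hypothetical helper: remove every character outside the 7-bit ASCII range,
// so each remaining character encodes to exactly one UTF-8 byte.
function stripNonAscii(str) {
  return str.replace(/[^\x00-\x7F]/g, '');
}

// Browser wiring (assumes <input id="myinp" maxlength="4">); skipped outside a DOM.
if (typeof document !== 'undefined') {
  const inp = document.getElementById('myinp');
  if (inp) {
    inp.addEventListener('input', () => {
      const clean = stripNonAscii(inp.value);
      if (clean !== inp.value) inp.value = clean;
    });
  }
}
```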

dandavis · Accepted Answer · score 1 · 2013-06-06T15:45:33.187

This script has a couple of minor UX glitches that could be cleaned up, but it does accomplish the basic task outlined when I tested it in Chrome:

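<input id="myinp" />


<script>
  // Maximum number of UTF-8 bytes allowed in the field.
  var MAX_BYTES = 4;
  var myinp = document.getElementById('myinp');

  // Bind one handler to several events. Note that keypress and paste fire
  // before the new text is inserted, so the check sees the previous value.
  myinp.onkeypress = myinp.onblur = myinp.onpaste = function vld(e) {
    var inp = e.target;
    // Count bytes used in the text: encodeURIComponent writes every escaped
    // byte as exactly one %XX sequence, so collapse each one to a single char.
    if (encodeURIComponent(inp.value).replace(/%[A-F\d]{2}/g, 'U').length > MAX_BYTES) {
      // If there are too many bytes, reject the event and restore the last good value:
      e.preventDefault();
      inp.value = inp.val || inp.value;
      return false;
    }
    // Back up the last known good value:
    inp.val = inp.value;
  };
</script>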
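One way to avoid the timing glitch (keypress and paste fire before the value changes) is to check after the value has changed, using the `input` event, and truncate to the byte budget. This is a minimal sketch that assumes `TextEncoder` is available. `truncateToBytes` and `MAX_BYTES` are hypothetical names. It cuts only at code-point boundaries, so an emoji is never split in half.

```javascript
// Hypothetical helper: keep whole code points from the start of `str`
// until adding the next one would exceed `maxBytes` UTF-8 bytes.
function truncateToBytes(str, maxBytes) {
  const encoder = new TextEncoder();
  let used = 0;
  let result = '';
  for (const ch of str) { // iterates by code point, so surrogate pairs stay intact
    const size = encoder.encode(ch).length;
    if (used + size > maxBytes) break;
    used += size;
    result += ch;
  }
  return result;
}

// Browser wiring (assumes an element with id "myinp"); skipped outside a DOM.
if (typeof document !== 'undefined') {
  const MAX_BYTES = 4;
  const inp = document.getElementById('myinp');
  if (inp) {
    inp.addEventListener('input', () => {
      const trimmed = truncateToBytes(inp.value, MAX_BYTES);
      if (trimmed !== inp.value) inp.value = trimmed;
    });
  }
}
```

Resetting `value` moves the caret to the end, and truncating in the middle of an IME composition may interfere with it. A fuller version might defer the check until `compositionend`.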

answered Jun 05 '13 at 21:45 by dandavis · edited Jun 06 '13 at 15:45

This looks great, thank you! According to this: http://stackoverflow.com/questions/5290182/how-many-bytes-takes-one-unicode-character UTF-8 may have up to 6 bytes for any character. A colleague explained to me that this will work only with one- and two-byte chars. Do you see a way of making it work with more? – meshy Jun 06 '13 at 13:10
Two bytes fit all major languages used on the web, AFAIK (I'd love a counterexample if someone has one). I modified my answer to support 6-character escape codes, but I think 4 characters should be enough for anyone; I'm no linguist, though. – dandavis Jun 06 '13 at 15:47
This is prone to under-count. For the text: 'I am a 19 char text', this method counts 17 bytes, which is incorrect. You could use `encodeURIComponent(inp.value.replace(/\d/g,'X')).replace(/%[A-F\d]{2,6}/g, 'U')` instead. – jlhonora Nov 03 '16 at 23:08
To count bytes, `new Blob([str]).size` may be useful (found [here](https://stackoverflow.com/a/52254083/2628312)). – Jirka Justra May 21 '22 at 21:18
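The undercount jlhonora describes comes from the `{2,6}` quantifier. An escape such as `%20` followed by `19`, or by an uppercase `A` to `F`, is matched as one longer escape. The sketch below compares the counting approaches from this thread. The function names are hypothetical, and it assumes `TextEncoder` is available. In browsers, `new Blob([str]).size` also reports the UTF-8 byte count. The last two samples show that a CJK character such as 中 takes three bytes and an emoji takes four.

```javascript
// The answer's original counter: {2,6} lets one escape swallow following hex-like chars.
function countOriginal(str) {
  return encodeURIComponent(str).replace(/%[A-F\d]{2,6}/g, 'U').length;
}

// jlhonora's variant: masks digits first, but uppercase A-F can still be swallowed.
function countDigitsMasked(str) {
  return encodeURIComponent(str.replace(/\d/g, 'X')).replace(/%[A-F\d]{2,6}/g, 'U').length;
}

// encodeURIComponent emits exactly one %XX (two uppercase hex digits) per escaped byte.
function countEscapes(str) {
  return encodeURIComponent(str).replace(/%[A-F\d]{2}/g, 'U').length;
}

// TextEncoder always produces UTF-8 and reports the byte count directly.
function countTextEncoder(str) {
  return new TextEncoder().encode(str).length;
}

for (const s of ['I am a 19 char text', 'x A', '\u4e2d', '\u{1F600}']) {
  console.log(JSON.stringify(s), countOriginal(s), countDigitsMasked(s),
    countEscapes(s), countTextEncoder(s));
}
```

For `'x A'` the first two counters both report 2 instead of 3, because `%20A` is treated as a single escape.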

score 0 · Answer 2 · answered Jun 05 '13 at 21:06

If an estimate isn't good enough, I'd filter out all the characters that aren't single-byte and count them separately.
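One reading of this answer is to count every ASCII character as one byte and to weight each remaining character by the number of bytes UTF-8 needs for it. The following is a sketch under that interpretation. The function name is hypothetical, and the byte ranges follow RFC 3629, which caps UTF-8 at four bytes per code point.

```javascript
// Hypothetical helper: ASCII code points cost one byte each; every other code
// point is weighted by the number of bytes UTF-8 needs for it (RFC 3629 ranges).
function utf8LengthByCodePoint(str) {
  let bytes = 0;
  for (const ch of str) { // iterates by code point, keeping surrogate pairs together
    const cp = ch.codePointAt(0);
    if (cp < 0x80) bytes += 1;         // ASCII: single byte
    else if (cp < 0x800) bytes += 2;   // e.g. accented Latin, Greek, Cyrillic
    else if (cp < 0x10000) bytes += 3; // rest of the BMP, including CJK
    else bytes += 4;                   // supplementary planes, e.g. emoji
  }
  return bytes;
}
```

This needs no `TextEncoder` and no URI escaping, so it also works in older browsers that support `for...of` and `codePointAt`.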

answered by Jonathan
