I'm building an NLP classifier in Python and would like to build an HTML page to host a demo. I want to run the model on a sample text to see the prediction. In Python this is done by tokenizing the text and then padding it before predicting, like this:

from tensorflow.keras.preprocessing.sequence import pad_sequences

# tokenizer is the Keras Tokenizer fitted on the training data
token_list = tokenizer.texts_to_sequences([text])[0]
token_list_padded = pad_sequences([token_list], maxlen=max_length, padding=padding_type)

The problem is that I'm new to JavaScript. Are there tokenization and padding methods in JavaScript like the ones in Python?

edkeveked
ezzeddin
  • You may want to look at https://ml5js.org/; it's a JS library built on top of TensorFlow.js. – JDunken Jan 02 '20 at 14:02
  • I think ml5.js is pretty new and does not support NLP functions like *tokenizer* and *pad_sequences*. – ezzeddin Jan 03 '20 at 05:51

2 Answers

There is not yet a tokenizer in TensorFlow.js like the Keras Tokenizer in Python.

A simple JS tokenizer has been described here. A more robust approach would be to use the tokenizer that comes with the Universal Sentence Encoder.
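If the model was trained with a Keras Tokenizer, the browser has to reproduce that exact word-to-id mapping, so a hand-written tokenizer should reuse the vocabulary from training rather than build a new one. Below is a minimal sketch, assuming the Tokenizer used its defaults (lowercasing, the default filters, splitting on spaces, no char_level) and that its word_index was exported from Python as JSON. The names textToWordSequence, textsToSequences and WordIndex are hypothetical and not part of any library.

```typescript
// Keras' default Tokenizer filters: punctuation plus tab and newline.
const KERAS_DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n';

// Assumed shape of the word index exported from Python (word -> integer id).
type WordIndex = { [word: string]: number };

// Mirrors Keras text_to_word_sequence with default settings:
// lowercase, turn every filter character into a space, split on spaces, drop empty strings.
function textToWordSequence(text: string, filters: string = KERAS_DEFAULT_FILTERS): string[] {
  let cleaned = text.toLowerCase();
  for (let i = 0; i < filters.length; i++) {
    cleaned = cleaned.split(filters[i]).join(" ");
  }
  return cleaned.split(" ").filter((w) => w.length > 0);
}

// Hypothetical helper mirroring Tokenizer.texts_to_sequences:
// known words map to their id; unknown words (or ids >= numWords) are dropped,
// or mapped to the OOV id when an oovToken is given.
function textsToSequences(
  texts: string[],
  wordIndex: WordIndex,
  oovToken?: string,
  numWords?: number
): number[][] {
  // Own-property lookup, so words like "constructor" don't hit Object.prototype.
  const lookup = (w: string): number | undefined =>
    Object.prototype.hasOwnProperty.call(wordIndex, w) ? wordIndex[w] : undefined;
  const oovIndex = oovToken !== undefined ? lookup(oovToken) : undefined;

  return texts.map((text) => {
    const seq: number[] = [];
    for (const word of textToWordSequence(text)) {
      const idx = lookup(word);
      if (idx !== undefined && (numWords === undefined || idx < numWords)) {
        seq.push(idx);
      } else if (oovIndex !== undefined) {
        seq.push(oovIndex);
      }
    }
    return seq;
  });
}
```

On the Python side the vocabulary could be saved with something like json.dump(tokenizer.word_index, f). The page would then load that file and call textsToSequences([text], wordIndex)[0], passing the same oov_token and num_words that were used in training, if any. If the Tokenizer was built with custom filters or lower=False, the sketch has to be adjusted to match.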

edkeveked

There is no native mechanism for tokenization in JavaScript.

You can use a JavaScript library such as natural, wink-tokenizer, or wink-nlp. The last of these automatically extracts a number of token features that may be useful in training.
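Padding, the other half of the question, is short enough to write by hand. Below is a minimal sketch that mirrors the semantics of Keras pad_sequences for integer sequences: padding and truncating both default to "pre", the fill value defaults to 0, and maxlen defaults to the longest sequence. The function name padSequences and the PadMode type are hypothetical, and the sketch does not cover the dtype option.

```typescript
type PadMode = "pre" | "post";

// Sketch mirroring Keras pad_sequences for integer sequences.
// Longer sequences are truncated ("pre" drops from the start, "post" from the end);
// shorter ones are filled with `value` before ("pre") or after ("post") the tokens.
function padSequences(
  sequences: number[][],
  maxlen?: number,
  padding: PadMode = "pre",
  truncating: PadMode = "pre",
  value: number = 0
): number[][] {
  // As in Keras, maxlen defaults to the length of the longest sequence.
  const len =
    maxlen !== undefined ? maxlen : sequences.reduce((m, s) => Math.max(m, s.length), 0);

  return sequences.map((seq) => {
    let kept: number[];
    if (seq.length <= len) {
      kept = seq.slice();
    } else if (truncating === "pre") {
      kept = seq.slice(seq.length - len); // keep the last `len` tokens
    } else {
      kept = seq.slice(0, len); // keep the first `len` tokens
    }
    const fill: number[] = [];
    for (let i = kept.length; i < len; i++) fill.push(value);
    return padding === "pre" ? fill.concat(kept) : kept.concat(fill);
  });
}
```

To match the Python call, pass the same values used in training, e.g. padSequences([tokenList], maxLength, paddingType), where paddingType is "pre" or "post". Truncation stays at the Keras default "pre" because the Python call does not set it. The resulting number[][] can then be turned into a tensor (for example with TensorFlow.js tf.tensor2d) before calling the model.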

sks