Most Popular
1500 questions
22 votes, 4 answers
How to export one cell of a jupyter notebook?
I'm currently working/prototyping in a Jupyter notebook. I want to run some of my code in a standalone IPython shell.
For now, I export my IPython code (File --> Download as) and then execute it in IPython (with %run). It works, but I would…
Manu H
22 votes, 3 answers
How to perform feature engineering on unknown features?
I am participating in a Kaggle competition. The dataset has around 100 features, and all of them are unknown (in terms of what they actually represent). Basically, they are just numbers.
People are performing a lot of feature engineering on these features. I…
user2409011
22 votes, 3 answers
Linear regression with non-symmetric cost function?
I want to predict some value $Y(x)$, and I am trying to get a prediction $\hat Y(x)$ that is as low as possible while still being larger than $Y(x)$. In other words:
$$\text{cost}\left\{ Y(x) \gtrsim \hat Y(x) \right\} >>…
asPlankBridge
22 votes, 2 answers
Doc2Vec - How to label the paragraphs (gensim)
I am wondering how to label (tag) sentences / paragraphs / documents with doc2vec in gensim - from a practical standpoint.
Do you need to have each sentence / paragraph / document with its own unique label (e.g. "Sent_123")? This seems useful if…
B_Miner
21 votes, 4 answers
Hyperparameter search for LSTM-RNN using Keras (Python)
From Keras RNN Tutorial: "RNNs are tricky. Choice of batch size is important, choice of loss and optimizer is critical, etc. Some configurations won't converge."
So this is more of a general question about tuning the hyperparameters of an LSTM-RNN on…
wacax
21 votes, 3 answers
Does click frequency account for relevance?
When building a ranking, say for a search engine or a recommendation system, is it valid to rely on click frequency to determine the relevance of an entry?
Rubens
21 votes, 6 answers
What do you use to generate a dashboard in R?
I need to generate periodic (daily, monthly) web analytics dashboard reports. They will be static and don't require interaction, so imagine a PDF file as the target output. The reports will mix tables and charts (mainly sparkline and bullet graphs…
aiolias
21 votes, 3 answers
Feature extraction of images in Python
In my class, I have to create an application that uses two classifiers to decide whether an object in an image is an example of phylum Porifera (sea sponge) or some other object.
However, I am completely lost when it comes to feature extraction techniques…
Jeremy Barnes
21 votes, 3 answers
Uses of NoSQL database in data science
How can NoSQL databases like MongoDB be used for data analysis? What features do they have that can make data analysis faster and more powerful?
10land
21 votes, 3 answers
What is the BLEU score of professional human translators?
Machine translation models are usually evaluated using the BLEU score. I want to get some intuition for this score. What is the BLEU score of a professional human translator?
I know it depends on the languages, the translator, etc. I just want to get the…
Amit Keinan
21 votes, 3 answers
Dataset for Named Entity Recognition on Informal Text
I'm currently searching for labeled datasets to train a model to extract named entities from informal text (something similar to tweets). Because capitalization and grammar are often lacking in the documents in my dataset, I'm looking for out of…
Madison May
21 votes, 6 answers
What does embedding mean in machine learning?
I just came across the term "embedding" in a paper on deep learning. The context is "multi-modal embedding".
My guess: an embedding of something extracts some of its features to form a vector.
I couldn't get the explicit meaning for this…
cloudscomputes
21 votes, 1 answer
What is "posterior collapse" phenomenon?
I was going through the paper Towards Text Generation with Adversarially Learned Neural Outlines, and it explains that VAEs are hard to train for text generation because of this problem. The paper states:
the model ends up relying solely on the…
thanatoz
21 votes, 2 answers
What is the job of "RepeatVector" and "TimeDistributed"?
I read about them in the Keras documentation and on other websites, but I couldn't understand exactly what they do or how we should use them when designing many-to-many or encoder-decoder LSTM networks.
I saw them used in the solution of this…
user3486308
21 votes, 1 answer
Can BERT do the next-word-predict task?
Since BERT is bidirectional (it uses a bidirectional transformer), is it possible to use it for the next-word prediction task? If yes, what needs to be tweaked?
惊天补扣