Most Popular

1500 questions
10 votes, 2 answers

Machine Learning Steps

Which of the following sets of steps is the correct one when creating a predictive model? Option 1: First eliminate the most obviously bad predictors, and preprocess the remaining ones if needed, then train various models with cross-validation, pick…
A K
10 votes, 4 answers

What initial steps should I use to make sense of large data sets, and what tools should I use?

Caveat: I am a complete beginner when it comes to machine learning, but eager to learn. I have a large dataset and I'm trying to find patterns in it. There may or may not be correlations across the data, either with known variables, or variables that…
user3791372
9 votes, 4 answers

How to combine PCA and MCA on mixed data?

Suppose I have mixed data and Python code capable of doing PCA (principal component analysis) on continuous predictors and MCA (multiple correspondence analysis) on nominal predictors. Is it possible to combine results from PCA and MCA…
Wojciech J. Migda
9 votes, 3 answers

What is the difference between one-hot and dummy encoding?

I am trying to understand (1) the reason behind encoding (one-hot encoding and dummy encoding) and (2) how one-hot and dummy encoding differ from each other.
user121028
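
To make the two terms in this question concrete, here is a minimal pure-Python sketch. It assumes the common convention that one-hot encoding emits one indicator column per category (k columns), while dummy encoding drops one reference category (k - 1 columns). The helper names `one_hot` and `dummy`, the sample `colors` list, and the choice of the alphabetically first category as the reference are all illustrative assumptions, not part of the question.

```python
def one_hot(values):
    """Return (column names, rows) with one indicator column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def dummy(values):
    """Return (column names, rows) with the first category dropped.

    Assumption: the alphabetically first category is the reference level;
    a row of all zeros therefore stands for that category.
    """
    kept = sorted(set(values))[1:]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


colors = ["red", "green", "blue", "green"]  # hypothetical sample data

oh_cols, oh_rows = one_hot(colors)
dm_cols, dm_rows = dummy(colors)

print(oh_cols, oh_rows)
print(dm_cols, dm_rows)
```

With three categories, the first call yields three columns and the second yields two. Dropping a column avoids perfectly collinear indicators, which matters mostly for linear models with an intercept.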
9 votes, 1 answer

What tokenizer does OpenAI's GPT3 API use?

I'm building an application for the API, but I would like to be able to count the number of tokens my prompt will use, before I submit an API call. Currently I often submit prompts that yield a 'too-many-tokens' error. The closest I got to an answer…
Herman Autore
9 votes, 1 answer

What is the difference between "fully developed decision trees" and "shallow decision trees"?

While reading about ensemble methods in the scikit-learn docs, I saw that bagging methods work best with strong and complex models (e.g., fully developed decision trees), in contrast with boosting methods which usually work best with weak models (e.g.,…
Mithril
9 votes, 3 answers

Export weights (formula) from Random Forest Regressor in Scikit-Learn

I trained a prediction model with scikit-learn in Python (Random Forest Regressor) and I want to somehow extract the weights of each feature to create an Excel tool for manual prediction. The only thing that I found is the model.feature_importances_…
Tasos
9 votes, 2 answers

Ethical consequences of non-deterministic learning processes?

Most advanced supervised learning techniques are non-deterministic by construction. The final output of the model usually depends on some random parts of the learning process. (Random weight initialization for Neural Networks or variable selection /…
Lucas Morin
9 votes, 1 answer

Where does the name 'LSTM' come from?

Long short-term memory is a recurrent neural network architecture introduced in the paper Long short-term memory. Can you please tell me where the name comes from? ("Memory", as the network can store information because of the recurrence - but where…
Martin Thoma
9 votes, 1 answer

Properties for building a Multilayer Perceptron Neural Network using Keras?

I am trying to build and train a multilayer perceptron neural network that correctly predicts which president won in which county for the first time. I have the following information for training data: total population, median age, % BachelorsDeg or…
pr338
9 votes, 1 answer

How to customise the cost function in a scikit-learn model?

For example, when I have a problem where false negatives should be penalised more, how can I incorporate that requirement into an algorithm such as SVM?
Ghostintheshell
9 votes, 1 answer

What is the difference between affinity matrix eigenvectors and graph Laplacian eigenvectors in the context of spectral clustering?

In spectral clustering, it's standard practice to solve the eigenvector problem $$L v = \lambda v$$ where $L$ is the graph Laplacian, $v$ is the eigenvector related to eigenvalue $\lambda$. My question: why bother taking the graph Laplacian?…
felipeduque
9 votes, 7 answers

Python library that can compute the confusion matrix for multi-label classification

I'm looking for a Python library that can compute the confusion matrix for multi-label classification. (FYI: scikit-learn doesn't support multi-label for confusion matrix.) See also: What is the difference between Multiclass and Multilabel Problem
Franck Dernoncourt
9 votes, 6 answers

Which cross-validation type best suits a binary classification problem?

The data set looks like this: 25000 observations; up to 15 predictors of different types (numeric, multi-class categorical, binary); the target variable is binary. Which cross-validation method is typical for this type of problem? By default I'm using K-Fold. How…
IgorS
9 votes, 1 answer

What are the inputs to the first decoder layer in a Transformer model during the training phase?

I am trying to wrap my head around how the Transformer architecture works. I think I have a decent top-level understanding of the encoder part, sort of how the Key, Query, and Value tensors work in the MultiHead attention layers. What I am…
djvaroli