Questions tagged [python]

Python is a programming language commonly used for machine learning. Use this tag for any on-topic question that (a) involves Python either as a critical part of the question or expected answer, and (b) is not just about how to use Python.

Python (Wikipedia page) is a general-purpose programming language designed for ease of use. It is a commonly used platform for machine learning. Two very popular threads concerned with using Python for statistics and machine learning are:

Be aware that Python-based questions are frequently migrated between Cross Validated (CV) and Stack Overflow (SO). CV fields questions with statistical / machine learning content, and SO fields questions of programming and implementation. Python questions can be on topic here when they are centrally about statistics / ML while involving Python either as a critical part of the question or expected answer. However, questions that are just about how to use Python / why it works a certain way, etc., are off topic here. Many such questions can be on topic on SO if they have a reproducible example.

We maintain a list of Python resources available on the internet in our Internet Support for Statistics Software meta.CV thread.

There is an extensive wiki for Python on SO here.

4791 questions
56 votes, 2 answers

How to interpret the p-value of a Kolmogorov-Smirnov test (Python)?

I have two samples, and I want to test (using Python) whether they are drawn from the same distribution. To do that, I use the function ks_2samp from scipy.stats. It returns two values, and I am having difficulty interpreting them. Help, please!
meri (reputation 561)
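A minimal sketch, assuming two made-up samples drawn with NumPy: ks_2samp returns a result whose statistic is the largest vertical gap between the two empirical CDFs and whose pvalue is the probability of a gap at least that large if both samples came from the same continuous distribution. A small p-value is evidence against that null; a large one only means the test found no evidence of a difference, not that the distributions are equal.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Illustrative samples: two from the same normal, one shifted by 0.5
a = rng.normal(loc=0.0, scale=1.0, size=500)
b = rng.normal(loc=0.0, scale=1.0, size=500)
c = rng.normal(loc=0.5, scale=1.0, size=500)

same = ks_2samp(a, b)
shifted = ks_2samp(a, c)

# statistic: max distance between the empirical CDFs (between 0 and 1)
# pvalue: chance of a distance this large if both samples share one distribution
print(f"same:    D={same.statistic:.3f}, p={same.pvalue:.3f}")
print(f"shifted: D={shifted.statistic:.3f}, p={shifted.pvalue:.3g}")
```

With the shifted sample, D should be noticeably larger and the p-value tiny.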
4 votes, 1 answer

Python library for returning MLE for Beta Geometric and Beta Discrete Weibull models

There is an R package called foretell that is useful for projecting customer retention based on Beta Geometric and Beta Discrete Weibull models. I am having trouble finding something similar for Python, or at least something as streamlined. Does anyone know…
Kbbm (reputation 143)
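Absent a packaged equivalent, the shifted beta-geometric (sBG) model is small enough to fit by maximum likelihood with SciPy. This is a sketch under stated assumptions, not a port of foretell: the input is assumed to be the number of customers still active at the start of each period, survival is S(t) = B(a, b + t) / B(a, b), churn in period t has probability S(t - 1) - S(t), and customers still active at the end are treated as right-censored. The cohort counts and the names sbg_survival, sbg_neg_log_lik and fit_sbg are illustrative. The Beta Discrete Weibull model is not covered here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln

def sbg_survival(t, a, b):
    """S(t) = B(a, b + t) / B(a, b): share of the cohort still active after t periods."""
    t = np.asarray(t, dtype=float)
    return np.exp(betaln(a, b + t) - betaln(a, b))

def sbg_neg_log_lik(log_params, active):
    """Negative log-likelihood of cohort counts under the sBG model."""
    a, b = np.exp(log_params)  # optimise on the log scale so a, b stay positive
    active = np.asarray(active, dtype=float)
    n_periods = len(active) - 1
    t = np.arange(1, n_periods + 1)
    # P(T = t) = S(t - 1) - S(t): probability of churning in period t
    churn_prob = sbg_survival(t - 1, a, b) - sbg_survival(t, a, b)
    churned = active[:-1] - active[1:]
    # customers still active after the last period are right-censored
    censored = active[-1] * np.log(sbg_survival(n_periods, a, b))
    return -(np.sum(churned * np.log(churn_prob)) + censored)

def fit_sbg(active):
    res = minimize(sbg_neg_log_lik, x0=np.zeros(2), args=(active,), method="Nelder-Mead")
    return tuple(np.exp(res.x))

# Illustrative cohort: customers active at the start of periods 0..7
active = [1000, 869, 743, 653, 593, 551, 517, 491]
a_hat, b_hat = fit_sbg(active)
print(a_hat, b_hat)
# projected number of customers still active after periods 8..12
print(sbg_survival(np.arange(8, 13), a_hat, b_hat) * active[0])
```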
3 votes, 1 answer

Finding weight/value of each person on a team

If I have teams of between $n_1$ and $n_2$ people each, and the results of the teams' head-to-head matchups, how can I estimate each person's value? Example data (I drew this up quickly; the actual data is many lines longer, with more…
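One common approach (a Bradley-Terry-style model, swapped in here as an illustration rather than taken from the question) is a logistic regression with one column per player: +1 if the player was on team A, -1 if on team B, and the outcome is whether team A won. Each coefficient is then that player's estimated contribution to the log-odds of winning. The simulated 2-vs-2 games, the "true" skills, and the helper matchup_design are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def matchup_design(team_a, team_b, n_players):
    """One row per game: +1 for players on team A, -1 for players on team B."""
    row = np.zeros(n_players)
    row[team_a] = 1.0
    row[team_b] = -1.0
    return row

rng = np.random.default_rng(1)
true_skill = np.array([2.0, 1.0, 0.0, 0.0, -1.0, -2.0])  # hypothetical player values
n_players = len(true_skill)

X, y = [], []
for _ in range(2000):
    players = rng.permutation(n_players)
    team_a, team_b = players[:2], players[2:4]
    x = matchup_design(team_a, team_b, n_players)
    p_a_wins = 1.0 / (1.0 + np.exp(-x @ true_skill))
    X.append(x)
    y.append(int(rng.random() < p_a_wins))
X, y = np.array(X), np.array(y)

# No intercept: an intercept would act as a built-in advantage for "team A"
model = LogisticRegression(fit_intercept=False)
model.fit(X, y)
estimated_value = model.coef_[0]
print(np.round(estimated_value, 2))
```

When every game has equal team sizes, adding the same constant to every player's value does not change any prediction, so the values are only meaningful relative to each other; the default L2 penalty picks one representative.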
2 votes, 1 answer

PyTorch Ignore padding for LSTM batch training

I realize there is pack_padded_sequence and so on for batch training LSTMs, but that takes an entire sequence, embeds it, and then forwards it through the LSTM. My LSTM is built so that it takes a single input character, and forward just outputs the…
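When the model is stepped one character at a time, a common alternative to packing is to run the padded positions anyway and exclude them from the loss with a mask. The sketch below shows that masking idea in NumPy so it is self-contained; in PyTorch, nn.CrossEntropyLoss offers the same behaviour through its ignore_index argument. The padding index PAD = 0 and the function name masked_cross_entropy are assumptions for illustration.

```python
import numpy as np

PAD = 0  # hypothetical index reserved for padding; real characters use 1..V-1

def masked_cross_entropy(logits, targets, pad_index=PAD):
    """Mean cross-entropy over non-padded positions only.

    logits: (batch, time, vocab) raw scores; targets: (batch, time) int indices.
    """
    # log-softmax over the vocabulary axis, shifted for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # log-probability assigned to the correct character at every position
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    mask = targets != pad_index  # True where there is a real character
    return -(picked * mask).sum() / mask.sum()

rng = np.random.default_rng(0)
targets = np.array([[1, 2, 0], [3, 0, 0]])  # two sequences padded to length 3
logits = rng.normal(size=(2, 3, 4))
loss = masked_cross_entropy(logits, targets)

# Changing the outputs at padded positions leaves the loss unchanged
logits_changed = logits.copy()
logits_changed[0, 2, 1] += 5.0
logits_changed[1, 1:, 3] -= 3.0
loss_changed = masked_cross_entropy(logits_changed, targets)
```

Because padded steps contribute nothing to the loss, they also contribute no gradient, which is the practical effect of "ignoring" them.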
2 votes, 1 answer

Cross Validation for regularized portfolio optimization

I have the explanation below. I'm trying to find the global minimum-variance portfolio, and according to the following explanation I need to use validation methods to choose the optimal parameters. I also need to use regularizers. I'm OK with adding the…
Hiru (reputation 155)
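A sketch of one way to combine the two ideas, assuming an L2 (ridge) penalty on the weights, which is not necessarily the regularizer in the question's source: minimizing w'Σw + λ‖w‖² subject to the weights summing to 1 gives w ∝ (Σ + λI)⁻¹1. λ is chosen by walk-forward validation (fit on the past, measure realised portfolio variance on the next block), since shuffled K-fold would leak future returns. The simulated returns and the function names are hypothetical.

```python
import numpy as np

def gmv_weights(returns, lam):
    """Global minimum-variance weights with a ridge penalty: solve (Sigma + lam*I) x = 1."""
    cov = np.cov(returns, rowvar=False)
    cov_reg = cov + lam * np.eye(cov.shape[0])
    x = np.linalg.solve(cov_reg, np.ones(cov.shape[0]))
    return x / x.sum()  # normalise so the weights sum to 1

def choose_lambda(returns, lambdas, n_folds=4):
    """Walk-forward validation: train on all data before each block, score on the block."""
    n = len(returns)
    edges = np.linspace(0, n, n_folds + 2, dtype=int)
    scores = []
    for lam in lambdas:
        fold_vars = []
        for k in range(1, n_folds + 1):
            train = returns[:edges[k]]
            test = returns[edges[k]:edges[k + 1]]
            w = gmv_weights(train, lam)
            fold_vars.append(np.var(test @ w))  # out-of-sample portfolio variance
        scores.append(np.mean(fold_vars))
    return lambdas[int(np.argmin(scores))], scores

rng = np.random.default_rng(3)
returns = rng.normal(scale=0.01, size=(250, 5))  # simulated daily returns, 5 assets
lambdas = [0.0, 1e-5, 1e-4, 1e-3]
best_lam, cv_scores = choose_lambda(returns, lambdas)
weights = gmv_weights(returns, best_lam)  # refit on all data with the chosen lambda
print(best_lam, np.round(weights, 3))
```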
2 votes, 0 answers

Medical Imaging in Python (PyRadiomics): Concrete steps

I am new to medical imaging, but out of interest I am trying hard to replicate some earlier analyses in this area (e.g., https://www.ncbi.nlm.nih.gov/pubmed/26337765). My questions are: Are there any online resources that provide…
Kim (reputation 21)
1 vote, 1 answer

Figuring out a good fit for this data

I am trying to find an appropriate mathematical model/equation for this data. Physically, it is essentially the linear correlation of rainfall errors (y-axis) as a function of distance (x-axis). So for very short distances the errors are highly correlated, but…
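A common starting point for correlation that decays with distance is an exponential model, c0·exp(−d/L), fitted with scipy.optimize.curve_fit. This is one candidate shape, not the answer for this data set; Gaussian or power-law decays can be fitted the same way and compared on their residuals. The distances, correlations, and starting values p0 below are simulated assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_correlation(d, c0, length_scale):
    """Correlation c0 at zero distance, falling to c0/e at d = length_scale."""
    return c0 * np.exp(-d / length_scale)

# Hypothetical distances (e.g. km) and noisy correlations; replace with the real data
distance = np.linspace(0, 100, 50)
rng = np.random.default_rng(2)
corr = exp_correlation(distance, 0.9, 25.0) + rng.normal(scale=0.02, size=distance.size)

params, cov = curve_fit(exp_correlation, distance, corr, p0=(1.0, 10.0))
c0_hat, length_hat = params
print(c0_hat, length_hat, np.sqrt(np.diag(cov)))  # estimates and standard errors
```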
1 vote, 1 answer

How to calculate the feature importance for multi-label classification problems

I am looking for some sources about "how to calculate the feature importance for multi-label classification problems". Could you give me some information, with related Python source code, on how to compute feature importance for multi-label datasets?
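A minimal sketch with scikit-learn, assuming a synthetic data set: RandomForestClassifier accepts a 2-D 0/1 label matrix directly, so its impurity-based feature_importances_ come from multi-output trees and summarise all labels at once. Permutation importance is model-agnostic: it measures how much a score drops when one feature's column is shuffled. The choice of "f1_micro" (which pools counts across labels) as the score is an assumption; a per-label analysis would score each label column separately.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic multi-label data: Y has one 0/1 column per label
X, Y = make_multilabel_classification(n_samples=500, n_features=8, n_classes=3, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, Y_train)

# Impurity-based importances from the multi-output trees (sum to 1)
print(model.feature_importances_)

# Permutation importance on held-out data, scored with micro-averaged F1
result = permutation_importance(
    model, X_test, Y_test, scoring="f1_micro", n_repeats=10, random_state=0
)
print(result.importances_mean)
```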
1 vote, 1 answer

How do I test how well my fit line predicts results?

I am a data science intern, and I have been tasked with testing the time scalability of the schedule builder. Basically, I have collected data and fitted a number of curves using the lmfit module. Now I need to run the schedule builder and see how…
Evan Walker
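One straightforward check is to evaluate the fitted curve at the inputs of new runs that were not used in the fit, then summarise the gaps between measured and predicted values. With lmfit, a fit result's eval method should give predictions at new x values; the sketch below starts from already-computed predictions so it needs only NumPy. The numbers and the helper name prediction_errors are hypothetical.

```python
import numpy as np

def prediction_errors(y_true, y_pred):
    """Summary errors of fitted-curve predictions against new, held-out measurements."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    return {
        "mae": np.mean(np.abs(resid)),          # average absolute miss, in original units
        "rmse": np.sqrt(np.mean(resid ** 2)),   # penalises large misses more heavily
        "mape": np.mean(np.abs(resid / y_true)) * 100,  # average relative miss, in %
    }

measured = [12.0, 30.5, 61.0]   # hypothetical new run times (s)
predicted = [11.0, 33.0, 58.0]  # fitted-curve predictions for the same inputs
print(prediction_errors(measured, predicted))
```

Plotting residuals against input size is also worth doing: a curve can have a small average error and still drift systematically at the large sizes that matter for scalability.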
1 vote, 0 answers

How do I monitor the performance of ML models if the ground truth is delayed for 9 months?

We deployed a machine learning model to production about a year ago. I would like to estimate the performance of my model (a binary classifier), but we only get the ground truth about 9 months after the prediction is made. I've noticed that the…
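While labels are delayed, one common proxy is to monitor drift in the model's inputs and output scores relative to a reference period, for example with the population stability index (PSI). Drift does not prove accuracy has dropped; it flags that the data no longer look like what the model was validated on. This is a sketch: the quantile binning, the eps floor, and the simulated beta-distributed scores are assumptions.

```python
import numpy as np

def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between reference scores and newer scores; larger values mean more drift."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # interior bin edges at quantiles of the reference scores
    inner = np.quantile(expected, np.linspace(0, 1, n_bins + 1)[1:-1])
    e_idx = np.searchsorted(inner, expected, side="right")
    a_idx = np.searchsorted(inner, actual, side="right")
    e_frac = np.bincount(e_idx, minlength=n_bins) / len(expected)
    a_frac = np.bincount(a_idx, minlength=n_bins) / len(actual)
    # floor empty bins so the log stays finite
    e_frac = np.clip(e_frac, eps, None)
    a_frac = np.clip(a_frac, eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(4)
scores_at_launch = rng.beta(2, 5, size=5000)   # simulated reference scores
scores_this_month = rng.beta(2, 4, size=5000)  # simulated recent scores
print(population_stability_index(scores_at_launch, scores_this_month))
```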
1 vote, 0 answers

Can you have observations without a choice in pylogit?

I have data in long format where, say, user 1 has 10 alternatives but did not choose any of them, so CHOICE is all 0. The problem I get is that when I include those users, all model parameters are set to 0. I do not understand why this is happening…
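One relevant fact about the conditional (multinomial) logit model: each choice situation contributes the sum over alternatives of CHOICE × log P(alternative), so a situation where CHOICE is all 0 contributes exactly 0 to the log-likelihood for every parameter value. Such users therefore carry no information in the standard model. This sketch checks that numerically; it does not reproduce pylogit's internals, and the data and the function mnl_log_likelihood are hypothetical. A common way to model "chose nothing" is to add an explicit outside ("no choice") alternative per situation.

```python
import numpy as np

def mnl_log_likelihood(beta, X, choice, obs_id):
    """Conditional logit log-likelihood for long-format data (one row per alternative)."""
    utility = X @ beta
    ll = 0.0
    for obs in np.unique(obs_id):
        rows = obs_id == obs
        u = utility[rows]
        # log-softmax within the choice situation, computed stably
        log_p = u - u.max() - np.log(np.sum(np.exp(u - u.max())))
        ll += np.sum(choice[rows] * log_p)
    return ll

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))                # 2 situations x 3 alternatives, 2 attributes
obs_id = np.array([1, 1, 1, 2, 2, 2])
choice = np.array([0, 1, 0, 0, 0, 0])      # situation 2 chose nothing
keep = obs_id == 1

results = []
for beta in (np.zeros(2), np.array([0.5, -1.0])):
    full = mnl_log_likelihood(beta, X, choice, obs_id)
    without = mnl_log_likelihood(beta, X[keep], choice[keep], obs_id[keep])
    results.append((full, without))
    print(full, without)  # identical: situation 2 adds nothing
```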
1 vote, 0 answers

Getting "LinAlgError: SVD did not converge" in sm.OLS.fit() on the first run only

Getting "LinAlgError: SVD did not converge" in sm.OLS.fit() on the first run only. On the second run, the same code runs without any change to the data or code. I have already tried the Stack Overflow solutions: most likely there are NaNs in the data, you…
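Since NaN or infinite values are a frequent cause of this error, it can help to count and drop non-finite rows explicitly right before fitting. statsmodels' sm.OLS has a missing="drop" option for NaN rows; the sketch below also catches infinities and uses np.linalg.lstsq as a stand-in fitter so it runs with NumPy alone. The helper name fit_ols_finite and the toy data are hypothetical.

```python
import numpy as np

def fit_ols_finite(X, y):
    """Drop rows with NaN/inf in X or y, then fit least squares with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = np.isfinite(X).all(axis=1) & np.isfinite(y)
    n_dropped = int((~ok).sum())
    design = np.column_stack([np.ones(ok.sum()), X[ok]])  # intercept column + predictors
    coef, *_ = np.linalg.lstsq(design, y[ok], rcond=None)
    return coef, n_dropped

X = np.linspace(0, 1, 20).reshape(-1, 1)
y = 1.0 + 2.0 * X[:, 0]
X[3, 0] = np.nan   # hypothetical bad value in a predictor
y[7] = np.inf      # hypothetical bad value in the response
coef, n_dropped = fit_ols_finite(X, y)
print(coef, n_dropped)
```

Logging n_dropped on every run would also show whether the first run receives different data from the second.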
1 vote, 0 answers

Avoid Retraining a Model When Executing a Program?

I've started using OpenCV for some image-processing projects, and I'm wondering whether there's a way to save time when processing test images against a database of faces. Issue: 10 pictures of each subject A, B, and C exist in folders on the…
ev3670 (reputation 11)
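The usual pattern is to train once, save the result to disk, and load it on later runs. A minimal sketch with the standard-library pickle module, assuming the trained state is a picklable Python object (for example precomputed features per subject); OpenCV objects themselves may not be picklable, and the contrib cv2.face recognizers provide their own write() and read() methods for this purpose. The function train_faces is a hypothetical stand-in for the slow step.

```python
import os
import pickle
import tempfile

def load_or_train(train_fn, path):
    """Return the cached model if the file exists; otherwise train once and save it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    model = train_fn()
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model

calls = 0

def train_faces():
    """Hypothetical stand-in for the slow step (e.g. computing features for every image)."""
    global calls
    calls += 1
    return {"A": [0.1, 0.2], "B": [0.3], "C": [0.4]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "face_model.pkl")
    first = load_or_train(train_faces, path)   # trains and saves
    second = load_or_train(train_faces, path)  # loads from disk, no retraining
```

The cache should be deleted (or versioned) whenever the image folders change, otherwise stale training results will be reused.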
1 vote, 2 answers

How can I improve sentiment analysis of user comments?

I'm implementing sentiment analysis on a set of user comments. All comments are about the same object. For now, I have decided to use three classes: negative, neutral, and positive. I have a test array of 1,500 comments with labelled classes. I tried to use…
egens
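A common baseline to measure improvements against is TF-IDF features with a linear classifier, evaluated with cross-validation and a per-class score such as macro-averaged F1 (neutral comments are often the hardest class). This is a sketch with a tiny made-up data set; the n-gram range and class_weight="balanced" are assumptions worth tuning, not recommendations from the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy labelled comments; the real data would be the 1,500 labelled comments
comments = [
    "great product, love it", "really good and useful",
    "excellent quality, very happy", "love the design, works great",
    "terrible, broke after a day", "really bad quality",
    "awful, very disappointed", "hate it, waste of money",
    "it arrived on tuesday", "the box is blue",
    "it has two buttons", "delivered in a standard package",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigrams and bigrams
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
scores = cross_val_score(model, comments, labels, cv=3, scoring="f1_macro")
print(scores.mean())

model.fit(comments, labels)
predictions = model.predict(["really great", "awful experience"])
print(predictions)
```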
1 vote, 0 answers

How does bootstrapping work?

I'm trying to understand bootstrapping. I watched the following video (https://www.youtube.com/watch?v=gcPIyeqymOU&t=338s), and starting at 2:53 the speaker explains that through bootstrapping we can get a closer inference on the population mean…
bugsyb (reputation 561)
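The bootstrap does not change the point estimate (the sample mean); what it provides is an approximation of how much that estimate would vary from sample to sample. It does this by resampling the observed data with replacement many times and recomputing the mean each time. A minimal sketch of a percentile bootstrap interval, assuming a simulated exponential sample; n_boot, alpha, and the function name are illustrative choices.

```python
import numpy as np

def bootstrap_mean_ci(sample, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval and standard error for the mean."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    # each row is one resample: same size as the data, drawn with replacement
    idx = rng.integers(0, len(sample), size=(n_boot, len(sample)))
    boot_means = sample[idx].mean(axis=1)
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return sample.mean(), (lower, upper), boot_means.std(ddof=1)

data = np.random.default_rng(1).exponential(scale=2.0, size=50)  # simulated sample
mean, (lower, upper), se = bootstrap_mean_ci(data)
print(mean, lower, upper, se)
```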