I apologize in advance for the possibly philosophical nature of this question; for now, however, I would like answers from this website rather than from the Philosophy exchange. I also come from a pure mathematics background, so I may have some misconceptions about how the field of statistics works. The scientific method and the axiomatic method of mathematics try to address the questions of distinguishing science from pseudoscience, and mathematics from "pseudo-mathematics", respectively. I would like an answer to the analogous question of distinguishing statistics from "pseudo-statistics". Below I present a potential answer, and I would like to hear from statisticians on this website whether it is right.

My idea of the scientific method is basically the flowchart below:

[Figure: flowchart of the scientific method]

For example, a scientist may be living on a two-dimensional surface where she always observes that the angles of any triangle add up to 180 degrees. From this observation, our scientist might hypothesize that the surface has a Euclidean geometry. This hypothesis predicts, for example, that Pythagoras' theorem holds on the surface. The scientist tests this prediction by carrying out more experiments. If the prediction turns out to be false, the hypothesis is thrown away. If the prediction agrees with further experiments, then the scientist's faith in her model increases a bit, and she uses the hypothesis to make further predictions and tests them in an ongoing process.

I try to modify the above scenario for statistics.

  1. Step 1: We collect some observations about a real-world phenomenon. The phenomenon could be flipping a coin.
  2. Step 2: The hypothesis could be that the real-world phenomenon of flipping a coin $n$ times can be modelled as a Bernoulli process $B(n,\frac{1}{2})$. I think of hypothesis making as the process of matching mathematical structures to real-world phenomena. In the scenario above, we were concerned with matching the real-world phenomenon of the surface we live on to some mathematical geometric structure; here we are concerned with matching the real-world phenomenon of flipping a coin to some mathematical probabilistic structure.
  3. Step 3: The next step is to make a prediction. We agree by convention on some significance level before the start of the investigation, say $99\%$. If some event $E$ in the probabilistic model of our hypothesis has $P(E)\geq 0.99$, then it counts as a prediction. In our case, for example, Chebyshev's inequality gives $$P(\text{proportion of heads after } 10{,}000 \text{ trials lies in the interval } ]0.45,0.55[)\geq 0.99.$$ Thus, our prediction is that if we flip the coin $10{,}000$ times, then the proportion of heads will be between $0.45$ and $0.55$.
  4. Step 4: We test our prediction experimentally by actually flipping the coin $10{,}000$ times. If the prediction agrees with the empirical results, then our faith in our hypothesis increases. If it does not, the hypothesis is thrown away and further investigation should take place. Maybe the coin is biased and our probabilistic model should have been $B(n,p)$; maybe each coin toss affects later tosses, for example if the coin's temperature or shape changes and thereby changes its mechanics, so that the assumption of independent tosses is inappropriate and does not yield good predictions, etc.
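The bound used in step 3 can be checked numerically. What follows is only a minimal sketch, assuming a fair coin and independent tosses; the function names are illustrative and not taken from any library. It computes the Chebyshev lower bound and, for comparison, the exact binomial probability under the same model.

```python
from math import comb


def chebyshev_lower_bound(n, p, eps):
    """Chebyshev lower bound on P(|heads/n - p| < eps) for n independent tosses."""
    variance_of_proportion = p * (1 - p) / n
    return max(0.0, 1.0 - variance_of_proportion / eps**2)


def exact_fair_coin_probability(n, k_low, k_high):
    """Exact P(k_low < heads < k_high) for n tosses of a fair coin."""
    # Sum exact integer binomial coefficients, then convert to float once.
    favourable = sum(comb(n, k) for k in range(k_low + 1, k_high))
    return favourable / 2**n


n = 10_000
bound = chebyshev_lower_bound(n, p=0.5, eps=0.05)
# The open interval ]0.45, 0.55[ corresponds to 4500 < heads < 5500.
exact = exact_fair_coin_probability(n, k_low=4_500, k_high=5_500)
print(f"Chebyshev lower bound: {bound:.4f}")
print(f"Exact binomial probability: {exact:.6f}")
```

Chebyshev's inequality holds for any distribution with finite variance, so the bound is conservative. Under the fair-coin model the standard deviation of the head count is 50, which puts the interval endpoints 10 standard deviations from the mean, so the exact probability is far closer to 1 than to 0.99.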

Question: Do I have the correct idea about how statistics works? If yes, on what basis do we choose the significance level?

Amr
  • I think what you are referring to is covered in https://en.wikipedia.org/wiki/Foundations_of_statistics; see Bayesian inference and frequentist inference (and the competing hybrid Fisher / Neyman schools) – seanv507 Feb 17 '23 at 13:26
  • While my answer is a bit critical of the use of a general 'the statistical method', this might be just a semantic discussion and a misunderstanding of what you mean. Possibly you mean 'statistics' in some narrower sense of a particular method or field within statistics. The concept of reverse probability, and methods to deal with it, might be related (What exactly does the term "inverse probability" mean?). – Sextus Empiricus Feb 17 '23 at 14:13

1 Answer

What is the Statistical Method?

There is no 'the statistical method'. Your title question asks for something that doesn't exist.

I try to modify the above scenario for statistics.

Your scenario looks much like a Popperian view of the scientific method. It is not a general view of the scientific method (over the past century several philosophers have disagreed with Popper), and it is certainly not statistics. You can involve statistics in it, as in your modified scenario, but the two are not equivalent.


The term statistics originates from the word 'state'; it used to refer to 'data of the state' and evolved to mean 'the collection of facts'.

From Statist 'statesman' ... soon (early 18th century), in the form Statistik (f.), comes to denote the 'description of a state, of a country' and (second half of the 18th century) the 'science of recording, investigating and describing mass phenomena in nature and society' ...

(Quote from the online Etymologisches Wörterbuch des Deutschen)

The term statistics, which originally related just to data of the state, now refers more generally to the methods that revolve around sampling, observing, reporting and analyzing quantitative data in any field.

Statistics is a broad field. It can involve mathematical topics, such as computing probabilities for specific models of the sampling process and of observation, but also topics like computer science (making computations, working with databases), social science (developing interview techniques and sampling methods), and visualization (what sort of graphs are best suited to convey data).


The scenario that you describe, where the statistician computes a p-value and determines a significance level, relates to a narrow part of statistics called 'hypothesis testing'. It is not the method of statistics, since statistics involves more than just that. Nor does it belong solely to statistics: hypothesis testing is also part of the philosophy of science. Statisticians are more concerned with designing the models used for the computations in hypothesis testing, but it is scientists in a particular field who decide to use a hypothesis test. It is not the statistician who says that hypothesis testing is the way to go.

on what basis do we choose the significance level ?

These levels are chosen based on experience, balancing the probabilities of false positives and false negatives. For instance, particle physics now uses a $5\sigma$ threshold; it evolved that way because so many experiments are performed that too many false-positive discoveries would be observed without a strict level (see Origin of "5$\sigma$" threshold for accepting evidence in particle physics?).
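As a rough illustration of that trade-off, the sketch below converts a threshold given in standard deviations into a tail probability and then computes the chance of at least one false positive across many tests. It rests on assumptions that are not in the discussion above: a normally distributed test statistic, a one-sided convention, independent tests, and a true null hypothesis in every test. The function names are hypothetical.

```python
from math import erfc, expm1, log1p, sqrt


def one_sided_p_value(z):
    """Upper-tail probability P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))


def prob_any_false_positive(alpha, n_tests):
    """P(at least one false positive) in n_tests independent tests at level alpha,
    assuming the null hypothesis is true in every test."""
    # Equivalent to 1 - (1 - alpha)**n_tests, written to stay accurate for tiny alpha.
    return -expm1(n_tests * log1p(-alpha))


alpha_5_sigma = one_sided_p_value(5.0)  # roughly 2.9e-7
for alpha in (0.05, alpha_5_sigma):
    risk = prob_any_false_positive(alpha, n_tests=1_000)
    print(f"alpha = {alpha:.3g}: P(any false positive in 1000 tests) = {risk:.3g}")
```

Under these assumptions, a level of 0.05 makes at least one false discovery in a thousand tests almost certain, while the stricter threshold keeps that probability small. The price of the stricter level is a higher rate of false negatives.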

  • We each have to write on what we find, but in my experience (1) most scientists and social scientists aren't following fads and fashions in philosophy of science (I am an exception, but I regard with mixed feelings the habits of philosophers pronouncing on fields which they don't practise) (2) I find that most natural scientists at least are more likely to find Popper-type views as encapsulating scientific method and haven't heard of, or don't much prefer, anything published since. – Nick Cox Feb 17 '23 at 12:36
  • Thanks for your response. I am waiting to see whether others will also agree with your point of view that there is no statistical method. Meanwhile, I hope that by diving deeper into the field I will better appreciate what statistics is about – Amr Feb 17 '23 at 12:45
  • @NickCox Thomas Kuhn's The Structure of Scientific Revolutions describes a much more diffuse and less linear process. When we speak about significance levels this becomes relevant. Those levels may not be applied consistently or systematically, as shown by all sorts of (intentional) misuse of p-values. The diagram displays science as objective, but there is also a great deal of subjectivity involved. – Sextus Empiricus Feb 17 '23 at 13:53
  • @amr what is the mathematical method and what is pseudo-mathematics? – Sextus Empiricus Feb 17 '23 at 13:54
  • Indeed, The Structure of Scientific Revolutions offers an alternative to Popper, but at a macrolevel of major theories, not the microlevel of looking at particular datasets, etc. – Nick Cox Feb 17 '23 at 14:00
  • @NickCox The point is that there is no 'the scientific method'. But yes, most of the methods are iterative experiments to verify and test theories, with in-between adjustments to those theories. However, I would disagree that it is such a clear and simple loop. If an experiment fails, we can add a few extra loops where an experimenter tries to make it work before adapting the hypothesis. An experimenter who believes in a falsified theory will retry the experiment to figure out what went wrong. An experimenter who disbelieves a non-falsified theory will retry the experiment in order to falsify it. – Sextus Empiricus Feb 17 '23 at 14:04
  • I think we largely agree despite any indications otherwise. In my view there is scientific method which (e.g.) I try to apply but Donald J. Trump as far as I can tell does not. It implies e.g. respect for logic and evidence, without claiming that we (and certainly not I) have exclusive claim to any element. But there is limited value in trying to pin it down to a concise, precise, universal recipe, where limited does not mean zero. I dislike any emphasis on starting with observations. If a researcher starts anywhere it is with the state of knowledge in their field. – Nick Cox Feb 17 '23 at 14:12
  • That's right. Only time will tell whether fighting hurricanes with nukes or virus infections with bleach injections are correct theories/hypotheses. They are testable/falsifiable and therefore scientific. – Sextus Empiricus Feb 17 '23 at 21:26
  • @JaredSmith I am referring to said person's use of logic and evidence, which I consider a germane example. His political views are a different if related concern. – Nick Cox Feb 17 '23 at 21:34
  • @NickCox the entire point of that article I linked is that people do not react to political statements with that sort of dispassionate examination of dry statements. The fact that it isn't radioactive to you doesn't make it fit for general consumption on the public internet and even if you didn't personally mean it that way it contributes to a destructive culture of performative virtue signaling. Stop trying to rules-lawyer your words for a minute, put on your cognitive empathy hat, and think about how someone on the political right would perceive such a statement. – Jared Smith Feb 17 '23 at 21:40
  • Sorry, but I don't find your tone or your substance convincing. Indeed, you're getting a little personal and aggressive, I have to suggest. Again, any inferences about my own political views are up to readers. I am content to let the community decide. If there are more votes or more support for your comments than for mine, I will delete the comment in question and revise it. Otherwise at the moment you're a lone voice. – Nick Cox Feb 17 '23 at 21:59
  • @JaredSmith 1) It is not so much a political statement as a statement about a public figure. That you associate it with a political statement about the political right is a personal interpretation. In light of the topic, using data to gain knowledge and applying 'the statistical method', we should flip the coin a few more times, and then we see that nearly all politicians are bad at statistics; according to this RSS study among British MPs, left-wing politicians may be even worse at the coin flips. – Sextus Empiricus Feb 17 '23 at 23:33
  • 2) Nick Cox's comment was part of a conversation with another person, and whatever political point was being made (if any political point was made), it was mostly directed at me as receiver, and I guess that the transmitter knows that the (intended) receiver has not so many problems with political statements, as the transmitter is ignorant to those things. You may be right that one should take care about how words can be received. But I believe that even though the speech is open, one also has some freedom and should not need to take care with every potential bystander who might come across it. – Sextus Empiricus Feb 17 '23 at 23:37
  • To expand slightly: my comment was meant to be entertaining as well as apposite. I am not trying to offend anyone and find on reflection that I wouldn't mind if someone wanted to counter with an example mentioning say Karl Marx and how (un)scientific he was. I am optimistic that the readership here recognize attempts at humour for what they are and tolerate references that aren't intended to be offensive. If someone wants to think my example poor or irrelevant that's their judgment. – Nick Cox Feb 17 '23 at 23:45
  • The terms mathematical method and pseudo-mathematics are not precise terms, but I thought my intention would be clear. To be precise, I used the term mathematical method to refer to the axiomatic method and formal systems. This method solves the demarcation problem of how to judge the validity of a piece of mathematical work in an algorithmic way. I was hoping for something similar in the field of statistics – Amr Feb 18 '23 at 14:51
  • @Amr there are Kolmogorov's probability axioms, which create a foundation for probability theory. But statistics is not purely a matter of applying the axioms of probability theory. The methods include some rigid (seemingly objective) mathematical calculations/computations, but there are many components based on assumptions that leave a lot of room for subjectivity. (Darrell Huff's book 'How to Lie with Statistics' shows that there is no clear-cut 'method of statistics'. If you want, you could call all of statistics pseudo-mathematics.) – Sextus Empiricus Feb 18 '23 at 15:23
  • Sure. I have a very good idea about Kolmogorov's axioms and probability theory as a field of pure mathematics. However, mathematical probability theory does not answer the question of how to apply probability to the modelling of non-deterministic data – Amr Feb 18 '23 at 15:37