"Each pair of cards of this deck has the same suit."
Finally, we have to verify whether this first intuition is statistically significant.
The probability that two cards share the same suit is about 1/4. As a consequence, the probability of observing 3 consecutive pairs with the same suit is (1/4)^3, a bit less than 2%. This looks like a meaningful discovery: every pair of cards in this deck has the same suit, with a p-value of 0.02.
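The arithmetic can be checked with a short sketch. It assumes a standard 52-card deck with 4 suits of 13 cards (the text does not specify the deck) and treats the three pairs as independent, which is only an approximation. The function names are made up for this illustration.

```python
from fractions import Fraction


def same_suit_pair_probability(suits=4, cards_per_suit=13):
    """Chance that two cards drawn without replacement share a suit.

    Assumes a deck of `suits * cards_per_suit` cards (52 by default).
    """
    deck_size = suits * cards_per_suit
    # The first card can be anything; the second must be one of the
    # remaining cards of the same suit.
    return Fraction(cards_per_suit - 1, deck_size - 1)


def consecutive_pairs_probability(pair_probability, pairs=3):
    """Chance that `pairs` pairs all match, treating pairs as independent."""
    return pair_probability ** pairs


# The rough 1/4 figure used in the text.
approx = consecutive_pairs_probability(Fraction(1, 4))

# The slightly smaller value once the first card is removed from the deck.
exact_pair = same_suit_pair_probability()
refined = consecutive_pairs_probability(exact_pair)

print(f"with 1/4 per pair: {float(approx):.4f}")
print(f"with {exact_pair} per pair: {float(refined):.4f}")
```

Both figures land below 2%, so on paper the result looks significant.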
Except it is not. There is nothing particular about this deck; you only detected your own enthusiasm to find a result.
The data scientist imagination paradox: the more imaginative you are, the less significant your observations will be.
Let's imagine there is a finite number N of patterns that could appear in the data. And let's suppose that all these patterns are equally likely to show up in our data by chance. In other words, there is a 1/N chance of observing any given pattern in random data.
Now let's have two data scientists, Captain Obvious and Marvin, look at our data, and suppose each of them reports a pattern after looking at K samples. Which one of them should we follow?
Let's compute the probability that they detected something particular in random noise (assuming all the patterns are independent).
When we see a pattern in data, we underestimate all the other patterns that would have surprised us the same way.
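To put a rough number on this, here is a minimal sketch under the assumptions stated above: N equally likely, independent patterns, each with a 1/N chance of appearing in random data. The parameter `m_imagined`, the number of patterns a given data scientist is able to notice, is introduced here for illustration, and so are the example values (N = 100, one pattern for Captain Obvious, 50 for Marvin).

```python
def false_discovery_probability(n_patterns, m_imagined):
    """Chance that at least one of `m_imagined` patterns appears in pure noise.

    Assumes `n_patterns` independent patterns, each showing up by chance
    with probability 1 / n_patterns.
    """
    if not 0 <= m_imagined <= n_patterns:
        raise ValueError("m_imagined must be between 0 and n_patterns")
    p_single = 1 / n_patterns
    # Complement of "none of the imagined patterns shows up".
    return 1 - (1 - p_single) ** m_imagined


# Hypothetical values, chosen only to illustrate the effect.
N = 100
captain_obvious = false_discovery_probability(N, m_imagined=1)
marvin = false_discovery_probability(N, m_imagined=50)

print(f"Captain Obvious: {captain_obvious:.3f}")
print(f"Marvin: {marvin:.3f}")
```

Under these assumptions, a pattern reported by someone who could only have noticed one thing is far less likely to be noise than one reported by someone who would have been surprised by dozens of different things.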
1. Ask a question
2. Construct a hypothesis
3. Test with an experiment
4. Draw conclusions
3. Test with experiments
2. Construct a hypothesis
1. Ask a question that matches our hypothesis and experiments
4. Draw conclusions
If you decide on your hypothesis after running your experiments, you are confirming your bias, not learning.
Let's go back to the deck of cards. After making the hypothesis "Each pair of cards of this deck has the same suit," I need to draw 3 new pairs of cards as an experiment to confirm it.
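That confirmation step can be simulated. The sketch below assumes an ordinary, well-shuffled 52-card deck, which is exactly the case where the hypothesis is false, and estimates how often 3 fresh pairs would still happen to confirm it. The helper names, the seed, and the number of trials are arbitrary choices for this illustration.

```python
import random

SUITS = "SHDC"
RANKS = range(1, 14)


def draw_pairs(rng, pairs=3):
    """Draw `pairs` pairs of cards without replacement from a fresh deck."""
    deck = [(rank, suit) for suit in SUITS for rank in RANKS]
    cards = rng.sample(deck, 2 * pairs)
    return [(cards[2 * i], cards[2 * i + 1]) for i in range(pairs)]


def hypothesis_holds(pairs):
    """True if both cards of every pair share the same suit."""
    return all(first[1] == second[1] for first, second in pairs)


def confirmation_rate(trials=20_000, seed=0):
    """Fraction of simulated experiments in which an ordinary deck
    confirms the hypothesis purely by chance."""
    rng = random.Random(seed)
    hits = sum(hypothesis_holds(draw_pairs(rng)) for _ in range(trials))
    return hits / trials


print(f"chance confirmation rate: {confirmation_rate():.4f}")
```

If the deck is ordinary, the new draws will almost always contradict the hypothesis, which is the point of testing on data collected after the hypothesis was stated.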