Intuitively, we tend to think of different kinds of relationships when we consider quantitative and qualitative information.
We tend to think of relative information (like “smiling”) and of absolute information (like “farting”).
But what about qualitative information? How does one think about learning if the information is not relative to the learner’s expectations?
For example, is it the difference between two groups that can be measured?
Is it the similarity between two groups that is measured?
Is it the difference between groups that can be defined mathematically?
Is it the difference in the strengths of the individuals within each group?
At first glance, it appears that none of these are good ways to measure a relationship.
As an extreme example, how could comparing the relative sizes of individual slices of a tomato ever be used to compare the relative strengths of two groups?
But as we will see, one can certainly define a relationship quantitatively, even if it is just a linear relationship between two sets.
The same argument can be made for a more qualitative relationship.
That is, a relationship that is neither measured quantitatively nor defined mathematically.
What kind of relationships might we consider to be proportional? And what might they tell us about the law of large numbers?
Proportional relationships
A proportional relationship is one in which the outcome scales with the size of the quantity it depends on: double the quantity, and the outcome doubles too.
For example, if the relationship in question is the size of two mountains, it is proportional to the length of a line that connects the two mountains.
Just as we would consider these types of relationships, so too might we consider proportional relationships in general.
And just as we would consider the data in a dataset to be relevant for the mathematics, so too might we consider the relationships within the dataset to be relevant for the law of large numbers.
For example, consider how much a proportional relationship might help to explain the law of large numbers.
Consider again the example of three nested lines (this time with widths, not heights).
Each line has the same two endpoints, but the two endpoints in one line (or the two endpoints in the second line) have very different weights. (I’m going to ignore the endpoints in the third line because they would not be considered “parts” of the same line.
Endpoints in this example would be considered parts of the weights of a set.)
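What is the “coefficient of variation” of the lines? I am using the term loosely: in standard statistics it means the ratio of a standard deviation to the mean, and it is not a “generalized volume.” Here, it is the ratio of the weights of the endpoints in each line to the weights of the endpoints in the next line.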
It measures the size of the slope between the two endpoints.
That’s one “proportional relationship.”
Or consider this case, where there are two groups, and the numbers in each group are independently drawn from a normal distribution:
As we see, each side of this line holds the same number of points (that is, there is no group with zero points).
In other words, the slope between two lines that are chosen at random from a normal distribution is the same as (and proportional to) the slope between two lines that are drawn from two random normal distributions, which means the slopes in each case are the same.
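To make the two-group case concrete, here is a minimal Python sketch. It assumes one particular reading of the setup: both groups are drawn independently from the same normal distribution, and the groups are compared through their means rather than through slopes. The function name `group_mean_gap` and its parameters are hypothetical, chosen only for illustration.

```python
import random
import statistics


def group_mean_gap(n, mean=0.0, spread=1.0, seed=0):
    """Draw two independent groups of size n from one normal distribution
    and return the absolute difference between their sample means.

    Assumption (not from the text): the groups are summarized by their means.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    group_a = [rng.gauss(mean, spread) for _ in range(n)]
    group_b = [rng.gauss(mean, spread) for _ in range(n)]
    return abs(statistics.fmean(group_a) - statistics.fmean(group_b))


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, group_mean_gap(n))
```

With small groups the two means can differ noticeably. With large groups the gap is typically tiny, which is the sense in which the two groups "look the same."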
This brings us to the next question. How would we describe this relationship?
Clearly, it is proportional.
In fact, if you draw a number from a normal distribution and divide it by 1, the number is unchanged, so the proportionality is exact.
In other words, the numbers in the normal distribution are proportional to themselves with a constant of exactly 1.
So, how might we define the slope between two random normal distributions?
The answer is: it is the average of the slopes of the lines drawn from them.
That is, we take many lines from the two random normal distributions and average their slopes.
In other words, the slopes in a random normal distribution are proportional to the slopes in a normal distribution.
Which makes sense.
A line that is drawn at random from a normal distribution will have, on average, the same slope as any other line drawn from that normal distribution.
Thus, the slope between a pair of lines (i.e., the slope between any two lines from a normal distribution) is proportional to the slope between any two lines from a normal distribution.
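That comes very close to an explanation of the law of large numbers: the average of many independent draws settles toward the expected value as the number of draws grows.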
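For readers who want the formal statement, the weak law of large numbers can be written as follows. The symbols are my own notation, not taken from the story above: I assume independent, identically distributed draws $X_1, X_2, \dots$ with finite mean $\mu$, and I write $\bar{X}_n$ for the average of the first $n$ draws.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\lim_{n \to \infty} \Pr\left( \left| \bar{X}_n - \mu \right| > \varepsilon \right) = 0
\quad \text{for every } \varepsilon > 0 .
```

In words: pick any tolerance $\varepsilon$, however small, and the chance that the average misses $\mu$ by more than that tolerance shrinks to zero as $n$ grows.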
It works for one parameter but can be generalized to many parameters.
As we gather more and more data, the predictions become more and more accurate.
And as we gather more and more data, we find the law of large numbers at work far more often than we ever expected.
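A small simulation can show this effect. The sketch below is only an illustration under stated assumptions: I treat each "line" as a single slope drawn from a normal distribution with a known true mean, and I track how far the running average of those slopes sits from that true mean. The function name `running_mean_errors` and the default values are hypothetical.

```python
import random


def running_mean_errors(true_mean=2.0, spread=1.0,
                        sizes=(10, 100, 1_000, 10_000, 100_000), seed=0):
    """Return {n: |average of first n draws - true_mean|} for each n in sizes.

    Assumption (not from the text): each slope is one draw from a normal
    distribution with mean true_mean and standard deviation spread.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    total = 0.0
    drawn = 0
    errors = {}
    for n in sorted(sizes):
        # Keep drawing until n slopes have been accumulated in total.
        while drawn < n:
            total += rng.gauss(true_mean, spread)
            drawn += 1
        errors[n] = abs(total / n - true_mean)
    return errors


if __name__ == "__main__":
    for n, err in running_mean_errors().items():
        print(f"n={n:>7}  error={err:.5f}")
```

The error does not have to shrink at every single step, but it tends to fall roughly like one over the square root of n, so the largest sample is usually far closer to the true mean than the smallest.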
With that said, I hope it isn’t the case that, given this story, we should regard the law of large numbers as obvious.
Rather, I hope it is the case that you can appreciate how our brains tend to work, and how each bit of evidence can help to explain the events that took place long ago.