Inequality proof
The setting in Silverman's book is the following: Let be a string of n alphabetic characters and let be the frequency with which letter i appears in the string .
Also, define .
Our setting will be a slightly more general. Instead of 25 letters, we assume there are letters. We would like to prove that (for )
The Left Hand side of the inequality corresponds to an arbitary distribution while the Right Hand side corresponds to the unifrom distribuation in which each letter occurs with an equal probability (completely random text).
Roughly Speaking
Let's see what happens in an extreme case. If and then what we want to show is
Since we may repalce with , so, the last inequality becomes .
Proof
Two letters (Prerequisite: Caculus I)
If the minimu value of the function , where , occurs at then we are done.
From the relation we can consider as a function in one variable, i.e. It is easier to deal with functions of one variable instead of two.
From Calculus I
we know attains the minimum value at the boundary points or on the critical points.
Finding the critical points
. Thus,there is, only, one critical poitn
Computing the global minimum/maximum
0 | n | |||
---|---|---|---|---|
As we can see the least value in the above table is which corresponds to , as required!
Three letters or more (Calculus III)
Let subjects to
We are interested in finding points that maximize/minimize . As in the case Two letters
, if we show that the minimum occurs at then we are done.
Recall: The method of Lagrange multipliers find the maximum/minimum when the gradient of a specific function equals to zero. Below a break down of the of the method
which can be implemented in SageMath
Compute where is an extra variable.
Since s are linearly independent then for each
Solve these equations to find the critical points
Substitute each critical point in i.e. compute then record it in some set, say . The maximum or minimum value of subjects to the constraint must be in
1: For each variable we have and
2:
3: Hence, for all . . If we find the value of then we found all s value, since they are equal to each other. Using the last result,
4: . We know that is an extreme value, but we don't know whether it is a minimum value or not. Let us plug in with any other point. For instance, then it is clear that since it is an extreme point then it must be the minimum of under the constraint
We found that the minimum value occurs at Q.E.D