# A nice Limit

From @infseriesbot, prove the identity (for $n \geq 1$): $\gamma_n = \displaystyle\int_1^\infty \frac{\{x\}}{x^2} (n-\log(x)) \log^{n-1}(x) \, dx$.

We have $\{x\}=x-i$ for $i \leq x < i+1$,

so $\text{rhs}=\displaystyle\sum _{i=1}^{\infty } \displaystyle\int_i^{i+1} \frac{(x-i) (n-\log (x)) \log ^{n-1}(x)}{x^2} \, dx,$

and since $\displaystyle\int \frac{(x-i) (n-\log (x)) \log ^{n-1}(x)}{x^2} \, dx=\log ^n(x) \left(1-\frac{i}{x}-\frac{\log (x)}{n+1}\right)$, we get $\displaystyle \int_i^{i+1} \frac{(x-i) (n-\log (x)) \log ^{n-1}(x)}{x^2} \, dx=\frac{(i+1) \log ^{n+1}(i)+(n+1-(i+1) \log (i+1)) \log ^n(i+1)}{(i+1) (n+1)}$,
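As a sanity check (my addition, not part of the original derivation), the per-interval closed form can be verified numerically; the sketch below, in plain Python, compares it against a composite-Simpson quadrature for a few values of $i$ and $n$:

```python
import math

def integrand(x, i, n):
    # (x - i)(n - log x) log^{n-1}(x) / x^2 on the interval [i, i+1]
    return (x - i) * (n - math.log(x)) * math.log(x) ** (n - 1) / x**2

def simpson(f, a, b, steps=2000):
    # composite Simpson's rule over an even number of subintervals
    h = (b - a) / steps
    s = f(a) + f(b)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

def closed_form(i, n):
    # ((i+1) log^{n+1}(i) + (n+1 - (i+1) log(i+1)) log^n(i+1)) / ((i+1)(n+1))
    li, lj = math.log(i), math.log(i + 1)
    return ((i + 1) * li ** (n + 1) + (n + 1 - (i + 1) * lj) * lj**n) / ((i + 1) * (n + 1))

for i, n in [(2, 1), (3, 2), (5, 3)]:
    num = simpson(lambda x: integrand(x, i, n), i, i + 1)
    assert abs(num - closed_form(i, n)) < 1e-10
```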

then: $\text{rhs}=\underset{T\to \infty }{\text{lim}}\left(\frac{(n+1-(T+1) \log (T+1)) \log ^n(T+1)}{(n+1) (T+1)}+\sum _{i=2}^T \frac{\log ^n(i)}{i}\right)$,

and since, for large $T$, $\frac{\log ^n(T+1) (n+1-(T+1) \log (T+1))}{(n+1) (T+1)}= -\frac{\log ^{n+1}(T)}{n+1}+o(1)$,

the identity follows from the series representation of the Stieltjes constants $\gamma_n$: $\underset{T\to \infty }{\text{lim}}\left(\sum _{i=1}^T \frac{\log ^n (i)}{i}-\frac{\log ^{n+1}(T)}{n+1}\right)=\gamma _n$ (the $i=1$ term vanishes, so the sum may equally start at $i=2$).
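As a numerical illustration (my addition), the partial sums do converge, slowly, to the Stieltjes constants; for $n=1$, with $\gamma_1 \approx -0.0728158$ taken from standard tables:

```python
import math

def stieltjes_partial(n, T):
    # partial sum sum_{i<=T} log^n(i)/i minus the log^{n+1}(T)/(n+1) correction;
    # converges to gamma_n as T grows (at a slow, roughly log^n(T)/T rate)
    s = sum(math.log(i) ** n / i for i in range(2, T + 1))
    return s - math.log(T) ** (n + 1) / (n + 1)

gamma_1 = -0.0728158454836767  # first Stieltjes constant, from standard tables
approx = stieltjes_partial(1, 10**6)
assert abs(approx - gamma_1) < 1e-4
```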

# Estimating the Bernoulli Parameter, a non-Bayesian Max Entropy Method

### Introduction and Result

A maximum entropy alternative to Bayesian methods for the estimation of independent Bernoulli sums.

Let $X=\{x_1,x_2,\ldots, x_n\}$, where $x_i \in \{0,1\}$, be a vector representing an $n$-sample of independent Bernoulli distributed random variables $\sim \mathcal{B}(p)$. We are interested in the estimation of the probability $p$.

We propose that the probability that provides the best statistical overview, $p_m$ (by reflecting the maximum ignorance point), is $p_m= 1-I_{\frac{1}{2}}^{-1}(n-m, m+1)$, (1)

where $m=\sum_{i=1}^n x_i$ and $I_\cdot(\cdot,\cdot)$ is the regularized incomplete beta function.

### Comparison to Alternative Methods

EMPIRICAL: The sample frequency corresponding to the “empirical” distribution, $p_s=\frac{1}{n} \sum_{i=1}^n x_i$, which clearly does not provide information for small samples.

BAYESIAN: The standard Bayesian approach is to start with, as prior, the parametrized Beta distribution $p \sim Beta(\alpha,\beta)$, which is not trivial: one is constrained by the fact that matching the mean and variance of the Beta distribution pins down the shape of the prior. It then becomes convenient that the Beta, being a conjugate prior, updates into the same distribution with new parameters. Then, with $n$ samples and $m$ realizations: $p_b \sim Beta(\alpha+m, \beta+n-m)$ (2)

with mean $\mathbb{E}(p_b) = \frac{\alpha +m}{\alpha +\beta +n}$. We will see below how a low-variance Beta prior has too much impact on the result.

### Derivations

Let $F_p(x)$ be the CDF of the binomial $\mathcal{B}in(n,p)$. We are interested in $\{ p: F_p(x)=q \}$, the maximum entropy probability. First let us figure out the target value $q$.

To get the maximum entropy probability, we need to maximize $H_q=-\left(\;q \, \log(q) +(1-q)\, \log (1-q)\right)$. This is a very standard result: taking the first derivative with respect to $q$ and setting it to $0$ gives $\log (1-q)-\log (q)=0$, $0\leq q\leq 1$, and since $H_q$ is concave in $q$, we get $q =\frac{1}{2}$.

Now we must find $p$ by inverting the CDF. Since $F_p(x)=I_{1-p}(n-x,x+1)$, setting $F_p(x)=\frac{1}{2}$ and solving for $p$ gives, for the general case, $p= 1-I_{\frac{1}{2}}^{-1}(n-x,x+1)$.
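A minimal sketch of how (1) can be computed without special-function libraries (function names are mine): since $F_p(x)=I_{1-p}(n-x,x+1)$ is decreasing in $p$, the inverse can be obtained by bisecting the binomial CDF in $p$:

```python
import math

def binom_cdf(m, n, p):
    # P(Bin(n, p) <= m), which equals I_{1-p}(n - m, m + 1)
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m + 1))

def max_entropy_p(n, m, tol=1e-12):
    # solve F_p(m) = 1/2 for p; F_p(m) decreases in p, so bisect on [0, 1]
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(m, n, mid) > 0.5:
            lo = mid  # CDF still above 1/2: p must be larger
        else:
            hi = mid
    return (lo + hi) / 2
```

For $m=0$ this reduces to the closed form $p = 1-2^{-1/n}$, since $F_p(0)=(1-p)^n$; with $n=60$ it gives $p \approx 0.0115$, matching the application below.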

And note that, as in the graph below (thanks to comments below by überstatistician Andrew Gelman), we can have a “confidence band” (of sorts) with $p_\alpha= 1-I_{\alpha}^{-1}(n-x,x+1)$;

in the graph below the band is for values of: $\alpha= \frac{1}{2}, \frac{3}{4}$.

### Application: What can we say about a specific doctor or center’s error rate based on n observations?

Case (Real World): A thoracic surgeon who does mostly cardiac and lung transplants (in addition to emergency bypasses and aortic ruptures) operates in a business with around 5% perioperative mortality. So far in his new position in the U.S. he has done 60 surgeries with 0 mortality.

What can we reasonably say, statistically, about his error probability?

Note that there may be selection bias in his unit, which is no problem for our analysis: the probability we get is conditional on being selected to be operated on by that specific doctor in that specific unit.

Assuming independence, we are concerned with $Y \in \{0, 1, \ldots,n\}$, a binomially distributed r.v. $\sim \mathcal{B}in(n,p)$, where $n$ is the number of trials and $p$ is the probability of failure per trial. Clearly, we have no idea what $p$ is and need to produce our best estimate conditional on, here, $y=0$.

Here, applying (1) with $m=0$ and $n=60$, we have $p_m=0.01148$.

Why is this preferable to a Bayesian approach when, say, n is moderately large?

A Bayesian would start with a prior expectation of, say, $.05$, and update based on information. But it is highly arbitrary. Since the mean is $\frac{\alpha}{\alpha +\beta}$, we can eliminate one parameter. Let us say we start with $Beta(\alpha, 19 \alpha )$ and have no idea of the variance. As we can see in the graph below, there are many possible shapes for the prior: the result becomes all about the parametrization.
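To make the sensitivity concrete (my illustration, using the surgeon example with $n=60$, $m=0$): every prior $Beta(\alpha, 19\alpha)$ has mean $.05$, yet the posterior mean $\frac{\alpha+m}{20\alpha+n}$ implied by (2) moves substantially with $\alpha$:

```python
def posterior_mean(alpha, n=60, m=0):
    # conjugate update of (2) with prior Beta(alpha, 19*alpha):
    # posterior is Beta(alpha + m, 19*alpha + n - m), mean (alpha + m)/(20*alpha + n)
    return (alpha + m) / (20 * alpha + n)

# posterior means range from about 0.007 (alpha = 0.5) to 0.038 (alpha = 10),
# all from priors with the same mean 0.05
for alpha in (0.5, 1, 2, 5, 10):
    print(alpha, posterior_mean(alpha))
```

Compare with the single, parameter-free value $p_m = 0.01148$ from (1).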

### Acknowledgments

Thanks to Saar Wilf for useful discussions.