The context of this question is actually finance, but the question at heart is a statistical/mathematical one. I've removed all finance jargon and left in only the bare minimum details required. All ideas are welcome.
Question: Suppose I have the following expression:
$$y = c\sqrt{\hat{\sigma}^2} + 1 \tag{1}$$
where
$$\hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^T (x_t - \bar{x})^2.$$
Assume $c$ is some fixed constant. Note: $T$ is just the total number of observations of $x_t$.
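For concreteness, here is a minimal sketch of the computation in Eqn. (1), assuming the form written above; the helper name `y_statistic`, the value `c=1.0`, and the sample observations are placeholders, not part of my actual data:

```python
import numpy as np

def y_statistic(x, c=1.0):
    """Compute y = c * sqrt(sigma_hat^2) + 1 from one entity's observations."""
    x = np.asarray(x, dtype=float)
    T = x.size
    sigma2_hat = np.sum((x - x.mean()) ** 2) / T  # (1/T) * sum_t (x_t - x_bar)^2
    return c * np.sqrt(sigma2_hat) + 1.0

# Toy example: one entity with T = 5 observations
print(y_statistic([1.2, 0.7, 1.5, 0.9, 1.1]))
```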
I have data on $x_t$ for each 'entity' (here an entity simply refers to a firm/company). In total, I have 2228 entities, and for each entity I have $T$ observations of $x_t$.
For each entity, I substitute the $T$ observations of $x_t$ into Eqn. (1) and obtain a value of $y$. Thus, in total, I have 2228 values of $y$.
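In code, this step looks something like the sketch below; the data array, its shape (the value of $T$), and $c$ are all placeholders, and only the 2228-entity count comes from my actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layout: one row of T = 60 observations per entity
data = rng.normal(size=(2228, 60))

c = 1.0  # the fixed constant from Eqn. (1); its actual value is not shown here
T = data.shape[1]
sigma2_hat = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) / T
y_values = c * np.sqrt(sigma2_hat) + 1.0  # one y per entity, 2228 in total
print(y_values.shape)  # (2228,)
```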
Now, a large value of $y$ means the entity is "bad" and a small value of $y$ means the entity is "good". However, the problem is: how large does a value of $y$ have to be in order to classify an entity as "bad"? That is, what is the threshold $y^*$ such that if $y$ exceeds $y^*$, we can classify the entity as "bad"?
For example, given a threshold $y^*$: if entity A has $y_A < y^*$ while entity B has $y_B > y^*$, then entity A is "good" while entity B is "bad".
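As a sketch, the classification rule is just a comparison against the threshold (the $y$ values and the threshold below are made up for illustration):

```python
import numpy as np

def classify(y_values, threshold):
    """Label each entity 'bad' if its y exceeds the threshold, else 'good'."""
    return np.where(np.asarray(y_values, dtype=float) > threshold, "bad", "good")

print(classify([2.1, 7.4, 4.9], threshold=5.0))  # ['good' 'bad' 'good']
```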
My attempts so far: My first attempt was to get data on an entity that is known to be "bad", calculate its $y$, and use that value as the threshold. The problem is that I cannot obtain data on "bad" entities (due to proprietary data and private licensing issues...).
For my next attempt, I obtained the empirical distribution by applying a kernel density estimator (a fancy term for smoothing the histogram of the data into an estimated probability density function) to the 2228 values of $y$. I then calculated the 99th percentile of this pooled distribution (for robustness, I also calculated the 97.5th and 95th percentiles) and used that value as the threshold. However, the main critique is that this is too arbitrary and there is not enough rationale for using this method.
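Roughly, this attempt amounts to the following sketch. Here the 2228 values of $y$ are simulated as a placeholder for my real data; the percentile thresholds come straight from the pooled sample, and the KDE is used for the smoothed density estimate:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
y_values = rng.lognormal(mean=0.2, sigma=0.4, size=2228)  # placeholder for the 2228 real y values

# Kernel density estimate of the pooled distribution (Gaussian kernel, Scott's-rule bandwidth)
kde = gaussian_kde(y_values)
grid = np.linspace(y_values.min(), y_values.max(), 200)
density = kde(grid)  # smoothed estimate of the pdf, e.g. for plotting

# Candidate thresholds: upper percentiles of the pooled sample
thresholds = np.percentile(y_values, [95.0, 97.5, 99.0])
print(dict(zip(["95th", "97.5th", "99th"], np.round(thresholds, 3))))
```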
Main problem: So I am wondering if anyone has ideas on what statistical/mathematical techniques/methods I can apply to derive an appropriate threshold for $y$. Currently, I really have no idea what tools are available for this problem. I have tried every technique I could think of: extreme value theory (doesn't really work here, because it is used to derive the distribution of maxima), Bayesian sampling (doesn't really apply, since imposing priors doesn't help solve the problem at hand), and asymptotic distributions (perhaps the most "successful" to date, but this falls short because one can only derive an asymptotic distribution of $y$ for ONE entity, whereas I am really after the distribution of $y$ over the entire POOLED sample of entities).