Today, Thomas Metz made me aware of a dataset about ministers in Eastern German federal states (Bundesländer) by Sebastian Jäckle. The dataset includes the variable “duration of incumbency” in days for 291 ministers between 1990 and 2011.
I was curious to look at the distribution of duration, with the intention of being brave as a physicist and inferring a simple stochastic model that reproduces that distribution. I copied the duration data into a MATLAB vector duration, made histograms, fitted different distributions and ran KS tests. As duration is a discrete random variable (days starting from inauguration), distributions living on the nonnegative integers are the natural candidates. The classical one-parameter distributions, Poisson and geometric, failed to deliver fitting distributions, but the negative binomial (NB) did surprisingly well.
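The best fit yielded parameters r = 1.79 and p = 0.0011 (rounded). The Kolmogorov-Smirnov test did not reject the hypothesis that the duration data came from the NB distribution with these parameters (p-value 0.32), but it does reject under reasonably small changes of the two parameters. Thus, it is reasonable to assume that duration follows a negative binomial distribution with these parameters.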
What model does this imply? Look at the days in the incumbency of a minister. Let us assume that every day is either a success or a failure, where a failure happens with probability p. The negative binomial is the distribution of the number of successful days until r failures occur (there is an extension to a non-integer number of failures). Our model is thus that a minister's incumbency ends after a certain number of failures (whatever that means in practice). The best fit suggests that under this model 1.79 failures are allowed during a minister's incumbency and that failures are relatively rare events happening with probability 0.11% every day, i.e. on average the first failure happens at approximately day 900.
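As a small sanity check of these numbers, here is a minimal Python sketch (standard library only, not part of the original MATLAB analysis). It assumes the rounded values quoted above, 1.79 allowed failures and a daily failure probability of 0.0011, and the function name nb_logpmf is my own. It computes the average day of the first failure, the mean incumbency implied by the model, and cross-checks the mean by summing the probability mass function numerically.

```python
import math

def nb_logpmf(k, r, p):
    """Log-probability of k successful days before the r-th failure.

    r may be non-integer; p is the daily failure probability.
    """
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))

r, p = 1.79, 0.0011  # rounded values quoted in the text (assumption)

days_to_first_failure = 1 / p      # mean waiting time for a single failure
mean_duration = r * (1 - p) / p    # mean of the negative binomial

# Numerical cross-check: sum the pmf over a range long enough
# that the remaining tail is negligible.
ks = range(40000)
pmf = [math.exp(nb_logpmf(k, r, p)) for k in ks]
total = sum(pmf)
mean_numeric = sum(k * q for k, q in zip(ks, pmf))

print(round(days_to_first_failure))  # 909
print(round(mean_duration))          # 1625
```

So under this model a typical incumbency lasts roughly 1625 days, i.e. a bit less than four and a half years.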
Here is the MATLAB code which delivers the results
% Kolmogorov-Smirnov test
You can test other two-parameter distributions by changing dist, e.g. to 'gam'. If you want to check a one-parameter distribution, you have to further remove ,par(2) from the code. It turns out that the gamma distribution also delivers a fit which is not rejected. This is reasonable because it is sometimes seen as the continuous-valued version of the negative binomial. The Weibull distribution was not rejected either (although with a much lower p-value), which shows that other models might also be appropriate. As always with statistics and real-world data, I assume that the KS test would reject my theory if we had a larger dataset, as some deviations from the negative binomial trend would certainly become dominant (e.g. caused by election cycles).
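For readers without MATLAB, the fit-and-test idea can be sketched in plain Python. This is only a rough stand-in under several assumptions, not the code used for the results above: the data are synthetic draws from a negative binomial with r = 1.79 and p = 0.0011 (the ministers dataset is not included here), the fit uses the method of moments rather than maximum likelihood, and the p-value comes from the asymptotic Kolmogorov distribution, which is only approximate for a discrete distribution with estimated parameters. All function names are hypothetical.

```python
import bisect
import math
import random

KMAX = 40000  # truncation point; the tail beyond it is negligible here

def nb_cdf_table(r, p, kmax=KMAX):
    """NB CDF on 0..kmax via P(k+1) = P(k) * (k + r) / (k + 1) * (1 - p)."""
    pmf, total, cdf = p ** r, 0.0, []
    for k in range(kmax + 1):
        total += pmf
        cdf.append(min(total, 1.0))
        pmf *= (k + r) / (k + 1) * (1 - p)
    return cdf

def fit_moments(data):
    """Method-of-moments estimates (r, p); requires variance > mean."""
    n = len(data)
    m = sum(data) / n
    v = sum((x - m) ** 2 for x in data) / (n - 1)
    return m * m / (v - m), m / v

def ks_statistic(data, cdf):
    """Largest gap between empirical and model CDF over the integers."""
    xs = sorted(data)
    n = len(xs)
    return max(abs(bisect.bisect_right(xs, k) / n - cdf[k])
               for k in range(len(cdf)))

def kolmogorov_pvalue(d, n, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (approximate)."""
    lam = math.sqrt(n) * d
    s = sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
            for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * s))

# Synthetic stand-in for the duration vector: 291 inverse-CDF draws.
random.seed(1)
true_cdf = nb_cdf_table(1.79, 0.0011)
duration = [min(bisect.bisect_left(true_cdf, random.random()), KMAX)
            for _ in range(291)]

r_hat, p_hat = fit_moments(duration)
d = ks_statistic(duration, nb_cdf_table(r_hat, p_hat))
p_value = kolmogorov_pvalue(d, len(duration))
print(f"r = {r_hat:.2f}, p = {p_hat:.4f}, D = {d:.3f}, p-value = {p_value:.2f}")
```

Swapping in another candidate distribution only requires replacing nb_cdf_table and fit_moments with the corresponding CDF and estimator.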
I hope this "finger exercise" on finding a simple stochastic model that fits is inspiring for political scientists, although every political theory would likely immediately reject it.