Here n is the sample size: the number of observations or data points you are working with. k is the number of parameters that your model estimates, and θ is the set of all parameters.
L(θ̂) represents the likelihood of the model tested, given your data, when evaluated at the maximum likelihood values of θ. In other words, it is the likelihood of the model when every parameter is set to its most favorable value.
Comparing models with the Bayesian information criterion simply involves calculating the BIC for each model. The model with the lowest BIC is considered the best, and its BIC can be written BIC* (or SIC* if you use that name and abbreviation).
We can also calculate the Δ BIC, which is the difference between a particular model’s BIC and the BIC of the ‘best’ model (the one with the lowest BIC), and use it as an argument against that other model. Δ BIC is just BIC(model) − BIC*, where BIC* is the BIC of the best model.
If Δ BIC is less than 2, it is considered ‘barely worth mentioning’ as an argument either for the best model or against the alternative one. The edge it gives our best model is too small to be meaningful. If Δ BIC is between 2 and 6, the evidence against the other model is positive; i.e. we have a good argument in favor of our ‘best model’. If it’s between 6 and 10, the evidence for the best model and against the weaker model is strong. A Δ BIC greater than 10 means the evidence favoring our best model over the alternative is very strong indeed.
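The comparison rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming you already have a BIC value for each model; the model names and BIC values are hypothetical, and placing boundary values of exactly 2, 6 or 10 in the higher category is an assumption, since the text only gives ranges.

```python
def delta_bic(bics: dict) -> dict:
    """Return Δ BIC = BIC(model) - BIC* for every model, where BIC* is the lowest BIC."""
    best = min(bics.values())
    return {name: value - best for name, value in bics.items()}


def evidence_against(delta: float) -> str:
    """Map a Δ BIC value to the verbal scale described above.

    Assumption: boundary values (2, 6, 10) fall into the higher category."""
    if delta < 2:
        return "barely worth mentioning"
    if delta < 6:
        return "positive"
    if delta < 10:
        return "strong"
    return "very strong"


# Hypothetical BIC values for five candidate models.
bics = {"model_A": 210.4, "model_B": 211.9, "model_C": 215.0,
        "model_D": 218.2, "model_E": 223.7}
best_model = min(bics, key=bics.get)  # lowest BIC, i.e. BIC*
for name, d in delta_bic(bics).items():
    if name != best_model:
        print(f"{name}: delta BIC = {d:.1f} -> evidence against it: {evidence_against(d)}")
```

Here model_A is the best model, and the other four land in the four categories in turn (Δ BIC of about 1.5, 4.6, 7.8 and 13.3).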
Suppose you have a set of data with 50 observations. Model 1 estimates 3 parameters and Model 2 estimates 4 parameters. Let’s say the log of the maximum likelihood for Model 1 is a, and for Model 2 it is 2a. Using the formula BIC = k ln(n) − 2 ln(L(θ̂)), where ln is the natural logarithm:
BIC₁ = 3 ln(50) − 2a and BIC₂ = 4 ln(50) − 4a, so if Model 1 is the best model, Δ BIC = BIC₂ − BIC₁ = ln(50) − 2a ≈ 3.91 − 2a.
Since the evidence that the Bayesian Information Criterion gives us for Model 1 will only be more than ‘barely worth mentioning’ if 3.91 − 2a > 2, we can only claim positive evidence if −2a > −1.91; that is to say, a < 0.956 (approximately).
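A quick numeric check of this example, as a sketch: the helper names are hypothetical, the formula uses the natural logarithm, and the values of a tried below are arbitrary.

```python
import math


def bic(k: int, n: int, log_likelihood: float) -> float:
    """BIC = k ln(n) - 2 ln(L), with the natural logarithm."""
    return k * math.log(n) - 2 * log_likelihood


def delta_bic_example(a: float, n: int = 50) -> float:
    """Δ BIC for the example: Model 1 has 3 parameters and log-likelihood a,
    Model 2 has 4 parameters and log-likelihood 2a (Model 1 taken as the best model)."""
    return bic(4, n, 2 * a) - bic(3, n, a)


# Largest a for which Δ BIC still exceeds 2: solve ln(50) - 2a = 2 for a.
threshold = (math.log(50) - 2) / 2
print(f"threshold: a < {threshold:.3f}")
for a in (0.5, 1.0):
    d = delta_bic_example(a)
    print(f"a = {a}: delta BIC = {d:.2f}, exceeds 2: {d > 2}")
```

With a = 0.5 the difference is about 2.91 (positive evidence for Model 1); with a = 1.0 it drops to about 1.91, below the threshold.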
Fabozzi, Focardi, Rachev & Arshanapalli. The Basics of Financial Econometrics: Tools, Concepts, and Asset Management Applications. Appendix E: Model Selection Criterion: AIC and BIC. Retrieved from http://onlinelibrary.wiley.com/store/10.1002/9781118856406.app5/asset/app5.pdf;jsessionid=A6726BA5AE1AD2A5AF007FFF78528249.f03t01?v=1&t=je8jr983&s=09eca6efc0573a238457d475d3ac909ec816a699 on March 1, 2018
Alpha levels and beta levels are related. An alpha level is the probability of a type I error: rejecting the null hypothesis when it is true. A beta level, usually just called beta (β), is the probability of a type II error: accepting (failing to reject) the null hypothesis when it’s false. You can also think of beta as the probability of incorrectly concluding that there is no statistical significance (if there were, you would have rejected the null).
Beta is directly related to the power of a test. Power relates to how likely a test is to distinguish an actual effect from one you could expect to happen by chance alone. Beta plus the power of a test is always equal to 1. Usually, researchers will refer to the power of a test (e.g. a power of .8), leaving the beta level (.2 in this case) as implied.
In theory, the lower the beta, the better. You could simply increase the power of a test to lower the beta level. However, there’s an important trade-off. For a fixed sample size and effect size, alpha and beta levels are connected: you can’t lower one without raising the other. For example, a Bonferroni correction reduces the alpha level (i.e. the probability of making a type I error) but inflates the beta level (the probability of making a type II error). False positives are reduced, but at the cost of more false negatives.
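The trade-off can be made concrete with a one-sided z-test. This is a sketch under stated assumptions: a known standard deviation, a hypothetical true effect of 0.5 standard deviations, a hypothetical sample size of 25, and a Bonferroni correction over a hypothetical 5 tests (so alpha drops from 0.05 to 0.01).

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def beta_level(alpha: float, effect_size: float, n: int) -> float:
    """Type II error rate of a one-sided (upper-tail) z-test with known sigma.

    effect_size is the assumed true mean shift in standard-deviation units."""
    z_critical = STANDARD_NORMAL.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centred at effect_size * sqrt(n).
    return STANDARD_NORMAL.cdf(z_critical - effect_size * n ** 0.5)


effect_size, n = 0.5, 25  # hypothetical design
m = 5                     # hypothetical number of tests being corrected for
for label, alpha in (("uncorrected", 0.05), ("Bonferroni", 0.05 / m)):
    beta = beta_level(alpha, effect_size, n)
    power = 1 - beta  # beta + power = 1
    print(f"{label}: alpha = {alpha:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

Under these assumptions, cutting alpha from 0.05 to 0.01 raises beta from roughly 0.20 to roughly 0.43, so power falls from about 0.80 to about 0.57.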
A bilinear function (or bilinear form) is a function of two arguments that is linear in each argument separately; the arguments can be scalars or vectors (Vinberg, 2003; Haddon, 2000). In other words, it is a linear function of x for every fixed y-value and a linear function of y for every fixed x-value (Shilov & Silverman, 1963).
An inner product on a real vector space V. This bilinear form is positive definite and symmetric (its value is unchanged under any permutation of its arguments; in other words, if you swap the two arguments, you get the same result).
A bilinear function is symmetric if f(u, v) = f(v, u) for all u and v. Multilinear functions are a generalization of bilinear functions; generally speaking, differential forms are alternating multilinear functions (Harvard, 2017).
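To make these definitions concrete, here is a small sketch using the matrix representation f(u, v) = uᵀAv (every bilinear form on ℝⁿ can be written this way). The matrices, vectors and scalars below are arbitrary examples, not taken from the cited sources.

```python
def bilinear_form(A, u, v):
    """Evaluate f(u, v) = u^T A v for a square matrix A (nested lists) and vectors u, v."""
    return sum(u[i] * A[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def add(x, y):
    return [p + q for p, q in zip(x, y)]


def scale(c, x):
    return [c * p for p in x]


A_sym = [[2, 1], [1, 3]]     # symmetric matrix -> symmetric bilinear form
A_asym = [[0, 1], [-1, 0]]   # antisymmetric matrix -> f(u, v) = -f(v, u)
identity = [[1, 0], [0, 1]]  # gives the ordinary dot product, an inner product

u1, u2, v = [1, 2], [3, -1], [4, 5]
a, b = 2, -3

# Linear in the first argument while v is held fixed:
lhs = bilinear_form(A_sym, add(scale(a, u1), scale(b, u2)), v)
rhs = a * bilinear_form(A_sym, u1, v) + b * bilinear_form(A_sym, u2, v)
print(lhs == rhs)
# Symmetric matrix, so swapping the arguments leaves the value unchanged:
print(bilinear_form(A_sym, u1, v) == bilinear_form(A_sym, v, u1))
```

Integer inputs keep the arithmetic exact, so the equality checks are not disturbed by floating-point rounding.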
The binomial test is used when an experiment has two possible outcomes (i.e. success/failure) and you have an idea about what the probability of success is. A binomial test is run to see if observed test results differ from what was expected.
Example: you theorize that 75% of physics students are male. You survey a random sample of 12 physics students and find that 7 are male. Do your results significantly differ from the expected results?
Solution: Use the binomial formula to find the probability of getting your results. The null hypothesis for this test is that your results do not differ significantly from what is expected.
Of the two possible directions, solve for the one containing your result, i.e. the tail on the side of the expected value where your observation fell. You expected 9 males (i.e. 75% of 12) but got 7, so for this example solve for 7 or fewer males.
Using the binomial formula, the probability of 7 or fewer males out of 12 is 0.158. Doubling this (for a two-tailed test) gives 0.315. These are your one-tailed and two-tailed p-values. With very few exceptions, you’ll always use the doubled value.
As the p-value of 0.315 is large (I’m assuming a 5% alpha level here, which would mean p-values of less than 5% would be significant), you cannot reject the null hypothesis that the results are expected. In other words, 7 is not outside of the range of what you would expect.
If, on the other hand, you had run the test and found 4 males (an observed proportion of 4/12 ≈ .333, versus .667 female), the doubled p-value would have been .006, which means you would have rejected the null.
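The calculation can be reproduced with the binomial formula directly. This is a minimal sketch that only handles the lower tail, since in both examples the observed count is below the expected 9; the function names are hypothetical. Library routines such as SciPy’s `binomtest` may define the two-sided p-value differently from simple doubling, so their results need not match exactly.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term from the binomial formula."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def binomial_test_lower_tail(k: int, n: int, p: float):
    """Return the one-tailed p-value P(X <= k) and the doubled two-tailed value."""
    one_tailed = binomial_cdf(k, n, p)
    two_tailed = min(1.0, 2 * one_tailed)  # doubling can exceed 1, so cap it
    return one_tailed, two_tailed


for males in (7, 4):
    one, two = binomial_test_lower_tail(males, 12, 0.75)
    print(f"{males} males: one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")
```

For 7 males this gives 0.158 and 0.315, and for 4 males the doubled value rounds to 0.006, matching the figures above.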
A block plot helps you figure out which factors in your experiment are the most important, including interactions. A basic block plot will show if a factor is important. It will also show if that factor’s effect stays the same (i.e. if it is robust) for all settings of the other factors.
Block plots can also assess statistical significance. Statistical significance is usually tested with ANOVA. However, ANOVA is based on the assumption of normality. Block plots don’t have this assumption, so they can be extremely useful for non-normal data.
The vertical axis represents the response variable. The taller the bar, the more impact the factor has on the response variable, so the more important the factor. The vertical positions of the blocks relative to each other are not important. In the following block plots, all of the block heights in the factor 1 plot are taller than all of the block heights in the factor 2 plot. Therefore, primary factor 1 is more important than primary factor 2.
To assess how statistically significant each factor is, look at where each level falls within the bars. The characters inside each bar (these may be symbols, numbers or some other notation) represent the levels.
In the following plot for Factor 1, the response for level 2 is higher than the response for level 1 in each of the bars. The level ordering in plot 2 is inconsistent (sometimes 1 is above 2 and vice-versa), so factor 2 is not statistically significant.
To figure out if factor 1 is statistically significant, you first have to calculate the probability of that particular level ordering happening by chance. If you’re rusty on how to find probabilities, start here: What is the Probability of A and B.
There are only two ways the levels can be ordered within a block (1 then 2, or 2 then 1), so the probability of one block being ordered 1 then 2 by chance is ½. The probability of all six blocks showing 1 and then 2 is: ½ × ½ × ½ × ½ × ½ × ½ = 1/2⁶ = 1/64 ≈ 0.016.
Finally, compare your probability to your chosen significance level. If the probability you calculate is less than the significance level, then that factor is significant. At a 5% significance level, this block ordering (and therefore Factor 1) is statistically significant.
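The ordering check above can be sketched as follows. This is a minimal illustration, assuming two levels per factor, no tied responses, and each ordering equally likely in every block; the response values are hypothetical stand-ins for what you would read off a block plot.

```python
from math import factorial


def ordering_probability(n_blocks: int, n_levels: int = 2) -> float:
    """Chance that every block shows one particular ordering of the levels,
    assuming each of the n_levels! orderings is equally likely in each block."""
    return (1 / factorial(n_levels)) ** n_blocks


def block_plot_significance(blocks, alpha=0.05):
    """blocks: one (response at level 1, response at level 2) pair per block.

    Assumes no ties. Returns (probability, significant). If the levels are not
    ordered the same way in every block, probability is None and the factor is
    not significant under this simple rule."""
    level2_higher = [r2 > r1 for r1, r2 in blocks]
    if not (all(level2_higher) or not any(level2_higher)):
        return None, False
    p = ordering_probability(len(blocks))
    return p, p < alpha


# Hypothetical responses for six blocks.
factor_1 = [(10.1, 12.3), (8.4, 11.0), (9.2, 10.5), (7.9, 9.8), (11.4, 13.1), (6.7, 8.2)]
factor_2 = [(10.1, 9.5), (8.4, 9.0), (9.2, 8.8), (7.9, 8.5), (11.4, 11.9), (6.7, 6.1)]

p1, sig1 = block_plot_significance(factor_1)
p2, sig2 = block_plot_significance(factor_2)
print(p1, sig1)  # level 2 above level 1 in all six blocks
print(p2, sig2)  # mixed ordering
```

With six consistently ordered blocks the probability is 1/64, below a 5% significance level, while the mixed ordering in the second factor gives no evidence under this rule.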
To assess interactions, look at whether the heights of the bars are changing in a systematic fashion. While the block plot for Factor 1 appears to be random, the blocks for Factor 2 seem to be decreasing steadily (up to a point), so this may warrant further attention.
Simply put, Bolzano’s theorem (sometimes called the intermediate zero theorem) states that a function that is continuous on a closed interval has a zero in that interval if its values at the two endpoints have opposite signs (− + or + −). For example, every odd-degree polynomial with real coefficients has a real zero.
If f: [a, b] → ℝ is continuous on the closed interval [a, b] ⊂ ℝ and f(a) · f(b) < 0, then there is at least one x ∈ (a, b) such that f(x) = 0.
Given a function and an interval, you can use the theorem to prove that the function has at least one root in that interval. The theorem says nothing about where the function’s zero is: it merely states that the zero exists.
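Because the theorem only guarantees existence, a common way to actually approximate the zero is the bisection method, which is not described in the text above and is shown here only as an illustration: it repeatedly halves the interval, always keeping the half whose endpoints still differ in sign, so the theorem keeps applying. The sketch uses the function x³ + x − 1 on [0, 1] from the worked example; the tolerance is an arbitrary choice.

```python
def f(x: float) -> float:
    return x**3 + x - 1


def bisection(func, a: float, b: float, tol: float = 1e-10) -> float:
    """Approximate a zero of func on [a, b], given func(a) and func(b) of opposite sign.

    Each step keeps the half-interval whose endpoint values still differ in sign,
    so by Bolzano's theorem a zero remains inside it."""
    fa = func(a)
    if fa * func(b) >= 0:
        raise ValueError("func(a) and func(b) must have opposite signs")
    while b - a > tol:
        mid = (a + b) / 2
        fm = func(mid)
        if fm == 0:
            return mid
        if fa * fm < 0:
            b = mid           # sign change lies in [a, mid]
        else:
            a, fa = mid, fm   # sign change lies in [mid, b]
    return (a + b) / 2


print(f(0), f(1))  # endpoint values have opposite signs
root = bisection(f, 0.0, 1.0)
print(round(root, 6))
```

The endpoint check is exactly the condition f(a) · f(b) < 0 from the theorem; the loop then narrows down the zero whose existence the theorem guarantees.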
Here, you’ve been given a function (x³ + x − 1) set to zero. So if the equation has at least one solution, then that solution is a root (i.e. a zero) of the function. In order to apply Bolzano’s theorem, you need to find out two things:
Step 2: Locate the endpoints and see if they have opposite signs. Here, you’re given the function and the endpoints [0, 1], so plug the endpoints into the function and see what values come out: