Physical Review A

Covering atomic, molecular, and optical physics and quantum information.


Hypothesis testing and entropies of quantum channels

Phys. Rev. A 99, 032317 – Published 14 March 2019


Hypothesis testing is an important task in mathematics and physics. Hypothesis testing of two random variables is related to the Kullback-Leibler divergence of the two corresponding distributions. Similarly, quantum hypothesis testing of two quantum states is characterized by the quantum relative entropy. While a quantum state can be abstracted as a device that only has outputs but no inputs, the most general quantum device is a quantum channel that also has inputs. In this work, we extend hypothesis testing to general quantum channels. In both the one-shot and asymptotic scenario, we study several quantifiers for hypothesis testing under different assumptions of how the channels are used. As the quantifiers are analogs to the quantum relative entropy of states, we call them the quantum relative entropy of channels. Then, we define the entropy of channels based on relative entropies from the target channel to the completely depolarizing channel. We investigate the properties that the quantum relative entropy of channels should satisfy, and we study its interplay with entanglement. With the broad applications of the quantum relative entropy of states, our results can be useful for understanding general properties of quantum channels.


  • Received 16 September 2018

DOI: https://doi.org/10.1103/PhysRevA.99.032317

©2019 American Physical Society


Authors & Affiliations

  • Department of Materials, University of Oxford, Parks Road, Oxford OX1 3PH, United Kingdom


Quantum states and channels. (a) A quantum state ρ can be regarded as a device that has null input and only outputs ρ_out = ρ. (b) A quantum channel is a generalized device that inputs ρ_in and outputs ρ_out. When ρ_in has dimension zero or ρ_out is a classical state, a quantum channel can be regarded as a state preparation or a demolition measurement, respectively.

Hypothesis testing of two channels. (a) For each use of the quantum channel, no extra ancilla is allowed. (b) One party of the maximally entangled state Φ+ is input to the channel. (c) One party of a joint state is input to the channel.


ECE 830 Spring 2015 Statistical Signal Processing, Instructor: R. Nowak

Lecture 7: Hypothesis Testing and KL Divergence

1 Introducing the Kullback-Leibler Divergence

Suppose X_1, X_2, ..., X_n ∼ q(x) are iid, and we have two models for q(x): p_0(x) and p_1(x). In past lectures we have seen that the likelihood ratio test (LRT) is optimal, assuming that q is p_0 or p_1. The error probabilities can be computed numerically in many cases. The error probabilities converge to 0 as the number of samples n grows, but numerical calculations do not always yield insight into the rate of convergence. In this lecture we will see that the rate is exponential in n and parameterized by the Kullback-Leibler (KL) divergence, which quantifies the difference between the distributions p_0 and p_1. Our analysis will also give insight into the performance of the LRT when q is neither p_0 nor p_1. This is important since in practice p_0 and p_1 may be imperfect models for reality, q in this context. The LRT acts as one would expect in such cases: it picks the model that is closest (in the sense of KL divergence) to q. To begin our discussion, recall that the likelihood ratio is

\Lambda = \prod_{i=1}^{n} \frac{p_1(x_i)}{p_0(x_i)}

The log likelihood ratio, normalized by dividing by n, is then

\hat{\Lambda}_n = \frac{1}{n} \sum_{i=1}^{n} \log \frac{p_1(x_i)}{p_0(x_i)}

Note that Λ̂_n is itself a random variable, and is in fact a normalized sum of the iid random variables L_i = log(p_1(x_i)/p_0(x_i)), which are independent because the x_i are. In addition, we know from the strong law of large numbers that for large n,

\hat{\Lambda}_n \xrightarrow{\text{a.s.}} \mathbb{E}[\hat{\Lambda}_n]

where

\mathbb{E}[\hat{\Lambda}_n] = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[L_i] = \mathbb{E}[L_1]
= \int \log\frac{p_1(x)}{p_0(x)}\, q(x)\, dx
= \int \log\left(\frac{p_1(x)}{p_0(x)} \cdot \frac{q(x)}{q(x)}\right) q(x)\, dx
= \int \left(\log\frac{q(x)}{p_0(x)} - \log\frac{q(x)}{p_1(x)}\right) q(x)\, dx
= \int \log\frac{q(x)}{p_0(x)}\, q(x)\, dx - \int \log\frac{q(x)}{p_1(x)}\, q(x)\, dx


The quantity \int \log\frac{q(x)}{p(x)}\, q(x)\, dx is known as the Kullback-Leibler divergence of p from q, or the KL divergence for short. We use the notation

D(q \| p) = \int q(x) \log\frac{q(x)}{p(x)}\, dx

for continuous random variables, and

D(q \| p) = \sum_i q_i \log\frac{q_i}{p_i}

for discrete random variables. The above expression for E[Λ̂_n] can then be written as

\mathbb{E}[\hat{\Lambda}_n] = D(q \| p_0) - D(q \| p_1)

Therefore, for large n, the log likelihood ratio test Λ̂_n ≷ λ (deciding H_1 when above the threshold and H_0 when below) is approximately performing the comparison

D(q \| p_0) - D(q \| p_1) \overset{H_1}{\underset{H_0}{\gtrless}} \lambda

since Λ̂_n will be close to its mean when n is large. Recall that the minimum probability of error test (assuming equal prior probabilities for the two hypotheses) is obtained by setting λ = 0. In this case, we have the test

D(q \| p_0) \overset{H_1}{\underset{H_0}{\gtrless}} D(q \| p_1)

For this case, using the LRT amounts to selecting the model that is "closer" to q in the sense of KL divergence.
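As a concrete illustration of this decision rule, here is a minimal base-R sketch that computes both divergences for an arbitrary discrete q and two candidate models p0 and p1 and reports which hypothesis the comparison favors; all three vectors are made-up values for the example.

    # Sketch: pick the model closer to q in KL divergence (illustrative vectors)
    kl_discrete <- function(q, p) sum(q * log(q / p))  # assumes strictly positive entries

    q  <- c(0.25, 0.25, 0.30, 0.20)   # "true" distribution generating the data
    p0 <- c(0.40, 0.30, 0.20, 0.10)   # model under H0
    p1 <- c(0.20, 0.30, 0.30, 0.20)   # model under H1

    d0 <- kl_discrete(q, p0)
    d1 <- kl_discrete(q, p1)
    c(D_q_p0 = d0, D_q_p1 = d1)
    if (d0 > d1) "decide H1 (p1 is closer to q)" else "decide H0 (p0 is closer to q)"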

Example 1. Suppose we have the hypotheses

H_0 : X_1, \dots, X_n \overset{\text{iid}}{\sim} \mathcal{N}(\mu_0, \sigma^2)
H_1 : X_1, \dots, X_n \overset{\text{iid}}{\sim} \mathcal{N}(\mu_1, \sigma^2)

Then we can calculate the KL divergence:

\log\frac{p_1(x)}{p_0(x)}
= \log\frac{\frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}(x-\mu_1)^2\right)}{\frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}(x-\mu_0)^2\right)}
= -\frac{1}{2\sigma^2}\left[(x-\mu_1)^2 - (x-\mu_0)^2\right]
= -\frac{1}{2\sigma^2}\left[-2x\mu_1 + \mu_1^2 + 2x\mu_0 - \mu_0^2\right]

D(p_1 \| p_0) = \int \log\frac{p_1(x)}{p_0(x)}\, p_1(x)\, dx
= \mathbb{E}_{p_1}\!\left[\log\frac{p_1}{p_0}\right]
= \mathbb{E}_{p_1}\!\left[-\frac{1}{2\sigma^2}\left(-2x\mu_1 + \mu_1^2 + 2x\mu_0 - \mu_0^2\right)\right]
= -\frac{1}{2\sigma^2}\left[2(\mu_0 - \mu_1)\,\mathbb{E}_{p_1}[x] + \mu_1^2 - \mu_0^2\right]
= -\frac{1}{2\sigma^2}\left[-2\mu_1^2 + \mu_1^2 + 2\mu_1\mu_0 - \mu_0^2\right]
= \frac{1}{2\sigma^2}\left[\mu_0^2 - 2\mu_0\mu_1 + \mu_1^2\right]
= \frac{(\mu_1 - \mu_0)^2}{2\sigma^2}

So the KL divergence between two Gaussian distributions with different means and the same variance is just proportional to the squared distance between the two means. In this case, we can see by symmetry that D(p_1||p_0) = D(p_0||p_1), but in general this is not true.
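This closed form also gives an easy numerical check of the earlier claim that the normalized log likelihood ratio settles near D(q||p_0) − D(q||p_1). The R sketch below simulates Gaussian data from a q that is neither p_0 nor p_1 and compares the two quantities; the particular means and variance are arbitrary choices for the illustration.

    # Sketch: normalized log likelihood ratio vs. KL divergence difference (Gaussian case)
    set.seed(1)
    n     <- 1e5
    mu0   <- 0; mu1 <- 1; sigma <- 1   # the two candidate models p0, p1
    mu_q  <- 0.8                       # true data-generating mean (assumed for the example)
    x     <- rnorm(n, mean = mu_q, sd = sigma)

    # normalized log likelihood ratio (1/n) * sum log(p1/p0)
    llr_n <- mean(dnorm(x, mu1, sigma, log = TRUE) - dnorm(x, mu0, sigma, log = TRUE))

    # equal-variance Gaussian KL divergences from the formula above
    D_q_p0 <- (mu_q - mu0)^2 / (2 * sigma^2)
    D_q_p1 <- (mu_q - mu1)^2 / (2 * sigma^2)
    c(llr_n = llr_n, kl_difference = D_q_p0 - D_q_p1)   # the two values should be close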

2 A Key Property

The key property in question is that D(q||p) ≥ 0, with equality if and only if q = p. To prove this, we will need a result in probability known as Jensen’s Inequality:

Jensen’s Inequality: If a function f(x) is convex, then

\mathbb{E}[f(x)] \geq f(\mathbb{E}[x])

A function is convex if, for all \lambda \in [0, 1],

f (λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)

The left hand side of this inequality is the function value at some point between x and y, and the right hand side is the value of the straight line connecting the points (x, f(x)) and (y, f(y)). In other words, for a convex function, the function value between two points never lies above the straight line connecting those points.

Now if we rearrange the KL divergence formula,

D(q \| p) = \int q(x) \log\frac{q(x)}{p(x)}\, dx
= \mathbb{E}_q\!\left[\log\frac{q(x)}{p(x)}\right]
= \mathbb{E}_q\!\left[-\log\frac{p(x)}{q(x)}\right]

we can use Jensen's inequality, since -log z is a convex function:

\mathbb{E}_q\!\left[-\log\frac{p(x)}{q(x)}\right] \geq -\log \mathbb{E}_q\!\left[\frac{p(x)}{q(x)}\right]
= -\log \int q(x)\,\frac{p(x)}{q(x)}\, dx
= -\log \int p(x)\, dx
= -\log(1) = 0

Therefore D(q||p) ≥ 0.
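A quick Monte Carlo sanity check of this property: the base-R sketch below draws many random pairs of discrete distributions and confirms that the computed divergence never falls below zero; the dimension and number of draws are arbitrary choices.

    # Sketch: D(q||p) >= 0 for random discrete distributions
    set.seed(42)
    kl_discrete <- function(q, p) sum(q * log(q / p))

    min_kl <- min(replicate(10000, {
      q <- rgamma(5, shape = 1); q <- q / sum(q)   # random 5-point distribution
      p <- rgamma(5, shape = 1); p <- p / sum(p)
      kl_discrete(q, p)
    }))
    min_kl   # smallest divergence observed; should be >= 0, with 0 only when q == p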

3 Bounding the Error Probabilities

The KL divergence also provides a means to bound the error probabilities for a hypothesis test. For this we will need the following tail bound for averages of independent subGaussian random variables.

SubGaussian Tail Bound: If Z_1, \dots, Z_n are independent and \mathbb{P}(|Z_i - \mathbb{E}Z_i| \geq t) \leq a e^{-bt^2/2} for all i, then

\mathbb{P}\!\left(\frac{1}{n}\sum_i Z_i - \mathbb{E}[Z] > \epsilon\right) \leq e^{-cn\epsilon^2}

and

\mathbb{P}\!\left(\mathbb{E}[Z] - \frac{1}{n}\sum_i Z_i > \epsilon\right) \leq e^{-cn\epsilon^2}

with c = \frac{b}{16a}.

Proof: Follows immediately from Theorems 2 and 3 in http://nowak.ece.wisc.edu/ece901_concentration.pdf.

Now suppose that p_0 and p_1 have the same support and that the log likelihood ratio statistic L_i := log(p_1(x_i)/p_0(x_i)) has a subGaussian distribution; i.e., \mathbb{P}(|L_i - \mathbb{E}L_i| \geq t) \leq a e^{-bt^2/2}. For example, if p_0 and p_1 are Gaussian distributions with a common variance, then L_i is a linear function of x_i and thus is Gaussian (and hence subGaussian). Note that Λ̂_n = (1/n) Σ_i L_i is an average of iid subGaussian random variables. This allows us to use the tail bound above.

Consider the hypothesis test Λ̂_n ≷ 0 (deciding H_1 when Λ̂_n > 0 and H_0 otherwise). We will now assume that the data X_1, \dots, X_n are iid samples from q, with q either p_0 or p_1. We can write the probability of false positive error as

P_{FP} = \mathbb{P}\!\left(\hat{\Lambda}_n > 0 \,\middle|\, H_0\right)
= \mathbb{P}\!\left(\hat{\Lambda}_n - \mathbb{E}[\hat{\Lambda}_n \mid H_0] > -\mathbb{E}[\hat{\Lambda}_n \mid H_0] \,\middle|\, H_0\right)

The quantity -\mathbb{E}[\hat{\Lambda}_n \mid H_0] will play the role of \epsilon in the tail bound. We can re-express it as

\mathbb{E}_{p_0}[\hat{\Lambda}_n \mid H_0] = \int p_0(x) \log\frac{p_1(x)}{p_0(x)}\, dx
= -\int p_0(x) \log\frac{p_0(x)}{p_1(x)}\, dx
= -D(p_0 \| p_1)

Applying the tail bound, we get

P_{FP} = \mathbb{P}\!\left(\hat{\Lambda}_n - (-D(p_0 \| p_1)) > D(p_0 \| p_1) \,\middle|\, H_0\right)
\leq e^{-c n D^2(p_0 \| p_1)}.

Thus the probability of false positive error is bounded in terms of the KL divergence D(p0||p1). As n or D(p0||p1) increase, the error decreases exponentially. The bound for the probability of a false negative error can be found in a similar fashion:

P_{FN} = \mathbb{P}\!\left(\hat{\Lambda}_n < 0 \,\middle|\, H_1\right)
= \mathbb{P}\!\left(D(p_1 \| p_0) - \hat{\Lambda}_n > D(p_1 \| p_0) \,\middle|\, H_1\right)
\leq e^{-c n D^2(p_1 \| p_0)}.
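To see this exponential behavior numerically, the R sketch below estimates the false-positive rate of the λ = 0 test for the Gaussian example at increasing sample sizes; the means, variance, sample sizes, and number of trials are arbitrary choices for the illustration.

    # Sketch: false-positive rate of the lambda = 0 LRT for the Gaussian example
    set.seed(7)
    mu0 <- 0; mu1 <- 0.5; sigma <- 1   # assumed means and variance
    trials <- 5000
    ns <- c(10, 20, 40, 80)

    fp_rate <- sapply(ns, function(n) {
      mean(replicate(trials, {
        x   <- rnorm(n, mu0, sigma)                      # data generated under H0
        llr <- mean(dnorm(x, mu1, sigma, log = TRUE) -
                    dnorm(x, mu0, sigma, log = TRUE))    # normalized log likelihood ratio
        llr > 0                                          # deciding H1 here is a false positive
      }))
    })
    names(fp_rate) <- ns
    fp_rate   # shrinks roughly exponentially as n grows, as the bound predicts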


Kullback–Leibler divergence for Bayesian nonparametric model checking

  • Research Article
  • Published: 04 June 2020
  • Volume 50, pages 272–289 (2021)


  • Luai Al-Labadi (ORCID: orcid.org/0000-0003-3182-9850),
  • Vishakh Patel,
  • Kasra Vakiloroayaei &
  • Clement Wan


Bayesian nonparametric statistics is an area of considerable research interest. While recently there has been an extensive concentration in developing Bayesian nonparametric procedures for model checking, the use of the Dirichlet process, in its simplest form, along with the Kullback–Leibler divergence is still an open problem. This is mainly attributed to the discreteness property of the Dirichlet process and that the Kullback–Leibler divergence between any discrete distribution and any continuous distribution is infinity. The approach proposed in this paper, which is based on incorporating the Dirichlet process, the Kullback–Leibler divergence and the relative belief ratio, is considered the first concrete solution to this issue. Applying the approach is simple and does not require obtaining a closed form of the relative belief ratio. A Monte Carlo study and real data examples show that the developed approach exhibits excellent performance.



Author information

Authors and Affiliations

Department of Mathematical and Computational Sciences, University of Toronto Mississauga, Mississauga, ON, L5L 1C6, Canada

Luai Al-Labadi, Vishakh Patel, Kasra Vakiloroayaei & Clement Wan


Corresponding author

Correspondence to Luai Al-Labadi.


About this article

Al-Labadi, L., Patel, V., Vakiloroayaei, K. et al. Kullback–Leibler divergence for Bayesian nonparametric model checking. J. Korean Stat. Soc. 50, 272–289 (2021). https://doi.org/10.1007/s42952-020-00072-7


Received: 08 July 2019

Accepted: 01 May 2020

Published: 04 June 2020

Issue Date: March 2021

DOI: https://doi.org/10.1007/s42952-020-00072-7


Keywords

  • Bayesian Non-parametric
  • Dirichlet process
  • Kullback–Leibler divergence
  • Model checking
  • Relative belief ratio



Computer Science > Information Theory

Title: Robust Kullback-Leibler Divergence and Universal Hypothesis Testing for Continuous Distributions

Abstract: Universal hypothesis testing refers to the problem of deciding whether samples come from a nominal distribution or an unknown distribution that is different from the nominal distribution. Hoeffding's test, whose test statistic is equivalent to the empirical Kullback-Leibler divergence (KLD), is known to be asymptotically optimal for distributions defined on finite alphabets. With continuous observations, however, the discontinuity of the KLD in the distribution functions results in significant complications for universal hypothesis testing. This paper introduces a robust version of the classical KLD, defined as the KLD from a distribution to the Lévy ball of a known distribution. This robust KLD is shown to be continuous in the underlying distribution function with respect to weak convergence. The continuity property enables the development of a universal hypothesis test for continuous observations that is shown to be asymptotically optimal for continuous distributions in the same sense as that of Hoeffding's test for discrete distributions.


How to Calculate KL Divergence in R (With Example)

In statistics, the Kullback–Leibler (KL) divergence is a measure that quantifies the difference between two probability distributions.

If we have two probability distributions, P and Q, we typically write the KL divergence using the notation KL(P || Q), which means “P’s divergence from Q.”

We calculate it using the following formula:

KL(P || Q) = ΣP(x) ln (P(x) / Q(x))

If the KL divergence between two distributions is zero, then it indicates that the distributions are identical.

The easiest way to calculate the KL divergence between two probability distributions in R is to use the KL() function from the philentropy package.

The following example shows how to use this function in practice.

Example: Calculating KL Divergence in R

Suppose we have two probability distributions in R, P and Q.

Note: It's important that the probabilities for each distribution sum to one.

We can use the following code to calculate the KL divergence between the two distributions:
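A minimal sketch of such a call, assuming the philentropy package and two illustrative probability vectors P and Q (the specific values below are assumptions chosen for the example; each sums to one):

    # Illustrative sketch; the vectors are assumed, not necessarily the tutorial's originals
    library(philentropy)

    P <- c(0.05, 0.10, 0.20, 0.05, 0.15, 0.25, 0.08, 0.12)
    Q <- c(0.30, 0.10, 0.20, 0.10, 0.10, 0.02, 0.08, 0.10)

    KL(rbind(P, Q), unit = "log")   # divergence of P from Q, in nats
    sum(P * log(P / Q))             # the same quantity computed directly from the formula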

The KL divergence of distribution P from distribution Q is about 0.589.

Note that the units used in this calculation are known as nats, which is short for natural unit of information.

Thus, we would say that the KL divergence is 0.589 nats.

Also note that the KL divergence is not a symmetric metric. This means that if we calculate the KL divergence of distribution Q from distribution P, we will likely get a different value:
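A sketch of the reverse computation under the same assumptions, simply swapping the order of the two rows:

    # Reverse direction: divergence of Q from P
    # (assumes P, Q, and library(philentropy) from the sketch above)
    KL(rbind(Q, P), unit = "log")
    sum(Q * log(Q / P))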

The KL divergence of distribution Q from distribution P is about 0.497 nats.

Also note that some formulas use log base-2 to calculate the KL divergence. In this case, we refer to the divergence in terms of bits instead of nats.

To calculate the KL divergence in terms of bits, you can instead use log2 in the unit argument:
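A sketch of the same call with base-2 logarithms, again under the assumptions above (the numeric result depends on which ordering of the two vectors is passed in):

    # Report the divergence in bits instead of nats by setting unit = "log2"
    # (assumes P, Q, and library(philentropy) from the sketches above)
    KL(rbind(P, Q), unit = "log2")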

The KL divergence of distribution P from distribution Q is about 0.7178 bits.

