bsy's Explanation of Zero Knowledge Proofs

Abstractly, a zero-knowledge proof is an interactive proof with a prover and a verifier, where the prover convinces the verifier of a statement (with high probability) without revealing any information about how to go about proving that statement. Hopefully the following example will make it all clear.

First, our assumptions. We're going to do arithmetic modulo n, where n = pq, and p and q are primes. Factoring n is assumed to be intractable.

Rabin showed in [RabinFunc] that finding square roots modulo n is equivalent to factoring n. That is, if you have an algorithm that can find a square root of a number modulo n, then you can use that algorithm to factor n. Our zero-knowledge proof will consist of rounds of interaction which shows that the prover knows a square root of a published number, where we do not reveal any new information about the square root. It is known that there exists a square root to this number (public knowledge), i.e., it is a quadratic residue. The factors of the modulus n may be entirely secret. (U. Feige et al show a refinement in [FFS] which allows the published number to be a non-quadratic residue of a particular form as well, thus revealing less information; in either case, runs of the protocol itself reveals no new information.)

The prover, P, publishes the quadratic residue v for which P claims to know a root s.

When P wishes to prove its knowledge of s to the verifier, V, P runs several rounds of interaction. In each round, P choses a new random number r and sends x = r² mod n to V. Now, V choses a random bit b, and sends it to P. P replies with y = r s^b. To verify P's claim, V computes y² and compares it with x v^b.

Now, let's do the analysis. The first claim is that only P can successfully complete the protocol for both possible values of b. This is clear, since knowing y₀ = r when b = 0 and y₁ = r s when b = 1 means you also know s, since y₁/y₀ = s. The second claim is that an imposter P' who does not actually know s can succeed with a probability of exactly 1/2 each round: to see this, notice that if P' guesses correctly that b = 0, then it can just follow the protocol and succeed; on the other hand, if P' guesses that b = 1, P' can generate x by chosing a random number t and setting x = t² / v. The response is y = t. The third claim is that no new information is released. To see this, consider what an eavesdropper E hears. In the case of the random bit b = 0, E sees a random numer r and its square x; in the case of b = 1, E sees the numbers rs and x = (rs)²/v. These are numbers that the eavesdropper could have generated in a closet. More precisely, a simulator S can run both sides of the protocol, and by using advanced information as to the value of the random bit (model is a poly-time TM with an auxiliary input tape of random bits), S can simulate the protocol without knowledge of s.

Each round of the proof shows that there is a 1/2 chance that a prover P'' (we don't know whether P'' is P or P') might not actually know s. Iterating 20 times gives a probability of less than 2^-20 or .0000009536 that P'' does not actually know s.

Such zero-knowledge proofs can be used for authentication -- the value of v can be generated from a randomly chosen s, and v is widely published as the public authentication ``puzzle''. A successful zero-knowledge proof showing knowledge of s authenticates identity. In [StrongboxIn25th], Doug Tygar and I show how to obtain superexponential scaling in security modulo the factorization assumption, run the protocol in constant rounds while retaining the zero-knowledge property, and simultaneously perform key exchange.

Note that using only a zero-knowledge proof of identity over a communication channel such as that provided by TCP/IP does not suffice to provide a secure communication channel: an attacker who has access to a link in the Internet through which your packets all cross may wait until the authentication protocol completes, and then `hijacks' your connection. Furthermore, doing the obvious protocol piggy-backing to run zero-knowledge authentication at the same time as, say, anonymous Diffie-Hellman key exchange is subject to message tampering -- an attacker may substitute her/his own Diffie-Hellman values in your packets, using your zero-knowledge authentication as a subroutine. Care must be taken when composing two cryptograhic protocols to ensure that the needed security properties are retained.

-bsy

-------------------- bibtex format bibliographic entries --------------------

@TechReport(RabinFunc,
Author="Michael Rabin",
	Institution="Laboratory for Computer Science,
		 Massachusetts Institute of Technology",
	Title="Digitalized Signatures and Public-Key
        	Functions as Intractable as Factorization",
	Key="Rabin",
	Year=1979,
	Month="January",
	Number="MIT/LCS/TR-212")

@InProceedings(FeigeFiatShamir,
	Key="Feige",
	Author="Uriel Feige and Amos Fiat and Adi Shamir",
	Title="Zero Knowledge Proofs of Identity",
	Year=1987,
	Pages="210-217",
	Booktitle="Proceedings of the 19th ACM Symp. on Theory of Computing",
	Month="May")

@Inproceedings(StrongboxIn25th,
	Key="Tygar and Yee",
	Author="J. D. Tygar and Bennet S. Yee",
	Title="Strongbox:  A System for Self Securing Programs",
	Organization = "ACM",
	Booktitle="CMU Computer Science:
		25th Anniversary Commemorative",
	Year = 1991)

bsy+www@bennetyee.org, last updated Mon Dec 13 23:22:08 PST 2004. Copyright 2004 Bennet Yee.
email bsy.