20

The seeming pointlessness of cryptocurrency mining has raised the question of useful alternatives; see these questions on Bitcoin, CST, MO. I wonder whether there exists an algorithm that can convert practically any computational challenge $\mathcal C$ (whose solution can be verified efficiently) into another such challenge $\Psi(\mathcal C)$ (which is used for proof-of-work) such that

  1. The function $\Psi$ is randomized, using some (public) random sequence $r$.
  2. Solving $\Psi(\mathcal C)$ is typically as hard as solving $\mathcal C$.
  3. If a solution $x$ is found for $\Psi(\mathcal C)$, then a solution $\Psi^{-1}(x)$ can be efficiently computed for the original challenge $\mathcal C$.
  4. Knowing a solution for $\mathcal C$ does not help in finding a solution for $\Psi(\mathcal C)$.

4' (Update). As pointed out by Noah in a comment, the previous condition should be strengthened: preprocessing $\mathcal C$ should also not give any advantage in solving $\Psi(\mathcal C)$.

This last condition is required so that no one is put in an advantageous position merely because they know a solution to $\mathcal C$. Using this method, people could submit computational problems that they want solved, and a central authority could pick some worthy of solving (like finding aliens vs. breaking passwords). Note that it doesn't seem to be an issue if a problem takes even a week to solve (I guess those aliens can't be that good at hiding ;), as this could simply result in a bigger reward for a solution. Anyway, these topics are not related to the solution of my theoretical problem, but of course I'm happy to discuss them in the comments/on the forum.

A possible solution would be the following: $\Psi$ maps $\mathcal C$ into $(\mathcal C,HASH_r)$, that is, into solving both $\mathcal C$ and some other, computationally hard challenge. One problem with this is that knowing a solution to $\mathcal C$ does make solving $\Psi(\mathcal C)$ somewhat easier (how much easier depends on the difficulty of $HASH_r$). Another issue is that $\Psi(\mathcal C)$ becomes more difficult than $\mathcal C$.

domotorp
  • 1
    This is a very interesting question, it came to me also a while ago. Considering the amount of energy that is being wasted for proof of work in cryptocurrencies, the answer can have a profound effect. – Kaveh Jan 01 '18 at 12:56
  • 1
    One suggestion for improving the question: I think it would be helpful to rigorously define "proof-of-work challenge". – Kaveh Jan 01 '18 at 12:58
  • 3
    Maybe this might be relevant: https://eprint.iacr.org/2017/203.pdf – Andreas Björklund Jan 01 '18 at 13:20
  • @Andreas Yes, I've seen this, but I don't think it's that relevant. – domotorp Jan 01 '18 at 19:57
  • @Kaveh I don't really understand what kind of definition would be required. A "proof-of-work challenge" can be any computational problem (whose solution can be verified efficiently). – domotorp Jan 01 '18 at 20:00
  • 3
    What is the difference between a "computational challenge" and a "proof-of-work challenge"? – Or Meir Jan 01 '18 at 22:42
  • I am asking what are the conditions exactly that an NP problem needs to satisfy for it to be used for cryptocurrencies' proof of work. – Kaveh Jan 01 '18 at 23:40
  • This seems interesting. I know that the focus is on the theoretical problem. However, I would like to hint at the problem of determining the complexity (and adjusting the complexity according to the network power) of "useful computational problems". – 3psil0n Jan 01 '18 at 20:17
  • Yes, of course that's also an issue, but I think it's better to focus on the above question first. Btw, what I had in mind was that problems that are eventually harder to solve, can be rewarded with more "cryptomoney," so if a problem is unsolved for X minutes, then X coins would be awarded. (Here X can even be 100000, especially if we allow multiple open challenges at a time.) – domotorp Jan 01 '18 at 20:33
  • @Or Nothing. I've edited the question to clear everyone's doubts. – domotorp Jan 02 '18 at 04:05
  • @Kaveh As long as we are considering only one instance and not a series of scalable challenges, I don't think there are any requirements (except to be in $NP$). – domotorp Jan 02 '18 at 04:07
  • 2
    Sure, but the very definition of proofs of work usually requires to consider several challenges, as the core property that defines them is non-amortizability. This is the reason why works such as eprint.iacr.org/2017/203.pdf have been done - you need non-amortizability guarantees for almost all applications of PoWs, especially cryptocurrencies. Anyway, are you looking for a publicly verifiable solution, or would a privately verifiable one suffice? Do you want a practically efficient scheme, or are you ok with a theoretical solution? – Geoffroy Couteau Jan 02 '18 at 07:36
  • Why is non-amortizability important if there are enough proposals, and solving each can last hours? I suppose the solution should be publicly verifiable. For this question, I'm ok with a theoretical solution. – domotorp Jan 02 '18 at 14:03
  • 1
    I think for condition (4), you probably want a slightly stronger requirement that any (reasonably sized) preprocessing that depends on $\mathcal{C}$ and $\Psi$ (but not the public randomness $r$) does not give much advantage in solving $\Psi(\mathcal{C})$. – Noah Stephens-Davidowitz Jan 02 '18 at 18:09
  • 2
    @domotorp: non-amortizability seems crucial because without it, nothing guarantees that one couldn't solve a thousand challenges using the time and computational power needed to solve a single challenge. For cryptocurrency applications, that would mean one can prove that he worked N times at the cost of working a single time. – Geoffroy Couteau Jan 02 '18 at 20:39
  • 1
    Another remark: doesn't 3 imply 2? If a solution for $\Psi(\mathcal{C})$ implies finding a solution for $\mathcal{C}$ efficiently, then it necessarily follows that $\Psi(\mathcal{C})$ is at least as hard to compute as the best algorithm for $\mathcal{C}$. Otherwise, one would directly get a better algorithm by constructing $\Psi(\mathcal{C})$, solving it, and recovering a solution to $\mathcal{C}$ from it. – Geoffroy Couteau Jan 02 '18 at 20:57
  • I made a better answer to https://cstheory.stackexchange.com/a/39905/5385. Does it answer your question also? How would I make it answer your question? – Joshua Herman Jan 03 '18 at 03:30
  • @Geoffroy 1: Now I see what you mean, indeed non-amortizability is required. 2: 3 doesn't imply 2, because $\Psi(\mathcal C)$ might be harder than $\mathcal C$ (see the comments on Noah's answer). – domotorp Jan 03 '18 at 05:06
  • @Joshua While your solution is an example for useful PoW, I don't see why it would satisfy the requirements I've posed. – domotorp Jan 03 '18 at 05:20
  • @domotorp I see, I had interpreted 2 as "solving $\Psi(\mathcal{C})$ is at least as hard as solving $\mathcal{C}$". Then if you want it to be "not harder" than $\mathcal{C}$ the theoretical solutions I could think of won't work - they incur a polynomial blowup to the complexity of solving the challenge. – Geoffroy Couteau Jan 03 '18 at 08:24
  • 5
    @domotorp why do you think that eprint.iacr.org/2017/203.pdf is not relevant to your question? – Alon Rosen Jan 04 '18 at 08:39
  • @Alon I don't think it extends to any computational challenge $\mathcal C$, it is just about a specific problem. But please correct me if you think I'm wrong and it satisfies 1-4 for any $\mathcal C$. – domotorp Jan 04 '18 at 09:02
  • 5
    Even though it does not provide a reduction from any worst-case problem in P, the paper gives useful PoW based on a broad set of problems. Specifically, any problem reducible to Orthogonal Vectors (OV), including all graph problems that are statable in first-order logic. It also applies to the k-OV problem (conjectured to require roughly n^k time), as well as other problems from the fine-grained complexity world. So while perhaps not as general as you would expect, the results are still quite general. And for the problems I mentioned above, properties 1-4 are indeed satisfied. – Alon Rosen Jan 04 '18 at 12:42
  • 1
    @Alon You know what, you're right! This is indeed almost the same thing that I was looking for. – domotorp Jan 04 '18 at 19:54
  • 2
    Yeah.. Alon's solution is excellent. – Noah Stephens-Davidowitz Jan 04 '18 at 23:11
  • Is there a way for me to transfer this bounty to Alon? – Noah Stephens-Davidowitz Jan 08 '18 at 18:18
  • @Noah There is no option to award him a bounty because he hasn't provided an answer - comments cannot be awarded. If he leaves an answer, then you can set up another bounty for this question and award it to him. – domotorp Jan 08 '18 at 19:42

2 Answers

8

(Note: Andreas Björklund suggested a solution in the comments that I believe is better than the one described below. See http://eprint.iacr.org/2017/203, by Ball, Rosen, Sabin, and Vasudevan. In short, they give proofs of work based on problems like Orthogonal Vectors whose hardness is well understood and to which many problems (e.g., k-SAT) can be reduced relatively efficiently. Their PoW instance $\Psi(\mathcal{C})$ is as hard as worst-case Orthogonal Vectors, even if the input instance $\mathcal{C}$ is easy, so that they avoid a major drawback of the solution described below.

The solution described below might benefit from its simplicity---it can be described to a non-expert---but it seems to me to be much less interesting theoretically.)

A solution is possible if one makes the strong assumption that "the fastest algorithm for $\mathcal{C}$ is fundamentally randomized" (and if we model a cryptographic hash function as a random oracle). One way to formalize this is to say that

  1. $\mathcal{C} \in \mathsf{TFNP} \setminus \mathsf{FP}$ (otherwise, I guess it's not really a valid challenge);
  2. the fastest randomized algorithm for $\mathcal{C}$ runs in expected time $T$ on a typical instance; and
  3. there exists an efficiently computable function $f$ from $\{0,1\}^k$ to the domain of solutions to $\mathcal{C}$ for $k \approx \log_2 T$ such that there always exists an $s \in \{0,1\}^k$ with $f(s)$ a solution to $\mathcal{C}$.

Notice that the assumption that $k \approx \log_2 T$ implies that brute-force search of $\{0,1\}^k$ is essentially the optimal algorithm for $\mathcal{C}$. So, this is quite a strong assumption. On the other hand, if $\mathcal{C}$ does not satisfy these properties, then it's hard for me to imagine satisfying both your conditions (2) and (4).

Then, given a hash function $H : \{0,1\}^* \to \{0,1\}^k$, which we model as a random oracle, we define $\Psi_H(\mathcal{C}; r)$ as follows, where $r \in \{0,1\}^\ell$ for some $\ell \gg k$ is the random input to $\Psi_H$. The goal is to output $x \in \{0,1\}^*$ such that $f(H(r,x))$ is a solution to $\mathcal{C}$. In other words, $(r,x)$ should hash to "good random coins" for the above algorithm.
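As a toy illustration of this construction (Python; SHA-256 stands in for the random oracle $H$, $f$ is taken to be the identity on $\{0,1\}^k$, and the challenge `is_solution` and all parameters are invented for this sketch):

```python
import hashlib
from itertools import count

K = 16  # k: log2 of the search-space size (toy scale)

def H(r: bytes, x: int) -> int:
    """Random-oracle stand-in: hash (r, x) down to k bits."""
    digest = hashlib.sha256(r + x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % (1 << K)

# Toy challenge C: find s in {0,1}^K whose low byte equals a target.
# Here f is the identity, and is_solution plays the role of the verifier.
TARGET = 0x2A
def is_solution(s: int) -> bool:
    return s & 0xFF == TARGET

def solve_psi(r: bytes) -> int:
    """Brute-force search for x such that f(H(r, x)) solves C."""
    for x in count():
        if is_solution(H(r, x)):
            return x

r = b"public-randomness"   # the public random sequence r
x = solve_psi(r)           # the proof of work
s = H(r, x)                # Psi^{-1}(x): the recovered solution to C
assert is_solution(s)
```

In this toy a $1/256$ fraction of the $k$-bit strings are solutions, so the search takes about $256$ hash evaluations in expectation, and no shortcut through the hash is available.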

Let's see that this satisfies all of your conditions.

  1. "The function $\Psi$ is randomized, using some (public) random sequence $r$." Check!
  2. "Solving $\Psi(\mathcal{C})$ is typically as hard as solving $\mathcal{C}$." Notice that the simple randomized algorithm for $\Psi_H(\mathcal{C},r)$ runs in expected time at most $2^k$ plus polynomial overhead, and by assumption $2^k \approx T$ is essentially the running time of the optimal algorithm for $\mathcal{C}$.
  3. "If a solution $x$ is found for $\Psi(\mathcal{C})$, then a solution $\Psi^{-1}(x)$ can be efficiently computed for the original challenge $\mathcal{C}$." This can be done by computing $f(H(r,x))$, which is a solution to $\mathcal{C}$ by assumption.
  4. "Knowing a solution for $\mathcal{C}$ does not help in finding a solution for $\Psi(\mathcal{C})$." By definition, solving $\Psi_H(\mathcal{C}; r)$ requires finding $x$ such that $f(H(r,x))$ is a solution to $\mathcal{C}$. Since we modeled $H$ as a random oracle, we can lower bound the expected running time of any algorithm that solves this problem by the optimal expected query complexity of the query problem in which $H$ is given by a black box and we are asked to find a solution to the same problem. And, again because $H$ is a random oracle, the expected query complexity is just the inverse of the fraction of elements $x \in \{0,1\}^k$ that are solutions (up to a constant factor). By assumption, the optimal expected running time of any algorithm for $\mathcal{C}$ is $T \approx 2^{k}$, which implies that this fraction cannot be much larger than $2^{-k}$. Since $\ell \gg k$ and $r \in \{0,1\}^\ell$ is chosen uniformly at random, this is even true with preprocessing that is allowed to depend on $H$ and $\mathcal{C}$ (but not $r$), and in particular it is true even if we know a solution to $\mathcal{C}$ in advance.
  • This is a very nice solution. The only place where I see a possibility of improvement is condition (2). For many problems in $NP$, there are algorithms in $c^n$ time for some $c<2$. It would be nice if something like this could be preserved, but I'm not sure if it can be done. Surely your method is superior already to the ones used currently for cryptocurrencies! – domotorp Jan 02 '18 at 20:02
  • In fact, maybe not even much needs to be changed in the blockchain. Just the community can agree that at some given time an $x$ needs to be appended to the blockchain whose hash solves whichever practical problem. In fact, maybe the standard blockchain can continue, and this could just be an independent, solo challenge. Possibly on the market such a solo instance will be worth more than traditional coins, just like Rogue One is better than sw7 or sw8. – domotorp Jan 02 '18 at 20:09
  • Glad you like it :). I just want to clarify that while my conditions on $\mathcal{C}$ do imply that "brute-force search over some search space is essentially optimal," they do not imply that brute-force search over the original search space is essentially optimal. E.g., for SAT, this is not the same as requiring the fastest algorithm to run in $2^n$ time. – Noah Stephens-Davidowitz Jan 02 '18 at 20:31
  • In case of composition -for example the computational problem admits a problem definition in which the computational problem can be composed of smaller problems, whose solution is easier, and there is a solution which is not based on composition, would non-amortizability account for this? – user3483902 Jan 03 '18 at 07:12
  • I think another issue with this solution is what you've pointed out in a comment to my question, namely, that if someone can preprocess $\mathcal C$ in an efficient way, they can get a serious advantage. I think this is quite a sensitive issue. Imagine that I submit a problem whose solution (in a standard format) can be checked in $n$ time, but I have a secret method to check it in $\sqrt n$ time. This gives me quite an advantage for solving $\Psi(\mathcal C)$. – domotorp Jan 03 '18 at 13:37
  • @user3483902 I'm sorry, but I don't understand your question. – Noah Stephens-Davidowitz Jan 03 '18 at 13:58
  • @domotorp I guess it depends what you mean by "quite an advantage". By assumption, the fastest algorithm for $\Psi(\mathcal{C})$ runs in superpolynomial time $T \cdot \mathrm{poly}(n)$ and you're describing an advantage in the polynomial factor. – Noah Stephens-Davidowitz Jan 03 '18 at 14:08
  • From my experience in this area you should make more and more restrictions to the task to eliminate shortcuts. That’s why I moved from the problem from a single knot to a set of knots . – Joshua Herman Jan 03 '18 at 15:42
  • If this is to be implemented in practice, then even a constant factor advantage might make a difference - we're talking money, so people won't like it if someone can mine faster. But this condition might be quite hard to satisfy. – domotorp Jan 03 '18 at 17:01
  • @domotorp Yes, that seems like it would be very hard at this level of generality... – Noah Stephens-Davidowitz Jan 03 '18 at 17:25
  • To elaborate, for any construction (not necessarily the one above) one of these things has to be true: (1) there is a publicly known algorithm for $\Psi(\mathcal{C})$ that runs $\alpha$ times faster than the algorithm that brute-force searches a solution to $\Psi(\mathcal{C})$, converts it to $\mathcal{C}$, and checks it; (2) the publicly known algorithm for checking solutions to $\Psi(\mathcal{C})$ cannot be improved by a factor of $\alpha$; or (3) there exists specialized knowledge that gives some people an advantage. If there exists $C$ for which either (1) or (2) is false, then (3) is true – Noah Stephens-Davidowitz Jan 03 '18 at 17:36
  • My problem is with the "checks it" part of $\mathcal C$. For example, if we want to find a solution for a CNF, then if it has a clause of the form $(x_1\vee x_2\vee x_3)$, we can skip solution candidates that start with $3$ 'falses' - this is a $7/8$ times faster algorithm than checking all solutions. – domotorp Jan 03 '18 at 19:27
  • @domotorp Yes. What I was trying to say above is that this issue is more-or-less inherent. If someone can test a possible solution to $\mathcal{C}$ faster, then they will necessarily have an advantage unless computing $\Psi(\mathcal{C})$ is significantly easier than the brute-force approach. Perhaps we can find a construction in which $\Psi(\mathcal{C})$ is in fact easier than brute force, which would in some sense handle exactly the cases that the above construction does not. – Noah Stephens-Davidowitz Jan 03 '18 at 20:47
  • Would there be a way for an adversary to increase the time it required to check by creating a pathological answer or question that would have a large constant factor to check? – Joshua Herman Jan 03 '18 at 20:48
  • @JoshuaHerman I guess that's part of the reason that we want $\mathcal{C} \in \mathsf{TFNP}$, to guarantee that every instance has a short answer that is efficiently checkable. Of course, as OP pointed out above, the specific details matter, so we may want to restrict our attention to problems in which the average-case answer is about as long and hard to check as the worst-case. I'm not sure how big of a problem this is, though it does seem to me that many/most "natural" TFNP problems don't have "unusually hard instances," so easy instances might be more of an issue than hard ones. – Noah Stephens-Davidowitz Jan 03 '18 at 21:02
  • I would suggest instead of TFNP you would use problems in #P and make the counting test run in P . Gaming one arbitrary problem in TFNP may work. But, having a counting problem for you to game would at least make you have to game at least #P queries. Then if you see an issue you just increase the counting problem size. (Like set an arbitrary cap on the number of solutions you have to count and if you have an issue up the cap.) – Joshua Herman Jan 04 '18 at 22:45
1

The following simple technique which I call the solution lottery technique (SLT) can be used in conjunction with other techniques (such as having multiple POW problems, the technique mentioned in Noah Stephens-Davidowitz's answer, etc) to help transform computational challenges into viable proof of work problems. The SLT helps ameliorate issues with cryptocurrency mining problems other than conditions 1-4.

Suppose that $\mathcal{C}$ is a computational challenge of the form “find a suitable hash $k$ along with a string $x$ such that $(k,x)\in D$.”

Problem $\Psi(\mathcal{C})$ setup: Suppose that $D$ is a set, $H$ is a cryptographic hash function, and $C$ is some constant. Suppose furthermore that $\textrm{Data}(k,x)$ is a piece of information that is easy to obtain after one determines that $(k,x)\in D$ but which cannot be obtained otherwise.

Problem $\Psi(\mathcal{C})$ objective: Find a pair $(k,x)$ such that $k$ is a suitable hash and where $(k,x)\in D$, and where $H(k||x||\textrm{Data}(k,x))<C$.
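A minimal sketch of this setup (Python; the set $D$, the function $\textrm{Data}$, and all constants are invented toy stand-ins, with SHA-256 as $H$):

```python
import hashlib
from itertools import count

P = 1009                 # modulus for the toy underlying challenge
THRESHOLD = 1 << 248     # C: tunable lottery threshold (hashes are 256 bits)

def in_D(k: int, x: int) -> bool:
    """Toy membership test: (k, x) is in D iff x is k's inverse mod P."""
    return (k * x) % P == 1

def data(k: int, x: int) -> bytes:
    """Data(k, x): in this toy it is trivially computable; in a real
    instantiation it would only be obtainable after solving the challenge."""
    return ((k + x) % P).to_bytes(2, "big")

def H(k: int, x: int) -> int:
    """Hash of k || x || Data(k, x)."""
    payload = k.to_bytes(8, "big") + x.to_bytes(8, "big") + data(k, x)
    return int.from_bytes(hashlib.sha256(payload).digest(), "big")

def mine():
    """Find (k, x) in D whose hash also clears the lottery threshold."""
    for k in count(2):
        for x in range(1, P):
            if in_D(k, x) and H(k, x) < THRESHOLD:
                return k, x

k, x = mine()
assert in_D(k, x) and H(k, x) < THRESHOLD
```

With this threshold, each member of $D$ wins the lottery with probability $2^{248}/2^{256} = 1/256$, so a miner must find about $256$ members of $D$ in expectation before winning; the hash is only ever evaluated on members of $D$.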

Let us now investigate how problem $\Psi(\mathcal{C})$ satisfies requirements 1-4.

  1. We have to assume $\mathcal{C}$ is already randomized for the SLT to satisfy this property.

2-3. $\Psi(\mathcal{C})$ will typically be more difficult than $\mathcal{C}$, and this is a good thing. The difficulty of a proof-of-work problem needs to be finely tunable, but the original problem $\mathcal{C}$ may or may not have a finely tunable level of difficulty (recall that the difficulty of mining Bitcoin is adjusted every two weeks). The difficulty of problem $\Psi(\mathcal{C})$ equals the difficulty of finding some suitable $(k,x)\in D$ multiplied by $\frac{2^{n}}{C}$, where $n$ is the output length of $H$ in bits. Therefore, since the constant $C$ is finely tunable, the difficulty of $\Psi(\mathcal{C})$ is also finely tunable.

Even though the problem $\Psi(\mathcal{C})$ is more difficult than the original problem $\mathcal{C}$, almost all of the work for solving $\Psi(\mathcal{C})$ is spent on simply finding a pair $(k,x)$ with $(k,x)\in D$ rather than on computing hashes (one cannot test whether $H(k||x||\textrm{Data}(k,x))<C$ until one has computed $\textrm{Data}(k,x)$, and one cannot compute $\textrm{Data}(k,x)$ without first verifying that $(k,x)\in D$).
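To make the tunability concrete, a back-of-the-envelope sketch (Python; the retargeting rule is merely modeled on Bitcoin-style difficulty adjustment, and all numbers are illustrative):

```python
N_BITS = 256  # n: output length of the hash H in bits

def expected_work(work_per_D_solution: float, threshold: int) -> float:
    """Expected total work for Psi(C): the work to find one (k, x) in D,
    multiplied by 2^n / C."""
    return work_per_D_solution * (2 ** N_BITS) / threshold

def retarget(threshold: int, observed_time: float, target_time: float) -> int:
    """Scale the threshold C to steer the observed solve time toward target."""
    return int(threshold * observed_time / target_time)

# Doubling the threshold halves the expected work:
assert expected_work(1.0, 1 << 200) == 2 * expected_work(1.0, 1 << 201)

# Blocks arrived twice as fast as desired -> halve the win probability:
assert retarget(1 << 200, observed_time=300.0, target_time=600.0) == 1 << 199
```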

Of course, the fact that $\Psi(\mathcal{C})$ is more difficult than $\mathcal{C}$ presents some new concerns. For a useful problem, one would most likely want to store all pairs $(k,x)$ with $(k,x)\in D$ in some database. However, in order to receive the block reward, the miner need only reveal a single pair $(k,x)$ with $(k,x)\in D$ and $H(k||x||\textrm{Data}(k,x))<C$, rather than all the pairs $(k,x)\in D$ found along the way. One possible solution to this problem is for the miners to simply reveal all pairs $(k,x)\in D$ out of courtesy. Miners would also have the ability to reject chains whose miners have not posted their fair share of pairs $(k,x)\in D$. Perhaps one should also count the number of posted pairs $(k,x)\in D$ when calculating who has the longest valid chain. If most miners post their solutions, then the process of solving $\Psi(\mathcal{C})$ will produce just as many solutions as the process of solving $\mathcal{C}$.

In the scenario where the miners post all of the pairs $(k,x)\in D$, $\Psi(\mathcal{C})$ would satisfy the spirit of conditions 2-3.

  4. $\Psi(\mathcal{C})$ may or may not satisfy condition $4$ depending on the specific problem.

**Other advantages of this technique:**

The SLT offers advantages beyond conditions 1-4 that are desirable or even necessary for a proof-of-work problem.

  1. Improving the security/efficiency balance: The SLT will help in the case that $\mathcal{C}$ may be too easy to solve or too difficult to verify. In general, $\Psi(\mathcal{C})$ is much more difficult to solve than $\mathcal{C}$, but $\Psi(\mathcal{C})$ is about as easy to verify as $\mathcal{C}$.

  2. Removal of a broken/insecure problem: The SLT could be used to algorithmically remove bad POW problems from a cryptocurrency that uses multiple POW problems together with a backup POW problem. Suppose that some entity finds a very quick algorithm for solving problem $\mathcal{C}$. Then $\mathcal{C}$ is no longer a suitable proof-of-work problem, and it should be removed from the cryptocurrency. The cryptocurrency must therefore have an algorithm that removes $\mathcal{C}$ whenever someone posts an algorithm that solves $\mathcal{C}$ too quickly, but which never removes $\mathcal{C}$ otherwise. Here is an outline of such a problem-removal algorithm being used to remove a problem which we shall call Problem $A$.

a. Alice pays a large fee (the fee will cover the costs that the miners incur when verifying the algorithm) and then posts to the blockchain the algorithm, which we shall call Algorithm K, that breaks Problem $A$. If Algorithm K relies upon a large quantity of pre-computed data $PC$, then Alice also posts the Merkle root of this pre-computed data $PC$.

b. Random instances of Problem $A$ are produced by the blockchain. Alice then posts the portions of the pre-computed data that Algorithm K needs to work correctly, along with their Merkle branches, in order to prove that the data actually came from $PC$. If Alice's algorithm, fed with this pre-computed data, solves the random instances quickly, then the problem is removed and Alice receives a reward for posting the algorithm that removed the problem from the blockchain.

This problem-removal procedure is computationally expensive for the miners and validators. However, the SLT removes most of the computational burden of this technique, so that it can be used if needed in a cryptocurrency (instances in which this technique is used will probably be quite rare).
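Step (b) relies on standard Merkle proofs; here is a self-contained sketch (Python, with SHA-256, duplicating the last node on odd levels as Bitcoin does; all helper names are invented):

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    """Root of the tree over the pre-computed data chunks."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_branch(leaves, index):
    """Sibling hashes proving leaves[index] lies under the root."""
    level = [h(leaf) for leaf in leaves]
    branch = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        branch.append((level[index ^ 1], index % 2 == 1))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return branch

def verify(leaf, branch, root):
    """Check one chunk against the posted Merkle root using its branch."""
    node = h(leaf)
    for sibling, sibling_is_left in branch:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

chunks = [b"pc-chunk-%d" % i for i in range(5)]  # pieces of PC
root = merkle_root(chunks)                       # what Alice posts first
proof = merkle_branch(chunks, 3)                 # posted alongside chunk 3
assert verify(chunks[3], proof, root)
assert not verify(b"tampered", proof, root)
```

The miners thus never need the whole of $PC$: each posted chunk is checked against the root with a logarithmic number of hashes.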

  3. Mining pools are more feasible: In cryptocurrencies, it is often very difficult to win the block reward. Since block rewards are so hard to win, miners often join mining pools, in which they combine their resources to solve a problem and share the block reward in proportion to the number of "near misses" they have found. A possible issue for $\mathcal{C}$ is that it may be difficult to produce a qualitative notion of what constitutes a "near miss" for the problem $\mathcal{C}$, and the algorithm for finding near misses may differ from the algorithm for solving $\mathcal{C}$. Since pool miners will be looking for near misses, they may not be very efficient at solving $\mathcal{C}$ (and hence few people will join mining pools). For $\Psi(\mathcal{C})$, however, there is a clear-cut notion of a near miss, namely a pair $(k,x)$ where $(k,x)\in D$ but $H(k||x||\textrm{Data}(k,x))\geq C$, and the algorithm for finding near misses for $\Psi(\mathcal{C})$ is the same as the algorithm for finding solutions to $\Psi(\mathcal{C})$.

  4. Progress freeness: A proof-of-work problem $P$ is said to be progress free if the time it takes an entity (or group of entities) to find the next block on the blockchain follows an exponential distribution $e^{-\lambda x}$, where the constant $\lambda$ is directly proportional to the computational power that entity devotes to solving Problem $P$. Progress freeness is required of cryptocurrency mining problems so that miners receive block rewards in proportion to their mining power, which is needed for decentralization. The SLT certainly helps mining problems achieve progress freeness.
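The memoryless behaviour behind progress freeness can be sanity-checked with a tiny simulation (Python; modeling each attempt as an independent success with fixed probability is an idealization, and all parameters are invented):

```python
import random

def mean_block_time(hash_rate: int, p: float = 0.005, trials: int = 500) -> float:
    """Average number of time steps until a block is found, where each of
    `hash_rate` attempts per step independently succeeds with probability p."""
    random.seed(0)
    total = 0
    for _ in range(trials):
        steps = 0
        while True:
            steps += 1
            if any(random.random() < p for _ in range(hash_rate)):
                break
        total += steps
    return total / trials

slow = mean_block_time(hash_rate=1)   # expectation ~ 1/p = 200 steps
fast = mean_block_time(hash_rate=4)   # expectation ~ 1/(4p) = 50 steps
assert fast < slow                    # more power => proportionally faster
```

Because each attempt is independent of the past, the waiting time is (approximately) exponentially distributed, and quadrupling the hash rate cuts the mean block time roughly fourfold.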

Joseph Van Name