Novel constructions for Proof-of-Replication #4
Comments
It seems that the given PoRep constructions support only static data. What about updates?
@mohammatt, currently you would have to send the updated data (or the diffs) to the prover, both of you must agree on the hash of the data, and you must re-perform the seal. It would be interesting to look into PoRep with a more efficient data update!
Sorry to insist on that question, but what prevents a peer A, or a group of colluding peers, with zero storage from pretending to store data D just by sending the PoRep setup for each of them to a single entity B, so that B does the job and stores the replicas of data D, then gets the challenges from A (and others) and replies back via A (and others)? B will indeed be storing n non-deduplicated replicas of data D, but those that are pretending to store data D are still not the right ones. The incentive to do this might be that it's cheaper/easier to store in one big storage than in n different ones, or to hide the real location of some content.
Hey thanks for your question! Say that A and B both promise you to store your file F1. Instead of storing it themselves, they give it to C. Now, in order to reply to PoRep challenges, A and B proxy the challenges to C, get their respective proofs, and send them to you. Now:
So I am correct, and I think this is a serious issue, because A and B (in your example) are sybils. Indeed the incentive does not look completely obvious, and C is still storing two copies. Now, I gave two examples and people will probably find others: A and B could just be home users, cheating and in addition slowing down the network, and from a legal standpoint illegal content could be deemed to be stored by A and B while C is storing it. Proof of retrieval might not solve this; proof of location would, but how to do it? Maybe as I sketched already, even if it is kind of brutal (and maybe trivial, but I don't see other means for now): periodically and sequentially overload one of the provers until it starts failing the challenges, then challenge the others; those that fail the test are cheating and get banned.
I agree with @Ayms, PoRep does not prevent outsourcing the outsourced data.
If A and B are asked to store the same file, they can outsource to C, but C has to store the same file twice, since A and B must be able to prove that they have independent physical copies. Outsourcing is not prevented in the sense that you cannot outsource your data, but it is prevented in the sense that you cannot rely on someone else storing the data that you should be storing. In other words, if A and B are both meant to store the same file, then at any point in time there must be two copies in the network; A cannot just rely on the copy that B has. As long as A and B have a way to store their own copies, they are not cheating!
@nicola, I think we are talking about different things. What you are saying can be achieved using different replicas/encodings. I.e., whether A and B are storing the data or moving it onto C, there are always two (or whatever number is agreed on) copies. My question is about the "physically independent copies". Does it mean that the two copies are guaranteed to be stored on different machines?
@mohammatt this looks like a problem of lack of shared definitions. I hope this helps. From @nicola's comments above:
"Physically independent copies" means that every bit of each copy is physically stored on hardware somewhere. PoRep does not ensure that the bits of both copies are kept on units of hardware that satisfy any separation requirements (e.g. different disks, different disks controlled by separate machines). @nicola refers to this separate problem above as "making sure that your copies are on two different locations." This concern is certainly useful to think about, but it is separate from the PoRep abstraction's goals. Yes, you can "outsource" the data in the sense that a single machine A can store replicas 1 and 2. No, you cannot outsource the data in the sense @nicola defines in the first comment, because machine A cannot store a single copy, say 1, and be able to prove that it is storing copies 1 and 2. --edit--
???, no, the outsourcing issue can have a cascading effect: here A (1) is better C who delegated 2 (from B) to D, then C can prove that it is storing 1 (from himself) and 2 (from D) when A (1) and/or B (2) get challenged.
@mohammatt states that we can't know if A is storing on different devices or just on one that might crash, which indeed is misleading in terms of wording in the specs and should be made clearer. You should also clearly highlight my concern or put it as a problem to solve.
If we take IPFS alone, this is not an issue: what we want is just to retrieve the pieces, and we don't care if the peers are delegating storage, using only one device, or whatever they do. Now, IPFS alone might not fly: all P2P networks failed because of the lack of incentive for people to run peers/nodes, except BitTorrent, where the incentive is mostly to download "illegal" copyrighted content and where everybody freerides, but not completely in fact, so the network succeeds in sustaining itself.
That's why there is Filecoin: now the peers are rewarded for storing data. From my standpoint, if they store everything on only one device that crashes without backup, then it is their problem, and we can expect that others are storing the content too. But if we can't be sure about who is storing what, then that's a different story, because sybils will be rewarded for doing almost nothing, and that is incoherent with the Filecoin concept of rewarding/paying those that are storing.
I gave a fraudulent case off this thread; a simple one is the same as above: a group colluding to buy more storage from only one server and sharing the lower costs compared to having several servers (btw this extends @mohammatt's remark too: everything is most likely stored on one physical device only, maybe RAID-duplicated, but still on one central machine...), a bit like a mining pool but much worse, since members just do nothing.
If this can't be considered a major issue, then maybe we could wonder why this was one of my first thoughts, and just hope that I am the only one in the world to have such bad intents.
Hey thanks for re-iterating, but I don't understand your attack, I am sorry :(
So Proof-of-Replication guarantees that at any point in time, if A and B promised to find a way to store 2 copies of your data, then there must be two copies sitting around somewhere or A and B will lose their collateral. In this setting, there is no way A and B can lie about the total amount of storage that they can get access to (either their own, found for free somewhere, or obtained by paying someone).
Now, what you are saying is correct: A and B can go to Amazon and pay Amazon to store the files they are meant to store (C=Amazon). However, in this case they need to pay Amazon, so they can rationally pursue this strategy only if they make a profit, meaning that the cost of storage in Filecoin is higher than on Amazon. I hope this clarifies!
Apparently it's difficult for the team to admit the issue and you are playing with words, while you already admitted it in the previous posts. "same" below = same person, or colluding ones under the control of one "same"
Too bad :-)
Please see below
Yes, then please, you guys, stop changing what A, B and C are doing in every post, or stop presenting irrelevant situations for them
Doing nothing === just relaying the data, which indeed is doing nothing in terms of storage. I thought that this was implicit, since this is what we are talking about. A, B and C are the same, it's not supposed to be difficult to understand
Yes, this is the umpteenth time that this is repeated here, this is all that PoRep does: ensuring non-deduplicated storage by "we don't know whom"
"physically independent copies" is unclear, cf above posts (not from myself)
Please see above, they are the same again, if one server is not enough you can just add others
Of course
This is an "image" of the fraudulent case that I sent off this thread. Funny that you mention it: do you really think that A and B, hiding behind an anonymizer network, would store on Amazon? But indeed the incentive to store on Amazon vs Filecoin rewards might be unlikely, probably as unlikely as storing alone, so back to the mining pool collusion
No, but as already suggested you can just mention it clearly as a non-issue in the specs, so everybody is aware of it. Are you sure that you guys know how things work in reality, or are you just playing here and trying to feign/elude the problem?
Closing issue. Current discussions better routed to https://github.com/protocol/cryptonetlab
Still outstanding to see that anonymity is still nowhere in your roadmap/issues and/or Web 3 stuff
Work in Progress: This is a work in progress. For comments and suggestions contact us at research@protocol.ai
We as Protocol Labs actively support these areas of research with grants, bounties and direct collaborations. We plan to fund research related to these open problems. Reach out if you want to work on or are working on these problems.
Proof-of-Replication
Introduction
This document presents Proof-of-Replication (PoRep), motivates its use cases, provides a definition and outlines the list of related open problems. For a more formal presentation of PoRep schemes, please refer to our Technical Report on Proof-of-Replication.
Motivation
Centralized Cloud. The current cloud infrastructure operates in a centralized setting: the cloud provider is trusted by the client to offer their service correctly, for example storing data or performing computation. The centralized setting makes it difficult for a client to spot malfunctions and malicious behavior. This leads to a cloud infrastructure where only trusted providers participate.
Untrusted Cloud. In the past decade there has been extensive work on delegating computation and storage in an untrusted cloud setting. If the cloud provider's operations were verifiable, then clients would not need to trust the providers. The intuition is that a provider must generate a proof for their service that a client can verify. In a naive scenario, a client could store the data on a cloud storage provider and challenge the provider to serve the entire data back in order to verify that the storage provider is still storing the data. However, this straw-man solution would be impractical for standard cloud usage. Practical solutions rely on cryptographic assumptions, where the storage provider generates a short cryptographic proof that the client can cheaply verify. There are several cryptographic proof systems for proving storage; we refer in particular to Proof-of-Retrievability (PoR) and Proof-of-Data-Possession (PDP).
Decentralized Storage Networks. Decentralized Storage Networks are peer-to-peer systems where nodes (either altruistically or with incentives) store data of other peers. Similar to the untrusted cloud setting, client nodes verify cryptographic proofs rather than relying on trust to ensure that other nodes are storing their data.
Proof-of-Replication. Current schemes only guarantee that the storage provider had the file at the time the proof was generated. However, this does not guarantee that the file was stored through time and that some storage was dedicated to that particular file. In addition, in systems where providers are rewarded proportionally to their storage (e.g. the mining reward in Filecoin), existing schemes cannot be used, since storage providers can lie about their storage. Proof-of-Replication is a novel proof system in which storage providers cannot lie about storing files, since they would have to store a unique physical permutation (which we refer to as a replica) for each file or copy they claim to store.
Note: For more detailed use cases for Proof-of-Replication, please refer to our Technical Report (Section 1.3).
Problems with current PoR and PDP schemes
State-of-the-art Proof-of-Retrievability (PoR) and Proof-of-Data-Possession (PDP) schemes used today in verifiable cloud storage and in decentralized storage networks only guarantee that a provider had possession of the data at the time of the challenge/response interaction, and they are subject to the following three attacks:
Note: For a more detailed survey of the different Proofs-of-Storage, please refer to our Technical Report (Section 1.2).
Problem Definition
Proof-of-Replication allows a storage provider to convince a user that some data D has been replicated to its own unique storage. This definition is stricter than PoR and PDP, since it forces the prover to store a unique copy of the file.
Definition
Proof-of-Replication enables a prover P to convince a verifier V that P is storing a replica R, a physically independent copy of some data D, unique to P. The scheme is defined by a tuple of polynomial-time algorithms (Setup, Prove, Verify). The assumption is that a replica must be difficult (if not impossible) to generate after Setup.
Note: For a formal definition of Proof-of-Replication, please refer to our Technical Report (Section 2).
Time-bounded Proof-of-Replication (Visual)
Timing assumption. Time-bounded Proof-of-Replication schemes are constructions of PoRep with timing assumptions. The assumption is that generating the replica (hence the Setup) takes some time t that is substantially larger than the time it takes to produce a proof (hence time(Prove)) plus the round-trip time (RTT) for sending a challenge and receiving a proof.
Distinguishing Malicious provers. A malicious prover that does not have R must obtain it (or generate it) before the Prove step. A verifier can distinguish an honest prover from a malicious prover, since the malicious one will take too long to answer the challenge. A verifier will reject if receiving the proof from the prover takes longer than a timeout (bounded between proving time and sealing time).
The following figure shows the different steps of the Proof of Replication on a timeline. We refer to the process of generating a replica as "sealing".
Note: For a formal definition of Time-bounded Proof-of-Replication, please refer to our Technical Report (Section 2.1).
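The timing argument can be sketched with a few lines of Python. This is only an illustration under stated assumptions: the function names `pick_timeout` and `verifier_accepts`, the choice of the midpoint as the timeout, and the example durations are hypothetical and do not come from the Technical Report.

```python
def pick_timeout(prove_s: float, rtt_s: float, seal_s: float) -> float:
    """Pick a timeout between the honest response time and the sealing time.

    An honest prover needs roughly prove_s + rtt_s; a prover that must
    regenerate the replica first needs at least seal_s on top of that.
    """
    honest_s = prove_s + rtt_s
    if honest_s >= seal_s:
        raise ValueError("timing assumption violated: sealing is not slow enough")
    # Any value in (honest_s, seal_s) works; the midpoint is an arbitrary choice.
    return (honest_s + seal_s) / 2


def verifier_accepts(elapsed_s: float, timeout_s: float, proof_valid: bool) -> bool:
    """Accept only a valid proof that arrives within the timeout."""
    return proof_valid and elapsed_s <= timeout_s


# Hypothetical durations in seconds: proving 1 s, RTT 0.5 s, sealing 1 hour.
PROVE_S, RTT_S, SEAL_S = 1.0, 0.5, 3600.0
timeout = pick_timeout(PROVE_S, RTT_S, SEAL_S)

honest_elapsed = PROVE_S + RTT_S              # replica already stored
malicious_elapsed = SEAL_S + PROVE_S + RTT_S  # must seal again before proving
```

With these example numbers the honest prover answers well within the timeout, while the prover that has to seal again produces a valid proof too late and is rejected.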
Beyond Time-bounded Proof-of-Replication
There exist other Proof-of-Replication schemes that do not rely on timing assumptions but instead on trust assumptions. For example, the replica could be generated via a multi-party computation, with the assumption that the involved parties will not all collude in the future.
Open Problems
While there are some candidate constructions of Proof-of-Replication, there is still room for improvement in making such constructions more practical. An ideal PoRep construction should have as many of the following properties as possible.
Desirable properties
Concrete Open Problems
Useful Readings