This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Spec refining: Support of IPLD pointers as links #3

Closed
nicola opened this issue Jul 25, 2016 · 20 comments

@nicola
Member

nicola commented Jul 25, 2016

I define the following terms:

IPLD pointers (HASH/path):

  • hash/object pointer HASH
  • plus an attribute pointer (e.g. /friends/0/name)

IPLD links: {'/': HASH}

As far as I remember, current implementations only support links that are plain hash pointers. However, IPLD links should support full IPLD pointers, so that things like this can happen:

hashNicola
{
  name: {
    'first': 'Nicola',
    'last': 'Greco'
  }
}
hashNicola2
{
  fullname: {'/': 'hashNicola/name' }
}

hashNicola2/fullname/first === 'Nicola'
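
A minimal sketch of how such a link could be resolved, assuming an in-memory block store keyed by the placeholder hashes above. The names `STORE`, `is_link`, and `resolve` are hypothetical and not part of any IPLD library:

```python
# Hypothetical in-memory block store, keyed by the placeholder hashes
# used in the example above (not real CIDs).
STORE = {
    "hashNicola": {"name": {"first": "Nicola", "last": "Greco"}},
    "hashNicola2": {"fullname": {"/": "hashNicola/name"}},
}


def is_link(node):
    """Treat a map whose only key is '/' as an IPLD link."""
    return isinstance(node, dict) and set(node) == {"/"}


def resolve(path, store=STORE):
    """Resolve 'HASH/seg/seg...' and follow any link met on the way."""
    root, *segments = path.split("/")
    node = store[root]
    while True:
        # A link may itself carry a path; splice that path in front of
        # whatever segments are still left to walk.
        while is_link(node):
            target_root, *target_segments = node["/"].split("/")
            node = store[target_root]
            segments = target_segments + segments
        if not segments:
            return node
        key, *segments = segments
        node = node[int(key)] if isinstance(node, list) else node[key]


print(resolve("hashNicola2/fullname/first"))
```

The point of the sketch is only that a link value of the form `HASH/path` can be handled by prepending the link's own path to the remaining segments.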

cc @jbenet, @mildred, @dignifiedquire

@dignifiedquire
Member

I am 👍 on adding this to the libraries and making it explicit in the spec.

@jbenet
Contributor

jbenet commented Aug 6, 2016

  • Yeah, this should be the case. 👍 to this.
  • It's necessary for full addressability.
  • I meant the current spec to enable this, but this should be explained better.

@jbenet
Contributor

jbenet commented Aug 6, 2016

  • Why the new abstraction "IPLD Pointer"? it seems exactly the same as an "IPLD Path" to me. Seems superfluous. maybe i'm missing something.

@nicola
Member Author

nicola commented Aug 7, 2016

Perfect, it sounds like we are approving this! 🎉

I have called them pointers to have consistent wording in this issue (I am trying out this new term). The way I see it, a path starts after the CID (I treat the CID as a sort of origin). "Path" is still a valid (and preferred) name.

@jbenet
Contributor

jbenet commented Aug 7, 2016

We've used structures like "/ipfs/<hash>/a/b/c" as IPFS Paths for a long
time. We've used "/ipld/<hash>/a/b/c" to be IPLD Paths for months. The path
is the whole thing. This is important for treating these "fs-IRIs" as just
unix paths.

I do not think we gain much from IPLD Pointers vs the existing notion of
IPLD Paths.

Every name we add causes friction and baggage for newcomers. Since they're
very expensive, abstractions should prove to have substantial winnings to
remain in our model.

@nicola
Member Author

nicola commented Aug 7, 2016

🎉 Let's call them IPLD paths then, I will update the rest of my writings soon

@jakobvarmose

jakobvarmose commented Aug 22, 2016

Will this allow paths to traverse multiple objects? I.e. is this object valid:

hashTest:
{test: {"/":"hashNicola2/fullname/first"}}

This would add a lot of complexity because each of the path components could potentially also resolve through multiple other objects. So essentially, path resolving is no longer a constant-time operation but may resolve a potentially infinite number of objects.

So I think only links in their canonical format should be allowed within objects:

hashTest:
{test: {"/":"hashNicola/name/first"}}
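
A rough sketch of what "canonical format" could mean in practice: rewrite a path so that it starts at the block that actually holds the value, which leaves no link to cross. The store contents come from the examples in this thread; `canonicalize` and the other names are hypothetical, and list indices are left out for brevity:

```python
# Placeholder store reusing the objects from earlier in this thread.
STORE = {
    "hashNicola": {"name": {"first": "Nicola", "last": "Greco"}},
    "hashNicola2": {"fullname": {"/": "hashNicola/name"}},
}


def is_link(node):
    """Treat a map whose only key is '/' as an IPLD link."""
    return isinstance(node, dict) and set(node) == {"/"}


def canonicalize(path, store=STORE):
    """Rewrite a multi-hop path as an equivalent single-block path."""
    root, *pending = path.split("/")
    node, canonical = store[root], [root]
    while True:
        while is_link(node):
            # Crossing a link: restart the canonical path at its target.
            root, *prefix = node["/"].split("/")
            node, canonical = store[root], [root]
            pending = prefix + pending
        if not pending:
            return "/".join(canonical)
        key, *pending = pending
        node = node[key]
        canonical.append(key)


print(canonicalize("hashNicola2/fullname/first"))
```

Under this reading, a writer that only emits canonical links would run something like `canonicalize` once at write time, so readers never need more than one block fetch per link.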

@nicola
Member Author

nicola commented Aug 22, 2016

Hey, thanks for chipping in!

potentially infinite number of objects

it would not be infinite (it can't!)

I do see your concern about making IPLD more complex than just pointing to simple hashes (or to a hash plus the path of an attribute within that object itself), and so wanting to avoid paths that work across objects.

However, I don't see why this would be a problem. IPLD should abstract away the resolving of a hash; in this particular case, we would need to resolve two hashes. I guess it would be equivalent to linking to a hash that links to another hash and doing the hop yourself.

A great idea I had from thinking about this is that we could layer different levels of IPLD:

  • IPLD-level-0: links are only hashes
  • IPLD-level-1: links are hashes + paths that can be resolved in the object itself
  • IPLD-level-2: links are hashes + path that can have multiple hops to be resolved

One of the original purposes of IPLD paths was IPLD-level-2. Imagine having a Merkle tree where we want to point to the first leaf: I could just write hash/left/left. This kind of pointing would not be possible without level-2.

Although it is great to have these layers in mind, I think it would be rather complex to distinguish among them, and since parsers become only slightly more complex, supporting all of them would be trivial. In this case, however, we turn resolving into O(n), n being the number of hops across objects. If that is the number of hops the data requires, there is little we can do (so for time-critical applications it might be a good idea to structure your data as IPLD-level-0/1).
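
To make the three levels concrete, here is a small sketch that classifies a link target against a toy Merkle tree. The store layout and the `link_level` helper are assumptions made for illustration only:

```python
# Toy Merkle tree using placeholder hashes; each inner node links to
# its children by hash only.
STORE = {
    "hashRoot": {"left": {"/": "hashL"}, "right": {"/": "hashR"}},
    "hashL": {"left": {"/": "hashLL"}, "right": {"/": "hashLR"}},
    "hashLL": {"data": "leaf-0"},
}


def is_link(node):
    """Treat a map whose only key is '/' as an IPLD link."""
    return isinstance(node, dict) and set(node) == {"/"}


def link_level(target, store=STORE):
    """Return 0, 1 or 2 following the IPLD-level naming above."""
    root, *segments = target.split("/")
    if not segments:
        return 0  # level-0: the link is only a hash
    node = store[root]
    for index, key in enumerate(segments):
        node = node[key]
        # Meeting a link while segments remain means the path has to
        # hop into another object: that is level-2.
        if is_link(node) and index < len(segments) - 1:
            return 2
    return 1  # level-1: the path stays inside the linked object


print(link_level("hashRoot"), link_level("hashLL/data"),
      link_level("hashRoot/left/left"))
```

Note that telling level-1 from level-2 requires loading the target object, which illustrates why distinguishing the levels up front is awkward.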

cc @jbenet

@jakobvarmose

it would not be infinite (it can't!)

Not in the mathematical sense, but a simple-looking path like hashEvil/a could download 1 billion objects (or more).

Example:

hashEvil: {
  a: {"/": "hash1/a"}
}
hash1: {
  a: {"/": "hash2/a"}
}
...
[999999997 objects omitted]
...
hash999999999: {
  a: {"/": "hash1000000000/a"}
}
hash1000000000: {
  a: "someValue"
}
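
A scaled-down sketch of the same chain, counting how many objects a naive resolver has to fetch. The chain length, `build_chain`, and `resolve_counting` are made up for illustration, and `hash0` plays the role of `hashEvil`:

```python
def is_link(node):
    """Treat a map whose only key is '/' as an IPLD link."""
    return isinstance(node, dict) and set(node) == {"/"}


def build_chain(length):
    """Build hash0 -> hash1 -> ... -> hash<length>, linked through 'a'."""
    store = {
        f"hash{i}": {"a": {"/": f"hash{i + 1}/a"}} for i in range(length)
    }
    store[f"hash{length}"] = {"a": "someValue"}
    return store


def resolve_counting(path, store):
    """Resolve a path and report how many objects had to be fetched."""
    root, *segments = path.split("/")
    node, fetched = store[root], 1
    while segments or is_link(node):
        if is_link(node):
            root, *prefix = node["/"].split("/")
            node, fetched = store[root], fetched + 1
            segments = prefix + segments
        else:
            key, *segments = segments
            node = node[key]
    return node, fetched


value, fetched = resolve_counting("hash0/a", build_chain(1000))
print(value, fetched)
```

The path has a single segment, yet the number of fetches grows linearly with the length of the chain behind it.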

@jbenet
Contributor

jbenet commented Aug 23, 2016

Thanks for noting this.

However: Resolvers could have a resolution max. Nothing prevents infinite
loops in DNS, HTTP 302, etc. or just good old regular "a href"s. The
unlimitedness is by design. Note that the same exact thing (downloading a
billion objects) can happen if your graph is just very deep.

This is an implementation detail that's subject to the times. In 20 years,
our limitations will look foolish. Resolvers and retrievers can establish
limits if they wish. The mathematical model will not pull in such
complexity.

Also note that we could have a "resolve paths" operation that transforms a
level2 graph into level0.
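
A sketch of a resolver with a resolution maximum, assuming the limit is expressed as a number of object fetches. `ResolutionLimitExceeded`, `resolve_limited`, `is_resolvable`, and the default of 32 are all hypothetical choices, not taken from any implementation:

```python
class ResolutionLimitExceeded(Exception):
    """Raised when a path needs more fetches than the caller allows."""


def is_link(node):
    """Treat a map whose only key is '/' as an IPLD link."""
    return isinstance(node, dict) and set(node) == {"/"}


def resolve_limited(path, store, max_fetches=32):
    """Resolve a path, giving up after max_fetches object fetches."""
    root, *segments = path.split("/")
    node, fetched = store[root], 1
    while segments or is_link(node):
        if is_link(node):
            if fetched >= max_fetches:
                raise ResolutionLimitExceeded(path)
            root, *prefix = node["/"].split("/")
            node, fetched = store[root], fetched + 1
            segments = prefix + segments
        else:
            key, *segments = segments
            node = node[key]
    return node


def is_resolvable(path, store, max_fetches=32):
    """Convenience wrapper: True if the path resolves within the limit."""
    try:
        resolve_limited(path, store, max_fetches)
        return True
    except ResolutionLimitExceeded:
        return False


# Two objects that link to each other through paths, and a finite pair.
LOOP = {"hashA": {"a": {"/": "hashB/a"}}, "hashB": {"a": {"/": "hashA/a"}}}
FINITE = {"hashA": {"a": {"/": "hashB/a"}}, "hashB": {"a": "someValue"}}

print(is_resolvable("hashA/a", FINITE), is_resolvable("hashA/a", LOOP))
```

The limit lives entirely in the resolver, so the data model itself stays free of any bound, in line with the argument above.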

@Stebalien
Contributor

So, someone has brought up the fact that we haven't fixed this, and we really should. However, we'll have to decide how to encode these in CBOR.

Option 1: Just append it. That is, we could just have CID/path and take advantage of the fact that CIDs are length delimited.
Option 2: Use an array. We'd use the CID tag on an array instead of a byte string. The array would then be populated with the path components. The advantage is that this would allow us to support references (at this point, they aren't really paths) with slashes in them (do we even want to do that?).
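
A sketch of why Option 1 is parseable, assuming a binary CIDv1 laid out as varint version, varint codec, then a multihash (varint hash code, varint digest length, digest). The helper names and the zero-filled digest are placeholders, CIDv0 is not handled, and the array layout shown for Option 2 (CID first, then components) is only one possible reading of the proposal:

```python
def read_varint(buf, pos):
    """Decode an unsigned LEB128 varint; return (value, next_position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        value |= (byte & 0x7F) << shift
        pos += 1
        if byte < 0x80:
            return value, pos
        shift += 7


def split_cid_and_path(buf):
    """Option 1: split 'CID bytes + path bytes' using the CID's own length."""
    _version, pos = read_varint(buf, 0)
    _codec, pos = read_varint(buf, pos)
    _hash_code, pos = read_varint(buf, pos)
    digest_len, pos = read_varint(buf, pos)
    end = pos + digest_len  # the CID stops where the digest stops
    return buf[:end], buf[end:].decode("utf-8")


# Placeholder CIDv1: version 1, dag-cbor (0x71), sha2-256 (0x12),
# 32-byte digest (0x20), followed by a zero-filled stand-in digest.
FAKE_CID = bytes([0x01, 0x71, 0x12, 0x20]) + bytes(32)

cid, path = split_cid_and_path(FAKE_CID + b"/name/first")
print(len(cid), path)

# Option 2, as a plain list before CBOR encoding: components are kept
# separate, so a component may contain a slash without ambiguity.
OPTION_2_EXAMPLE = [FAKE_CID, "a/b", "c"]
```

With Option 1 the path is recovered purely from the CID's self-described length; with Option 2 no splitting on `/` is needed at all.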

@diasdavid ^^

@daviddias
Member

Option 1: Just append it. That is, we could just have CID/path and take advantage of the fact CIDs are length delimited.

👍 I vote for this one

@ghost ghost assigned mikeal May 6, 2019
@ghost ghost added awaiting review status/in-progress In progress and removed status/deferred Conscious decision to pause or backlog labels May 6, 2019
@mikeal
Contributor

mikeal commented May 6, 2019

I’m trying to mentally rebase this against the current stack and I’m running into some problems.

  1. We’d need to add this as another “kind” to the Data Model, which adds another barrier to creating new codecs that support the full Data Model.
  2. What exactly does “path” mean?
    • Is this restricted to a path contained in the same block?
    • If this path could traverse through multiple blocks, is this restricted to pathing at the Data Model, or does it need to be schema aware (in other words, if it refers to data in a HAMT, do we need to interpret it through a collection lookup, or is it already parsed down into a Data Model path)?
  3. Where does validation live? If this refers to data that does not exist, who would be responsible for presenting an error? All of our encoders/decoders operate on a per-block basis, they don’t have any means to validate a link that traverses through many blocks.

I’m also lacking clarity on what the use case here is. I’m just not aware of any particular use case this solves or aware of any place where we can’t do something because we don’t have this.

TBH, this throws a wrench in most of what we’ve built and the direction we’ve taken so I’m inclined to close it for now and return to the concept much later when we tackle some form of “mutable links.”

@Stebalien
Contributor

I believe this is the same as #83.

We’d need to add this as another “kind” to the Data Model, which adds another barrier to creating new codecs that support the full Data Model.

I assume we'd just replace the Cid kind with a more generalized Link or Pointer kind.

Is this restricted to a path contained in the same block?

No.

If this path could traverse through multiple blocks, is this restricted to pathing at the Data Model, or does it need to be schema aware (in other words, if it refers to data in a HAMT, do we need to interpret it through a collection lookup, or is it already parsed down into a Data Model path)?

Data model, I assume. We can resolve non-datamodel links when serializing the DAG.

@mikeal
Contributor

mikeal commented May 6, 2019

I assume we'd just replace the Cid kind with a more generalized Link or Pointer kind.

See, this sort of requires us to tear up everything we’ve built recently. We’ve already built quite a bit on top of the Data Model, so changing it now has much broader implications.

And if this path stretches multiple blocks then it’s a much better fit for the Schema layer anyway. We could create a “Link Type” at the Schema layer that is much more flexible and could accomplish this as well as open the door for extensions down the road for IPNS based links. There’s actually a long list of “more things we need to be able to do with links,” similar to our long list of collections people need, so it’s probably better to do what we did with Maps and split between the simple “kind” that is in the Data Model and the “type” which is extensible and at the Schema layer.

This would also make the implementations easier, since we would only need to implement these once in schemas rather than for every codec that supports the data model.

@Stebalien
Contributor

See, this sort of requires us to tear up everything we’ve built recently. We’ve already built quite a bit on top of the Data Model, so changing it now has much broader implications.

It shouldn't make much of a difference at all. The issue is that "links" can currently only point to blocks. This issue is about allowing those links to point to nodes.

This already came up: https://github.com/ipld/specs/blob/master/REQUIREMENTS.md#linked.

@mikeal
Contributor

mikeal commented May 6, 2019

Perhaps in your conceptual model this doesn’t break anything, but in the actual code we’ve written most things break. Just one example: every IPLD codec in JS returns instances of CID for links.

The number of assumptions that “Link === CID” in code depending on the current libraries is also quite large. AFAIK every piece of code that even handles links makes this assumption.

Changing the link kind definition is a substantial breaking change, even just adding this as another kind in the data model is a substantial amount of new code we’d have to write throughout the stack.

By comparison, doing a Link Type in schemas can be done without breaking anything. Even if we had far more code written and dependent on schemas we would be able to add the type without breaking anything. And most importantly, we don’t need to get everything perfect in the first implementation because iterating in the Schema Layer is 100x easier than at the Data Model layer.

This already came up: https://github.com/ipld/specs/blob/master/REQUIREMENTS.md#linked.

Nobody is arguing that this should not be supported, we just don’t want to support it at the Data Model layer for all the reasons we’ve already mentioned.
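
One way to picture a schema-layer Link Type that leaves the Data Model untouched: a wrapper that is stored using only existing kinds (a map holding a plain link and a list of strings). The `CID` stand-in, `PathLink`, and the `target`/`path` field names are hypothetical and do not reflect any actual schema design:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CID:
    """Stand-in for a real CID class; the Data Model link kind is unchanged."""
    value: str


@dataclass(frozen=True)
class PathLink:
    """Hypothetical schema-layer type: a plain link plus a path."""
    cid: CID
    path: Tuple[str, ...] = ()

    def to_data_model(self):
        # Only existing kinds are used: a map, a link, a list, strings.
        return {"target": self.cid, "path": list(self.path)}

    @classmethod
    def from_data_model(cls, node):
        return cls(cid=node["target"], path=tuple(node["path"]))


link = PathLink(CID("hashNicola"), ("name", "first"))
encoded = link.to_data_model()
print(encoded)
```

Code that assumes "Link === CID" still sees an ordinary CID under `target`, while schema-aware code can rebuild the richer type with `PathLink.from_data_model`.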

@Stebalien
Contributor

I'm aware this is hard, but this is absolutely critical. The alternative is:

  1. Introduce a Link type at the schema layer.
  2. Never use CIDs anywhere in schemas. Instead, always use this Link type.
  3. Never use the /ipld namespace, always use some path namespace that actually understands this link type and can traverse paths through it.
  4. Implement this everywhere.

But that seems like even more work.

@mikeal
Contributor

mikeal commented May 29, 2019

I’ve got an early implementation of the multi-block type system which can support this. I should have a demo ready in a few days.

We expect most people to “live” in this layer anyway, so implementing this above the data model should be fine. The reality is, any non-trivial use cases require features we just don’t have with only the data model, so user facing abstractions will always be a layer up.

Never use the /ipld namespace

My assumption has always been that once we actually write a spec for this, it would be for Layer 2 paths: not just for use cases like (Link + Path), but so that it can support HAMT and other multi-block collections. Given that /ipfs transparently resolves through HAMTs, I had assumed our goal was always to support the same in IPLD once we had a way to do it modularly.

@rvagg
Member

rvagg commented Aug 14, 2019

Closing due to staleness as per team agreement to clean up the issue tracker a bit (ipld/team-mgmt#28). This doesn't mean this issue is off the table entirely, it's just not on the current active stack but may be revisited in the near future. If you feel there is something pertinent here, please speak up, reopen, or open a new issue. [/boilerplate]

@rvagg rvagg closed this as completed Aug 14, 2019

8 participants