This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

ipfs.files.get never calls the callback for files that aren't in the network #1314

Closed
justinmchase opened this issue Apr 17, 2018 · 10 comments


@justinmchase

justinmchase commented Apr 17, 2018

  • Version:
    js-ipfs version: 0.28.2-
    Repo version: 6
    System version: x64/linux
    Node.js version: v8.9.4
  • Platform:
    Linux justin-pc 4.4.0-17643-Microsoft #1000-Microsoft Thu Apr 05 15:09:00 PST 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem:
    Unknown

Type:

Question

Severity:

Medium

Description:

I am just doing some testing and I went here to find some test images on the network:
https://www.reddit.com/domain/ipfs.pics

But when I pass the hashes of these images to ipfs.files.get, my callback is never called. In other cases, where the images do exist, I successfully get them back.

Question:

What are the average and maximum times it could take to find a file in the network? Will the call to ipfs.files.get eventually time out? If so, how can I set the timeout value, and what is the default?

Steps to reproduce the error:

This Never Calls the Callback

const cid = 'QmNdi3iaWYDoz91Szu6M4zH7r9Qom7bCrb2YyhLvjcp8TH'
ipfs.files.get(cid, (err, files) => {
    if (err) return console.log(err)
    console.log(files)
})

This Succeeds Pretty Quickly

const cid = 'QmQ2r6iMNpky5f1m4cnm3Yqw8VSvjuKpTcK1X7dBR1LkJF'
ipfs.files.get(cid, (err, files) => {
    if (err) return console.log(err)
    console.log(files) // two files, folder and cat.gif
})
@victorb
Member

victorb commented Apr 18, 2018

This is part of a larger issue of how to do timeouts in js-land. Our general view is that timeouts should be defined by the caller and either have 0 as a default (meaning, never time out) or a very large value (couple of hours?), as it really depends on the user's network and intent. Sometimes it's valuable to be able to just let it load and get the content once it's there, sometimes you want fallbacks if it doesn't work.

In any case, the solutions are discussed in an issue on interface-ipfs-core, so let's continue there: https://github.com/ipfs/interface-ipfs-core/issues/58

Expressing your use-case and needs there would be valuable!

@victorb victorb closed this as completed Apr 18, 2018
@victorb
Member

victorb commented Apr 18, 2018

Actually, something you might be able to do in the meantime is to call findprovs to figure out how many peers currently say they have the content. If that is below X peers, assume that the content is currently unavailable, and don't do the get call at all.
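
A minimal sketch of that pre-check, under stated assumptions: `findProviders` is a hypothetical, promise-returning stand-in for whatever provider lookup the node exposes (for example a wrapper around findprovs), `minProviders` plays the role of X, and `ipfs.files.get` is assumed to return a promise when no callback is passed. The lookup itself may still need its own deadline.

```javascript
// `findProviders` is a hypothetical stand-in for a provider lookup such as
// findprovs. It must return a promise that resolves to an array of providers.
function hasEnoughProviders (findProviders, cid, minProviders) {
  return findProviders(cid).then((providers) => providers.length >= minProviders)
}

// Only start the (possibly never-ending) get call when enough peers
// claim to have the content; otherwise reject straight away.
function getIfProvided (ipfs, findProviders, cid, minProviders) {
  return hasEnoughProviders(findProviders, cid, minProviders).then((enough) => {
    if (!enough) {
      throw new Error(`Fewer than ${minProviders} providers for ${cid}`)
    }
    // Assumption: files.get returns a promise when called without a callback.
    return ipfs.files.get(cid)
  })
}
```

The threshold is a judgment call: 1 treats any single provider as good enough, while a higher value trades availability for confidence.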

@justinmchase
Author

justinmchase commented Apr 18, 2018

@victorbjelkholm But how long will that call take? Will that also need a timeout? It seems like it's a non-deterministic request, and there may be no way in reality to know the difference between a simply slow response and a null response due to the data not existing.

Is the findprovs operation more deterministic than get? How would you know when it's done and no peer has the file?

@mitra42

mitra42 commented Apr 19, 2018

I've been doing a Promise.race with a timeout on all IPFS calls because of this issue of not being able to distinguish a slow response from a bad CID - made particularly bad by the tendency of "good" CIDs not to be available in JS land because of lack of DHT integration. It would be great to see the timeout as an optional parameter to the calls, because at the moment the timeout triggers fairly often on large files (i.e. if I put a 15 second timeout on a call, then a large file on a slow dialup could trigger a timeout when I only really want to time out if the file can't be found anywhere).
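
For readers who want to try this, here is a minimal sketch of the Promise.race pattern. `withTimeout` is a hypothetical helper name, and the usage comment assumes `ipfs.files.get` returns a promise when no callback is passed.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms` ms.
function withTimeout (promise, ms) {
  let timer
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms)
  })
  // Whichever promise settles first wins; the other result is ignored.
  return Promise.race([promise, timeout]).then(
    (value) => { clearTimeout(timer); return value },
    (err) => { clearTimeout(timer); throw err }
  )
}

// Usage, assuming `ipfs` is an initialised js-ipfs node:
// withTimeout(ipfs.files.get(cid), 15000)
//   .then((files) => console.log(files))
//   .catch((err) => console.log(err.message))
```

Note that this only stops waiting. The underlying request keeps running in the background, and a fixed deadline cannot tell a slow transfer from missing content.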

@justinmchase
Author

@mitra42 can you say more about the DHT integration problem? I am not familiar exactly, though it sounds like you're saying it's trying to download the content of files rather than just give you back info about them.

I have been using the getReadStream and then, as files come in, I am destroying the content stream of each file. I'm not sure if that does any good though.

But I agree with what you said. I was thinking of adding my own timeout; the problem is that unless there is a way to abort the call, it is still running in the background and will eventually call back, and you have to do some work to ignore the result at that point.
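
The "ignore the late result" bookkeeping can be sketched for callback-style calls as follows. `callWithTimeout` is a hypothetical helper, not part of js-ipfs, and it does not abort the underlying request.

```javascript
// Hypothetical helper: runs `fn(cb)` and guarantees `callback` fires exactly
// once, either with fn's result or with a timeout error after `ms` ms.
function callWithTimeout (fn, ms, callback) {
  let done = false
  const finish = (err, result) => {
    if (done) return // a late result (or a late timer) is dropped here
    done = true
    clearTimeout(timer)
    callback(err, result)
  }
  const timer = setTimeout(() => finish(new Error(`Timed out after ${ms} ms`)), ms)
  fn(finish)
}

// Usage, assuming `ipfs` is an initialised js-ipfs node:
// callWithTimeout((cb) => ipfs.files.get(cid, cb), 15000, (err, files) => {
//   if (err) return console.log(err.message)
//   console.log(files)
// })
```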

@mitra42

mitra42 commented Apr 20, 2018

DHT isn't integrated in JS-IPFS, which means that JS-IPFS can only find files that are on peers connected to the same nodes. I'm told that once websocket-star-relay is implemented this will be solved. For example, this means that if we add a file on our IPFS instance at archive.org and then try to access it in browsers which are connected to the main WSS relay (which is at ipfs.io), then it won't find them. If we access the file via https://ipfs.io then the WSS can now find them (and files.cat returns immediately).

The main symptom we see is exactly the one you report, that the call to files.cat never returns.

In our implementation we use Promise.race (which takes care of the eventual callback problem you mention), and when the files.cat call fails we fall back to trying to "fetch" from https://ipfs.io, also with a timeout since that call never returns. If that fails, we fall back to getting it directly over HTTP from archive.org.
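
A rough sketch of such a fallback chain, not our actual code: `fetchWithFallbacks` is a hypothetical helper, each source is a function returning a promise, and `archiveUrl` in the usage comment is a placeholder for a direct HTTP URL.

```javascript
// Try each source in order; move on when one rejects or exceeds `ms` ms.
// `sources` is an array of functions that each return a promise.
function fetchWithFallbacks (sources, ms) {
  const attempt = (index, lastError) => {
    if (index >= sources.length) {
      return Promise.reject(lastError || new Error('No sources given'))
    }
    let timer
    const timeout = new Promise((resolve, reject) => {
      timer = setTimeout(() => reject(new Error(`Source ${index} timed out`)), ms)
    })
    // Wrapping the call turns a synchronous throw into a rejection.
    const result = new Promise((resolve) => resolve(sources[index]()))
    return Promise.race([result, timeout]).then(
      (value) => { clearTimeout(timer); return value },
      (err) => { clearTimeout(timer); return attempt(index + 1, err) }
    )
  }
  return attempt(0, null)
}

// Usage, assuming `ipfs` is a js-ipfs node and `fetch` is available:
// fetchWithFallbacks([
//   () => ipfs.files.cat(cid),
//   () => fetch(`https://ipfs.io/ipfs/${cid}`).then((res) => res.arrayBuffer()),
//   () => fetch(archiveUrl).then((res) => res.arrayBuffer()) // placeholder URL
// ], 5000)
```

If every source fails, the promise rejects with the error from the last one tried.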

Unfortunately this has the problem that we are timing out files we could probably retrieve since files.cat doesn't return within 5 seconds.

Hopefully websocket-star-relay will allow us to work around the various issues that mean that the files we add are invisible to the whole network. Until then there appears to be no way to set up JS-IPFS clients so they can find any validly added IPFS file, so this timeout, along with coding for some fallback way to find files, appears to be a necessary interim workaround.

@justinmchase
Author

Is there an issue or PR that you know of for that work, so I can follow it more closely? Thanks a lot for the explanation.

@mitra42

mitra42 commented Apr 20, 2018

Not sure which work you are referring to:

@justinmchase
Author

Does this mean that files I am hosting via my instance also can't be found from the outside? I uploaded a file to my server, but when I try to get it by hash from https://gateway.ipfs.io/ipfs/ it fails to find the file. Is that also related to the DHT issue you mentioned, @mitra42?

@mitra42

mitra42 commented May 12, 2018

@justinmchase I don't know if that is the same issue, I do know that when the JS-IPFS is used, that there are a number of situations in which files you add in golang are not available to the browser, and files added in the browser are not available in golang. The problems that cause this include: Websocket star relaying, DHT; failure of one go node to announce etc, but I'm still not clear which cases we see are bugs in IPFS (that are being worked on), which are bugs that can be worked around, and which are faults in either our code or specific configuration.

This is the wrong issue to discuss those problems. However, when they occur, the lack of a timeout/failure in files.get means that you should probably wrap the code in a promise-timeout, and then fall back to fetching from https://ipfs.io/ipfs or another HTTP URL if you have one.

I don't quite understand how files.get works internally, but if it were possible, I think the ideal solution would be to allow a flag/hint to files.get that tells it to stop when the likelihood of finding the file gets low. I.e. I'd like to be able to tell the underlying code that I don't mind if a file takes 30 seconds or longer to fetch as long as the underlying library is making progress in finding blocks, but if it's not finding any more blocks then I want it to fail so that I can, for example, warn the user or use a fallback URL.
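
Something close to this can be approximated in user code with an idle timeout on a stream. The following is only a sketch: `collectWithIdleTimeout` is a hypothetical helper, it assumes a Node.js-style readable stream that emits Buffer chunks (for example from a stream-returning variant of files.cat, if your version has one), and it measures progress in delivered chunks rather than in blocks found internally.

```javascript
// Collect a readable stream into a Buffer, failing only when no chunk has
// arrived for `idleMs` milliseconds. Every chunk resets the clock.
function collectWithIdleTimeout (stream, idleMs) {
  return new Promise((resolve, reject) => {
    const chunks = []
    let timer
    const stop = () => clearTimeout(timer)
    const arm = () => {
      stop()
      timer = setTimeout(() => {
        stream.destroy() // stop reading; assumes the stream supports destroy()
        reject(new Error(`No data for ${idleMs} ms`))
      }, idleMs)
    }
    stream.on('data', (chunk) => { chunks.push(chunk); arm() })
    stream.on('end', () => { stop(); resolve(Buffer.concat(chunks)) })
    stream.on('error', (err) => { stop(); reject(err) })
    arm() // also covers the case where no chunk ever arrives
  })
}
```

With this shape a large file on a slow link keeps going as long as data trickles in, while a CID that nobody provides fails after `idleMs`.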
