Refill request for meta pathway allocator KFDA #171

Closed
kernelogic opened this issue Sep 22, 2024 · 13 comments
Assignees
Labels
DataCap - Refreshed: Refresh Applications received from existing Allocators for a refresh of DataCap allowance

Comments

@kernelogic

kernelogic commented Sep 22, 2024

Application: https://github.com/filecoin-project/Allocator-Registry/blob/main/Allocators/990.json

Allocation report: https://kfda.filweb3.com/report

Allocator address: f1yvo2nutvy6a4ortrvv2tguhpsi64a7fgrc42owy

Request amount: 20 PiB

Hello, this is my first refill request. As a meta pathway automatic allocator, I have built my own report page to showcase what has been distributed.

I am still in the process of business development, so client-base growth is still in progress.

@Kevin-FF-USA Kevin-FF-USA self-assigned this Sep 23, 2024
@Kevin-FF-USA Kevin-FF-USA added Refresh Applications received from existing Allocators for a refresh of DataCap allowance Awaiting Community/Watchdog Comment DataCap Refresh requests awaiting a public verification of the metrics outlined in Allocator App. labels Sep 23, 2024
@galen-mcandrew
Collaborator

@kernelogic Hello, thanks for starting this compliance report.

From an initial investigation, we will need to see additional evidence that the DataCap you allocated met your own "automated" pathway requirements. Specifically, you stated these things in your allocator application:

  • "Upon registration with KFDA, clients are required to verify their email address (social login is also an option)."
  • "Clients then submit a dataset onboarding request, which can be an AWS open dataset or their private dataset. Provided KFDA has access to the metadata (file list and size), it will conduct automated content verification throughout the lifecycle of the DC distribution."
  • "Clients who passed checks need to pay Fil to get DC allocation"
  • "KFDA is open to any clients who wish to onboard data that is publicly retrievable and verifiable."
  • "KFDA has the ability to develop its own CID checker bot. "
  • "Allocation Tranche Schedule to clients:"
  • "Lassie to verify data content."
  • "KFDA is developing a comprehensive multi-tenant web platform, with a prototype already in place. "
  • "KFDA plans to set up servers in multiple regions to conduct retrieval sampling."

The attached allocation report is a good first draft, but does not have sufficient details. For example, there is no bookkeeping or on-chain supporting details. Where is the allocation history, the evidence of data sampling and metadata compliance, the evidence of fee-based mechanisms, the deal-making distribution details, and the proof of retrievability?

Looking forward to developing some great, efficient methods to show compliance in these non-manual pathways.

@kernelogic
Author

kernelogic commented Sep 30, 2024

@galen-mcandrew Thank you very much for reviewing the report page and providing feedback.

I should point out that the "comprehensive multi-tenant web platform" has been running since launch and is located at https://kfda.filweb3.com/

The report page https://kfda.filweb3.com/report is constantly evolving, with more data and more clients added as they become available.

I have added the following information to the report page since the last iteration:

  1. Client verified email
  2. Fil paid for the allocation
  3. Allocation grant transaction record
  4. Retrieval sampling record
  5. Payment address showcasing the monetization record
    [screenshot: payment address monetization record]

@Kevin-FF-USA Kevin-FF-USA added Diligence Audit in Process Governance team is reviewing the DataCap distributions and verifying the deals were within standards and removed Awaiting Community/Watchdog Comment DataCap Refresh requests awaiting a public verification of the metrics outlined in Allocator App. labels Oct 1, 2024
@Kevin-FF-USA
Collaborator

Hi @kernelogic ,
Thanks for your patience on this. We are awaiting the metrics review with RKH before making the refresh distributions. Wanted to let you know you will be receiving follow-up direction by Thursday for the refresh. I'll keep the Slack thread and this issue updated for easier tracking.

@filecoin-watchdog
Collaborator

@kernelogic I have a few observations that I wanted to share with you and @galen-mcandrew.
What your report provides:

  • DataSet Name
  • Data Owner Name
  • Data Owner Country
  • Data Owner Continent
  • Data Set Industry
  • Website
  • Client Email
  • Client Address
  • Amount in TiB
  • Fil per TiB
  • Fil Total
  • Payment Address
  • Grant Tx
  • Deals by Provider
    For each of these, a Deal Audit report is provided with the following:
    • Audited On
    • Success
    • Deal ID
    • Client
    • Provider
    • Data CID
    • Command
    • Standard Output

However, there are a few discrepancies with your application, as well as some concerns I have with the information provided:

  • "Upon registration with KFDA, clients are required to verify their email address (social login is also an option)."
    • This is provided in the report, but it is the same email for all clients.
  • "Clients then submit a dataset onboarding request, which can be an AWS open dataset or their private dataset. Provided KFDA has access to the metadata (file list and size), it will conduct automated content verification throughout the lifecycle of the DC distribution."
    • Again, yes, you provide that in the report, but it is the same dataset over and over again.
  • "KFDA is open to any clients who wish to onboard data that is publicly retrievable and verifiable."
    • While the data is open, the retrievability scores for SPs are extremely low. Moreover, even SPs with zero retrieval success are being used over and over.
  • Further, on the SP point: the same SP ID appears in several deals, but each time it links to a different page.
  • "KFDA has the ability to develop its own CID checker bot. "
    • Where can the code for that be found?
  • "Allocation Tranche Schedule to clients:" you put in your application:
    • First: 100 TiB minimal
    • Second: 300 TiB maximal
    • Third: 1 PiB maximal
    • Fourth: 2 PiB maximal
    • Max per client overall: No limit as long as the metadata fits replication goal.
    • Yet all the allocations you gave were 600 TiB.
  • "KFDA is developing a comprehensive multi-tenant web platform, with a prototype already in place. "
    • Where is that and can it be reviewed?
  • "KFDA plans to set up servers in multiple regions to conduct retrieval sampling."
    • All of the SPs you are using are in Hong Kong.
  • "Clients will be able to view every decision making logs in KFDA platform. Upon request, decision making logs can also be exported and provided to Fil+ team."
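For reference, the quoted tranche schedule could be encoded as a simple lookup. This is an illustrative sketch (the function and constant names are hypothetical, not part of KFDA); note that the application lists 100 TiB as a minimum for the first tranche and maximums thereafter:

```python
# Tranche caps from the allocator application, expressed in TiB
# (1 PiB = 1024 TiB). The first figure is stated as a minimum,
# the rest as maximums.
TRANCHE_CAPS_TIB = [100, 300, 1024, 2048]

def tranche_cap_tib(tranche_number):
    """Return the cap (in TiB) for a 1-based tranche number.

    Beyond the fourth tranche the application states there is no
    per-client limit, so None is returned to signal 'no limit'.
    """
    if tranche_number < 1:
        raise ValueError("tranche numbers start at 1")
    if tranche_number <= len(TRANCHE_CAPS_TIB):
        return TRANCHE_CAPS_TIB[tranche_number - 1]
    return None
```

Under this schedule, a uniform 600 TiB per allocation would exceed the second-tranche maximum of 300 TiB, which is the discrepancy raised above.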

Appreciate your time and thank you in advance for answering my questions!

@kernelogic
Author

kernelogic commented Oct 23, 2024

  1. "Upon registration with KFDA, clients are required to verify their email address (social login is also an option)."
    This is provided in the report but it is the same email for all clients
    Answer: each row in the report represents one tranche allocated to a client. Due to the initial 5 PiB allocation limit and development time constraints, I was only able to develop one client. That is why all the rows show the same email.

  2. "Clients then submit a dataset onboarding request, which can be an AWS open dataset or their private dataset. Provided KFDA has access to the metadata (file list and size), it will conduct automated content verification throughout the lifecycle of the DC distribution."
    Again, yes you provide that in the report, but it is the same dataset over and over again
    Answer: not the same dataset. There are three datasets being stored so far: Hubble Space Telescope, NREL Wind Integration National Dataset, and World Bank - Light Every Night. You are seeing multiple rows with the same dataset because they are multiple tranches of the same dataset.

  3. "KFDA is open to any clients who wish to onboard data that is publicly retrievable and verifiable."
    While the data is open the retrievability scores for SPs are extremely low. Moreover even though an SP has 0 retrievability success they are being used over and over.
    Answer: it is well known that Boost retrieval is very fragile (the team has even stated the Boost software is not production-ready) and requires constant maintenance effort to keep retrieval working. It is a principle of my pathway not to discriminate against SPs based on retrieval success (successful retrieval is only encouraged via discounts).

  4. "The same SP ID appears in several deals, but each time leads to a different page "
    Answer: expected. Each tranche uses a different client address to track deals.

  5. "KFDA has the ability to develop its own CID checker bot. "
    where can the code for that be found?
    Answer: the backend CID checker code is developed but not open-sourced. If you need to see it, I can send you screenshots.

  6. "All the allocations you gave were 600 TiB"
    Answer: the platform is under active development. The incremental per-tranche amount feature is planned for the next phase.

  7. "KFDA is developing a comprehensive multi-tenant web platform, with a prototype already in place. "
    Where is that and can it be reviewed?
    Answer: https://kfda.filweb3.com/

  8. "KFDA plans to set up servers in multiple regions to conduct retrieval sampling."
    All of the SPs you are using are in Hong Kong.
    Answer: It is a principle of my pathway not to discriminate against SPs based on geographic location (geographic distribution is only encouraged via discounts).

  9. "Clients will be able to view every decision making logs in KFDA platform. Upon request, decision making logs can also be exported and provided to Fil+ team."
    Answer: Currently the retrieval checker logs are publicly available on the report page, by clicking on an SP ID. Other decision logs will become available once the discount system is fully developed.

To emphasize: I developed this automatic-allocation meta pathway in the hope of revolutionizing DC allocation, rather than reinventing the wheel to satisfy all existing criteria with yet another UI.

@willscott
Collaborator

flagging that the report page doesn't appear to be acting like a 'log' currently.

e.g. new allocations are showing up in between previous rows, not only at the top or bottom of the list. This is going to make it hard to do any sort of compliance auditing or to link this data with your on-chain activity.

And since that report isn't open source, it's hard to trust what's on it, or to have confidence in the KYC / client validation.

@kernelogic
Author

As a quick response, I have changed the sorting on the report page to descending timestamp order and added a timestamp column at the front to show when each allocation happened.

@Kevin-FF-USA
Collaborator

Hi @kernelogic,
This application was the first pathway refresh of its type; thank you for your patience while we collected information and set a baseline for how these can be tracked and audited going forward.

Wanted to keep you updated on progress: this is at the top of the queue for final Governance review before an RKH decision. Thanks for all the information; you should see this issue updated shortly.

@kernelogic
Author

Thanks for letting me know. I will keep waiting.

@filecoin-watchdog
Collaborator

@kernelogic, I've noticed that we no longer have access to your reports. Could you share them again?

@kernelogic
Author

@filecoin-watchdog it is back online now. Last night there was a power outage in Vancouver due to heavy rain.

@galen-mcandrew
Collaborator

Appreciate the patience and responses here. Like we've all noted, this is different from the other manual pathways, and takes longer to investigate.

That said, I would love to collaborate going forward on ways to streamline the compliance audit here. Hopefully we can develop the reporting log so that we get faster aggregation and verification of compliance. For example, linking to the client addresses still requires manual verification to see the FIL payments as historical transactions. Is there another way to pull these message IDs into a report, and could that report be open-sourced so it works for others taking deal payments as proof of diligence/compliance? This is just one area where I think we could improve; I'm open to discussing other options.
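As one illustrative direction (hypothetical, not an existing KFDA feature): the payment message CIDs shown on the report could be resolved automatically against a Lotus JSON-RPC endpoint using Filecoin.StateSearchMsg, instead of being looked up by hand. A minimal sketch of building the request payload follows; it assumes a recent Lotus v1 API parameter shape (from-tipset key, message CID, epoch lookback limit, allowReplaced), so check your node's version before relying on it:

```python
import json

def state_search_msg_payload(msg_cid, request_id=1):
    """Build a JSON-RPC request body for Lotus's Filecoin.StateSearchMsg.

    Assumes the Lotus v1 API parameter order; older versions accept
    only the message CID. The caller POSTs this to the node's /rpc/v1
    endpoint and embeds the returned receipt in the report.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "Filecoin.StateSearchMsg",
        # null tipset key = search from chain head; -1 = no lookback limit
        "params": [None, {"/": msg_cid}, -1, True],
    })
```

Resolving each payment CID this way, and publishing the small script that does it, would let anyone re-verify the fee-based diligence trail without manually browsing a block explorer.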

Given the work being done here around market-based onboarding tools, we would like to request an additional 10 PiB of DataCap.

@kernelogic
Author

Thanks Galen. With the additional DC granted, I will be able to resume development of the tool for better transparency and easier verification.

@Kevin-FF-USA Kevin-FF-USA added DataCap - Refreshed and removed Diligence Audit in Process Governance team is reviewing the DataCap distributions and verifying the deals were within standards labels Jan 9, 2025