
[DataCap Refresh] <2nd> Review of <Lendmi> #280

Open
LendMi opened this issue Jan 26, 2025 · 6 comments
LendMi commented Jan 26, 2025

  1. Type of allocator: [manual]

  2. Paste your JSON number: [1014]

  3. Allocator verification: [yes]

  4. Allocator Application

  5. Compliance Report

  6. Previous reviews

Current allocation distribution

| Client name | DC granted |
| --- | --- |
| Column Sonar Data Archive on AWS | 0.85 PiB |
| Distributed Archives for Neurophysiology Data Integration | 1.65 PiB |

I. Column Sonar Data Archive on AWS (LendMi-Finance/LendMi-Storage#21)

  • DC requested: 8 PiB
  • DC granted so far: 0.85 PiB

II. Dataset Completion
```
aws s3 ls --no-sign-request s3://noaa-wcsd-pds/
```

```
S3 Bucket: "noaa-wcsd-pds"
├── data
│   ├── processed
│   │   ├── SH1305
│   │   │   ├── 18kHz
│   │   │   │   ├── SaKe_2013-D20130522-T134850.csv
│   │   │   │   ├── SaKe_2013-D20130522-T140446_to_SaKe2013-D20130522-T145239.csv
│   │   │   │   ├── ...
│   │   │   ├── 38kHz
│   │   │   │   ├── ...
│   │   │   ├── 70kHz
│   │   │   │   ├── ...
│   │   │   ├── 120kHz
│   │   │   │   ├── ...
│   │   │   ├── 200kHz
│   │   │   │   ├── ...
│   │   │   ├── bottom
│   │   │   │   ├── SaKe_2013-D20130522-T134850.csv
│   │   │   │   ├── SaKe_2013-D20130522-T140446_to_SaKe2013-D20130522-T145239.csv
│   │   │   │   ├── ...
│   │   │   ├── multifrequency
│   │   │   │   ├── SaKe_2013-D20130522-T134850.csv
│   │   │   │   ├── SaKe_2013-D20130522-T140446_to_SaKe2013-D20130522-T145239.csv
│   │   │   │   ├── ...
│   │   │   ├── ...
│   │   ├── GU1002
│   │   │   ├── ...
│   │   ├── AL0502
│   │   │   ├── ...
│   │   ├── ...
│   ├── raw
│   │   ├── Bell_M_Shimada
│   │   │   ├── SH1305
│   │   │   │   ├── EK60
│   │   │   │   │   ├── SaKe_2013-D20130623-T063450.raw
│   │   │   │   │   ├── SaKe_2013-D20130623-T064452.raw
│   │   │   │   │   ├── SaKe_2013-D20130623-T064452.bot
│   │   │   │   │   ├── ...
│   │   ├── Gordon_Gunter
│   │   │   ├── GU1002
│   │   │   │   ├── EK60
│   │   │   │   │   ├── ...
│   │   ├── Albatross_IV
│   │   │   ├── AL0502
│   │   │   │   ├── EK60
│   │   │   │   │   ├── ...
│   │   │   │   ├── ...
│   │   ├── ...
```
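As a hedged sketch of how a listing like the one above can be sized, the script below sums the size column of `aws s3 ls --recursive` output. The sample lines and object keys are hypothetical, not real objects from `noaa-wcsd-pds`:

```python
# Sketch: estimate total dataset size from `aws s3 ls --recursive` output.
# Each output line has the form: date, time, size-in-bytes, object key.

def parse_total_bytes(listing: str) -> int:
    """Sum the size column (third field) of an `aws s3 ls --recursive` listing."""
    total = 0
    for line in listing.strip().splitlines():
        parts = line.split()
        if len(parts) >= 4:  # date, time, size, key
            total += int(parts[2])
    return total

# Hypothetical sample lines (not real objects from noaa-wcsd-pds):
sample = """\
2013-05-22 13:48:50 1073741824 data/raw/Bell_M_Shimada/SH1305/EK60/a.raw
2013-05-22 14:04:46 2147483648 data/raw/Bell_M_Shimada/SH1305/EK60/b.raw
"""

total = parse_total_bytes(sample)
print(f"{total} bytes = {total / 2**30:.2f} GiB")  # 3221225472 bytes = 3.00 GiB
```

In practice the listing would be piped in from `aws s3 ls --no-sign-request --recursive s3://noaa-wcsd-pds/` rather than embedded as a string.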

III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
Yes (the client disclosed the SPs in advance and amended the application form).

IV. How many replicas has the client declared vs. how many have been made so far?

8 vs 8

V. Please provide a list of SPs used for deals and their retrieval rates

| SP ID | % Retrieval | Meets the >75% retrieval threshold? |
| --- | --- | --- |
| f02927642 | 81.75% | YES |
| f02953218 | 93.04% | YES |
| f01082888 | 6.84% | NO |
| f03081958 | 18.55% | NO |
| f02865213 | 93.31% | YES |
| f01084413 | 38.68% | NO |
| f01084941 | 3.70% | NO |
| f02887063 | 90.41% | YES |
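The pass/fail column can be reproduced with a short script; the rates are copied from the table above, and the 75% cutoff follows the column header:

```python
# Check which SPs meet the >75% Spark retrieval threshold,
# using the retrieval rates reported in the table above.

RETRIEVAL_THRESHOLD = 75.0  # percent, per the ">75% retrieval" criterion

rates = {
    "f02927642": 81.75, "f02953218": 93.04, "f01082888": 6.84,
    "f03081958": 18.55, "f02865213": 93.31, "f01084413": 38.68,
    "f01084941": 3.70,  "f02887063": 90.41,
}

passing = {sp: r for sp, r in rates.items() if r > RETRIEVAL_THRESHOLD}
failing = {sp: r for sp, r in rates.items() if r <= RETRIEVAL_THRESHOLD}

print(f"{len(passing)}/{len(rates)} SPs meet the threshold; failing: {sorted(failing)}")
```

Half of the listed SPs (4 of 8) fall below the threshold, which is the point raised later in the review.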

Allocation summary

  1. Notes from the Allocator
    We conduct due diligence on clients and ask about their technical solutions.
    We require all SPs to support Spark.
    We allocate 100 TiB in the first round.
    We continue support only after reviewing the client's Spark retrieval data.

  2. Did the allocator report up to date any issues or discrepancies that occurred during the application processing?
    Yes. When supporting clients, we focus on SP retrieval rates, the list of SPs provided and updated in the issue, whether SPs are using VPNs, and the distribution of data across SPs. We paused support until the client corrected the errors.

  3. What steps have been taken to minimize unfair or risky practices in the allocation process?
    We sign 100 TiB in the first round.
    We focus on bot-reported data.
    We leave a long interval between signing rounds, because we prefer to observe clients and establish their trustworthiness rather than provide immediate support.

  4. How did these distributions add value to the Filecoin ecosystem?
    We started looking for new datasets.

  5. Please confirm that you have maintained the standards set forward in your application for each disbursement issued to clients and that you understand the Fil+ guidelines set forward in your application
    Yes

  6. Please confirm that you understand that by submitting this Github request, you will receive a diligence review that will require you to return to this issue to provide updates.
    Yes

@filecoin-watchdog added the Refresh and Awaiting Community/Watchdog Comment labels on Jan 27, 2025

filecoin-watchdog commented Jan 30, 2025

@LendMi

Distributed Archives for Neurophysiology Data - NEW

  • The client declared themselves as a Data Preparer but did not provide any information on data preparation. The allocator followed up with questions, but the client’s responses were unclear.

  • This dataset has already been stored multiple times: Search results.

  • The allocator’s application states that data distribution should include North America and Asia. Given that the client disclosed Moscow and London in their SP list, this should have been addressed.

  • The allocator actively engaged with the client, identifying discrepancies and requesting clarifications.

Column Sonar Data Archive on AWS - NEW

  • This dataset has already been stored under different allocations:

  • The allocator’s application states that data distribution should include North America and Asia. Since the client disclosed a Russian SP, this should have been addressed.

  • 3 SPs have very low retrieval rates, which should be improved.

@filecoin-watchdog added the Awaiting Response from Allocator label and removed the Awaiting Community/Watchdog Comment label on Jan 30, 2025

LendMi commented Feb 1, 2025

Hello, thank you for your guidance. Before supporting any client, we always ask them the following three questions:

  1. Can you confirm that all your SPs support Spark retrieval?
  2. It looks like this is a public dataset; do you have any experience downloading and storing it?
  3. How do you process the data?

If a client’s response is too brief, we also ask for more data samples and clarify where they plan to start downloading the data. Among the two clients we have supported, one provided screenshots of the data download process, and the other provided command-line steps.

We pay close attention to SP retrieval rates. The average retrieval rate for the Distributed Archives for Neurophysiology Data Integration project is around 50%, and we are consistently urging the client to improve it. For the Column Sonar Data Archive on AWS, the retrieval rate was around 40% during both the first 100 TiB allocation and the subsequent 256 TiB allocation. Because the retrieval rate fluctuated significantly, we initially paused signing. However, the client later promised not to work with low-retrieval SPs and to add new SPs with a 90% retrieval rate, after which we continued to support them at 512 TiB. It appears they have kept their word, as the newly added SPs indeed have very high retrieval rates.

Regarding the point that “The allocator’s application states data distribution should include North America and Asia, and since the client disclosed a Russian SP, this should have been addressed,” we appreciate your careful observation. Russia is generally considered part of Europe, so we will request that the client adjust their statements accordingly.

Finally, you mentioned the issue of repeated datasets. We will work to resolve that. In our next allocation round, we will close these two LDNs and ask the client to source new datasets. Thank you for informing us about how to screen new datasets more effectively as allocators.

@filecoin-watchdog

@LendMi Thank you for your response. Please remember that publicly available open data intended for community retrieval should also include an index as part of the process. This index should enable users to connect sealed data with the original dataset, allowing those who wish to use this backup for computing purposes to do so effectively. You can read more about it here: #125

@filecoin-watchdog added the Diligence Audit in Process label and removed the Awaiting Response from Allocator label on Feb 3, 2025

LendMi commented Feb 5, 2025

@filecoin-watchdog Thank you for your guidance. You mentioned that "this index should enable users to connect sealed data with the original dataset"; that is a goal the whole Filecoin community is working toward, and we will strive to move in that direction.
We also performed manual random retrieval checks on the SPs. Thank you.

(Four screenshots of the manual SP retrieval checks were attached.)

@galen-mcandrew

Seeing better diligence than in previous audit rounds, and looking forward to continued improvement. Summarizing some key areas to focus on:

  • Data preparation diligence, such as indexing standards
  • Accurate dataset size calculations, with minimal overhead or padding
  • Continued reduction of excessive duplicates across the network
  • Increased geographic distribution to ensure data availability
  • Up-to-date bookkeeping and records, such as current SP lists
  • Higher retrieval rates

We are requesting an additional 2.5 PiB of DataCap for this pathway, and we expect to see continued improvement in supporting the onboarding of real client data.

@Kevin-FF-USA added the Awaiting RKH and DataCap - Throttled labels and removed the Diligence Audit in Process label on Feb 13, 2025

Kevin-FF-USA commented Feb 14, 2025

Hi @LendMi

Friendly update on this refresh.

We are currently in the process of moving to a Metaallocator. For the tooling to work correctly, an allocator can only use the DataCap balance it received through direct allocation from Root Key Holders, or the DataCap received through a Metaallocator. As a result, some of the metrics pages, such as Datacapstats, Pulse, and other graphs, may show inconsistent figures during this update.

You will not lose any DataCap, but your refresh will appear as the amount of DC from the refresh plus the remaining DC the allocator has left.

No action is needed on your part. This is just a friendly note to thank you for your contributions and patience; you may notice changes in your DataCap balance while the back end is updated.
