[DataCap Refresh] <4th> Review of <TOPPOOL> #255
Hi @TOPPOOL-LEE,

Hello, @Kevin-FF-USA
@filecoin-watchdog Hey, thank you for your patient comments and kind guidance.

Pangeo
In the previous round, we allocated a total of 0.75 PiB to this client. They collaborated with four SPs across two continents. Before allocating 1 PiB in this round, we reminded them that they needed to add more SPs.

U.S. National Library of Medicine
We are looking for new datasets, rejecting duplicate data storage, and reducing redundancy. We check for duplicate datasets by entering the dataset names at https://allocator.tech. If this method is incorrect, how should we properly assess it?

The size of the data sample and the size of the data stored on the website are two different concepts. The client's data sample was submitted as XML files, and due to limitations of the SQL Server management system, the exported XML can only reach around 3 GB. However, when you unzip it, you will find about 109 GB. We also looked at the website https://www.nlm.nih.gov (more information at https://support.nlm.nih.gov/kbArticle/?pn=KA-04293), which hosts a large amount of data.

You mentioned the geographical location of the SPs. We have only allocated 256 TiB to them so far. If you allow us to continue allocating to them, I believe we will see more SPs from other continents joining. If you advise us not to allocate to them anymore, we will suspend support.

Stanford University
The dataset you mentioned comes from https://www.crcv.ucf.edu/data/UCF101.php; we were already aware of it before approval. The client listed 13,320 items in their data sample, and the UCF101.php website also shows exactly 13,320 items. The data on the UCF101.php site is provided by multiple universities and research institutions, and the UCF101 dataset contains more than two million frames. Before we approved, we asked them about the data source and size and requested that they reduce the total amount of data applied for.
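The compressed-vs-uncompressed size claim (a ~3 GB archive expanding to ~109 GB of XML) can be verified without fully extracting the file. A minimal sketch, assuming the NLM dump `biosample_set.xml.gz` has been downloaded locally:

```shell
# Stream-decompress and count uncompressed bytes; needs no disk space
# for the extracted file. Works for any .gz archive.
zcat biosample_set.xml.gz | wc -c

# `gzip -l` is faster, but it reads a 32-bit size field that wraps at
# 4 GiB, so it under-reports archives this large.
gzip -l biosample_set.xml.gz
```

For multi-gigabyte archives, the first command is the reliable check.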
They disclosed seven SPs; in the latest round, nine SPs participated, because the client added new SPs after the last round of signatures. Please trust that we will inquire about and intervene in the client's actions before supporting them in the next round.

Cell Painting Gallery
SP f02984331 is based in Singapore, but the client's disclosed location was Australia. We have reminded the client about this. Thank you for your reminder; it has been corrected.

In fact, we did not intend to support this client. However, in the last round we promised to support our existing clients, and then our allocator suddenly encountered a bot bug that caused all old clients' bots to report duplicate data, so we could not support them. Meanwhile, to verify whether the duplicate data came from a flaw in a client's technical solution or from the bot itself, we actively sought new clients, found four with different technical solutions, and confirmed that it was indeed a bot bug. Therefore, in this round we had to support our existing clients, so we continued to support this one. It has now been shut down.
Yes, they are the joshua-ne client. Because of the bot bug, we kept seeking new clients and technical solutions for verification, and we found four clients, one of which is the joshua-ne client. Please check #233 (comment); we had disclosed it. If we have collaborated with other allocators, we will disclose and admit it honestly. We will remain open, sincere, and brave.
In this round, we followed the advice of the governance team:
@TOPPOOL-LEE
I am providing my evaluation as a community member. I don’t have the authority to forbid anything, but I’m glad you value my guidance enough to consider incorporating it. 😊
That’s correct; however, the dataset is only 6GB. What is being stored beyond that? Were you able to download any portion of the data to confirm its content?
Did they respond to your request? You still haven’t clarified whether it is legal to store this dataset. Did the client provide any evidence that they have consent to use this data?

Cell Painting Gallery
I am not accusing you of anything. My concern is whether this client is compliant with the rules and whether you are aware of the potential parallel cooperation between your client and joshua-ne.
Thank you for your comprehensive guidance.

U.S. National Library of Medicine / Stanford University
We have contacted the client to provide more content; if they do not, we can reject them.

Cell Painting Gallery
Thanks for your suggestions; we need to improve in these three areas:
@TOPPOOL-LEE
Well, in response to your concerns, we downloaded the data again. Since there is too much content, we use links to display it. You can open any link to check.
@TOPPOOL-LEE I’m glad you were able to retrieve the data! Would it be possible for you to share a method that others could use to access it as well? As it stands, it seems that only you can retrieve it. What you’ve shared aligns with the sample provided by the client, but it still doesn’t explain why the client required 2 PiB to accommodate the dataset size. That said, I believe we’ve explored this topic thoroughly. From our discussion, it seems you might not have insight into why the client needed such a large amount of data.
We have made a list that anyone can view. You can open the text file, copy a link, open it, and view the content.
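To make such a list independently verifiable, reviewers could script the check instead of opening links by hand. A minimal sketch, assuming a hypothetical `links.txt` with one URL per line:

```shell
# Print the HTTP status code for every URL in links.txt.
# 200 means reachable; 403/404 suggest the link is not publicly accessible.
while read -r url; do
  code=$(curl -s -o /dev/null -w '%{http_code}' -I "$url")
  echo "$code $url"
done < links.txt
```

Running this over the shared list would let any community member reproduce the access check.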
Basic info
Type of allocator: [manual]
Paste your JSON number: [https://github.com/v5 Notary Allocator Application: TOP POOL notary-governance#1046]
Allocator verification: [yes]
Allocator Application
Compliance Report
Previous reviews
Current allocation distribution
I. Cell Painting Gallery
II. Dataset Completion
<(https://registry.opendata.aws/cellpainting-gallery/)>
III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
<Initial:
f02984331 -- Singapore
f02883857 -- Singapore
f02852273 -- United Kingdom
f02973061 -- Russia
f02889193 -- Vietnam
Final:
f02984331 -- Singapore
f02883857 -- Singapore
f02852273 -- United Kingdom
f02973061 -- Russia
f02889193 -- Vietnam
f01996719 -- China >
IV. How many replicas has the client declared vs how many been made so far:
<The client added 1 SP and disclosed it. This better meets the three-continent requirement.>
6 vs 7
V. Please provide a list of SPs used for deals and their retrieval rates
We are very concerned about the SPs' retrieval rates and have reminded them several times.
We approved this round only after they passed our random sector retrieval testing.
TOPPOOL-LEE/Allocator-Pathway-TOP-POOL#42 (comment)
TOPPOOL-LEE/Allocator-Pathway-TOP-POOL#42 (comment)
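Random sector testing like this can be scripted so the results are reproducible. A minimal sketch, assuming the open-source `lassie` retrieval client is installed and the `bafy...` placeholders are replaced with real payload CIDs sampled from the deals:

```shell
# Attempt retrieval for each sampled CID and record pass/fail.
# The CIDs below are placeholders, not real deal CIDs.
for cid in bafyCID_SAMPLE_1 bafyCID_SAMPLE_2; do
  if lassie fetch -o /dev/null "$cid" >/dev/null 2>&1; then
    echo "OK   $cid"
  else
    echo "FAIL $cid"
  fi
done
```

Publishing the sampled CIDs alongside the pass/fail log would let other community members rerun the same check.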
I. | Stanford University
II. Dataset Completion
<(http://storage.googleapis.com/thumos14_files/UCF101_videos.zip)>
III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
<Initial:
f01422327, Japan
f02252024, Japan
f02252023, Japan
f01111110, Vietnam
f01909705, Vietnam
f03232064, Malaysia
f03232134, Malaysia
Final:
f01422327, Japan
f02252024, Japan
f02252023, Japan
f01111110, Vietnam
f01909705, Vietnam
f03232064, Malaysia
f03232134, Malaysia >
IV. How many replicas has the client declared vs how many been made so far:
Although the client entered 4 replicas on the application form, the SPs the client cooperated with are the same as the seven disclosed SPs.
We have reminded them twice to correct the replica count, and we will continue to urge them.
7 vs 7
V. Please provide a list of SPs used for deals and their retrieval rates
I. Pangeo Community
II. Dataset Completion
<aws s3 ls --no-sign-request s3://cmip6-pds/
aws s3 ls --no-sign-request s3://esgf-world/>
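The listing commands above only show top-level prefixes; to compare the client's requested DataCap against the actual dataset size, the public bucket can be summarized. A sketch using the same no-credential CMIP6 bucket (slow on large buckets):

```shell
# Recursively sum object count and total bytes; the last two lines of
# output carry the "Total Objects" and "Total Size" summary.
aws s3 ls --no-sign-request --recursive --summarize s3://cmip6-pds/ | tail -n 2
```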
III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
<Initial:
f03231154, Hong Kong
f03228906, China
f03157910, Shenzhen, China
f03157905, Shenzhen, China
f03218576, US
f03215853, US
f01025366, Qingdao, China
f0122215, Qingdao, China
Final:
f03231154, Hong Kong
f03218576, US
f03215853, US
f03157905, Shenzhen, China >
We have made recommendations on the retrieval rate of SPs and the cooperation of SPs.
TOPPOOL-LEE/Allocator-Pathway-TOP-POOL#53 (comment)
IV. How many replicas has the client declared vs how many been made so far:
8 vs 4
V. Please provide a list of SPs used for deals and their retrieval rates
I. U.S. National Library of Medicine
II. Dataset Completion
https://ftp.ncbi.nih.gov/biosample/biosample_set.xml.gz
III. Does the list of SPs provided and updated in the issue match the list of SPs used for deals?
IV. How many replicas has the client declared vs how many been made so far:
V. Please provide a list of SPs used for deals and their retrieval rates
First round; no CID bot data yet.
Allocation summary
Notes from the Allocator
<We asked Sloan Digital Sky Survey why they use VPNs, and closed their application.
We started supporting new datasets ([DataCap Application] <Byte Tunneling> - <ByteTunneling_data_store_bc_fil_02> TOPPOOL-LEE/Allocator-Pathway-TOP-POOL#58) and rejected applicants with duplicated datasets ([DataCap Application] Digital Earth Africa TOPPOOL-LEE/Allocator-Pathway-TOP-POOL#67).
Of the four clients in this round, two brought new datasets.>
Did the allocator report up to date any issues or discrepancies that occurred during the application processing?
<Yes, we always pay attention to clients' applications and operations, and to the comments of the governance team, and we always work according to its suggestions.
We promptly notice changes in clients' operations, including changes in retrieval rate and errors caused by network upgrades, and we urge them to keep improving.
When the bot has a bug, we report it to the technical team in time.
When we suspect that a client's technical solution has a loophole, we compare it with the technical solutions of multiple clients.
When we are not sure whether the Spark retrieval system has been updated, we manually retrieve random sectors.>
What steps have been taken to minimize unfair or risky practices in the allocation process?
<1. When we find that a client is not trustworthy, we reject them directly.
2. When we are not sure whether a client is trustworthy, we slow down, so that the amount of DataCap they receive is relatively small.
3. We pay close attention to the bot data. When the bot report does not appear, we wait for it, and we continue to support a client only if the CID report is intact.>
How did these distributions add value to the Filecoin ecosystem?
<We started looking for new datasets. We now have two new datasets, and we will continue.>
Please confirm that you have maintained the standards set forward in your application for each disbursement issued to clients and that you understand the Fil+ guidelines set forward in your application
< Yes>
Please confirm that you understand that by submitting this Github request, you will receive a diligence review that will require you to return to this issue to provide updates.
< Yes>