
[DataCap Application] <Byte Tunneling> - <ByteTunneling_data_store_bc_fil_02> #58

Open
2 tasks done
Dengminer opened this issue Nov 11, 2024 · 20 comments

@Dengminer

Dengminer commented Nov 11, 2024

Version

1

DataCap Applicant

Byte Tunneling

Project ID

ByteTunneling_data_store_bc_fil_02

Data Owner Name

U.S. National Library of Medicine

Data Owner Country/Region

United States

Data Owner Industry

Life Science / Healthcare

Website

https://www.nlm.nih.gov/

Social Media Handle

https://x.com/NLM_NIH

Social Media Type

Twitter

What is your role related to the dataset

Data Preparer

Total amount of DataCap being requested

14 PiB

Expected size of single dataset (one copy)

2 PiB

Number of replicas to store

7

Weekly allocation of DataCap requested

1000 TiB
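The requested total follows from the two fields above: 2 PiB per copy × 7 replicas = 14 PiB. The minimal Python sketch below works through that arithmetic and estimates how many weeks the full amount would take at the requested weekly rate. It assumes binary units (1 PiB = 1024 TiB), and the variable names are illustrative only.

```python
# Sanity check of the sizing figures stated in this application.
# Assumption: binary units, so 1 PiB = 1024 TiB.
TIB_PER_PIB = 1024

single_copy_pib = 2   # "Expected size of single dataset (one copy)"
replicas = 7          # "Number of replicas to store"
weekly_tib = 1000     # "Weekly allocation of DataCap requested"

# Total DataCap = size of one copy multiplied by the number of replicas.
total_pib = single_copy_pib * replicas
total_tib = total_pib * TIB_PER_PIB

# Weeks needed to consume the whole request at the stated weekly rate.
weeks_at_full_rate = total_tib / weekly_tib

print(f"Total DataCap: {total_pib} PiB ({total_tib} TiB)")
print(f"Weeks at {weekly_tib} TiB/week: {weeks_at_full_rate:.1f}")
```

With these inputs the total comes to 14 PiB (14336 TiB), or roughly 14.3 weeks at 1000 TiB per week.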

On-chain address for first allocation

f1tuasyd6axcf3e23kajz5fw3pfsspu7fzmyuayba

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

  • Use Custom Multisig

Identifier

No response

Share a brief history of your project and organization

Byte Tunneling was founded in 2020 with a mission to facilitate open scientific research by preparing and managing large-scale datasets. Drawing on its background in handling high-energy physics data, Byte Tunneling has sought to collaborate with research institutions worldwide to prepare and organize public data.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

The NCBI (National Center for Biotechnology Information) dataset includes a vast collection of biological and genetic data, providing resources for research in genomics, proteomics, and bioinformatics. It is widely used for studying various aspects of biology and health.

Where was the data currently stored in this dataset sourced from

Other

If you answered "Other" in the previous question, enter the details here

No response

If you are a data preparer. What is your location (Country/Region)

Singapore

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

A partner we met at a meetup provided tools that were re-engineered using components from IPFS.

If you are not preparing the data, who will prepare the data? (Provide name and business)

No response

Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

No.

Please share a sample of the data

https://ftp.ncbi.nih.gov/biosample/biosample_set.xml.gz

Confirm that this is a public dataset that can be retrieved by anyone on the Network

  • I confirm

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

Permanently

In which geographies do you plan on making storage deals

Asia other than Greater China

How will you be distributing your data to storage providers

HTTP or FTP server, Shipping hard drives

How did you find your storage providers

Partners

If you answered "Others" in the previous question, what is the tool or platform you used

No response

Please list the provider IDs and location of the storage providers you will be working with.

f01422327, Japan
f02252024, Japan
f02252023, Japan
f01111110, Vietnam
f01909705, Vietnam
f03215853, US
f03218576, US

How do you plan to make deals to your storage providers

Boost client

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

Contributor

datacap-bot bot commented Nov 11, 2024

Application is waiting for allocator review

@TOPPOOL-LEE
Owner

TOPPOOL-LEE commented Dec 13, 2024

We will focus on supporting new datasets. [screenshot attached]

@TOPPOOL-LEE
Owner

How do you prepare data? Do all SPs support Spark?
Are you ready to pledge? Do you need our help?

@TOPPOOL-LEE
Owner

You wrote 4 backups (replicas) when filling out the application, but listed 6 SPs. Can you confirm again which SPs you will be working with?

TOPPOOL-LEE self-assigned this on Dec 13, 2024
@Dengminer
Author

Hi, thank you for your review! To prepare the data, we use a set of well-developed tools that we have relied on for quite some time and that have proven to be efficient.

And yes, all our SPs will support Spark, and we can also help them resolve any technical issues they run into with Spark.

We are also ready to pledge and will start sealing very soon.

As for the number of SPs, we listed extras to be safe. We will balance the deals sent to each SP so that every piece of the data is stored by 4 different SPs.

Thank you again for your review and please let me know if you have any further questions!

@TOPPOOL-LEE
Owner

It seems that the total amount of data you requested is too much. We recommend adjusting it down to a reasonable number, such as 10 PiB.

@TOPPOOL-LEE
Owner

Unless you can prove that https://www.nlm.nih.gov/ has 5 PiB of raw data.

@TOPPOOL-LEE
Owner

I see your changes

Contributor

datacap-bot bot commented Dec 13, 2024

Datacap Request Trigger

Total DataCap requested

14 PiB

Expected weekly DataCap usage rate

1000 TiB

DataCap Amount - First Tranche

256TiB

Client address

f1tuasyd6axcf3e23kajz5fw3pfsspu7fzmyuayba

Contributor

datacap-bot bot commented Dec 13, 2024

DataCap Allocation requested

Multisig Notary address

Client address

f1tuasyd6axcf3e23kajz5fw3pfsspu7fzmyuayba

DataCap allocation requested

256TiB

Id

cfd9808b-731e-46ae-9395-5b289cf7053f

Contributor

datacap-bot bot commented Dec 13, 2024

Application is ready to sign

Contributor

datacap-bot bot commented Dec 13, 2024

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedyut2qldcdtph3bl2nc6ja7erq6ki4xkykomygvmpfngwigbekvg

Address

f1tuasyd6axcf3e23kajz5fw3pfsspu7fzmyuayba

Datacap Allocated

256TiB

Signer Address

f1grjkkw3p5hw3vx5gonvppkkzpcgmu4xnwfm7sli

Id

cfd9808b-731e-46ae-9395-5b289cf7053f

You can check the status here https://filfox.info/en/message/bafy2bzacedyut2qldcdtph3bl2nc6ja7erq6ki4xkykomygvmpfngwigbekvg

Contributor

datacap-bot bot commented Dec 13, 2024

Application is Granted

Contributor

datacap-bot bot commented Dec 16, 2024

Client used 75% of the allocated DataCap. Consider allocating next tranche.

@TOPPOOL-LEE
Owner

checker:manualTrigger

Contributor

datacap-bot bot commented Dec 17, 2024

DataCap and CID Checker Report [1]

No active deals found for this client.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

@TOPPOOL-LEE
Owner

checker:manualTrigger

Contributor

datacap-bot bot commented Dec 21, 2024

DataCap and CID Checker Report Summary [1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

⚠️ 80.00% of Storage Providers have retrieval success rate less than 75%.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients [2]

✔️ No CID sharing has been observed.

Full report

Click here to view the CID Checker report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

@filecoin-watchdog

checker:manualTrigger

Contributor

datacap-bot bot commented Jan 3, 2025

DataCap and CID Checker Report Summary [1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

⚠️ 80.00% of Storage Providers have retrieval success rate less than 75%.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients [2]

✔️ No CID sharing has been observed.

Full report

Click here to view the CID Checker report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...


3 participants