This repository has been archived by the owner on Jul 18, 2024. It is now read-only.

[DataCap Application] FogMeta Lab - store NEAR Network snapshots #1137

Closed
hengdingy opened this issue Oct 24, 2022 · 23 comments

Comments

@hengdingy

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

  • Organization Name: FogMeta Lab
  • Website / Social Media: https://fogmeta.com, http://rebuilder.fogmeta.com/
  • Total amount of DataCap being requested (between 500 TiB and 5 PiB): 3 PiB
  • Weekly allocation of DataCap requested (usually between 1-100TiB): 100 TiB
  • On-chain address for first allocation: f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a
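For a rough sense of timing, assuming binary units (1 PiB = 1024 TiB) and that the full 100 TiB weekly rate is used every week (actual allocations arrive in tranches), the 3 PiB request would take about 31 weeks to consume. A minimal sketch of that arithmetic:

```python
import math

TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def weeks_to_consume(total_pib: float, weekly_tib: float) -> int:
    """Whole weeks needed to use total_pib of DataCap at weekly_tib per week."""
    total_tib = total_pib * TIB_PER_PIB
    return math.ceil(total_tib / weekly_tib)


print(weeks_to_consume(3, 100))  # 3072 TiB / 100 TiB per week = 30.72 -> 31
```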

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

FogMeta Lab's research spans multiple levels, from system technology, infrastructure, and middleware to services and solutions, and covers future systems, network technology and business, distributed systems and management, information management, and interactive and innovative services. Drawing on industry insight and practice, FogMeta also addresses business complexity through operations optimization and related technologies.
'filecoin-ipfs-data-rebuilder' (Rebuilder) is a FogMeta project: a tool for building and rebuilding data between the IPFS network and the Filecoin network. Rebuilder keeps at least one cold and one hot backup of the data in permanent storage and makes the data retrievable at any time.

What is the primary source of funding for this project?

FogMeta Lab.

What other projects/ecosystem stakeholders is this project associated with?

None.

Use-case details

Describe the data being stored onto Filecoin

Full node snapshots of the NEAR Mainnet and Testnet updated every 12 hours.

Where was the data in this dataset sourced from?

The snapshots are maintained by the NEAR team and are updated about every 12 hours. Please refer to the link here: https://near-nodes.io/intro/node-data-snapshots
We also run our own NEAR nodes and will export a snapshot from them twice a month.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

  • RPC testnet: s3://near-protocol-public/backups/testnet/rpc/latest
  • RPC mainnet: s3://near-protocol-public/backups/mainnet/rpc/latest
  • Archival testnet: s3://near-protocol-public/backups/testnet/archive/latest
  • Archival mainnet: s3://near-protocol-public/backups/mainnet/archive/latest
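The sample paths above follow one pattern and can be assembled programmatically. The sketch below assumes, based on the NEAR node-data-snapshots page linked above, that each 'latest' object contains only the name of the newest snapshot folder under the same prefix; that layout and the example folder name are assumptions to check against the current NEAR documentation.

```python
from typing import Literal

BUCKET = "near-protocol-public"
Network = Literal["mainnet", "testnet"]
NodeKind = Literal["rpc", "archive"]


def latest_marker_uri(network: Network, kind: NodeKind) -> str:
    """S3 URI of the 'latest' object, matching the sample paths listed above."""
    return f"s3://{BUCKET}/backups/{network}/{kind}/latest"


def snapshot_prefix_uri(network: Network, kind: NodeKind, latest_contents: str) -> str:
    """S3 prefix of the newest snapshot.

    Assumption: the 'latest' object contains only the snapshot folder name.
    """
    return f"s3://{BUCKET}/backups/{network}/{kind}/{latest_contents.strip()}/"


print(latest_marker_uri("mainnet", "archive"))
# s3://near-protocol-public/backups/mainnet/archive/latest
```

Because the bucket is public, the AWS CLI can fetch these objects anonymously with its --no-sign-request option (aws s3 cp for the 'latest' object, adding --recursive for a snapshot prefix).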

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes, it's a public dataset.

What is the expected retrieval frequency for this data?

3 to 5 times a month.

For how long do you plan to keep this dataset stored on Filecoin?

At least 500 days.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Preferably on all continents.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

The data will be sent to storage providers by uploading it to a web server or to IPFS, from which storage providers will download it.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We will use the FilSwan platform to distribute these data. The Market Matcher, a module of the platform, will choose the most suitable storage providers for us automatically and make sure that the data can be retrieved in the future.

How will you be distributing deals across storage providers?

We use the FilSwan client agent to send deals in batches: https://github.com/filswan/swan-client
FilSwan has a reputation module, the Swan reputation system (https://docs.filswan.com/filswan-platform/overview/reputation-system), which scores storage providers on their data storage behavior. The score is based on time-based reachability, regional weighted adjusted power, general deals, and verified storage provider deals.

The FilSwan Auction System matches storage providers based on reputation and conditions. The bidding policy is described here: https://docs.filswan.com/filswan-platform/overview/filswan-auction-system.
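The real scoring and bidding rules are defined in the FilSwan documents linked above and are not reproduced in this application. Purely as an illustration of the idea, the sketch below combines the four listed factors with hypothetical equal weights, then picks the best-scoring providers, at most one per region (an assumption borrowed from the checker rule that storage providers should be in different regions). None of these names are FilSwan APIs.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    miner_id: str
    region: str
    # All factors are hypothetical and pre-normalized to the range [0, 1].
    reachability: float
    regional_power: float
    general_deals: float
    verified_deals: float


def score(c: Candidate) -> float:
    """Hypothetical equal-weight reputation score (not FilSwan's formula)."""
    return 0.25 * (c.reachability + c.regional_power + c.general_deals + c.verified_deals)


def pick_providers(candidates: list[Candidate], copies: int, min_score: float) -> list[Candidate]:
    """Highest scores first, skipping low scorers and regions already used."""
    chosen: list[Candidate] = []
    used_regions: set[str] = set()
    for c in sorted(candidates, key=score, reverse=True):
        if len(chosen) >= copies:
            break
        if score(c) < min_score or c.region in used_regions:
            continue
        chosen.append(c)
        used_regions.add(c.region)
    return chosen
```

Enforcing one provider per region trades a little average score for geographic spread, which matches the goal of making deals on all continents.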

Some of the storage providers are as follows:
f0143858
f03624
f010088
f02301
f0187709
f01402814
f01859603
f01133080
f01858429
f01398391
f01072221
f0240185
f01390330
f01784458
f01840390
f01870135
f0520660
f01871352
f01883179
f01886797

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes. FogMeta Lab will fund the project.
@large-datacap-requests

Thanks for your request!
Everything looks good. 👌

A Governance Team member will review the information provided and contact you back pretty soon.

@hengdingy hengdingy changed the title [DataCap Application] FogMeta Lab - store Near Protocol snapshots [DataCap Application] FogMeta Lab - store NEAR Network snapshots Oct 24, 2022

@kernelogic

It says the snapshots are updated every 12 hours. Are you planning to store them incrementally, or to store a complete snapshot every once in a while?

@hengdingy
Author

@kernelogic
The NEAR team updates the snapshots in the public S3 bucket every 12 hours. We will try to download a complete snapshot every day and send it to storage providers.
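With a new snapshot roughly every 12 hours but only one complete download per day, one simple selection policy (an assumption, not something stated in this thread) is to keep the newest snapshot of each calendar day, with timestamps assumed to be in UTC:

```python
from datetime import date, datetime


def daily_picks(snapshot_times: list[datetime]) -> list[datetime]:
    """Newest snapshot per calendar day, returned in chronological order."""
    newest: dict[date, datetime] = {}
    for t in snapshot_times:
        day = t.date()
        if day not in newest or t > newest[day]:
            newest[day] = t
    return [newest[d] for d in sorted(newest)]
```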

@flyworker

Hey, I see you are working on snapshots and deals. Can you elaborate on how you take the snapshots and how you back them up to the Filecoin network?
I also want to know how to retrieve the data, and what the data schema is.

@hengdingy
Author

@flyworker @kernelogic
We will use the Swan Client to send deals. Storage providers that accept the deals will be required to support fast retrieval of the files. The Market Matcher, which is based on FilSwan's reputation system, helps us find suitable storage providers. Please refer to the introduction of the system here.

Our Retrieval Scheme
We have designed a complete retrieval scheme for this. All information about the snapshots will be maintained in our GitHub repository, chainsnap, mainly the name of each snapshot, its download URL, the snapshot metadata, and the deal metadata. All deal metadata will be minted into NFTs through multichain.storage, and the minted NFTs can be viewed on OpenSea.

If the user wants to get the complete snapshot, the specific retrieval steps are:

  • Download the deal metadata and get the IDs of the storage providers that store each piece;
  • Retrieve all pieces of the snapshot using the Lotus retrieval function;
  • Merge all pieces to restore the original snapshot.
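The three retrieval steps can be sketched in Python. The metadata shape (index, piece_cid, providers) and the fetch callback are hypothetical stand-ins: the real chainsnap schema is not shown here, and fetch is where a Lotus retrieval would be invoked. The merge step assumes the snapshot was split into sequential byte chunks, so concatenating them in index order restores the original file.

```python
import shutil
from pathlib import Path
from typing import Callable

# Hypothetical retrieval hook: (miner_id, piece_cid, dest) -> True on success.
# In practice this would wrap a Lotus retrieval; it is left abstract here.
FetchFn = Callable[[str, str, Path], bool]


def rebuild_snapshot(metadata: list[dict], fetch: FetchFn, workdir: Path, out: Path) -> Path:
    """Rebuild a snapshot from its pieces using hypothetical deal metadata.

    Each metadata entry is assumed to look like:
    {"index": 0, "piece_cid": "...", "providers": ["f0...", ...]}
    """
    parts: list[Path] = []
    # Step 1: the metadata tells us which storage providers hold each piece.
    for entry in sorted(metadata, key=lambda e: e["index"]):
        dest = workdir / f"piece-{entry['index']:05d}"
        # Step 2: try each listed provider until one returns the piece.
        if not any(fetch(m, entry["piece_cid"], dest) for m in entry["providers"]):
            raise RuntimeError(f"no provider returned piece {entry['piece_cid']}")
        parts.append(dest)
    # Step 3: merge the pieces, in order, into the original snapshot.
    with out.open("wb") as sink:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, sink)
    return out
```

Trying every listed provider for a piece is what makes multiple replicas useful here: one unresponsive provider does not block the rebuild.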

So far, we have stored part of the Filecoin and Polygon snapshots. For details, please refer to the link here.

@simonkim0515
Collaborator

Datacap Request Trigger

Total DataCap requested

3PiB

Expected weekly DataCap usage rate

100TiB

Client address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

@large-datacap-requests

large-datacap-requests bot commented Nov 28, 2022

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

DataCap allocation requested

50TiB

Id

6e748664-0d68-4429-9ae6-afad277d8b5d

@kernelogic

Looks like this is public data and FogMeta have done LDNs before. Willing to support.

@filplus-checker

DataCap and CID Checker Report[1]

  • Organization: FogMeta Lab
  • Client: f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the following restrictions should apply.

  • Storage provider should not exceed 25% of total datacap.
  • Storage provider should not be storing duplicate data for more than 20%.
  • Storage provider should have published its public IP address.
  • All storage providers should be located in different regions.

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
|---|---|---|---|---|---|
| f0717969 | Los Angeles, California, US | 3.51 TiB | 9.37% | 3.07 TiB | 12.46% |
| f01222595 | Moscow, Moscow, RU | 3.45 TiB | 9.21% | 2.89 TiB | 16.29% |
| f03624 | Nürnberg, Bavaria, DE | 3.23 TiB | 8.63% | 2.80 TiB | 13.53% |
| f0187709 | Moscow, Moscow, RU | 3.01 TiB | 8.02% | 2.50 TiB | 16.75% |
| f01886797 | Vancouver, British Columbia, CA | 2.98 TiB | 7.94% | 2.54 TiB | 14.70% |
| f01163272 | Perm, Perm Krai, RU | 2.91 TiB | 7.75% | 2.38 TiB | 18.28% |
| f01072221 | Los Angeles, California, US | 2.72 TiB | 7.25% | 2.25 TiB | 17.24% |
| f01896422 | Fremont, California, US | 2.69 TiB | 7.17% | 2.25 TiB | 16.28% |
| f01871352 | Seoul, Seoul, KR | 2.54 TiB | 6.77% | 2.23 TiB | 12.31% |
| f010088 | Everett, Washington, US | 2.19 TiB | 5.84% | 2.00 TiB | 8.57% |
| f0240456 | Chengdu, Sichuan, CN | 1.85 TiB | 4.93% | 1.82 TiB | 1.69% |
| f01402814 | Singapore, Singapore, SG | 1.79 TiB | 4.77% | 1.51 TiB | 15.72% |
| f08399 | Seattle, Washington, US | 1.56 TiB | 4.17% | 1.34 TiB | 14.00% |
| f01390330 | Xi’an, Shaanxi, CN | 1.31 TiB | 3.50% | 1.19 TiB | 9.52% |
| f047419 | North Prairie, Wisconsin, US | 896.00 GiB | 2.33% | 896.00 GiB | 0.00% |
| f0836160 | Seoul, Seoul, KR | 896.00 GiB | 2.33% | 800.00 GiB | 10.71% |
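The checker's own implementation is not part of this thread. As a rough illustration, the two numeric rules listed above (no provider above 25%, no more than 20% duplicate data) can be checked against the Percentage and Duplicate Deals columns; reading Percentage as each provider's share of the total is an interpretation, and the IP-address and region rules are not modeled.

```python
from typing import NamedTuple


class ProviderRow(NamedTuple):
    provider: str
    share_pct: float      # "Percentage" column
    duplicate_pct: float  # "Duplicate Deals" column


MAX_SHARE_PCT = 25.0
MAX_DUPLICATE_PCT = 20.0


def distribution_violations(rows: list[ProviderRow]) -> list[str]:
    """Return a message for every row that breaks one of the two numeric rules."""
    problems = []
    for r in rows:
        if r.share_pct > MAX_SHARE_PCT:
            problems.append(f"{r.provider}: holds {r.share_pct}% of deals")
        if r.duplicate_pct > MAX_DUPLICATE_PCT:
            problems.append(f"{r.provider}: {r.duplicate_pct}% duplicate deals")
    return problems


rows = [ProviderRow("f0717969", 9.37, 12.46), ProviderRow("f01163272", 7.75, 18.28)]
print(distribution_violations(rows))  # [] : both rows are within the limits
```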

[Chart: Provider Distribution]

Deal Data Replication

The table below shows how the unique data is replicated across storage providers.

  • No more than 25% of unique data should be stored with fewer than 4 providers.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
|---|---|---|---|
| 32.00 GiB | 64.00 GiB | 1 | 0.17% |
| 208.00 GiB | 448.00 GiB | 2 | 1.17% |
| 612.00 GiB | 2.02 TiB | 3 | 5.38% |
| 1.45 TiB | 6.34 TiB | 4 | 16.92% |
| 1.59 TiB | 8.41 TiB | 5 | 22.43% |
| 896.00 GiB | 5.63 TiB | 6 | 15.01% |
| 224.00 GiB | 1.75 TiB | 7 | 4.67% |
| 64.00 GiB | 512.00 GiB | 8 | 1.33% |
| 32.00 GiB | 288.00 GiB | 9 | 0.75% |
| 96.00 GiB | 1.56 TiB | 12 | 4.17% |
| 224.00 GiB | 3.81 TiB | 13 | 10.17% |
| 256.00 GiB | 4.88 TiB | 14 | 13.01% |
| 96.00 GiB | 1.81 TiB | 15 | 4.84% |
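The replication rule can be checked against this table by summing the unique data held by fewer than 4 providers. Converting the TiB rows with 1 TiB = 1024 GiB, the table's rounded sizes give about 852 GiB out of about 5,853 GiB, roughly 14.6%, well under the 25% limit. A minimal sketch of that calculation (not the checker's actual code):

```python
GIB_PER_TIB = 1024

# (unique data in GiB, number of providers), copied from the table above
replication = [
    (32, 1), (208, 2), (612, 3), (1.45 * GIB_PER_TIB, 4), (1.59 * GIB_PER_TIB, 5),
    (896, 6), (224, 7), (64, 8), (32, 9), (96, 12), (224, 13), (256, 14), (96, 15),
]


def under_replicated_fraction(rows, min_providers=4):
    """Fraction of unique data stored with fewer than min_providers providers."""
    total = sum(size for size, _ in rows)
    low = sum(size for size, n in rows if n < min_providers)
    return low / total


print(f"{under_replicated_fraction(replication):.1%}")  # 14.6%
```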

[Chart: Replication Distribution]

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients.
Usually, different applications own different data, which should not resolve to the same CID.

✔️ No CID sharing has been observed.
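Conceptually, this check is a set intersection between this client's CIDs and those of every other client. A minimal sketch, with hypothetical client addresses and CIDs:

```python
def shared_cids(client_cids: dict[str, set[str]], client: str) -> dict[str, set[str]]:
    """For each other client, the CIDs it has in common with `client` (empty overlaps omitted)."""
    mine = client_cids[client]
    overlaps = {other: mine & cids for other, cids in client_cids.items() if other != client}
    return {other: common for other, common in overlaps.items() if common}


example = {
    "f1client": {"bafy-cid-1", "bafy-cid-2"},  # hypothetical addresses and CIDs
    "f1other": {"bafy-cid-3"},
}
print(shared_cids(example, "f1client"))  # {} : no CID sharing
```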

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger


Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebsjbo52xfclki2ly4gko75ubrovx3onzrt24se55gkciavc75bsm

Address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

Datacap Allocated

50.00TiB

Signer Address

f1hlubjsdkv4wmsdadihloxgwrz3j3ernf6i3cbpy

Id

6e748664-0d68-4429-9ae6-afad277d8b5d

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebsjbo52xfclki2ly4gko75ubrovx3onzrt24se55gkciavc75bsm

@filplus-checker-app

DataCap and CID Checker Report Summary[1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[2]

✔️ No CID sharing has been observed.

Full report

Click here to view the full report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...


Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacea3hyfpihe7iblglohauxxf6equhe572ksmqg4md2x6zwrvyxdtwc

Address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

Datacap Allocated

100.00TiB

Signer Address

f1pszcrsciyixyuxxukkvtazcokexbn54amf7gvoq

Id

070bb770-11fb-43ab-86ed-036366aa98ec

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea3hyfpihe7iblglohauxxf6equhe572ksmqg4md2x6zwrvyxdtwc

@Sunnyiscoming
Collaborator

Hello @hengdingy, per filecoin-project/notary-governance#922 for Open, Public Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant, and please also add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by the Fil+ Governance team to confirm its validity, after which the application will be allowed to move forward for additional notary review.
