
[DataCap Application] <LAMOST DR7 public data> #5

Open · 1 of 2 tasks
leier1987 opened this issue Oct 16, 2024 · 23 comments

leier1987 commented Oct 16, 2024

Data Owner Name

LAMOST

Data Owner Country/Region

China

Data Owner Industry

Environment

Website

http://www.lamost.org/dr7/v2.0/

Social Media Handle

http://www.lamost.org/dr7/v2.0/

Social Media Type

Other

What is your role related to the dataset

Data Preparer

Total amount of DataCap being requested

15PiB

Expected size of single dataset (one copy)

2PiB

Number of replicas to store

8

Weekly allocation of DataCap requested

1000TiB

On-chain address for first allocation

f1ri3fyqb3tsr66nh2lqmryqktbfver5irm2mmnni

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

  • Use Custom Multisig

Identifier

No response

Share a brief history of your project and organization

The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) is a Chinese national scientific research facility operated by the National Astronomical Observatories, Chinese Academy of Sciences. It is a special reflecting Schmidt telescope with 4000 fibers in a field of view of 20 deg² on the sky. By July 2019, LAMOST had completed its pilot survey, which was launched in October 2011 and ended in June 2012, as well as the first seven years of its regular survey, which began in September 2012 [1-7]. This data release publishes a total of 10,431,197 low-resolution spectra, selected with the same criteria used for the LAMOST LRS General Catalog. The data products of this release are available from http://dr7.lamost.org/v2.0/.

Guoshoujing Telescope (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope, LAMOST) is a National Major Scientific Project built by the Chinese Academy of Sciences. Funding for the project has been provided by the National Development and Reform Commission. LAMOST is operated and managed by the National Astronomical Observatories, Chinese Academy of Sciences.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) is a Chinese national scientific research facility operated by the National Astronomical Observatories, Chinese Academy of Sciences. It is a special reflecting Schmidt telescope with 4000 fibers in a field of view of 20 deg² on the sky. By July 2019, LAMOST had completed its pilot survey, which was launched in October 2011 and ended in June 2012, as well as the first seven years of its regular survey, which began in September 2012 [1-7]. This data release publishes a total of 10,431,197 low-resolution spectra, selected with the same criteria used for the LAMOST LRS General Catalog. The data products of this release are available from http://dr7.lamost.org/v2.0/, and they include:

1.    Spectra. - There are 10,431,197 (relative) flux- and wavelength-calibrated, sky-subtracted spectra in DR7, including 9,846,793 stellar spectra, 198,272 galaxy spectra, 66,612 quasar spectra, and 319,520 spectra of unknown objects. These spectra cover the wavelength range 3690 Å to 9100 Å with a resolution of about 1800 at 5500 Å [2-3] (a minimal FITS-reading sketch follows this list).

2.    Spectroscopic Parameter Catalogs. - Nine spectroscopic parameter catalogs are also published in this data release: the LAMOST LRS General Catalog, the LAMOST LRS Stellar Parameter Catalog of A, F, G and K Stars, the LAMOST LRS Line-Index Catalog of A Type Stars, the LAMOST LRS Catalog of gM, dM, and sdM Stars, the LAMOST LRS Multiple Epoch Catalog, the LAMOST LRS Observed Plate Information Catalog, the LAMOST LRS Input Catalog, the LAMOST LRS Catalog of Cataclysmic Variable Stars, and the LAMOST LRS Catalog of White Dwarf Stars. Tens of parameters are included in these catalogs, such as right ascension, declination, signal-to-noise ratio (S/N), magnitude, gaia_source_id, gaia_g_mean_mag, atmospheric parameters (effective temperature, surface gravity, and metallicity), radial velocity, element abundances, spectral line indices, line widths, the metallicity-sensitive parameter, and the magnetic activity flag.
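
The LRS spectra listed above are distributed as FITS files (see the lrs-fits sample links further down). As a rough illustration of how a single spectrum could be inspected locally with astropy, here is a minimal sketch; the file name is a hypothetical placeholder, and the exact HDU layout should be checked against the LAMOST DR7 data model rather than assumed from this snippet:

```python
# Minimal inspection sketch for one LAMOST LRS FITS spectrum.
# "spec-sample.fits.gz" is a hypothetical local file name; the HDU layout and
# header keywords should be verified against the LAMOST DR7 data model.
from astropy.io import fits

with fits.open("spec-sample.fits.gz") as hdul:
    hdul.info()                  # list HDUs, dimensions, and data types
    header = hdul[0].header      # primary header with observation metadata
    data = hdul[0].data          # data array, layout per the LAMOST data model
    print(header.get("OBJECT"), header.get("DATE-OBS"))
    if data is not None:
        print("data shape:", data.shape)
```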

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

If you are a data preparer, what is your location (Country/Region)

None

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details.

No response

If you are not preparing the data, who will prepare the data? (Provide name and business)

No response

Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

No response

Please share a sample of the data

http://www.lamost.org/dr7/v2.0/tar-split/catalog/
http://www.lamost.org/dr7/v2.0/tar-split/lrs-fits/
http://www.lamost.org/dr7/v2.0/tar-split/lrs-png/
http://www.lamost.org/dr7/v2.0/tar-split/mrs-fits/
http://www.lamost.org/dr7/v2.0/tar-split/sky/
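
The links above point to directory listings of tar splits. A minimal sketch of how one split could be fetched over HTTP for spot-checking, assuming Python with the requests library; the file name used here is a hypothetical placeholder, so substitute an actual entry from the listing:

```python
# Download one sample tar split over HTTP for spot-checking.
# "catalog.tar.gz.00" is a hypothetical placeholder; use a real entry
# from the directory listing above.
import requests

base = "http://www.lamost.org/dr7/v2.0/tar-split/catalog/"
filename = "catalog.tar.gz.00"

with requests.get(base + filename, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    with open(filename, "wb") as out:
        for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            out.write(chunk)

print("saved", filename)
```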

Confirm that this is a public dataset that can be retrieved by anyone on the Network

  • I confirm

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, Africa, North America, South America, Europe, Australia (continent), Antarctica

How will you be distributing your data to storage providers

HTTP or FTP server, Shipping hard drives

How did you find your storage providers

Slack, Filmine, Big Data Exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you used

No response

Please list the provider IDs and location of the storage providers you will be working with.

f03178144
f03178077
f01106668 
f0870558 
f01518369
f01889668
f03151456
f03151449
f03151456

How do you plan to make deals to your storage providers

Boost client, Lotus client, Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

datacap-bot bot commented Oct 16, 2024

Application is waiting for allocator review

amiclan666 commented Oct 16, 2024

  1. Have you prepared enough token for sector pledge?
  2. Are you a data preparer? What is your previous experience as a data-preparer? List previous applications and client IDs
  3. How will the data be prepared? Please include tooling used and technical details
  4. If you are not preparing the data, who will prepare the data? (Name and Business)
  5. Has this dataset been stored on Filecoin before? If so, why are you choosing to store it again?
  6. Best practice for storing large datasets is, ideally, to store them in 3 or more regions, with 4 or more storage provider operators or owners. Please list the Miner ID, business entity, and location of the SPs you will cooperate with.
  7. Why are you applying for a total of 15 PiB of DataCap, and how can you prove that each data piece is 2 PiB?
  8. Can you ensure that the SPs' Spark retrieval rate meets the fil-plus rules?
  9. Also, please send your identity information to ariachen7650@gmail.com email for our review.

leier1987 commented Oct 16, 2024

Thank you for the review. First, allow me to introduce myself. I am Lei, the applicant for Yunphant Allocator filecoin-project/Allocator-Governance#107. To demonstrate my capability as a new allocator operator, I have thoroughly studied the fil-plus rules and processes, and I have reviewed nearly all early applications and various allocator operation reports. Yunphant itself is a leading blockchain infrastructure service provider (https://www.yunphant.com/), and we have experience in blockchain services.

Through our clients, we are connected to many SPs currently active in the Filecoin community, which I have listed in the application. These SPs are ready and are just waiting for the DataCap allocation. Our team has development and operational capabilities, so we can handle the initial data preparation, primarily using official tools such as Boost, Lotus, and Singularity.

The datasets we are applying for are primarily being stored for the first time, as confirmed with these SPs. I am not entirely sure whether other SPs in the Filecoin network have already stored this data.

Through several governance meetings, we understand that the fil-plus team places great emphasis on Spark retrieval rates, so these SPs are actively working to improve their Spark retrieval success rates, and we believe they can meet the official requirements.

leier1987 commented:

(screenshot attached)

Most of the storage is taken up by spectral image data. In order to accurately retrieve the CID of each image, we will not compress the data before sealing. The uncompressed size of each spectral data file is about 205 MByte.

Originally we planned to store 8 replicas with different SPs: 205 MByte * (10,431,197 + 14,096,967) spectra / 32 GiB per sector * 8 replicas = 1,227,605.86 sectors = 37.46 PiB of DataCap.

According to this formula, a total of 37.46 PiB of DataCap could actually be applied for, with 8 replicas and each data piece being 4.6825 PiB. However, considering that this is the initial application, we will prepare 2 PiB for each data piece, totaling 15 PiB for this application.
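
For reference, the sizing arithmetic above can be reproduced directly; a minimal sketch, reading the quoted 205 MByte per spectrum as MiB so that the sector count and DataCap figures quoted above come out the same:

```python
# Reproduce the DataCap sizing arithmetic quoted above.
MiB = 1024 ** 2
GiB = 1024 ** 3
PiB = 1024 ** 5

spectrum_size = 205 * MiB                # quoted uncompressed size per spectrum
num_spectra = 10_431_197 + 14_096_967    # spectrum counts quoted above
sector_size = 32 * GiB
replicas = 8

sectors = spectrum_size * num_spectra * replicas / sector_size
datacap_pib = sectors * sector_size / PiB

print(f"sectors needed: {sectors:,.0f}")   # roughly 1,227,607 sectors
print(f"datacap: {datacap_pib:.2f} PiB")   # roughly 37.46 PiB
```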

leier1987 commented:

The email has been sent. I hope to receive support for 500 TiB. Thank you! (screenshot attached)

leier1987 commented:

(screenshot attached)

Updated:
As I explained in my previous reply, the SPs listed here are storing this dataset for the first time; as for whether other SPs have already stored it, I am not certain.

amiclan666 commented:

All information received. However, according to our rules we can only support 200 TiB for the first round; if operations are compliant, the next round will receive 500 TiB. We are willing to support this for the first round. (screenshot attached)

datacap-bot bot commented Oct 18, 2024

Datacap Request Trigger

Total DataCap requested

15PiB

Expected weekly DataCap usage rate

1000TiB

DataCap Amount - First Tranche

200TiB

Client address

f1ri3fyqb3tsr66nh2lqmryqktbfver5irm2mmnni

datacap-bot bot commented Oct 18, 2024

DataCap Allocation requested

Multisig Notary address

Client address

f1ri3fyqb3tsr66nh2lqmryqktbfver5irm2mmnni

DataCap allocation requested

200TiB

Id

8c10423f-badc-4a34-8ed3-6746eef274cb

datacap-bot bot commented Oct 18, 2024

Application is ready to sign

datacap-bot bot commented Oct 18, 2024

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebhyqvoqppkj6avnnth2r2dzdd6647x3xh55kjeqdpyveo7kjvpju

Address

f1ri3fyqb3tsr66nh2lqmryqktbfver5irm2mmnni

Datacap Allocated

200TiB

Signer Address

f1i2znbisevlalmctotghgnroprplkioss73ka24i

Id

8c10423f-badc-4a34-8ed3-6746eef274cb

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebhyqvoqppkj6avnnth2r2dzdd6647x3xh55kjeqdpyveo7kjvpju

datacap-bot bot commented Oct 18, 2024

Application is Granted

datacap-bot bot commented Oct 30, 2024

Client used 75% of the allocated DataCap. Consider allocating next tranche.

leier1987 commented:

checker:manualTrigger

datacap-bot bot commented Nov 4, 2024

DataCap and CID Checker Report Summary [1]

Storage Provider Distribution

⚠️ 4 storage providers sealed too much duplicate data - f03179572: 51.69%, f03178144: 24.32%, f03214937: 22.16%, f03151449: 48.99%

⚠️ 50.00% of Storage Providers have retrieval success rate less than 75%.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients [2]

✔️ No CID sharing has been observed.

Full report

Click here to view the CID Checker report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

amiclan666 commented:

The Spark retrieval success rate for the first round looks good; however, please explain why f03179572, f03214937, and f03179570 did not appear in the application's SP list, and please address the warnings in the report.

leier1987 commented:

Thank you for your patient review. Due to internal work plans, not all of the SPs disclosed at that time sealed data together, so they recommended partners. I am now updating the information as follows: f03179572 and f03214937 belong to Flycloud (US); f03179570 belongs to YunSD (Singapore).

Regarding the warning in the bot report, I asked the relevant technical staff; they suspect it may be a bug, since they confirmed that no duplicate deals were sent. Please continue to support us; we believe the report will recover over time.

Additionally, I shared the pieceCID document using the email you provided earlier to confirm that we do not have duplicate pieceCIDs.
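
For reference, a duplicate check of this kind can be scripted; a minimal sketch, assuming the shared document is exported as a CSV with a piece_cid column (the file name and column name here are hypothetical):

```python
# Check an exported pieceCID table for duplicates.
# "piece_cids.csv" and the "piece_cid" column name are hypothetical.
import csv
from collections import Counter

with open("piece_cids.csv", newline="") as f:
    cids = [row["piece_cid"].strip() for row in csv.DictReader(f)]

duplicates = {cid: n for cid, n in Counter(cids).items() if n > 1}

if duplicates:
    print(f"Found {len(duplicates)} duplicated piece CIDs:")
    for cid, n in duplicates.items():
        print(f"  {cid}: {n} occurrences")
else:
    print(f"All {len(cids)} piece CIDs are unique.")
```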

leier1987 commented:

(screenshot attached)

leier1987 commented:

The email has been sent; please check. (screenshot attached)

amiclan666 commented:

I received the email, and I confirm that there are no duplicate pieceCIDs in the table. I’ll keep monitoring it closely. For this round, following the guidelines, I’ll continue to support 500TiB.

(screenshot attached)

datacap-bot bot commented Nov 5, 2024

Application is in Refill

datacap-bot bot commented Nov 5, 2024

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebmrlb2x5moejw4mcrjuqc6il5tlh2ntsn67fa7xr32weysdracea

Address

f1ri3fyqb3tsr66nh2lqmryqktbfver5irm2mmnni

Datacap Allocated

500TiB

Signer Address

f1i2znbisevlalmctotghgnroprplkioss73ka24i

Id

f91e09bf-4b94-4b58-aa53-1b4a384e8061

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebmrlb2x5moejw4mcrjuqc6il5tlh2ntsn67fa7xr32weysdracea

datacap-bot bot commented Nov 5, 2024

Application is Granted

datacap-bot bot added the granted label and removed the Refill label on Nov 5, 2024