SPX - PoSt Worker Testing #8375
-
Ozzy's PoSt Worker(s) Report
Phase 1
For the SPX people running without
An 👩🔬👨🔬:
Phase 2
Manually trigger one more windowPoSt computation than you have workers and confirm that:
Since the
Optional, but really appreciated
Try to monitor and see how much VRAM and RAM you are using when doing winningPoSt. If you have some small form-factor / low memory GPUs lying around, now is the time to brush the dust off them (should be tested on a testnet) to see how low the specs can go.
UX Feedback
Add your UX feedback on the new PoSt workers and commands here.
-
Phase 1
For the SPX people running without local sector access on the windowPoSt worker
An 👩🔬👨🔬:
Next
-
Phase 2
Manually trigger one more windowPoSt computation than you have workers and confirm that:
Since the lotus-miner proving compute window-post [deadline index] is a real simulation of a windowPoSt, please add the performance numbers for the partitions you are computing in parallel:
*Last was done in second round by PoSt worker 1
I have not been able to identify the profile of winningPoSt, but windowPoSt seems to be exactly unchanged and will be consuming:
UX Feedback
Other Feedback/concerns:
Like it is on the miner:
-
Phase 1
For the SPX people running without local sector access on the windowPoSt worker
An 👩🔬👨🔬:
Next
-
Bug report
Background
I'm trying to test out different deployments of PoSt workers, and I ran into an issue: lost power on my calib miner.
Setup
Server 1 has the following individual processes running:
Server 2 has:
Incident
This morning
It seems the PoSt worker was fully tested when reading locally or from the miner, but what if the PoSt worker needs to read data from another PoSt worker? This would be a highly relevant scenario if we have PoSt workers sitting directly on the storage with no clean way of ensuring that each one only has to read locally, so that it also has to request data from other workers to assemble the data for a partition in a deadline.
Logs:
-
Bug report
Background
Lost all power on my calib miner. So I would use the
Setup
Server 1 has the following individual processes running:
Incident
Logs
Nothing in the logs. Empty.
-
TippyFlits PoSt Worker(s) Report
Phase 1
An 👩🔬👨🔬:
Phase 2
Manually trigger one more windowPoSt computation than you have workers and confirm that:
Since the
Optional, but really appreciated
Try to monitor and see how much VRAM and RAM you are using when doing winningPoSt. If you have some small form-factor / low memory GPUs lying around, now is the time to brush the dust off them (should be tested on a testnet) to see how low the specs can go.
UX Feedback
Add your UX feedback on the new PoSt workers and commands here.
-
Phase 1
For the SPX people running without local sector access on the windowPoSt worker
-
My SP is relatively small. I have one miner (doing wdpost and winningpost) with 256GB RAM and an A4000 GPU, and one worker (doing all the sealing) with 512GB RAM and a 3090 GPU. Can I install a second A4000 GPU in the miner, start a winningPoSt worker process, and let the miner process use one A4000 for wdpost while the winningPoSt worker uses the other A4000?
-
They are here, the long-awaited PoSt workers, they are here! Okie.. so you have been expecting this, but did you know that you not only get a windowPoSt worker, you also get a winningPoSt worker... or wait, actually any number of PoSt workers, as long as you have the hardware resources!
So as always.. right now the code is at an "it may even work" status, and with this testing we will change it to... "it works!". It's recommended to test this on the calibration network before moving on to testing on mainnet.
Phase 1
In Phase 1 we will be testing that a single PoSt worker functions as intended, and that the PoSt task switches back to the lotus-miner if a PoSt worker is disconnected. To launch a PoSt worker follow these steps:
1. Upgrade your cluster
Upgrade your cluster and workers to post-worker-rc3, and run your cluster as normal. Check that you have ample time until your next proving window before you proceed to the next step.
2. Set env-variables on worker
Remember to have the appropriate Nvidia drivers and nvidia-opencl-icd installed if running OpenCL on your worker. If using CUDA, install the CUDA toolkit and build Lotus with FFI_USE_CUDA=1.
For the PoSt worker to start, it will need to read and verify the Filecoin proof parameters. These are the same parameters that are currently on the lotus-miner instance. We recommend copying them over from your lotus-miner machine; otherwise they will be downloaded on first run.
3. Local vs Remote storage access
The windowPoSt process requires reading random leaves of all the sealed sectors in a proving deadline. When setting up a windowPoSt worker, one needs to consider how the worker can access those files. A PoSt worker can ask any other worker, including the lotus-miner process, to read challenges for it, but it will prefer reading them from local paths.
4. Run the PoSt worker
The PoSt workers will fail to start if the file descriptor limit is not set high enough. Raise the file descriptor limit temporarily before running with
ulimit -n 1048576
or permanently by following the Permanently Setting Your ULIMIT System Value guide.
The above command will start the worker. You'll need to specify which PoSt operation you want the worker to perform with one of the following flags set to true.
A PoSt worker instance can only be either a winningPoSt worker, or a windowPoSt worker. It is not possible to run any PoSt tasks on a seal worker.
When a winningPoSt or windowPoSt worker connects to the lotus-miner, the lotus-miner will delegate all winningPoSt or windowPoSt tasks to that worker. If both tasks are delegated to separate PoSt workers, no PoSt tasks will be executed locally on the miner instance. If a worker is stopped, the lotus-miner instance switches back to local PoSt automatically.
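As a minimal sketch of the file descriptor requirement in this step, the snippet below checks the current limit against the 1048576 value from the ulimit command before a worker is launched. The helper name fd_limit_ok is hypothetical and not part of Lotus; it only wraps the standard ulimit -n shell builtin.

```shell
#!/bin/sh
# fd_limit_ok CURRENT REQUIRED
# Hypothetical helper (not a Lotus command): succeeds when CURRENT is
# "unlimited" or numerically at least REQUIRED.
fd_limit_ok() {
  [ "$1" = "unlimited" ] && return 0
  [ "$1" -ge "$2" ]
}

required=1048576          # value taken from the ulimit command in this guide
current="$(ulimit -n)"    # current soft limit for open file descriptors

if fd_limit_ok "$current" "$required"; then
  echo "file descriptor limit OK: $current"
else
  echo "file descriptor limit too low: $current (want $required)" >&2
fi
```

Running this in the same shell session that will start the PoSt worker shows whether ulimit -n 1048576 still needs to be applied there, since a temporary limit only affects the current session.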
5. Verify connection
After connecting, verify that the PoSt workers are connected to the lotus-miner with
lotus-miner proving workers
and
lotus-miner info
6. Manually trigger a windowPoSt
Manually trigger a windowPoSt outside of your proving periods with
lotus-miner proving compute window-post [deadline index]
and verify that your new setup is working and computes a windowPoSt on the windowPoSt worker. It will not send any messages to the chain; it's just an upgraded lotus-miner proving compute window-post [deadline index] command.
Finish the tasks listed in the report-template before moving to Phase 2.
Phase 2
In Phase 2 we will test that 1:N windowPoSt workers function properly, and that they are able to compute partitions in parallel. Since this phase is more hardware intensive, it might be necessary to run this on mainnet. Use Phase 1 to get accustomed to the PoSt workers before starting on these steps:
Follow the steps in Phase 1 and run as many windowPoSt workers as you desire and are able to.
Manually trigger multiple windowPoSts (start with 2) outside of your proving periods with
lotus-miner proving compute window-post [deadline index]
and verify that the windowPoSts get computed in parallel across your windowPoSt workers.
Finish the tasks listed in the Phase 2 part of the report-template.
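One possible way to trigger several computations at once is sketched below. It starts the lotus-miner proving compute window-post command from this guide as background jobs and waits for all of them. The function name trigger_parallel and the LOTUS_MINER override variable are assumptions made for illustration; they are not part of Lotus.

```shell
#!/bin/sh
# trigger_parallel COUNT DEADLINE_INDEX
# Hypothetical helper: starts COUNT copies of
#   lotus-miner proving compute window-post DEADLINE_INDEX
# in the background, waits for all of them, and returns the number that failed.
# LOTUS_MINER is an illustrative override for the binary path or name.
trigger_parallel() {
  count="$1"; deadline="$2"; pids=""; failed=0; i=0
  while [ "$i" -lt "$count" ]; do
    "${LOTUS_MINER:-lotus-miner}" proving compute window-post "$deadline" &
    pids="$pids $!"          # remember each background job's PID
    i=$((i + 1))
  done
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  return "$failed"
}
```

For example, trigger_parallel 2 0 would start two computations for deadline index 0. To follow the report-template, pass a count one higher than your number of windowPoSt workers and watch which worker picks up each computation.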
Issues can be reported at https://github.com/filecoin-project/lotus/issues. Issues should be submitted with:
If you think the PoSt worker is not working well, keep all the logs so you can describe the issue, then disconnect any PoSt workers connected to your lotus-miner instance. The lotus-miner instance switches back to local PoSt automatically.