[Merged by Bors] - Use dedicated PoST service for Proof generation #5061

Closed
wants to merge 41 commits into from
41 commits
932ea97
Fix post test
fasmat Sep 22, 2023
aa04e56
Integrate PostClient in activation package
fasmat Sep 24, 2023
4d9c879
Fix tests
fasmat Oct 4, 2023
ed731ec
Add postConnection listeners to grpc service
fasmat Oct 5, 2023
c76e0c4
Use zap logger directly
fasmat Oct 5, 2023
234b264
Fix failing tests
fasmat Oct 5, 2023
4ddcddc
Increase timeouts
fasmat Oct 5, 2023
75215ca
Update e2e tests
fasmat Oct 5, 2023
ddb939d
Fix tests
fasmat Oct 6, 2023
80da2cc
Fix e2e tests
fasmat Oct 6, 2023
455db7f
Update changelog
fasmat Oct 6, 2023
f70c972
Try fixing flaky tests on macos
fasmat Oct 6, 2023
7d74bd0
Start nodes with private post service
fasmat Oct 6, 2023
f6bc1c7
Start PoST service with right data dir
fasmat Oct 6, 2023
f8aa06f
Use higher tick interval to reduce load
fasmat Oct 6, 2023
fa87e2f
Increase timeout
fasmat Oct 6, 2023
6bc842f
Add cmd parameters
fasmat Oct 7, 2023
a22c2a5
Add logging to post supervisor
fasmat Oct 7, 2023
e8b2533
Fix TestAdmin failing on windows
fasmat Oct 7, 2023
2f288a9
Fix cmd options for post service not used
fasmat Oct 7, 2023
9fa16f6
Deflaking tests
fasmat Oct 7, 2023
f13dffb
Add deprecated to GenerateProof
fasmat Oct 9, 2023
83c1577
Update CHANGELOG
fasmat Oct 9, 2023
ab17f5a
Remove post proof validation from nipost builder
fasmat Oct 9, 2023
c5d384f
Fix e2e tests
fasmat Oct 9, 2023
e6eeea2
Check challenge in response to match request challenge
fasmat Oct 9, 2023
2e02fea
Remove deprecated functionality
fasmat Oct 9, 2023
9d0f1fd
make generate
fasmat Oct 9, 2023
1579785
Remove proving options from PostSetup Manager
fasmat Oct 9, 2023
341fbbf
Replace Flags in proving options with RandomX mode
fasmat Oct 9, 2023
1650a90
Deduplicate config, improve parsing and add tests
fasmat Oct 9, 2023
980711f
Remove obsolete parameter
fasmat Oct 9, 2023
d33e258
Update post service
fasmat Oct 10, 2023
a906dec
Remove scrypt options
fasmat Oct 10, 2023
0a1805b
Fix supervisor for windows and make restartable
fasmat Oct 10, 2023
90305db
Fix tests
fasmat Oct 10, 2023
b049226
revert to alpha2
fasmat Oct 10, 2023
6904bf8
Fix test on windows
fasmat Oct 10, 2023
21064a7
Start and Stop smeshing via GRPC uses supervisor
fasmat Oct 10, 2023
0677787
Merge remote-tracking branch 'origin/develop' into 5042-integrate-pos…
fasmat Oct 11, 2023
584315d
Update review feedback
fasmat Oct 11, 2023
7 changes: 5 additions & 2 deletions CHANGELOG.md
@@ -14,11 +14,14 @@ See [RELEASE](./RELEASE.md) for workflow instructions.

* [#5118](https://github.com/spacemeshos/go-spacemesh/pull/5118) reduce number of tortoise results returned after recovery.

this is hotfix for a bug introduced in v1.2.0. in rare conditions node may loop with the following warning:

> 2023-10-02T15:28:14.002+0200 WARN fd68b.sync mesh failed to process layer from sync {"node_id": "fd68b9397572556c2f329f3e5af2faf23aef85dbbbb7e38447fae2f4ef38899f", "module": "sync", "sessionId": "29422935-68d6-47d1-87a8-02293aa181f3", "layer_id": 23104, "errmsg": "requested layer 8063 is before evicted 13102", "name": "sync"}

* [#5091](https://github.com/spacemeshos/go-spacemesh/pull/5091) First stage of separating PoST from the node into its own service.
* [#5061](https://github.com/spacemeshos/go-spacemesh/pull/5061) Proof generation is now done via a dedicated service instead of the node.

Operating a node doesn't require any changes at the moment. The service will be automatically started by the node if needed and will be stopped when the node is stopped.

* [#5138](https://github.com/spacemeshos/go-spacemesh/pull/5138) Bump poet to v0.9.7

41 changes: 39 additions & 2 deletions activation/activation.go
@@ -12,7 +12,6 @@
"sync"
"time"

"github.com/spacemeshos/post/proving"
"github.com/spacemeshos/post/shared"
"go.uber.org/atomic"
"golang.org/x/sync/errgroup"
@@ -88,6 +87,9 @@
initialPost *types.Post
validator nipostValidator

postMux sync.Mutex
postClient PostClient

// smeshingMutex protects `StartSmeshing` and `StopSmeshing` from concurrent access
smeshingMutex sync.Mutex

@@ -183,6 +185,41 @@
return b
}

func (b *Builder) Connected(client PostClient) {
b.postMux.Lock()
defer b.postMux.Unlock()

if b.postClient != nil {
b.log.With().Error("post service already connected")
return
}

b.postClient = client
}

func (b *Builder) Disconnected(client PostClient) {
b.postMux.Lock()
defer b.postMux.Unlock()

if b.postClient != client {
b.log.With().Debug("post service not connected")
return
}

b.postClient = nil
}

func (b *Builder) proof(ctx context.Context, challenge []byte) (*types.Post, *types.PostMetadata, error) {
b.postMux.Lock()
defer b.postMux.Unlock()

if b.postClient == nil {
return nil, nil, errors.New("post service not connected")
}

return b.postClient.Proof(ctx, challenge)
}

// Smeshing returns true iff atx builder is smeshing.
func (b *Builder) Smeshing() bool {
return b.started.Load()
@@ -333,7 +370,7 @@
startTime := time.Now()
var err error
events.EmitPostStart(shared.ZeroChallenge)
post, metadata, err := b.postSetupProvider.GenerateProof(ctx, shared.ZeroChallenge, proving.WithPowCreator(b.nodeID.Bytes()))
post, metadata, err := b.proof(ctx, shared.ZeroChallenge)
if err != nil {
events.EmitPostFailure()
return fmt.Errorf("post execution: %w", err)
31 changes: 17 additions & 14 deletions activation/activation_test.go
@@ -96,12 +96,13 @@ type testAtxBuilder struct {
coinbase types.Address
goldenATXID types.ATXID

mpub *mocks.MockPublisher
mnipost *MocknipostBuilder
mpost *MockpostSetupProvider
mclock *MocklayerClock
msync *Mocksyncer
mValidator *MocknipostValidator
mpub *mocks.MockPublisher
mnipost *MocknipostBuilder
mpost *MockpostSetupProvider
mpostClient *MockPostClient
mclock *MocklayerClock
msync *Mocksyncer
mValidator *MocknipostValidator
}

func newTestBuilder(tb testing.TB, opts ...BuilderOption) *testAtxBuilder {
@@ -118,6 +119,7 @@ func newTestBuilder(tb testing.TB, opts ...BuilderOption) *testAtxBuilder {
mpub: mocks.NewMockPublisher(ctrl),
mnipost: NewMocknipostBuilder(ctrl),
mpost: NewMockpostSetupProvider(ctrl),
mpostClient: NewMockPostClient(ctrl),
mclock: NewMocklayerClock(ctrl),
msync: NewMocksyncer(ctrl),
mValidator: NewMocknipostValidator(ctrl),
@@ -143,6 +145,7 @@ func newTestBuilder(tb testing.TB, opts ...BuilderOption) *testAtxBuilder {
Nonce: 0,
Indices: make([]byte, 10),
}
b.Connected(tab.mpostClient)
tab.Builder = b
dir := tb.TempDir()
tab.mnipost.EXPECT().DataDir().Return(dir).AnyTimes()
@@ -250,7 +253,7 @@ func TestBuilder_StartSmeshingCoinbase(t *testing.T) {
tab.mpost.EXPECT().StartSession(gomock.Any()).AnyTimes()
tab.mpost.EXPECT().LastOpts().Return(&PostSetupOpts{}).AnyTimes()
tab.mpost.EXPECT().CommitmentAtx().Return(tab.goldenATXID, nil).AnyTimes()
tab.mpost.EXPECT().GenerateProof(gomock.Any(), gomock.Any(), gomock.Any()).AnyTimes().Return(&types.Post{}, &types.PostMetadata{}, nil)
tab.mpostClient.EXPECT().Proof(gomock.Any(), gomock.Any()).AnyTimes().Return(&types.Post{}, &types.PostMetadata{}, nil)
tab.mValidator.EXPECT().Post(gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any()).AnyTimes().Return(nil)
tab.mclock.EXPECT().AwaitLayer(gomock.Any()).Return(make(chan struct{})).AnyTimes()
require.NoError(t, tab.StartSmeshing(coinbase, postSetupOpts))
@@ -271,7 +274,7 @@ func TestBuilder_RestartSmeshing(t *testing.T) {
tab.mpost.EXPECT().CommitmentAtx().Return(types.EmptyATXID, nil).AnyTimes()
tab.mpost.EXPECT().LastOpts().Return(&PostSetupOpts{}).AnyTimes()
tab.mpost.EXPECT().StartSession(gomock.Any()).AnyTimes()
tab.mpost.EXPECT().GenerateProof(gomock.Any(), gomock.Any(), gomock.Any()).AnyTimes().Return(&types.Post{}, &types.PostMetadata{
tab.mpostClient.EXPECT().Proof(gomock.Any(), gomock.Any()).AnyTimes().Return(&types.Post{}, &types.PostMetadata{
Challenge: shared.ZeroChallenge,
}, nil)
tab.mpost.EXPECT().Reset().AnyTimes()
@@ -382,7 +385,7 @@ func TestBuilder_StartSmeshing_PanicsOnErrInStartSession(t *testing.T) {
tab.log = l

// Stub these methods in case they get called
tab.mpost.EXPECT().GenerateProof(gomock.Any(), gomock.Any(), gomock.Any()).AnyTimes().Return(&types.Post{}, &types.PostMetadata{}, nil)
tab.mpostClient.EXPECT().Proof(gomock.Any(), gomock.Any()).AnyTimes().Return(&types.Post{}, &types.PostMetadata{}, nil)
tab.mclock.EXPECT().AwaitLayer(gomock.Any()).AnyTimes()

// Set expectations
@@ -407,7 +410,7 @@ func TestBuilder_StartSmeshing_SessionNotStartedOnFailPrepare(t *testing.T) {
tab.log = l

// Stub these methods in case they get called
tab.mpost.EXPECT().GenerateProof(gomock.Any(), gomock.Any(), gomock.Any()).AnyTimes().Return(&types.Post{}, &types.PostMetadata{}, nil)
tab.mpostClient.EXPECT().Proof(gomock.Any(), gomock.Any()).AnyTimes().Return(&types.Post{}, &types.PostMetadata{}, nil)
tab.mclock.EXPECT().AwaitLayer(gomock.Any()).AnyTimes()

// Set PrepareInitializer to fail
@@ -430,7 +433,7 @@ func TestBuilder_StopSmeshing_OnPoSTError(t *testing.T) {
tab.mpost.EXPECT().StartSession(gomock.Any()).Return(nil).AnyTimes()
tab.mpost.EXPECT().CommitmentAtx().Return(types.EmptyATXID, nil).AnyTimes()
tab.mpost.EXPECT().LastOpts().Return(&PostSetupOpts{}).AnyTimes()
tab.mpost.EXPECT().GenerateProof(gomock.Any(), gomock.Any(), gomock.Any()).Return(&types.Post{}, &types.PostMetadata{}, nil).AnyTimes()
tab.mpostClient.EXPECT().Proof(gomock.Any(), gomock.Any()).AnyTimes().Return(&types.Post{}, &types.PostMetadata{}, nil)
tab.mValidator.EXPECT().Post(gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any()).AnyTimes().Return(nil)
ch := make(chan struct{})
close(ch)
@@ -1089,7 +1092,7 @@ func TestBuilder_RetryPublishActivationTx(t *testing.T) {

func TestBuilder_InitialProofGeneratedOnce(t *testing.T) {
tab := newTestBuilder(t, WithPoetConfig(PoetConfig{PhaseShift: layerDuration * 4}))
tab.mpost.EXPECT().GenerateProof(gomock.Any(), shared.ZeroChallenge, gomock.Any()).Return(&types.Post{}, &types.PostMetadata{}, nil)
tab.mpostClient.EXPECT().Proof(gomock.Any(), shared.ZeroChallenge).Return(&types.Post{}, &types.PostMetadata{}, nil)
tab.mpost.EXPECT().LastOpts().Return(&PostSetupOpts{})
tab.mpost.EXPECT().CommitmentAtx().Return(tab.goldenATXID, nil)
tab.mValidator.EXPECT().Post(gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any()).AnyTimes().Return(nil)
@@ -1121,7 +1124,7 @@ func TestBuilder_InitialPostIsPersisted(t *testing.T) {
tab.mpost.EXPECT().Config().AnyTimes().Return(PostConfig{})
tab.mpost.EXPECT().LastOpts().Return(&PostSetupOpts{}).AnyTimes()
tab.mpost.EXPECT().CommitmentAtx().Return(tab.goldenATXID, nil).Times(3)
tab.mpost.EXPECT().GenerateProof(gomock.Any(), shared.ZeroChallenge, gomock.Any()).Return(&types.Post{}, &types.PostMetadata{
tab.mpostClient.EXPECT().Proof(gomock.Any(), shared.ZeroChallenge).Return(&types.Post{}, &types.PostMetadata{
Challenge: shared.ZeroChallenge,
}, nil)
tab.mValidator.EXPECT().Post(gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any(), gomock.Any()).AnyTimes().Return(nil)
@@ -1132,7 +1135,7 @@ func TestBuilder_InitialPostIsPersisted(t *testing.T) {

// Remove the persisted post file and try again
require.NoError(t, os.Remove(filepath.Join(tab.nipostBuilder.DataDir(), postFilename)))
tab.mpost.EXPECT().GenerateProof(gomock.Any(), shared.ZeroChallenge, gomock.Any()).Return(&types.Post{}, &types.PostMetadata{}, nil)
tab.mpostClient.EXPECT().Proof(gomock.Any(), shared.ZeroChallenge).Return(&types.Post{}, &types.PostMetadata{}, nil)
require.NoError(t, tab.generateInitialPost(context.Background()))
}
