Sui Node seed peers from validators info not working #20412

Open · Duoquote opened this issue Nov 24, 2024 · 9 comments
Labels: doc-issue (Issue submitted using the Doc issue template)
Duoquote commented Nov 24, 2024

When I look at validators on Suiscan, for example Mysten 1: https://suiscan.xyz/mainnet/validator/0x4fffd0005522be4bc029724c7f0f6ed7093a6bf3a09b90e62f61dc15181e1a3e/info

I can see that the address is /dns/mysten-1.mainnet.sui.io/udp/8084 and the Network Public Key Bytes value is 0Mfg9FEcDkWBRgl8KNgKNG0EZvwbUgXJjGyNPCk3dBE=. When I convert it to bytes and then to hex, it should be ed54176cb93ed30aeaa48b3741d76c754295b76e55a510c5a338d41e37934002, right?

But when I use that configuration, my node just fails to sync. Am I converting it wrong? Also, how would I find mel-00. or ewr-00.?
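
For reference, the conversion being described here is just base64-decode then hex-encode. A minimal sketch in Rust (assuming the base64 and hex crates as dependencies):

use base64::{engine::general_purpose::STANDARD, Engine as _};

fn main() {
    // Network Public Key Bytes as shown on Suiscan (base64).
    let b64 = "0Mfg9FEcDkWBRgl8KNgKNG0EZvwbUgXJjGyNPCk3dBE=";
    // Decode to raw bytes, then hex-encode; the result is the hex peer-id.
    let bytes = STANDARD.decode(b64).expect("valid base64");
    println!("{}", hex::encode(bytes));
}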

My test configuration, filled in from the validator information:

p2p-config:
  seed-peers:
    - address: /dns/mysten-2.mainnet.sui.io/udp/8084
      peer-id: 9e4c25566d1a25fbfad56883d7a04177a6e38bef4d7378f75d65b96d50f8296f
    - address: /dns/mysten-1.mainnet.sui.io/udp/8084
      peer-id: d0c7e0f4511c0e458146097c28d80a346d0466fc1b5205c98c6c8d3c29377411
    - address: /dns/bd-sui-main-validator-01.bdnodes.net/udp/8084
      peer-id: 2aead29218c797593036eae814ffd2920540b64a17ea354f1a458080134d52a9
    - address: /dns/sui-mainnet.chorus.one/udp/8084
      peer-id: 28afe60b560f3fadb38158ddcbb451b7f78ee14ac8b1282a0891ad9c1edfa192
    - address: /dns/sui-mainnet.overclock.run/udp/8084
      peer-id: 0fafd9a6ad162e9fc82e3c56b5f71f1510454e8a13e26e86f95fa74ff3601d87
    - address: /dns/validator-01.sui.dsrvlabs.net/udp/8084
      peer-id: cac0e7ae6f85b11c4e261ee071a71be84e98a57df9499a8254f3b0f37341530c
    - address: /dns/sui-mainnet.nodes.lgns.xyz/udp/8084
      peer-id: edc0c0ece84ab3befbd3d4c806d25b44d0f32048426a8e8f031f5012e859a546
    - address: /dns/validator.mainnet.sui.rpcpool.com/udp/8084
      peer-id: 8d7439595d60348d3c3ebb651ae539f2819856f1ba3f15502093f1dda3767822
    - address: /dns/sui-mainnet-figment.staking.production.figment.io/udp/8084
      peer-id: ed54176cb93ed30aeaa48b3741d76c754295b76e55a510c5a338d41e37934002
The node then fails to sync and just loops on these errors:

fullnode-1  | 2024-11-24T19:07:53.250774Z ERROR sui_storage::object_store::util: Failed to read file from object store with error: Failed to get header
fullnode-1  | 
fullnode-1  | Caused by:
fullnode-1  |     Missing last modified
fullnode-1  | 2024-11-24T19:07:54.434203Z ERROR sui_storage::object_store::util: Failed to read file from object store with error: Failed to get header
fullnode-1  | 
fullnode-1  | Caused by:
fullnode-1  |     Missing last modified
fullnode-1  | 2024-11-24T19:07:55.459241Z ERROR sui_storage::object_store::util: Failed to read file from object store with error: Failed to get header
fullnode-1  | 
fullnode-1  | Caused by:
fullnode-1  |     Missing last modified
fullnode-1  | 2024-11-24T19:07:56.983884Z ERROR sui_storage::object_store::util: Failed to read file from object store with error: Failed to get header
fullnode-1  | 
fullnode-1  | Caused by:
fullnode-1  |     Missing last modified
fullnode-1  | 2024-11-24T19:07:57.777986Z  WARN sui_core::checkpoints::checkpoint_executor: Received no new synced checkpoints for 5s. Next checkpoint to be scheduled: 83347517
fullnode-1  | 2024-11-24T19:07:58.883054Z ERROR sui_storage::object_store::util: Failed to read file from object store with error: Failed to get header
fullnode-1  | 
fullnode-1  | Caused by:
fullnode-1  |     Missing last modified
fullnode-1  | 2024-11-24T19:08:01.805363Z ERROR sui_storage::object_store::util: Failed to read file from object store with error: Failed to get header
fullnode-1  | 
fullnode-1  | Caused by:
fullnode-1  |     Missing last modified

Am I doing something wrong?

Also, where would I find, for example, the mel-00. or ewr-00. regional subdomains for such nodes?

Duoquote added the doc-issue label Nov 24, 2024
johnjmartin (Contributor) commented Nov 25, 2024

@Duoquote what docs are you looking at for syncing a fullnode? Take a look at https://docs.sui.io/guides/operator/sui-full-node#setting-up-a-full-node for a list of state sync fullnodes you can use as seed peers.

We discourage peering directly with validators as a large number of fullnodes paired to a validator can impact performance.
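
In practice that means pointing seed-peers at the SSFN entries from that docs page rather than at validator addresses; the shape is the same as the config in the issue description, with placeholders here standing in for the real address/peer-id pairs listed on that page:

p2p-config:
  seed-peers:
    - address: /dns/<ssfn-hostname-from-docs>/udp/8084
      peer-id: <peer-id-from-docs>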

It looks like your actual underlying issue is with the archival fallback. Can you share your fullnode's config?
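
For reference, the archival fallback lives in fullnode.yaml under a state-archive-read-config section. A sketch along the lines of the operator guide (the bucket name, region, and credential placeholders are assumptions from memory of the docs; check the current docs for exact values):

state-archive-read-config:
  - object-store-config:
      object-store: "S3"
      bucket: "mysten-mainnet-archives"          # assumed mainnet archive bucket
      aws-access-key-id: "<your-access-key-id>"
      aws-secret-access-key: "<your-secret-access-key>"
      aws-region: "us-west-2"
      object-store-connection-limit: 20
    concurrency: 5
    use-for-pruning-watermark: false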

Duoquote (Author) commented

Sorry if I misinterpreted at first. Yes, I am currently using the peer configuration you mentioned; I put together that list from what I looked up on Suiscan. My full node is working, but I wanted faster, lower-latency sync, so I thought I could use whatever is close to my server; it turns out it doesn't work that way. To check my understanding of what you said: on the validator side they host additional nodes, and we use those instead of the validator node, right? What else can I do to improve latency? Also, I don't understand how multiple validators work in this scenario: we are all sending requests to the same RPC servers, so who is validating them? Is it distributed across all of them, or what?

johnjmartin (Contributor) commented

> To check my understanding of what you said: on the validator side they host additional nodes, and we use those instead of the validator node, right?

Yes, the mainnet State Sync Full Nodes (SSFNs) are run by validators, and you connect to the SSFNs instead of directly to the validators to reduce bandwidth constraints on the validators.

> What else can I do to improve latency?

The improvement offered by peering directly with a validator is actually quite small (~50 ms). The majority of the latency comes from the fact that full nodes execute and verify the checkpoints they receive via state sync. We don't have any existing options for large latency reductions; the general advice is to run with high-performance disks and CPUs.

> Also, I don't understand how multiple validators work in this scenario: we are all sending requests to the same RPC servers, so who is validating them? Is it distributed across all of them, or what?

Sorry, I don't understand what you're asking.

Duoquote (Author) commented

Thanks for the response. I think I am going to go with a high-performance server as you suggested. I also came across a custom indexer solution to use instead of polling the local RPC (it turns out JSON serialization/deserialization takes quite some time); I think it will work better than polling the local RPC? The other question was quite unrelated, no need to discuss how validation happens here. Thanks!

johnjmartin (Contributor) commented

> I also came across a custom indexer solution to use instead of polling the local RPC (it turns out JSON serialization/deserialization takes quite some time); I think it will work better than polling the local RPC?

I guess it depends on which latencies you're concerned about. If you create a custom indexer, your response latencies could be much lower than JSON-RPC (as you mentioned, serialization/deserialization takes time, and it's a more expressive data format). But the sync latencies will be slower: the custom indexer needs to read checkpoints from somewhere, and by default that's an S3/GCS bucket, which adds latency from reading the checkpoints and waiting for them to be written by the upstream.

Duoquote (Author) commented

> If you create a custom indexer, your response latencies could be much lower...

Since I can't cut down on network latency, that's the most I can do, I guess.

> But the sync latencies will be slower: the custom indexer needs to read checkpoints from somewhere, and by default that's an S3/GCS bucket, which adds latency from reading the checkpoints and waiting for them to be written by the upstream.

What do you mean? I enabled this setting on my full node so the indexer reads from my local node:

checkpoint-executor-config:
  checkpoint-execution-max-concurrency: 200
  local-execution-timeout-sec: 30
  data-ingestion-dir: /opt/sui/ingest

> by default that's an S3/GCS bucket, which adds latency from reading the checkpoints and waiting for them to be written by the upstream.

Does that apply to the full node as well? Does my full node sync from S3/GCS by default too? I thought the p2p config was what syncs checkpoints.

johnjmartin (Contributor) commented

> Does that apply to the full node as well? Does my full node sync from S3/GCS by default too? I thought the p2p config was what syncs checkpoints.

No, the full node does not sync from S3/GCS. Sorry, I was just describing the default way the custom indexer reads checkpoints. It looks like you're already planning to read them from a local full node's disk though 👍, so you won't have the additional latency of GCS/S3.

Duoquote (Author) commented

> No, the full node does not sync from S3/GCS.

Oh, is there a way I can fetch checkpoints directly instead of running a full node and syncing to disk?

Also, I tried this after the local reader failed:

use anyhow::Result;
use async_trait::async_trait;
use sui_data_ingestion_core::{setup_single_workflow, Worker};
use sui_types::full_checkpoint_content::CheckpointData;

struct CustomWorker;

#[async_trait]
impl Worker for CustomWorker {
    type Result = ();
    async fn process_checkpoint(&self, checkpoint: &CheckpointData) -> Result<()> {
        // custom processing logic: print the checkpoint summary
        println!(
            "Processing checkpoint: {}",
            checkpoint.checkpoint_summary.to_string()
        );
        Ok(())
    }
}

#[tokio::main]
async fn main() -> Result<()> {
    let (executor, _term_sender) = setup_single_workflow(
        CustomWorker,
        "https://checkpoints.mainnet.sui.io".to_string(),
        83728623, /* initial checkpoint number */
        5,        /* concurrency */
        None,     /* extra reader options */
    )
    .await?;
    executor.await?;
    Ok(())
}

I observed that it runs about 2 checkpoints behind my full node. I also tried the local reader, but it gives me an "Error: EOF while parsing a value at line 1 column 0" error, and I'm not sure why. I confirmed that the folder is correct; maybe it's because my node is not fully synced, since I started it from epoch 590.
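
Incidentally, "EOF while parsing a value at line 1 column 0" is what serde_json reports when given empty input, so an empty or truncated file (for example, a fresh progress file) is one plausible culprit. For the local reader itself, the documented setup in sui_data_ingestion_core looks roughly like the sketch below; this is reconstructed from the crate's docs and may not match your exact version, so treat the type names, the run signature, and the /tmp/progress path as assumptions to verify:

use anyhow::Result;
use async_trait::async_trait;
use prometheus::Registry;
use std::path::PathBuf;
use sui_data_ingestion_core::{
    DataIngestionMetrics, FileProgressStore, IndexerExecutor, ReaderOptions, Worker, WorkerPool,
};
use sui_types::full_checkpoint_content::CheckpointData;
use tokio::sync::oneshot;

struct CustomWorker;

#[async_trait]
impl Worker for CustomWorker {
    type Result = ();
    async fn process_checkpoint(&self, checkpoint: &CheckpointData) -> Result<()> {
        // Same processing logic as the remote-store example above.
        println!(
            "Processing checkpoint: {}",
            checkpoint.checkpoint_summary.to_string()
        );
        Ok(())
    }
}

#[tokio::main]
async fn main() -> Result<()> {
    // Progress (last processed checkpoint) is persisted to a local file between runs.
    let progress_store = FileProgressStore::new(PathBuf::from("/tmp/progress"));
    let metrics = DataIngestionMetrics::new(&Registry::new());
    let mut executor = IndexerExecutor::new(progress_store, 1 /* number of worker pools */, metrics);
    executor
        .register(WorkerPool::new(CustomWorker, "local_reader".to_string(), 5 /* concurrency */))
        .await?;
    let (_exit_sender, exit_receiver) = oneshot::channel();
    executor
        .run(
            PathBuf::from("/opt/sui/ingest"), // the fullnode's data-ingestion-dir
            None,                             // no remote store; read local files only
            vec![],                           // remote store options
            ReaderOptions::default(),
            exit_receiver,
        )
        .await?;
    Ok(())
}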

Also, thank you so much for taking the time!

johnjmartin (Contributor) commented

cc @phoenix-o on the custom indexer
