From 478c3f7ab9bcc5a6394e9e1e73b0eb264f6e5118 Mon Sep 17 00:00:00 2001
From: hatemosphere
Date: Wed, 6 Mar 2024 15:25:59 +0100
Subject: [PATCH 1/3] feat!: Enable Traefik metrics, rename common label var, README fixes

---
 README.md                        | 81 +++++++++++++-------------------
 defaults/main.yml                |  3 +-
 templates/docker-compose.yaml.j2 |  3 ++
 templates/monitoring.yaml.j2     |  2 +-
 templates/vmagent-config.yml.j2  |  8 ++++
 5 files changed, 46 insertions(+), 51 deletions(-)

diff --git a/README.md b/README.md
index b003593..a1177a5 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,21 @@
 # ansible-en-role
-Ansible role for setup external node.
+
+Ansible role to deploy and configure zkSync Era External Node, including DB instance setup on the same machine, Traefik as reverse proxy, and Prometheus monitoring (PostgreSQL exporter, Node exporter, cAdvisor, Traefik, External Node native metrics, and VictoriaMetrics vmagent to scrape all of them).
+
+Make sure to configure Prometheus remote write endpoint to send metrics to centralized metrics storage.
+
+Role has been tested and used internally on bare metal Hetzner instances.

 ## Requirements
+
 This role has been tested on:
+
 * Ubuntu 22.04, Jammy Jellyfish; Ansible 2.13.9

 ## Usage
+
 This role contains variables which has to be set:
+
 ```yaml
 database_name: ""
 database_username: ""
@@ -17,7 +26,8 @@ l1_chain_id: ""
 l2_chain_id: ""
 ```

-If you want to use monitoring, you can use next variables:
+If you want to use monitoring (which we highly recommend), you have to change these variables:
+
 ```yaml
 # Monitoring options section
 enable_monitoring: false
@@ -27,11 +37,11 @@ prometheus_remote_write_url: ""
 prometheus_remote_write_auth: false
 prometheus_remote_write_auth_username: ""
 prometheus_remote_write_auth_password: ""
-prometheus_remote_write_label: ""
+prometheus_remote_write_common_label: ""
 ```

-This role also has option to secure your server and allow traffic only from specified ip in case if you want
-to use some load balancer in front of your node:
+This role also has option to secure your server and allow traffic only from specified IP address in case if you want
+to use some load balancer in front of your node, while not having fancy cloud security groups at your disposal:

 ```yaml
 # Security options
@@ -40,13 +50,13 @@ disable_ssh_password_auth: false
 iptables_packages:
   - iptables
   - iptables-persistent
-# Variable can be used in case with accept external traffic only from one ip
+# Variable to be used to accept external traffic only from single specified IP
 loadbalancer_ip: ""
 ```

-In some cases, you may need to change postgres parameters, so you can do it using `postgres_arguments` variable:
-```yaml
+In most of cases, you'd want to change PostgreSQL parameters (we recommend to use with "Online transaction processing system" preset as sane defaults), so you can do it using `postgres_arguments` variable, eg:
+```yaml
 postgres_arguments:
   - log_error_verbosity=terse
   - -c
@@ -54,49 +64,24 @@ postgres_arguments:
   - -c
   - shared_buffers=47616MB
   - -c
-  - effective_cache_size=142848MB
-  - -c
-  - maintenance_work_mem=2GB
-  - -c
-  - checkpoint_completion_target=0.9
-  - -c
-  - wal_buffers=16MB
-  - -c
-  - default_statistics_target=500
-  - -c
-  - random_page_cost=1.1
-  - -c
-  - effective_io_concurrency=200
-  - -c
-  - work_mem=2573kB
-  - -c
-  - huge_pages=try
-  - -c
-  - min_wal_size=4GB
-  - -c
-  - max_wal_size=16GB
-  - -c
-  - max_worker_processes=74
-  - -c
-  - max_parallel_workers_per_gather=37
-  - -c
-  - max_parallel_workers=74
-  - -c
-  - max_parallel_maintenance_workers=4
-  - -c
-  - checkpoint_timeout=1800
 ```
-We recommend to use [pgtune](https://github.com/le0pard/pgtune) to choose optimal config for your hardware.
+
+We recommend using [online] or [self-hosted](https://github.com/le0pard/pgtune) version with with "Online transaction processing system" preset as a good starting point for generating optimal config for your hardware.

 ## Step-by-step guide

 1. Install ansible collection on your machine from where you will run ansible: `ansible-galaxy collection install community.general`
+
 2. Prepare latest database backup on your host. you can download it from our [public GCS bucket](https://storage.googleapis.com/zksync-era-mainnet-external-node-backups/external_node_latest.pgdump).
-you should place it to `{{ storage_directory }}/pg_backups` directory. By default, `{{ storage_directory }}` is `/usr/src/en`
-3. **OPTIONAL**: If you already have external-node, you can copy tree directory to new host. Copy external-node database tree to `{{ storage_directory }}/db`.
-**Keep in mind, tree should be older than postgres database backup.**
+you should place it to `{{ storage_directory }}/pg_backups` directory. By default, `{{ storage_directory }}` is `/usr/src/en`
+
+3. **OPTIONAL**: If you already have external-node, you can copy tree directory to new host. Copy external-node database tree to `{{ storage_directory }}/db`.
+
+**Keep in mind, tree should be older than PostgreSQL database backup.**
+
 4. Run ansible-playbook using this role.
 We recommend to encrypt next variables with ansible-vault or some another way:
+
 ```
 database_username
 database_password
@@ -104,8 +89,8 @@ eth_l1_url
 vm_auth_username
 vm_auth_password
 ```
-5. Connect to your host, and see status of postgres container. It can take a lot of time before postgres database backup will be restored
-and postgres server will be ready for use. After postgres goes healty status, external-node runs automatically.
+
+5. Connect to your host, and see status of `postgres` container. It can take a lot of time before PostgreSQL database backup will be restored (hours to days, depending on your disk throughput and IOPS), after which PostgreSQL server will be ready for use. Once `postgres` becomes "healthy", `external_node` runs automatically.

 ## Example Playbook

@@ -131,9 +116,9 @@ and postgres server will be ready for use. After postgres goes healty status, ex

 ## License

-Ansible role for external node is distributed under the terms of either
+Ansible role for zkSync Era External Node is distributed under the terms of either

-- Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or )
-- MIT license ([LICENSE-MIT](LICENSE-MIT) or )
+* Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or )
+* MIT license ([LICENSE-MIT](LICENSE-MIT) or )

 at your option.
diff --git a/defaults/main.yml b/defaults/main.yml
index c18fe9d..ad381f9 100644
--- a/defaults/main.yml
+++ b/defaults/main.yml
@@ -57,7 +57,6 @@ postgres_arguments:
   - -c
   - checkpoint_timeout=1800

-
 # Enable TLS for traefik
 enable_tls: false
 acme_email: ""
@@ -87,7 +86,7 @@ prometheus_remote_write_url: ""
 prometheus_remote_write_auth: false
 prometheus_remote_write_auth_username: ""
 prometheus_remote_write_auth_password: ""
-prometheus_remote_write_label: ""
+prometheus_remote_write_common_label: ""

 # Security options
 use_predefined_iptables: false
diff --git a/templates/docker-compose.yaml.j2 b/templates/docker-compose.yaml.j2
index 4fab4db..cd9a9c4 100644
--- a/templates/docker-compose.yaml.j2
+++ b/templates/docker-compose.yaml.j2
@@ -16,6 +16,9 @@ services:
       - "--certificatesresolvers.en_resolver.acme.storage=/letsencrypt/acme.json"
       - "--certificatesresolvers.myresolver.acme.email={{ acme_email }}"
 {% endif %}
+      - "--metrics.prometheus=true"
+      - "--metrics.prometheus.entryPoint=metrics"
+      - "--entryPoints.metrics.address=:8080"
     volumes:
       - "/var/run/docker.sock:/var/run/docker.sock"
 {% if enable_tls %}
diff --git a/templates/monitoring.yaml.j2 b/templates/monitoring.yaml.j2
index 608b3ea..836986f 100644
--- a/templates/monitoring.yaml.j2
+++ b/templates/monitoring.yaml.j2
@@ -9,7 +9,7 @@ services:
     command:
 {% if prometheus_remote_write %}
       - "--remoteWrite.url={{ prometheus_remote_write_url }}"
-      - "--remoteWrite.label={{ prometheus_remote_write_label }}"
+      - "--remoteWrite.label={{ prometheus_remote_write_common_label }}"
 {% if prometheus_remote_write_auth %}
       - "--remoteWrite.basicAuth.username={{ prometheus_remote_write_auth_username }}"
       - "--remoteWrite.basicAuth.password={{ prometheus_remote_write_auth_password }}"
diff --git a/templates/vmagent-config.yml.j2 b/templates/vmagent-config.yml.j2
index ad4966d..31781eb 100644
--- a/templates/vmagent-config.yml.j2
+++ b/templates/vmagent-config.yml.j2
@@ -38,3 +38,11 @@ scrape_configs:
       - source_labels: [instance]
         target_label: instance
         replacement: '{{ node_name | mandatory }}'
+  - job_name: traefik
+    static_configs:
+      - targets:
+          - "traefik:8080"
+    relabel_configs:
+      - source_labels: [instance]
+        target_label: instance
+        replacement: '{{ node_name | mandatory }}'

From c113dbda19aee9f02c8bbf749efc5e8a7bae4b1b Mon Sep 17 00:00:00 2001
From: hatemosphere
Date: Wed, 6 Mar 2024 15:30:40 +0100
Subject: [PATCH 2/3] typo fix

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index a1177a5..5de0514 100644
--- a/README.md
+++ b/README.md
@@ -66,7 +66,7 @@ postgres_arguments:
   - -c
 ```

-We recommend using [online] or [self-hosted](https://github.com/le0pard/pgtune) version with with "Online transaction processing system" preset as a good starting point for generating optimal config for your hardware.
+We recommend using pgtune [online] or [self-hosted](https://github.com/le0pard/pgtune) version with "Online transaction processing system" preset as a good starting point for generating optimal config for your hardware.

 ## Step-by-step guide


From 7b0912c225e41949b4fba31a0d7fe2874d73cd13 Mon Sep 17 00:00:00 2001
From: hatemosphere
Date: Wed, 6 Mar 2024 15:34:13 +0100
Subject: [PATCH 3/3] proper optional vars example usage

---
 README.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index 5de0514..be5e312 100644
--- a/README.md
+++ b/README.md
@@ -30,14 +30,14 @@ If you want to use monitoring (which we highly recommend), you have to change th

 ```yaml
 # Monitoring options section
-enable_monitoring: false
-node_name: ""
-prometheus_remote_write: false
-prometheus_remote_write_url: ""
-prometheus_remote_write_auth: false
-prometheus_remote_write_auth_username: ""
-prometheus_remote_write_auth_password: ""
-prometheus_remote_write_common_label: ""
+enable_monitoring: true
+node_name: "some-unique-node-identifier"
+prometheus_remote_write: true
+prometheus_remote_write_url: "https://metrics.example.org"
+prometheus_remote_write_auth: true
+prometheus_remote_write_auth_username: "admin"
+prometheus_remote_write_auth_password: "password"
+prometheus_remote_write_common_label: "matterlabs"
 ```

 This role also has option to secure your server and allow traffic only from specified IP address in case if you want
@@ -45,13 +45,13 @@ to use some load balancer in front of your node, while not having fancy cloud se

 ```yaml
 # Security options
-use_predefined_iptables: false
-disable_ssh_password_auth: false
+use_predefined_iptables: true
+disable_ssh_password_auth: true
 iptables_packages:
   - iptables
   - iptables-persistent
 # Variable to be used to accept external traffic only from single specified IP
-loadbalancer_ip: ""
+loadbalancer_ip: "1.2.3.4"
 ```

 In most of cases, you'd want to change PostgreSQL parameters (we recommend to use with "Online transaction processing system" preset as sane defaults), so you can do it using `postgres_arguments` variable, eg: