28 Commits

Author SHA1 Message Date
94db2c0a69 feat!: remove node exporter from monitoring stack (#9)
## What 
Remove node exporter from monitoring

## Why 
Redundant for this context 

## Checklist

<!-- Check your PR fulfills the following items. -->
<!-- For draft PRs check the boxes as you complete them. -->

- [x] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [x] Documentation comments have been added / updated.
2024-03-12 17:06:09 +02:00
26319af6d7 Merge pull request #8 from matter-labs/docs-envs
chore(docs): Better explain chain settings, minor tweaks
2024-03-08 09:24:14 +01:00
37bbe33ad7 minor tweaks 2024-03-08 09:18:55 +01:00
e776080ba4 typo fix 2024-03-08 09:17:00 +01:00
991e7ba587 typo fix 2024-03-08 09:16:24 +01:00
9f7f0bb6cd prettify dump links 2024-03-08 09:15:58 +01:00
ea06fe4c67 chore(docs): Better explain env settings, minor tweaks 2024-03-08 09:13:20 +01:00
254085bea2 Merge pull request #7 from matter-labs/pgtune-docs-fix
chore(docs): Fix duplicated pgtune block in README
2024-03-07 09:51:48 +01:00
b5cb15078b chore(docs): Fix duplicated pgtune block in README 2024-03-07 09:40:57 +01:00
46cd390d3d Merge pull request #6 from matter-labs/add_basic_auth
feat: add basic auth for external node
2024-03-07 09:36:13 +01:00
9fff909803 Update README.md 2024-03-07 09:35:33 +01:00
873119a54c Update README.md 2024-03-07 09:35:05 +01:00
3e46648470 Merge branch 'main' into add_basic_auth 2024-03-07 10:23:42 +02:00
86c976e375 chore(docs): README grammar fix (#5)
## What 

- Minor README grammar fix

## Checklist

<!-- Check your PR fulfills the following items. -->
<!-- For draft PRs check the boxes as you complete them. -->

- [x] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [x] Documentation comments have been added / updated.
2024-03-07 10:23:12 +02:00
831de81f98 update doc 2024-03-07 10:21:24 +02:00
7cd995dfb5 update doc 2024-03-07 10:20:27 +02:00
4f54d8a139 update doc 2024-03-07 10:12:20 +02:00
586fe6d4dc Update README.md 2024-03-07 09:03:52 +01:00
bb74dfc55f Update README.md 2024-03-07 09:03:29 +01:00
1114c9a4c6 update doc 2024-03-07 09:55:10 +02:00
950cbd4f7d update doc 2024-03-07 09:53:03 +02:00
e3f7fe3c87 update doc 2024-03-07 09:50:05 +02:00
118523076b feat: add basic auth for external node 2024-03-07 09:49:38 +02:00
9cebe8b7f4 Merge pull request #4 from matter-labs/traefik-metrics-readme-fix
feat!: Enable Traefik metrics, rename common label var, README fixes
2024-03-06 15:35:35 +01:00
7b0912c225 proper optional vars example usage 2024-03-06 15:34:13 +01:00
c113dbda19 typo fix 2024-03-06 15:30:40 +01:00
478c3f7ab9 feat!: Enable Traefik metrics, rename common label var, README fixes 2024-03-06 15:25:59 +01:00
41f491a0bd feat: move ssh password auth to dedicated task (#3)
## What 
Move task for disabling SSH password auth to dedicated task

## Why 
For more transparency

## Checklist

<!-- Check your PR fulfills the following items. -->
<!-- For draft PRs check the boxes as you complete them. -->

- [x] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [x] Documentation comments have been added / updated.
2024-03-06 15:29:36 +02:00
8 changed files with 101 additions and 99 deletions

README.md

@@ -1,12 +1,21 @@
# ansible-en-role
Ansible role for setup external node.
Ansible role to deploy and configure a zkSync Era External Node, including a PostgreSQL instance on the same machine, Traefik as a reverse proxy, and Prometheus monitoring (PostgreSQL exporter, cAdvisor, Traefik, External Node native metrics, and VictoriaMetrics vmagent to scrape all of them).
Make sure to configure a Prometheus remote write endpoint so that metrics are sent to your centralized metrics storage.
The role has been tested and used internally on bare-metal Hetzner instances.
## Requirements
This role has been tested on:
* Ubuntu 22.04, Jammy Jellyfish; Ansible 2.13.9
## Usage
This role contains variables which has to be set:
The minimal set of variables that have to be set:
```yaml
database_name: ""
database_username: ""
@@ -17,36 +26,39 @@ l1_chain_id: ""
l2_chain_id: ""
```
If you want to use monitoring, you can use next variables:
Please refer to [External Node docs](https://github.com/matter-labs/zksync-era/tree/main/docs/guides/external-node/prepared_configs) to find values for different zkSync Era chains.
If you want to use monitoring (which we highly recommend), you have to change these variables:
```yaml
# Monitoring options section
enable_monitoring: false
node_name: ""
prometheus_remote_write: false
prometheus_remote_write_url: ""
prometheus_remote_write_auth: false
prometheus_remote_write_auth_username: ""
prometheus_remote_write_auth_password: ""
prometheus_remote_write_label: ""
enable_monitoring: true
node_name: "some-unique-node-identifier"
prometheus_remote_write: true
prometheus_remote_write_url: "https://metrics.example.org"
prometheus_remote_write_auth: true
prometheus_remote_write_auth_username: "admin"
prometheus_remote_write_auth_password: "password"
prometheus_remote_write_common_label: "matterlabs"
```
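One detail in the monitoring example is worth checking. The role passes `prometheus_remote_write_common_label` straight to vmagent's `--remoteWrite.label` flag (see the vmagent template change further down). vmagent documents that flag as taking a `name=value` pair, so a bare value such as `"matterlabs"` may be rejected. Below is a minimal sketch of a value in the expected form. The label name `project` is a hypothetical choice.

```yaml
# vmagent's --remoteWrite.label expects "name=value".
# "project" is a hypothetical label name; use whatever your metrics storage expects.
prometheus_remote_write_common_label: "project=matterlabs"
```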
This role also has option to secure your server and allow traffic only from specified ip in case if you want
to use some load balancer in front of your node:
This role also has the option to secure your server and allow traffic only from a specified IP address, in case you want
to put a load balancer in front of your node and don't have cloud security groups at your disposal:
```yaml
# Security options
use_predefined_iptables: false
disable_ssh_password_auth: false
use_predefined_iptables: true
disable_ssh_password_auth: true
iptables_packages:
- iptables
- iptables-persistent
# Variable can be used in case with accept external traffic only from one ip
loadbalancer_ip: ""
# Variable used to accept external traffic only from a single specified IP
loadbalancer_ip: "1.2.3.4"
```
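If you need a similar restriction outside the predefined rules, note how Docker routes traffic. Ports that Docker publishes for containers go through the `FORWARD` chain, not `INPUT`. For that reason, Docker's documentation recommends filtering them in the `DOCKER-USER` chain. The tasks below are a hypothetical sketch using `ansible.builtin.iptables`, not the role's own firewall tasks. `lb_only_interface` is an assumed variable naming the public interface (for example `eth0`).

```yaml
# Hypothetical tasks, not part of this role. Docker typically ends DOCKER-USER
# with a RETURN rule, so both rules are inserted at the top of the chain.
- name: Drop external traffic to containers unless it comes from the load balancer
  ansible.builtin.iptables:
    chain: DOCKER-USER
    action: insert
    in_interface: "{{ lb_only_interface }}"   # assumed variable, e.g. "eth0"
    source: "!{{ loadbalancer_ip }}"          # leading "!" negates the source match
    jump: DROP
- name: Let replies to container-initiated connections through
  ansible.builtin.iptables:
    chain: DOCKER-USER
    action: insert   # inserted after the task above, so it lands above the DROP rule
    ctstate:
      - ESTABLISHED
      - RELATED
    jump: RETURN
```

The second rule matters for the external node itself. Without it, replies from the L1 RPC endpoint to connections the container opened would also be dropped.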
In some cases, you may need to change postgres parameters, so you can do it using `postgres_arguments` variable:
```yaml
In most cases, you'll want to tune PostgreSQL parameters, which you can do with the `postgres_arguments` variable, e.g.:
```yaml
postgres_arguments:
- log_error_verbosity=terse
- -c
@@ -54,49 +66,40 @@ postgres_arguments:
- -c
- shared_buffers=47616MB
- -c
- effective_cache_size=142848MB
- -c
- maintenance_work_mem=2GB
- -c
- checkpoint_completion_target=0.9
- -c
- wal_buffers=16MB
- -c
- default_statistics_target=500
- -c
- random_page_cost=1.1
- -c
- effective_io_concurrency=200
- -c
- work_mem=2573kB
- -c
- huge_pages=try
- -c
- min_wal_size=4GB
- -c
- max_wal_size=16GB
- -c
- max_worker_processes=74
- -c
- max_parallel_workers_per_gather=37
- -c
- max_parallel_workers=74
- -c
- max_parallel_maintenance_workers=4
- -c
- checkpoint_timeout=1800
```
We recommend to use [pgtune](https://github.com/le0pard/pgtune) to choose optimal config for your hardware.
We recommend using pgtune, either the [online](https://pgtune.leopard.in.ua/) or the [self-hosted](https://github.com/le0pard/pgtune) version, with the "Online transaction processing system" preset as a starting point for generating an optimal config for your hardware.
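As a sanity check on the example values: pgtune's usual heuristics set `shared_buffers` to a quarter of RAM and `effective_cache_size` to three quarters. The example matches those ratios for a host with 186 GB of RAM. That host size is our inference, not something the role documents, so regenerate the values for your own hardware.

```latex
\begin{aligned}
\text{RAM} &= 186\ \text{GB} = 186 \times 1024\ \text{MB} = 190464\ \text{MB} \\
\texttt{shared\_buffers} &= \tfrac{1}{4} \times 190464\ \text{MB} = 47616\ \text{MB} \\
\texttt{effective\_cache\_size} &= \tfrac{3}{4} \times 190464\ \text{MB} = 142848\ \text{MB}
\end{aligned}
```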
If you want to use basic auth for inbound requests, set the following variables:
```yaml
# Enable basic auth for external node
enable_basic_auth: true
basic_auth_secret: "htpasswd-generated-secret"
```
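The basic auth secret can be generated with `htpasswd` (from the `apache2-utils` package on Ubuntu). The output is piped through `sed`, which doubles every `$` so that Docker Compose does not treat the hash as variable interpolation:
```shell
echo $(htpasswd -nb <username> <password>) | sed -e s/\\$/\\$\\$/g
```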
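Combined, the basic auth variables might look like the sketch below. The username `rpcuser` and the hash parts are placeholders. The secret ends up in a Traefik label of the Docker Compose template, so it should keep its doubled `$$` signs. It is also worth encrypting with ansible-vault.

```yaml
# Placeholder values: replace with the output of the htpasswd | sed pipeline above.
enable_basic_auth: true
basic_auth_secret: "rpcuser:$$apr1$$<salt>$$<hash>"
```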
## Step-by-step guide
1. Install ansible collection on your machine from where you will run ansible:
1. Install the `community.general` Ansible collection on the machine from which you will run Ansible:
`ansible-galaxy collection install community.general`
2. Prepare latest database backup on your host. you can download it from our [public GCS bucket](https://storage.googleapis.com/zksync-era-mainnet-external-node-backups/external_node_latest.pgdump).
you should place it to `{{ storage_directory }}/pg_backups` directory. By default, `{{ storage_directory }}` is `/usr/src/en`
3. **OPTIONAL**: If you already have external-node, you can copy tree directory to new host. Copy external-node database tree to `{{ storage_directory }}/db`.
**Keep in mind, tree should be older than postgres database backup.**
4. Run ansible-playbook using this role. We recommend to encrypt next variables with ansible-vault or some another way:
2. Prepare the latest database backup on your host. You can download it from our public GCS buckets:
* [Era Mainnet latest dump](https://storage.googleapis.com/zksync-era-mainnet-external-node-backups/external_node_latest.pgdump)
* [Era Sepolia Testnet latest dump](https://storage.googleapis.com/zksync-era-boojnet-external-node-snapshots/external_node_latest.pgdump)
* [Era Goerli Testnet latest dump](https://storage.googleapis.com/zksync-era-testnet-external-node-backups/external_node_latest.pgdump)
Place the downloaded dump file into the `{{ storage_directory }}/pg_backups` directory (`/usr/src/en/pg_backups` by default).
3. **OPTIONAL**: If you already have a running node, you can copy its tree and state directories to the new host's `{{ storage_directory }}/db` (`/usr/src/en/db` by default).
**Keep in mind that the tree and state should be older than the PostgreSQL database backup.**
4. Run ansible-playbook using this role. We recommend encrypting the following variables with ansible-vault or another secrets management tool:
```
database_username
database_password
@@ -104,8 +107,8 @@ eth_l1_url
vm_auth_username
vm_auth_password
```
5. Connect to your host, and see status of postgres container. It can take a lot of time before postgres database backup will be restored
and postgres server will be ready for use. After postgres goes healty status, external-node runs automatically.
5. Connect to your host and check the status of the `postgres` container. Restoring the PostgreSQL database backup can take a long time (hours to days, depending on your disk throughput and IOPS), after which the PostgreSQL server will be ready for use. Once `postgres` becomes "healthy", `external_node` starts automatically.
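Instead of checking the container by hand, a follow-up task can poll its health status. This is a hypothetical sketch with two assumptions. First, it relies on the separate `community.docker` collection, which step 1 does not install. Second, it assumes the container is named `postgres`. Docker Compose may prefix the name with the project name, so use whatever `docker ps` shows.

```yaml
# Hypothetical post-deploy check, not part of this role.
- name: Wait until the postgres container reports "healthy"
  community.docker.docker_container_info:
    name: postgres            # assumed container name
  register: pg_container
  # "exists" guards against the container not having been created yet
  until: pg_container.exists and pg_container.container.State.Health.Status == "healthy"
  retries: 2880               # 2880 x 60 s = 48 h; restores can take hours to days
  delay: 60
```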
## Example Playbook
@@ -131,9 +134,9 @@ and postgres server will be ready for use. After postgres goes healty status, ex
## License
Ansible role for external node is distributed under the terms of either
The Ansible role for zkSync Era External Node is distributed under the terms of either
- Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or <http://www.apache.org/licenses/LICENSE-2.0>)
- MIT license ([LICENSE-MIT](LICENSE-MIT) or <https://opensource.org/blog/license/mit/>)
* Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or <http://www.apache.org/licenses/LICENSE-2.0>)
* MIT license ([LICENSE-MIT](LICENSE-MIT) or <https://opensource.org/blog/license/mit/>)
at your option.

@@ -13,7 +13,6 @@ traefik_version: 2.11
postgres_version: 14
external_node_version: 21.0.2
vmagent_version: 1.95.1
node_exporter_version: 1.7.0
cadvisor_version: 0.47.2
postgres_exporter_version: 0.15.0
@@ -57,12 +56,15 @@ postgres_arguments:
- -c
- checkpoint_timeout=1800
# Enable TLS for traefik
enable_tls: false
acme_email: ""
domain_name: ""
# Enable basic auth for external node
enable_basic_auth: false
basic_auth_secret: ""
# Force restore pg database
force_pg_restore: false
@@ -87,7 +89,7 @@ prometheus_remote_write_url: ""
prometheus_remote_write_auth: false
prometheus_remote_write_auth_username: ""
prometheus_remote_write_auth_password: ""
prometheus_remote_write_label: ""
prometheus_remote_write_common_label: ""
# Security options
use_predefined_iptables: false
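The defaults above keep TLS disabled. Turning it on presumably comes down to overriding the three TLS variables, for example in host or group vars. The e-mail address and domain below are placeholders. With the common ACME challenge types (HTTP-01, TLS-ALPN-01), the domain has to resolve to the node's public IP.

```yaml
# Hypothetical host_vars override; all values are placeholders.
enable_tls: true
acme_email: "ops@example.org"
domain_name: "en.example.org"
```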

@@ -65,16 +65,3 @@
ip_version: ipv6
state: saved
path: /etc/iptables/rules.v6
- name: Disable SSH password authentication
when: disable_ssh_password_auth
ansible.builtin.lineinfile:
path: /etc/ssh/sshd_config
regexp: '^#PasswordAuthentication yes'
line: 'PasswordAuthentication no'
- name: Restart ssh
when: disable_ssh_password_auth
ansible.builtin.service:
name: ssh
state: restarted

@@ -3,5 +3,9 @@
ansible.builtin.include_tasks: firewall.yml
when: use_predefined_iptables
- name: Disable SSH password auth
ansible.builtin.include_tasks: ssh-config.yml
when: disable_ssh_password_auth
- name: Prepare configs
ansible.builtin.include_tasks: provision.yml

tasks/ssh-config.yml

@@ -0,0 +1,11 @@
---
- name: Disable SSH password authentication
ansible.builtin.lineinfile:
path: /etc/ssh/sshd_config
regexp: '^#PasswordAuthentication yes'
line: 'PasswordAuthentication no'
- name: Restart ssh
ansible.builtin.service:
name: ssh
state: restarted
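The new task only rewrites the line when it is the commented-out default `#PasswordAuthentication yes`. If the directive is already set explicitly, `lineinfile` appends `PasswordAuthentication no` at the end of the file. sshd uses the first value it finds for each keyword, so that appended line may be ignored. The task also restarts ssh on every run. A more defensive variant might look like the sketch below, which is our suggestion rather than the committed file.

```yaml
---
# Hypothetical variant of tasks/ssh-config.yml, not the file committed above.
- name: Disable SSH password authentication
  ansible.builtin.lineinfile:
    path: /etc/ssh/sshd_config
    # Match the directive whether it is commented out or set to any value
    regexp: '^#?\s*PasswordAuthentication\s'
    line: 'PasswordAuthentication no'
    # Refuse to install a config that sshd itself rejects
    validate: '/usr/sbin/sshd -t -f %s'
  register: sshd_config_result
- name: Restart ssh
  ansible.builtin.service:
    name: ssh
    state: restarted
  # Only restart when the config actually changed
  when: sshd_config_result.changed
```

On Ubuntu 22.04, `sshd_config` includes `/etc/ssh/sshd_config.d/*.conf` near the top. A drop-in there that sets `PasswordAuthentication yes` (such as one written by cloud-init) would still take precedence, so check that directory too.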

@@ -16,6 +16,9 @@ services:
- "--certificatesresolvers.en_resolver.acme.storage=/letsencrypt/acme.json"
- "--certificatesresolvers.en_resolver.acme.email={{ acme_email }}"
{% endif %}
- "--metrics.prometheus=true"
- "--metrics.prometheus.entryPoint=metrics"
- "--entryPoints.metrics.address=:8080"
volumes:
- "/var/run/docker.sock:/var/run/docker.sock"
{% if enable_tls %}
@@ -66,6 +69,10 @@ services:
- "traefik.http.routers.external_node_health.rule=PathPrefix(`/`)"
- "traefik.http.routers.external_node_health.entrypoints=external_node_health"
- "traefik.http.routers.external_node_health.service=external_node_health"
{% if enable_basic_auth %}
- "traefik.http.routers.external_node_main.middlewares=external_node_auth"
- "traefik.http.middlewares.external_node_auth.basicauth.users={{ basic_auth_secret }}"
{% endif %}
expose:
- {{ rpc_http_port }}
- {{ rpc_ws_port }}

@@ -9,7 +9,7 @@ services:
command:
{% if prometheus_remote_write %}
- "--remoteWrite.url={{ prometheus_remote_write_url }}"
- "--remoteWrite.label={{ prometheus_remote_write_label }}"
- "--remoteWrite.label={{ prometheus_remote_write_common_label }}"
{% if prometheus_remote_write_auth %}
- "--remoteWrite.basicAuth.username={{ prometheus_remote_write_auth_username }}"
- "--remoteWrite.basicAuth.password={{ prometheus_remote_write_auth_password }}"
@@ -21,18 +21,6 @@ services:
- "--remoteWrite.vmProtoCompressLevel=2"
restart: always
node-exporter:
image: "prom/node-exporter:v{{ node_exporter_version }}"
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
restart: unless-stopped
command:
- '--path.procfs=/host/proc'
- '--path.sysfs=/host/sys'
- '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
cadvisor:
image: "gcr.io/cadvisor/cadvisor:v{{ cadvisor_version }}"
volumes:

@@ -14,14 +14,6 @@ scrape_configs:
- source_labels: [instance]
target_label: instance
replacement: '{{ node_name | mandatory }}'
- job_name: node-exporter
static_configs:
- targets:
- "node-exporter:9100"
relabel_configs:
- source_labels: [instance]
target_label: instance
replacement: '{{ node_name | mandatory }}'
- job_name: cadvisor
static_configs:
- targets:
@@ -38,3 +30,11 @@ scrape_configs:
- source_labels: [instance]
target_label: instance
replacement: '{{ node_name | mandatory }}'
- job_name: traefik
static_configs:
- targets:
- "traefik:8080"
relabel_configs:
- source_labels: [instance]
target_label: instance
replacement: '{{ node_name | mandatory }}'
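Every scrape job here repeats the same `relabel_configs` block to set `instance` to `node_name`. If that repetition becomes a burden, standard YAML anchors can share it. vmagent reads the file as ordinary YAML, so this should behave the same, though we have not tested it against this role. The `cadvisor:8080` target is an assumption (cAdvisor's default port), because the original target is cut off in the diff.

```yaml
scrape_configs:
  - job_name: traefik
    static_configs:
      - targets:
          - "traefik:8080"
    # Define the shared relabeling once and name it with an anchor
    relabel_configs: &instance_relabel
      - source_labels: [instance]
        target_label: instance
        replacement: '{{ node_name | mandatory }}'
  - job_name: cadvisor
    static_configs:
      - targets:
          - "cadvisor:8080"   # assumed target
    # Reuse the same list via an alias
    relabel_configs: *instance_relabel
```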