We are looking for an Incident Engineer who will be responsible for ensuring the reliable operation of our platform, working with metrics to improve the efficiency of production processes, and participating in the testing of new product versions.

Responsibilities:
- Monitoring the platform and its environment and detecting anomalies in system performance (using tools such as Zabbix, Grafana, Prometheus, and ELK).
- Investigating incidents related to software, networks, and equipment.
- Resolving failures in real time, independently where possible or together with the DevOps, development, and QA teams.
- Processing the data needed to issue resolutions, using ClickHouse.
- Configuring and optimizing monitoring, alerting, and logging tools.
- Automating and optimizing routine processes (using Python).

Skills and experience:
- Experience in support, maintenance, analytics, software administration, DevOps, or SRE.
- Experience using Python and SQL scripts for automation and optimization.
- Proficiency with Linux and networking technologies, including TCP/IP, routing, VPN, HTTP, and WebSocket.
- Experience with monitoring and logging tools such as Zabbix, Grafana, Prometheus, and ELK.
- English sufficient for reading technical documentation (B1-B2).

Nice to have:
- Experience with trading, algo-trading, or investment infrastructure.

We offer:
- Work at a modern international technology company without bureaucracy, legacy systems, or technical debt.
- Excellent opportunities for professional growth and self-fulfillment.
- Remote work from anywhere in the world.
- Compensation for health insurance, sports activities, and non-professional training.