I worked on a wide range of development tasks during my time at a pair of Data Centres. I was also responsible for safely deploying code releases to backend/frontend systems.
From memory, these are some tasks I completed:
- Reworked the block storage integration with OpenStack/Ceph, so the more reliable RBD driver is used instead of iSCSI, which previously caused CPU lockups. I subsequently updated the filesystem formatter to use the new block storage system.
- Fixed inaccessible external block storage volumes above ~1TB. This was an unsigned 32-bit integer bug in Xen, which only surfaced when accessing LBA sectors in a specific storage mode.
- Diagnosed slow block storage and improved its performance 5x by adjusting kernel parameters on guests to take advantage of the Xen ring 0 buffer.
- Used ‘fwmark’ing to mark packets in the Linux kernel on firewalls, allowing customers to bypass throttling for backups.
- Fixed server image creation when the disk partitions were non-standard formats or layouts.
- Parallelised and sped up setting up new servers by 50%, by preventing unnecessary reboots, synchronisation pauses, etc.
- Worked with another colleague to get the Windows OOBE (Out-Of-Box-Experience) working correctly with new infrastructure.
- Got multiple guest snapshots working in parallel, and helped get more than one working on the same host, meaning guest servers can be transparently moved to another host even whilst customers are taking snapshots.
- Provided the ability to automatically install/reinstall software RAID monitoring so customers can get SMS/e-mail alerts on failures.
- Upgraded all staff and automated systems to use company-branded, mixed MIME type HTML e-mails, rather than just dated-looking plain text. You could write e-mails in Markdown format, with a live preview of conversions to HTML and plain text.
- Designed an automated mailing system to notify staff about routine/emergency maintenance tasks.
- Fixed the staff maintenance mode system, which allows you to put servers in maintenance mode in bulk.
- Provided a notes feature in the Django management interface, allowing you to place generic customer notes on any table/record in the database.
- Added generic exporting (CSV) for many management pages, for Excel charting.
- Ported/fixed many packages so they worked on the latest versions of Ubuntu/Debian Linux.
- Wrote systemd daemoniser wrappers.
- Wrote Nginx/Apache webserver configurations for different systems.
- Fixed Selenium tests to be compatible with modern browsers and newest versions of Debian.
- Built new images for Jenkins.
- Helped upgrade the entire codebase to Django 2.x and Python 3.x.
- Helped out with good practices on multi-threading/multi-processing.
- Used code introspection to automatically draw diagrams of most of the automated processes.
- Added a major feature missing in Zabbix 4.x, which the Zabbix community had been requesting for over 10 years: automatically suppressing alerts from all hosts assigned to a proxy when that proxy doesn’t report to a Zabbix server within a minute.
- Got the Zabbix agent working with customer servers in both Linux and Windows, securely communicating with the Zabbix proxies. Each server has a unique Pre-Shared Key, which can be reset if necessary. The servers can also only communicate with the proxies’ subnet.
- Created a multi-proxy balancing strategy that spreads load across the Data Centre when monitoring >4,500 hosts, while also ensuring that proxies monitor other proxies/host servers rather than their own.
- Improved the sync speed between the customer database and the Zabbix API from over 10 minutes to less than 2 minutes.
- Added the ability to silence alerting hosts immediately, rather than waiting for a full sync between the customer and Zabbix databases.
- Provided manual proxy configuration so servers outside the normal infrastructure can be monitored.
Note that the company was bought out shortly before I left, so a few of these may not be in production currently.
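
To illustrate the mixed MIME type e-mails mentioned in the list above, here is a minimal sketch using only Python's standard library. It is not the original code: the function name `build_mixed_email`, the addresses, and the message text are all hypothetical, and the Markdown conversion step is left out because it would need a third-party library. The sketch only shows how one message can carry both a plain text and an HTML version of the same content.

```python
from email.message import EmailMessage


def build_mixed_email(subject, sender, recipient, plain_text, html):
    """Build a multipart/alternative message with plain text and HTML bodies.

    Hypothetical helper for illustration only; in a real system the two
    bodies would be produced from one source (e.g. a Markdown document).
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # The plain text part goes first; clients that cannot render HTML use it.
    msg.set_content(plain_text)
    # Adding an alternative turns the message into multipart/alternative.
    msg.add_alternative(html, subtype="html")
    return msg


# Example values are made up for the sketch.
message = build_mixed_email(
    subject="Scheduled maintenance",
    sender="noreply@example.com",
    recipient="staff@example.com",
    plain_text="Maintenance starts at 22:00.",
    html="<p>Maintenance starts at <strong>22:00</strong>.</p>",
)
print(message.get_content_type())
```

Mail clients pick the richest part they can display, so the plain text version remains the fallback for anything that cannot render HTML.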