XCP-ng Hypervisor
Virtual Host Project


Project Overview

This project involves building and operating a virtualized environment on XCP-ng, an open-source, enterprise-grade hypervisor based on Xen, to run multiple virtual machines (VMs) on a single physical host. It demonstrates my proficiency with virtualization technologies, which are crucial for efficient resource utilization, system isolation, and scalable IT infrastructure management in a Linux environment.

Hardware Configuration

  • Host Server Hardware: The host is a customized HP ProLiant DL380e Gen8 enterprise server. The hardware includes:

    • Processors: 2x Intel Xeon E5-2450L (8 cores, 16 threads each), giving 16 cores and 32 threads in total for VM hosting.

    • Memory: 96GB DDR3 ECC RAM, providing ample resources for running multiple virtual machines simultaneously.

    • Storage: A mix of HDDs and NVMe flash storage, configured for capacity, performance, and redundancy:

      • 7 x 1TB WD Blue SATA HDDs in RAID 6 as the main storage for VM disks (tolerates up to two simultaneous drive failures)

      • 1 x 1TB WD Blue SATA HDD for the hypervisor operating system

      • 1 x 2TB WD M.2 NVMe SSD on a PCIe expansion card for VMs that benefit from low-latency storage

    • Network Interfaces: A dual-port 10GbE SFP+ card for fast, low-latency transfers over fiber, plus a quad-port 1GbE card for the management interface and redundancy.
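
As a quick check on the storage layout above: RAID 6 spends two drives' worth of space on parity, so usable capacity is (drive count - 2) x drive size. The Python sketch below computes raw and usable capacity for the 7 x 1TB array. It is an arithmetic illustration only; it ignores filesystem overhead and the difference between decimal TB and binary TiB, so the space actually reported by the host will be somewhat lower.

```python
def raid6_usable_tb(drive_count: int, drive_size_tb: float) -> float:
    """Usable RAID 6 capacity: two drives' worth of space holds parity."""
    if drive_count < 4:
        raise ValueError("RAID 6 requires at least 4 drives")
    return (drive_count - 2) * drive_size_tb


def raid6_summary(drive_count: int, drive_size_tb: float) -> dict:
    """Raw vs. usable capacity (decimal TB, before filesystem overhead)."""
    raw = drive_count * drive_size_tb
    usable = raid6_usable_tb(drive_count, drive_size_tb)
    return {
        "raw_tb": raw,
        "usable_tb": usable,
        "parity_overhead_pct": round(100 * (raw - usable) / raw, 1),
        "max_failed_drives": 2,  # RAID 6 survives any two drive failures
    }


if __name__ == "__main__":
    # The VM storage array: 7 x 1TB drives.
    print(raid6_summary(7, 1.0))
```

For the 7-drive array this gives 7TB raw and 5TB usable, so roughly 28.6% of raw capacity goes to parity in exchange for surviving two failed drives.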

Software Configuration

  • Hypervisor: XCP-ng, chosen for its dual-processor hardware support, compatibility with XenServer, and strong community.

    • Management: Xen Orchestra (XO) provides a centralized web interface for monitoring, configuring, and maintaining VMs across the virtual environment.

Networking Configuration

  • Network Interfaces: Multiple interfaces are connected for redundancy and throughput. A single 10GbE fiber link carries the primary traffic, while the four 1GbE ports provide redundancy for both the VMs and the management interface.

  • Firewall and Security: A separate OPNsense firewall handles firewall rules, VPN access, and network segmentation, protecting the virtual environment from external threats.
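
The failover arrangement described above (10GbE fiber as the primary path, 1GbE ports as standby) behaves like an active-backup policy: traffic uses the best available link and moves to a standby only when it goes down. The Python sketch below is a simplified model of that selection logic, not XCP-ng's bonding implementation; the interface names are hypothetical, and "best" is assumed to mean fastest, with ties broken by list order.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Link:
    name: str         # hypothetical interface label
    speed_gbps: int
    up: bool


def select_active_link(links: list[Link]) -> Link | None:
    """Active-backup selection: carry traffic on the fastest link that is up.

    max() returns the first of several equal candidates, so the order of the
    1GbE ports in the list defines their failover priority.
    Returns None if every link is down.
    """
    candidates = [link for link in links if link.up]
    if not candidates:
        return None
    return max(candidates, key=lambda link: link.speed_gbps)


if __name__ == "__main__":
    links = [Link("10g-fiber", 10, True)] + [
        Link(f"1g-port{i}", 1, True) for i in range(1, 5)
    ]
    print(select_active_link(links).name)  # 10g-fiber
    links[0].up = False                    # simulate losing the fiber link
    print(select_active_link(links).name)  # 1g-port1
```

Keeping the slower ports in standby rather than load-balancing across mixed speeds means the 10GbE link is used whenever it is available, and the 1GbE ports only matter during a failure.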

Backup and Disaster Recovery

  • Snapshots: Xen Orchestra schedules three tiers of backups for all VMs: Disaster Recovery (DR) copies of every VM are created monthly, full backup snapshots are taken every two weeks, and delta backup snapshots run nightly on every night not already scheduled for another backup. This allows quick recovery after a failure or data corruption. The full and delta snapshots are sent to an NFS share on TrueNAS, while the DR VMs are sent to an iSCSI share to make full use of the bandwidth to TrueNAS.

  • Off-site Backup: An automated backup script on the TrueNAS server transfers DR VMs, VM snapshots, host server metadata, and other critical data to encrypted off-site storage, supporting business continuity.

  • Backup Testing: DR VMs are booted monthly to confirm that they start and run correctly in the environment. Restores from both full and delta snapshots are tested quarterly to confirm that the snapshots can be restored successfully.
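
The backup schedule above can be modeled as a rule that picks one job per night. The Python sketch below is a hypothetical illustration, not Xen Orchestra's scheduler. It assumes DR copies run on the 1st of each month, full backups run every 14 days counted from an arbitrary anchor date (FULL_ANCHOR), and a DR night takes precedence if it coincides with a full backup. Every other night gets a delta backup.

```python
from __future__ import annotations

from datetime import date, timedelta

# Assumptions for illustration only:
FULL_ANCHOR = date(2024, 1, 7)   # hypothetical night of the first full backup
FULL_INTERVAL_DAYS = 14          # full backup every two weeks
DR_DAY_OF_MONTH = 1              # DR copies assumed to run on the 1st


def job_for_night(night: date) -> str:
    """Return the single backup job assumed to run on a given night."""
    if night.day == DR_DAY_OF_MONTH:
        return "dr"      # the monthly DR copy wins any collision
    if (night - FULL_ANCHOR).days % FULL_INTERVAL_DAYS == 0:
        return "full"
    return "delta"       # every other night gets a delta backup


def schedule(start: date, nights: int) -> dict[date, str]:
    """Map each night in the window to its backup job."""
    return {
        start + timedelta(days=i): job_for_night(start + timedelta(days=i))
        for i in range(nights)
    }


if __name__ == "__main__":
    january = schedule(date(2024, 1, 1), 31)
    for job in ("dr", "full", "delta"):
        print(job, list(january.values()).count(job))
```

For January 2024 this yields 1 DR night, 2 full-backup nights (January 7 and 21), and 28 delta nights. Counting days from an anchor keeps full backups on a strict 14-day cycle rather than "twice a month"; with this example anchor, September 1, 2024 is both a full-backup night and a DR night, which is the kind of collision a precedence rule or a schedule adjustment has to resolve.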

Use Cases

  • Testing and Development: The virtual environment serves as a sandbox for testing new applications, updates, and security patches before deployment in a production environment.

  • Web Hosting: Several VMs host web applications, some of which run as multi-container Docker deployments, demonstrating the ability to run a full-stack web environment.

  • Monitoring and Automation: Dedicated VMs collect and centralize monitoring data from across the environment and automate the configuration of new VMs.

Challenges and Solutions

  • Challenge: Initially, network performance problems caused VM backups to fail or take days to finish transferring to TrueNAS.

    • Solution: Rescheduled the backups so that each job runs in its own time window without overlapping the others, and configured the iSCSI share to handle the bandwidth needed to create multiple disaster recovery VMs.

  • Challenge: Managing storage I/O contention between multiple VMs.

    • Solution: Installed a PCIe expansion card with an NVMe drive and moved critical, I/O-intensive VMs onto it, improving their performance and reducing contention on the HDD array.

  • Challenge: Managing memory and CPU resources between multiple VMs.

    • Solution: Audited each VM's memory and CPU usage and reduced over-provisioned allocations, freeing capacity for additional VMs.
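
The resource audit described above amounts to comparing each VM's allocation with its observed peak usage. The Python sketch below is a hypothetical right-sizing rule, not a record of the audit actually performed. It assumes peak CPU (as a percentage of the allocated vCPUs) and peak memory have already been collected, for example by the monitoring VMs, and it suggests the peak plus 25% headroom (an assumed value), rounded up and never above the current allocation. The VM names and figures are invented for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Assumption: keep 25% headroom above the observed peak.
HEADROOM = 1.25


@dataclass
class VmUsage:
    name: str              # hypothetical VM name
    vcpus: int             # currently allocated vCPUs
    memory_gb: int         # currently allocated memory (GB)
    peak_cpu_pct: float    # observed peak CPU, as % of the allocated vCPUs
    peak_memory_gb: float  # observed peak memory in use (GB)


def right_size(vm: VmUsage) -> tuple[int, int]:
    """Suggest (vCPUs, memory GB): observed peak plus headroom, rounded up,
    at least 1, and never more than the current allocation."""
    cpu = math.ceil(vm.vcpus * vm.peak_cpu_pct / 100 * HEADROOM)
    mem = math.ceil(vm.peak_memory_gb * HEADROOM)
    return max(1, min(vm.vcpus, cpu)), max(1, min(vm.memory_gb, mem))


def reclaimable(vms: list[VmUsage]) -> tuple[int, int]:
    """Total vCPUs and memory (GB) freed if every VM is right-sized."""
    freed_cpu = freed_mem = 0
    for vm in vms:
        cpu, mem = right_size(vm)
        freed_cpu += vm.vcpus - cpu
        freed_mem += vm.memory_gb - mem
    return freed_cpu, freed_mem


if __name__ == "__main__":
    vms = [
        VmUsage("web01", vcpus=8, memory_gb=16, peak_cpu_pct=35.0, peak_memory_gb=5.0),
        VmUsage("db01", vcpus=4, memory_gb=8, peak_cpu_pct=90.0, peak_memory_gb=7.5),
    ]
    for vm in vms:
        print(vm.name, right_size(vm))
    print("freed (vCPUs, GB):", reclaimable(vms))
```

With these example figures, web01 drops from 8 vCPUs and 16GB to 4 vCPUs and 7GB, while db01 is already near its peak and keeps its allocation, so 4 vCPUs and 9GB become available for new VMs. Capping suggestions at the current allocation keeps the rule strictly about reclaiming capacity; VMs that need more resources still have to be identified separately.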

Key Takeaways

This project gave me hands-on experience with enterprise-level virtualization technologies, specifically XCP-ng and Xen Orchestra. It demonstrates my ability to design, implement, and manage virtualized IT infrastructure, a skill crucial for efficient resource management and scalability in a Linux system administrator role.