Advanced VM Lifecycle Management¶
This document describes the advanced VM lifecycle features in VirtRigaud, including reconfiguration, snapshots, cloning, multi-VM sets, and placement policies.
Overview¶
VirtRigaud Stage E introduces comprehensive VM lifecycle management capabilities that go beyond basic create/delete operations:
- VM Reconfiguration: Modify CPU, memory, and disk resources of running VMs
- Snapshot Management: Create, delete, and revert VM snapshots
- VM Cloning: Create new VMs from existing ones with linked clone support
- Multi-VM Sets: Manage groups of VMs with rolling updates
- Placement Policies: Advanced placement rules and anti-affinity constraints
- Image Preparation: Automated image import and preparation workflows
VM Reconfiguration¶
Online vs Offline Reconfiguration¶
VirtRigaud supports both online (hot) and offline reconfiguration depending on provider capabilities:
- vSphere: Supports online CPU/memory changes and hot disk expansion
- Libvirt: Typically requires a power cycle for resource changes
Example: CPU/Memory Upgrade¶
# Original VM with 2 CPU, 4GB RAM
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
  name: web-server
spec:
  resources:
    cpu: 2
    memoryMiB: 4096
# Patch to upgrade resources
# kubectl patch vm web-server --type merge -p '{"spec":{"resources":{"cpu":4,"memoryMiB":8192}}}'
The controller will:
1. Detect resource changes in VM spec
2. Attempt online reconfiguration if supported
3. If offline required, orchestrate graceful power cycle:
   - Set condition ReconfigurePendingPowerCycle=True
   - Power off VM gracefully
   - Apply reconfiguration
   - Power on VM
   - Update status.lastReconfigureTime
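The decision flow above can be sketched in a few lines of Python. This is an illustration only, not the controller's actual code: the `Capabilities` class, the `plan_reconfigure` function, and the step strings are hypothetical names, and the single-step online path is an assumption (the source only spells out the offline sequence).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    # Hypothetical mirror of the provider's supportsReconfigureOnline flag.
    supports_reconfigure_online: bool


def plan_reconfigure(current: dict, desired: dict, caps: Capabilities) -> list[str]:
    """Return the ordered steps needed to move a VM to the desired resources."""
    if current == desired:
        return []  # No resource change detected in the spec.
    if caps.supports_reconfigure_online:
        # Assumption: the online path is a single hot-apply step.
        return ["apply-reconfiguration-online"]
    # Offline path: graceful power cycle around the change.
    return [
        "set-condition ReconfigurePendingPowerCycle=True",
        "power-off-gracefully",
        "apply-reconfiguration",
        "power-on",
        "update-lastReconfigureTime",
    ]


if __name__ == "__main__":
    old = {"cpu": 2, "memoryMiB": 4096}
    new = {"cpu": 4, "memoryMiB": 8192}
    for step in plan_reconfigure(old, new, Capabilities(False)):
        print(step)
```

Returning a plan instead of executing it keeps the sketch free of side effects, so the ordering can be checked without a provider.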
Disk Expansion¶
spec:
  disks:
  - name: data
    sizeGiB: 100 # Expanded from 50 GiB
    expandPolicy: "Online" # Try online first
Snapshot Management¶
Creating Snapshots¶
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMSnapshot
metadata:
  name: pre-maintenance-backup
spec:
  vmRef:
    name: web-server
  nameHint: "maintenance-backup"
  memory: true # Include memory state
  description: "Backup before maintenance"
  retentionPolicy:
    maxAge: "7d"
    deleteOnVMDelete: true
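To make the effect of retentionPolicy.maxAge concrete, here is a minimal sketch of how an age-based expiry check could work. The helper names `parse_max_age` and `is_expired` are hypothetical, and the supported unit suffixes (s, m, h, d) are an assumption; the source only shows the value "7d".

```python
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes; only "d" appears in the example manifest.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_max_age(value: str) -> timedelta:
    """Parse a duration such as "7d" or "12h" into a timedelta."""
    if len(value) < 2:
        raise ValueError(f"unsupported duration: {value!r}")
    number, unit = value[:-1], value[-1]
    if unit not in _UNITS or not number.isdecimal():
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(**{_UNITS[unit]: int(number)})


def is_expired(created_at: datetime, max_age: str, now: datetime) -> bool:
    """A snapshot is expired once it is older than max_age."""
    return now - created_at > parse_max_age(max_age)


if __name__ == "__main__":
    created = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(is_expired(created, "7d", datetime(2024, 1, 10, tzinfo=timezone.utc)))
```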
Snapshot Lifecycle¶
- Creating: Snapshot creation in progress
- Ready: Snapshot available for use
- Deleting: Snapshot being removed
- Failed: Snapshot operation failed
Reverting to Snapshots¶
The controller will:
1. Power off VM if running
2. Call provider's SnapshotRevert RPC
3. Power on VM
4. Clear revertToRef when complete
VM Cloning¶
Basic Cloning¶
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClone
metadata:
  name: web-server-clone
spec:
  sourceRef:
    name: web-server
  target:
    name: web-server-test
    classRef:
      name: test-class
  linked: true # Faster, space-efficient
  powerOn: true
Clone Customization¶
spec:
  customization:
    hostname: web-server-test
    networks:
    - name: primary
      ipAddress: "192.168.1.100"
      gateway: "192.168.1.1"
      dns: ["8.8.8.8"]
    userData:
      cloudInit:
        inline: |
          #cloud-config
          runcmd:
          - echo "Test environment" > /etc/motd
Multi-VM Sets (VMSet)¶
VMSets provide declarative management of multiple VMs with rolling updates.
Basic VMSet¶
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMSet
metadata:
  name: web-tier
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-server
  template:
    metadata:
      labels:
        app: web-server
    spec:
      providerRef:
        name: vsphere-prod
      classRef:
        name: web-class
      imageRef:
        name: nginx-image
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
Rolling Updates¶
When you update the template spec, VMSet will:
1. Create new VMs with updated configuration
2. Wait for new VMs to be ready
3. Delete old VMs respecting maxUnavailable
4. Continue until all replicas are updated
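The following simulation illustrates how maxSurge and maxUnavailable could bound each round of a rollout. It is a sketch under stated assumptions, not VMSet's implementation: it borrows the Kubernetes Deployment semantics (total VMs never exceed replicas + maxSurge, ready VMs never drop below replicas - maxUnavailable) and assumes every new VM becomes ready within the round in which it is created. The function name `rolling_update_steps` is hypothetical.

```python
def rolling_update_steps(
    replicas: int, max_surge: int, max_unavailable: int
) -> list[tuple[int, int]]:
    """Simulate a rollout; return (old, new) VM counts after each round."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    history = [(old, new)]
    while old > 0 or new < replicas:
        # Create new VMs without exceeding replicas + maxSurge in total.
        create = min(replicas - new, replicas + max_surge - (old + new))
        new += create
        # New VMs are assumed ready; delete old VMs while keeping at
        # least replicas - maxUnavailable VMs in service.
        delete = min(old, max(0, (old + new) - (replicas - max_unavailable)))
        old -= delete
        history.append((old, new))
    return history


if __name__ == "__main__":
    # Values from the web-tier example: 3 replicas, maxSurge 1, maxUnavailable 1.
    for old, new in rolling_update_steps(3, 1, 1):
        print(f"old={old} new={new}")
```

With the web-tier values the rollout finishes in two rounds; setting maxSurge to 0 trades speed for never running more than 3 VMs at once.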
Placement Policies¶
Advanced Placement Rules¶
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMPlacementPolicy
metadata:
  name: production-policy
spec:
  hard:
    clusters: ["prod-cluster-1", "prod-cluster-2"]
    datastores: ["ssd-datastore-1", "ssd-datastore-2"]
    hosts: ["esxi-01", "esxi-02", "esxi-03"]
  soft:
    folders: ["/Production/WebServers"]
    zones: ["zone-a", "zone-b"]
  antiAffinity:
    hostAntiAffinity: true # Spread across hosts
    clusterAntiAffinity: false
    datastoreAntiAffinity: true # Spread across datastores
Using Placement Policies¶
The provider will attempt to satisfy:
1. Hard constraints: Must be satisfied
2. Soft constraints: Best effort
3. Anti-affinity rules: Avoid co-location
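One way to picture this ordering is a filter-then-rank pass: hard constraints remove candidates, while soft constraints and anti-affinity only influence the ranking. The sketch below is an assumption about how such logic might look, not the provider's algorithm. `Candidate`, `choose_placement`, and the scoring weights (anti-affinity outweighs a soft zone match) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    host: str
    cluster: str
    datastore: str
    zone: str


def choose_placement(candidates, hard, soft_zones, used_hosts) -> Optional[Candidate]:
    """Pick a placement: hard rules filter, soft rules and anti-affinity rank."""
    # Hard constraints: drop every candidate that violates one.
    allowed = [
        c for c in candidates
        if c.cluster in hard["clusters"]
        and c.datastore in hard["datastores"]
        and c.host in hard["hosts"]
    ]
    if not allowed:
        return None  # Hard constraints cannot be satisfied.

    def score(c: Candidate) -> int:
        spread = 2 if c.host not in used_hosts else 0  # host anti-affinity
        preferred = 1 if c.zone in soft_zones else 0   # soft constraint
        return spread + preferred

    # max() returns the first best-scoring candidate, so ties keep input order.
    return max(allowed, key=score)


if __name__ == "__main__":
    hard = {
        "clusters": ["prod-cluster-1"],
        "datastores": ["ssd-datastore-1"],
        "hosts": ["esxi-01", "esxi-02"],
    }
    pool = [
        Candidate("esxi-01", "prod-cluster-1", "ssd-datastore-1", "zone-a"),
        Candidate("esxi-02", "prod-cluster-1", "ssd-datastore-1", "zone-c"),
    ]
    print(choose_placement(pool, hard, {"zone-a"}, used_hosts={"esxi-01"}))
```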
Image Preparation¶
Automated Image Import¶
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMImage
metadata:
  name: ubuntu-22-04
spec:
  vsphere:
    ovaURL: "https://releases.ubuntu.com/22.04/ubuntu-22.04-server.ova"
    checksum: "sha256:abcd1234..."
  libvirt:
    url: "https://cloud-images.ubuntu.com/22.04/ubuntu-22.04-server.img"
    format: "qcow2"
  prepare:
    onMissing: "Import" # Auto-import if missing
    validateChecksum: true
    timeout: "30m"
    retries: 3
    storage:
      vsphere:
        datastore: "images-datastore"
        folder: "/Templates"
        thinProvisioned: true
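The validateChecksum and retries settings suggest a download-verify-retry loop. A minimal sketch of that idea follows, using only Python's standard hashlib. The function names, the restriction to sha256 and sha512, and the reading of the retry count as a total number of attempts are all assumptions; the source does not specify any of them.

```python
import hashlib

# Assumed set of accepted algorithms; the example manifest only shows sha256.
_ALLOWED = {"sha256", "sha512"}


def verify_checksum(data: bytes, expected: str) -> bool:
    """Check data against an "<algorithm>:<hex digest>" string."""
    algorithm, _, digest = expected.partition(":")
    if algorithm not in _ALLOWED or not digest:
        raise ValueError(f"unsupported checksum: {expected!r}")
    return hashlib.new(algorithm, data).hexdigest() == digest.lower()


def import_with_retries(fetch, expected: str, attempts: int = 3) -> bytes:
    """Call fetch() until the checksum matches or attempts are exhausted."""
    for _ in range(attempts):
        data = fetch()
        if verify_checksum(data, expected):
            return data
    raise RuntimeError(f"checksum mismatch after {attempts} attempts")


if __name__ == "__main__":
    image = b"example image bytes"
    expected = "sha256:" + hashlib.sha256(image).hexdigest()
    print(len(import_with_retries(lambda: image, expected)))
```

Passing the download step in as a `fetch` callable keeps the sketch independent of any particular HTTP client.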
Image Preparation Phases¶
- Pending: Waiting to start preparation
- Importing: Downloading/importing image
- Preparing: Processing image (conversion, etc.)
- Ready: Image ready for use
- Failed: Preparation failed
Provider Capabilities¶
Different providers support different features. Query capabilities:
# Example capabilities response
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
status:
  capabilities:
    supportsReconfigureOnline: true # vSphere: true, Libvirt: false
    supportsDiskExpansionOnline: true # vSphere: true, Libvirt: false
    supportsSnapshots: true # Both: true
    supportsMemorySnapshots: true # vSphere: true, Libvirt: varies
    supportsLinkedClones: true # Both: true
    supportsImageImport: true # Both: true
    supportedDiskTypes: ["thin", "thick"]
    supportedNetworkTypes: ["VMXNET3", "E1000"]
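A client could use these flags to reject a request before it reaches the provider. The sketch below assumes the capabilities block has already been loaded into a plain dict. The feature names on the left of `REQUIRED_CAPABILITY` and the function `missing_capabilities` are hypothetical; only the `supports...` keys come from the example above.

```python
# Hypothetical feature names mapped to the capability flags shown above.
REQUIRED_CAPABILITY = {
    "linkedClone": "supportsLinkedClones",
    "memorySnapshot": "supportsMemorySnapshots",
    "onlineDiskExpansion": "supportsDiskExpansionOnline",
    "onlineReconfigure": "supportsReconfigureOnline",
    "imageImport": "supportsImageImport",
}


def missing_capabilities(requested: list[str], capabilities: dict) -> list[str]:
    """Return the requested features the provider does not advertise."""
    return [
        feature
        for feature in requested
        # A flag that is absent is treated the same as false.
        if not capabilities.get(REQUIRED_CAPABILITY[feature], False)
    ]


if __name__ == "__main__":
    libvirt_like = {
        "supportsReconfigureOnline": False,
        "supportsSnapshots": True,
        "supportsLinkedClones": True,
    }
    print(missing_capabilities(["linkedClone", "onlineReconfigure"], libvirt_like))
```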
Observability¶
Metrics¶
New metrics for advanced lifecycle operations:
virtrigaud_vm_reconfigure_total{provider_type,outcome}
virtrigaud_vm_snapshot_total{action,provider_type,outcome}
virtrigaud_vm_clone_total{linked,provider_type,outcome}
virtrigaud_vm_image_prepare_total{provider_type,outcome}
Events¶
Detailed events for lifecycle operations:
Normal SnapshotCreating Started snapshot creation
Normal SnapshotReady Snapshot created successfully
Normal ReconfigureStarted Started VM reconfiguration
Warning ReconfigurePowerCycle Reconfiguration requires power cycle
Normal CloneCompleted VM clone created successfully
Conditions¶
Comprehensive condition reporting:
VM Conditions:
- Ready: VM is ready for use
- Provisioning: VM is being created
- Reconfiguring: VM is being reconfigured
- ReconfigurePendingPowerCycle: Needs power cycle for changes
Snapshot Conditions:
- Ready: Snapshot is ready
- Creating: Snapshot being created
- Deleting: Snapshot being deleted
Clone Conditions:
- Ready: Clone completed successfully
- Cloning: Clone operation in progress
- Customizing: Applying customizations
Best Practices¶
Snapshot Management¶
- Retention Policies: Always set appropriate retention policies
- Memory Snapshots: Use sparingly due to storage overhead
- Cleanup: Implement automated cleanup for old snapshots
- Testing: Test snapshot revert procedures regularly
VM Reconfiguration¶
- Gradual Changes: Make incremental resource changes
- Monitoring: Monitor VM performance after changes
- Rollback Plan: Have snapshots before major changes
- Capacity Planning: Ensure host resources before scaling up
Placement Policies¶
- Start Simple: Begin with basic constraints
- Test Anti-Affinity: Verify rules work as expected
- Monitor Placement: Check actual VM placement matches policy
- Balance Performance: Don't over-constrain placement
Multi-VM Operations¶
- Rolling Updates: Use appropriate maxUnavailable settings
- Health Checks: Implement proper readiness checks
- Monitoring: Monitor rollout progress
- Rollback Strategy: Plan for rollback scenarios
Troubleshooting¶
Common Issues¶
Reconfiguration Fails:
- Check provider capabilities
- Verify resource availability on host
- Check for VM tools/agent issues
Snapshot Operations Fail:
- Verify storage backend supports snapshots
- Check available storage space
- Ensure VM is not in transitional state
Clone Customization Issues:
- Verify network configuration
- Check cloud-init/guest tools
- Validate IP address availability
Placement Policy Violations:
- Check resource availability in target locations
- Verify anti-affinity rules aren't too restrictive
- Review cluster resource distribution
Debugging¶
# Check VM reconfiguration status
kubectl describe vm web-server
# Monitor snapshot progress
kubectl get vmsnapshots -w
# Check clone status
kubectl describe vmclone web-server-clone
# Review placement policy usage
kubectl describe vmplacementpolicy production-policy
# Check VMSet rollout
kubectl describe vmset web-tier
Migration from Basic VMs¶
Existing VMs can be enhanced with advanced features:
- Add Placement Policy: Update VM spec with placementRef
- Enable Reconfiguration: Add resource overrides
- Create Snapshots: Deploy VMSnapshot resources
- Scale with VMSets: Migrate to VMSet for multi-instance workloads
The controller maintains backward compatibility with existing VM definitions.