Hypervisor From Scratch

Study notes on the Hypervisor From Scratch article series

Lab code

Part 1: Basic Concepts and Environment Setup

Intel: VT-x, Virtual Machine eXtension (VMX)
AMD: AMD-V, Secure Virtual Machine (SVM)
Registry setting that makes DbgPrint output visible:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Debug Print Filter]
"DEFAULT"=dword:0000000f
Key concepts
Virtual Machine Monitor (VMM): The VMM acts as a host and has full control of the processor(s) and other platform hardware. It retains selective control of processor resources, physical memory, interrupt management, and I/O.
Guest software: Each virtual machine (VM) is a guest software environment.
VMX root operation and VMX non-root operation: The VMM runs in VMX root operation; guest software runs in VMX non-root operation.
VMX transitions: Transitions between VMX root operation and VMX non-root operation.
VM entries: Transitions into VMX non-root operation.
Extended Page Table (EPT): A mechanism that adds a second translation layer, converting guest physical addresses to host physical addresses.
VM exits: Transitions from VMX non-root operation to VMX root operation.
Virtual machine control structure (VMCS): A data structure in memory that exists once per VM (more precisely, once per virtual CPU, or VCPU) and is managed by the VMM. On every switch of execution context between VMs, the VMCS of the current VM is restored. It defines the state of the VM's virtual processor, and the VMM controls guest software through it.

The VMCS consists of six parts (figure: VMCS structure diagram):

Guest-state area: Processor state saved into the guest-state area on VM exits and loaded on VM entries.
Host-state area: Processor state loaded from the host-state area on VM exits.
VM-execution control fields: Fields controlling processor operation in VMX non-root operation.
VM-exit control fields: Fields that control VM exits.
VM-entry control fields: Fields that control VM entries.
VM-exit information fields: Read-only fields that receive information on VM exits, describing the cause and nature of the exit.
VMX instructions
INVEPT Invalidate Translations Derived from EPT
INVVPID Invalidate Translations Based on VPID
VMCALL Call to VM Monitor
VMCLEAR Clear Virtual-Machine Control Structure
VMFUNC Invoke VM function
VMLAUNCH Launch Virtual Machine
VMRESUME Resume Virtual Machine
VMPTRLD Load Pointer to Virtual-Machine Control Structure
VMPTRST Store Pointer to Virtual-Machine Control Structure
VMREAD Read Field from Virtual-Machine Control Structure
VMWRITE Write Field to Virtual-Machine Control Structure
VMXOFF Leave VMX Operation
VMXON Enter VMX Operation

VMM life cycle

Part 2: Enabling VMX

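  • Disable driver signature enforcement: bcdedit.exe /set nointegritychecks on
  • Enable test-signing mode: bcdedit /set testsigning on
  • Alternatively, hold Shift while restarting and choose option 7 to disable signature enforcement (for the current boot only)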

  • IRP Major Functions

    • IRP_MJ_CREATE
    • IRP_MJ_DEVICE_CONTROL
  • IRP Minor Functions (mainly used for PnP manager to notify for a special event)
  • Fast I/O

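    • Tried before, and faster than, the IRP path
  • Checking for hypervisor support

    • Execute CPUID with EAX = 0 and check for the vendor string GenuineIntel to confirm an Intel processor
    • CPUID.1:ECX.VMX[bit 5] = 1 indicates that VMX is supported
  • Enabling VMX

    • Set CR4.VMXE[bit 13] = 1
  • MyHypervisorDriver calls IoCreateDevice to create a device and registers DrvCreate as its IRP_MJ_CREATE handler. DrvCreate enables VMX and logs through DbgPrint; the output can be viewed in DbgView.

  • MyHypervisorApp uses inline assembly (__asm) to check whether the CPU supports VMX. If it does, the app calls CreateFile to open the device, which triggers the driver's DrvCreate and enables VMX.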
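
The checks listed above reduce to a few bit tests on register values. The following is a minimal sketch, not the series' actual code: the function names are hypothetical, and the register values are passed in as plain integers (rather than read with CPUID or from CR4) so that the logic can be compiled and tried anywhere.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Rebuild the 12-byte vendor string returned by CPUID leaf 0.
 * The characters are stored in the order EBX, EDX, ECX. */
void cpuid_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}

/* True when the leaf-0 registers spell "GenuineIntel". */
bool is_genuine_intel(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char vendor[13];
    cpuid_vendor(ebx, ecx, edx, vendor);
    return strcmp(vendor, "GenuineIntel") == 0;
}

/* CPUID.1:ECX.VMX is bit 5. */
bool cpu_has_vmx(uint32_t leaf1_ecx)
{
    return ((leaf1_ecx >> 5) & 1u) != 0;
}

/* CR4.VMXE is bit 13; returns the CR4 value that would be written back. */
uint64_t cr4_with_vmxe(uint64_t cr4)
{
    return cr4 | (1ULL << 13);
}
```

In a real driver the inputs would come from the CPUID instruction and a CR4 read, and the result of cr4_with_vmxe would be written back to CR4 in kernel mode.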

Part 3: Setting Up the First Virtual Machine

  • IOCTL (32-bit control code)
    • METHOD_BUFFERED
    • METHOD_NEITHER
    • METHOD_IN_DIRECT
    • METHOD_OUT_DIRECT
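
To make the 32-bit IOCTL layout concrete, here is a small sketch that packs and unpacks a control code the same way the Windows CTL_CODE macro does (device type in bits 31-16, access in bits 15-14, function in bits 13-2, transfer method in bits 1-0). The names make_ioctl_code, ioctl_method, ioctl_function, and the IOCTL_METHOD_* constants are hypothetical stand-ins chosen to avoid clashing with the real Windows headers.

```c
#include <stdint.h>

/* Transfer-method values, matching the Windows definitions of
 * METHOD_BUFFERED, METHOD_IN_DIRECT, METHOD_OUT_DIRECT, METHOD_NEITHER. */
enum {
    IOCTL_METHOD_BUFFERED   = 0,
    IOCTL_METHOD_IN_DIRECT  = 1,
    IOCTL_METHOD_OUT_DIRECT = 2,
    IOCTL_METHOD_NEITHER    = 3
};

/* Pack the four fields into one 32-bit control code. */
uint32_t make_ioctl_code(uint32_t device_type, uint32_t function,
                         uint32_t method, uint32_t access)
{
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}

/* The low two bits select the transfer method. */
uint32_t ioctl_method(uint32_t code)
{
    return code & 0x3u;
}

/* Bits 13-2 hold the function number. */
uint32_t ioctl_function(uint32_t code)
{
    return (code >> 2) & 0xFFFu;
}
```

For example, device type 0x22 (FILE_DEVICE_UNKNOWN), function 0x800, buffered method, and access 0 (FILE_ANY_ACCESS) pack to 0x00222000.
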
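Kernel functions/macros
KeQueryActiveProcessorCount: Returns the number of active logical processors
KeSetSystemAffinityThread: Pins the current thread to the given logical processor(s)
KeRevertToUserAffinityThread: Restores the thread's original processor affinity
MmGetPhysicalAddress: Virtual address -> physical address
MmGetVirtualForPhysical: Physical address -> virtual address
MmAllocateContiguousMemory: Allocates aligned, physically contiguous memory
MmFreeContiguousMemory: Frees memory allocated by MmAllocateContiguousMemory
RtlSecureZeroMemory: Zero-fills memory
__readmsr: Reads an MSR; used on IA32_FEATURE_CONTROL to check that VMX is enabled, and on IA32_VMX_BASIC to obtain the VMCS revision identifier (RevisionId)
__vmx_on: Executes VMXON
__vmx_vmptrld: Executes VMPTRLD
__vmx_off: Executes VMXOFF
PAGED_CODE: Asserts that the calling thread runs at an IRQL low enough to permit paging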
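
The two MSR reads mentioned for __readmsr can be illustrated with plain bit tests. This is a hedged sketch with hypothetical function names; it takes the raw 64-bit MSR values as arguments instead of calling __readmsr. The bit positions follow the Intel SDM: IA32_FEATURE_CONTROL bit 0 is the lock bit and bit 2 enables VMX outside SMX, and IA32_VMX_BASIC bits 30:0 hold the VMCS revision identifier.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_CONTROL_LOCK            (1ULL << 0) /* lock bit */
#define FEATURE_CONTROL_VMX_OUTSIDE_SMX (1ULL << 2) /* VMXON allowed outside SMX */

/* VMXON only succeeds when the MSR is locked with VMX enabled. */
bool vmxon_allowed(uint64_t feature_control)
{
    return (feature_control & FEATURE_CONTROL_LOCK) != 0 &&
           (feature_control & FEATURE_CONTROL_VMX_OUTSIDE_SMX) != 0;
}

/* Bits 30:0 of IA32_VMX_BASIC are the revision identifier that must be
 * written to the first 4 bytes of the VMXON and VMCS regions. */
uint32_t vmcs_revision_id(uint64_t vmx_basic)
{
    return (uint32_t)(vmx_basic & 0x7FFFFFFFULL);
}
```

In a driver, the inputs would be the results of reading MSR 0x3A (IA32_FEATURE_CONTROL) and MSR 0x480 (IA32_VMX_BASIC).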

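VMX life cycle

VMCS structure

  • MyHypervisorDriver calls IoCreateDevice to create a device and registers DrvCreate for IRP_MJ_CREATE and DrvClose for IRP_MJ_CLOSE. DrvCreate initializes VMX: it enables VMX on every logical processor and allocates memory for the VMXON and VMCS regions. DrvClose does the reverse.
  • MyHypervisorApp calls CreateFile to open the device, which triggers the driver's DrvCreate; this enables VMX and allocates the VMXON and VMCS regions for every logical processor. Calling CloseHandle closes the device and triggers DrvClose, which turns VMX off and frees the memory.

  • Communication between user mode and the VMM driver

    • MyHypervisorDriver registers DrvIOCTLDispatcher for IRP_MJ_DEVICE_CONTROL and communicates with MyHypervisorApp through IOCTLs (memory reads). MyHypervisorApp calls DeviceIoControl with an IoControlCode, which triggers DrvIOCTLDispatcher.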

Part 4: Address Translation Using EPT

  • Turning the Pages: Introduction to Memory Paging on Windows 10 x64
    • Windbg
      • !pte
        • PXE = PML4E
        • PPE = PDPE
      • !vtop: converts a virtual address to a physical address
  • Introduction to IA-32e hardware paging

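    • Intel 64-bit paging
    • PG flag: CR0[bit 31]: enables paging
    • Physical Address Extension (PAE): CR4[bit 5]: if clear, 32-bit paging is used
    • Long Mode Enable (LME): Extended Feature Enable Register (IA32_EFER MSR)[bit 8]: if clear, PAE paging (36-bit physical addresses) is used; otherwise 64-bit 4-level paging is used
      • Page Frame Number (PFN): identifies the next paging structure in the hierarchy; multiply it by 0x1000 (4 KB) to get that structure's physical address
      • Each table is 4096 bytes and holds 512 entries (each containing a PFN)
      • CR3 holds the physical address of the first paging structure
      • Virtual address layout
        • [bits 63-48] reserved
        • [bits 47-39] index into the PML4 table (whose base is in CR3)
        • [bits 38-30] index into the Page Directory Pointer Table (PDPT)
        • [bits 29-21] index into the Page Directory (PD)
        • [bits 20-12] index into the Page Table (PT)
        • [bits 11-00] offset within the physical page
    • Walking through a concrete example in WinDbg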
  • Second Level Address Translation (SLAT) or nested paging

    • an extended layer in the paging mechanism
    • hardware-assisted translation of guest virtual addresses to physical memory
    • Implementations
      • AMD: Rapid Virtualization Indexing (RVI)/Nested Page Tables (NPT)
      • Intel: Extended Page Table (EPT)
      • ARM: Stage-2 page-tables
    • Two approaches
      • Shadow Page Tables (Software-assisted paging)
      • Extended Page Tables (Hardware-assisted paging)
        • The page tables maintained by the guest OS produce the guest physical address.
        • The page tables maintained by the VMM map the guest physical address to a host physical address.
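
The 4-level virtual address layout described in the paging notes above (bits 47-39, 38-30, 29-21, 20-12, 11-0) can be sketched as a handful of shifts and masks. The struct and function names below are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Indices extracted from a 64-bit virtual address under 4-level paging. */
typedef struct {
    unsigned pml4;   /* bits 47-39 */
    unsigned pdpt;   /* bits 38-30 */
    unsigned pd;     /* bits 29-21 */
    unsigned pt;     /* bits 20-12 */
    unsigned offset; /* bits 11-0  */
} va_indices;

/* Each index is 9 bits wide (512 entries per table); the offset is 12 bits. */
va_indices split_virtual_address(uint64_t va)
{
    va_indices ix;
    ix.pml4   = (unsigned)((va >> 39) & 0x1FF);
    ix.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    ix.pd     = (unsigned)((va >> 21) & 0x1FF);
    ix.pt     = (unsigned)((va >> 12) & 0x1FF);
    ix.offset = (unsigned)(va & 0xFFF);
    return ix;
}

/* A PFN names a 4 KB frame; shifting by 12 gives its physical address. */
uint64_t pfn_to_physical(uint64_t pfn)
{
    return pfn << 12;
}
```

A page walk uses each index to pick one of the 512 entries in the corresponding table, takes the PFN from that entry to locate the next table, and finally adds the 12-bit offset to the address of the final frame.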

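EPT address translation

EPTP structure

Overall EPT structure

MyHypervisorDriver adds EPT initialization code, which allocates memory for each table used in the 64-bit 4-level address translation.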
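
Once the top-level EPT table is allocated, its physical address is packed into the EPT pointer (EPTP). The sketch below is an illustration under the field layout given in the Intel SDM (memory type in bits 2:0, page-walk length minus one in bits 5:3, accessed/dirty enable in bit 6, table address from bit 12 up); the function name make_eptp and its parameters are hypothetical, not taken from the series' code.

```c
#include <stdint.h>

#define EPT_MEMORY_TYPE_WB   6ULL /* write-back memory type */
#define EPT_PAGE_WALK_LENGTH 4ULL /* four levels: PML4, PDPT, PD, PT */

/* Build an EPTP value from the physical address of the EPT PML4 table.
 * The address is expected to be 4 KB aligned; low bits are masked off. */
uint64_t make_eptp(uint64_t pml4_phys, int enable_accessed_dirty)
{
    uint64_t eptp = 0;
    eptp |= EPT_MEMORY_TYPE_WB;              /* bits 2:0 */
    eptp |= (EPT_PAGE_WALK_LENGTH - 1) << 3; /* bits 5:3 hold length minus 1 */
    if (enable_accessed_dirty)
        eptp |= 1ULL << 6;                   /* bit 6 */
    eptp |= pml4_phys & ~0xFFFULL;           /* bits 12 and up */
    return eptp;
}
```

In a driver, pml4_phys would typically come from MmGetPhysicalAddress on the allocated table, and the result would later be written to the VMCS EPT-pointer field.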

Part 5: Setting Up the VMCS and Running Code in the Guest

Kernel functions
__vmx_vmptrst
__vmx_vmclear
__vmx_vmptrld
__vmx_vmlaunch
__vmx_vmread
__vmx_vmwrite: Writes a value to the specified VMCS field
__vmx_vmresume
VMX Controls
VM-Execution Controls: Primary Processor-Based VM-Execution Controls, Secondary Processor-Based VM-Execution Controls
Setting up the VMCS
VM-entry Control Bits
VM-exit Control Bits
Pin-Based Execution Control: Governs the handling of asynchronous events
Activity State: 0: Active, 1: HLT, 2: Shutdown, 3: Wait-for-SIPI
Interruptibility State: Permits certain events to be blocked for a period of time

MyHypervisorDriver fills in the VMCS fields with __vmx_vmwrite (this part is quite involved) and calls LaunchVM to set up the VMCS on processor 0. Setting CPU_BASED_HLT_EXITING makes HLT cause a VM exit, and HOST_RIP points to VMExitHandler, which executes VMXOFF when it handles EXIT_REASON_HLT. Finally the driver calls __vmx_vmlaunch; the guest executes HLT (\xF4), which triggers the VM exit.
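
Before a control word such as the one containing CPU_BASED_HLT_EXITING is written to the VMCS, it has to be reconciled with what the processor allows. A common way to do this, sketched here with a hypothetical function name, uses the matching capability MSR: its low 32 bits are the bits that must be 1, and its high 32 bits are the bits that may be 1. The MSR value is passed in as an argument so the logic can be tried without kernel access.

```c
#include <stdint.h>

#define CPU_BASED_HLT_EXITING 0x00000080u /* bit 7: HLT causes a VM exit */

/* Clamp a requested control word to what the capability MSR permits. */
uint32_t adjust_controls(uint32_t requested, uint64_t capability_msr)
{
    uint32_t must_be_one = (uint32_t)(capability_msr & 0xFFFFFFFFu);
    uint32_t may_be_one  = (uint32_t)(capability_msr >> 32);

    requested &= may_be_one;  /* drop bits the CPU does not support */
    requested |= must_be_one; /* force bits the CPU requires */
    return requested;
}
```

The adjusted value is what would be passed to __vmx_vmwrite for the corresponding control field; a requested bit that disappears after adjustment means the feature is unavailable on that processor.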

Part 6: Virtualizing an Already Running System

CPU_BASED_VM_EXEC_CONTROL: CPU_BASED_ACTIVATE_MSR_BITMAP
SECONDARY_VM_EXEC_CONTROL: CPU_BASED_CTL2_RDTSCP, CPU_BASED_CTL2_ENABLE_INVPCID, CPU_BASED_CTL2_ENABLE_XSAVE_XRSTORS

IRQL (Interrupt Request Level): a Windows-specific mechanism that prioritizes interrupts by level. Raising the IRQL means your routine executes with higher priority than ordinary Windows code running at PASSIVE_LEVEL or APC_LEVEL.

KIRQL OldIrql = KeRaiseIrqlToDpcLevel(); // raise the IRQL to DISPATCH_LEVEL so the Windows scheduler cannot switch context
KeLowerIrql(OldIrql);                    // restore the previous IRQL

CPUID is one of the main instructions that cause a VM exit.

Kernel functions
__cpuidex: Executes CPUID; HYPERV_HYPERVISOR_PRESENT_BIT is the hypervisor-present flag reported in CPUID.1:ECX[bit 31]

Defeating malware’s Anti-VM techniques (CPUID-Based Instructions)
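
Because every guest CPUID goes through the VM-exit handler, the VMM decides what the guest sees in the hypervisor-present bit. The following is a minimal sketch of that decision with a hypothetical function name; it operates on the ECX value of leaf 1 as a plain integer and is not the handler from the series.

```c
#include <stdint.h>

#define HYPERV_HYPERVISOR_PRESENT_BIT 0x80000000u /* CPUID.1:ECX bit 31 */

/* Compute the ECX value to hand back to the guest for CPUID leaf 1.
 * With hide_hypervisor set, the bit is cleared so a CPUID-based check in
 * the guest does not see a hypervisor; otherwise the bit is advertised. */
uint32_t guest_cpuid_leaf1_ecx(uint32_t host_ecx, int hide_hypervisor)
{
    if (hide_hypervisor)
        return host_ecx & ~HYPERV_HYPERVISOR_PRESENT_BIT;
    return host_ecx | HYPERV_HYPERVISOR_PRESENT_BIT;
}
```

A CPUID exit handler would run __cpuidex with the guest's EAX and ECX, apply a filter like this one when the leaf is 1, and then copy the four result registers back into the guest's register state.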