Study Notes on the Hypervisor From Scratch Series
Part 1: Basic Concepts and Environment Setup
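Vendor | Technology | Instruction-set extension |
---|---|---|
Intel | VT-x | Virtual Machine Extensions (VMX) |
AMD | AMD-V | Secure Virtual Machine (SVM) |

Registry key that controls DbgPrint output filtering: [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Debug Print Filter]
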
Key concept | Description |
---|---|
Virtual Machine Monitor (VMM) | VMM acts as a host and has full control of the processor(s) and other platform hardware. A VMM is able to retain selective control of processor resources, physical memory, interrupt management, and I/O. |
Guest Software | Each virtual machine (VM) is a guest software environment. |
VMX Root Operation and VMX Non-root Operation | A VMM runs in VMX root operation, and guest software runs in VMX non-root operation. |
VMX transitions | Transitions between VMX root operation and VMX non-root operation. |
VM entries | Transitions into VMX non-root operation. |
Extended Page Table (EPT) | A modern mechanism which uses a second layer for converting the guest physical address to host physical address. |
VM exits | Transitions from VMX non-root operation to VMX root operation. |
Virtual machine control structure (VMCS) | A data structure in memory that exists exactly once per VM (more precisely, once per virtual CPU [VCPU]) and is managed by the VMM. On every change of execution context between VMs, the VMCS of the current VM is restored; it defines the state of the VM's virtual processor, and the VMM controls guest software through it. |

The VMCS consists of six parts (figure: VMCS structure diagram):

Area | Description |
---|---|
Guest-state area | Processor state saved into the guest state area on VM exits and loaded on VM entries. |
Host-state area | Processor state loaded from the host state area on VM exits. |
VM-execution control fields | Fields controlling processor operation in VMX non-root operation. |
VM-exit control fields | Fields that control VM exits. |
VM-entry control fields | Fields that control VM entries. |
VM-exit information fields | Read-only fields to receive information on VM exits describing the cause and the nature of the VM exit. |

VMX instruction | Description |
---|---|
INVEPT | Invalidate Translations Derived from EPT |
INVVPID | Invalidate Translations Based on VPID |
VMCALL | Call to VM Monitor |
VMCLEAR | Clear Virtual-Machine Control Structure |
VMFUNC | Invoke VM function |
VMLAUNCH | Launch Virtual Machine |
VMRESUME | Resume Virtual Machine |
VMPTRLD | Load Pointer to Virtual-Machine Control Structure |
VMPTRST | Store Pointer to Virtual-Machine Control Structure |
VMREAD | Read Field from Virtual-Machine Control Structure |
VMWRITE | Write Field to Virtual-Machine Control Structure |
VMXOFF | Leave VMX Operation |
VMXON | Enter VMX Operation |
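Part 2: Enabling VMX
- Disable driver signature enforcement: `bcdedit.exe /set nointegritychecks on`
- Enable test-signing mode: `bcdedit /set testsigning on`
- Alternatively, hold `Shift` while restarting and choose option 7 to disable signature enforcement (for the current boot only).

IRP Major Functions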
- IRP_MJ_CREATE
- IRP_MJ_DEVICE_CONTROL
- IRP Minor Functions (mainly used by the PnP manager to signal special events)

Fast I/O
- Attempted before the IRP path, and faster than it
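
Checking hypervisor support
- Execute `CPUID` with `EAX = 0` and check for the vendor string `GenuineIntel` to determine whether this is an Intel processor.
- Check that `CPUID.1:ECX.VMX[bit 5] = 1` to determine whether VMX is supported.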

Enabling VMX
- Set `CR4.VMXE[bit 13] = 1`.
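
The checks above can be sketched as plain functions over register values, which keeps the bit logic testable in user mode. This is a minimal sketch, not the series' code: the function names are hypothetical, and the register values would come from executing `CPUID` or reading `CR4` in a real driver. It assumes only the bit positions listed above and the fact that CPUID leaf 0 returns the vendor string in EBX, EDX, ECX order.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CPUID_1_ECX_VMX_BIT 5u   /* CPUID.1:ECX.VMX */
#define CR4_VMXE_BIT        13u  /* CR4.VMXE */

/* Unpack one 32-bit register into four characters, lowest byte first. */
void unpack_reg(uint32_t reg, char *dst)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((reg >> (8 * i)) & 0xFFu);
}

/* CPUID leaf 0 returns the vendor string in EBX, EDX, ECX (in that order). */
bool is_genuine_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char vendor[13];
    unpack_reg(ebx, vendor + 0);
    unpack_reg(edx, vendor + 4);
    unpack_reg(ecx, vendor + 8);
    vendor[12] = '\0';
    return strcmp(vendor, "GenuineIntel") == 0;
}

/* True when the ECX value from CPUID leaf 1 advertises VMX support. */
bool cpuid1_ecx_has_vmx(uint32_t ecx)
{
    return ((ecx >> CPUID_1_ECX_VMX_BIT) & 1u) != 0;
}

/* Returns the CR4 value with VMXE set; writing it back is left to the driver. */
uint64_t cr4_with_vmxe(uint64_t cr4)
{
    return cr4 | (1ull << CR4_VMXE_BIT);
}
```

In a driver, the inputs would typically come from compiler intrinsics and the result of `cr4_with_vmxe` would be written back to `CR4` on each logical processor.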
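- MyHypervisorDriver calls `IoCreateDevice` to create the device and registers `DrvCreate` as its `IRP_MJ_CREATE` handler. `DrvCreate` enables VMX and prints a message with `DbgPrint`, which can be viewed in DbgView.
- MyHypervisorApp uses `__asm` to check whether the CPU supports VMX. If it does, the app calls `CreateFile` to open the device, which triggers the driver's `DrvCreate` and enables VMX.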
Part 3: Setting up Our First Virtual Machine
- IOCTL (32-bit control code), with four buffer-transfer methods:
  - METHOD_BUFFERED
  - METHOD_NEITHER
  - METHOD_IN_DIRECT
  - METHOD_OUT_DIRECT
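
To make the "32-bit" remark concrete, the sketch below packs and unpacks a control code using the same field layout as the Windows SDK's `CTL_CODE` macro (device type in bits 31-16, access in bits 15-14, function in bits 13-2, method in bits 1-0). The `sketch_` names are hypothetical stand-ins so the example compiles without the Windows headers; the method values match the SDK definitions.

```c
#include <stdint.h>

/* Transfer-method values, matching the Windows SDK definitions. */
enum {
    SKETCH_METHOD_BUFFERED   = 0,
    SKETCH_METHOD_IN_DIRECT  = 1,
    SKETCH_METHOD_OUT_DIRECT = 2,
    SKETCH_METHOD_NEITHER    = 3
};

/* Pack the four fields into one 32-bit control code (same layout as CTL_CODE). */
uint32_t sketch_ctl_code(uint32_t device_type, uint32_t function,
                         uint32_t method, uint32_t access)
{
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}

/* Extract the transfer method (bits 1-0). */
uint32_t sketch_ctl_method(uint32_t code)
{
    return code & 0x3u;
}

/* Extract the function number (bits 13-2). */
uint32_t sketch_ctl_function(uint32_t code)
{
    return (code >> 2) & 0xFFFu;
}
```

For example, device type 0x22 (`FILE_DEVICE_UNKNOWN`), function 0x800, buffered transfer, and access 0 (`FILE_ANY_ACCESS`) pack to 0x222000.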
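
Kernel function / macro | Purpose |
---|---|
KeQueryActiveProcessorCount | Returns the number of logical processors |
KeSetSystemAffinityThread | Pins the current thread to a given logical processor |
KeRevertToUserAffinityThread | Restores the thread's original processor affinity |
MmGetPhysicalAddress | Virtual address -> physical address |
MmGetVirtualForPhysical | Physical address -> virtual address |
MmAllocateContiguousMemory | Allocates aligned, physically contiguous memory |
MmFreeContiguousMemory | Frees memory allocated by MmAllocateContiguousMemory |
RtlSecureZeroMemory | Fills memory with zeros |
__readmsr | Reads a model-specific register: IA32_FEATURE_CONTROL to check whether VMX is enabled, and IA32_VMX_BASIC to obtain the revision identifier stored in the VMXON and VMCS regions |
__vmx_on | Enters VMX operation (VMXON) |
__vmx_vmptrld | Loads the pointer to a VMCS (VMPTRLD) |
__vmx_off | Leaves VMX operation (VMXOFF) |
PAGED_CODE | Asserts that the calling thread is running at an IRQL low enough to permit paging |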
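
The two MSR reads in the table can be illustrated with pure decoding functions. This is a sketch under stated assumptions: the function names are hypothetical, and the inputs stand for values that `__readmsr` would return in a real driver. The bit positions follow the Intel SDM: in IA32_FEATURE_CONTROL, bit 0 is the lock bit and bit 2 enables VMXON outside SMX; in IA32_VMX_BASIC, bits 30:0 hold the VMCS revision identifier.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_CONTROL_LOCK_BIT            (1ull << 0)
#define FEATURE_CONTROL_VMXON_OUTSIDE_SMX   (1ull << 2)
#define VMX_BASIC_REVISION_MASK             0x7FFFFFFFull

/* True when firmware locked the MSR with VMXON disabled, so the driver
 * cannot turn VMX on itself. */
bool vmx_locked_off_by_firmware(uint64_t feature_control)
{
    bool locked  = (feature_control & FEATURE_CONTROL_LOCK_BIT) != 0;
    bool enabled = (feature_control & FEATURE_CONTROL_VMXON_OUTSIDE_SMX) != 0;
    return locked && !enabled;
}

/* Revision identifier to write into the first 4 bytes of the VMXON and
 * VMCS regions. */
uint32_t vmcs_revision_id(uint64_t vmx_basic)
{
    return (uint32_t)(vmx_basic & VMX_BASIC_REVISION_MASK);
}
```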
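
- MyHypervisorDriver calls `IoCreateDevice` to create the device, registering `DrvCreate` for `IRP_MJ_CREATE` and `DrvClose` for `IRP_MJ_CLOSE`. `DrvCreate` initializes VMX: it enables VMX on every logical processor and allocates memory for the `VMXON` and `VMCS` regions. `DrvClose` does the reverse.
- MyHypervisorApp calls `CreateFile` to open the device, which triggers the driver's `DrvCreate`, enabling VMX and allocating the `VMXON` and `VMCS` regions for each logical processor. It then calls `CloseHandle` to close the device, which triggers `DrvClose`, turning VMX off and freeing the memory.

Interaction between user mode and the VMM driver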
- MyHypervisorDriver registers `DrvIOCTLDispatcher` for `IRP_MJ_DEVICE_CONTROL` and communicates with MyHypervisorApp through `IOCTL` requests (reading memory). MyHypervisorApp calls `DeviceIoControl` with an `IoControlCode`, which triggers `DrvIOCTLDispatcher`.
Part 4: Address Translation Using EPT
Turning the Pages: Introduction to Memory Paging on Windows 10 x64
- WinDbg `!pte`
  - PXE = PML4E
  - PPE = PDPE
- WinDbg `!vtop`: converts a virtual address to a physical address

Introduction to IA-32e hardware paging
- Intel 64-bit paging
  - PG flag: CR0[bit 31]: enables paging
  - Physical Address Extension (PAE): CR4[bit 5]: if clear, 32-bit paging is used
  - Long Mode Enable (LME): Extended Feature Enable Register (IA32_EFER MSR)[bit 8]: if clear, PAE paging with 36-bit physical addresses is used; if set, 64-bit four-level paging is used
  - Page Frame Number (PFN): identifies the next paging structure in the hierarchy; each structure is 0x1000 bytes (4 KB)
  - Each 4096-byte structure holds 512 entries, each carrying a PFN
  - `CR3` holds the physical address of the first paging structure
- Virtual address
  - [bits 63-48] reserved
  - [bits 47-39] PML4 table offset (the PML4 table is located through CR3)
  - [bits 38-30] Page Directory Pointer Table (PDPT) offset
  - [bits 29-21] Page Directory (PD) offset
  - [bits 20-12] Page Table (PT) offset
  - [bits 11-0] offset within the physical page
- Worked through in detail with WinDbg
Second Level Address Translation (SLAT) or nested paging
- an extended layer in the paging mechanism
- hardware-based virtualization: virtual addresses -> physical memory
- Implementations
  - AMD: Rapid Virtualization Indexing (RVI)/Nested Page Tables (NPT)
  - Intel: Extended Page Table (EPT)
  - ARM: Stage-2 page tables
- Two approaches
  - Shadow Page Tables (software-assisted paging)
  - Extended Page Tables (hardware-assisted paging)
    - The page tables maintained by the guest OS produce the guest-physical address.
    - The page tables maintained by the VMM map the guest-physical address to the host-physical address.

MyHypervisorDriver adds EPT initialization code that allocates memory for each table of the 64-bit four-level address translation.
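
The two-stage idea can be illustrated with a deliberately simplified toy model. It is not how the hardware or the driver works: each stage here is a flat array of frame numbers rather than a four-level walk, and all names (`toy_translate`, `TOY_PAGES`, and so on) are hypothetical. It only shows that the guest's tables yield a guest-physical address, which the VMM's tables then map to a host-physical address.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12u
#define TOY_PAGE_MASK  0xFFFull
#define TOY_PAGES      4u
#define TOY_INVALID    UINT64_MAX

/* One translation stage: look up the page number in a flat table of
 * frame numbers and re-attach the page offset. */
uint64_t toy_translate(const uint64_t *table, uint64_t addr)
{
    uint64_t page   = addr >> TOY_PAGE_SHIFT;
    uint64_t offset = addr & TOY_PAGE_MASK;
    if (page >= TOY_PAGES || table[page] == TOY_INVALID)
        return TOY_INVALID;
    return (table[page] << TOY_PAGE_SHIFT) | offset;
}

/* Stage 1 (guest page table): guest-virtual -> guest-physical.
 * Stage 2 (EPT-like table):   guest-physical -> host-physical. */
uint64_t toy_gva_to_hpa(const uint64_t *guest_pt, const uint64_t *ept,
                        uint64_t gva)
{
    uint64_t gpa = toy_translate(guest_pt, gva);
    if (gpa == TOY_INVALID)
        return TOY_INVALID;
    return toy_translate(ept, gpa);
}
```

With a guest table `{2, 0, TOY_INVALID, 1}` and a second-stage table `{7, 5, 9, TOY_INVALID}`, guest-virtual 0x0123 becomes guest-physical 0x2123 and then host-physical 0x9123.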
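Part 5: Setting up VMCS & Running Guest Code
Kernel intrinsic | Purpose |
---|---|
__vmx_vmptrst | Stores the pointer to the current VMCS (VMPTRST) |
__vmx_vmclear | Clears a VMCS (VMCLEAR) |
__vmx_vmptrld | Loads the pointer to a VMCS (VMPTRLD) |
__vmx_vmlaunch | Launches the virtual machine (VMLAUNCH) |
__vmx_vmread | Reads a value from the specified VMCS field |
__vmx_vmwrite | Writes a value to the specified VMCS field |
__vmx_vmresume | Resumes the virtual machine (VMRESUME) |
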
VMX Controls | Details | Notes |
---|---|---|
VM-Execution Controls | Primary Processor-Based VM-Execution Controls; Secondary Processor-Based VM-Execution Controls | Set in the VMCS |
VM-entry Control Bits | | |
VM-exit Control Bits | | |
PIN-Based Execution Control | Governs the handling of asynchronous events | |
Activity State | 0: Active; 1: HLT; 2: Shutdown; 3: Wait-for-SIPI | |
Interruptibility State | Permits certain events to be blocked for a period of time | |

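MyHypervisorDriver uses `__vmx_vmwrite` to fill in the many `VMCS` fields (this is quite involved) and calls `LaunchVM` to set up the `VMCS` on virtual processor 0. It sets `CPU_BASED_HLT_EXITING` so that `HLT` causes a VM exit, and points `HOST_RIP` at `VMExitHandler`, which executes `VMXOFF` when it handles `EXIT_REASON_HLT`. Finally, it calls `__vmx_vmlaunch` to run the guest's `HLT` (`\xF4`), which triggers the VM exit.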
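
The exit-handling flow described above can be sketched as a small dispatcher. This is an illustration only, not the driver's `VMExitHandler`: the `sketch_` names and the returned action enum are hypothetical, and resuming the guest on every other exit is a simplifying assumption. It relies on two facts from the Intel SDM: the basic exit reason is the low 16 bits of the exit-reason field, and the basic exit reason for `HLT` is 12.

```c
#include <stdint.h>

#define SKETCH_EXIT_REASON_HLT 12u  /* basic exit reason for HLT */

/* What the host would do next; a real handler executes the instruction
 * itself (VMXOFF or VMRESUME) instead of returning a value. */
typedef enum {
    SKETCH_ACTION_RESUME_GUEST,
    SKETCH_ACTION_VMXOFF
} sketch_action;

/* Decide how to react to a VM exit, given the raw exit-reason field. */
sketch_action sketch_handle_exit(uint32_t exit_reason_field)
{
    uint32_t basic_reason = exit_reason_field & 0xFFFFu;

    switch (basic_reason) {
    case SKETCH_EXIT_REASON_HLT:
        /* The guest ran HLT with HLT exiting enabled: leave VMX operation. */
        return SKETCH_ACTION_VMXOFF;
    default:
        /* Assumption for this sketch: other exits simply resume the guest. */
        return SKETCH_ACTION_RESUME_GUEST;
    }
}
```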
Part 6: Virtualizing an Already Running System
VMCS field | Bits set |
---|---|
CPU_BASED_VM_EXEC_CONTROL | CPU_BASED_ACTIVATE_MSR_BITMAP |
SECONDARY_VM_EXEC_CONTROL | CPU_BASED_CTL2_RDTSCP, CPU_BASED_CTL2_ENABLE_INVPCID, CPU_BASED_CTL2_ENABLE_XSAVE_XRSTORS |

IRQL (Interrupt Request Level): a Windows-specific mechanism for managing interrupts by priority level. Raising the IRQL means your routine runs with higher priority than ordinary Windows code (PASSIVE_LEVEL and APC_LEVEL).

    KIRQL OldIrql = KeRaiseIrqlToDpcLevel(); // raise to DISPATCH_LEVEL so the Windows scheduler cannot switch context
    KeLowerIrql(OldIrql);                    // restore the previous IRQL

CPUID is one of the main instructions that cause a VM exit.

Kernel intrinsic | Instruction | Related constant |
---|---|---|
__cpuidex | CPUID | HYPERV_HYPERVISOR_PRESENT_BIT |

Defeating malware’s Anti-VM techniques (CPUID-Based Instructions)
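
Since every guest `CPUID` exits to the hypervisor, the handler decides what the guest sees. The sketch below shows the two options for the hypervisor-present bit, which is bit 31 of ECX for CPUID leaf 1. The function names and the `SKETCH_` constant are hypothetical stand-ins (the constant mirrors `HYPERV_HYPERVISOR_PRESENT_BIT` from the table above, assumed here to be 0x80000000); the input is the ECX value the handler obtained by running `CPUID` on the host.

```c
#include <stdint.h>

/* CPUID.1:ECX bit 31, the hypervisor-present bit. */
#define SKETCH_HYPERVISOR_PRESENT_BIT 0x80000000u

/* Announce the hypervisor to the guest by setting the bit. */
uint32_t sketch_cpuid1_ecx_announce_hypervisor(uint32_t ecx)
{
    return ecx | SKETCH_HYPERVISOR_PRESENT_BIT;
}

/* Hide the hypervisor from CPUID-based anti-VM checks by clearing the bit. */
uint32_t sketch_cpuid1_ecx_hide_hypervisor(uint32_t ecx)
{
    return ecx & ~SKETCH_HYPERVISOR_PRESENT_BIT;
}
```

Clearing the bit only defeats checks that rely on this one flag; other detection techniques are outside the scope of this sketch.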