
An Introduction to the LLVM Type System

Posted on:2023.01.22




The LLVM type system is built on the Type class, which identifies each kind of type by a TypeID:


enum TypeID {
  // PrimitiveTypes
  HalfTyID = 0,  ///< 16-bit floating point type
  BFloatTyID,    ///< 16-bit floating point type (7-bit significand)
  FloatTyID,     ///< 32-bit floating point type
  DoubleTyID,    ///< 64-bit floating point type
  X86_FP80TyID,  ///< 80-bit floating point type (X87)
  FP128TyID,     ///< 128-bit floating point type (112-bit significand)
  PPC_FP128TyID, ///< 128-bit floating point type (two 64-bits, PowerPC)
  VoidTyID,      ///< type with no size
  LabelTyID,     ///< Labels
  MetadataTyID,  ///< Metadata
  X86_MMXTyID,   ///< MMX vectors (64 bits, X86 specific)
  X86_AMXTyID,   ///< AMX vectors (8192 bits, X86 specific)
  TokenTyID,     ///< Tokens
  // Derived types... see DerivedTypes.h file.
  IntegerTyID,       ///< Arbitrary bit width integers
  FunctionTyID,      ///< Functions
  PointerTyID,       ///< Pointers
  StructTyID,        ///< Structures
  ArrayTyID,         ///< Arrays
  FixedVectorTyID,   ///< Fixed width SIMD vector type
  ScalableVectorTyID ///< Scalable SIMD vector type
};


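Structurally equivalent types share a single object instance within an LLVMContext (a singleton); identified struct types, described later, are the exception.

The inheritance hierarchy of the Type class is shown in the figure below.


The LLVMContext class holds a top-level const pointer to LLVMContextImpl.

This is the classic PImpl design.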

LLVMContextImpl *const pImpl;

LLVMContextImpl holds the singletons of the primitive types listed above and of the common integer types, and initializes them in its constructor:

LLVMContextImpl::LLVMContextImpl(LLVMContext &C)
    : DiagHandler(std::make_unique<DiagnosticHandler>()),
      VoidTy(C, Type::VoidTyID), LabelTy(C, Type::LabelTyID),
      HalfTy(C, Type::HalfTyID), BFloatTy(C, Type::BFloatTyID),
      FloatTy(C, Type::FloatTyID), DoubleTy(C, Type::DoubleTyID),
      MetadataTy(C, Type::MetadataTyID), TokenTy(C, Type::TokenTyID),
      X86_FP80Ty(C, Type::X86_FP80TyID), FP128Ty(C, Type::FP128TyID),
      PPC_FP128Ty(C, Type::PPC_FP128TyID), X86_MMXTy(C, Type::X86_MMXTyID),
      X86_AMXTy(C, Type::X86_AMXTyID), Int1Ty(C, 1), Int8Ty(C, 8),
      Int16Ty(C, 16), Int32Ty(C, 32), Int64Ty(C, 64), Int128Ty(C, 128) {
  if (OpaquePointersCL.getNumOccurrences()) {
    OpaquePointers = OpaquePointersCL;
  }
}

The Type class also provides matching static methods for retrieving these singletons.

Floating Point Types

primitive type

half: 16-bit floating-point value
bfloat: 16-bit “brain” floating-point value (7-bit significand). Provides the same number of exponent bits as float, so that it matches its dynamic range, but with greatly reduced precision. Used in Intel’s AVX-512 BF16 extensions and Arm’s ARMv8.6-A extensions, among others.
float: 32-bit floating-point value
double: 64-bit floating-point value
fp128: 128-bit floating-point value (113-bit significand)
x86_fp80: 80-bit floating-point value (X87)
ppc_fp128: 128-bit floating-point value (two 64-bits)

In practice, float and double are the types used most often.

Void Type

primitive type

The singleton of the void type can be obtained as follows:

llvm::Type *type = llvm::Type::getVoidTy(TheContext);

The void type represents no value and has no size; it only acts as a placeholder, for example as the return type of a function:

define dso_local void @foo() #0 {
  ret void
}

Label Type

primitive type

Used to label basic blocks. For example, a max function might correspond to the following LLVM IR:

define dso_local i32 @max(i32 noundef %0, i32 noundef %1) #0 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
store i32 %0, i32* %3, align 4
store i32 %1, i32* %4, align 4
%5 = load i32, i32* %3, align 4
%6 = load i32, i32* %4, align 4
%7 = icmp sgt i32 %5, %6
br i1 %7, label %8, label %10
8: ; preds = %2
%9 = load i32, i32* %3, align 4
br label %12
10: ; preds = %2
%11 = load i32, i32* %4, align 4
br label %12
12: ; preds = %10, %8
%13 = phi i32 [ %9, %8 ], [ %11, %10 ]
ret i32 %13
}

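Note the implicit %2 here: the unnamed entry block takes the next number after the arguments %0 and %1, which is why the preds comments refer to %2.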

Token Type

primitive type

The token type is used when a value is associated with an instruction but all uses of the value must not attempt to introspect or obscure it. As such, it is not appropriate to have a phi or select of type token.

The identifier ‘none’ is recognized as an empty token constant and must be of token type.


Metadata Type

primitive type

The metadata type represents embedded metadata. No derived types may be created from metadata except for function arguments.

LLVM IR allows metadata to be attached to instructions and global objects in the program that can convey extra information about the code to the optimizers and code generator. One example application of metadata is source-level debug information. There are two metadata primitives: strings and nodes.

Metadata does not have a type, and is not a value. If referenced from a call instruction, it uses the metadata type.

All metadata are identified in syntax by an exclamation point (‘!’).


!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 1}
!4 = !{i32 7, !"frame-pointer", i32 2}
!5 = !{!"clang version 14.0.6"}

Integer Type

The syntax is iN, where N is the bit width of the integer.

The singleton of the i32 type can be obtained as follows:

llvm::Type *type = llvm::Type::getInt32Ty(TheContext);

While the i32 type is being constructed, its bit width is stored in the SubclassData field of the Type class:

TypeID ID : 8; // The current base type of this type.
unsigned SubclassData : 24; // Space for subclasses to store data.
// Note that this should be synchronized with
// MAX_INT_BITS value in IntegerType class.

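SubclassData has only 24 bits, and LLVM caps the width at the largest power of two that fits, so the width of an integer type lies in the range $[1, 2^{23}]$.

In other words, the largest unsigned integer LLVM can represent is $2^{2^{23}} - 1 = 2^{8388608} - 1$.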
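The width limit can be illustrated with a small stand-alone sketch. It is not LLVM code: `MiniTypeHeader` and `setIntegerWidth` are hypothetical names, and the two bit-fields only mirror the excerpt above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the two bit-fields quoted above.
struct MiniTypeHeader {
  std::uint32_t ID : 8;            // which kind of type this is
  std::uint32_t SubclassData : 24; // integer types keep their bit width here
};

constexpr std::uint32_t MinIntBits = 1;
constexpr std::uint32_t MaxIntBits = 1u << 23; // 8388608

// Stores the width and returns true if it lies in [MinIntBits, MaxIntBits];
// otherwise leaves the header untouched and returns false.
inline bool setIntegerWidth(MiniTypeHeader &H, std::uint32_t NumBits) {
  if (NumBits < MinIntBits || NumBits > MaxIntBits)
    return false;
  H.SubclassData = NumBits;
  return true;
}
```

A width of 0 or of 2^23 + 1 is rejected, while 2^23 itself still fits in the 24-bit field.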

Note that an integer type carries no signedness information.

LLVMContextImpl caches the integer types of all other widths in the following data structure:

DenseMap<unsigned, IntegerType *> IntegerTypes;
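The idea behind this cache can be sketched in standard C++. This is only an illustration under simplifying assumptions: `MiniContext`, `MiniIntegerType`, and `getIntegerType` are hypothetical names, and `std::unordered_map` stands in for `DenseMap`.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for llvm::IntegerType: only the bit width matters.
struct MiniIntegerType {
  unsigned BitWidth;
};

// Hypothetical stand-in for LLVMContextImpl: owns one object per bit width.
class MiniContext {
  std::unordered_map<unsigned, std::unique_ptr<MiniIntegerType>> IntegerTypes;

public:
  // Returns the unique MiniIntegerType for NumBits, creating it on first use.
  const MiniIntegerType *getIntegerType(unsigned NumBits) {
    std::unique_ptr<MiniIntegerType> &Entry = IntegerTypes[NumBits];
    if (!Entry)
      Entry = std::make_unique<MiniIntegerType>(MiniIntegerType{NumBits});
    return Entry.get();
  }
};
```

Because equal widths always yield the same pointer, comparing two types for equality reduces to comparing pointers.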

Pointer Type

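A pointer type is typically used to reference an object at a given memory location.

A pointer type may specify the number of the address space in which the pointee resides; the default is 0.

The address space is likewise stored in SubclassData.

The singleton of the i32* type can be obtained as follows: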

llvm::Type *type = llvm::Type::getInt32PtrTy(TheContext, 0);

This method wraps PointerType::get:

PointerType *Type::getInt32PtrTy(LLVMContext &C, unsigned AS) {
  return getInt32Ty(C)->getPointerTo(AS);
}

PointerType *Type::getPointerTo(unsigned AddrSpace) const {
  return PointerType::get(const_cast<Type*>(this), AddrSpace);
}

LLVMContextImpl caches all pointer types in the following data structures:

DenseMap<Type *, PointerType *> PointerTypes; // Pointers in AddrSpace = 0
DenseMap<std::pair<Type *, unsigned>, PointerType *> ASPointerTypes;

Note that the pointer type here carries the type of its pointee.

The pointee type is stored in ContainedTys of the Type class:

/// Keeps track of how many Type*'s there are in the ContainedTys list.
unsigned NumContainedTys = 0;
/// A pointer to the array of Types contained by this Type. For example, this
/// includes the arguments of a function type, the elements of a structure,
/// the pointee of a pointer, the element type of an array, etc. This pointer
/// may be 0 for types that don't contain other types (Integer, Double,
/// Float).
Type * const *ContainedTys = nullptr;

The community's discussion of these explicit pointee types is summarized below.

Note that LLVM has no void*, as the following code shows:

bool PointerType::isValidElementType(Type *ElemTy) {
return !ElemTy->isVoidTy() && !ElemTy->isLabelTy() &&
!ElemTy->isMetadataTy() && !ElemTy->isTokenTy() &&

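The consensus the community eventually reached is that explicit pointee types cost more than they are worth and should therefore be removed.

LLVM thus introduced the opaque pointer type, a pointer type that carries no information about the pointee type.

For example, for the following LLVM IR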

load i64* %p

the corresponding opaque-pointer version is

load i64, ptr %p

At the level of the C API, the function that builds this instruction changed from LLVMBuildLoad to LLVMBuildLoad2.

Array Type

An array type has two attributes: the number of elements and the element type.


[40 x i32]: Array of 40 32-bit integer values.
[3 x [4 x i32]]: 3x4 array of 32-bit integer values.
[2 x [3 x [4 x i16]]]: 2x3x4 array of 16-bit integer values.

The singleton of the [40 x i32] type can be obtained as follows:

llvm::Type *type = llvm::ArrayType::get(llvm::Type::getInt32Ty(TheContext), 40);

As with pointer types, the element type of an array type is stored in ContainedTys of the Type class.

LLVMContextImpl caches all array types in the following data structure:

DenseMap<std::pair<Type *, uint64_t>, ArrayType *> ArrayTypes;

Vector Type

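A vector type resembles an array type but is used for SIMD. Vectors are not considered aggregate types; the LLVM Language Reference classifies them as single value types, a subset of the first class types: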

Values of these types are the only ones which can be produced by instructions.

A vector type has three attributes: the element count, the element type, and whether the vector is scalable.


<4 x i32>: Vector of 4 32-bit integer values.
<vscale x 4 x i32>: Vector with a multiple of 4 32-bit integer values.

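For a ScalableVectorType, vscale is unknown at compile time; it is a positive integer that the hardware fixes at run time.

The singleton of the <vscale x 4 x i32> type can be obtained as follows: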

llvm::Type *type = llvm::VectorType::get(llvm::Type::getInt32Ty(TheContext), 4, true);

LLVMContextImpl caches all vector types in the following data structure:

DenseMap<std::pair<Type *, ElementCount>, VectorType *> VectorTypes;

Note the ElementCount class here, which is constructed in

static VectorType *get(Type *ElementType, unsigned NumElements, bool Scalable) {
  return VectorType::get(ElementType, ElementCount::get(NumElements, Scalable));
}

This calls the following method of its parent class LinearPolySize:

static LeafTy get(ScalarTy MinVal, bool Scalable) {
  return static_cast<LeafTy>(LinearPolySize(MinVal, Scalable ? 1 : 0));
}


/// UnivariateLinearPolyBase is a base class for ElementCount and TypeSize.
/// Like LinearPolyBase it tries to represent a linear polynomial
/// where only one dimension can be set at any time, e.g.
/// 0 * scale0 + 0 * scale1 + ... + cJ * scaleJ + ... + 0 * scaleK
/// The dimension that is set is the univariate dimension.

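Roughly speaking, when the scalable property is true, the corresponding dimension is multiplied by a scale factor that may differ from one hardware environment to another.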
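A minimal sketch of this idea, assuming a simplified model rather than LLVM's actual class: `MiniElementCount` and `resolve` are hypothetical names, and `VScale` stands for the hardware-dependent multiplier.

```cpp
#include <cassert>

// Hypothetical, simplified analogue of llvm::ElementCount.
struct MiniElementCount {
  unsigned MinVal; // known minimum number of elements
  bool Scalable;   // true: the real count is MinVal * vscale

  // Number of elements once the multiplier is known.
  // For fixed vectors the multiplier is ignored.
  unsigned resolve(unsigned VScale) const {
    return Scalable ? MinVal * VScale : MinVal;
  }
};
```

With this model, `<4 x i32>` always resolves to 4 elements, whereas `<vscale x 4 x i32>` resolves to 16 elements on hardware where vscale is 4.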

In practice, for a given hardware environment, the vector types that LLVM generates are usually FixedVectorType.

For example, the following code uses AVX2 intrinsics to take the absolute value of a vector of 8 floats:

#include <immintrin.h>
__m256 _mm256_abs_ps(__m256 vec) {
__m256 float_zero = _mm256_set1_ps(0);
__m256 mask_lt_zero = _mm256_cmp_ps(vec, float_zero, _CMP_LT_OQ);
__m256 vec_neg = _mm256_sub_ps(float_zero, vec);
return _mm256_blendv_ps(vec, vec_neg, mask_lt_zero);
}

The IR generated by clang -S -emit-llvm a.cpp -O3 -march=native is as follows:

define dso_local noundef <8 x float> @_Z13_mm256_abs_psDv8_f(<8 x float> noundef %0) local_unnamed_addr #0 {
%2 = fcmp olt <8 x float> %0, zeroinitializer
%3 = fsub <8 x float> zeroinitializer, %0
%4 = select <8 x i1> %2, <8 x float> %3, <8 x float> %0
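ret <8 x float> %4
}

Note that %0, %3, and %4 all have type <8 x float>, while the comparison result %2 has type <8 x i1>. Instructions produce and consume these vectors directly, which also shows that vector types are first class types.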

Structure Type

There are two kinds of structure type.

Literal: anonymous, uniqued within the context, and must have a body.

LLVMContextImpl caches all literal struct types in the following data structure:

using StructTypeSet = DenseSet<StructType *, AnonStructTypeKeyInfo>;
StructTypeSet AnonStructTypes;

The key type of AnonStructTypeKeyInfo contains the following members:

ArrayRef<Type *> ETypes;
bool isPacked;

The singleton of the { i32, i32, i32 } type can be obtained as follows:

llvm::Type *i32 = llvm::Type::getInt32Ty(TheContext);
std::array<llvm::Type *, 3> elems = {i32, i32, i32};
llvm::Type *type = llvm::StructType::get(TheContext, elems, false);

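LLVM gives the ArrayRef class many converting constructors, so an ArrayRef can be built from a pointer, a vector, a std::array, a C array, and several other types.

Identified: may be unnamed, is not uniqued, and may lack a body (opaque).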

Prior to the LLVM 3.0 release, identified types were structurally uniqued. Only literal types are uniqued in recent versions of LLVM.

LLVMContextImpl caches all identified struct types in the following data structure:

StringMap<StructType *> NamedStructTypes;
unsigned NamedStructTypesUniqueID = 0;

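A type such as %A = type { i32, i32, i32 } can be constructed as follows; passing "struct.A" as the name instead yields %struct.A, the spelling Clang uses: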

llvm::Type *i32 = llvm::Type::getInt32Ty(TheContext);
std::array<llvm::Type *, 3> elems = {i32, i32, i32};
llvm::Type *type = llvm::StructType::create(TheContext, elems, "A", false);

In fact, structure types define the following flags, which are stored in SubclassData:

enum {
  /// This is the contents of the SubClassData field.
  SCDB_HasBody = 1,
  SCDB_Packed = 2,
  SCDB_IsLiteral = 4,
  SCDB_IsSized = 8
};


struct A;
struct B {
  A* a;
};

The generated LLVM IR might be

%struct.B = type { %struct.A* }
%struct.A = type opaque

Here struct A has no body; it is an opaque structure type.

This shows that opaque structure types were introduced to support forward declarations.

For %struct.A, the bits for SCDB_HasBody and SCDB_IsSized are 0.
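How such flags pack into one integer can be sketched as follows. The enum values repeat the excerpt above; `hasFlag`, `OpaqueStruct`, and `PackedLiteral` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>

// Same flag values as the enum quoted above.
enum : unsigned {
  SCDB_HasBody = 1,
  SCDB_Packed = 2,
  SCDB_IsLiteral = 4,
  SCDB_IsSized = 8
};

// Tests whether one flag bit is set in a SubclassData-like integer.
inline bool hasFlag(unsigned SubclassData, unsigned Flag) {
  return (SubclassData & Flag) != 0;
}

// An opaque identified struct such as %struct.A: no flag is set.
constexpr unsigned OpaqueStruct = 0;
// A packed literal struct with a body, such as <{ i32, i16, i8 }>.
constexpr unsigned PackedLiteral = SCDB_HasBody | SCDB_Packed | SCDB_IsLiteral;
```

Each property occupies its own bit, so the four flags can be set and queried independently.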

For the implementation of isSized, see StructType::isSized.

struct __attribute__((packed)) A {
  int i;
  short s;
  char c;
};

The generated LLVM IR might be

%struct.A = type <{ i32, i16, i8 }>

Note the extra <> around the body.

For %struct.A, the bit for SCDB_Packed is 1.

struct A {
struct {
int i;
int j;
int k;
} x;
struct {
int i;
int j;
int k;
} y;
};

The generated LLVM IR might be

%struct.A = type { %struct.anon, %struct.anon.0 }
%struct.anon = type { i32, i32, i32 }
%struct.anon.0 = type { i32, i32, i32 }

Note that the anonymous structs here are still identified struct types; LLVM automatically handles both missing and duplicate names.

Function Type


As with literal struct types, LLVMContextImpl caches all function types in the following data structure:

using FunctionTypeSet = DenseSet<FunctionType *, FunctionTypeKeyInfo>;
FunctionTypeSet FunctionTypes;

The key type of FunctionTypeKeyInfo contains the following members:

const Type *ReturnType;
ArrayRef<Type *> Params;
bool isVarArg;

The singleton of the i32 (i32) type can be obtained as follows:

llvm::Type *i32 = llvm::Type::getInt32Ty(TheContext);
std::array<llvm::Type *, 1> args = {i32};
llvm::Type *type = llvm::FunctionType::get(i32, args, false);


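  • isVarArg is stored in SubclassData
  • ReturnType and Params are stored in ContainedTys

No llvm::LLVMContext argument is passed explicitly here; the context actually used is the one that the return type belongs to.

Finally, the isVarArg field indicates whether the function takes a variable number of arguments.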


#include <stdio.h>
int main() { printf("hello world\n"); }

The generated LLVM IR might be

@.str = private unnamed_addr constant [13 x i8] c"hello world\0A\00", align 1
define dso_local i32 @main() #0 {
%1 = call i32 (i8*, ...) @printf(i8* noundef getelementptr inbounds ([13 x i8], [13 x i8]* @.str, i64 0, i64 0))
ret i32 0
}

Note the function signature i32 (i8*, ...) here.


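Value

Value is one of the most important classes in LLVM and the base class of many core classes.

Part of the inheritance hierarchy of the Value class is shown below.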

flowchart LR
  Argument --> Value
  BasicBlock --> Value
  User --> Value
  Constant --> User
  Instruction --> User
  Operator --> User

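Every Value object holds a pointer to its Type and a use list that records the users of the value:

class Value {
  Type *VTy;
  Use *UseList;
  // ...
};

The Value class implements the iterator pattern for its users; the users of a value can be visited through the following interface: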

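llvm::Value *value = ...
for (auto it = value->use_begin(); it != value->use_end(); ++it) {
  // it->get() would return the used value itself; getUser() returns the user
  llvm::User *user = it->getUser();
}

When transforming LLVM IR, one value may be replaced by another. For example, if the result of an instruction is always a constant, the instruction can be replaced by that constant, and every user that references the value has to be updated as well.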


/// Change all uses of this to point to a new Value.
/// Go through the uses list for this definition and make each use point to
/// "V" instead of "this". After this completes, 'this's use list is
/// guaranteed to be empty.
void replaceAllUsesWith(Value *V);

Its implementation makes use of ValueHandleBase.

A value handle can be viewed as a smart pointer to a value; it triggers a specific action when the value is deleted or replaced through replaceAllUsesWith (RAUW).
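The core of RAUW, leaving value handles aside, can be sketched with a toy def-use graph. This is an assumption-laden simplification: `MiniValue`, `MiniUse`, and `setOperand` are hypothetical names, and a `std::vector` replaces the intrusive linked list that LLVM uses for its use list.

```cpp
#include <cassert>
#include <vector>

struct MiniValue;

// One edge of the def-use graph: "User reads Val".
struct MiniUse {
  MiniValue *Val = nullptr;
  MiniValue *User = nullptr;
};

struct MiniValue {
  std::vector<MiniUse *> UseList; // every MiniUse whose Val is this value

  // Re-points every use of this value at New; afterwards UseList is empty.
  void replaceAllUsesWith(MiniValue *New) {
    if (New == this)
      return; // replacing a value with itself is a no-op in this sketch
    for (MiniUse *U : UseList) {
      U->Val = New;
      New->UseList.push_back(U);
    }
    UseList.clear();
  }
};

// Records that User reads Val through the edge U.
inline void setOperand(MiniUse &U, MiniValue &User, MiniValue &Val) {
  U.User = &User;
  U.Val = &Val;
  Val.UseList.push_back(&U);
}
```

After the call, the users are unchanged but each of their operands points at the new value, which matches the guarantee in the comment quoted above that the old use list ends up empty.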

The ValueHandleBase class has three subclasses.

A Value object may have a name; the HasName field of Value records whether it does.

LLVMContextImpl stores all value names in the following data structure:

DenseMap<const Value *, ValueName *> ValueNames;


using ValueName = StringMapEntry<Value *>;

User & Use

The User class inherits from Value, because a user is itself a value that other users may use.




llvm::Instruction *ins = ...
for (auto it = ins->op_begin(); it != ins->op_end(); ++it) {
  llvm::Value *value = it->get();
}

The core of the Use class is therefore how to link a value and its user in both directions efficiently.



Constant

The Constant class inherits from User.

Constant is the base class of all constants; a constant is a value that does not change at run time.


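Structurally equivalent constants share a single object instance within an LLVMContext (a singleton).

Part of the inheritance hierarchy of the Constant class is shown below.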

flowchart LR
  BlockAddress --> Constant
  ConstantAggregate --> Constant
  ConstantArray --> ConstantAggregate
  ConstantStruct --> ConstantAggregate
  ConstantVector --> ConstantAggregate
  ConstantData --> Constant
  ConstantFP --> ConstantData
  ConstantInt --> ConstantData
  ConstantAggregateZero --> ConstantData
  ConstantPointerNull --> ConstantData
  ConstantDataSequential --> ConstantData
  ConstantDataArray --> ConstantDataSequential
  ConstantDataVector --> ConstantDataSequential
  ConstantExpr --> Constant
  GlobalValue --> Constant
  GlobalObject --> GlobalValue
  Function --> GlobalObject
  GlobalVariable --> GlobalObject




ConstantInt

LLVMContextImpl caches all integer constants in the following data structure:

using IntMapTy = DenseMap<APInt, std::unique_ptr<ConstantInt>, DenseMapAPIntKeyInfo>;
IntMapTy IntConstants;

The singleton of the constant i32 100 can be obtained as follows:

llvm::Value *value = llvm::ConstantInt::get(TheContext, llvm::APInt(32, 100, false /* isSigned */));

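The isSigned argument tells APInt how to treat the supplied 64-bit value: when the bit width exceeds 64, it selects between sign extension and zero extension.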

An analogous transition that happened earlier in LLVM is integer signedness. Currently there is no distinction between signed and unsigned integer types, but rather each integer operation (e.g. add) contains flags to signal how to treat the integer. Previously LLVM IR distinguished between unsigned and signed integer types and ran into similar issues of no-op casts. The transition from manifesting signedness in types to instructions happened early on in LLVM’s timeline to make LLVM easier to work with.

Note the helper class APInt here, which stores its raw data in either a uint64_t or a uint64_t *:

union {
uint64_t VAL; ///< Used to store the <= 64 bits integer value.
uint64_t *pVal; ///< Used to store the >64 bits integer value.
} U;
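The inline-versus-heap choice in this union can be sketched as follows. `MiniAPInt` is a hypothetical, heavily reduced class written for illustration: it zero-extends only, does not mask the value to the bit width, and forbids copying to keep ownership trivial.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified analogue of llvm::APInt storage.
class MiniAPInt {
  union {
    std::uint64_t VAL;   // used when BitWidth <= 64
    std::uint64_t *pVal; // used when BitWidth > 64 (heap array of words)
  } U;
  unsigned BitWidth;

  bool isSingleWord() const { return BitWidth <= 64; }

public:
  MiniAPInt(unsigned NumBits, std::uint64_t Val) : BitWidth(NumBits) {
    if (isSingleWord()) {
      U.VAL = Val; // small values live directly inside the object
    } else {
      U.pVal = new std::uint64_t[getNumWords()](); // zero-initialized words
      U.pVal[0] = Val;                             // zero extension only
    }
  }
  MiniAPInt(const MiniAPInt &) = delete;
  MiniAPInt &operator=(const MiniAPInt &) = delete;
  ~MiniAPInt() {
    if (!isSingleWord())
      delete[] U.pVal;
  }

  // Number of 64-bit words needed to hold BitWidth bits.
  unsigned getNumWords() const { return (BitWidth + 63) / 64; }

  // Returns the I-th 64-bit word, least significant first.
  std::uint64_t getWord(unsigned I) const {
    return isSingleWord() ? U.VAL : U.pVal[I];
  }
};
```

Values of up to 64 bits therefore cost no heap allocation, which is the common case for types such as i32 and i64.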

In addition, LLVMContextImpl keeps separate singletons for the boolean constants of type i1:

ConstantInt *TheTrueVal = nullptr;
ConstantInt *TheFalseVal = nullptr;


llvm::Value *value = llvm::ConstantInt::getTrue(TheContext);



ConstantFP

As with ConstantInt, LLVMContextImpl caches all floating-point constants in the following data structure:

using FPMapTy = DenseMap<APFloat, std::unique_ptr<ConstantFP>, DenseMapAPFloatKeyInfo>;
FPMapTy FPConstants;

The singleton of the constant float 1.1 can be obtained as follows:

llvm::Value *value = llvm::ConstantFP::get(TheContext, llvm::APFloat(static_cast<float>(1.1)));

Floating-point values here follow the IEEE standard, and the implementation is encapsulated in APFloat and related classes. For example:

float foo() { return 1.1; }

The generated LLVM IR is

define dso_local noundef float @_Z3foov() #0 {
  ret float 0x3FF19999A0000000
}
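The 16-digit constant is the bit pattern of the float after widening to double, which is how LLVM IR spells float constants in hexadecimal. The sketch below reproduces that spelling under the assumption that the host uses IEEE 754 binary32 and binary64; `floatAsIRHex` is a hypothetical helper, and unlike LLVM it always prints hex, even for values such as 1.0 that LLVM prints in decimal.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Widens a float to double and prints the 64 raw bits as 16 hex digits.
inline std::string floatAsIRHex(float F) {
  double D = static_cast<double>(F); // exact: every float is a double
  std::uint64_t Bits;
  static_assert(sizeof(Bits) == sizeof(D), "double must be 64 bits");
  std::memcpy(&Bits, &D, sizeof(Bits)); // reinterpret without aliasing issues
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "0x%016llX",
                static_cast<unsigned long long>(Bits));
  return Buf;
}
```

The trailing zeros in `0x3FF19999A0000000` are the 29 mantissa bits that a double has but a float does not.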





ConstantAggregateZero

const int arr[42] = {0};

The generated LLVM IR is

@_ZL3arr = internal constant [42 x i32] zeroinitializer, align 16

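The zeroinitializer here is a ConstantAggregateZero of type [42 x i32]; ConstantAggregateZero::get expects an aggregate or vector type, so the array type is passed rather than i32:

llvm::Value *value = llvm::ConstantAggregateZero::get(llvm::ArrayType::get(llvm::Type::getInt32Ty(TheContext), 42));

LLVMContextImpl caches all constant aggregate zeros in the following data structure: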

DenseMap<Type *, std::unique_ptr<ConstantAggregateZero>> CAZConstants;




ConstantPointerNull

void *foo() { return nullptr; }

The generated LLVM IR is

define dso_local noundef i8* @_Z3foov() #0 {
  ret i8* null
}

The null here is a ConstantPointerNull of type i8*:

llvm::Value *value = llvm::ConstantPointerNull::get(llvm::Type::getInt8PtrTy(TheContext));

LLVMContextImpl caches all constant pointer nulls in the following data structure:

DenseMap<PointerType *, std::unique_ptr<ConstantPointerNull>> CPNConstants;



ConstantDataArray

The element type is restricted to simple 1/2/4/8-byte integers or float/double.


const int arr[] = { 0, 1, 2 };

The generated LLVM IR is

@_ZL3arr = internal constant [3 x i32] [i32 0, i32 1, i32 2], align 4


std::array<int, 3> elems = {0, 1, 2};
llvm::Value *value = llvm::ConstantDataArray::get(TheContext, elems);



ConstantDataVector

The element type is restricted to simple 1/2/4/8-byte integers or float/double.


#include <immintrin.h>
__m256 foo() { return _mm256_set1_ps(1); }

The IR generated by clang -S -emit-llvm a.cpp -O3 -march=native is as follows:

define dso_local noundef <8 x float> @_Z3foov() local_unnamed_addr #0 {
  ret <8 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
}


std::array<float, 8> elems = {1, 1, 1, 1, 1, 1, 1, 1};
llvm::Value *value = llvm::ConstantDataVector::get(TheContext, elems);

LLVMContextImpl caches all constant data arrays and constant data vectors in the following data structure:

StringMap<std::unique_ptr<ConstantDataSequential>> CDSConstants;

Note that ConstantDataSequential is the parent class of ConstantDataArray and ConstantDataVector.

Also, the key of this mapping is a string. Taking the call above as an example:

Constant *ConstantDataVector::get(LLVMContext &Context, ArrayRef<float> Elts) {
  auto *Ty = FixedVectorType::get(Type::getFloatTy(Context), Elts.size());
  const char *Data = reinterpret_cast<const char *>(Elts.data());
  return getImpl(StringRef(Data, Elts.size() * 4), Ty);
}


Constant *ConstantDataSequential::getImpl(StringRef Elements, Type *Ty) {
  // If the elements are all zero or there are no elements, return a CAZ, which
  // is more dense and canonical.
  if (isAllZeros(Elements))
    return ConstantAggregateZero::get(Ty);
  // ...
}

When all elements are zero, a ConstantDataSequential degenerates into a ConstantAggregateZero.
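Keying the cache by raw bytes and collapsing all-zero data can be sketched as below. This is a simplified model, not LLVM's implementation: `MiniPool` and `MiniConstant` are hypothetical names, `std::map` stands in for `StringMap`, and the sketch ignores that identical bytes may belong to different types.

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a constant: either the shared zero or raw bytes.
struct MiniConstant {
  bool IsAggregateZero;
  std::string Bytes; // raw element bytes; empty for the zero constant
};

class MiniPool {
  MiniConstant Zero{true, ""};
  std::map<std::string, MiniConstant> Sequential; // key: raw element bytes

public:
  // Returns the unique constant for Elts, creating it on first use.
  const MiniConstant *get(const std::vector<float> &Elts) {
    std::string Key(Elts.size() * sizeof(float), '\0');
    if (!Elts.empty())
      std::memcpy(&Key[0], Elts.data(), Key.size());
    // All-zero (or empty) data collapses to the shared zero constant.
    if (Key.find_first_not_of('\0') == std::string::npos)
      return &Zero;
    auto It = Sequential.find(Key);
    if (It == Sequential.end())
      It = Sequential.emplace(Key, MiniConstant{false, Key}).first;
    return &It->second;
  }
};
```

Two requests with the same element bytes return the same pointer, while any all-zero request returns the single zero constant regardless of its length.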





ConstantStruct

struct A {
  int i;
  int j;
};
const A a = {1, 1};

The generated LLVM IR is

%struct.A = type { i32, i32 }
@_ZL1a = internal constant %struct.A { i32 1, i32 1 }, align 4


llvm::Type *i32 = llvm::Type::getInt32Ty(TheContext);
llvm::StructType *type = llvm::StructType::create(TheContext, {i32, i32}, "A", false);
llvm::Constant *one = llvm::ConstantInt::get(TheContext, llvm::APInt(32, 1, false /* isSigned */));
std::array<llvm::Constant *, 2> consts = {one, one};
llvm::Value *value = llvm::ConstantStruct::get(type, consts);

LLVMContextImpl caches all constant structs in the following data structure:

using StructConstantsTy = ConstantUniqueMap<ConstantStruct>;
StructConstantsTy StructConstants;



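ConstantArray

When the element type is a simple 1/2/4/8-byte integer or float/double, ConstantArray::get returns a ConstantDataArray instead; a ConstantArray is kept for other element types such as structs.


struct A {
  int i;
  int j;
};
const A a[] = {{1, 1},{1, 1}};

The generated LLVM IR is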

%struct.A = type { i32, i32 }
@_ZL1a = internal constant [2 x %struct.A] [%struct.A { i32 1, i32 1 }, %struct.A { i32 1, i32 1 }], align 16

LLVMContextImpl caches all constant arrays in the following data structure:

using ArrayConstantsTy = ConstantUniqueMap<ConstantArray>;
ArrayConstantsTy ArrayConstants;


template <class ConstantClass> class ConstantUniqueMap {
  using ValType = typename ConstantInfo<ConstantClass>::ValType;
  using TypeClass = typename ConstantInfo<ConstantClass>::TypeClass;
  using LookupKey = std::pair<TypeClass *, ValType>;
  // ...
};

template <> struct ConstantInfo<ConstantArray> {
  using ValType = ConstantAggrKeyType<ConstantArray>;
  using TypeClass = ArrayType;
};

template <class ConstantClass> struct ConstantAggrKeyType {
  ArrayRef<Constant *> Operands;
  // ...
};

It follows that the keys of the cached mapping have the following form:

{ArrayType *, ArrayRef<Constant *>}



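GlobalValue

To reiterate, functions and global variables are constants in the sense that their addresses never change, as if a top-level const pointer pointed to each of them.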




GlobalVariable

int a{1};

The generated LLVM IR is

@a = dso_local global i32 1, align 4

The meaning of dso_local is as follows:

The compiler may assume that a function or variable marked as dso_local will resolve to a symbol within the same linkage unit. Direct access will be generated even if the definition is not within this compilation unit.


static int a{1};

The generated LLVM IR is

@_ZL1a = internal global i32 1, align 4

The meaning of internal is as follows:

Similar to private, but the value shows as a local symbol (STB_LOCAL in the case of ELF) in the object file. This corresponds to the notion of the ‘static’ keyword in C.

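Note the name mangling here: for a value with internal linkage, the symbol name in the IR is the same as the one in the object file.

Compare this with the internal constant examples earlier.

The object file here is in ELF format:

13: 0000000000004010 4 OBJECT LOCAL DEFAULT 22 _ZL1a

The IR above could perhaps be produced by the following code: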

auto *value = new llvm::GlobalVariable(llvm::Type::getInt32Ty(TheContext), false /* isConstant */, llvm::GlobalValue::LinkageTypes::InternalLinkage);
value->setInitializer(llvm::ConstantInt::get(TheContext, llvm::APInt(32, 1, false /* isSigned */)));

The complete LLVM IR syntax of a global variable is as follows:

@<GlobalVarName> = [Linkage] [PreemptionSpecifier] [Visibility]
[DLLStorageClass] [ThreadLocal]
[(unnamed_addr|local_unnamed_addr)] [AddrSpace]
<global | constant> <Type> [<InitializerConstant>]
[, section "name"] [, partition "name"]
[, comdat [($name)]] [, align <Alignment>]
[, no_sanitize_address] [, no_sanitize_hwaddress]
[, sanitize_address_dyninit] [, sanitize_memtag]
(, !name !N)*


At the source level, all global variables are stored in the current Module:

GlobalListType GlobalList; ///< The Global Variables in the module


/// The type for the list of global variables.
using GlobalListType = SymbolTableList<GlobalVariable>;

The following code visits every global variable of the current module:

for (auto it = TheModule->global_begin(); it != TheModule->global_end(); ++it) {
  llvm::GlobalVariable &value = *it;
}

This works because the GlobalVariable class also inherits from ilist_node<GlobalVariable>:

class GlobalVariable : public GlobalObject, public ilist_node<GlobalVariable>

so that, starting from the current node (a GlobalVariable), the other nodes of the list (also GlobalVariables) can be traversed.




Function

int foo(int) { return {}; }

The LLVM IR generated with clang -S -emit-llvm a.cpp -O3 is

; Function Attrs: mustprogress nofree norecurse nosync nounwind readnone sspstrong uwtable willreturn
define dso_local noundef i32 @_Z3fooi(i32 noundef %0) local_unnamed_addr #0 {
  ret i32 0
}

The complete LLVM IR syntax of a function definition is as follows:

define [linkage] [PreemptionSpecifier] [visibility] [DLLStorageClass]
[cconv] [ret attrs]
<ResultType> @<FunctionName> ([argument list])
[(unnamed_addr|local_unnamed_addr)] [AddrSpace] [fn Attrs]
[section "name"] [partition "name"] [comdat [($name)]] [align N]
[gc] [prefix Constant] [prologue Constant] [personality Constant]
(!name !N)* { ... }

The IR above could perhaps be produced by the following code:

llvm::Type *i32 = llvm::Type::getInt32Ty(TheContext);
std::array<llvm::Type *, 1> args = {i32};
llvm::FunctionType *type = llvm::FunctionType::get(i32, args, false);
llvm::Value *func = llvm::Function::Create(type, llvm::GlobalValue::LinkageTypes::ExternalLinkage, 0 /* AddrSpace */);

For a function declaration, such as printf

extern int printf (const char *__restrict __format, ...);

the corresponding LLVM IR is

declare noundef i32 @_Z6printfPKcz(i8* noundef, ...) #1

The complete LLVM IR syntax of a function declaration is as follows:

declare [linkage] [visibility] [DLLStorageClass]
[cconv] [ret attrs]
<ResultType> @<FunctionName> ([argument list])
[(unnamed_addr|local_unnamed_addr)] [align N] [gc]
[prefix Constant] [prologue Constant]

At the source level, all functions are likewise stored in the current Module:

FunctionListType FunctionList; ///< The Functions in the module

The Function class contains several important members:

using BasicBlockListType = SymbolTableList<BasicBlock>;
// Important things that make up a function!
BasicBlockListType BasicBlocks; ///< The basic blocks
mutable Argument *Arguments = nullptr; ///< The formal arguments
size_t NumArgs;
std::unique_ptr<ValueSymbolTable> SymTab; ///< Symbol table of args/instructions
AttributeList AttributeSets; ///< Parameter attributes

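Here we focus on the Argument class, that is, a formal parameter of a function, which records the following information.

The Function class provides iterator interfaces for traversing its arguments and basic blocks.

BlockAddress

Uniquely identifies the address of a (Function, BasicBlock) pair.

Since BasicBlock has not been introduced, it is skipped here.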




ConstantExpr

Constant *ConstantExpr::get(unsigned Opcode, Constant *C1, Constant *C2, unsigned Flags, Type *OnlyIfReducedTy)


if (Constant *FC = ConstantFoldBinaryInstruction(Opcode, C1, C2))
return FC;


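It makes heavy use of templates such as isa<> to check whether a value is undef or poison.

Here is a brief introduction to Undefined Values and Poison Values.



These two kinds of value exist because LLVM IR has the notion of undefined behavior, for example the familiar signed integer overflow: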

bool foo(int a) { return a + 1 > a; }

The corresponding LLVM IR contains

%4 = add nsw i32 %3, 1

Note the nsw flag here, which stands for No Signed Wrap. When %3 equals INT_MAX, INT_MAX + 1 causes a signed integer overflow, and %4 is then a poison value.
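The condition under which `add nsw i32` yields poison can be modeled in plain C++. This is a sketch of the rule only, not of how LLVM represents poison: `addNSW` is a hypothetical helper that reports the poison case through its return value.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Models `add nsw i32`: writes the sum to Out and returns true, or returns
// false where the IR result would be poison because the signed add wraps.
inline bool addNSW(std::int32_t A, std::int32_t B, std::int32_t &Out) {
  std::int64_t Wide = static_cast<std::int64_t>(A) + B; // cannot overflow
  if (Wide > std::numeric_limits<std::int32_t>::max() ||
      Wide < std::numeric_limits<std::int32_t>::min())
    return false;
  Out = static_cast<std::int32_t>(Wide);
  return true;
}
```

For `a + 1` with `a == INT_MAX` the helper reports the poison case, which is exactly the input on which the C++ source has undefined behavior.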

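In earlier LLVM implementations, %4 would have been an undefined value in this situation.

Operating on an undefined value produces an undefined value rather than undefined behavior, which in some cases enables optimizations. For example, the compiler considers only the lowest bit of undef & 1 to be undefined, so ((undef & 1) >> 1) is treated as 0.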
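Why that fold is sound can be checked by brute force: whatever bit pattern undef happens to take, the expression evaluates to 0. The sketch below does this for every 8-bit pattern; `lowBitShiftIsAlwaysZero` is a hypothetical helper, and the 8-bit width is an arbitrary choice for illustration.

```cpp
#include <cassert>

// Returns true if ((X & 1) >> 1) == 0 for every possible 8-bit value X,
// i.e. for every concrete value an 8-bit undef could take.
inline bool lowBitShiftIsAlwaysZero() {
  for (unsigned X = 0; X <= 0xFF; ++X)
    if (((X & 1u) >> 1) != 0)
      return false;
  return true;
}
```

Masking leaves only bit 0 possibly set, and the shift discards that bit, so no choice of X can make the result nonzero.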

A ‘poison’ value should be used instead of ‘undef’ whenever possible. Poison values are stronger than undef, and enable more optimizations. Just the existence of ‘undef’ blocks certain optimizations.

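In 2016 the LLVM community proposed dropping undef and using only poison, but for now undef and poison still coexist.

Constant folding also occurs when instructions are built with IRBuilder, for example: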

llvm::Constant *one = llvm::ConstantInt::get(TheContext, llvm::APInt(32, 1, false /* isSigned */));
llvm::Value *value = Builder.CreateAdd(one, one);


Value *CreateAdd(Value *LHS, Value *RHS, const Twine &Name = "", bool HasNUW = false, bool HasNSW = false) {
  if (Value *V = Folder.FoldNoWrapBinOp(Instruction::Add, LHS, RHS, HasNUW, HasNSW))
    return V;
  return CreateInsertNUWNSWBinOp(Instruction::Add, LHS, RHS, Name, HasNUW, HasNSW);
}

Value *FoldNoWrapBinOp(Instruction::BinaryOps Opc, Value *LHS, Value *RHS, bool HasNUW, bool HasNSW) const override {
  auto *LC = dyn_cast<Constant>(LHS);
  auto *RC = dyn_cast<Constant>(RHS);
  if (LC && RC) {
    if (ConstantExpr::isDesirableBinOp(Opc)) {
      unsigned Flags = 0;
      if (HasNUW)
        Flags |= OverflowingBinaryOperator::NoUnsignedWrap;
      if (HasNSW)
        Flags |= OverflowingBinaryOperator::NoSignedWrap;
      return ConstantExpr::get(Opc, LC, RC, Flags);
    }
    return ConstantFoldBinaryInstruction(Opc, LC, RC);
  }
  return nullptr;
}

If the operands satisfy certain conditions, ConstantExpr::get is called to obtain the corresponding constant expression, which makes constant folding possible.
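The fold-before-emit pattern can be sketched with a toy builder. Everything here is a hypothetical simplification: `MiniNode` and `MiniBuilder` are illustrative names, only `add` on 32-bit wrapping integers is modeled, and constants are not uniqued.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical miniature IR: a node is either a constant or an `add`.
struct MiniNode {
  bool IsConstant;
  std::uint32_t Value;       // meaningful only when IsConstant
  const MiniNode *LHS, *RHS; // meaningful only for `add`
};

class MiniBuilder {
  std::vector<std::unique_ptr<MiniNode>> Nodes; // owns every created node

  const MiniNode *make(MiniNode N) {
    Nodes.push_back(std::make_unique<MiniNode>(N));
    return Nodes.back().get();
  }

public:
  const MiniNode *getConstant(std::uint32_t V) {
    return make({true, V, nullptr, nullptr});
  }

  // Folds when both operands are constants; otherwise emits an `add` node.
  const MiniNode *createAdd(const MiniNode *L, const MiniNode *R) {
    if (L->IsConstant && R->IsConstant)
      return getConstant(L->Value + R->Value); // unsigned add wraps
    return make({false, 0, L, R});
  }
};
```

As in the `CreateAdd(one, one)` call above, adding two constants never creates an instruction; the caller simply receives the folded constant.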

LLVMContextImpl caches all constant exprs in the following data structure:

ConstantUniqueMap<ConstantExpr> ExprConstants;