Customized LLVM supports SEH I: front end processing principle

This year I was assigned the task of getting some C++ modules that run on Windows to compile with clang+LLVM. Plenty of software is built with LLVM; much of the software for macOS and from Google is compiled with clang+LLVM. In theory it should not be too difficult, so I took it on. Then I found I had been dropped into a huge pit. I will skip the details of the journey from pain, to doubting life, to insight; along the way the pace of fixing bugs went from one per week to at least one per day.

The first big problem was that stock LLVM only partially supports Windows SEH. Strangely, its code already covers roughly 80% of SEH but leaves the rest unsupported. SEH is a very important feature in Windows software development and has to be supported. We could have filed a bug with the LLVM community and asked them to fix it (perhaps they would support it in two years?), so the only option was to fix it ourselves. SEH is also complex in practice: it interacts with different compilation options and can be written in many ways, such as nesting and jumps. We have fixed more than 10 distinct bugs (others presumably remain untested), which took nearly two months.

A simple SEH exception code:

void TestSEH()
{
	__try
	{
		void* test = NULL;
		int temp = *(int*)test;
	}
	__except (1)
	{
		printf("catch exception\n");
	}
}

Compiled with stock LLVM, this code cannot catch the exception.

With the following code, however, stock LLVM can catch the exception:

void test1()
{
	void* test = NULL;
	int temp = *(int*)test;
}
void TestSEH()
{
	__try
	{
		test1();
	}
	__except (1)
	{
		printf("catch exception\n");
	}
}

With stock LLVM, the faulting code has to be in a function called from the __try block for the exception to be caught.

1. SEH principle

Exception handling is carried out jointly by the compiler and the runtime. The runtime side has been studied extensively and is not repeated here; see the online material below.

32-bit uses stack-based exception handling:

https://www.cnblogs.com/yilang/p/11233935.html

64-bit uses exception-table-based handling:

https://cloud.tencent.com/developer/article/1471316

This article mainly covers the compiler side of exception handling.

Introduction to the compiler pipeline:

LLVM uses the three-stage design shown in the figure below. The main parts are the front end and the back end, which are linked by IR.

The front end takes source code as input and produces IR as output. For the code at the beginning of this article, the following IR needs to be generated:

define dso_local void @"?TestFinally@@YAXXZ"() #0 personality i8* bitcast (i32 (...)* @__C_specific_handler to i8*) {
  %1 = alloca i32, align 4
  %2 = alloca i8*, align 8
  %3 = alloca i32, align 4
  ; This is the key to exception handling: mark the start of the try block
  invoke void @llvm.seh.try.begin()
          to label %4 unwind label %8

4:                                                ; preds = %0
  ; Code in the try block
  store volatile i8* null, i8** %2, align 8
  %5 = load volatile i8*, i8** %2, align 8
  %6 = bitcast i8* %5 to i32*
  %7 = load volatile i32, i32* %6, align 4
  store volatile i32 %7, i32* %3, align 4
  ; This is the key to exception handling: mark the end of the try block. Without an exception, control goes to %16; with an exception, it goes to %8
  invoke void @llvm.seh.try.end()
          to label %16 unwind label %8

  ; Jump-related IR
8:                                                ; preds = %4, %0
  %9 = catchswitch within none [label %10] unwind to caller

10:                                               ; preds = %8
  %11 = catchpad within %9 [i8* null]
  catchret from %11 to label %12

  ; IR of the except block
12:                                               ; preds = %10
  %13 = call i32 @llvm.eh.exceptioncode(token %11)
  store i32 %13, i32* %1, align 4
  %14 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([17 x i8], [17 x i8]* @"??_C@_0BB@LLPJFGLF@catch?5exception?6?$AA@", i64 0, i64 0))
  br label %15

15:                                               ; preds = %12, %16
  ret void

16:                                               ; preds = %4
  br label %15
}

For SEH at the IR level, the most important thing is to mark the start and end of the try block, as well as the positions of the except and finally blocks. The back end reads llvm.seh.try.begin() and llvm.seh.try.end() and generates the corresponding exception-handling information.
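To make the role of these markers more concrete, here is a minimal, portable sketch of a table-based lookup of the kind the 64-bit runtime performs. It is a toy model under stated assumptions, not the real Windows or LLVM data layout: `ScopeEntry` and `findHandler` are hypothetical names, addresses are plain integers, and inner scopes are assumed to be listed before the scopes that enclose them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one row of a per-function scope table.
struct ScopeEntry {
  std::uint32_t begin;   // first address covered by the __try body (inclusive)
  std::uint32_t end;     // first address past the __try body (exclusive)
  std::uint32_t handler; // address of the __except block to resume at
};

// Return the handler of the first scope that contains faultAddr, or 0 if none.
// Assumption: inner scopes appear before the scopes that enclose them.
std::uint32_t findHandler(const std::vector<ScopeEntry> &table,
                          std::uint32_t faultAddr) {
  for (const ScopeEntry &e : table) {
    assert(e.begin < e.end && "a scope must cover at least one address");
    if (faultAddr >= e.begin && faultAddr < e.end)
      return e.handler;
  }
  return 0; // not inside any recorded __try range: no handler in this function
}
```

In this model a faulting instruction is only caught if its address falls inside a recorded range. That is one way to picture why the begin and end markers matter: they let the back end record a range that covers the whole try body rather than only individual call sites.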

The 32-bit and 64-bit front-end flows are the same, except for the name of the exception handler function: on 32-bit it is _except_handler3 or _except_handler4, and on 64-bit it is __C_specific_handler.

2. Lexical analysis

Lexical analysis scans the source code word by word and produces a sequence of tokens, one token at a time.
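As a rough illustration of the idea (this is not Clang's actual lexer), the sketch below splits a source string into tokens and tags the SEH keywords. The names `TokKind`, `Token`, `classifyWord`, and `lexSEH` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch only.
enum class TokKind { KwTry, KwExcept, KwFinally, KwLeave, LBrace, RBrace, Other };

struct Token {
  TokKind kind;
  std::string text;
};

// Classify an identifier-like word as an SEH keyword or a plain token.
static TokKind classifyWord(const std::string &w) {
  if (w == "__try") return TokKind::KwTry;
  if (w == "__except") return TokKind::KwExcept;
  if (w == "__finally") return TokKind::KwFinally;
  if (w == "__leave") return TokKind::KwLeave;
  return TokKind::Other;
}

// Scan the input left to right and produce one token at a time.
std::vector<Token> lexSEH(const std::string &src) {
  std::vector<Token> out;
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) { ++i; continue; }
    if (std::isalnum(c) || c == '_') {
      // Identifier, keyword, or number: read up to the end of the word.
      std::size_t start = i;
      while (i < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
        ++i;
      assert(i > start); // a word always has at least one character
      std::string word = src.substr(start, i - start);
      out.push_back({classifyWord(word), word});
      continue;
    }
    // Any other character becomes a single-character token.
    TokKind k = c == '{' ? TokKind::LBrace
              : c == '}' ? TokKind::RBrace
                         : TokKind::Other;
    out.push_back({k, std::string(1, src[i])});
    ++i;
  }
  return out;
}
```

Calling `lexSEH` on a string such as `__try { x; } __except (1) { }` yields a flat token sequence in which `__try` and `__except` carry their own kinds, which is what lets a parser dispatch on them.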

Processing after the __try / __except keywords are read

The code is located in clang\lib\Parse\ParseStmt.cpp and clang\lib\Sema\SemaStmt.cpp

// __try keyword encountered
case tok::kw___try:
    ProhibitAttributes(Attrs);
    return ParseSEHTryBlock();
// __leave keyword encountered
case tok::kw___leave:
    Res = ParseSEHLeaveStatement();
    SemiError = "__leave";
    break;

StmtResult Parser::ParseSEHTryBlock() {
  assert(Tok.is(tok::kw___try) && "Expected '__try'");
  // Consume __try and move to the next token
  SourceLocation TryLoc = ConsumeToken();
  // __try must be followed by an opening brace
  if (Tok.isNot(tok::l_brace))
    return StmtError(Diag(Tok, diag::err_expected) << tok::l_brace);

  // Parse the code inside __try { } up to the closing brace that matches the opening brace
  StmtResult TryBlock(ParseCompoundStatement(
      /*isStmtExpr=*/false,
      Scope::DeclScope | Scope::CompoundStmtScope | Scope::SEHTryScope));
  // Parsing failed: return the error
  if (TryBlock.isInvalid())
    return TryBlock;

  StmtResult Handler;
  if (Tok.is(tok::identifier) &&
      Tok.getIdentifierInfo() == getSEHExceptKeyword()) {
    // The next token is __except: parse the __except block
    SourceLocation Loc = ConsumeToken();
    Handler = ParseSEHExceptBlock(Loc);
  } else if (Tok.is(tok::kw___finally)) {
    // The next token is __finally: parse the __finally block
    SourceLocation Loc = ConsumeToken();
    Handler = ParseSEHFinallyBlock(Loc);
  } else {
    // Anything else is a syntax error: report it
    return StmtError(Diag(Tok, diag::err_seh_expected_handler));
  }

  if(Handler.isInvalid())
    return Handler;

  // The __try statement parsed successfully
  return Actions.ActOnSEHTryBlock(false /* IsCXXTry */,
                                  TryLoc,
                                  TryBlock.get(),
                                  Handler.get());
}

The __except, __finally, and __leave blocks are parsed in a similar way. For example, __finally:

// Parse the __finally syntax
StmtResult Parser::ParseSEHFinallyBlock(SourceLocation FinallyLoc) {
  if (Tok.isNot(tok::l_brace))
    return StmtError(Diag(Tok, diag::err_expected) << tok::l_brace);

  ParseScope FinallyScope(this, 0);
  Actions.ActOnStartSEHFinallyBlock();
  // Parse the body of the __finally block
  StmtResult Block(ParseCompoundStatement());
  if(Block.isInvalid()) {
    Actions.ActOnAbortSEHFinallyBlock();
    return Block;
  }
  // Create the __finally AST node
  return Actions.ActOnFinishSEHFinallyBlock(FinallyLoc, Block.get());
}
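
The control flow of the parsing functions above can be sketched in miniature. This is a simplified, hypothetical model rather than Clang's parser: it works on a pre-split vector of token strings, skips block contents instead of building statements, and the names `SEHParse`, `skipBalanced`, and `parseSEHTry` are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
enum class SEHParse { TryExcept, TryFinally, Error };

// Skip a balanced group such as "{ ... }" starting at pos.
// Returns false if the group does not start at pos or is never closed.
static bool skipBalanced(const std::vector<std::string> &toks, std::size_t &pos,
                         const std::string &open, const std::string &close) {
  if (pos >= toks.size() || toks[pos] != open) return false;
  int depth = 0;
  while (pos < toks.size()) {
    if (toks[pos] == open) ++depth;
    else if (toks[pos] == close) --depth;
    ++pos;
    if (depth == 0) return true;
  }
  return false; // ran out of tokens before the matching close
}

// Mirror the shape of the __try parsing: body first, then a mandatory handler.
SEHParse parseSEHTry(const std::vector<std::string> &toks) {
  std::size_t pos = 0;
  if (toks.empty() || toks[pos] != "__try") return SEHParse::Error;
  ++pos;                                                           // consume __try
  if (!skipBalanced(toks, pos, "{", "}")) return SEHParse::Error;  // try body
  assert(pos <= toks.size());
  if (pos >= toks.size()) return SEHParse::Error;                  // handler missing
  if (toks[pos] == "__except") {
    ++pos;
    if (!skipBalanced(toks, pos, "(", ")")) return SEHParse::Error; // filter
    if (!skipBalanced(toks, pos, "{", "}")) return SEHParse::Error; // handler body
    return SEHParse::TryExcept;
  }
  if (toks[pos] == "__finally") {
    ++pos;
    if (!skipBalanced(toks, pos, "{", "}")) return SEHParse::Error;
    return SEHParse::TryFinally;
  }
  return SEHParse::Error; // anything else after the try body is a syntax error
}
```

As in the real code, a __try body with no __except or __finally after it is rejected, and a missing opening brace is reported before any block contents are examined.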

Create the AST nodes for try/except/finally/leave so that the later stages can process them:

// Create the SEHTryStmt node
StmtResult Sema::ActOnSEHTryBlock(bool IsCXXTry, SourceLocation TryLoc,
                                  Stmt *TryBlock, Stmt *Handler) {
  FSI->setHasSEHTry(TryLoc);
  return SEHTryStmt::Create(Context, IsCXXTry, TryLoc, TryBlock, Handler);
}
// Create the SEHExceptStmt node
StmtResult Sema::ActOnSEHExceptBlock(SourceLocation Loc, Expr *FilterExpr,
                                     Stmt *Block) {
  return SEHExceptStmt::Create(Context, Loc, FilterExpr, Block);
}
// Create the SEHFinallyStmt node
StmtResult Sema::ActOnFinishSEHFinallyBlock(SourceLocation Loc, Stmt *Block) {
  CurrentSEHFinally.pop_back();
  return SEHFinallyStmt::Create(Context, Loc, Block);
}

3. Syntax analysis

Various phrases are recognized from the token sequence output by the lexical analyzer, and an abstract syntax tree (AST) is constructed. The code shown in this section then walks the SEH nodes of that tree and emits IR for them.

The code is mainly located in:

clang\lib\CodeGen\CGException.cpp and CGCleanup.cpp

The following three functions are the key functions of SEH front-end processing. Most of our front-end changes are in these functions as well.

void CodeGenFunction::EmitSEHTryStmt(const SEHTryStmt &S) {
  EnterSEHTryStmt(S);
  {
    // Record that we have entered a try block, because try blocks can be nested
    JumpDest TryExit = getJumpDestInCurrentScope("__try.__leave");
    SEHTryEpilogueStack.push_back(&TryExit);

    llvm::BasicBlock *TryBB = nullptr;
    // This is one of our modification points:
    // generate invoke void @llvm.seh.try.begin()
    EmitRuntimeCallOrInvoke(getSehTryBeginFn(CGM));
    if (SEHTryEpilogueStack.size() == 1) // outermost only
      TryBB = Builder.GetInsertBlock();
    
    // Generate IR code in try block
    EmitStmt(S.getTryBlock());

    // Generate the catch block, etc.
    getInvokeDest();
    // Note: do not call EmitSehTryScopeEnd here; otherwise some of the test code crashes during compilation
    
    // Volatilize all blocks in Try, till current insert point
    if (TryBB) {
      llvm::SmallPtrSet<llvm::BasicBlock *, 10> Visited;
      VolatilizeTryBlocks(TryBB, Visited);
    }

    // Leave the try block
    SEHTryEpilogueStack.pop_back();

    if (!TryExit.getBlock()->use_empty())
      EmitBlock(TryExit.getBlock(), /*IsFinished=*/true);
    else
      delete TryExit.getBlock();
  }
  ExitSEHTryStmt(S);
}
void CodeGenFunction::EnterSEHTryStmt(const SEHTryStmt &S) {
  CodeGenFunction HelperCGF(CGM, /*suppressNewContext=*/true);
  HelperCGF.ParentCGF = this;
  if (const SEHFinallyStmt *Finally = S.getFinallyHandler()) {
    // If there is a __finally block, LLVM generates a function from it and we keep the function pointer
    llvm::Function *FinallyFunc =
        HelperCGF.GenerateSEHFinallyFunction(*this, *Finally);

    // Push it onto EHStack; PopCleanupBlock processes it later
    EHStack.pushCleanup<PerformSEHFinally>(NormalAndEHCleanup, FinallyFunc);
    return;
  }

  // Everything below handles the __except block
  const SEHExceptStmt *Except = S.getExceptHandler();
  EHCatchScope *CatchScope = EHStack.pushCatch(1);
  SEHCodeSlotStack.push_back(
      CreateMemTemp(getContext().IntTy, "__exception_code"));

  // Case: __except(1)
  llvm::Constant *C =
    ConstantEmitter(*this).tryEmitAbstract(Except->getFilterExpr(),
                                           getContext().IntTy);
  if (CGM.getTarget().getTriple().getArch() != llvm::Triple::x86 && C &&
      C->isOneValue()) {
    CatchScope->setCatchAllHandler(0, createBasicBlock("__except"));
    return;
  }

  // Case: __except(filterfunc())
  // A filter function can serve as the condition of __except. Here the filter function is generated and its function pointer is obtained
  llvm::Function *FilterFunc =
      HelperCGF.GenerateSEHFilterFunction(*this, *Except);
  llvm::Constant *OpaqueFunc =
      llvm::ConstantExpr::getBitCast(FilterFunc, Int8PtrTy);
  CatchScope->setHandler(0, OpaqueFunc, createBasicBlock("__except.ret"));
}
void CodeGenFunction::ExitSEHTryStmt(const SEHTryStmt &S) {
  // This is one of our modification points: generate invoke void @llvm.seh.try.end(). This step is very important
  if (Builder.GetInsertBlock()) {
    llvm::FunctionCallee SehTryEnd = getSehTryEndFn(CGM);
    EmitRuntimeCallOrInvoke(SehTryEnd);
  }

  // __finally block:
  // pairs with the earlier EHStack.pushCleanup<PerformSEHFinally>(NormalAndEHCleanup, FinallyFunc);
  if (S.getFinallyHandler()) {
    PopCleanupBlock();
    return;
  }
  // Otherwise, we must have an __except block.
  // The following code generates the __except block and the related jump IR (catchswitch, catchpad, catchret, etc.)
  const SEHExceptStmt *Except = S.getExceptHandler();
  assert(Except && "__try must have __finally xor __except");
  EHCatchScope &CatchScope = cast<EHCatchScope>(*EHStack.begin());

  // Corresponds to the following IR
  /*
  8:                                                ; preds = %4, %0
  %9 = catchswitch within none [label %10] unwind to caller
  */
  // The fall-through block.
  llvm::BasicBlock *ContBB = createBasicBlock("__try.cont");
  // We just emitted the body of the __try; jump to the continue block.
  if (HaveInsertPoint())
    Builder.CreateBr(ContBB);
  // Check if our filter function returned true.
  emitCatchDispatchBlock(*this, CatchScope);
  // Grab the block before we pop the handler.
  llvm::BasicBlock *CatchPadBB = CatchScope.getHandler(0).Block;
  EHStack.popCatch();
  EmitBlockAfterUses(CatchPadBB);

  // Corresponds to the following IR
  /*
  10:                                               ; preds = %8
  %11 = catchpad within %9 [i8* null]
  catchret from %11 to label %12
  */
  llvm::CatchPadInst *CPI =
      cast<llvm::CatchPadInst>(CatchPadBB->getFirstNonPHI());
  llvm::BasicBlock *ExceptBB = createBasicBlock("__except");
  Builder.CreateCatchRet(CPI, ExceptBB);
  EmitBlock(ExceptBB);

  // On Win64, the exception code is returned in EAX. Copy it into the slot.
  if (CGM.getTarget().getTriple().getArch() != llvm::Triple::x86) {
    llvm::Function *SEHCodeIntrin =
        CGM.getIntrinsic(llvm::Intrinsic::eh_exceptioncode);
    llvm::Value *Code = Builder.CreateCall(SEHCodeIntrin, {CPI});
    Builder.CreateStore(Code, SEHCodeSlotStack.back());
  }

  // Generate the IR for the code in the __except block
  EmitStmt(Except->getBlock());

  // End the lifetime of the exception code.
  SEHCodeSlotStack.pop_back();

  // Jump past the end of the __except block and continue with the code after it
  // Corresponds to the IR above
  /*
  15:                                               ; preds = %12, %16
  ret void
  16:                                               ; preds = %4
  br label %15
  */
  if (HaveInsertPoint())
    Builder.CreateBr(ContBB);
  EmitBlock(ContBB);
    
}
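
The nesting behavior that these functions have to get right can be shown with a small stand-alone model. This is only a sketch under stated assumptions, not code from Clang: `TryEmitter` is a hypothetical class, it keeps a stack of handler labels in the spirit of the scope stacks above, and it prints simplified text lines rather than valid LLVM IR (the normal-path `to label` part is left out).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature of the front-end bookkeeping: a stack of active
// __try scopes, where every marker unwinds to the innermost active handler.
class TryEmitter {
public:
  // Enter a __try whose handler block is named handlerLabel.
  void enterTry(const std::string &handlerLabel) {
    handlers_.push_back(handlerLabel);
    emitInvoke("@llvm.seh.try.begin");
  }

  // Record one line of try-body code verbatim.
  void emitBody(const std::string &text) { lines_.push_back("  " + text); }

  // Leave the innermost __try: emit the end marker first, then pop the scope,
  // so the end marker still unwinds to this try's own handler.
  void exitTry() {
    assert(!handlers_.empty() && "exitTry without matching enterTry");
    emitInvoke("@llvm.seh.try.end");
    handlers_.pop_back();
  }

  const std::vector<std::string> &lines() const { return lines_; }
  std::size_t depth() const { return handlers_.size(); }

private:
  void emitInvoke(const std::string &callee) {
    lines_.push_back("  invoke void " + callee + "() unwind label %" +
                     handlers_.back());
  }

  std::vector<std::string> handlers_; // innermost handler is at the back
  std::vector<std::string> lines_;    // simplified pseudo-IR, in emission order
};
```

With an outer and an inner try, the inner begin and end markers name the inner handler and the outer markers name the outer one, which matches the pairing of `try.begin` and `try.end` with the same unwind label in the IR listing earlier in this article.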

With the changes above, a simple SEH function now generates the correct IR. SEH can be written in many ways, though, and we fell into a variety of pits. The next chapter covers the pits we hit in the front end.

Tags: Windows CLang llvm

Posted on Sun, 03 Oct 2021 18:48:08 -0400 by ale_jrb