How to embed Python in Go

2022-02-01 00:05:36 Zioyi

If you look at the new Datadog Agent, you may notice that most of the codebase is written in Go, although the checks we use to collect metrics are still written in Python. This is possible because the Datadog Agent is a regular Go binary that embeds a CPython interpreter and can execute Python code on demand at any time. An abstraction layer makes this process transparent, so you can write idiomatic Go code even though Python is running under the hood.

There are several reasons to embed Python in a Go application:

  • It is useful during a transition: you can migrate parts of an existing Python project to the new language gradually, without losing any functionality along the way.
  • You can reuse existing Python software or libraries without reimplementing them in the new language.
  • You can extend your software dynamically by loading and executing regular Python scripts, even at runtime.

The list could go on, but for the Datadog Agent the last point is crucial: we want you to be able to run custom checks or change existing ones without recompiling the Agent, or compiling anything at all.

Embedding CPython is fairly easy and well documented. The interpreter itself is written in C, and it provides a C API for performing low-level operations programmatically, such as creating objects, importing modules, and calling functions.

In this article we will show some code examples that interact with Python while keeping the Go code idiomatic. Before going further, though, we need to address a gap: the embedding API is in C, but our main application is in Go. How can this work?

Introducing cgo

There are many good reasons not to introduce cgo into your stack, but embedding CPython is one of the cases where you have to. cgo is not a language, and it is not a compiler. It is a Foreign Function Interface (FFI): a mechanism that lets Go code call functions and services written in other languages, C in particular.

When we say "cgo", we actually mean a set of tools, libraries, functions, and types that the Go toolchain uses under the hood, so that we can keep running go build to get our Go binary. Here is an example program that uses cgo:

package main

// #include <float.h>
import "C"
import "fmt"

func main() {
    fmt.Println("Max float value of float is", C.FLT_MAX)
}


In this example, the comment block right above the import "C" statement is called the preamble, and it can contain actual C code; here it includes a header file. After the import, the "C" pseudo-package lets us "jump" to the foreign code and access the FLT_MAX constant. You can build the example by invoking go build, just as you would for ordinary Go code.

If you want to see what cgo does behind the scenes, run go build -x. You will see the cgo tool being invoked to generate some C and Go modules, then the C and Go compilers being invoked to build the object modules, and finally the linker putting everything together.

You can read more about cgo on the Go blog. The article there contains more examples and a few useful links for digging into the details.

Now that we know what cgo can do for us, let's see how to use this mechanism to run some Python code.

Embedding CPython: a primer

Technically, a Go program that embeds CPython is not as complicated as you might expect. In fact, all we need to do is initialize the interpreter before running any Python code and finalize it when we are done. Note that we use Python 2.x in all the examples, but everything applies to Python 3.x with only minor adjustments. Let's look at an example:

package main

// #cgo pkg-config: python-2.7
// #include <Python.h>
import "C"
import "fmt"

func main() {
    C.Py_Initialize()
    fmt.Println(C.GoString(C.Py_GetVersion()))
    C.Py_Finalize()
}


The example above does exactly what the following Python code does:

import sys
print(sys.version)

Notice that we added a #cgo directive to the preamble. Such directives are passed to the toolchain and let you change the build workflow. In this case we tell cgo to invoke pkg-config to gather the flags needed to build and link against a library named python-2.7, and to pass those flags to the C compiler. If the CPython development libraries and pkg-config are installed on your system, running go build is all it takes to compile the example above.

Back to the code: we use Py_Initialize() and Py_Finalize() to set up and shut down the interpreter, and the Py_GetVersion C function to retrieve a string containing the version information of the embedded interpreter.

As you may have guessed, all the cgo code needed to call the C Python API is boilerplate. That is why the Datadog Agent relies on go-python for all the embedding operations: the library provides a Go-friendly thin wrapper around the C API and hides the cgo details. Here is another basic embedding example, this time using go-python:

package main

import (
    python "github.com/sbinet/go-python"
)

func main() {
    python.Initialize()
    python.PyRun_SimpleString("print 'hello, world!'")
    python.Finalize()
}


This looks much closer to regular Go code: cgo is no longer exposed, and we can pass Go strings back and forth when accessing the Python API. Embedding looks powerful and developer friendly, so it is time to put the interpreter to good use: let's try to load a Python module from disk.

We don't need anything complicated on the Python side; the ubiquitous "hello world" will serve the purpose:

# foo.py
def hello():
    """
    Print hello world for fun and profit.
    """
    print "hello, world!"

The Go code is slightly more complex, but still readable:

// main.go
package main

import "github.com/sbinet/go-python"

func main() {
    python.Initialize()
    defer python.Finalize()

    fooModule := python.PyImport_ImportModule("foo")
    if fooModule == nil {
        panic("Error importing module")
    }

    helloFunc := fooModule.GetAttrString("hello")
    if helloFunc == nil {
        panic("Error importing function")
    }

    // The Python function takes no params but when using the C api
    // we're required to send (empty) *args and **kwargs anyways.
    helloFunc.Call(python.PyTuple_New(0), python.PyDict_New())
}


Once the program is built, we need to set the PYTHONPATH environment variable to the current working directory when running it, so that the import statement can find the foo.py module. In a shell, the command looks like this:

$ go build main.go && PYTHONPATH=. ./main
hello, world!

The dreadful Global Interpreter Lock

Having to bring in cgo in order to embed Python is a trade-off: builds are slower, the garbage collector won't help us manage the memory used by the foreign system, and cross compilation is non-trivial. Whether these are concerns for a specific project is debatable, but there is one thing I consider non-negotiable: the Go concurrency model. If we couldn't run Python from a goroutine, using Go at all would make very little sense.

Before playing with concurrency, Python, and cgo, there is one more thing we need to know: the Global Interpreter Lock, or GIL. The GIL is a mechanism widely adopted in language interpreters (CPython is one of them) that prevents more than one thread from running at the same time. This means that no Python program executed by CPython can run in parallel within the same process. Concurrency is still possible, and the lock is a good trade-off between speed, safety, and ease of implementation. So why should it be a problem when it comes to embedding?

When a regular, non-embedded Python program starts, the GIL is not involved, which avoids useless overhead in locking operations; the GIL starts the first time some Python code requests to spawn a thread. For each thread, the interpreter creates a data structure that stores information about the current state, and locks the GIL. When the thread has finished, the state is restored and the GIL is unlocked, ready to be used by other threads.

When we run Python from a Go program, none of the above happens automatically. Without the GIL, our Go program can create multiple Python threads, which can cause race conditions leading to fatal runtime errors, most likely a segmentation fault that brings down the whole Go application.

The solution is to invoke the GIL explicitly whenever we run multithreaded code from Go. The code is not complex, because the C API provides all the tools we need. To better expose the problem we need something CPU bound to do from Python, so let's add these functions to the foo.py module from the previous example:

# foo.py
import sys

def print_odds(limit=10):
    """
    Print odd numbers < limit
    """
    for i in range(limit):
        if i%2:
            sys.stderr.write("{}\n".format(i))

def print_even(limit=10):
    """
    Print even numbers < limit
    """
    for i in range(limit):
        if i%2 == 0:
            sys.stderr.write("{}\n".format(i))


We will try to print odd and even numbers concurrently from Go, using two different goroutines (and therefore involving threads):

package main

import (
    "sync"

    "github.com/sbinet/go-python"
)

func main() {
    // The following will also create the GIL explicitly
    // by calling PyEval_InitThreads(), without waiting
    // for the interpreter to do that
    python.Initialize()

    var wg sync.WaitGroup
    wg.Add(2)

    fooModule := python.PyImport_ImportModule("foo")
    odds := fooModule.GetAttrString("print_odds")
    even := fooModule.GetAttrString("print_even")

    // Initialize() has locked the GIL but at this point we don't need it
    // anymore. We save the current state and release the lock
    // so that goroutines can acquire it
    state := python.PyEval_SaveThread()

    go func() {
        _gstate := python.PyGILState_Ensure()
        odds.Call(python.PyTuple_New(0), python.PyDict_New())
        python.PyGILState_Release(_gstate)

        wg.Done()
    }()

    go func() {
        _gstate := python.PyGILState_Ensure()
        even.Call(python.PyTuple_New(0), python.PyDict_New())
        python.PyGILState_Release(_gstate)

        wg.Done()
    }()

    wg.Wait()

    // At this point we know we won't need Python anymore in this
    // program, we can restore the state and lock the GIL to perform
    // the final operations before exiting.
    python.PyEval_RestoreThread(state)
    python.Finalize()
}


While reading the example you may notice a pattern, one that will become our mantra for running embedded Python code:

  1. Save the state and lock the GIL.
  2. Execute Python.
  3. Restore the state and unlock the GIL.

The code should be straightforward, but there is one subtle detail we want to point out. Although both follow the GIL mantra, in one case we operate the GIL by calling PyEval_SaveThread() and PyEval_RestoreThread(), while in the other (look inside the goroutines) we do the same with PyGILState_Ensure() and PyGILState_Release().

We said that when multithreading is driven from Python, the interpreter takes care of creating the data structure needed to store the current state. When the same happens through the C API, we are in charge of that.

When we initialize the interpreter with go-python, we are operating in a Python context. So when PyEval_InitThreads() is called, it initializes the data structure and locks the GIL. We can use PyEval_SaveThread() and PyEval_RestoreThread() to operate on the already existing state.

Inside the goroutines we operate from a Go context, so we need to create the state explicitly and remove it when we are done, which is what PyGILState_Ensure() and PyGILState_Release() do for us.

Unleash the Gopher

At this point we know how to deal with multithreaded Go code that executes Python in an embedded interpreter, but after the GIL another challenge is right around the corner: the Go scheduler.

When a goroutine starts, it is scheduled for execution on one of the GOMAXPROCS threads available. If a goroutine happens to perform a system call or call C code, the current thread hands the other goroutines waiting in its queue over to another thread, so that they have a better chance to run; the current goroutine is paused, waiting for the system call or the C function to return. When that happens, the thread tries to resume the paused goroutine, but if this is not possible, it asks the Go runtime to find another thread to complete the goroutine and goes to sleep. The goroutine is finally scheduled on another thread and finishes.

With this in mind, let's see what can happen to a goroutine running some Python code when it is moved to a new thread:

  1. Our goroutine starts, performs a C call, and pauses. The GIL is locked.
  2. When the C call returns, the current thread tries to resume the goroutine, but fails.
  3. The current thread tells the Go runtime to find another thread to resume our goroutine.
  4. The Go scheduler finds an available thread and the goroutine is resumed.
  5. The goroutine is almost done and tries to unlock the GIL before returning.
  6. The thread ID stored in the current state belongs to the original thread and differs from the ID of the current thread.
  7. Panic!

Luckily, we can force the Go runtime to always keep our goroutine running on the same thread by calling the LockOSThread function from the runtime package within the goroutine:

go func() {
    runtime.LockOSThread()

    _gstate := python.PyGILState_Ensure()
    odds.Call(python.PyTuple_New(0), python.PyDict_New())
    python.PyGILState_Release(_gstate)
    wg.Done()
}()

This interferes with the scheduler and might introduce some overhead, but it is a price we are willing to pay.

Conclusion

In order to embed Python, the Datadog Agent has to accept a few trade-offs:

  • The overhead introduced by cgo.
  • The task of handling the GIL manually.
  • The limitation of binding goroutines to the same thread during execution.

We are happy to accept each of these for the convenience of running Python checks in Go. By being aware of the trade-offs, we are able to minimize their impact. Regarding the other limitations introduced to support Python, we have a few countermeasures to contain potential issues:

  • The build is automated and configurable, so developers still have something very similar to go build.
  • A lightweight version of the Agent can be built with Python support stripped out entirely, simply by using Go build tags.
  • Such a version relies only on the core checks hardcoded in the Agent itself (mostly system and network checks), but it is free of cgo and can be cross compiled.

We will re-evaluate our options in the future and decide whether keeping cgo around is still worth it. We could even reconsider whether Python as a whole is still worth it, once the Go plugin package is mature enough to support our use case. But for now the embedded Python is working well, and transitioning from the old Agent to the new one couldn't be easier.

Are you a polyglot who likes to mix different programming languages? Do you like learning how languages work internally in order to make your code perform better?


copyright notice
author[Zioyi],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/02/202202010005148355.html
