
SMPLify: an introduction, and running it in a Python 2.7 environment

2022-05-15 05:59:43 · daoboker

Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image

Getting Started

This paper introduces a new technique, SMPLify. With SMPLify, a single 2D image of a person can be converted into a corresponding 3D model without any manually labeled features. First, the authors use a CNN called DeepCut to extract 2D joint locations from the input image; at this stage the joints lack depth information. They then fit their previously proposed SMPL model to these 2D joints to generate a corresponding 3D model, and optimize the 3D model's joints with Powell's dogleg method, finally obtaining the desired output.


Before the formal introduction of SMPLify, I'd like to briefly describe two important tools it uses. DeepCut is a CNN that first extracts body part candidates (candidate regions for human parts) from the image, then maps each candidate region to a joint and treats each joint as a node in a graph. Next, DeepCut clusters the joints belonging to the same person into one class, while labeling which body part each node belongs to. Finally, the nodes with different labels within the same class are assembled into one person's pose estimate.

This pose-estimation paper involves many new ideas, and it is hard to fully master DeepCut in a short time, so I won't introduce too much here for now. For this article, we only need to know that DeepCut can extract the 2D joints from a portrait and compute a confidence for each joint. In fact, there is now DeeperCut, which seems to improve on DeepCut in both efficiency and quality; applying DeeperCut within SMPLify might give better results. But considering that the main direction of our project is a fashion style generator, I won't dig into this for now (if you want further improvements later, DeeperCut may be a good alternative); we only need to explore how to apply SMPLify to our own project.

Related reading: DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation; DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model.


SMPL is a parametric human-body modeling method proposed by Loper et al. Fully appreciating SMPL's advantages requires a lot of background knowledge; for now we only need some basics. In SMPL, a 3D model is expressed by β and θ, where β is the body shape (height, build, head-to-body ratio, etc., 10 parameters) and θ is the body pose (the relative rotations of 24 joints, 3 axis-angle values each, 72 parameters in total; with the 3 global translation parameters this is sometimes counted as 75).

Related reading: SMPL: A Skinned Multi-Person Linear Model; the SMPL official website.
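To fix the parameter counts in mind, here is a minimal sketch of the parameter layout only; `smpl_stub` is a made-up placeholder, not the real SMPL function, and a real implementation computes the mesh from β and θ rather than returning zeros:

```python
import numpy as np

# beta  -- 10 shape coefficients (height, build, proportions, ...)
# theta -- 24 joints x 3 axis-angle values = 72 pose parameters
beta = np.zeros(10)
theta = np.zeros(24 * 3)

def smpl_stub(beta, theta):
    """Stand-in for M(beta, theta): a real SMPL implementation
    returns a mesh of 6890 vertices; here we only check sizes."""
    assert beta.shape == (10,) and theta.shape == (72,)
    return np.zeros((6890, 3))  # shape of the SMPL vertex array

vertices = smpl_stub(beta, theta)
```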



SMPLify was proposed by Bogo et al. in 2016. It converts a single 2D portrait into a corresponding 3D model, and achieves good results on the Leeds Sports Pose dataset, which contains many complex poses. See the SMPLify official website.

From the result images below, we can see that SMPLify really does perform excellently at converting a 2D portrait into a corresponding 3D model. At the same time, its biggest advantage is that it achieves a good conversion without manual feature annotation (that said, manually tagging the gender gives better results).

[figure]
A SMPLify body model can be expressed as M(β, θ, γ), where β are the shape parameters, θ the pose parameters, and γ the spatial transformation parameters of the camera. The output M is a triangulated mesh of 6890 vertices, and the definitions of β and θ are consistent with those in SMPL. SMPLify includes 3 template models: 2000 male registrations and 2000 female registrations are used to train two gender-specific models (pink) and a gender-neutral model (blue).

The pose transformation of the model is realized by skeletal skinning with 24 joints; θ is the relative rotation angle of each joint. At the start, we obtain the initial 3D joint positions and mesh vertices from the template model. Then, we compute the joint positions adjusted for body shape via β, written J(β). Next, we compute the joint positions after the pose transformation via θ, written Rθ(J(β)). Note that the LSP joints differ somewhat from the SMPL joints and cannot be put into bijection, so each LSP joint is linked to the most similar SMPL joint. Finally, a perspective camera model projects the SMPL joints onto the image; we denote the camera projection parameters by K. The correspondence between LSP joints and SMPL joint ids is as follows:

| index | joint name | corresponding SMPL joint id |
| --- | --- | --- |
| 0 | Right ankle | 8 |
| 1 | Right knee | 5 |
| 2 | Right hip | 2 |
| 3 | Left hip | 1 |
| 4 | Left knee | 4 |
| 5 | Left ankle | 7 |
| 6 | Right wrist | 21 |
| 7 | Right elbow | 19 |
| 8 | Right shoulder | 17 |
| 9 | Left shoulder | 16 |
| 10 | Left elbow | 18 |
| 11 | Left wrist | 20 |
| 13 | Head top | vertex 411 |

Given that LSP and SMPL define the hips quite differently, and that SMPL lacks definitions for the neck and the head top, we might be able to build a better mapping.
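The same correspondence can be expressed as a Python dict for later use (taken directly from the table above; the variable names are mine, and index 13, the head top, maps to a mesh vertex rather than a joint, so it is kept separately):

```python
# LSP joint index -> SMPL joint id, per the table above.
LSP_TO_SMPL = {
    0: 8,    # right ankle
    1: 5,    # right knee
    2: 2,    # right hip
    3: 1,    # left hip
    4: 4,    # left knee
    5: 7,    # left ankle
    6: 21,   # right wrist
    7: 19,   # right elbow
    8: 17,   # right shoulder
    9: 16,   # left shoulder
    10: 18,  # left elbow
    11: 20,  # left wrist
}
HEAD_TOP_VERTEX = 411  # LSP index 13 is matched to SMPL mesh vertex 411
```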

The output format of the DeepCut joints is as follows:

The pose in 5x14 layout. The first axis is along per-joint information,
the second the joints. Information is:
  1. position x,
  2. position y,
  3. CNN confidence,
  4. CNN offset vector x,
  5. CNN offset vector y.
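As a sketch of unpacking that layout, the snippet below splits a DeepCut-style 5 x 14 array into per-joint positions, confidences, and offset vectors; the array here is made up for illustration, not real network output:

```python
import numpy as np

pose = np.zeros((5, 14))       # placeholder for a real 5 x 14 output
pose[0] = np.arange(14)        # fake x positions
pose[1] = np.arange(14) + 100  # fake y positions
pose[2] = 0.9                  # fake CNN confidences

xy = pose[:2].T        # (14, 2): per-joint (x, y) positions
conf = pose[2]         # (14,):   per-joint CNN confidence
offsets = pose[3:5].T  # (14, 2): per-joint CNN offset vectors
```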

First, let's form a preliminary understanding of the SMPLify pipeline:

[figure]

SMPLify first receives a 2D portrait; apart from the gender information, we don't need to provide anything else. Next, the previously mentioned DeepCut network extracts the 2D joint information from the portrait. Then, SMPL generates the corresponding 3D model, and Powell's dogleg method minimizes the error between its joints and the 2D portrait joints, yielding the optimal model.

The SMPL body model can be expressed in the form M(β, θ, γ), where β is the body shape, θ the body pose, and γ the translation (in fact, as the values later in the text show, this translation is the camera translation). Note that the number of joints output by DeepCut differs slightly from the number output by SMPL; here the authors link the similar joints of the two.

Approximating Bodies with Capsules

[figure]

To avoid interpenetration (i.e., the body model taking an unnatural posture in which limbs intersect abnormally), we usually need extra computation, and computing interpenetration on the actual body surface is very expensive. The authors therefore use capsules to approximate the different parts of the body, and learn a regression from body shape and body pose to the capsule parameters.

## Objective Function

This part is the key content of the article. Let's start with the objective function E(β, θ) used in Powell's dogleg method; it is a weighted sum of 5 different error terms:
[figure]
In this formula, β and θ have already been explained, Jest are the 2D joints obtained from DeepCut, K denotes the camera parameters, and the four different λ are the weights of the sub-error terms.

The joint-based data term penalizes the weighted sum of distances between the predicted joints and the 2D joints. Here ΠK is the projection of the predicted 3D joints onto 2D according to the camera parameters K. Each 2D joint contributes to the error term in proportion to its prediction confidence wi. Finally, a Geman-McClure norm is used for the distance between the predicted joint and the 2D joint, which improves robustness on ambiguous input images.
[figure]
The joint-based error term can be understood as directly computing the error between corresponding joints. J(β)i is the 3D skeleton joint position predicted from the body shape; Rθ(J(β)i) is the 3D pose joint, obtained by transforming the 3D skeleton joint. After obtaining the 3D pose joints, ΠK projects them onto 2D, where K can be roughly understood as the perspective projection. Then the Geman-McClure norm expresses the error between the model joint and the actual 2D joint, with the DeepCut confidence wi as the weight of that joint's error. Finally, the joint-based error term equals the weighted sum of the per-joint errors.
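A minimal sketch of this data term, using one common form of the Geman-McClure penalty (the scale `SIGMA`, the function names, and the toy joints are my assumptions, not values from the paper):

```python
import numpy as np

SIGMA = 100.0  # robustness scale; a tuning choice, not the paper's value

def gm_rho(e, sigma=SIGMA):
    """Geman-McClure robust penalty: grows like the squared error for
    small residuals but saturates for large ones (outliers)."""
    sq = np.sum(e ** 2, axis=-1)
    return sq / (sq + sigma ** 2)

def joint_data_term(projected, estimated, conf):
    """Confidence-weighted sum of robust distances between projected
    model joints and the DeepCut 2D joints."""
    return np.sum(conf * gm_rho(projected - estimated))

# tiny usage example with made-up 2D joints
proj = np.array([[10.0, 10.0], [50.0, 60.0]])
est = np.array([[12.0, 11.0], [50.0, 60.0]])
w = np.array([0.9, 0.5])
err = joint_data_term(proj, est, w)
```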

The problem of unnatural poses was mentioned above; to address it, we need an error term based on prior knowledge. Here i ranges over the pose angles related to knee and elbow bending, and θ is the rotation angle. Under normal circumstances θ is negative, so 0 < exp(θ) < 1; if θ is positive, it indicates an unnatural twist, and exp(θ) generally becomes » 1. Note that θ equals 0 when there is no rotation.
[figure]
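The bending penalty just described can be sketched as a sum of exponentials over the knee and elbow rotation components (the function name and the toy angles are mine; the sign convention follows the text: natural bending gives negative angles):

```python
import numpy as np

def bending_penalty(theta_bend):
    """theta_bend: rotation angles (radians) of knees and elbows.
    Natural bending (negative angle) contributes less than 1 per
    joint; unnatural hyperextension (positive angle) is penalized
    exponentially."""
    return np.sum(np.exp(theta_bend))

natural = bending_penalty(np.array([-1.0, -0.5]))   # bent normally
unnatural = bending_penalty(np.array([1.0, 0.5]))   # hyperextended
```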
To further eliminate abnormal poses, we can bias the model toward some favorable poses (understood as more common postures). The authors run MoSh on the CMU marker dataset to obtain a series of SMPL models, and use them to fit several Gaussian distributions. In the error term, gj denotes the j-th of the 8 Gaussians. Normally we would compute the negative log of the weighted sum of probabilities over all Gaussians, but this is computationally expensive, so the weighted sum is replaced by the largest weighted probability among the Gaussians, and a constant c is introduced to offset the effect of this approximation. To save computation, the authors thus only take the negative log of the most likely component as the error.
[figure]
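The max-of-Gaussians approximation can be sketched as follows: instead of the negative log of the sum of weighted component likelihoods, keep only the single best component, which turns the prior into a cheap minimum over per-component terms (function names and the two-component toy mixture are mine):

```python
import numpy as np

def mixture_neg_log(weights, comp_log_liks):
    """Exact negative log-likelihood of the Gaussian mixture."""
    return -np.log(np.sum(weights * np.exp(comp_log_liks)))

def approx_neg_log(weights, comp_log_liks):
    """Approximation: keep only the most likely weighted component.
    Always an upper bound, since max <= sum."""
    return -np.max(np.log(weights) + comp_log_liks)

w = np.array([0.7, 0.3])    # toy mixture weights
ll = np.array([-1.0, -5.0]) # toy per-component log-likelihoods
exact = mixture_neg_log(w, ll)
approx = approx_neg_log(w, ll)
```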
Approximating the body with capsules was mentioned earlier; to penalize interpenetration, the error computation is given here. The capsules are further approximated by spheres: C(θ, β) is the center coordinate of a sphere, r(β) its radius, and σ(β) equals r(β)/3. I(i) is the set of spheres incompatible with sphere i. The penalty compares the distance between the centers of two spheres against their radii; comparing the two gives the positional relationship between the spheres, and thus penalizes interpenetration. To be honest, I don't quite understand the role of the 3D isotropic Gaussian here; further study is needed to fully understand this step.
[figure]
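One way to read the isotropic-Gaussian trick is sketched below: each sphere is treated as a 3D isotropic Gaussian with σ = r/3, and a pair of incompatible spheres is penalized by the (unnormalized) overlap of their Gaussians, which is near 1 when they interpenetrate and near 0 when they are far apart. This is my interpretation as a sketch, not the paper's exact formula:

```python
import numpy as np

def sphere_penalty(c_i, c_j, r_i, r_j):
    """Overlap-style penalty for two spheres with centers c_*,
    radii r_*, each viewed as a Gaussian with sigma = r / 3."""
    sigma_sq = (r_i / 3.0) ** 2 + (r_j / 3.0) ** 2
    dist_sq = np.sum((c_i - c_j) ** 2)
    return np.exp(-dist_sq / sigma_sq)

far = sphere_penalty(np.zeros(3), np.array([10.0, 0.0, 0.0]), 1.0, 1.0)
overlapping = sphere_penalty(np.zeros(3), np.array([0.1, 0.0, 0.0]), 1.0, 1.0)
```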
Finally, the authors define a shape error term to measure the error of the body shape. PCA (Principal Component Analysis, which removes the correlation between variables) is used to obtain this error term.
[figure]


Because the shooting angle is uncertain, the input portrait may not face the camera plane, so we need to introduce γ (the camera translation) to further optimize the model. First, we can easily get the approximate focal length of the camera from the portrait, then use similar triangles to estimate the approximate depth of the body (from the ratio of the 3D model's torso length to the 2D torso length). Then, before the formal optimization, Ej is first used to optimize γ alone.
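The similar-triangles depth initialization can be sketched in one line: under a pinhole camera, pixel_length = f * real_length / depth, so depth = f * real_length / pixel_length, measured between the same pair of torso joints in 3D and in the 2D detection (the function name and all numbers below are made up for illustration):

```python
def estimate_depth(focal_px, torso_3d_m, torso_2d_px):
    """Initial camera depth from the torso-length ratio."""
    return focal_px * torso_3d_m / torso_2d_px

# made-up numbers: 0.5 m torso, 200 px in the image, f = 2000 px
depth = estimate_depth(2000.0, 0.5, 200.0)
```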

Next, the conversion of the 2D portrait into a 3D model can formally begin. In this process, the authors found that giving λθ and λβ high initial values and letting them decay gradually effectively prevents the result from falling into a local optimum.

At the beginning, we cannot know the orientation of the body. So when the shoulder distance among the 2D joints is below a certain threshold, we can conclude that the current portrait is not facing the camera; we then try several orientations of the body and select the better angle.

Finally, Powell's dogleg method yields an optimal set of 3D joints; the whole process takes only about 1 minute.


Since in practice a 2D picture rarely comes with a ground-truth 3D model, the authors evaluate with both synthetic data and real data.

In the synthetic-data part, the authors obtain 2D joints from a series of SMPL models and add random Gaussian noise to them, then take this synthetic data as input and evaluate the model with the method above. In addition, the authors also assume a known body pose, and on this basis try to represent the body shape with as few joints as possible. The results of this part are shown below:

[figure]

In the real-data part, the authors compare against other models on the HumanEva-I, Human3.6M, and Leeds Sports datasets. A series of numbers show that SMPLify is better than several other state-of-the-art models; see the data below.

[figure]
[figure]

Meanwhile, the authors also show experimentally that the multi-modal pose prior and the interpenetration error term significantly improve the model's performance.

Next, some visual results are attached to demonstrate the advantages of SMPLify. We can see that SMPLify is markedly better than other models not only on simple poses, but also gives a good performance on complex athletic poses.

[figure] [figure]

SMPLify also has failure cases: when processing the Leeds Sports dataset, it sometimes produces wrong models, so we can briefly summarize the failures here. They include: limbs at different depths overlapping; different people overlapping in the absence of depth information; and failing to determine the orientation of the person correctly. Of these, I think the first and second failure cases are hard to solve; in contrast, the third could be handled, when the 2D shoulder distance is below the threshold, by trying several groups of possible body orientation angles rather than only a 180° step.

[figure]


Overall, SMPLify can obtain good 3D modeling results from simple 2D image input alone. Actually, rather than thinking about how to improve SMPLify, I should think about how to apply it in my own project. In my opinion, the biggest problem with current image-based virtual try-on networks is that they cannot handle complex poses and large deformations; moreover, in terms of results, the output is just a simple style transfer and fails to retain some of the three-dimensional information of the clothing. If we feed the output of SMPLify into an image-based virtual try-on network as depth information, we may get better results (this step can draw on some existing fitting systems that require depth information; those systems usually work well but are impractical, and with SMPLify it may be possible to turn a problem requiring depth measurement into a 2D image-processing problem).



Reference for downloading and installing the CPM and SMPL models

Note that there are two SMPL code packages: one is the original model code, the other is code for using SMPL. Don't confuse the two. The specific usage and installation process is explained very clearly in the latter's FAQ, and can be adjusted slightly to your specific situation.

Analyzing CPM

models contains the original CPM model, test_imgs holds the input pictures, utils contains utilities, and everything else is demo files. Without special requirements we can directly use the demo to get the joint locations we want (without confidences), as a 14×2 list.

Analyzing SMPL

Folder structure:
[figure]
Under smplify_public:

1. requirements lists the Python libraries to install in advance. For the opendr library you can first run `sudo apt-get install libosmesa6-dev`, then `pip install opendr==0.78`. In my own test it probably won't install under Python 3; you can port it yourself or search online for a Python 3 version of the SMPL code.

2. venv is an independent virtualenv built according to the hints in the README; I installed everything directly in a conda virtual environment instead.

3. images links to the original pictures.

4. results actually stores the parameters that need to be set before running:

est_joints.npz stores the body joint information, i.e., the joint positions we obtained with the CPM model.

lsp_gender.csv specifies the gender of the input pictures, so that the body model parameters of the right gender can be loaded automatically at runtime.

meshes.hdf5 holds original model parameters; we can ignore it for now, and it doesn't matter whether it is present.

5. code contains the running code, including the main program and the script that renders the model; the models folder stores the SMPL models.

Getting SMPL model results for any picture

1. Put kuli.jpg under test_imgs, and change the image path in the demo script to the path of the input image.

Run the program to get the picture annotated with joints plus the joint information; we save the joint information as a .npy file for later use.

The joint order of the CPM result is (head, neck, left shoulder, left elbow, left wrist, right shoulder, right elbow, right wrist, left hip, left knee, left ankle, right hip, right knee, right ankle).

2. The input order SMPL requires is (right ankle, right knee, right hip, left hip, left knee, left ankle, right wrist, right elbow, right shoulder, left shoulder, left elbow, left wrist, neck, head top). The joint information should be stored in /smplify_public/results/est_joints.npz. We rewrite the data above in this order and replace est_joints.npz, revise the gender file lsp_gender.csv, and finally put our picture into the images folder link.

Because the official CPM uses three model files downloaded from Google Drive, I used an unofficial code base to obtain est_joints.npy. The joint order produced by the code I used happens to match the smplify input order, so no reordering was needed.
The best weights inside can be obtained from Taobao customer service.

3. Use the obtained .npy to replace the first two rows of est_joints.npz; this minimizes changes to the source code.
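If your CPM output does need reordering, the sketch below shows the permutation implied by the two joint orderings listed above (the index list and names are derived by me from those orderings, so verify them against your own output):

```python
import numpy as np

# CPM order:  head, neck, L shoulder, L elbow, L wrist, R shoulder,
#             R elbow, R wrist, L hip, L knee, L ankle, R hip,
#             R knee, R ankle
# LSP order:  R ankle, R knee, R hip, L hip, L knee, L ankle,
#             R wrist, R elbow, R shoulder, L shoulder, L elbow,
#             L wrist, neck, head top
CPM_TO_LSP = [13, 12, 11, 8, 9, 10, 7, 6, 5, 2, 3, 4, 1, 0]

def reorder_cpm_to_lsp(joints_cpm):
    """joints_cpm: (14, 2) array of (x, y) in CPM order; returns
    the same joints rearranged into LSP/SMPLify order."""
    return joints_cpm[CPM_TO_LSP]

cpm = np.arange(28).reshape(14, 2)  # dummy joints, row i = joint i
lsp = reorder_cpm_to_lsp(cpm)
```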

Reading the official est_joints.npz:

import numpy as np

npz = np.load('est_joints.npz')  # a numpy.lib.npyio.NpzFile object
print(type(npz))
print(npz.files)                 # the npz 'keys': ['est_joints']
ndarray = npz['est_joints']
print(ndarray)                   # the 'value' is a multi-dimensional array
print(type(ndarray))
print(ndarray.shape)             # the array dimensions are (28, 14, 2000)

The conversion code is:

import numpy as np

array = np.load('est_joints.npy').T                # transpose to 2 x 14
ndarray = np.load('est_joints.npz')['est_joints']  # shape (28, 14, 2000)
ndarray[:2, :, 0] = array                          # replace the first two rows (x, y) for image 0
np.savez('last.npz', est_joints=ndarray)

4. Run it and get the results.

[figure]
The dictionary output next is the value returned by the run_single_fit() function in the SMPLify code. The return is a dict variable including the following keys: cam_t, f, pose, betas.
[figure]
cam_t is the camera translation; f is the camera focal length; pose holds the SMPL model's pose parameters; betas holds the SMPL model's shape parameters. If we only want to do the virtual try-on in 3D, we don't need to pay attention to cam_t and f; if we want to paste the try-on result back onto the original image after the 3D try-on, we need cam_t and f at the end to solve for the camera.

When we want to restore the corresponding SMPL model from pose and betas, we call SMPL's load_model() function to read the SMPL template model of the corresponding gender, modify the model's pose and betas parameters to the values obtained earlier, and finally output an .obj file. (I read this step in someone else's write-up, but I don't know what it is for; there is already a corresponding 2D image and .pkl, so why would one want an .obj?)
[figure]
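One plausible reason for the .obj step is that generic 3D viewers and downstream tools consume .obj, not the fit's .pkl. A minimal sketch of the export is below; in the real pipeline the vertices and faces would come from the loaded SMPL model after setting its pose and betas, while here a toy one-triangle mesh stands in (write_obj and the file name are mine):

```python
import numpy as np

def write_obj(path, vertices, faces):
    """Write a Wavefront .obj: one 'v x y z' line per vertex, one
    'f i j k' line per triangle (.obj indices are 1-based)."""
    with open(path, 'w') as f:
        for v in vertices:
            f.write('v %f %f %f\n' % (v[0], v[1], v[2]))
        for face in faces + 1:  # shift 0-based indices to 1-based
            f.write('f %d %d %d\n' % (face[0], face[1], face[2]))

# toy mesh: a single triangle
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])
write_obj('triangle.obj', verts, faces)
```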
