
[Python] Word2Vec predicts what will be left if I subtract 'love' from my 'life'

2022-08-05 22:41:58 Front-end fairy

The German poet Schiller said: "A life without the radiance of love is worthless."

Not just Schiller: throughout human history, how many people have discussed the relationship between "life" and "love"?

Here, while paying tribute to the work of these predecessors, we use a modern technique, Word2Vec, to ask:

If you subtract 'love' from 'life', what is left?

Word2Vec

Install the library

We use gensim

!pip install gensim

Download the pre-trained Word2Vec model

Download the model here: http://bit.ly/2srnKoy Unzip it and use the file ja.bin

Load the model

import gensim
model = gensim.models.Word2Vec.load('/content/ja.bin')

Find words similar to "人生" (life)

First, we use Word2Vec's distributed representation to look at words similar to "life"

for item, value in model.wv.most_similar('人生'):
    print(item,value)

Output

Somehow, this seems appropriate

心境 0.5031609535217285
命运 0.4962955415248871
幸福 0.4731469750404358
我 0.45822054147720337
老年 0.44594892859458923
书 0.4451371133327484
半生 0.4450538754463196
思索 0.4410485029220581
永远 0.43776482343673706
一生 0.43756037950515747
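
The number next to each word is a cosine similarity between word vectors, which is how gensim's most_similar ranks neighbours. As a rough sketch of what that score means, here is the same calculation on made-up 3-dimensional vectors. The toy values and the helper names cosine_similarity and rank_neighbours are assumptions for illustration, not data or functions from ja.bin or gensim.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, NOT taken from the real model
toy_vectors = {
    "life": [0.9, 0.1, 0.3],
    "fate": [0.8, 0.2, 0.4],
    "stone": [-0.2, 0.9, -0.1],
}

def rank_neighbours(word, vectors):
    """Rank every other word by cosine similarity to `word`, best first."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

for other, score in rank_neighbours("life", toy_vectors):
    print(other, round(score, 4))
```

A score near 1 means the two vectors point in almost the same direction, a score near 0 means they are unrelated, and a negative score means they point in opposite directions.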

Subtract "love" from "life"

Let's actually try some word vector arithmetic with Word2Vec

The code is as follows

model.wv.most_similar(positive=['人生'],negative=['爱'])

Output

First place is "事业" (career), second place is "金钱" (money), and so on

But I'm not sure what "大联盟" (major league) in ninth place means

[('事业', 0.3442475199699402),
 ('金钱', 0.33152854442596436),
 ('机会', 0.2999703586101532),
 ('老年', 0.29941919445991516),
 ('沙利文', 0.29740941524505615),
 ('困难', 0.2900192439556122),
 ('成果', 0.286318302154541),
 ('理论', 0.2787232995033264),
 ('大联盟', 0.2783268094062805),
 ('压力', 0.27150285243988037)]
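
For readers wondering what the positive and negative arguments do: as far as I understand it, gensim normalizes each input vector to unit length, adds the positive ones, subtracts the negative ones, and then ranks every other word by cosine similarity to the result, leaving out the input words themselves. Below is a minimal pure-Python sketch of that idea. The function most_similar_sketch and the toy vectors are hypothetical and only meant to show the arithmetic, not to reproduce gensim's implementation.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def most_similar_sketch(vectors, positive, negative, topn=3):
    """Rank words by cosine similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in positive:
        for i, x in enumerate(normalize(vectors[word])):
            query[i] += x
    for word in negative:
        for i, x in enumerate(normalize(vectors[word])):
            query[i] -= x
    query = normalize(query)

    inputs = set(positive) | set(negative)  # input words are excluded
    scored = [
        (word, sum(a * b for a, b in zip(query, normalize(vec))))
        for word, vec in vectors.items()
        if word not in inputs
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical toy vectors: axes are roughly (romance, work, other)
toy_vectors = {
    "life": [0.7, 0.7, 0.1],
    "love": [1.0, 0.0, 0.1],
    "career": [0.1, 1.0, 0.0],
    "money": [0.0, 0.8, 0.3],
    "romance": [0.9, 0.1, 0.2],
}

result = most_similar_sketch(toy_vectors, positive=["life"], negative=["love"])
for word, score in result:
    print(word, round(score, 4))
```

In this toy setup, removing the "romance" direction from "life" leaves mostly the "work" direction, so the work-related words rank first. That is the same kind of effect the real model shows above.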

Try the classic "king" + "woman" - "man"

model.wv.most_similar(positive=['王','女'],negative=['男'])

Output

Here the top result is "公主" (princess)

[('公主', 0.4609297811985016),
 ('王位', 0.4463587999343872),
 ('女王', 0.440536767244339),
 ('王室', 0.4360467791557312),
 ('父王', 0.4177766740322113),
 ('王妃', 0.4131211042404175),
 ('唐朝', 0.410072922706604),
 ('国师', 0.4071836471557617),
 ('李世明', 0.404167115688324),
 ('……', 0.40059027075767517)]

Compute with fastText as well

Download the pre-trained fastText model

Load the model

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('/content/model.vec',binary=False)
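
With binary=False, load_word2vec_format expects the plain-text word2vec layout: a header line with the vocabulary size and the vector dimension, then one line per word with the word followed by its components. To make that layout concrete, here is a small pure-Python reader run on a made-up two-word file. The function read_vec_text and the sample values are hypothetical; for a real model.vec, gensim's loader is the one to use.

```python
import io

def read_vec_text(handle):
    """Parse word2vec text format into a {word: [floats]} dict."""
    header = handle.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])

    vectors = {}
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        # rstrip() also drops the trailing space some files leave on each line
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} components, got {len(parts) - 1}")
        vectors[parts[0]] = [float(x) for x in parts[1:]]

    if len(vectors) != vocab_size:
        raise ValueError("header vocabulary size does not match the file")
    return vectors, dim

# Made-up sample content, not taken from the real model.vec
sample = io.StringIO("2 3\nlife 0.1 0.2 0.3\nlove 0.4 0.5 0.6 \n")
vectors, dim = read_vec_text(sample)
print(dim, sorted(vectors))
```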

Find words similar to "人生" (life)

Before that, get the feature vector of "人生"

It has 300 dimensions. By the way, the Word2Vec model has 100 dimensions

word = '人生'
print(f'Feature vector of {word}')
print(model[word])
model[word].shape

Output:

(300,)

print(f'Words similar to {word}')
for item, value in model.most_similar(word):
    print(item, value)

Output. The results are more specific than Word2Vec's

Words similar to 人生
生活方式 0.643797755241394
居住 0.5911297798156738
生涯 0.5900983810424805
一生 0.5889523029327393
人生观 0.5824947953224182
活着 0.5823161602020264
幸福 0.574641227722168
youth generation 0.5740978717803955
婚后生活 0.5648994445800781
记忆 0.5635090470314026

Subtract "love" from "life"

model.most_similar(positive=['人生'],negative=['爱'])

Output

The results are too specific, just as with the similar words above

A model as good as fastText does not seem well suited to such broad themes

[('回顾', 0.34721365571022034),
 ('Athlete's life', 0.32623374462127686),
 ('职业生活', 0.3085706830024719),
 ('职业', 0.30444127321243286),
 ('life plan', 0.2932748794555664),
 ('职业生涯', 0.2895069122314453),
 ('生活', 0.283188134431839),
 ('转折点', 0.2803061604499817),
 ('回顾', 0.27186983823776245),
 ('重新开始', 0.2653588056564331)]

Conclusion

Schiller said that a life without the radiance of love is worthless, but judging from these results, it does not seem all that worthless


