18 Word embeddings with fastmath.vector
- DRAFT
authors: Nedeljko Radovanovic, Epidiah Ravachol, Daniel Slutsky
One of the uses of linear algebra is the embedding of texts (e.g., tokens, words, sentences, or chunks of text) into high-dimensional vector spaces.
Even in relatively simple embedding methods like Word2Vec, linear operations on the vectors (e.g., addition, subtraction) can reflect relationships between the meanings of the embedded texts.
18.1 Setup
(ns noj-book.fastmath-vector-word-embeddings
  (:require [clojure.string :as str]
            [tablecloth.api :as tc]
            [fastmath.vector :as vec]
            [scicloj.tableplot.v1.plotly :as plotly]
            [scicloj.kindly.v4.kind :as kind]))
18.2 Data file
We will look into a few example vectors generated by Word2Vec. The original file wiki-news-300d-1M.vec.zip was downloaded from the fasttext website. In the Unix shell, we can process it as follows to generate examples.vec:
zcat wiki-news-300d-1M.vec.zip | awk '$1=="female" || $1=="male" || $1=="queen" || $1=="king" || $1=="programming" || $1=="data" || $1=="bike" || $1=="bycicle"' > examples.vec
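The same filtering step can be sketched in Clojure. This is only an illustration and not part of the original pipeline: `wanted-tokens` and `keep-wanted-lines` are hypothetical names, and the input is assumed to be already-decompressed lines in the `token v1 v2 ...` format.

```clojure
(require '[clojure.string :as str])

;; Hypothetical names; the tokens mirror the awk filter above.
(def wanted-tokens
  #{"female" "male" "queen" "king" "programming" "data" "bike" "bycicle"})

(defn keep-wanted-lines
  "Keeps the lines whose first space-separated field is in `tokens`."
  [tokens lines]
  (filter (fn [line]
            (contains? tokens (first (str/split line #" "))))
          lines))

;; Toy input with made-up numbers, standing in for the real file.
(keep-wanted-lines wanted-tokens
                   ["king 0.1 0.2"
                    "kingdom 0.3 0.4"
                    "data 0.5 0.6"])
;; => ("king 0.1 0.2" "data 0.5 0.6")
```

As in the awk version, the match is on the whole first field, so `kingdom` is not kept just because it starts with `king`.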
(def examples-path "data/word2vec/examples.vec")

(kind/code (slurp examples-path))
0.0126 -0.0550 -0.0673 0.0531 0.0174 -0.0873 -0.0915 -0.1084 0.0638 -0.1266 0.1288 -0.1521 0.0414 0.0487 -0.1098 0.0475 0.0426 -0.1246 -0.1432 -0.0071 -0.0960 -0.0917 0.0393 0.0950 0.0114 -0.1023 -0.0695 -0.0641 -0.0265 0.0686 -0.1383 0.0106 0.1454 -0.0027 -0.0860 0.0948 0.1722 -0.0036 0.0301 -0.0527 0.0131 -0.1335 0.0675 0.1889 0.1442 -0.0038 0.0272 0.0707 0.0344 0.1593 0.0693 0.1345 -0.6185 -0.0433 0.0814 0.0601 -0.0173 -0.0063 -0.1585 0.0347 0.1010 0.1105 -0.0402 0.0807 -0.1082 -0.1269 0.0427 0.0505 -0.0171 0.0146 -0.2346 0.1206 -0.2111 0.0038 0.0396 -0.2016 0.1229 -0.0082 0.0068 -0.0157 0.0131 -0.1301 -0.0496 -0.2086 -0.0867 -0.0431 -0.0190 0.0439 0.2003 -0.0152 -0.0604 -0.0464 -0.0262 -0.0866 0.0103 -0.0961 0.0389 -0.2485 0.0551 0.0289 -0.1506 0.0000 -0.0063 0.0720 0.1091 0.0260 -0.1382 -0.0238 0.0308 0.0540 -0.0454 0.0231 0.0251 -0.1297 0.0104 -0.0128 -0.0184 -0.0386 -0.0788 -0.2565 0.0681 -0.0123 -0.1096 0.0779 -0.0328 0.1767 -0.1053 -0.0072 0.0207 -0.0320 -0.0518 -0.0530 0.1460 -0.0242 -0.0182 -0.0444 0.0988 0.0273 0.1687 -0.0212 -0.0752 -0.2203 -0.1497 0.1832 -0.0089 -0.0028 0.0450 0.0967 -0.0494 -0.0620 -0.0833 0.0100 -0.0099 0.0496 -0.0173 0.0270 -0.0317 -0.0298 0.0178 0.0116 -0.1266 0.0893 -0.0056 -0.0707 0.0680 0.0169 0.0916 0.0162 -0.0708 0.0578 -0.0099 0.0229 -0.0441 0.0020 0.0162 -0.0481 0.2272 -0.1124 0.0480 -0.0027 -0.0517 -0.0157 0.0676 -0.0845 -0.0502 -0.0627 0.0091 0.0294 -0.1341 0.0313 -0.0102 0.0505 0.0626 -0.0876 -0.0579 -0.1177 -0.1129 0.1468 0.0726 -0.0222 0.1514 0.0637 -0.0264 0.0147 -0.0741 -0.0696 -0.0485 -0.1046 0.0585 -0.0353 0.0113 0.0756 -0.0150 -0.1605 0.0799 -0.0216 -0.1031 0.0089 0.0612 -0.1007 -0.0250 -0.0576 -0.0061 -0.0604 0.0023 -0.1273 -0.0854 -0.1441 0.1100 -0.1146 -0.1407 -0.2474 0.3389 -0.0091 -0.1732 -0.1272 -0.0144 -0.0542 -0.2204 0.0775 0.0383 0.0025 -0.0094 0.0965 -0.0349 0.0170 -0.1757 -0.1984 -0.0374 0.2955 0.1078 -0.1025 -0.1133 -0.0670 -0.0853 -0.0849 0.0215 -0.1129 0.0055 -0.0003 0.0255 -0.0878 0.0130 0.0297 
-0.5079 -0.0188 -0.0178 -0.0306 -0.0696 -0.0902 -0.0927 0.1374 -0.0839 -0.1728 -0.0036 -0.0712 -0.0552 -0.0821 -0.0222 0.0479 -0.0099 0.0135 -0.0637 0.0678 0.0195 -0.0050 0.0694 0.0162 0.0194 -0.0111 -0.1467 0.0217 -0.0831 0.0541 -0.1758 0.0777 0.0445 0.0739 -0.0811 -0.0906
data 0.1112 0.0432 -0.0964 -0.0237 0.0387 0.0970 0.0167 0.0890 -0.0142 0.1210 0.1061 0.0269 0.0222 0.1563 -0.1560 0.0186 0.0724 -0.0241 -0.1104 -0.1491 -0.0936 0.0175 -0.2147 0.0636 0.0279 -0.0120 0.0281 0.3057 0.0137 -0.0995 -0.1151 0.0094 -0.1090 0.0559 0.0513 0.0196 0.1276 0.0726 -0.0841 -0.1335 0.0369 -0.0996 0.0970 -0.0427 -0.0068 0.0730 0.1000 0.1326 0.2117 0.1045 0.0732 0.0431 -0.6515 0.0151 0.1114 0.0939 0.1296 0.0335 0.0569 -0.1170 -0.0874 -0.0174 -0.1944 -0.2514 -0.0745 0.0574 -0.0019 -0.1424 0.0199 0.0636 -0.1047 -0.0818 0.0451 0.2131 -0.0436 0.0425 -0.0238 -0.0427 0.0392 0.1230 0.0168 0.1850 -0.0671 -0.2412 0.0584 0.0133 -0.0195 0.1021 0.2557 -0.0595 -0.1774 -0.0907 -0.0179 0.0462 -0.1483 0.0087 0.1157 0.0516 -0.0446 0.0938 -0.1701 0.0551 -0.0689 0.0646 -0.0237 0.0697 0.1639 -0.0932 -0.0365 0.0600 -0.1238 0.1072 0.0036 0.0938 -0.0215 -0.0871 -0.2737 -0.1373 -0.1633 -0.3257 -0.0814 -0.0620 0.0515 0.0863 0.0207 0.1987 0.0498 -0.0402 0.0915 -0.0795 -0.0542 -0.0154 0.0157 0.0176 0.1011 -0.1472 0.0863 0.1557 -0.2571 0.0120 0.0188 -0.0347 -0.1078 0.2326 -0.0959 0.0749 -0.0564 -0.0757 0.0968 -0.1285 -0.1172 -0.0952 -0.0973 -0.0785 -0.1326 0.0347 0.0506 0.0143 -0.0562 0.1371 -0.0734 -0.0134 -0.0142 0.0502 0.0046 0.1104 0.1076 0.1493 0.1432 -0.0680 -0.0721 -0.0342 -0.2574 0.0514 0.1051 -0.0183 0.3497 -0.2231 -0.0668 -0.0263 0.0940 0.0747 0.0955 0.0821 0.0995 0.0318 0.0942 -0.0174 -0.0143 0.0264 0.0099 -0.1418 -0.1924 0.0577 0.1048 0.1517 0.0067 -0.0583 0.1493 0.0767 0.0557 0.0720 -0.0110 -0.0983 0.0799 -0.0861 0.0149 -0.1145 0.0281 0.0263 -0.0308 -0.1280 0.1357 0.0265 -0.0876 -0.0531 0.0507 -0.0503 -0.0499 -0.0850 -0.0032 -0.0336 -0.2112 -0.0314 -0.0662 -0.0375 -0.0578 -0.0182 -0.1818 -0.1703 0.1611 0.0766 0.3763 0.0824 -0.0203 -0.1652 -0.0009 0.0164 -0.2516 -0.0630 -0.0603 0.0189 0.0769 0.0139 0.1244 -0.0657 -0.0925 0.0704 -0.1355 0.4045 0.1061 0.1115 -0.0761 -0.1032 0.2797 0.0013 -0.0542 0.1308 0.0919 -0.1203 0.0181 0.1123 -0.0334 -0.0332 -0.1409 -0.0700 
0.0075 0.1405 -0.1815 0.0175 -0.1256 0.0120 0.1507 0.1173 0.1972 0.0233 -0.0252 0.0549 -0.0009 0.1581 0.0253 0.1056 -0.0588 0.0802 0.0166 -0.0084 -0.0418 -0.0334 -0.0061 0.0845 0.0630 -0.1798 -0.0491 -0.0105 0.0732 0.0246 0.0174 -0.0146 0.0164 -0.0575
female 0.0901 0.0282 -0.0575 -0.0899 0.0772 0.0704 0.0247 -0.0135 0.0299 0.1597 0.0767 0.0114 0.0447 0.1863 -0.2076 -0.0280 0.1065 0.0439 -0.0709 -0.1821 -0.1032 0.0306 -0.2065 0.0451 -0.0054 0.0382 0.0283 0.3110 -0.0226 -0.0823 -0.0503 -0.0084 -0.1245 0.1011 0.1611 0.0509 0.0884 0.0311 -0.1032 -0.1455 0.0646 -0.0283 0.0933 -0.0447 -0.0332 0.0562 0.0460 0.0506 0.1736 0.1000 0.1114 -0.0022 -0.6759 0.0381 0.1534 0.1290 0.1408 0.0232 0.1299 -0.1372 -0.1065 -0.0047 -0.1956 -0.2115 -0.0788 0.0468 -0.0254 -0.1852 0.0313 0.0057 -0.1366 -0.0827 -0.0151 0.2316 -0.0800 0.0341 -0.0562 -0.0438 0.0468 0.0988 -0.0119 0.1269 -0.0912 -0.2513 0.0337 0.0470 -0.0004 0.1341 0.2552 -0.0686 -0.1901 -0.1649 -0.0069 0.0163 -0.1598 0.0658 0.1829 0.0854 -0.0018 0.0933 -0.2235 0.0534 -0.0968 -0.0249 -0.0549 0.0263 0.1341 -0.1359 -0.0664 -0.0060 -0.1143 0.0661 0.0215 0.0436 -0.0096 -0.0867 -0.1526 -0.1637 -0.1411 -0.2793 -0.1574 -0.0908 0.0858 0.1029 0.0254 0.2361 0.0727 -0.0655 0.0676 -0.0717 -0.0164 0.0045 0.0073 0.0042 0.1269 -0.1143 0.0874 0.1255 -0.2168 0.0280 -0.0128 -0.0917 -0.0601 0.2201 -0.0873 0.0102 -0.0123 -0.0778 0.0968 -0.1024 -0.0972 -0.1344 -0.1468 0.0051 -0.1332 0.0686 0.0653 -0.0123 -0.0524 0.1382 -0.0262 0.0014 -0.0195 0.0364 -0.0155 0.0467 0.1129 0.1544 0.1570 -0.0482 -0.0723 -0.0317 -0.2590 0.0280 0.1047 0.0382 0.3744 -0.2217 -0.0029 0.0126 0.1157 0.0788 0.1643 0.0827 0.1140 0.0506 0.0721 -0.0122 -0.0438 0.0148 0.0443 -0.0892 -0.1826 0.0469 0.1134 0.1606 -0.0180 -0.1774 0.1163 0.0629 0.0714 0.1610 0.0075 -0.1264 0.0718 -0.0975 -0.0114 -0.2015 0.1128 0.0910 -0.0481 -0.1333 0.1114 -0.0024 -0.0400 -0.0916 0.0326 -0.0351 -0.0249 -0.1923 0.0544 0.0073 -0.1410 -0.0496 0.0089 -0.0070 -0.0817 -0.0400 -0.1368 -0.1491 0.1357 0.0376 0.3391 0.1154 0.0597 -0.1389 -0.0361 0.0317 -0.2601 -0.0243 -0.0286 0.0127 0.0919 0.0510 0.1409 -0.0921 -0.1320 -0.0117 -0.0959 0.4352 0.0964 0.0839 0.0022 -0.1654 0.2634 0.0527 -0.0157 0.1791 0.0656 -0.1044 0.0060 0.1172 -0.0220 0.0226 -0.1698 -0.0873 
-0.0196 0.1079 -0.1598 -0.0314 -0.0857 -0.0093 0.1674 0.0472 0.2260 0.0501 0.0044 0.0149 -0.0619 0.1204 -0.0178 0.0558 -0.0299 0.1154 0.0183 -0.0678 -0.0109 -0.0656 0.0487 -0.0068 0.0769 -0.1641 -0.0365 0.0438 0.0535 0.0092 -0.0178 -0.0141 0.0417 -0.0335
male 0.1082 0.0445 -0.0384 0.0011 -0.0888 0.0713 -0.0696 -0.0477 0.0071 -0.0408 -0.0707 -0.0266 0.0500 -0.0824 0.0848 -0.1627 -0.0851 -0.0295 0.1534 -0.1828 -0.2208 0.0243 -0.0921 -0.1089 -0.1009 -0.0119 0.0377 0.2038 0.0720 0.0202 0.2798 0.0115 -0.0151 0.1037 0.0004 -0.0104 0.0196 0.1265 0.0828 -0.1369 0.1070 0.1270 -0.0349 -0.0683 -0.0114 0.0337 0.0126 0.0792 0.0440 -0.0253 0.0489 -0.0785 -0.6259 -0.0972 0.1654 -0.0578 -0.0437 0.0409 -0.0182 -0.1891 0.0277 -0.0146 -0.0531 0.0426 0.0049 0.0040 0.1423 -0.0975 -0.0035 0.0963 -0.0019 -0.1466 -0.1662 0.0665 -0.1500 -0.1267 0.0267 -0.1560 -0.1442 0.1515 0.0242 -0.0608 0.0918 -0.2407 -0.0411 -0.0142 0.0655 -0.0359 0.1459 0.0940 0.0159 0.0638 -0.1077 -0.0517 -0.0137 0.0512 -0.0275 -0.0507 0.0069 0.0366 -0.1529 -0.1813 0.0339 -0.0851 -0.0540 0.1180 0.1039 0.0619 -0.0235 -0.0115 0.1648 0.0936 -0.0050 -0.0979 -0.0589 -0.0721 -0.1586 0.0227 -0.0446 -0.3398 -0.0284 -0.2507 0.0451 -0.1226 0.0800 0.2365 0.0756 -0.0853 0.1157 0.0278 0.0710 -0.1314 -0.0463 0.0427 -0.0505 -0.0249 0.1182 0.0481 -0.1085 -0.0160 0.0039 -0.0386 0.1551 0.2695 0.0707 -0.0842 0.1167 0.0845 -0.0104 0.0206 0.0469 0.0057 0.0897 0.0723 0.0222 0.0727 0.0642 -0.0235 -0.0216 -0.0601 0.0537 -0.2842 -0.1047 0.1733 0.0021 -0.0105 0.1143 0.0215 0.0074 -0.0504 -0.0049 0.0119 -0.0270 0.0145 0.0967 0.0903 0.3145 0.1222 0.0985 0.2126 -0.1030 0.0793 -0.0787 -0.0593 0.0739 -0.0696 -0.0818 0.0320 -0.1808 0.0477 0.0825 -0.0127 0.1445 -0.0605 -0.0513 0.0945 -0.1030 0.0475 0.0982 0.2402 0.0086 -0.0241 -0.0332 0.0430 -0.0417 0.0199 -0.0528 -0.0630 0.0347 0.0580 -0.0260 0.1113 0.0989 -0.0038 -0.1272 -0.0979 0.0045 0.0061 -0.0398 -0.0085 -0.0035 -0.1191 -0.0949 0.0123 0.1705 -0.2065 0.0550 0.0453 0.0424 -0.0578 -0.0348 -0.0177 0.3437 -0.0659 0.0924 -0.1122 -0.1588 0.1068 -0.3029 0.0018 0.0317 0.1857 0.0360 0.0829 0.0224 0.0934 -0.0475 0.1719 0.0015 0.4849 -0.0228 -0.0902 0.0465 -0.1087 0.1374 0.0115 -0.1246 0.0509 0.1578 -0.1667 -0.0340 0.0469 0.0568 0.1599 -0.3915 0.0356 
0.0287 -0.2275 -0.1378 -0.0265 -0.1115 0.1804 0.0796 -0.0987 0.0905 0.3556 0.0240 0.0246 0.0283 0.0609 -0.0227 -0.0469 -0.0535 0.0440 0.1021 -0.1398 0.0537 -0.2549 0.0827 -0.1011 0.0047 -0.0712 0.1442 -0.0700 0.0123 0.0344 -0.0570 0.0158 0.0544 0.0256
king 0.0359 -0.0037 -0.1948 -0.0735 0.0015 -0.0710 -0.1257 0.1125 0.0897 0.0660 -0.0509 -0.3716 -0.2034 0.0939 -0.0720 0.0692 0.0237 -0.0361 -0.1972 0.0281 -0.2232 -0.0528 0.0424 0.2265 0.0166 0.0724 0.0802 -0.1738 -0.2274 -0.0525 -0.0611 0.0911 0.0761 0.0275 -0.0419 -0.0833 -0.1577 0.0221 -0.2224 -0.0523 -0.2824 -0.0302 0.0810 0.0970 -0.0961 -0.1698 -0.2428 -0.0725 0.0336 0.0494 0.0122 0.0043 -0.6289 -0.0247 0.2441 -0.1057 0.0227 0.0696 -0.0730 0.1186 -0.0014 0.1676 -0.0712 -0.1210 -0.0576 -0.1379 0.0085 -0.0647 -0.0216 -0.0170 0.1161 0.1908 -0.1452 -0.1284 0.0064 -0.0794 0.0100 -0.0017 -0.0728 0.0576 0.0673 -0.0141 0.1418 -0.1191 -0.0616 0.1597 -0.0236 -0.0193 0.2859 -0.0393 -0.0775 -0.1515 0.1166 -0.0337 0.1775 0.0909 -0.0348 0.0972 -0.1195 -0.1176 -0.2482 -0.0476 0.1101 0.0989 -0.0504 -0.1058 -0.1635 0.0742 0.0122 -0.0097 0.0164 0.1000 0.0046 0.1465 -0.0008 0.0060 0.0603 -0.0395 0.0370 -0.3175 0.1229 0.0047 -0.0874 0.1101 -0.0360 0.2256 0.0072 0.0273 -0.0108 0.0016 0.2252 -0.0218 0.0150 0.0881 -0.0248 0.0442 0.0412 0.0336 0.1097 0.1512 -0.0405 -0.0211 -0.0910 0.1968 0.0960 -0.0284 -0.1363 0.0892 0.1723 -0.1090 -0.0171 0.0229 0.0837 0.0619 -0.1038 0.1447 -0.0964 -0.0948 0.1135 0.0547 0.0676 -0.2286 -0.1102 0.0928 0.0469 -0.0479 0.1023 0.0307 0.0083 0.0118 -0.0373 -0.2943 -0.1947 0.0108 -0.0104 0.0165 0.2860 -0.4094 -0.0901 0.1753 -0.1135 0.1401 -0.0880 0.2030 0.1193 0.0509 -0.0009 -0.0247 -0.1799 -0.0147 0.0025 -0.1617 0.0566 -0.1111 -0.0640 0.0415 -0.0355 0.0813 0.1395 -0.2517 0.0849 -0.0003 0.0145 -0.1677 0.1349 0.0186 -0.0199 -0.0091 0.2043 -0.0226 0.0067 -0.0751 0.1411 0.1148 -0.0695 -0.0866 -0.0570 0.0138 -0.0470 -0.0745 -0.0476 0.0541 -0.0190 -0.0432 0.0276 -0.1622 0.1494 0.0817 0.0472 -0.0812 -0.0461 -0.0515 0.4082 -0.1834 0.0817 -0.1393 -0.0114 -0.0383 -0.2477 0.1634 -0.0488 -0.0357 -0.0792 -0.0503 0.1223 -0.0422 -0.2443 0.1944 -0.2514 0.4021 0.2150 -0.0239 -0.1626 0.0432 -0.0078 0.0095 0.0281 -0.0244 0.1063 0.3063 -0.2097 0.0720 0.0637 0.1183 -0.4195 
0.1138 -0.1958 -0.0277 -0.1635 0.0441 -0.2258 0.1115 -0.1499 -0.1678 0.0537 -0.0040 -0.0002 -0.0643 -0.1321 -0.1337 -0.0534 0.0745 0.0445 0.0171 -0.1798 0.0116 -0.0003 0.0356 0.1850 0.1647 0.1551 -0.0332 0.0786 -0.0915 -0.1599 0.0225 0.1578 0.0055 -0.0530 -0.1763
programming -0.2158 0.1095 -0.0499 0.0528 -0.0691 0.1357 -0.2257 -0.0401 -0.1270 0.0628 -0.0031 -0.0278 0.0962 -0.0509 0.1659 -0.1456 0.0043 -0.0858 0.0675 -0.1441 -0.1971 0.0238 -0.1019 0.0023 -0.1479 -0.0579 -0.0348 0.1964 0.1310 0.0026 0.1745 0.1163 -0.0067 0.0843 0.0498 -0.0916 -0.0876 0.0906 0.0348 -0.0006 0.1479 -0.0370 -0.0490 -0.1296 0.0063 0.1218 -0.0154 0.0408 -0.0499 0.0074 -0.0628 -0.1445 -0.6658 0.0405 0.1376 0.0919 0.0064 0.1542 0.0345 -0.1420 -0.0065 -0.0346 -0.1175 0.0170 0.0975 0.0143 0.1287 -0.1075 -0.0065 0.0312 -0.0693 -0.1160 -0.0655 -0.0168 -0.0913 -0.0935 -0.0625 -0.1310 -0.1675 0.1654 -0.0291 0.1045 0.1013 -0.2298 -0.0114 -0.0483 -0.0833 -0.0197 0.2074 0.0536 -0.0780 0.1643 -0.1019 -0.0931 -0.1721 0.1074 -0.1172 -0.1924 0.0593 0.2065 -0.1203 -0.0467 0.1084 0.0567 -0.0726 0.1413 0.0250 0.1973 -0.0504 -0.1155 0.1588 0.1433 -0.0268 0.0863 -0.0997 -0.0466 -0.3265 -0.0673 -0.2185 -0.3463 -0.0872 -0.2026 0.0909 -0.0537 0.0585 0.1235 0.0444 -0.0480 0.0677 -0.0741 0.0913 0.0058 -0.0550 -0.0142 0.0055 -0.0351 0.1426 -0.0439 -0.1415 -0.0103 -0.0261 -0.0491 0.1112 0.2555 -0.0204 0.0381 0.1636 0.0400 -0.0657 0.0045 -0.0749 -0.1928 -0.0147 -0.1681 0.0318 0.1770 0.1891 0.1022 -0.1247 0.1407 0.0687 -0.3527 -0.1691 0.1944 0.0327 0.0830 0.0782 0.0804 -0.0624 -0.0398 -0.0075 -0.0820 -0.0755 0.0504 0.1733 -0.0063 0.2813 0.0388 -0.0612 0.0538 -0.1038 0.0091 -0.1261 0.0584 -0.0394 -0.0677 0.0403 -0.0526 -0.1908 0.0883 -0.0173 -0.0609 -0.0514 0.0405 0.0013 0.0893 -0.0247 -0.0738 0.1093 0.2395 0.0624 -0.0682 -0.2574 0.0557 0.0258 0.1199 -0.0422 -0.0120 -0.1217 -0.0582 0.0242 0.0149 0.1039 0.0624 -0.1623 -0.0538 0.0108 -0.1172 0.0243 -0.0471 -0.0398 -0.1916 -0.1612 -0.0712 0.0630 -0.1812 0.0100 -0.0720 0.0633 -0.0304 0.0055 0.0877 0.3299 -0.1671 -0.0814 -0.1093 -0.0552 0.1108 -0.2203 -0.1218 -0.0576 0.1252 -0.0136 0.1349 0.1234 0.0827 -0.1832 0.1550 -0.1590 0.3917 0.0217 0.0120 0.0074 -0.3095 0.0760 0.0258 -0.0027 -0.1155 0.2152 -0.0023 -0.0116 0.0667 -0.0752 
0.0392 -0.3450 -0.0493 0.0098 -0.2498 -0.1739 -0.0746 -0.1962 0.2262 0.0944 0.0789 0.0607 0.3018 -0.0569 0.0931 0.0977 0.2114 0.0645 0.0111 -0.1061 -0.0148 0.1037 0.0244 0.0004 -0.1368 0.1000 -0.0398 0.0114 -0.1902 0.1368 -0.1466 0.1036 0.0302 -0.0502 0.0857 0.1020 0.0424
queen 0.0810 -0.2253 -0.1163 0.0197 0.0209 -0.0413 0.0436 0.0997 0.0560 -0.0246 0.1781 0.0432 0.1534 0.0806 -0.1477 0.1035 -0.0728 -0.0337 -0.0048 -0.2221 -0.4131 -0.0517 0.0399 0.2196 0.1548 -0.0436 -0.1628 0.0860 -0.0458 -0.1028 0.1007 -0.1345 -0.0455 -0.0651 0.0892 -0.0497 -0.0539 0.0041 0.2464 -0.0186 0.0625 -0.0287 -0.0048 -0.0507 -0.1289 -0.0004 -0.1949 0.1101 -0.0428 -0.1882 -0.1063 -0.0069 -0.7432 0.1824 0.0831 -0.0915 0.2549 0.0871 0.0861 -0.1181 -0.1358 -0.2465 0.0306 0.0810 0.1669 -0.0125 -0.2777 0.0038 0.0284 -0.1185 0.0384 -0.0335 0.0813 0.1343 0.1911 -0.0242 0.0940 0.0415 -0.0338 -0.0040 -0.0477 0.1179 0.0002 -0.1873 0.0810 -0.0236 0.0772 0.1164 0.2948 0.0084 0.1103 -0.1192 0.0906 -0.0972 0.0424 -0.0643 0.0060 -0.0268 -0.0688 -0.0004 -0.1834 -0.0512 -0.1533 0.0157 0.1139 0.0682 -0.0225 -0.0259 -0.0700 0.0394 -0.0560 0.0497 0.0945 0.0290 -0.0281 0.0125 0.0297 0.2400 -0.3715 -0.3374 0.0483 0.2656 -0.0917 0.0514 0.0615 0.1499 -0.2017 0.0575 -0.0497 -0.0489 0.1868 0.0960 0.1130 -0.1195 0.1372 0.0680 -0.0784 0.1335 -0.0209 0.0526 0.3480 -0.0454 -0.1771 0.1445 -0.2196 -0.1141 -0.1794 0.1143 -0.0518 -0.0095 -0.0783 0.1670 -0.0285 -0.0468 -0.0133 0.0330 0.1604 -0.1346 -0.1166 -0.2396 -0.0347 -0.0785 -0.0197 -0.1238 0.0507 -0.0390 0.0651 -0.0263 -0.0499 -0.0177 -0.0496 -0.1209 -0.0323 0.0742 0.0551 -0.1079 0.3316 -0.1530 0.0887 0.0178 0.0916 -0.0309 0.0242 0.0728 0.2129 -0.2269 -0.0139 -0.0125 -0.2455 -0.0181 0.1871 0.0278 0.1579 -0.0276 -0.0301 0.0962 -0.0010 0.1031 0.1132 -0.0832 -0.1566 -0.0071 -0.0597 0.1162 0.1935 -0.0980 -0.0227 -0.1021 0.0344 0.3255 0.0319 0.1030 -0.3150 0.0325 -0.1964 0.0540 -0.1663 0.2237 -0.0751 0.0325 0.0018 -0.0459 0.1056 -0.0252 0.1322 -0.2756 -0.1064 -0.0446 0.0724 -0.0672 0.0186 -0.1673 0.3247 -0.0553 -0.1012 -0.2524 -0.0016 -0.0228 -0.2791 0.0961 0.0723 0.0424 -0.2130 -0.0236 -0.0079 0.0142 0.1190 0.1112 0.0288 0.4431 0.0214 -0.0154 0.1291 -0.0605 -0.0977 0.1482 0.0686 0.0504 -0.1769 -0.0990 -0.1044 0.1028 0.0262 -0.0775 
-0.3471 -0.0060 -0.2241 -0.2071 -0.1881 -0.0508 0.1130 0.0528 -0.0422 0.0484 -0.0652 -0.0572 0.0201 -0.0875 0.2139 0.0088 -0.0389 -0.0652 0.0219 0.2281 -0.1969 -0.3254 -0.0447 -0.0524 -0.1444 -0.0207 -0.0238 -0.1919 -0.0826 0.2532 0.0881 0.0633 0.1004 -0.0157 0.1310 0.0288
bike -0.0960 -0.0068 -0.1629 0.0633 -0.0414 -0.0167 0.0154 0.0099 0.0120 0.0661 -0.0367 0.0500 0.1442 0.0191 -0.0196 0.0139 -0.0736 0.0612 -0.0673 -0.1114 -0.1433 -0.0034 0.0328 0.0833 0.1251 -0.0534 -0.0870 0.1138 0.0554 -0.0971 0.1502 -0.0907 -0.0410 -0.0994 0.0717 0.0598 -0.0233 -0.0516 0.0408 -0.0025 0.0282 0.1276 -0.0859 -0.0761 -0.1238 -0.0334 -0.0315 -0.0143 0.0136 -0.2172 0.0122 -0.1390 -0.6917 0.1568 -0.0568 -0.0336 0.0678 -0.0622 0.2034 -0.0944 0.0104 -0.1943 0.0214 0.0973 0.1279 -0.0777 -0.1309 0.0037 -0.0008 -0.0078 0.0057 -0.0568 0.1361 0.1177 0.0847 -0.0289 -0.0834 0.0539 -0.1107 -0.0545 0.0025 -0.0016 -0.0216 0.0703 0.0137 -0.0248 0.0444 -0.0028 0.0517 0.0275 0.1102 0.0400 0.0932 -0.1538 0.0692 0.0754 0.0166 -0.0332 -0.0936 -0.0163 0.0567 0.0059 -0.0888 0.0439 0.0437 0.0720 0.0645 0.0515 -0.1239 0.2025 -0.1326 -0.0259 0.1308 -0.0113 -0.0688 0.1003 -0.0370 0.0026 -0.1622 -0.1001 -0.0220 0.0863 -0.0911 0.0133 0.0964 -0.1038 -0.0284 0.0088 -0.1272 0.1947 0.0655 -0.0439 0.0068 0.0466 0.0142 -0.1387 0.1109 0.0647 -0.0795 -0.0815 0.1109 -0.0367 -0.1254 0.0304 -0.1510 -0.1378 -0.0365 0.0149 -0.0663 -0.0202 -0.1449 0.0522 -0.0705 -0.0234 0.0332 0.0225 0.1000 -0.0819 -0.0431 -0.0915 0.0523 -0.0328 0.0582 0.0227 0.0415 0.0641 -0.0893 0.0448 -0.0400 -0.0677 0.0457 -0.0323 -0.0118 0.0538 0.0072 -0.0167 0.1805 0.1293 -0.0197 -0.0329 0.0324 0.0194 0.0161 0.0190 0.0117 -0.1870 -0.0837 -0.0618 -0.0070 -0.0110 0.0219 0.0036 0.0721 0.0044 -0.0260 -0.0460 -0.0185 -0.0310 0.0052 0.0390 -0.0651 -0.0674 -0.0174 0.0614 0.1240 0.0054 -0.0011 -0.0663 0.0017 0.0947 0.0638 0.0499 -0.1399 0.0470 -0.0648 0.1360 -0.1301 0.0415 -0.0768 0.0109 -0.0785 -0.0836 -0.0606 -0.0745 0.0183 -0.0634 -0.1074 0.0154 0.0341 0.0614 0.1219 0.1184 0.0145 0.0318 -0.0524 -0.0348 -0.0432 -0.0153 0.0004 0.0668 -0.0297 0.0413 -0.0029 -0.1210 -0.1011 0.0358 0.0104 0.0711 -0.0957 0.0431 0.0193 -0.0484 0.2732 -0.1691 -0.1071 0.0044 -0.0498 0.0291 -0.1714 -0.0300 0.0173 0.0760 0.1074 -0.0268 -0.2761 0.0533 
-0.1987 -0.1531 -0.2648 0.0934 0.0870 0.0958 -0.0787 -0.0176 -0.0016 0.0347 -0.0305 -0.0447 0.0963 -0.0206 -0.0678 0.0044 0.0549 0.0898 -0.0791 -0.1175 -0.0384 -0.0122 -0.0813 0.0500 0.0032 -0.0892 -0.0644 0.0498 -0.0351 0.0632 -0.0163 0.0505 0.1975 0.0540 bycicle -
18.3 Reading the data
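;; Parse each line "token v1 v2 ..." into a [keyword double-array] pair
;; and collect the pairs into a map from token to vector.
(def embeddings
  (-> examples-path
      slurp
      (str/split #"\n")
      (->> (map (fn [line]
                  (let [[token & weights] (str/split line #" ")]
                    [(keyword token)
                     (double-array (map #(Double/parseDouble %) weights))])))
           (into {}))))

;; One column per token; summarize each 300-dimensional vector.
(-> embeddings
    tc/dataset
    tc/info)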
_unnamed: descriptive-stats [8 11]:
:col-name | :datatype | :n-valid | :n-missing | :min | :mean | :max | :standard-deviation | :skew | :first | :last |
---|---|---|---|---|---|---|---|---|---|---|
:data | :float64 | 300 | 0 | -0.6185 | -0.01805033 | 0.3389 | 0.10136687 | -0.87434210 | 0.0126 | -0.0906 |
:female | :float64 | 300 | 0 | -0.6515 | 0.00105400 | 0.4045 | 0.11530855 | -0.43782181 | 0.1112 | -0.0575 |
:male | :float64 | 300 | 0 | -0.6759 | 0.00077600 | 0.4352 | 0.11742106 | -0.40358429 | 0.0901 | -0.0335 |
:king | :float64 | 300 | 0 | -0.6259 | 0.00106200 | 0.4849 | 0.11550981 | -0.36335334 | 0.1082 | 0.0256 |
:programming | :float64 | 300 | 0 | -0.6289 | -0.01020667 | 0.4082 | 0.12922631 | -0.42322574 | -0.0359 | -0.1763 |
:queen | :float64 | 300 | 0 | -0.6658 | -0.00614200 | 0.3917 | 0.12458089 | -0.48650175 | 0.2158 | 0.0424 |
:bike | :float64 | 300 | 0 | -0.7432 | -0.00782567 | 0.4431 | 0.13633039 | -0.50522079 | -0.0810 | 0.0288 |
:bycicle | :float64 | 300 | 0 | -0.6917 | -0.00828233 | 0.2732 | 0.09174574 | -1.38555651 | -0.0960 | 0.0540 |
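To make the parsing step concrete, here is a small sketch on made-up three-dimensional lines (the real vectors have 300 dimensions). `parse-line` and `toy-embeddings` are hypothetical names; `parse-line` mirrors the anonymous function used in the reading code above.

```clojure
(require '[clojure.string :as str])

(defn parse-line
  "Turns \"token v1 v2 ...\" into a [keyword double-array] pair."
  [line]
  (let [[token & weights] (str/split line #" ")]
    [(keyword token)
     (double-array (map #(Double/parseDouble %) weights))]))

;; Made-up toy lines, not taken from the data file.
(def toy-embeddings
  (->> ["cat 0.5 -1.0 2.0"
        "dog 0.25 0.0 1.5"]
       (map parse-line)
       (into {})))

(vec (:cat toy-embeddings)) ; => [0.5 -1.0 2.0]
```

Each map value is a primitive double array, which is why it is wrapped in `vec` here to print its contents.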
18.4 Exploring distances
;; Compute the distance for every unordered pair of distinct tokens,
;; then sort the pairs from closest to farthest.
(-> (for [[token1 vec1] embeddings
          [token2 vec2] embeddings
          :when (pos? (compare token1 token2))]
      {:token1 token1
       :token2 token2
       :distance (vec/dist vec1 vec2)})
    tc/dataset
    (tc/drop-rows #(= (:token1 %) (:token2 %)))
    (tc/order-by [:distance])
    (kind/table {:use-datatables true}))
token1 | token2 | distance |
---|---|---|
male | female | 0.6693099730319279 |
queen | king | 1.4353056608262922 |
bycicle | bike | 1.8556491263167187 |
queen | female | 2.028264395979972 |
male | king | 2.107641141181297 |
queen | male | 2.1246356958311696 |
programming | data | 2.1484709004312808 |
king | female | 2.149412631394911 |
king | data | 2.167340349368321 |
king | bycicle | 2.194395171795637 |
data | bycicle | 2.20650210061083 |
female | data | 2.223698538471436 |
male | data | 2.23728921465241 |
female | bycicle | 2.283274904605224 |
male | bycicle | 2.299659507405388 |
queen | bycicle | 2.3271938445260627 |
queen | data | 2.3682638260970847 |
data | bike | 2.395225242017961 |
programming | female | 2.425492795289237 |
programming | male | 2.4395068354075167 |
programming | king | 2.468167603709278 |
king | bike | 2.527731047797609 |
male | bike | 2.528236628561496 |
female | bike | 2.5357556447733676 |
programming | bycicle | 2.600162393005483 |
queen | programming | 2.625867106309837 |
programming | bike | 2.689235856149474 |
queen | bike | 2.744270352935367 |
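The distances above are assumed here to be plain Euclidean distances, which is what `vec/dist` is understood to compute. A minimal plain-Clojure sketch of that formula follows; `euclidean-dist` is a hypothetical name and not part of fastmath.

```clojure
(defn euclidean-dist
  "Square root of the sum of squared coordinate differences."
  [xs ys]
  (Math/sqrt (reduce + (map (fn [x y]
                              (let [d (- x y)]
                                (* d d)))
                            xs ys))))

(euclidean-dist [0.0 0.0] [3.0 4.0]) ; => 5.0
```

If the assumption holds, applying `euclidean-dist` to two of the vectors in `embeddings` should agree with the table up to floating-point rounding.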
18.5 Exploring relationships
The difference female-male is relatively close to the difference queen-king. In this way, the structure of the vector space reflects relationships between words.
(let [{:keys [queen king female male]} embeddings]
  (vec/dist
   (vec/sub female male)
   (vec/sub queen king)))
1.3892567509283515
compared to:
(let [{:keys [queen king data programming]} embeddings]
  (vec/dist
   (vec/sub data programming)
   (vec/sub queen king)))
2.604613631616022
A different way to phrase it is: "queen minus female plus male is close to king".
(let [{:keys [queen king female male]} embeddings]
  (-> queen
      (vec/sub female)
      (vec/add male)
      (vec/dist king)))
1.3892567509283515
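This matches the earlier female-male versus queen-king result, and that is expected: (female - male) - (queen - king) and (queen - female + male) - king differ only in sign, so their lengths are equal (up to floating-point rounding). A toy sketch in plain Clojure, where the helper names and the 2-D vectors are made up for illustration and are not taken from the data:

```clojure
(defn v-sub [xs ys] (mapv - xs ys))
(defn v-add [xs ys] (mapv + xs ys))
(defn v-dist
  "Euclidean distance between two vectors."
  [xs ys]
  (Math/sqrt (reduce + (map #(* % %) (v-sub xs ys)))))

;; Made-up toy vectors, chosen so that all arithmetic is exact.
(def toy {:queen  [4.0 6.0]
          :king   [1.0 2.0]
          :female [3.0 5.0]
          :male   [0.0 0.0]})

(let [{:keys [queen king female male]} toy]
  [(v-dist (v-sub female male) (v-sub queen king))
   (-> queen (v-sub female) (v-add male) (v-dist king))])
;; => [1.0 1.0]
```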