18 Word embeddings with fastmath.vector - DRAFT 🛠

authors: Nedeljko Radovanovic, Epidiah Ravachol, Daniel Slutsky

One of the uses of linear algebra is embedding texts (e.g., tokens, words, sentences, or chunks of text) into high-dimensional vector spaces.

Even in relatively simple embedding methods like Word2Vec, the linear structure of the vector space can carry meaning: vector operations such as addition and subtraction can reflect relationships between the meanings of the texts.

18.1 Setup

(ns noj-book.fastmath-vector-word-embeddings
  (:require [clojure.string :as str]
            [tablecloth.api :as tc]
            [fastmath.vector :as vec]
            [scicloj.tableplot.v1.plotly :as plotly]
            [scicloj.kindly.v4.kind :as kind]))

18.2 Data file

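We will look into a few example vectors from pretrained fastText vectors (fastText extends the Word2Vec approach). The original file wiki-news-300d-1M.vec.zip was downloaded from the fastText website. In a Unix shell, we can process it as follows to generate examples.vec. Note that bycicle, a misspelling of bicycle, is a token of its own in this vocabulary, and it is the spelling queried here: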

zcat wiki-news-300d-1M.vec.zip | awk '$1=="female" || $1=="male" || $1=="queen" || $1=="king" || $1=="programming" || $1=="data" || $1=="bike" || $1=="bycicle"' > examples.vec

(def examples-path "data/word2vec/examples.vec")
(kind/code 
  (slurp examples-path))
data 0.0126 -0.0550 -0.0673 0.0531 0.0174 -0.0873 -0.0915 -0.1084 0.0638 -0.1266 0.1288 -0.1521 0.0414 0.0487 -0.1098 0.0475 0.0426 -0.1246 -0.1432 -0.0071 -0.0960 -0.0917 0.0393 0.0950 0.0114 -0.1023 -0.0695 -0.0641 -0.0265 0.0686 -0.1383 0.0106 0.1454 -0.0027 -0.0860 0.0948 0.1722 -0.0036 0.0301 -0.0527 0.0131 -0.1335 0.0675 0.1889 0.1442 -0.0038 0.0272 0.0707 0.0344 0.1593 0.0693 0.1345 -0.6185 -0.0433 0.0814 0.0601 -0.0173 -0.0063 -0.1585 0.0347 0.1010 0.1105 -0.0402 0.0807 -0.1082 -0.1269 0.0427 0.0505 -0.0171 0.0146 -0.2346 0.1206 -0.2111 0.0038 0.0396 -0.2016 0.1229 -0.0082 0.0068 -0.0157 0.0131 -0.1301 -0.0496 -0.2086 -0.0867 -0.0431 -0.0190 0.0439 0.2003 -0.0152 -0.0604 -0.0464 -0.0262 -0.0866 0.0103 -0.0961 0.0389 -0.2485 0.0551 0.0289 -0.1506 0.0000 -0.0063 0.0720 0.1091 0.0260 -0.1382 -0.0238 0.0308 0.0540 -0.0454 0.0231 0.0251 -0.1297 0.0104 -0.0128 -0.0184 -0.0386 -0.0788 -0.2565 0.0681 -0.0123 -0.1096 0.0779 -0.0328 0.1767 -0.1053 -0.0072 0.0207 -0.0320 -0.0518 -0.0530 0.1460 -0.0242 -0.0182 -0.0444 0.0988 0.0273 0.1687 -0.0212 -0.0752 -0.2203 -0.1497 0.1832 -0.0089 -0.0028 0.0450 0.0967 -0.0494 -0.0620 -0.0833 0.0100 -0.0099 0.0496 -0.0173 0.0270 -0.0317 -0.0298 0.0178 0.0116 -0.1266 0.0893 -0.0056 -0.0707 0.0680 0.0169 0.0916 0.0162 -0.0708 0.0578 -0.0099 0.0229 -0.0441 0.0020 0.0162 -0.0481 0.2272 -0.1124 0.0480 -0.0027 -0.0517 -0.0157 0.0676 -0.0845 -0.0502 -0.0627 0.0091 0.0294 -0.1341 0.0313 -0.0102 0.0505 0.0626 -0.0876 -0.0579 -0.1177 -0.1129 0.1468 0.0726 -0.0222 0.1514 0.0637 -0.0264 0.0147 -0.0741 -0.0696 -0.0485 -0.1046 0.0585 -0.0353 0.0113 0.0756 -0.0150 -0.1605 0.0799 -0.0216 -0.1031 0.0089 0.0612 -0.1007 -0.0250 -0.0576 -0.0061 -0.0604 0.0023 -0.1273 -0.0854 -0.1441 0.1100 -0.1146 -0.1407 -0.2474 0.3389 -0.0091 -0.1732 -0.1272 -0.0144 -0.0542 -0.2204 0.0775 0.0383 0.0025 -0.0094 0.0965 -0.0349 0.0170 -0.1757 -0.1984 -0.0374 0.2955 0.1078 -0.1025 -0.1133 -0.0670 -0.0853 -0.0849 0.0215 -0.1129 0.0055 -0.0003 0.0255 -0.0878 0.0130 0.0297 
-0.5079 -0.0188 -0.0178 -0.0306 -0.0696 -0.0902 -0.0927 0.1374 -0.0839 -0.1728 -0.0036 -0.0712 -0.0552 -0.0821 -0.0222 0.0479 -0.0099 0.0135 -0.0637 0.0678 0.0195 -0.0050 0.0694 0.0162 0.0194 -0.0111 -0.1467 0.0217 -0.0831 0.0541 -0.1758 0.0777 0.0445 0.0739 -0.0811 -0.0906
female 0.1112 0.0432 -0.0964 -0.0237 0.0387 0.0970 0.0167 0.0890 -0.0142 0.1210 0.1061 0.0269 0.0222 0.1563 -0.1560 0.0186 0.0724 -0.0241 -0.1104 -0.1491 -0.0936 0.0175 -0.2147 0.0636 0.0279 -0.0120 0.0281 0.3057 0.0137 -0.0995 -0.1151 0.0094 -0.1090 0.0559 0.0513 0.0196 0.1276 0.0726 -0.0841 -0.1335 0.0369 -0.0996 0.0970 -0.0427 -0.0068 0.0730 0.1000 0.1326 0.2117 0.1045 0.0732 0.0431 -0.6515 0.0151 0.1114 0.0939 0.1296 0.0335 0.0569 -0.1170 -0.0874 -0.0174 -0.1944 -0.2514 -0.0745 0.0574 -0.0019 -0.1424 0.0199 0.0636 -0.1047 -0.0818 0.0451 0.2131 -0.0436 0.0425 -0.0238 -0.0427 0.0392 0.1230 0.0168 0.1850 -0.0671 -0.2412 0.0584 0.0133 -0.0195 0.1021 0.2557 -0.0595 -0.1774 -0.0907 -0.0179 0.0462 -0.1483 0.0087 0.1157 0.0516 -0.0446 0.0938 -0.1701 0.0551 -0.0689 0.0646 -0.0237 0.0697 0.1639 -0.0932 -0.0365 0.0600 -0.1238 0.1072 0.0036 0.0938 -0.0215 -0.0871 -0.2737 -0.1373 -0.1633 -0.3257 -0.0814 -0.0620 0.0515 0.0863 0.0207 0.1987 0.0498 -0.0402 0.0915 -0.0795 -0.0542 -0.0154 0.0157 0.0176 0.1011 -0.1472 0.0863 0.1557 -0.2571 0.0120 0.0188 -0.0347 -0.1078 0.2326 -0.0959 0.0749 -0.0564 -0.0757 0.0968 -0.1285 -0.1172 -0.0952 -0.0973 -0.0785 -0.1326 0.0347 0.0506 0.0143 -0.0562 0.1371 -0.0734 -0.0134 -0.0142 0.0502 0.0046 0.1104 0.1076 0.1493 0.1432 -0.0680 -0.0721 -0.0342 -0.2574 0.0514 0.1051 -0.0183 0.3497 -0.2231 -0.0668 -0.0263 0.0940 0.0747 0.0955 0.0821 0.0995 0.0318 0.0942 -0.0174 -0.0143 0.0264 0.0099 -0.1418 -0.1924 0.0577 0.1048 0.1517 0.0067 -0.0583 0.1493 0.0767 0.0557 0.0720 -0.0110 -0.0983 0.0799 -0.0861 0.0149 -0.1145 0.0281 0.0263 -0.0308 -0.1280 0.1357 0.0265 -0.0876 -0.0531 0.0507 -0.0503 -0.0499 -0.0850 -0.0032 -0.0336 -0.2112 -0.0314 -0.0662 -0.0375 -0.0578 -0.0182 -0.1818 -0.1703 0.1611 0.0766 0.3763 0.0824 -0.0203 -0.1652 -0.0009 0.0164 -0.2516 -0.0630 -0.0603 0.0189 0.0769 0.0139 0.1244 -0.0657 -0.0925 0.0704 -0.1355 0.4045 0.1061 0.1115 -0.0761 -0.1032 0.2797 0.0013 -0.0542 0.1308 0.0919 -0.1203 0.0181 0.1123 -0.0334 -0.0332 -0.1409 -0.0700 
0.0075 0.1405 -0.1815 0.0175 -0.1256 0.0120 0.1507 0.1173 0.1972 0.0233 -0.0252 0.0549 -0.0009 0.1581 0.0253 0.1056 -0.0588 0.0802 0.0166 -0.0084 -0.0418 -0.0334 -0.0061 0.0845 0.0630 -0.1798 -0.0491 -0.0105 0.0732 0.0246 0.0174 -0.0146 0.0164 -0.0575
male 0.0901 0.0282 -0.0575 -0.0899 0.0772 0.0704 0.0247 -0.0135 0.0299 0.1597 0.0767 0.0114 0.0447 0.1863 -0.2076 -0.0280 0.1065 0.0439 -0.0709 -0.1821 -0.1032 0.0306 -0.2065 0.0451 -0.0054 0.0382 0.0283 0.3110 -0.0226 -0.0823 -0.0503 -0.0084 -0.1245 0.1011 0.1611 0.0509 0.0884 0.0311 -0.1032 -0.1455 0.0646 -0.0283 0.0933 -0.0447 -0.0332 0.0562 0.0460 0.0506 0.1736 0.1000 0.1114 -0.0022 -0.6759 0.0381 0.1534 0.1290 0.1408 0.0232 0.1299 -0.1372 -0.1065 -0.0047 -0.1956 -0.2115 -0.0788 0.0468 -0.0254 -0.1852 0.0313 0.0057 -0.1366 -0.0827 -0.0151 0.2316 -0.0800 0.0341 -0.0562 -0.0438 0.0468 0.0988 -0.0119 0.1269 -0.0912 -0.2513 0.0337 0.0470 -0.0004 0.1341 0.2552 -0.0686 -0.1901 -0.1649 -0.0069 0.0163 -0.1598 0.0658 0.1829 0.0854 -0.0018 0.0933 -0.2235 0.0534 -0.0968 -0.0249 -0.0549 0.0263 0.1341 -0.1359 -0.0664 -0.0060 -0.1143 0.0661 0.0215 0.0436 -0.0096 -0.0867 -0.1526 -0.1637 -0.1411 -0.2793 -0.1574 -0.0908 0.0858 0.1029 0.0254 0.2361 0.0727 -0.0655 0.0676 -0.0717 -0.0164 0.0045 0.0073 0.0042 0.1269 -0.1143 0.0874 0.1255 -0.2168 0.0280 -0.0128 -0.0917 -0.0601 0.2201 -0.0873 0.0102 -0.0123 -0.0778 0.0968 -0.1024 -0.0972 -0.1344 -0.1468 0.0051 -0.1332 0.0686 0.0653 -0.0123 -0.0524 0.1382 -0.0262 0.0014 -0.0195 0.0364 -0.0155 0.0467 0.1129 0.1544 0.1570 -0.0482 -0.0723 -0.0317 -0.2590 0.0280 0.1047 0.0382 0.3744 -0.2217 -0.0029 0.0126 0.1157 0.0788 0.1643 0.0827 0.1140 0.0506 0.0721 -0.0122 -0.0438 0.0148 0.0443 -0.0892 -0.1826 0.0469 0.1134 0.1606 -0.0180 -0.1774 0.1163 0.0629 0.0714 0.1610 0.0075 -0.1264 0.0718 -0.0975 -0.0114 -0.2015 0.1128 0.0910 -0.0481 -0.1333 0.1114 -0.0024 -0.0400 -0.0916 0.0326 -0.0351 -0.0249 -0.1923 0.0544 0.0073 -0.1410 -0.0496 0.0089 -0.0070 -0.0817 -0.0400 -0.1368 -0.1491 0.1357 0.0376 0.3391 0.1154 0.0597 -0.1389 -0.0361 0.0317 -0.2601 -0.0243 -0.0286 0.0127 0.0919 0.0510 0.1409 -0.0921 -0.1320 -0.0117 -0.0959 0.4352 0.0964 0.0839 0.0022 -0.1654 0.2634 0.0527 -0.0157 0.1791 0.0656 -0.1044 0.0060 0.1172 -0.0220 0.0226 -0.1698 -0.0873 
-0.0196 0.1079 -0.1598 -0.0314 -0.0857 -0.0093 0.1674 0.0472 0.2260 0.0501 0.0044 0.0149 -0.0619 0.1204 -0.0178 0.0558 -0.0299 0.1154 0.0183 -0.0678 -0.0109 -0.0656 0.0487 -0.0068 0.0769 -0.1641 -0.0365 0.0438 0.0535 0.0092 -0.0178 -0.0141 0.0417 -0.0335
king 0.1082 0.0445 -0.0384 0.0011 -0.0888 0.0713 -0.0696 -0.0477 0.0071 -0.0408 -0.0707 -0.0266 0.0500 -0.0824 0.0848 -0.1627 -0.0851 -0.0295 0.1534 -0.1828 -0.2208 0.0243 -0.0921 -0.1089 -0.1009 -0.0119 0.0377 0.2038 0.0720 0.0202 0.2798 0.0115 -0.0151 0.1037 0.0004 -0.0104 0.0196 0.1265 0.0828 -0.1369 0.1070 0.1270 -0.0349 -0.0683 -0.0114 0.0337 0.0126 0.0792 0.0440 -0.0253 0.0489 -0.0785 -0.6259 -0.0972 0.1654 -0.0578 -0.0437 0.0409 -0.0182 -0.1891 0.0277 -0.0146 -0.0531 0.0426 0.0049 0.0040 0.1423 -0.0975 -0.0035 0.0963 -0.0019 -0.1466 -0.1662 0.0665 -0.1500 -0.1267 0.0267 -0.1560 -0.1442 0.1515 0.0242 -0.0608 0.0918 -0.2407 -0.0411 -0.0142 0.0655 -0.0359 0.1459 0.0940 0.0159 0.0638 -0.1077 -0.0517 -0.0137 0.0512 -0.0275 -0.0507 0.0069 0.0366 -0.1529 -0.1813 0.0339 -0.0851 -0.0540 0.1180 0.1039 0.0619 -0.0235 -0.0115 0.1648 0.0936 -0.0050 -0.0979 -0.0589 -0.0721 -0.1586 0.0227 -0.0446 -0.3398 -0.0284 -0.2507 0.0451 -0.1226 0.0800 0.2365 0.0756 -0.0853 0.1157 0.0278 0.0710 -0.1314 -0.0463 0.0427 -0.0505 -0.0249 0.1182 0.0481 -0.1085 -0.0160 0.0039 -0.0386 0.1551 0.2695 0.0707 -0.0842 0.1167 0.0845 -0.0104 0.0206 0.0469 0.0057 0.0897 0.0723 0.0222 0.0727 0.0642 -0.0235 -0.0216 -0.0601 0.0537 -0.2842 -0.1047 0.1733 0.0021 -0.0105 0.1143 0.0215 0.0074 -0.0504 -0.0049 0.0119 -0.0270 0.0145 0.0967 0.0903 0.3145 0.1222 0.0985 0.2126 -0.1030 0.0793 -0.0787 -0.0593 0.0739 -0.0696 -0.0818 0.0320 -0.1808 0.0477 0.0825 -0.0127 0.1445 -0.0605 -0.0513 0.0945 -0.1030 0.0475 0.0982 0.2402 0.0086 -0.0241 -0.0332 0.0430 -0.0417 0.0199 -0.0528 -0.0630 0.0347 0.0580 -0.0260 0.1113 0.0989 -0.0038 -0.1272 -0.0979 0.0045 0.0061 -0.0398 -0.0085 -0.0035 -0.1191 -0.0949 0.0123 0.1705 -0.2065 0.0550 0.0453 0.0424 -0.0578 -0.0348 -0.0177 0.3437 -0.0659 0.0924 -0.1122 -0.1588 0.1068 -0.3029 0.0018 0.0317 0.1857 0.0360 0.0829 0.0224 0.0934 -0.0475 0.1719 0.0015 0.4849 -0.0228 -0.0902 0.0465 -0.1087 0.1374 0.0115 -0.1246 0.0509 0.1578 -0.1667 -0.0340 0.0469 0.0568 0.1599 -0.3915 0.0356 
0.0287 -0.2275 -0.1378 -0.0265 -0.1115 0.1804 0.0796 -0.0987 0.0905 0.3556 0.0240 0.0246 0.0283 0.0609 -0.0227 -0.0469 -0.0535 0.0440 0.1021 -0.1398 0.0537 -0.2549 0.0827 -0.1011 0.0047 -0.0712 0.1442 -0.0700 0.0123 0.0344 -0.0570 0.0158 0.0544 0.0256
programming -0.0359 -0.0037 -0.1948 -0.0735 0.0015 -0.0710 -0.1257 0.1125 0.0897 0.0660 -0.0509 -0.3716 -0.2034 0.0939 -0.0720 0.0692 0.0237 -0.0361 -0.1972 0.0281 -0.2232 -0.0528 0.0424 0.2265 0.0166 0.0724 0.0802 -0.1738 -0.2274 -0.0525 -0.0611 0.0911 0.0761 0.0275 -0.0419 -0.0833 -0.1577 0.0221 -0.2224 -0.0523 -0.2824 -0.0302 0.0810 0.0970 -0.0961 -0.1698 -0.2428 -0.0725 0.0336 0.0494 0.0122 0.0043 -0.6289 -0.0247 0.2441 -0.1057 0.0227 0.0696 -0.0730 0.1186 -0.0014 0.1676 -0.0712 -0.1210 -0.0576 -0.1379 0.0085 -0.0647 -0.0216 -0.0170 0.1161 0.1908 -0.1452 -0.1284 0.0064 -0.0794 0.0100 -0.0017 -0.0728 0.0576 0.0673 -0.0141 0.1418 -0.1191 -0.0616 0.1597 -0.0236 -0.0193 0.2859 -0.0393 -0.0775 -0.1515 0.1166 -0.0337 0.1775 0.0909 -0.0348 0.0972 -0.1195 -0.1176 -0.2482 -0.0476 0.1101 0.0989 -0.0504 -0.1058 -0.1635 0.0742 0.0122 -0.0097 0.0164 0.1000 0.0046 0.1465 -0.0008 0.0060 0.0603 -0.0395 0.0370 -0.3175 0.1229 0.0047 -0.0874 0.1101 -0.0360 0.2256 0.0072 0.0273 -0.0108 0.0016 0.2252 -0.0218 0.0150 0.0881 -0.0248 0.0442 0.0412 0.0336 0.1097 0.1512 -0.0405 -0.0211 -0.0910 0.1968 0.0960 -0.0284 -0.1363 0.0892 0.1723 -0.1090 -0.0171 0.0229 0.0837 0.0619 -0.1038 0.1447 -0.0964 -0.0948 0.1135 0.0547 0.0676 -0.2286 -0.1102 0.0928 0.0469 -0.0479 0.1023 0.0307 0.0083 0.0118 -0.0373 -0.2943 -0.1947 0.0108 -0.0104 0.0165 0.2860 -0.4094 -0.0901 0.1753 -0.1135 0.1401 -0.0880 0.2030 0.1193 0.0509 -0.0009 -0.0247 -0.1799 -0.0147 0.0025 -0.1617 0.0566 -0.1111 -0.0640 0.0415 -0.0355 0.0813 0.1395 -0.2517 0.0849 -0.0003 0.0145 -0.1677 0.1349 0.0186 -0.0199 -0.0091 0.2043 -0.0226 0.0067 -0.0751 0.1411 0.1148 -0.0695 -0.0866 -0.0570 0.0138 -0.0470 -0.0745 -0.0476 0.0541 -0.0190 -0.0432 0.0276 -0.1622 0.1494 0.0817 0.0472 -0.0812 -0.0461 -0.0515 0.4082 -0.1834 0.0817 -0.1393 -0.0114 -0.0383 -0.2477 0.1634 -0.0488 -0.0357 -0.0792 -0.0503 0.1223 -0.0422 -0.2443 0.1944 -0.2514 0.4021 0.2150 -0.0239 -0.1626 0.0432 -0.0078 0.0095 0.0281 -0.0244 0.1063 0.3063 -0.2097 0.0720 0.0637 0.1183 
-0.4195 0.1138 -0.1958 -0.0277 -0.1635 0.0441 -0.2258 0.1115 -0.1499 -0.1678 0.0537 -0.0040 -0.0002 -0.0643 -0.1321 -0.1337 -0.0534 0.0745 0.0445 0.0171 -0.1798 0.0116 -0.0003 0.0356 0.1850 0.1647 0.1551 -0.0332 0.0786 -0.0915 -0.1599 0.0225 0.1578 0.0055 -0.0530 -0.1763
queen 0.2158 0.1095 -0.0499 0.0528 -0.0691 0.1357 -0.2257 -0.0401 -0.1270 0.0628 -0.0031 -0.0278 0.0962 -0.0509 0.1659 -0.1456 0.0043 -0.0858 0.0675 -0.1441 -0.1971 0.0238 -0.1019 0.0023 -0.1479 -0.0579 -0.0348 0.1964 0.1310 0.0026 0.1745 0.1163 -0.0067 0.0843 0.0498 -0.0916 -0.0876 0.0906 0.0348 -0.0006 0.1479 -0.0370 -0.0490 -0.1296 0.0063 0.1218 -0.0154 0.0408 -0.0499 0.0074 -0.0628 -0.1445 -0.6658 0.0405 0.1376 0.0919 0.0064 0.1542 0.0345 -0.1420 -0.0065 -0.0346 -0.1175 0.0170 0.0975 0.0143 0.1287 -0.1075 -0.0065 0.0312 -0.0693 -0.1160 -0.0655 -0.0168 -0.0913 -0.0935 -0.0625 -0.1310 -0.1675 0.1654 -0.0291 0.1045 0.1013 -0.2298 -0.0114 -0.0483 -0.0833 -0.0197 0.2074 0.0536 -0.0780 0.1643 -0.1019 -0.0931 -0.1721 0.1074 -0.1172 -0.1924 0.0593 0.2065 -0.1203 -0.0467 0.1084 0.0567 -0.0726 0.1413 0.0250 0.1973 -0.0504 -0.1155 0.1588 0.1433 -0.0268 0.0863 -0.0997 -0.0466 -0.3265 -0.0673 -0.2185 -0.3463 -0.0872 -0.2026 0.0909 -0.0537 0.0585 0.1235 0.0444 -0.0480 0.0677 -0.0741 0.0913 0.0058 -0.0550 -0.0142 0.0055 -0.0351 0.1426 -0.0439 -0.1415 -0.0103 -0.0261 -0.0491 0.1112 0.2555 -0.0204 0.0381 0.1636 0.0400 -0.0657 0.0045 -0.0749 -0.1928 -0.0147 -0.1681 0.0318 0.1770 0.1891 0.1022 -0.1247 0.1407 0.0687 -0.3527 -0.1691 0.1944 0.0327 0.0830 0.0782 0.0804 -0.0624 -0.0398 -0.0075 -0.0820 -0.0755 0.0504 0.1733 -0.0063 0.2813 0.0388 -0.0612 0.0538 -0.1038 0.0091 -0.1261 0.0584 -0.0394 -0.0677 0.0403 -0.0526 -0.1908 0.0883 -0.0173 -0.0609 -0.0514 0.0405 0.0013 0.0893 -0.0247 -0.0738 0.1093 0.2395 0.0624 -0.0682 -0.2574 0.0557 0.0258 0.1199 -0.0422 -0.0120 -0.1217 -0.0582 0.0242 0.0149 0.1039 0.0624 -0.1623 -0.0538 0.0108 -0.1172 0.0243 -0.0471 -0.0398 -0.1916 -0.1612 -0.0712 0.0630 -0.1812 0.0100 -0.0720 0.0633 -0.0304 0.0055 0.0877 0.3299 -0.1671 -0.0814 -0.1093 -0.0552 0.1108 -0.2203 -0.1218 -0.0576 0.1252 -0.0136 0.1349 0.1234 0.0827 -0.1832 0.1550 -0.1590 0.3917 0.0217 0.0120 0.0074 -0.3095 0.0760 0.0258 -0.0027 -0.1155 0.2152 -0.0023 -0.0116 0.0667 -0.0752 0.0392 
-0.3450 -0.0493 0.0098 -0.2498 -0.1739 -0.0746 -0.1962 0.2262 0.0944 0.0789 0.0607 0.3018 -0.0569 0.0931 0.0977 0.2114 0.0645 0.0111 -0.1061 -0.0148 0.1037 0.0244 0.0004 -0.1368 0.1000 -0.0398 0.0114 -0.1902 0.1368 -0.1466 0.1036 0.0302 -0.0502 0.0857 0.1020 0.0424
bike -0.0810 -0.2253 -0.1163 0.0197 0.0209 -0.0413 0.0436 0.0997 0.0560 -0.0246 0.1781 0.0432 0.1534 0.0806 -0.1477 0.1035 -0.0728 -0.0337 -0.0048 -0.2221 -0.4131 -0.0517 0.0399 0.2196 0.1548 -0.0436 -0.1628 0.0860 -0.0458 -0.1028 0.1007 -0.1345 -0.0455 -0.0651 0.0892 -0.0497 -0.0539 0.0041 0.2464 -0.0186 0.0625 -0.0287 -0.0048 -0.0507 -0.1289 -0.0004 -0.1949 0.1101 -0.0428 -0.1882 -0.1063 -0.0069 -0.7432 0.1824 0.0831 -0.0915 0.2549 0.0871 0.0861 -0.1181 -0.1358 -0.2465 0.0306 0.0810 0.1669 -0.0125 -0.2777 0.0038 0.0284 -0.1185 0.0384 -0.0335 0.0813 0.1343 0.1911 -0.0242 0.0940 0.0415 -0.0338 -0.0040 -0.0477 0.1179 0.0002 -0.1873 0.0810 -0.0236 0.0772 0.1164 0.2948 0.0084 0.1103 -0.1192 0.0906 -0.0972 0.0424 -0.0643 0.0060 -0.0268 -0.0688 -0.0004 -0.1834 -0.0512 -0.1533 0.0157 0.1139 0.0682 -0.0225 -0.0259 -0.0700 0.0394 -0.0560 0.0497 0.0945 0.0290 -0.0281 0.0125 0.0297 0.2400 -0.3715 -0.3374 0.0483 0.2656 -0.0917 0.0514 0.0615 0.1499 -0.2017 0.0575 -0.0497 -0.0489 0.1868 0.0960 0.1130 -0.1195 0.1372 0.0680 -0.0784 0.1335 -0.0209 0.0526 0.3480 -0.0454 -0.1771 0.1445 -0.2196 -0.1141 -0.1794 0.1143 -0.0518 -0.0095 -0.0783 0.1670 -0.0285 -0.0468 -0.0133 0.0330 0.1604 -0.1346 -0.1166 -0.2396 -0.0347 -0.0785 -0.0197 -0.1238 0.0507 -0.0390 0.0651 -0.0263 -0.0499 -0.0177 -0.0496 -0.1209 -0.0323 0.0742 0.0551 -0.1079 0.3316 -0.1530 0.0887 0.0178 0.0916 -0.0309 0.0242 0.0728 0.2129 -0.2269 -0.0139 -0.0125 -0.2455 -0.0181 0.1871 0.0278 0.1579 -0.0276 -0.0301 0.0962 -0.0010 0.1031 0.1132 -0.0832 -0.1566 -0.0071 -0.0597 0.1162 0.1935 -0.0980 -0.0227 -0.1021 0.0344 0.3255 0.0319 0.1030 -0.3150 0.0325 -0.1964 0.0540 -0.1663 0.2237 -0.0751 0.0325 0.0018 -0.0459 0.1056 -0.0252 0.1322 -0.2756 -0.1064 -0.0446 0.0724 -0.0672 0.0186 -0.1673 0.3247 -0.0553 -0.1012 -0.2524 -0.0016 -0.0228 -0.2791 0.0961 0.0723 0.0424 -0.2130 -0.0236 -0.0079 0.0142 0.1190 0.1112 0.0288 0.4431 0.0214 -0.0154 0.1291 -0.0605 -0.0977 0.1482 0.0686 0.0504 -0.1769 -0.0990 -0.1044 0.1028 0.0262 -0.0775 
-0.3471 -0.0060 -0.2241 -0.2071 -0.1881 -0.0508 0.1130 0.0528 -0.0422 0.0484 -0.0652 -0.0572 0.0201 -0.0875 0.2139 0.0088 -0.0389 -0.0652 0.0219 0.2281 -0.1969 -0.3254 -0.0447 -0.0524 -0.1444 -0.0207 -0.0238 -0.1919 -0.0826 0.2532 0.0881 0.0633 0.1004 -0.0157 0.1310 0.0288
bycicle -0.0960 -0.0068 -0.1629 0.0633 -0.0414 -0.0167 0.0154 0.0099 0.0120 0.0661 -0.0367 0.0500 0.1442 0.0191 -0.0196 0.0139 -0.0736 0.0612 -0.0673 -0.1114 -0.1433 -0.0034 0.0328 0.0833 0.1251 -0.0534 -0.0870 0.1138 0.0554 -0.0971 0.1502 -0.0907 -0.0410 -0.0994 0.0717 0.0598 -0.0233 -0.0516 0.0408 -0.0025 0.0282 0.1276 -0.0859 -0.0761 -0.1238 -0.0334 -0.0315 -0.0143 0.0136 -0.2172 0.0122 -0.1390 -0.6917 0.1568 -0.0568 -0.0336 0.0678 -0.0622 0.2034 -0.0944 0.0104 -0.1943 0.0214 0.0973 0.1279 -0.0777 -0.1309 0.0037 -0.0008 -0.0078 0.0057 -0.0568 0.1361 0.1177 0.0847 -0.0289 -0.0834 0.0539 -0.1107 -0.0545 0.0025 -0.0016 -0.0216 0.0703 0.0137 -0.0248 0.0444 -0.0028 0.0517 0.0275 0.1102 0.0400 0.0932 -0.1538 0.0692 0.0754 0.0166 -0.0332 -0.0936 -0.0163 0.0567 0.0059 -0.0888 0.0439 0.0437 0.0720 0.0645 0.0515 -0.1239 0.2025 -0.1326 -0.0259 0.1308 -0.0113 -0.0688 0.1003 -0.0370 0.0026 -0.1622 -0.1001 -0.0220 0.0863 -0.0911 0.0133 0.0964 -0.1038 -0.0284 0.0088 -0.1272 0.1947 0.0655 -0.0439 0.0068 0.0466 0.0142 -0.1387 0.1109 0.0647 -0.0795 -0.0815 0.1109 -0.0367 -0.1254 0.0304 -0.1510 -0.1378 -0.0365 0.0149 -0.0663 -0.0202 -0.1449 0.0522 -0.0705 -0.0234 0.0332 0.0225 0.1000 -0.0819 -0.0431 -0.0915 0.0523 -0.0328 0.0582 0.0227 0.0415 0.0641 -0.0893 0.0448 -0.0400 -0.0677 0.0457 -0.0323 -0.0118 0.0538 0.0072 -0.0167 0.1805 0.1293 -0.0197 -0.0329 0.0324 0.0194 0.0161 0.0190 0.0117 -0.1870 -0.0837 -0.0618 -0.0070 -0.0110 0.0219 0.0036 0.0721 0.0044 -0.0260 -0.0460 -0.0185 -0.0310 0.0052 0.0390 -0.0651 -0.0674 -0.0174 0.0614 0.1240 0.0054 -0.0011 -0.0663 0.0017 0.0947 0.0638 0.0499 -0.1399 0.0470 -0.0648 0.1360 -0.1301 0.0415 -0.0768 0.0109 -0.0785 -0.0836 -0.0606 -0.0745 0.0183 -0.0634 -0.1074 0.0154 0.0341 0.0614 0.1219 0.1184 0.0145 0.0318 -0.0524 -0.0348 -0.0432 -0.0153 0.0004 0.0668 -0.0297 0.0413 -0.0029 -0.1210 -0.1011 0.0358 0.0104 0.0711 -0.0957 0.0431 0.0193 -0.0484 0.2732 -0.1691 -0.1071 0.0044 -0.0498 0.0291 -0.1714 -0.0300 0.0173 0.0760 0.1074 -0.0268 -0.2761 
0.0533 -0.1987 -0.1531 -0.2648 0.0934 0.0870 0.0958 -0.0787 -0.0176 -0.0016 0.0347 -0.0305 -0.0447 0.0963 -0.0206 -0.0678 0.0044 0.0549 0.0898 -0.0791 -0.1175 -0.0384 -0.0122 -0.0813 0.0500 0.0032 -0.0892 -0.0644 0.0498 -0.0351 0.0632 -0.0163 0.0505 0.1975 0.0540
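The awk filter above keeps only the lines whose first field is one of our tokens. Below is a minimal sketch of the same filter in plain Clojure. It assumes the lines are already available as a sequence of strings, for instance via `line-seq` on a reader over the unzipped .vec file. `wanted-tokens` and `keep-wanted-lines` are hypothetical names, not part of any library:

```clojure
(require '[clojure.string :as str])

;; The same tokens as in the awk command (including the misspelled "bycicle").
(def wanted-tokens
  #{"female" "male" "queen" "king" "programming" "data" "bike" "bycicle"})

(defn keep-wanted-lines
  "Keeps the lines whose first space-separated field is in `tokens`,
  like the awk tests on $1. Returns a lazy sequence."
  [tokens lines]
  (filter (fn [line]
            ;; split off only the first field; the weights are not parsed here
            (contains? tokens (first (str/split line #" " 2))))
          lines))

;; Hypothetical lines in the .vec text format (header line first):
(keep-wanted-lines wanted-tokens
                   ["4 2"
                    "king 0.1 0.2"
                    "kingdom 0.5 0.6"
                    "queen 0.3 0.4"])
;; => ("king 0.1 0.2" "queen 0.3 0.4")
```

fastText .vec files start with a header line that holds the vocabulary size and the dimension. Its first field is not one of the tokens, so, as with awk, that line is dropped.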

18.3 Reading the data

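(def embeddings
  (-> examples-path
      slurp
      ;; one line per token
      (str/split #"\n")
      (->> (map (fn [line]
                  ;; "token w1 w2 ..." -> [:token (double-array [w1 w2 ...])]
                  (let [[token & weights] (str/split line #" ")]
                    [(keyword token)
                     (double-array (map #(Double/parseDouble %) weights))])))
           ;; a map from token keyword to its embedding vector
           (into {}))))

(-> embeddings
    tc/dataset
    tc/info)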

_unnamed: descriptive-stats [8 11]:

:col-name :datatype :n-valid :n-missing :min :mean :max :standard-deviation :skew :first :last
:data :float64 300 0 -0.6185 -0.01805033 0.3389 0.10136687 -0.87434210 0.0126 -0.0906
:female :float64 300 0 -0.6515 0.00105400 0.4045 0.11530855 -0.43782181 0.1112 -0.0575
:male :float64 300 0 -0.6759 0.00077600 0.4352 0.11742106 -0.40358429 0.0901 -0.0335
:king :float64 300 0 -0.6259 0.00106200 0.4849 0.11550981 -0.36335334 0.1082 0.0256
:programming :float64 300 0 -0.6289 -0.01020667 0.4082 0.12922631 -0.42322574 -0.0359 -0.1763
:queen :float64 300 0 -0.6658 -0.00614200 0.3917 0.12458089 -0.48650175 0.2158 0.0424
:bike :float64 300 0 -0.7432 -0.00782567 0.4431 0.13633039 -0.50522079 -0.0810 0.0288
:bycicle :float64 300 0 -0.6917 -0.00828233 0.2732 0.09174574 -1.38555651 -0.0960 0.0540
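The reading code splits each line on single spaces. Here is a minimal standalone sketch of that per-line step, assuming lines of the form token w1 w2 ... (`parse-vec-line` is a hypothetical name). It also shows why a trailing space at the end of a line, which .vec files often have, does no harm: `clojure.string/split` drops trailing empty strings, so `Double/parseDouble` never sees an empty field.

```clojure
(require '[clojure.string :as str])

(defn parse-vec-line
  "Parses a line \"token w1 w2 ...\" into [keyword double-array],
  following the same steps as the reading code above."
  [line]
  (let [[token & weights] (str/split line #" ")]
    [(keyword token)
     (double-array (map #(Double/parseDouble %) weights))]))

;; A trailing space would yield a trailing "" field, but clojure.string/split
;; drops trailing empty strings, so parseDouble never receives ""
;; (which would throw NumberFormatException).
(let [[token weights] (parse-vec-line "king 0.1082 0.0445 -0.0384 ")]
  [token (vec weights)])
;; => [:king [0.1082 0.0445 -0.0384]]
```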

18.4 Exploring distances

(-> (for [[token1 vec1] embeddings
          [token2 vec2] embeddings
          ;; keep each unordered pair once; this also excludes self-pairs
          :when (pos? (compare token1 token2))]
      {:token1 token1
       :token2 token2
       ;; Euclidean distance
       :distance (vec/dist vec1 vec2)})
    tc/dataset
    (tc/order-by [:distance])
    (kind/table {:use-datatables true}))
token1 token2 distance
male female 0.6693099730319279
queen king 1.4353056608262922
bycicle bike 1.8556491263167187
queen female 2.028264395979972
male king 2.107641141181297
queen male 2.1246356958311696
programming data 2.1484709004312808
king female 2.149412631394911
king data 2.167340349368321
king bycicle 2.194395171795637
data bycicle 2.20650210061083
female data 2.223698538471436
male data 2.23728921465241
female bycicle 2.283274904605224
male bycicle 2.299659507405388
queen bycicle 2.3271938445260627
queen data 2.3682638260970847
data bike 2.395225242017961
programming female 2.425492795289237
programming male 2.4395068354075167
programming king 2.468167603709278
king bike 2.527731047797609
male bike 2.528236628561496
female bike 2.5357556447733676
programming bycicle 2.600162393005483
queen programming 2.625867106309837
programming bike 2.689235856149474
queen bike 2.744270352935367
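The table lists Euclidean distances computed by `vec/dist`. To make that computation explicit, here is a minimal plain-Clojure sketch of the same pairwise enumeration. It assumes vectors are given as sequences of numbers. `euclidean-distance` and `pairwise-distances` are hypothetical helpers, and the toy vectors are made up for illustration:

```clojure
(defn euclidean-distance
  "Euclidean (L2) distance between two equal-length sequences of numbers."
  [xs ys]
  (Math/sqrt (reduce + (map (fn [x y]
                              (let [d (- x y)]
                                (* d d)))
                            xs ys))))

(defn pairwise-distances
  "Distances between all unordered pairs of a token->vector map, each
  pair once (token1 sorts after token2), in increasing order of distance."
  [embeddings]
  (sort-by :distance
           (for [[token1 v1] embeddings
                 [token2 v2] embeddings
                 :when (pos? (compare token1 token2))]
             {:token1 token1
              :token2 token2
              :distance (euclidean-distance v1 v2)})))

;; Hypothetical 2-D toy vectors:
(pairwise-distances {:a [0.0 0.0] :b [3.0 4.0] :c [0.0 1.0]})
;; rows in order: :c/:a at 1.0, :c/:b at (Math/sqrt 18.0), :b/:a at 5.0
```

Because `map` also walks Java arrays, `euclidean-distance` would accept the double arrays stored in `embeddings` as well. The `:when` clause already excludes a token paired with itself, so no separate self-pair filter is needed.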

18.5 Exploring relationships

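The difference female - male is relatively close to the difference queen - king. In this way, the structure of the vector space reflects relationships between words: the offset that takes male to female resembles the offset that takes king to queen.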

(let [{:keys [queen king female male]} embeddings]
  (vec/dist
   (vec/sub female male)
   (vec/sub queen king)))
1.3892567509283515

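Compare this with the distance between data - programming and queen - king, two differences that share no such relationship: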

(let [{:keys [queen king data programming]} embeddings]
  (vec/dist
   (vec/sub data programming)
   (vec/sub queen king)))
2.604613631616022
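Both comparisons measure the distance between two offset (difference) vectors. Below is a minimal sketch in plain Clojure. The hypothetical helpers `v-sub` and `v-dist` stand in for `vec/sub` and `vec/dist`. The hypothetical 2-D toy vectors are constructed so that the two gender offsets coincide exactly:

```clojure
(defn v-sub
  "Element-wise difference of two equal-length vectors."
  [xs ys]
  (mapv - xs ys))

(defn v-dist
  "Euclidean distance between two equal-length vectors."
  [xs ys]
  (Math/sqrt (reduce + (map #(* % %) (v-sub xs ys)))))

;; Hypothetical 2-D toy embeddings: female - male equals queen - king.
(def toy
  {:male        [1.0 0.0]
   :female      [1.0 1.0]
   :king        [3.0 2.0]
   :queen       [3.0 3.0]
   :programming [0.0 4.0]
   :data        [2.0 4.0]})

(let [{:keys [male female king queen programming data]} toy]
  {:related-offsets   (v-dist (v-sub female male) (v-sub queen king))
   :unrelated-offsets (v-dist (v-sub data programming) (v-sub queen king))})
;; => {:related-offsets 0.0, :unrelated-offsets (Math/sqrt 5.0), about 2.236}
```

In the real 300-dimensional embeddings the offsets only roughly agree (about 1.39 rather than 0). That is why the comparison with an unrelated pair of words helps in interpreting the number.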

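Another way to phrase it: “queen minus female plus male is close to king.” The distance below equals the one computed above, because (queen - female + male) - king = (queen - king) - (female - male), and a vector and its negative have the same length.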

(let [{:keys [queen king female male]} embeddings]
  (-> queen
      (vec/sub female)
      (vec/add male)
      (vec/dist king)))
1.3892567509283515
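This phrasing suggests a search: compute queen - female + male and look for the nearest token. The sketch below works under stated assumptions:

- It uses Euclidean distance, although many analogy implementations use cosine similarity instead.
- It excludes the three query tokens from the candidates. This is a common convention, since the result often lands nearest one of them.
- `analogy` and the toy vectors are hypothetical. They are not part of fastmath or this chapter's data.

```clojure
(defn v-add [xs ys] (mapv + xs ys)) ; element-wise sum
(defn v-sub [xs ys] (mapv - xs ys)) ; element-wise difference

(defn v-dist
  "Euclidean distance between two equal-length vectors."
  [xs ys]
  (Math/sqrt (reduce + (map #(* % %) (v-sub xs ys)))))

(defn analogy
  "Hypothetical helper: the token nearest (by Euclidean distance) to
  a - b + c, excluding a, b and c themselves from the candidates."
  [embeddings a b c]
  (let [target (v-add (v-sub (embeddings a) (embeddings b))
                      (embeddings c))]
    (->> (apply dissoc embeddings [a b c])
         ;; pick the [token vector] entry closest to the target
         (apply min-key (fn [[_ v]] (v-dist target v)))
         key)))

;; Hypothetical 2-D toy embeddings:
(def toy
  {:male [1.0 0.0] :female [1.0 1.0] :king [3.0 2.0] :queen [3.0 3.0]
   :programming [0.0 4.0] :data [2.0 4.0]})

(analogy toy :queen :female :male)
;; => :king
```

Applied to this chapter's `embeddings` map, the same idea could be written with `vec/sub`, `vec/add` and `vec/dist`. We have not run that here, so we make no claim about which of the eight tokens comes out nearest.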
source: notebooks/noj_book/fastmath_vector_word_embeddings.clj