Translations & meanings for words | Draft & implementation thoughts


Yesterday, I tried to find a way to work with words in Russian and English and extract their meanings.

I proceeded as follows (not the best solution, but the quickest): I found dictionaries for English and Russian in CSV format.

  • A Russian dictionary was quite hard to find, even though I did come across the “National Corpus of the Russian Language” (https://ruscorpora.ru).
  • The English dictionary was much easier and faster to find.
  • I also looked for a Brazilian Portuguese one, but that is even harder than Russian: a plain word list is easy to find, but one with meanings is a problem.

Final decision: keep the meanings in Russian and English for now and leave the other languages untouched until the interface becomes more stable.

Point 2: the dictionaries are quite large, 170 thousand words in English and almost 70 thousand in Russian. After collapsing (one word can have several meanings), the total is about 147 thousand words.
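To show what “collapsing” means here, a minimal Python sketch (the game itself is not written in Python). The column names `word` and `meaning`, the lowercasing step, and the sample rows are my assumptions for illustration; the real CSV layout may differ.

```python
import csv
import io
from collections import defaultdict


def collapse_meanings(csv_text: str) -> dict[str, list[str]]:
    """Group CSV rows by word so each word appears once with all its meanings."""
    collapsed: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed normalization: trim whitespace and ignore letter case.
        word = row["word"].strip().lower()
        meaning = row["meaning"].strip()
        if word and meaning:
            collapsed[word].append(meaning)
    return dict(collapsed)


# Hypothetical sample data: four rows, but only two distinct words.
SAMPLE_CSV = """word,meaning
bank,a financial institution
bank,the side of a river
Bank,a row of similar objects
river,a large natural stream of water
"""

entries = collapse_meanings(SAMPLE_CSV)
```

Four rows collapse into two entries, and `entries["bank"]` holds all three meanings in file order.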

The total size is about:

  • eng: 14 MB
  • rus: 5.5 MB

Compressed as tar.gz:

  • eng: 4.9 MB
  • rus: 1.4 MB

In general (leaving the web aside), by modern standards that is extremely small. For the web it is also acceptable in principle, but there is one catch: how to use the data, and how to make it work fast :)

I decided to do this, from the user’s point of view:

  • If the user enables the “technology” feature, all dictionaries are loaded into the local database (I use sembast) before the level loads. This only has to happen once; nothing needs to be reloaded until the data itself has to be updated.
  • If the user does not enable the feature, the game uses small runtime dictionaries of 5 thousand words, without meanings (english_words, russian_words).
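The choice between the two paths could look roughly like this. It is a Python sketch, not the actual game code: `LocalDb` is a hypothetical in-memory stand-in for the sembast database, and `prepare_words`, `data_version` and the loader callback are names I made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LocalDb:
    """Hypothetical in-memory stand-in for the local database."""
    meanings: dict = field(default_factory=dict)
    data_version: Optional[int] = None


def prepare_words(
    db: LocalDb,
    feature_enabled: bool,
    load_full_dictionary: Callable[[], dict],
    small_word_list: list,
    current_version: int = 1,
) -> set:
    """Return the playable word set, importing the full dictionary at most once."""
    if not feature_enabled:
        # Fast path: small runtime list, no meanings, no database work.
        return set(small_word_list)
    if db.data_version != current_version:
        # Slow path, paid only once per data version.
        db.meanings = load_full_dictionary()
        db.data_version = current_version
    return set(db.meanings)


load_calls = []


def fake_loader() -> dict:
    """Stand-in for the expensive import; records how often it runs."""
    load_calls.append(1)
    return {"bank": ["a financial institution"], "river": ["a stream of water"]}


db = LocalDb()
quick = prepare_words(db, False, fake_loader, ["cat", "dog"])
first = prepare_words(db, True, fake_loader, ["cat", "dog"])
second = prepare_words(db, True, fake_loader, ["cat", "dog"])
```

The second call with the feature enabled reuses what is already stored, so the expensive loader runs only once until `current_version` changes.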

Then the implementation. For native applications (not web), I use tar.gz archives: at runtime, all dictionaries are unpacked and written to localDb. In terms of speed, it turned out as follows:

  • loading during the game (during actual gameplay): about 130-156 seconds
  • loading before the start of the game: 30-60 seconds
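The native path (unpack the archive, parse the CSV, write to the local store) can be sketched as below. This is Python with `sqlite3` swapped in for sembast, purely as an illustration. The table layout, the `eng.csv` member name and the `word`/`meaning` columns are assumptions, and the tiny archive is built in memory so the example stands on its own.

```python
import csv
import io
import sqlite3
import tarfile


def build_sample_archive() -> bytes:
    """Create a tiny tar.gz with one CSV, standing in for the real archives."""
    csv_bytes = (
        "word,meaning\n"
        "bank,a financial institution\n"
        "bank,the side of a river\n"
        "river,a stream of water\n"
    ).encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="eng.csv")
        info.size = len(csv_bytes)
        archive.addfile(info, io.BytesIO(csv_bytes))
    return buffer.getvalue()


def import_archive(archive_bytes: bytes, connection: sqlite3.Connection) -> int:
    """Unpack every CSV in the archive and insert its rows; return the row count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS meanings (lang TEXT, word TEXT, meaning TEXT)"
    )
    inserted = 0
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile() or not member.name.endswith(".csv"):
                continue
            # "eng.csv" -> "eng"; the file name doubles as the language tag.
            lang = member.name.rsplit("/", 1)[-1][: -len(".csv")]
            text = archive.extractfile(member).read().decode("utf-8")
            rows = [
                (lang, row["word"], row["meaning"])
                for row in csv.DictReader(io.StringIO(text))
            ]
            with connection:  # one transaction per file
                connection.executemany("INSERT INTO meanings VALUES (?, ?, ?)", rows)
            inserted += len(rows)
    return inserted


connection = sqlite3.connect(":memory:")
row_count = import_archive(build_sample_archive(), connection)
bank_meanings = [
    meaning
    for (meaning,) in connection.execute(
        "SELECT meaning FROM meanings WHERE word = ? ORDER BY rowid", ("bank",)
    )
]
```

Writing each file in a single transaction is a choice made for this sketch; it says nothing about how sembast batches its writes.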

For the web, I use the CSV files directly, to avoid too much memory overhead, and also write to localDb (sembast_web, backed by IndexedDB):

  • loading during the game: 60-80 seconds
  • loading before the start of the game: 40-60 seconds

In either case, on native and on the web alike, the sheer volume of decoding freezes the app for 5-10 seconds (regardless of the archive).

This could probably be sped up by moving the work to a separate thread (off the UI thread), but that requires a lot of time for testing shared resources. That makes it an optimization, so I am leaving it for the polishing stage.
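The general shape of the “separate thread” idea, as a Python sketch. In Dart the equivalent mechanism would be an isolate, and Python threads do not give true parallelism for CPU-bound parsing, so treat this only as an illustration of the structure. `decode_dictionary`, the sample data and the `ui_ticks` counter are all hypothetical.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def decode_dictionary(csv_text: str) -> dict:
    """Parse the CSV into word -> meanings; this is the slow part on real data."""
    result: dict = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        result.setdefault(row["word"], []).append(row["meaning"])
    return result


SAMPLE = "word,meaning\nbank,a financial institution\nbank,the side of a river\n"

ui_ticks = 0
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(decode_dictionary, SAMPLE)
    # The main thread stays free for UI work while the worker decodes.
    while not future.done():
        ui_ticks += 1  # stand-in for rendering a frame or a progress bar
    decoded = future.result()
```

The worker builds and returns a fresh object instead of mutating state the UI also touches, which keeps the shared-resource surface small, at the cost of handing the whole result over at the end.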

Thus, in theory, it may be possible to keep a balance between “fast play” and “long play with a bunch of features”.

The remaining pieces of the puzzle are:

  • the rus-eng-rus bridge, which I will most likely also solve by loading a separate dictionary
  • understanding how to work with other interface languages

Ideally, the data should definitely be split into chunks later on, but that takes more time and would not change anything significantly, so I am restraining myself: no optimization :)
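For reference, chunking could be as simple as the following Python sketch. The chunk size, the `chunked` helper and the plain list standing in for the database are assumptions for illustration only.

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(items: Iterable[str], size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


words = [f"word{i}" for i in range(10)]
store: list = []  # stand-in for the local database
batches_written = 0
for batch in chunked(words, 4):
    store.extend(batch)  # one small write per chunk
    batches_written += 1
    # Between chunks the app could redraw or update a progress bar.
```

Ten words with a chunk size of four give three writes (4, 4 and 2 items), each short enough to leave room for a UI update in between.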

In general, the biggest difficulty is how to get a lot done in a small amount of time, throwing out what could be done but is not needed right now :)
