March Project End Report

This month I changed my initial plan and went on and continued my February project – Finnish Word Form Collector I added several new components to the system. As I reminder my goal is to generate big enough dataset to test a library that generates Finnish noun forms (e.g. if you asked for plural genitive of 'koira' it would return 'koirien'). The bulk of this data has so far come from the Finnish books of Project Gutenberg.

This system now has five services running in GCP

Report of new word forms found. Report generated at 2023-03-31 04:07:06.

Total: 383627

Total last 7 days: 298

Total last 24 hours: 60

Prompts for 2023-03-31T00:07:07.942637

  • nugaa Astevaihtelu: _ -> _ (plural nominatiivi nimentö mikä? kuka? monikossa -t)

  • tappo Astevaihtelu: p -> pp, pp -> p (singular elatiivi sisäeronto mistä? kenestä? -sta, -stä)

  • susijahti Astevaihtelu: t -> d, d -> t (plural elatiivi sisäeronto mistä? kenestä? -sta, -stä)

  • markka Astevaihtelu: k -> kk, kk -> k (singular genetiivi omanto minkä? kenen? -n, -en, -in, -den, -ten, -tten)

  • kookosvoi Astevaihtelu: _ -> _ (singular ablatiivi ulkoeronto miltä? keneltä) -lta, -ltä)

  • kajuutta Astevaihtelu: t -> tt, tt -> t (singular genetiivi omanto minkä? kenen? -n, -en, -in, -den, -ten, -tten)

  • askellin Astevaihtelu: lt -> ll, ll -> lt (plural adessiivi ulko-olento millä? kenellä? -lla, -llä)

  • taival Astevaihtelu: p -> v, v -> p (singular inessiivi sisäolento missä? kenessä? -ssa, -ssä)

  • säppi Astevaihtelu: p -> pp, pp -> p (singular partitiivi osanto, eronto mitä? ketä? -a, -ä, -ta, -tä)

  • syli Astevaihtelu: _ -> _ (plural instruktiivi keinonto miten (keinoa, välinettä) -n)

Sent by the New Word Forms Report Generator

The final function collects the existing data and send it in to my email inbox every morning.

Additionally, I created extremely low effort hack to generate an RSS feed timokoola/quickrssfrommd: Repository that contains scripts to generate RSS feed from source markdowns. I use this to add new word forms to the system until I have an editor UI in my use.

Conclusion of the March project

I am decently happy about the results and the functionality of the system. It now runs mostly on its own and generates me roughly 50-60 new word forms a day with zero effort (and when I add word forms manually to RSS I get way more). I also got pretty comfortable with Google Cloud Function and Cloud Run. The monthly cost of the system is roughly 30 euro cents and most of that is object storage cost. I have now all the pieces in place to start the editor project later on.

April Project

The April project team is "templates" and "documentation". My first goal is to create a template for a command line tool. I often have a need to create a command line tool and I would be much faster and had more of them, if I had a pre-thought structure to follow. I have also decided to us Go lang for my command line tools in the future. While my brain thinks in Python, the "single deployable binary" is extremely convenient. I have a couple of ideas that probably two gets implemented in April.

I also will document and formalize a template for a Google Cloud Function projects (at least for Python, stretch goal for Go lang as well). timokoola/wordformemailreport is a good basis for more generic template.

And finally, I want to document the architecture of the word form collector in a single post I can use as a landing page in the future.