Necessary features for better translations? #19

Onurbon · 2023-11-01T17:30:35Z

Onurbon
Nov 1, 2023
Maintainer

We've heard from several individuals and teams who would be interested in producing high-quality translations in various languages.

Here is a brief overview of how TttC is currently supporting multiple languages:

You can upload a CSV file containing comments written in different languages (e.g. some comments in English and some in Mandarin) and then ask GPT-4 (as part of the prompt) to respond in English during the argument extraction step.
At the end of the pipeline, the translation step can then translate everything into any language (or list of languages) specified by the user, again using GPT-4 (or another model) under the hood. By "translating everything" we mean: the extracted arguments, the cluster labels, the summaries, and even all the text used in the UI.
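
To make the two steps above concrete, here is a minimal Python sketch of the flow. It is not the actual TttC code: the function names, the prompt wording, and the `llm` callable (any function that takes a prompt and returns the model's reply) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call (GPT-4 or any other LLM):
# takes a prompt string and returns the model's reply as a string.
LLM = Callable[[str], str]


def build_extraction_prompt(comment: str, response_language: str = "English") -> str:
    """Prompt for the argument extraction step, pinned to one response language."""
    return (
        "Extract the main arguments from the comment below. "
        f"Respond in {response_language}, whatever language the comment is written in.\n\n"
        f"Comment: {comment}"
    )


def build_translation_prompt(text: str, target_language: str) -> str:
    """Prompt for the final translation step."""
    return (
        f"Translate the following text into {target_language}. "
        "Return only the translation.\n\n"
        f"Text: {text}"
    )


def translate_all(texts: List[str], languages: List[str], llm: LLM) -> Dict[str, List[str]]:
    """Translate every text (arguments, labels, summaries, UI copy) into each language."""
    return {
        lang: [llm(build_translation_prompt(text, lang)) for text in texts]
        for lang in languages
    }
```

In this sketch the same `translate_all` call would cover the extracted arguments, cluster labels, summaries, and UI text, since they are all passed in as plain strings.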

One benefit of the current approach is that the user can specify any language they want, and the results will probably be fine, assuming GPT-4 is fairly good at producing that language. There are, however, some limitations:

Automatically translating the UI copy doesn't always work well. The automatic translations could be improved by giving GPT-4 a bit more context (e.g. explaining in the prompt how this copy is used in the UI), but it will often be best to just provide hardcoded translations for the most commonly used languages, as proposed in issue #1 (Better translations for UI copy).
At the moment, the code assumes that English will be used as the primary language. To run the pipeline in a different language, the user would first need to translate all the prompts into that language and/or instruct the LLMs to respond in that language. This can in fact already be done without changing any code, simply by providing the alternative prompts as part of the job config. However, some code changes would still be needed if we want users to be able to display the right language name (and corresponding flag) in the generated reports. See issue #18 (Make primary language configurable).
Some users will want to use AI services other than OpenAI. We already plan to provide support for more LLMs in general (see issue #5, Support Claude, Llama2 and other LLMs), but maybe there are also translation services that we may want to use?
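
As a rough illustration of the first limitation, hardcoded UI translations could be combined with a machine-translation fallback along the following lines. This is only a sketch under assumptions: the `HARDCODED_UI` table, the sample UI string, the `translate_ui_string` function, and the `machine_translate` callable are hypothetical and are not part of the current codebase.

```python
from typing import Callable, Dict, Optional

# Hypothetical table of hand-written translations:
# language -> English UI string -> translated string. The entry is a made-up example.
HARDCODED_UI: Dict[str, Dict[str, str]] = {
    "French": {"Show more": "Afficher plus"},
}

# Hypothetical fallback: (text, language, context) -> translated text,
# e.g. a wrapper around GPT-4 or a dedicated translation service.
MachineTranslate = Callable[[str, str, str], str]


def translate_ui_string(
    text: str,
    language: str,
    machine_translate: Optional[MachineTranslate] = None,
    context: str = "Short label shown in the report UI.",
) -> str:
    """Prefer a hardcoded translation; otherwise fall back to machine translation."""
    known = HARDCODED_UI.get(language, {})
    if text in known:
        return known[text]
    if machine_translate is None:
        # No hardcoded entry and no fallback available: keep the English copy.
        return text
    # Pass along context about how the string is used, as suggested above.
    return machine_translate(text, language, context)
```

With this shape, commonly used languages get reviewed translations, while any other language still works through the fallback, which could be backed by whichever LLM or translation service the user prefers.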

The goal of this discussion thread is not only to gather feedback on the above but also to hear about other requirements and/or feature ideas that different users might have.

If you're reading this and working on translation for other projects related to digital democracy, please don't hesitate to share pointers to your work for everyone's awareness (even if your work is not yet ready to be integrated into the TttC pipeline).
