improvement: Context #1281

nmorel · 2022-09-14T10:28:54Z

Hello 👋

@JSteunou and I needed the context to work for our company so we tried to fix the different issues we encountered with the current implementation.

We tried to follow the same "schema" for the internal structure and limit the changes.
It works (closes #1198) but it has some flaws.

For an example of the structure, for this file, it gives the following on extraction :

{
  "Component A": {
    "extractedComments": [],
    "origin": [["collect/componentA/componentA.js", 1]]
  },
  "Hello World": {
    "extractedComments": ["Comment A", "Comment A again"],
    "origin": [
      ["collect/componentA/componentA.js", 2],
      ["collect/componentA/componentA.js", 3]
    ]
  },
  "world ctxt": {
    "Hello World": {
      "extractedComments": [],
      "origin": [["collect/componentA/componentA.js", 4]]
    }
  }
}

With this, we no longer have a flat structure and context add another level.
We now have at the same level a key that can represent a message id or a context.
At various location, I added a test to know which one I'm manipulating.
If a user needs to translate the words origin or translation with a context, it breaks.
This structure also add lots of complexity.

Before we fix the remaining issues, what do you think ?
Should we change something ?

Some solutions we thought about but wanted your impressions first :

change the key to add the context like described here
add some default context in the structure :

{
  "_": {
    "Component A": {
      "extractedComments": [],
      "origin": [["collect/componentA/componentA.js", 1]]
    },
    "Hello World": {
      "extractedComments": ["Comment A", "Comment A again"],
      "origin": [
        ["collect/componentA/componentA.js", 2],
        ["collect/componentA/componentA.js", 3]
      ]
    }
  },
  "world ctxt": {
    "Hello World": {
      "extractedComments": [],
      "origin": [["collect/componentA/componentA.js", 4]]
    }
  }
}

For a complete example, I updated one of the repository example (the ^3.14.1 version references a local version I built).

vercel · 2022-09-14T10:29:00Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated
js-lingui	✅ Ready (Inspect)	Visit Preview	Sep 14, 2022 at 4:39PM (UTC)

JSteunou · 2022-09-14T12:01:15Z

As @nmorel said, it fixes the primary issue without changing the logic & structure already in place.

I think it would be better to focus on this here, and if some changes should be made, tailor this in another PR.

nmorel · 2022-09-14T15:21:37Z

One thing I forgot to add, I'm not used to ramda 😅
So there is a few places where I used vanilla JS instead.

…nd id

semoal · 2022-09-20T13:35:04Z

Will review this afternoon guys, thanks for the hard work!

examples/create-react-app/package.json

timofei-iatsenko · 2022-12-12T17:53:59Z

@semoal is something in that PR what stops it from being merged?

andrii-bodnar · 2023-01-06T13:40:25Z

@thekip could you please review this PR? 🙂

@nmorel It would be great if you could rebase the branch as it currently has conflicts that need to be resolved, sorry for the inconvenience

nmorel · 2023-01-06T14:40:13Z

@andrii-bodnar I prefer to get feedback on the PR first before putting any more work on current branch.
The fix isn't perfect and I still have some unanswered questions.

timofei-iatsenko · 2023-01-08T18:18:13Z

@andrii-bodnar @nmorel I reviewed the code in general it looks good to me.

Here are my thoughts (doesn't mean you have to do this, just want to rise an opinion):

It's probably would be better to add in the docs more real-life examples. Right now, docs for context would be good only for people how already know what context is and would look for that in the docs.

Example from other i18n project docs which I've found helpful:

The same text elements with different meanings are extracted with different IDs. For example, if the word "right" uses the following two definitions in two different locations, the word is translated differently and merged back into the application as different translation entries.

correct as in "you are right"
direction as in "turn right"
If the same text elements meet the following conditions, the text elements are extracted only once and use the same ID.

Relaying on the field origin and translation feels a bit dirty and fragile. I like option with default context in internal structures. So lookup would always be as catalog[context || DEFAULT_CONTEXT][messageId]. This also would be nicer in the code as you will not have to copying these checks everywhere (message.hasOwnProperty('origin') || message.hasOwnProperty('translation') || message.context)

But actually i want to rise a bigger discussion here, read at the end of the comment.

How context would be serialized in formats other than .po? I didn't check the code in action, unfortunately. Would it bring breaking changes in already existing catalogs?

Discussion:

Implementation offered in this PR is ok just as quick workaround for such essential feature as context. And this is fine.
However, I think better implementation is needed here, but it would require bigger changes in sourcecode and how the lib would be used.

Part of this was already discussed here, I want to summarize.

Current lingui implementation has a flaw: you have your original message in the bundle at least 2 times + 1 translation.
For a line "Hello world" you would have one in the source code as id in i18n call. Another one as key in the message catalog and then a translation itself. Strings could be very long, not just a couple lines, so this may bring more kb to the bundle.

This PR would also add even more kilobytes because for each msg which should have a context you would have context explicitly written in the bundle + in the msg catalog.

A much better option would be generating a "stable" ID based on the msg + context as hash with fixed length.
So internal structure would look like

{
 [<hash>]: {..}
}

And lookups in runtime would be much simpler.

Hash would be calculated at build time by macros + compile time for catalogs. So macros instead of

const message = t({
   context: 'My context',
   message: `Hello`
})

// ↓ ↓ ↓ ↓ ↓ ↓

import { i18n } from "@lingui/core"
const message = i18n._(/*i18n*/{
   context: 'My context',
   id: `Hello`
})

Would generate:

import { i18n } from "@lingui/core"
const message = i18n._(/*i18n*/{
   id: "<hash(message + context)>",
   message: `Hello`, // only for development purpose, would be dropped in production
})

Compile catalog command would also calculate hashes from msg + context on the fly.
So context would affect only how msg id generates and will not have it's place in the bundle.

However, there are amounts of users who use a lib without a macros. I don't know who they are and what were the reasons behind this decision. I think if you don't use a macro, you lose everything what making lingui awesome. Without a macro there are better alternatives with more stars and bigger community. So I think we should concentrate on one thing and do it well instead of supporting multiple different usages.

This optional status of the macro and necessity of implementing all existing features in 2 modes adds additional complexity to the project and prevents the possibility of implementing some really cool features because we always should think about "runtime" users.

Anyway, it would be great to know how many of users don't use a macro and how painful would be dropping support of the non-macro usage.

@andrii-bodnar what do you think? We can discuss dropping non-macro usage separately somewhere. Depending on this, we can later decide a what to do with this PR.

PS this proposal with build-time id generation still can be used without dropping non-macro users, but it would require 2 runtimes, 2 sets of tests and confusing setup options, that's why I want to avoid it. And avoid generating hashes at runtime as well.

timofei-iatsenko · 2023-01-08T18:38:14Z

I read test snapshots carefully one more time and the following came up. If we bring "a default" context as was proposed, this brings "nested" structures to these catalog formats:

Lingui (Raw) and Simple JSON catalog formats
https://lingui.js.org/ref/catalog-formats.html#json.

If we do that, we also need to provide an upgrade path for the users. This should be relatively simple.
However, if nested structure would be consumed by some translation tools, such as crowdin, it's probably would be treated as one entry, so it may break a flow for some.

So I would say context feature not really compatible with supported JSON formats.

Also generating a "hashed" id as proposed by me would definitely break simple JSON format flow, because key would no longer be an original string.

That's another questions to discuss.

JSteunou · 2023-01-09T08:41:03Z

Very well summarized on the structure. This PR was careful to not bring any breaking changes and only use what was previously attempted (but not finished) on context support. That's why there is no default context that would change the all structure to add one level in depth for everyone, not only context users.

I totally agree that a better way to handle the internal structure (and json format) would be with key as hash. So instead of having 2 breaking changes in a row, I think it would be better to skip the default context improvement and immediately jump on all flat with key as hash.

nmorel · 2023-01-09T13:25:37Z

For the default context, it's possible I think to introduce it without breaking change.
It will only be present in the internal code. When we write to a file (json/js/po..), the default context can be removed.

timofei-iatsenko · 2023-01-09T15:27:18Z

@nmorel could you elaborate on this more?

nmorel · 2023-01-09T16:29:08Z

If I take examples from here, we will have this with the current PR :

po

#: src/App.js:3
#  Comment for translators
msgid "messageId"
msgstr "Translated Message"

#: src/App.js:4
#  Comment for translators
msgctxt "messageCtxt"
msgid "messageId"
msgstr "Translated Message with ctxt"

json

{
  "messageId": "Translated Message",
  "messageCtxt": {
    "messageId": "Translated Message with ctxt"
  }
}

raw

{
  "messageId": {
    "translation": "Translated message",
    "defaults": "Default message",
    "description": "Comment for translators",
    "origin": [["src/App.js", 3]]
  },
  "messageCtxt": {
    "messageId": {
      "translation": "Translated Message with ctxt",
      "defaults": "Default message",
      "description": "Comment for translators",
      "origin": [["src/App.js", 4]]
    },
  }
}

We can keep this format and only introduce the default context internally with a structure like this :

{
  "<user context or default context>": {
    "<message id>": {
      "translation": "<message translation>",
      "defaults": "<default message>",
      "description": "<comments>",
      "origin": [<origin1>, <origin2>]
    }
  }
}

When parsing po files, if there is no context, we set the default context to the message.
When parsing json files, if the value is a string, we set the default context to the message.
Raw files is the hardest to parse. We could test origin/translation fields like the PR to assign the default context or introduce a special property to context to recognize them easily :

{
  "messageCtxt": {
    "__isContext__": true,
    "messageId": {
      "translation": "Translated Message with ctxt",
      "defaults": "Default message",
      "description": "Comment for translators",
      "origin": [["src/App.js", 4]]
    },
  }
}

When converting the internal structure to po/json/raw files, it won't be too hard to remove the default context.

With this, we keep the format intact (except for message with context) and in lingui code, the structure is the same.
We no longer need to test if the object we manipulate is a message or a context containing messages.

But if you want to introduce the hash solution soon, is it really necessary ? 🙂

andrii-bodnar · 2023-01-10T10:39:18Z

@nmorel @thekip thanks a lot for your involvement here!

I'm still learning the Lingui and it's a bit hard to make decisions about the concerns above. @Martin005 maybe you will have some thoughts?

Anyway, we'd like to introduce this feature in the nearest release. It's important to keep it backward compatible and not to provide any breaking changes for now.

Could you please guys briefly summarize what is currently blocking this PR?

And a small general question - what's the difference between comment and context here? For me, it has quite a similar purpose - providing some additional information for translation about the string

timofei-iatsenko · 2023-01-10T13:44:12Z

@andrii-bodnar the problem here if we release current implementation we would be forced to maintain it as it is right now. But the current approach has many flaws and will limit us in the implementing of new features.

So instead of making a release right now and returning to this question again and again in future we propose to implement it properly from the begining and probably ship it as version 4.

And a small general question - what's the difference between comment and context here? For me, it has quite a similar purpose - providing some additional information for translation about the string

This is a common for i18n process to have both. They might be called differently, f ex. meaning / description or context / comment as in Lingui. The purpose of context is provide few different translation for one source message:

correct as in "you are right"
direction as in "turn right"

Unlike context, comment don't affect message id generation, and used just as an additional information for the translator.
For example context could be "direction" where comment could be "used on the header in metablock, please keep it 20 symbols length max"

JSteunou · 2023-02-06T11:13:56Z

Just to throw another stone here: it seems better to do things right, with a v4, if the v4 is really coming soon. But if it is coming soon, then there is no worries about supporting the v3, so the argument about being forced to maintain it turn void. So ship it quick and iterate would be better and non blocking.

On the contrary if the v4 is not coming soon, then it is also better to ship it now in v3 (maybe with the changes @nmorel explain above).

TLDR; in both cases I think it is better for users to ship it now.

timofei-iatsenko · 2023-02-15T22:50:27Z

Suppressed by #1440

vercel bot deployed to Preview September 14, 2022 10:30 View deployment

nmorel changed the title ~~Implement context~~ improvement : Context Sep 14, 2022

nmorel changed the title ~~improvement : Context~~ improvement: Context Sep 14, 2022

JSteunou mentioned this pull request Sep 14, 2022

WIP: fix msgctxt #1205

Closed

vercel bot deployed to Preview September 14, 2022 15:12 View deployment

nmorel added 7 commits September 14, 2022 17:27

test: Add tests on context with macro

7446ad8

improvement: Context mechanism

1f27349

fix: extraction when having more than one message with same context a…

9b281a7

…nd id

fix: fallback to default locale with a context

c13cfa2

fix: stats with context

7fc52bc

fix: clean obsolete messages with context

0985a84

docs: Add context example

bd5e22c

nmorel force-pushed the context branch from deb592c to bd5e22c Compare September 14, 2022 15:31

vercel bot deployed to Preview September 14, 2022 15:32 View deployment

vercel bot deployed to Preview September 14, 2022 16:14 View deployment

Add "context" in doc

c9b6438

JSteunou force-pushed the context branch from e70e57d to c9b6438 Compare September 14, 2022 16:38

vercel bot deployed to Preview September 14, 2022 16:39 View deployment

andrii-bodnar reviewed Oct 19, 2022

View reviewed changes

examples/create-react-app/package.json Show resolved Hide resolved

andrii-bodnar marked this pull request as draft January 27, 2023 10:23

andrii-bodnar linked an issue Jan 27, 2023 that may be closed by this pull request

RFC: Context + hash based message id #1360

Closed

timofei-iatsenko closed this Feb 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improvement: Context #1281

improvement: Context #1281

nmorel commented Sep 14, 2022 •

edited

Loading

vercel bot commented Sep 14, 2022 •

edited

Loading

JSteunou commented Sep 14, 2022

nmorel commented Sep 14, 2022 •

edited

Loading

semoal commented Sep 20, 2022

timofei-iatsenko commented Dec 12, 2022

andrii-bodnar commented Jan 6, 2023

nmorel commented Jan 6, 2023

timofei-iatsenko commented Jan 8, 2023

timofei-iatsenko commented Jan 8, 2023

JSteunou commented Jan 9, 2023

nmorel commented Jan 9, 2023

timofei-iatsenko commented Jan 9, 2023

nmorel commented Jan 9, 2023

andrii-bodnar commented Jan 10, 2023

timofei-iatsenko commented Jan 10, 2023 •

edited

Loading

JSteunou commented Feb 6, 2023

timofei-iatsenko commented Feb 15, 2023

improvement: Context #1281

improvement: Context #1281

Conversation

nmorel commented Sep 14, 2022 • edited Loading

vercel bot commented Sep 14, 2022 • edited Loading

JSteunou commented Sep 14, 2022

nmorel commented Sep 14, 2022 • edited Loading

semoal commented Sep 20, 2022

timofei-iatsenko commented Dec 12, 2022

andrii-bodnar commented Jan 6, 2023

nmorel commented Jan 6, 2023

timofei-iatsenko commented Jan 8, 2023

Discussion:

timofei-iatsenko commented Jan 8, 2023

JSteunou commented Jan 9, 2023

nmorel commented Jan 9, 2023

timofei-iatsenko commented Jan 9, 2023

nmorel commented Jan 9, 2023

andrii-bodnar commented Jan 10, 2023

timofei-iatsenko commented Jan 10, 2023 • edited Loading

JSteunou commented Feb 6, 2023

timofei-iatsenko commented Feb 15, 2023

nmorel commented Sep 14, 2022 •

edited

Loading

vercel bot commented Sep 14, 2022 •

edited

Loading

nmorel commented Sep 14, 2022 •

edited

Loading

timofei-iatsenko commented Jan 10, 2023 •

edited

Loading