Template talk:Wiktionary

From Wikipedia, the free encyclopedia

no italics for foreign scripts please

It says somewhere, (maybe not at WP:MOS#Italics but somewhere) that foreign scripts shouldn't be in italics. It certainly makes kanji really hard to read and doesn't seem to have a point. Is there a way I can manually make a non-italic wiktionary tag? or could the template maybe check the unicode or whatever and do that for me? Siuenti (talk) 01:20, 3 April 2017 (UTC)[reply]

Is it gauche to bump this? Because this still glaringly needs to be done, one way or another. Remsense 02:16, 18 October 2023 (UTC)[reply]

Maybe I could substitute it and fix it? is there a rule? Siuenti (talk) 01:23, 3 April 2017 (UTC)[reply]

It would be best to make an edit request: ask a template editor to add a parameter that turns off italics. — Eru·tuon 03:16, 3 April 2017 (UTC)[reply]
Given the complexity of the template code required to do this, I think it would be best if a module were created to generate the content instead. — Eru·tuon 03:17, 3 April 2017 (UTC)[reply]
Would this WP:MODULE do what I just did? You'd end up with a subst-ed template? Siuenti (talk) 03:50, 3 April 2017 (UTC)[reply]
No, a module is a program that the template would use to generate the content, but it would not be substituted. — Eru·tuon 18:38, 3 April 2017 (UTC)[reply]

Just a couple of comments. I agree 100% with the sentiment of this request, but strictly speaking it is not that "italics" should not be used, it is that the html <i> tag♠ should not be used. For the Roman alphabet in a serif font this produces italic; for a sans serif font or any other script it produces "sloped", which is not really the same thing. There's more: consider a description of Chinese numerals, 一 二 三 : now where is the middle one in the Wiktionary box on the right? ¶ ♠Note that <em> is no better: it actually produces exactly the same sloping, only it's called semantic sloping; it is not how emphasis is applied to Chinese characters, for example. See: Hi! こんにちは! привет! ¶ Conclusion: except for Roman, there should be no extraneous markup at all. Imaginatorium (talk) 07:08, 3 April 2017 (UTC)[reply]

@Erutuon: Per my note below, I am planning to implement this request. Please let me know if you have any thoughts on the issue.
@Imaginatorium: I added {{wikt|example|竹|島|三|一|二}} to show the current output.

Following is a simulation of a possible new implementation given input {{wikt|example|NOITALICS|竹|島|三|一|二}}. The idea is that NOITALICS would make any following items upright and ITALICS would make any following items italic (default). I chose uppercase because wikt:italics is a valid Wiktionary entry.

Is the output correct? Any comments on the syntax? Johnuniq (talk) 07:46, 31 May 2019 (UTC)[reply]

@Johnuniq: It would be convenient if the template behaved somewhat like {{lang}} and italicized if is_Latin from Module:Unicode data returns true. Then {{wikt|example|竹|島|三|一|二}} would have the same output as you proposed for {{wikt|example|NOITALICS|竹|島|三|一|二}}. That would apply the correct italicization in most cases. Italicization might have to be forced rarely on words in a mixed Latin-and-Greek alphabet like that of Halkomelem (discussed in Template talk:Lang/Archive 7 § Italicisation of Halkomelem), but I am not sure if the reverse would ever be true. — Eru·tuon 09:57, 31 May 2019 (UTC)[reply]
Thanks, I will investigate that. I thought of ITALICS ("reverse") because there may be cases where, for example, a Japanese word is wanted first and an English word second. My brushes with Unicode made me think that auto-detection would be very difficult but the module you mentioned might have solved that. Johnuniq (talk) 10:02, 31 May 2019 (UTC)[reply]
@Johnuniq: Thanks for responding. Initially I was hoping to try myself to make a non-italic version to test, but simply couldn't find how to get (read) access to the source of the template. (Would be helpful to know anyway.)
Yes, basically, your suggestion would solve the first problem, of inappropriate sloping. Personally I find chucking in flagwords ("ITALICS" etc) a bit dubious engineeringwise; I was thinking of a single parameter for the whole call. You could argue that it would be better in any case to avoid a mixture of sloped and nonsloped entries in the list in a single "Wiktionary" box.
Stepping back and thinking bigger... why not have a language parameter? Then anything non-roman could automatically be nonsloped; and there's more - Wiktionary entries are a collection of all the words in all languages which happen to be written with the same sequence of letters. A language parameter could go to the correct entry as well. And I have just seen the comment above which suggests that checking for "Roman" would be easy. The second problem is the underline, but that's a bit vast: I started making some comments in my sandbox, and I'll leave them on my user page for now: User:Imaginatorium/sandbox#Cyrillic_stuff — Preceding unsigned comment added by Imaginatorium (talk • contribs) 05:34, May 31, 2019 (UTC)
The problem is that several items might be wanted, but only some of them are in a non-Latin script so an option for all parameters might not work. See the discussion above with today's date where Erutuon mentioned an automatic system that might work. Johnuniq (talk) 11:07, 31 May 2019 (UTC)[reply]
Sure. And automatically not sloping non-Roman would minimise bother to editors, because it only needs a note in the manual. But I think it would be a good idea to consider the "lang" parameter, because as I said above, Wikt doesn't do disambiguation. Try looking up the Danish for 1, 2, 3: the entry for 'to' covers 41 languages. Even more, really, really, it should go to the required entry, which is probably a POS within a language. Imaginatorium (talk) 12:54, 31 May 2019 (UTC)[reply]
I'm a Wiktionary editor and it's possible to have the link go to the correct language section, but nearly impossible to have it go to the correct POS within that language section. We use sense ids (see wikt:Template:senseid) to link to individual definitions, but there is no guarantee that those will not change, and we are likely to only check Wiktionary when making changes to them. — Eru·tuon 16:20, 31 May 2019 (UTC)[reply]
For data to enable the template to link to the correct section of the Wiktionary entry, see Module:Wikt-lang/data, which is used by {{wikt-lang}} and {{wt}}. The language code supplied to the template has to be validated and sometimes replaced with the language code for the language that actually has entries (for instance, hr, Croatian, with sh, Serbo-Croatian), and the correct language name has to be looked up, which is sometimes different from the name of the Wikipedia article or the name returned by the mw.language module. If the template goes this way, the linkToWiktionary function in Module:Language performs the basic function of creating a Wiktionary link, and the redirects table in Module:Wikt-lang/data provides the official Wiktionary codes for various language codes used on Wikipedia. — Eru·tuon 16:37, 31 May 2019 (UTC)[reply]
@Erutuon: Thanks, although I'm not sure I'm ready to expand the scope of this template to handle the complexity which I'm glimpsing from your comments—certainly not without a demonstrated need.
@Imaginatorium: My role here is as a coder and I don't know what you want from POS and only have a vague idea of how lang might be useful. This started as a request to allow some items to not be in italics. Presumably lang would be an additional option to link to a section at Wiktionary? I need everything spelled out, preferably with examples of syntax (even if only provisional), with a brief statement of what would be achieved by the syntax, for example with a [[wikt:example#anchor]] link. Johnuniq (talk) 03:54, 1 June 2019 (UTC)[reply]

No italics specification

Here's my suggestion. Each parameter is either an 'entry', a word to be listed, or a lang=xx parameter ('language'). The lang setting applies to all following entries. Default is 'en'. The entries then link to the specified language section within Wikt (or none if English, since by default it is at the top, but surely overspecification is not bad). The following should be (mostly) simple:

  • Get script from language code. Messy: things like Serbian, which use both Latin and Cyrillic scripts
  • Make italic conditional on script: for Latin only. (Cyrillic: bad idea to use pseudo-italic, because one day WP might switch to a Roman font (as it should do, IMO), then the Cyrillic will be Russian cursive, which is too confusing.)
  • Convert to appropriate wikt:link.

Erutuon appears to be a wikt expert: I absolutely take his word for it that trying to do POS is too hard. I don't know how easy it would be to suppress the underline, but in any event that could wait. Imaginatorium (talk) 06:09, 1 June 2019 (UTC)[reply]

(edit conflict) About automatic Latin script detection: the method used by the is_Latin function in Module:Unicode data is pretty reliable. It checks that all characters (code points) belong to the Latin (Latn), Common (Zyyy), and Inherited (Zinh) scripts and that there is at least one Latin character. The Latin script includes mainly letters (most notably a-z, A-Z), Common includes a lot of punctuation (including apostrophes), and Inherited includes combining diacritics. So the whole sentence in this xkcd comic (I found it by searching for "lots of diacritics"), including the piles of diacritics (Inherited) and the period (Common), would qualify as Latin.
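
As a rough illustration of the check described above, here is a Python sketch. It is not the code in Module:Unicode data. Python's standard library does not expose the Unicode Script property, so this sketch approximates it: a letter counts as Latin if its Unicode character name contains "LATIN", and every non-letter (punctuation, digits, spaces, combining marks) is treated as Common or Inherited. The function name `is_latin` and that approximation are assumptions made for the example.

```python
import unicodedata


def is_latin(text: str) -> bool:
    """Approximate the rule: only Latin, Common and Inherited characters,
    with at least one Latin character present."""
    seen_latin = False
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("L"):
            # Letters must be Latin; the character name is used as a
            # stand-in for the real Unicode Script property (assumption).
            if "LATIN" not in unicodedata.name(ch, ""):
                return False
            seen_latin = True
        # Everything else (marks, punctuation, digits, spaces) is treated
        # as Inherited or Common and therefore allowed (approximation).
    return seen_latin


if __name__ == "__main__":
    for sample in ["langue", "язык", "竹", "β-carotene"]:
        print(sample, is_latin(sample))
```

With this approximation, β-carotene is rejected because of the Greek letter, which matches the kind of case mentioned in this thread; a real implementation would consult actual script data instead of character names.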

As a practical example, I've been looking for links to English entries on Wiktionary where the entry name includes suspect characters that do not belong to one of these three scripts or to the Braille script. That gives a list that contains some errors, and some genuine cases where a non-Latin character is used in English, like β-carotene, where the suspect character is β. That might at least give you an idea of what doesn't qualify as Latin under this definition. — Eru·tuon 06:34, 1 June 2019 (UTC)[reply]

Thanks all, this is beginning to look like a workable specification. Re Imaginatorium's suggestion above, an exact example would be good because that would show problems. Consider {{wikt|A|ja|竹|en|B}} which might output the Wiktionary box with items:
  • [[wikt:A#English|''A'']] → A
  • [[wikt:竹#Japanese|竹]] → 竹
  • [[wikt:B#English|''B'']] → B
If the "English" or "Japanese" anchors exist, the link would go to the correct section; otherwise, it would go to the top of the page.
  • Q1 Should the #English anchor always be added if en, or should there be no anchor for en?
  • Q2 Or should there be no anchors? I don't know what is intended.
  • Q3 There are hundreds of language codes and the module would need a list of at least the ones that might be used (code → name, such as en → English). Where is that list? The name would have to agree with what is used at Wikidata.
  • Q4 If "en" and other language codes worked as in this example, you could never link to a Wiktionary entry for en. It is not possible to have syntax such as lang=en because only the last of those would have any effect, and it would apply to all parameters (the module would have no knowledge of where lang=en occurred). It would be possible to use a special character such as underscore to indicate that it is a language code, so the parameters would be _en and _ja etc. That's a bit ugly but perhaps acceptable?
  • Q5 If we rely on automatic detection (Latin = italics) then there is no ability to handle an exception. That issue could be deferred until it occurs, but some thought about a syntax to handle it might be helpful. Perhaps the parameters ITALICS and NOITALICS that I mentioned? They should be rarely needed, if at all.
Johnuniq (talk) 07:47, 1 June 2019 (UTC)[reply]
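
For readers following along, the provisional output for {{wikt|A|ja|竹|en|B}} can be sketched in Python as below. This only illustrates the link format under discussion; it is not template or module code. The `LANGUAGE_NAMES` table and the `wikt_link` function are hypothetical, and whether an anchor should be added for en at all is exactly what Q1 and Q2 leave open.

```python
# Hypothetical code-to-name table; the real data would come from elsewhere.
LANGUAGE_NAMES = {"en": "English", "ja": "Japanese", "fr": "French"}


def wikt_link(word: str, lang: str, italic: bool) -> str:
    """Build a wikilink to a Wiktionary entry, anchored at the language section."""
    name = LANGUAGE_NAMES.get(lang)
    # Unknown codes get no anchor, so the link lands at the top of the page.
    target = f"wikt:{word}#{name}" if name else f"wikt:{word}"
    label = f"''{word}''" if italic else word
    return f"[[{target}|{label}]]"


if __name__ == "__main__":
    print(wikt_link("A", "en", italic=True))
    print(wikt_link("竹", "ja", italic=False))
    print(wikt_link("B", "en", italic=True))
```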

I started looking and I see that Module:Wikt-lang/data has the answer to my question about a table of language codes and names although the variations there are over my head. But I also see {{wt}} and {{wikt-lang}} that Erutuon mentioned. Do they do all that is required? The following shows what they output for some examples adapted from the docs:

  • {{wt|fr|langue}} → [[wikt:langue#French|langue]] → langue
  • {{wt|fr|''langue''}} → [[wikt:langue#French|''langue'']] → langue
  • {{wikt-lang|fr|langue}} → <i lang="fr" xml:lang="fr">[[wikt:langue#French|langue]]</i> → langue
  • {{wikt-lang|ru|язы́к}} → <span lang="ru" xml:lang="ru">[[wikt:язык#Russian|язы́к]]</span> → язы́к

Perhaps {{wiktionary}} could just use the above when the normal bold italics is not wanted? Then this template would only need an option to say "use the parameters exactly as given". My problem at the moment is that I still do not know exactly what output is wanted for what input. Johnuniq (talk) 10:38, 1 June 2019 (UTC)[reply]

Thanks for looking at this. I hadn't realised the parameter limitations, so perhaps you are right: language(s) have to be indicated by flag-words. Is using an initial _ standard/common, at all?
Also, I realise I am conflating two separate issues: going to the right language section, and not italicising non-Roman. This isn't optimal, I know. I can't see why it is necessary to italicise anything (oh, I just did!!); I also think that much effort is put into making information presentation look just like "real, ordinary, text", and this is completely counterproductive. The Wikipedia template at Wiktionary doesn't do this; it writes "English Wikipedia has an article on:" followed by the article title underneath. Sorry, thinking out loud here, but if some things just have to be italicised, there will have to be a mechanism to switch it off manually. Imaginatorium (talk) 16:39, 1 June 2019 (UTC)[reply]
I'm not sure if language codes could be supplied in the numbered parameters. Some language codes are valid words, like en, though I suppose they would rarely be linked to. On Wiktionary we would use list parameters like |lang1=, |lang2= (and so on) that would apply to the first and second numbered parameters, so in that scheme {{wikt|A|ja|竹|en|B}} would be {{wikt|A|竹|lang2=ja|B|lang3=en}}. We have wikt:Module:parameters to automatically gather list parameters into an array, but I'm not sure if anything like that exists on Wikipedia. — Eru·tuon 17:08, 1 June 2019 (UTC)[reply]
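
A minimal sketch of the list-parameter idea, written in Python purely for illustration (it is not wikt:Module:parameters, and `gather_entries` is a hypothetical name). It assumes each |langN= applies only to the Nth numbered parameter and that entries without one fall back to a default language.

```python
import re


def gather_entries(positional, named, default_lang="en"):
    """Pair each numbered parameter with the language given by langN, if any."""
    langs = {}
    for key, value in named.items():
        match = re.fullmatch(r"lang(\d+)", key)
        if match:
            langs[int(match.group(1))] = value
    # Numbered template parameters start at 1.
    return [
        (word, langs.get(index, default_lang))
        for index, word in enumerate(positional, start=1)
    ]


if __name__ == "__main__":
    # Corresponds to {{wikt|A|竹|lang2=ja|B|lang3=en}}
    print(gather_entries(["A", "竹", "B"], {"lang2": "ja", "lang3": "en"}))
```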
At the risk of complicating the discussion, there is more to indicating a foreign language than italics. It's just a convention that we commonly italicise Latin script to indicate a foreign language, but that's only a part of it. Semantically, we should be marking the foreign text in such a way that it indicates unambiguously that it is foreign text. That has the advantages of (1) being potentially understandable (and pronounceable) by screen readers, and (2) it allows translators to machine-translate our rendered text into languages which don't use italics to indicate foreign text. In other words we should not naively use italics, as that has no agreed semantic meaning, but we should use what we have available: {{em}} to indicate emphasis; {{lang}} to indicate foreign text; and perhaps leave '' for things like titles. Visually the first three pieces of text below look similar, but if you examine the rendered html, you'll see they are quite different. The {{lang}} template does not italicise Japanese text:
  • I live in a {{em|house}} → I live in a house
  • ''House'' (TV series) → House (TV series)
  • {{lang|de|Haus}} → Haus
  • {{lang|ja|家}} → 家
Anyway, the point is that we have a template (and module), {{lang}} that renders good markup for most languages and we should be using it rather than making up our own markup. Hopefully that should give Johnuniq all he needs to solve the italics/not-italics question. That should leave just the destination issues to sort out. --RexxS (talk) 23:13, 1 June 2019 (UTC)[reply]
It would be good to use the backend of {{lang}} to handle italicization and language-tagging, but there's some complexity involved. There would need to be logic to handle the differences between Wikipedia and Wiktionary language codes, and the template would have to do something sensible (preferably involving automatic italicization) with the current instances of {{Wiktionary}} that don't have language codes. At the moment the HTML-generating function in Module:Lang (make_text_html) requires a language code. Maybe it could just omit the lang attribute when there isn't a language code. Pinging Trappist the monk, who maintains that module.
A while ago I took a stab at handling Wiktionary and Wikipedia language codes in Module:Language (the backend of {{wikt-lang}} and {{wt}}), but it's messy and not quite able to interoperate with <s>Module:Language</s> Module:Lang. It can maybe handle script and private-use subtags (I'm not sure because I never made proper testcases), but not region or variant.
My plan was to allow either Wiktionary language codes or Wikipedia language tags and use the Wikipedia language tag in the HTML, but the Wiktionary language code for determining the section to link to in the Wiktionary entry. So both {{wikt-lang|hr|reč}} and {{wikt-lang|sh|reč}} should link to the entry for Serbo-Croatian (sh), but the one should use hr (Croatian) in the HTML and the other sh. Unfortunately, though, while they link to the correct entry, {{wikt-lang|hr|reč}} puts sh in the HTML. So the module needs work. I do have the necessary knowledge of Wiktionary's infrastructure, but not sure about motivation. — Eru·tuon 05:54, 2 June 2019 (UTC)[reply]
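
The intended behaviour can be sketched as follows, in Python for illustration only. The two small tables stand in for the data in Module:Wikt-lang/data and contain just the hr/sh case mentioned in this thread; the names `WIKT_REDIRECTS`, `WIKT_SECTION_NAMES` and `resolve_codes` are hypothetical.

```python
# Wikipedia language code -> Wiktionary language code (only where they differ).
WIKT_REDIRECTS = {"hr": "sh"}
# Wiktionary language code -> section name on the Wiktionary page.
WIKT_SECTION_NAMES = {"sh": "Serbo-Croatian"}


def resolve_codes(code: str) -> dict:
    """Keep the supplied code for the HTML lang attribute, but use the
    Wiktionary code to decide which entry section to link to."""
    wikt_code = WIKT_REDIRECTS.get(code, code)
    return {
        "html_lang": code,
        "section": WIKT_SECTION_NAMES.get(wikt_code),
    }


if __name__ == "__main__":
    print(resolve_codes("hr"))
    print(resolve_codes("sh"))
```

Under this scheme both calls link to the Serbo-Croatian section, while the HTML keeps whichever code the editor supplied.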
"Maybe it could just omit the lang attribute when there isn't a language code." Since the purpose of {{lang}} is to properly render html, omitting the attributes that make the rendering proper seems to be not such a good thing. I don't think that make_text_html() requires anything so one might create some sort of front-end that calls make_text_html() with something like this when language is not specified:
make_text_html ('', <text>, 'span', <rtl>, <italic>, <size>, '')
"handling Wiktionary and Wikipedia language codes in Module:Language ..., but ... not quite able to interoperate with Module:Language." What?
"There would need to be logic to handle the differences between Wikipedia and Wiktionary language codes." Wikipedia and Wiktionary should not be in the business of making-up their own language codes.
Trappist the monk (talk) 11:13, 2 June 2019 (UTC)[reply]
Just because I was curious, I've hacked Module:lang/sandbox and added function make_text_html_experiment() which calls make_text_html() with whatever values are in the {{#invoke:}} (if any); empty or omitted parameters are converted to empty strings or appropriate defaults; no error checking:
{{#invoke:lang/sandbox|make_text_html_experiment|code=|text=text|tag=span|rtl=|style=italic|size=|language=}}
This suggests that if there were some sort of wiktionary-code to wikipedia-code map table in Module:lang/data (only those that are different need be mapped) then some sort of front-end could be created in Module:lang that would do as Editor Erutuon suggested.
Trappist the monk (talk) 12:01, 3 June 2019 (UTC) edit: test function in Module:Lang/sandbox deleted 16:44, 5 June 2019 (UTC)[reply]
@Trappist the monk: Module:Wikt-lang/data provides that information in a Wikipedia_code field in the data table for each Wiktionary language code, so that could be migrated to Module:Lang/data if desired. But if Module:Lang is not going to actually contain the Wiktionary linking function, the Wiktionary code could be converted to the Wikipedia code before make_text_html is called, because the Wiktionary codes are only really useful for generating a Wiktionary link (to retrieve the section name and the rules for diacritic removal). — Eru·tuon 19:32, 3 June 2019 (UTC)[reply]
There is a clear default: "English" or 'en' if you like. It has to be a good thing if the huge proportion of editors who never handle anything other than English can just continue with {{wiktionary|fishpaste|jam|marmite}}. Also, the default 'en' case at wiktionary needs no anchor, because the 'en' entry is always first. There is one other case to consider: "no language", or "translingual", which would include I think some symbols (hard to see why wiktionary would be useful), and also Chinese characters generically. There has to be a code for this, which may not match an existing language code. Imaginatorium (talk) 11:38, 2 June 2019 (UTC)[reply]
Shouldn't the 'default' language be the language of the local wiki? Here at en.wiki that would be English, but copying this template to the af:Sjabloon:Wikt, the default language would be Afrikaans, right?
ISO 639-3 § Special codes
Trappist the monk (talk) 12:02, 2 June 2019 (UTC)[reply]
"Wikipedia and Wiktionary should not be in the business of making-up their own language codes." Unfortunately English Wiktionary has some "made-up" language codes, so Wikipedia templates that link to Wiktionary entries have to deal with them somehow. They should be replaced with a valid code in HTML on Wikipedia. For instance, the Wiktionary code ine-pro, for Proto-Indo-European, can be used in {{wikt-lang}} and it is used to look up the language name for the link, but it is replaced with the Wikipedia code ine-x-proto (the Wikipedia language code, which apparently is valid because it uses a private-use subtag) in the generated HTML. — Eru·tuon 00:25, 3 June 2019 (UTC)[reply]
See also the correction to the nonsensical sentence that you quoted. — Eru·tuon 00:27, 3 June 2019 (UTC)[reply]
Material that is not in a Latin-based script is not italicized, per MOS:FOREIGN (Cyrillic, Greek, Chinese, Ethiopian, Sanskrit, etc.). If we just use the {{lang}} template for this markup, it will be taken care of for us already.  — SMcCandlish ¢ 😼  18:05, 2 January 2020 (UTC)[reply]

Examples

This shows what happens using {{lang}} with no anchors in the Wiktionary links. Italics occur only for non-English, Latin text. The items are on separate lines to make it easier to see the code.

{{Sister project
|project=wiktionary
|text=Look up
'''[[wiktionary:hello |{{lang|en|hello|nocat=yes}} ]]''',
'''[[wiktionary:langue|{{lang|fr|langue|nocat=yes}}]]''',
'''[[wiktionary:島    |{{lang|ja|島|nocat=yes}}    ]]''',&nbsp;or
'''[[wiktionary:язык  |{{lang|ru|язык|nocat=yes}}  ]]''',
in Wiktionary, the free dictionary.
}}

Johnuniq (talk) 07:17, 2 June 2019 (UTC)[reply]

(Pleased to see this all moving in the right direction. More later, but first...) There's an amusing bug in the example above: Once you say 島 is Japanese, Japanese linebreak rules kick in, so it happily breaks the line before the ("non-Japanese") comma. Changing to a Japanese comma (irrelevant really, since that's no way to go) doesn't seem to affect it, nor removing the non-break space. This, IMO, reinforces my suggestion that it is much better to avoid attempting to create a natural English sentence. Much better to show the information in a clear tabular way (somehow). (Also stray spaces causing some confusion...) Imaginatorium (talk) 07:51, 2 June 2019 (UTC)[reply]
So {{lang}} doesn't italicize English text. The documentation for {{lang}} doesn't mention an option to force the template to treat English like any other language (and this exception is hardwired into Module:Lang). Perhaps that could be changed, or italicization could just be turned on (|italic=yes) in the module function for {{Wiktionary}} if the language code is en. — Eru·tuon 04:43, 4 June 2019 (UTC)[reply]

Module ready for test

I have created Module:Wiktionary which is called with {{wiktionary/sandbox}}. I put some tests in my sandbox (permalink). It's very new code so problems should be expected but it shows the general idea so we can evaluate whether calling {{lang}} for each item is desirable. I will add more here soon to outline the rules around underscore which can be used to specify a language or an anchor. Johnuniq (talk) 05:43, 4 June 2019 (UTC)[reply]

Notes on use of underscore:
  • A parameter starting with an underscore is either a language code (for example, _fr for French) or an anchor (for example, _French to link to the #French section on a Wiktionary page).
  • Specifying an anchor can only occur immediately after a language code.
  • Text that does not start with an underscore is assumed to be a Wiktionary item.
  • Using "_" in place of a language code resets the current language to that of the Wikipedia (en here).
  • Using "_" in place of an anchor gives a link with no anchor.
See my sandbox for examples.
I imagine that 99% of people using this template will not need to know about underscore—the defaults should do what they want (particularly since the defaults are all they can do at the moment). Only people wanting to tune entries will need to understand what underscore does. Johnuniq (talk) 09:48, 4 June 2019 (UTC)[reply]
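
The underscore rules listed above can be sketched as a small parser. This is a Python illustration, not the code in Module:Wiktionary; the names `Entry`, `parse_params` and `LOCAL_LANG` are hypothetical, and it assumes the current language starts out as the local wiki's language. An anchor of `None` means "use the default anchor for the language" and an empty string means "no anchor".

```python
from typing import List, NamedTuple, Optional

LOCAL_LANG = "en"  # language of the local Wikipedia (assumption for this sketch)


class Entry(NamedTuple):
    word: str
    lang: str
    anchor: Optional[str]  # None: default anchor for lang; "": no anchor


def parse_params(params: List[str]) -> List[Entry]:
    entries = []
    lang = LOCAL_LANG
    anchor = None
    prev_was_lang = False
    for param in params:
        if param.startswith("_"):
            value = param[1:]
            if prev_was_lang:
                # An underscore parameter right after a language code is an
                # anchor; a bare "_" here means "no anchor".
                anchor = value
                prev_was_lang = False
            else:
                # Otherwise it is a language code; a bare "_" resets the
                # current language to the local wiki's language.
                lang = value or LOCAL_LANG
                anchor = None
                prev_was_lang = True
        else:
            # Plain text is a Wiktionary item in the current language.
            entries.append(Entry(param, lang, anchor))
            prev_was_lang = False
    return entries


if __name__ == "__main__":
    for entry in parse_params(["up", "_ja", "島", "_mul", "_Translingual", "A"]):
        print(entry)
```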
I tweaked Module:Wiktionary:
  • I set the initial value for langCurrent to ISO 639-3 code und; until the language is specified, it is unknown
  • langDefault goes away in favor of langLocalWiki
  • let {{lang}} make the italicize / don't-italicize decision; I did this because the module was improperly italicizing the Russian-language text (second left-side example at your sandbox)
  • for consistency, call {{lang}} when the word is the article title
{{lang}} does not italicize English language text (that should probably be fixed so that it's local language that is not italicized) so {{wiktionary}} and {{wiktionary/sandbox}} renderings are slightly different.
I left your original code in-place.
Trappist the monk (talk) 12:19, 4 June 2019 (UTC)[reply]
Tweaked again to require ('Module:Lang') and avoid frame:expandTemplate().
Trappist the monk (talk) 13:18, 4 June 2019 (UTC)[reply]
As noted above, I made the module italicize English because Module:Lang treats it specially and doesn't italicize it even when it's in the Latin script. I don't see the Russian word being italicized if I add this logic again. — Eru·tuon 17:15, 4 June 2019 (UTC)[reply]
Module:Lang attempts to adhere to MOS:FOREIGNITALIC. Because English at en.wiki is not a 'foreign' language, {{lang|en|<text>}} and {{lang-en|<text>}} do not italicize <text>.
MOS:WORDSASWORDS applies here because this template is about words-as-words. In this template, the words of interest are 'highlighted' with a bold font (and wikilinks). Per MOS:FOREIGNITALIC, italics identify non-English-Latin-script words. Words that do not use the Latin script are obviously non-English so don't need to be (shouldn't be) italicized to show that they are not English words. MOS:WORDSASWORDS expressly discourages the simultaneous application of both bold and italic highlighting to highlight words-as-words. Properly, italicized bold identifies non-English-Latin-script (italics) words-as-words (bold); bold upright identifies English (upright) or non-English-non-Latin-script (upright) words-as-words (bold). But, we violate that with the code that you restored which, I think, should be removed.
The Russian text was italicized by the markup beginning at line 12 in Module:Wiktionary.
Trappist the monk (talk) 18:44, 4 June 2019 (UTC)[reply]
Okay, well, my code behaves more like the current version of {{Wiktionary}}, which italicizes what are perhaps most often English words. But that version is already breaking the rules that you have described whenever the link is to an English entry. At the moment, there is no way to determine when that is true and when it isn't because there are no language codes (though if the entry name contains a script never used in English, like Devanagari, it's a good bet that it's not English). My preference would be to italicize and not embolden if a choice must be made. — Eru·tuon 19:08, 4 June 2019 (UTC)[reply]
Great! Perhaps the next thing to add is for the anchor "Translingual", used to access basic information about a Han character, rather than the language-specific (zh, ja, ko...) details. Arguably, it should also be used to look up the letter "A" for example, and this same heading (Translingual) is used there. OTOH, this is also (always?) the first entry on the page, so it might not be necessary. I still think that a way of avoiding italicising anything would be an improvement. (For example, if you have a code for "Translingual", in the current scheme it has to be italicised iff it's a Roman letter. grr!) Imaginatorium (talk) 07:12, 4 June 2019 (UTC)[reply]
I need examples. What currently is not satisfactory and what should it do? If an option is wanted to set the anchor to "Translingual" for some cases, try _Translingual per new documentation that I have inserted above. Johnuniq (talk) 09:48, 4 June 2019 (UTC)[reply]
To link to the Translingual header, people can also use the language code mul. — Eru·tuon 17:10, 4 June 2019 (UTC)[reply]

Italics for English words

Can we please settle the question of whether this template should put English words in italics. Here are some examples of current behavior using the main and sandbox templates. Currently, the sandbox output has "down" in bold italics and holding the mouse over it shows pop-up "undetermined language text".

Above, Trappist the monk said italics should not be used. Following is a paraphrased copy of Trappist's text above:

  • Module:Lang attempts to adhere to MOS:FOREIGNITALIC. Because English at en.wiki is not a 'foreign' language, {{lang|en|<text>}} and {{lang-en|<text>}} do not italicize <text>.
  • MOS:WORDSASWORDS applies here because this template is about words-as-words. In this template, the words of interest are 'highlighted' with a bold font (and wikilinks). Per MOS:FOREIGNITALIC, italics identify non-English-Latin-script words. ... MOS:WORDSASWORDS expressly discourages the simultaneous application of both bold and italic highlighting to highlight words-as-words.

Erutuon points out that the current template uses italics for English words, and such words are very much the most common and says:

  • My preference would be to italicize and not embolden if a choice must be made.

I would use italics when writing "look up ''down'' in a dictionary". However, bold and linked should be enough: "look up '''[[wikt:down|down]]''' in a dictionary". I can see both sides and essentially do not mind what the outcome is. However, the issue should be settled so please add thoughts. Johnuniq (talk) 03:58, 5 June 2019 (UTC)[reply]

The sandbox example that you provide appears to be working as it should because the default language code is und which is not en so the template does not know that down is an English word. Were you to change that template call to:
{{wiktionary/sandbox|up|_en|down|_ja|島}}
you will see much the same rendering except the tool-tip will read "English language text" (at the time of this writing, down is italicized because part of the changes that I made to Module:Wiktionary were reverted). Preview this page with this version of Module:Wiktionary to see how the template renders English words that are defined as English words.
We are having these discussions because the current {{wiktionary}} is badly flawed – it indiscriminately italicizes everything; using it as an example of what we should do in the future doesn't make much sense to me. If we are going to fix it, we should fix it correctly and not continue to perpetuate its flaws in a new version.
MOS:FOREIGNITALIC provides for only one way to highlight foreign words written with the Latin script: italics. MOS:WORDSASWORDS is more flexible in allowing italics, bold, or quoted (but not combinations of these). So, to comply with MOS:FOREIGNITALIC and MOS:WORDSASWORDS, and using this example, {{wiktionary/sandbox|up|_en|down}}, we have these options:
  1. Look up up, or down in Wiktionary... → words-as-words: bold; non-English Latin-script: italic
  2. Look up up, or down in Wiktionary... → words-as-words: italic; non-English Latin-script: italic
  3. Look up "up", or "down" in Wiktionary... → words-as-words: quoted; non-English Latin-script: italic
There is a further complication. MOS:FOREIGNITALIC also says that non-Latin script text should not be rendered in bold or italic font. Using this example, {{wiktionary/sandbox|_ja|島}}, the template has these options (MOS:WORDSASWORDS is silent on how or if non-Latin-script words should be highlighted):
  1. Look up 島 in Wiktionary...
  2. Look up "島" in Wiktionary...
If this talk page and its archives are any indication, bold font for non-Latin script text has not presented a problem that editors cared to mention so perhaps this last is not an issue that we need to worry about now.
Trappist the monk (talk) 11:45, 5 June 2019 (UTC)[reply]
Bold font for a word or phrase that is the article title (or a redirect to it) is only used at the first appearance of the word/phrase in the article. That's made clear in MOS:BOLD and in MOS:WORDSASWORDS. So it's very unlikely to be a MoS-compliant option for this template, which invariably appears near the foot of the article. I think that if you want to comply with MoS, the only options you have are:
Look up "house" in Wiktionary... → words-as-words: quoted; English Latin-script: non-italic
Look up "Haus" in Wiktionary... → words-as-words: quoted; non-English Latin-script: italic
Look up "家" in Wiktionary... → words-as-words: quoted; non-English non-Latin-script: non-italic
In the German Wikipedia, I'd expect "house" to be italicised and "Haus" not to be italicised, but that's a problem for de-wiki to sort out – although it would be nice if the module automatically recognised the local wiki's language or the user's preferred language for Commons and similar wikis (mw.language.getContentLanguage / mw.getCurrentFrame():preprocess( '{{int:lang}}' ) see findLang function in Module:WikidataIB as an example) and treated it as the non-foreign language. --RexxS (talk) 13:29, 5 June 2019 (UTC)[reply]
No opinion on what type of emphasis to use, if any. A common use for this template is on disambiguation pages, where it appears at the top of the page (not that this makes any difference with regard to the question at hand). olderwiser 14:03, 5 June 2019 (UTC)[reply]
At Template talk:Wiktionary § Module ready for test I wrote: {{lang}} does not italicize English language text (that should probably be fixed so that it's local language that is not italicized)... so that point is already noted. {{Wiktionary}} is not used at Wikidata but a variant is used at Commons. If ever we revise this template and if ever Commons decides to adopt that revision, they will also need to adopt whatever module support we use locally or build their own. There is another variant at de.wiki which is so markedly dissimilar to the en.wiki template that I doubt they would ever adopt ours.
I admit to some personal bias in that I prefer bold highlighting over quoted highlighting of the terms-to-be-defined – bold, to me, appears cleaner; quotes seem to clutter the text in the side box. I suppose that, in part, the bias arises because, for description-lists (;), MediaWiki applies bold markup to the term-to-be-defined so bold markup in {{wiktionary}} is painting the term-to-be-defined with the same brush.
Trappist the monk (talk) 15:19, 5 June 2019 (UTC)[reply]
Treating English words-as-words differently from non-English words-as-words seems bizarre to me, as does italicizing a foreign word that's presented in quotes. In some of the examples above, non-English words are both bolded and italicized, or both italicized and quoted, while English words are only bolded or italicized or quoted. MOS:WORDSASWORDS doesn't explicitly say that it only applies to English words-as-words, but maybe it does because the only examples presented are English. If it does apply to non-English words, then non-English words-as-words, like English words-as-words, should only be either quoted, italicized, or bolded, not two or more of these. — Eru·tuon 16:00, 5 June 2019 (UTC)[reply]

Editor Johnuniq has mentioned this discussion at Wikipedia talk:Manual of Style/Text formatting § Italics for English words in a side box.

Trappist the monk (talk) 10:52, 8 June 2019 (UTC)[reply]

More than a month and nary a comment at Wikipedia talk:Manual of Style/Text formatting § Italics for English words in a side box. Is this a dead discussion or is there reason to continue?
Trappist the monk (talk) 11:48, 21 July 2019 (UTC)[reply]
Yes, italicize them. This template isn't a magical exception to MOS:WAW, and as the examples at the top of this thread clearly show, removing the italics will easily produce results that are confusing. The idea "MOS restricts italics to foreign words" is simply false. Also, per MOS:FOREIGN, do not italicize non-English material that is not in a Latin-based alphabet (e.g. Cyrillic, Chinese, Arabic).  — SMcCandlish ¢ 😼  18:03, 2 January 2020 (UTC)[reply]

I want to revisit this issue with a (possibly) simpler request for an all-or-nothing manual opt-out for italics/sloping to use with foreign scripts (like to use here); something like |italic=unset. I know it won't solve cases like the one to the right → , but let's not make the perfect the enemy of the good. Italics+bold in complex scripts with small type sizes is hard to read. (......😱) Otherwise, I agree with User:SMcCandlish et al. that this template isn't an exception to MOS:WAW. I also caution against basing italicization on language codes. In the case of 南山, the link should direct to the entry as a whole and not any specific language section. But if that's the only way, maybe better than nothing. AjaxSmack  14:45, 11 December 2023 (UTC)[reply]

The language code and italics wrapped around something like 南山 is only about the character string displayed in this page with that markup around it; it has nothing to do with the language or script used in the page that is the link target. As for adding something like |italic=unset, I would expect this to be abused (cf. comments by several parties above that basically resolve to a WP:IDONTLIKEIT opinion). It would be better, I would think, if this relied on language-handling code already written for {{lang}}, etc., and auto-suppressed italics for cases that need it because they are in non-Latin scripts. What we probably need is the ability to do something like {{wiktionary|island|pulau|{{lang|zh|nocat=y|島}}}}, which does not presently work. And maybe it can't for technical reasons. But perhaps something like {{wiktionary|island|pulau|島|lang3=zh}} or even {{wiktionary|island|pulau|zh:島}} (the citation templates already support some colon syntax like that for certain parameters – see Template:Cite web#Title) could be implemented. Lua modules are pretty badass (though I totally suck at making them).  — SMcCandlish ¢ 😼  15:21, 11 December 2023 (UTC)[reply]
Interesting ideas for a solution, but I am unable to implement them and rely on the charity of others. "I would expect this to be abused..." Oh, yeah.  AjaxSmack  16:47, 11 December 2023 (UTC)[reply]
Would require someone with some expertise (lookin' at you Trappist!). I would imagine this could be built into a reusable module that might be of use in a variety of other templates. Imagine, e.g., {{Infobox ancient site|name=ga:Dún Aonghasa|...}} just working automagically. Kinda got a chill up mine spine just then.  — SMcCandlish ¢ 😼  17:16, 11 December 2023 (UTC)[reply]
I have created Module:Sandbox/Trappist the monk/Wiktionary based on this version of a module in Editor Johnuniq's sandbox. My version eschews bold markup as unnecessary and uses the language tag prefix mechanism suggested by Editor SMcCandlish. Compare:
{{wiktionary}} comparisons (markup → comments; rendered boxes not reproduced):
{{wiktionary|up|down|}} → current live template
{{#invoke:Sandbox/Trappist the monk/Wiktionary|main|up|down|}} → no language tag prefixes
{{#invoke:Sandbox/Trappist the monk/Wiktionary|main|en:up|down|ja:島}} → mixed language prefixes
Floating your mouse pointer over the linked terms will show the standard {{lang}} tool tips. English terms are in italic because of MOS:WORDSASWORDS. Terms in non-Latn script are not italicized per MOS:BADITALICS.
Error messaging and categorization are a couple of things that, if this module is adopted, will need improvement.
Trappist the monk (talk) 16:24, 13 December 2023 (UTC)[reply]
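For readers who want to see the prefix mechanism and the script rule spelled out, here is a minimal Python sketch. It is not the Lua in the sandbox module, which hands this work to Module:Lang; the function names, the regular expression, the span wrapper, and the "every letter is a Latin letter" test are all assumptions made for illustration. It only handles two- and three-letter tags, so something like zh-Hant or a term that itself contains a colon would need more care.

```python
import re
import unicodedata

# Hypothetical helpers for illustration only; the real module relies on Module:Lang.
PREFIX = re.compile(r"^([a-z]{2,3}):(.+)$")


def split_prefix(arg):
    """Split 'ja:島' into ('ja', '島'); untagged terms fall back to 'und'."""
    match = PREFIX.match(arg)
    if match:
        return match.group(1), match.group(2)
    return "und", arg


def is_latin_script(text):
    """True when every alphabetic character is a Latin-script letter."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


def render(arg):
    """Return wikitext for one term: linked, language-tagged, italic only if Latin."""
    lang, term = split_prefix(arg)
    link = f"[[wikt:{term}|{term}]]"
    if is_latin_script(term):
        link = f"''{link}''"  # wikitext italics
    return f'<span lang="{lang}">{link}</span>'


if __name__ == "__main__":
    for arg in ("en:up", "down", "ja:島"):
        print(render(arg))
```

With these assumptions, `en:up` and the untagged `down` come out italicized, while `ja:島` is linked and tagged but left upright.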
It looks beautiful to me. At that font point, wikilinking, italics and bolding is a little much. I'm agnostic about the floating text.  AjaxSmack  16:56, 13 December 2023 (UTC)[reply]
Magic! That's impressive. I'm not sure what the |main is for, though. Minor quibble: Maybe a tooltip of "undetermined-language text" isn't actually useful except in some particular use case in other kinds of templates in which we would expect a language to be specified because English would not be the presumptive default. Here, we have an original template mostly used for English terms and having no markup yet for non-English ones, so just having un-marked-up parameter values should arguably not generate a tooltip, since it's not informative.
Now I wonder whether the presence of something like "en:" or "ja:" could also link to the specific-language section of the Wiktionary article. I notice that the base {{Wiktionary}} template doesn't support such specificity at all, but {{Wiktionary pipe|tree#English|tree}} does. That might be superfluous if {{Wiktionary|en:tree}} replicated the other template's |tree#English functionality.  — SMcCandlish ¢ 😼  18:02, 13 December 2023 (UTC)[reply]
|main is the exported function to call when MediaWiki processes {{#invoke:Sandbox/Trappist the monk/Wiktionary}}.
Every term in the call to the module is passed through Module:Lang to get the html markup. If a language tag prefix is not attached to a term, we don't know the term's language so the module assumes a default value of und (undetermined).
The module does link to the appropriate section at Wiktionary. For example,
{{wiktionary}} comparisons (markup → comments; rendered boxes not reproduced):
{{#invoke:Sandbox/Trappist the monk/Wiktionary|main|pt:alien}} → links to wikt:alien#Portuguese
{{#invoke:Sandbox/Trappist the monk/Wiktionary|main|nl:alien}} → fails because Module:Wikt-lang/data does not recognize nl (Dutch)
Apparently there are quite a few languages that Module:Wikt-lang/data does not recognize. If we could be sure that Wiktionary's 2-character language tags matched the ISO 639-1 definitions ... But, alas, we cannot. For example, in Module:Wikt-lang/data there is a redirects table where language tags redirect to other tags: bs (Bosnian), cnr (Montenegrin), hr (Croatian), kjv (Kaikavian Literary Language), sr (Serbian) all redirect to sh (Serbo-Croatian). This redirection is not yet accounted for in the sandbox module. Because of this redirection, we cannot assume that the language name available from Module:Lang will match the section headings (anchors) used at Wiktionary. If we adopt this or a similar module for {{wiktionary}}, it will be necessary to do some work on Module:Wikt-lang/data so that, at the least, the basic ISO 639-1 language tags are supported.
Trappist the monk (talk) 19:00, 13 December 2023 (UTC) struck failure comment 01:53, 14 December 2023 (UTC)[reply]
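The redirect problem described above can be sketched in a few lines of Python. This is only an illustration: the two dictionaries are a tiny hand-copied subset standing in for Module:Wikt-lang/data, and the function name and error handling are assumptions, not the module's actual interface. The point is that the tag must be resolved through the redirects table before a section name is looked up, otherwise hr would yield a "Croatian" anchor that Wiktionary does not use.

```python
# Illustrative subset only; the real tables live in Module:Wikt-lang/data.
REDIRECTS = {"bs": "sh", "cnr": "sh", "hr": "sh", "kjv": "sh", "sr": "sh"}
SECTION_NAMES = {"pt": "Portuguese", "sh": "Serbo-Croatian"}


def section_link(tag, term):
    """Build a Wiktionary link target with the language section as anchor."""
    canonical = REDIRECTS.get(tag, tag)  # follow the redirect first
    name = SECTION_NAMES.get(canonical)
    if name is None:
        # Mirrors the failure case for a tag missing from the data.
        raise KeyError(f"unrecognized language tag: {tag}")
    return f"wikt:{term}#{name}"


if __name__ == "__main__":
    print(section_link("pt", "alien"))
    print(section_link("hr", "otok"))
```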
The table redirects these language codes because Wiktionary doesn't use any of the corresponding language names as level-2 headers and instead uses "Serbo-Croatian". See wikt:Wiktionary:Language treatment. Of them, only Kajkavian is currently listed as a variant of a language in wikt:Module:etymology languages/data (possibly not when Module:Wikt-lang/data was created), so they were manually entered and can't be synced from a Wiktionary module. — Eru·tuon 22:13, 17 December 2023 (UTC)[reply]
I have created a supplemental data module that holds language tags and names (~150) that are present in wikt:Module:languages/data/2 but not present in our Module:Wikt-lang/data. Adding support for that data fixed the Dutch-language 'alien' example above.
Another tweak allows the use of an anchor (url fragment) as part of the 'term':
{{wiktionary}} comparisons (markup → comments; rendered boxes not reproduced):
{{#invoke:Sandbox/Trappist the monk/Wiktionary|main|alien#Catalan}} → links to wikt:alien#Catalan
Trappist the monk (talk) 01:53, 14 December 2023 (UTC)[reply]
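The anchor tweak amounts to splitting the term at the first "#". A small Python sketch of that idea follows; the function names are hypothetical and the actual Lua may handle edge cases differently.

```python
def split_anchor(term):
    """Split 'alien#Catalan' into ('alien', 'Catalan'); no '#' gives (term, None)."""
    page, sep, fragment = term.partition("#")
    return page, (fragment if sep and fragment else None)


def link_target(term):
    """Return the interwiki target, keeping an explicit anchor when one is given."""
    page, fragment = split_anchor(term)
    return f"wikt:{page}#{fragment}" if fragment else f"wikt:{page}"


if __name__ == "__main__":
    print(link_target("alien#Catalan"))
    print(link_target("alien"))
```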
Does the 20 seconds of silence mean this new iteration is good to go?  AjaxSmack  05:15, 17 December 2023 (UTC)[reply]
I'm not feeling an awkward silence.
There may be problems with the source module that we use to translate language tags to Wiktionary anchors (en#English, fr#French, etc). I have asked about what appears to be missing data at Module talk:Language § missing data? but have not yet received a reply.
Trappist the monk (talk) 20:39, 17 December 2023 (UTC)[reply]
Template:Wiktionary/sandbox now uses Module:Wiktionary. This version has error messaging and categorization. Articles are categorized when Module:Wiktionary detects errors (unrecognized language tags), when {{Wiktionary}} gets its 'term' from the article title, or when the term given to {{Wiktionary}} includes an anchor.
See Template:Wiktionary/testcases.
Trappist the monk (talk) 00:26, 19 December 2023 (UTC)[reply]
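The three categorization triggers listed above can be summarized as a small decision function. This Python sketch is only a reading aid: the category labels are placeholders (the real category names are not given in this thread), and the assumption that an empty argument list means "term taken from the article title" is mine.

```python
def tracking_categories(terms, unrecognized_tags):
    """Return placeholder labels for the three conditions described above.

    terms: the terms passed to the template (empty means the title is used).
    unrecognized_tags: language tags the module could not resolve.
    """
    categories = []
    if unrecognized_tags:
        categories.append("unrecognized language tag")  # placeholder label
    if not terms:
        categories.append("term from article title")  # placeholder label
    if any("#" in term for term in terms):
        categories.append("term includes anchor")  # placeholder label
    return categories


if __name__ == "__main__":
    print(tracking_categories([], []))
    print(tracking_categories(["alien#Catalan"], ["xx"]))
```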

Edit request 14 June 2023

Description of suggested change: I believe the comma at the end of "Wiktionary, the free dictionary." should be removed. I don't know if WP:CAP applies here, but it would be more consistent with picture captions that do not have a comma if they are 1 sentence long. Diff:

Wiktionary, the free dictionary.
+
Wiktionary, the free dictionary

Cocobb8 (💬 talk to me! • ✏️ my contributions) 17:10, 14 June 2023 (UTC)[reply]

Your request seems a bit muddled. You wrote: I believe the comma at the end of "Wiktionary, the free dictionary." should be removed. But then, in your diff you removed the terminal period (full stop); not a comma. If the terminal period is what you want removed, to my mind that is something that should not be done. The {{wiktionary}} rendering is a complete sentence:
Look up canonical in Wiktionary, the free dictionary.
so the terminal dot is appropriate.
Trappist the monk (talk) 01:17, 15 June 2023 (UTC)[reply]
Sorry, I did mean I wanted the period to be removed. And I guess that makes sense! Cocobb8 (💬 talk to me! • ✏️ my contributions) 10:02, 15 June 2023 (UTC)[reply]

Template translation help

I want to translate this Wiktionary template into my mother tongue (in this case, Indonesian), so that those who click a word (say, ratio) will be taken to ID Wiktionary (a.k.a. Wikikamus) instead of EN Wiktionary. Yes, I just need to change EN in https://en.wiktionary.org to ID, but after looking at the source code of this template I have no idea how exactly to do it. I realize this template has already been translated into Indonesian, but the link still goes to EN Wiktionary, which is what I want to fix. I would appreciate any help.

The Winter Lettuce (talk) 07:55, 21 November 2024 (UTC)[reply]