Denise ([staff profile] denise) wrote in [site community profile] dw_biz, 2012-06-19 09:23 pm

RFC: Specifying languages in profiles: how should we do it?

This entry is being posted on behalf of the programmer who is working on a bug that sprang from a suggestion to make it easy for people to find other people blogging in languages they speak. (They wrote it; I'm just posting it! I don't want to take credit for any of this.) We're looking for some thoughts on our current ideas and want to make sure we aren't missing something super obvious :)

Turning it over now:



Way way back in 2009, it was suggested that we create a way of stating which languages a journal uses. There are obvious advantages to this: it'll make it easier to find non-anglophone areas of Dreamwidth for those who want to, whether the language in question is their first, second, third or later one. This is a feature we're going to implement: what we want to know is how. (For those of you who are interested, there's a fair bit of discussion going on in the Bugzilla comments, but it's all repeated here.)

This post has been split into the separate areas that need refinement. In each section we outline our current thoughts: we'd love it if you gave us feedback on them, and we'd love it even more if you came up with an obvious better solution we've not thought of yet. :-)

The areas in question are:
  1. Languages field: usage
  2. Language entry options
  3. What does our standardised list look like?
  4. How do we choose the languages on our standardised list?
  5. How do we organise the list?
  6. Any Other Business


1. Languages field: usage


The original suggestion was for a field allowing people to state which languages they update in. Users are quite likely to enter languages they read in any such field: we need to think about how to handle this.

Options we can think of are:
  1. word the legend very carefully, and accept that the field will sometimes be used inaccurately
  2. provide separate "writes in" and "reads" boxes under an overall "Languages" heading
  3. if we go with 2, optionally include a "reads is the same as writes" (or "writes is the same as reads") tickybox, to reduce work for the user


2. Language entry options


Users will want to enter more than one language; therefore, this needs to be possible.

Ideally we would provide a standardised list so that we did not end up with confusion between "French", "french", "francais", "français" etc all pointing to different places, as they currently do if listed in interests.

However, this means we need to think about how to present this list. Some ideas that have been tossed around so far:
  1. provide check-boxes, possibly making the languages section collapsible. Advantage: easiest to select multiple options. Disadvantage: takes up a lot of space on the Manage Profile page, and will probably be edited much less frequently than e.g. interests or bio.
  2. provide a single drop-down, with the option to "add another" to produce another drop-down.
  3. provide a free-text box that behaves like the "Tags" field in the Create Entries page: it is possible to type in anything that you like, but you will be given suggestions from a standardised list and you'll be able to click "browse" to be shown the full list with check-boxes.
  4. implement one of (1) or (2), and provide a link reading something like "your language not listed?", which will reveal a free-text box for languages not on our standardised list.


Combinations of the above are naturally also possible, and we're very open to better ideas!
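
To make option 3 a bit more concrete, here is a minimal sketch of how suggestions could be drawn from a standardised list while treating "French", "french", "francais" and "français" as the same thing. The `LANGUAGE_NAMES` table and the function names are assumptions made up for illustration, not existing Dreamwidth code.

```python
import unicodedata

# Hypothetical standardised list: language tag -> accepted spellings of its name.
LANGUAGE_NAMES = {
    "fr": ["French", "français"],
    "de": ["German", "Deutsch"],
    "fi": ["Finnish", "suomi"],
}


def fold(text: str) -> str:
    """Case-fold and strip accents so 'Français' and 'francais' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.strip().casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def suggest(typed: str) -> list[str]:
    """Return the tags whose names start with what the user has typed so far."""
    prefix = fold(typed)
    if not prefix:
        return []
    return sorted(
        tag
        for tag, names in LANGUAGE_NAMES.items()
        if any(fold(name).startswith(prefix) for name in names)
    )
```

Because every spelling resolves to one tag, anything stored on the profile would point to the same place regardless of how it was typed.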

3. What does our standardised list look like?


As discussed above, we would like a standardised list. This raises a problem we need to solve first: what our standard should be.

In comments in [site community profile] dw_suggestions, it was suggested that we use BCP-47 international language tags, wherein, for instance, "en-GB" is British English and "ta-SG" is Singaporean Tamil. It's also possible to specify scripts, allowing people to distinguish whether they're writing ru-Latn (Russian in Latin script) or ru-Cyrl (Russian in Cyrillic script).

Using only the BCP-47 tags is somewhat opaque, but they do allow a way to be very specific -- while also potentially allowing translation.

One possible model would be associating the BCP-47 tag names and the language names or descriptions in a more readable form, e.g. en would be associated with the string "English". (The ideal would be to make these associations such that if translations of Dreamwidth occur, it would be trivial to generate a file of language names in the target language while preserving the associations, e.g. "Anglais" would automatically map to en.)
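
As a rough sketch of that model, assuming one name table per interface language keyed by tag (the table contents and function names below are illustrative only):

```python
# Hypothetical name tables, one per interface language, keyed by language tag.
NAMES_BY_UI_LANGUAGE = {
    "en": {"en": "English", "fr": "French", "de": "German"},
    "fr": {"en": "Anglais", "fr": "Français", "de": "Allemand"},
}


def display_name(tag, ui_language="en"):
    """Readable name for a tag: try the UI language, then English, then the raw tag."""
    for table in (NAMES_BY_UI_LANGUAGE.get(ui_language, {}), NAMES_BY_UI_LANGUAGE["en"]):
        if tag in table:
            return table[tag]
    return tag


def tag_for_name(name, ui_language="en"):
    """Reverse lookup: map a displayed name back to its tag, or None if unknown."""
    table = NAMES_BY_UI_LANGUAGE.get(ui_language, {})
    for tag, label in table.items():
        if label.casefold() == name.casefold():
            return tag
    return None
```

The point of keying everything by tag is that a translation only has to supply a new table of names; what is stored on the profile stays the same.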

We also need to consider whether it would be useful to allow people to restrict their searches further. For example, a user searching for "de" (German) would have all results returned, including de-AT (Austrian German), de-CH (Swiss German), de-DE (German as used in Germany), de-1996 (German in the reformed 1996 orthography), etc - but do we want to allow these subtags, and therefore allow people to narrow their searches to only people writing in Swiss German?
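
One way the search behaviour described here could work is simple prefix matching on subtags, roughly in the spirit of the "basic filtering" scheme in RFC 4647. The sketch below is only an illustration; the journal names and their tags are invented.

```python
def matches(search_range, tag):
    """True if the tag equals the range or extends it with further subtags."""
    wanted = search_range.lower()
    candidate = tag.lower()
    return candidate == wanted or candidate.startswith(wanted + "-")


def search(search_range, journals):
    """Names of journals listing at least one matching tag, sorted alphabetically."""
    return sorted(
        name
        for name, tags in journals.items()
        if any(matches(search_range, tag) for tag in tags)
    )


# Invented example data: journal name -> tags listed on the profile.
JOURNALS = {
    "alice": ["de-AT"],
    "bob": ["de-CH", "fr"],
    "carol": ["en-GB"],
    "dave": ["de"],
}
```

With this data, searching for "de" finds alice, bob and dave, while searching for "de-CH" narrows the result to bob alone.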

4. How do we choose the languages on our standardised list?


Of course, the biggie - and the one we'd most like input on - is how to choose the "seed list" languages in the first place. This is the area where - we think - we're most likely to muck up and (at best) create an unhelpful list, so it's where we'd most like your input.

Methods currently under consideration (as ever, please suggest more):
  1. Grab the top 15 countries from Dreamwidth's usage statistics, and use their official languages as the seed list. Short and sweet. Possibly too short; privileges English over native, regional, and indigenous languages in countries that were colonised (New Zealand is the notable exception to this). Would result in ~25 languages on the list.
  2. Grab the top 15 countries from Dreamwidth's usage statistics, and use their official languages and their recognised regional languages as the seed list. This would mean that for, say, the UK, in addition to English there would also be the options Irish, Scottish Gaelic, Ulster Scots, Welsh, Cornish, etc. This list would be much, much longer (probably 50-150 languages on the list).
  3. hold a poll in a [site community profile] dw_news post, and populate the seed list with any language that gets more than n votes (in which case, what should n be?)
  4. what kind of suggestions system should we have for adding new languages to the standard list?
  5. some combination of the above with the top 15 global languages, which, while it adds length, has the advantage that we're not privileging English quite so ridiculously.
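
For what it's worth, combining these methods is mechanically simple; the hard part is the policy. A minimal sketch, assuming we already have a per-country language table and raw poll counts (every value in the example data is an invented placeholder, not a real statistic):

```python
def seed_list(country_languages, poll_votes, n):
    """Union of country-derived languages and poll entries with more than n votes."""
    from_countries = {
        lang for langs in country_languages.values() for lang in langs
    }
    from_poll = {lang for lang, votes in poll_votes.items() if votes > n}
    return sorted(from_countries | from_poll)


# Invented placeholder inputs, deliberately incomplete.
EXAMPLE_COUNTRIES = {"CA": ["en", "fr"], "NZ": ["en", "mi"]}
EXAMPLE_POLL = {"fi": 12, "eo": 3, "fr": 40}
```

Calling `seed_list(EXAMPLE_COUNTRIES, EXAMPLE_POLL, 10)` gives `['en', 'fi', 'fr', 'mi']`: Finnish gets in through the poll, and Esperanto falls below the threshold.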


5. How do we organise the list?


  • by language tag? e.g. en-GB, en-US, en-CA..?
  • by language name? e.g. English, French, Japanese, Tagalog...?
  • by both? e.g. English (en) --> en-GB/British English, en-CA/Canadian English; French (fr) --> fr-CA/Canadian French, fr-FR...?
  • by country? e.g. Canada: English, French; Singapore: English, Tamil, Chinese, Malay?
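
The "by both" option amounts to grouping tags under their primary language subtag. A small sketch of that grouping, with a hypothetical function name:

```python
from collections import defaultdict


def group_by_language(tags):
    """Group tags under their primary language subtag (the part before the first '-')."""
    groups = defaultdict(list)
    for tag in tags:
        primary = tag.split("-", 1)[0].lower()
        groups[primary].append(tag)
    # Sort both the groups and their members so the listing is stable.
    return {primary: sorted(members) for primary, members in sorted(groups.items())}
```

For example, `group_by_language(["en-GB", "fr-CA", "en-CA", "fr-FR", "en"])` returns `{'en': ['en', 'en-CA', 'en-GB'], 'fr': ['fr-CA', 'fr-FR']}`, which could then be rendered as nested lists using readable names instead of raw tags.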


6. Any Other Business?


We're sure there are things we're forgetting - these questions are only what's come out of two people thinking about this on-and-off for two days, and more brains is better brains! Please, please let us know what we're missing, and let us know which of the options listed above you think is the right course of action.

spoken vs. updating languages

[personal profile] tamouse 2012-06-20 10:39 am (UTC)

  1. As I read the original suggestion, this:

    Have a field in which you select the language/s you speak; people speaking the same language can find each other more easily.


    leads me to a different conclusion than this:

    The original suggestion was for a field allowing people to state which languages they update in. Users are quite likely to enter languages they read in any such field: we need to think about how to handle this.


    as it seems the OP was hoping to establish connections, not just with people who update in a particular language, but also speak a language.

    Morphing it into a set of languages one updates is not necessarily a bad thing, but as others pointed out, languages spoken can definitely be an identity-level thing.


  2. It might further aid the OP to have a way of setting what language a post is written in. This might be considered a future enhancement.


  3. I like the idea of presenting the list as an unfolding set of checkboxes; keep it hidden at first, then unfold the more commonly used languages, then further expand to all languages; these should be clearly marked (emulating the cut tag feature would make for a consistent interface across the site).


  4. In addition to the checkbox list, a write-in space should be allowed, as I can see people posting in conlangs, possibly (Toki Pona springs to mind as something potential journalists might use). This might be a future enhancement as well.



Re: spoken vs. updating languages (aside, OT)

[personal profile] tamouse 2012-06-20 10:41 am (UTC)
I just noticed the odd way it formatted my ordered list of points there... is it supposed to do that?

Re: spoken vs. updating languages (aside, OT)

[personal profile] pne 2012-06-20 11:34 am (UTC)
If you mean the extra spaces between the blocks, you might want to choose "More Options" and then "Don't auto-format".

Comments automatically turn carriage returns into line breaks.

Re: spoken vs. updating languages (aside, OT)

[personal profile] sophie 2012-06-20 11:38 am (UTC)
When Dreamwidth auto-formats your comment, it automatically adds a linebreak for every newline, which happens even when you use HTML tags.

You can disable it for a comment by clicking More Options and ticking the "Don't auto-format" box. Your comment is then interpreted without inserting newlines automatically (and URLs won't be automatically turned into links).

My preferred system, though, is to delete the newlines manually after doing the text.
Edited 2012-06-20 11:39 (UTC)

Re: spoken vs. updating languages (aside, OT)

[personal profile] tamouse 2012-06-20 11:56 pm (UTC)
Ah, okay, that's good to know. I assumed it only did that when there were two consecutive line breaks. The other oddity was the numbering: 01., 02., etc -- the leading zero was initially what struck me as odd.

Re: spoken vs. updating languages (aside, OT)

[personal profile] sophie 2012-06-21 12:35 am (UTC)
It's not doing that on my browser. Not sure why your browser might be doing it!

Re: spoken vs. updating languages (aside, OT)

[personal profile] tamouse 2012-06-21 01:17 pm (UTC)
Somewhere in the theme hierarchy, this is set:
ol {
    list-style-type: decimal-leading-zero;
}

(Line 557 on the streamlined CSS that is emitted.)
I added a snippet of custom CSS that put it back to decimal.

Re: spoken vs. updating languages (aside, OT)

[personal profile] sophie 2012-06-21 04:45 pm (UTC)
Ah! I hadn't realised you were viewing this in your style. That makes sense!

Glad you got it sorted :D

Re: spoken vs. updating languages

[personal profile] kaberett 2012-06-20 11:10 am (UTC)
1. I read it the way I did because of the title of the suggestion, "Specify blogging language in user profile." I'm well aware that languages spoken can be an identity-level thing; this is why I'm so keen for us to have "languages read" as well as "languages written in" (though I'm very open to having the precise wording of that changed!).

2. Mmm. I don't think that's something I'm going to try to shoehorn into this, but will stick it on my List Of Spec/talk to Fu about it/open a bug for it once I've got the first bit done.