
Tagging Automation on TJC

asked 2015-01-16 22:11:54 +0200

This post is marked as community wiki: anyone with karma >75 is welcome to improve it.

updated 2015-06-02 23:43:39 +0200

rdmo

Automated tag generation based on words and clauses in questions, comments and answers could encourage content clean-up on together.jolla.com.

All terms would be managed. High-frequency words would be filtered out using regular expressions. Low-frequency content, such as typographical errors, would become more obvious and get revised. Troubling activity or content might also stand out more and could then be addressed effectively.
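As a rough illustration of the filtering described above, here is a minimal Python sketch, not anything TJC actually runs. It assumes posts are available as plain strings; the stop-word pattern, `term_frequencies` and `rare_terms` are hypothetical names, and the threshold for "rare" is an arbitrary choice.

```python
import re
from collections import Counter

# Hypothetical stop-word pattern: very frequent words that carry little meaning.
STOPWORDS = re.compile(r"(?:a|an|and|the|of|to|in|is|it|on|for|that|this)")
# A "word" here is a run of lowercase letters, digits or apostrophes.
WORD = re.compile(r"[a-z0-9']+")


def term_frequencies(posts):
    """Count lower-cased words across all posts, skipping stop words."""
    counts = Counter()
    for text in posts:
        for word in WORD.findall(text.lower()):
            if not STOPWORDS.fullmatch(word):
                counts[word] += 1
    return counts


def rare_terms(counts, max_count=1):
    """Return terms seen at most max_count times: candidates for typo review."""
    return sorted(word for word, n in counts.items() if n <= max_count)


posts = [
    "The camera app crashes on start",
    "Camera crashes after update",
    "Camrea app freezes",
]
counts = term_frequencies(posts)
print(rare_terms(counts))  # ['after', 'camrea', 'freezes', 'start', 'update']
```

Rare terms are only candidates: "camrea" is a typo, but "freezes" is a perfectly good word that simply has not been used often yet, so a human still decides what to revise.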

Non-native English speakers could be notified of potential improvements to the fluency of their writing. It's nicer to get feedback from a machine signalling "hey, perhaps this would be an improvement" than to know that your actions distracted someone from more valuable parts of their job.
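A machine-generated suggestion of this kind could be sketched with Python's standard `difflib` module, comparing a rarely used word against words that appear often in the word index. This is only an assumption about how it might work: `suggest`, the `vocabulary` mapping (word to count) and the `min_count`/`cutoff` thresholds are hypothetical.

```python
import difflib


def suggest(word, vocabulary, min_count=2, cutoff=0.75):
    """Suggest up to three well-attested spellings close to `word`.

    vocabulary maps each known word to how often it has been seen; only words
    seen at least min_count times are treated as trustworthy spellings.
    """
    known = [w for w, n in vocabulary.items() if n >= min_count and w != word]
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    return difflib.get_close_matches(word, known, n=3, cutoff=cutoff)


vocabulary = {"camera": 2, "crashes": 2, "app": 2, "camrea": 1}
print(suggest("camrea", vocabulary))  # ['camera']
```

Because the result is a list of suggestions rather than an automatic correction, the writer stays in control of the final wording.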

The way I see it, such a tool set would strengthen the vital problem-solving dynamic that together.jolla.com encourages.

Its essence would be a moderated index of words and concepts with meta-information, statistics, messages and links. It would be accessible to varying degrees: to members, and more so to moderators, with advisories and reports available internally and externally, privately and publicly, according to needs and policies.

This could make the site feel more like a reference book and a compelling story than a throng of noisy competing interests. At least, I believe this is a way of tackling issues that lends itself strongly to user empowerment and encouragement, while being less burdensome on both the paid and the unpaid, benevolent participants in this environment.

There may be 77,000 questions and other texts to work through. One vitality statistic to show next to each listed word or term would be its age range, based on the dates of the first and last accepted posts in which it appears.
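The age-range statistic could be computed from an index that records, for each term, how often it occurs and the dates of the earliest and latest posts containing it. A minimal sketch, assuming accepted posts are available as `(date, text)` pairs; `TermEntry` and `build_index` are hypothetical names, and the two sample posts are made up.

```python
import re
from dataclasses import dataclass
from datetime import date

WORD = re.compile(r"[a-z0-9']+")


@dataclass
class TermEntry:
    count: int        # total occurrences of the term
    first_seen: date  # date of the earliest post containing it
    last_seen: date   # date of the latest post containing it

    @property
    def age_days(self) -> int:
        """Age range of the term, in days."""
        return (self.last_seen - self.first_seen).days


def build_index(posts):
    """Build term -> TermEntry from (post_date, text) pairs of accepted posts."""
    index = {}
    for posted, text in posts:
        for word in WORD.findall(text.lower()):
            entry = index.get(word)
            if entry is None:
                index[word] = TermEntry(1, posted, posted)
            else:
                entry.count += 1
                entry.first_seen = min(entry.first_seen, posted)
                entry.last_seen = max(entry.last_seen, posted)
    return index


index = build_index([
    (date(2015, 1, 16), "Tagging automation on TJC"),
    (date(2015, 6, 2), "More on tagging automation"),
])
print(index["automation"].age_days)  # 137
```

Using `min`/`max` on the dates means posts do not have to be processed in chronological order.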

This is now editable by everyone, so please add regular expressions below rather than in the comments, because comments do not stay editable for long.

Regular expressions to use for analysing the articles written by the together.jolla.com community:

  • s/tba/ (total word count)
  • s/tba/ (average word count per entry)
  • s/tba/ (word index and word instance index and count)
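Until the `s/tba/` placeholders above are filled in, here is one hypothetical way to compute the three statistics they name, as a minimal Python sketch rather than an agreed definition. It assumes each entry is a plain-text string and treats any run of word characters (`\w+`) as a word, so a contraction like "don't" counts as two words; `corpus_stats` is a hypothetical name.

```python
import re
from collections import defaultdict

# Hypothetical definition of a "word": any run of letters, digits or underscores.
WORD = re.compile(r"\w+")


def corpus_stats(entries):
    """Return (total word count, average words per entry, word instance index).

    The instance index maps each lower-cased word to a list of
    (entry number, word position) pairs; each list's length is that word's count.
    """
    total = 0
    instances = defaultdict(list)
    for entry_id, text in enumerate(entries):
        words = WORD.findall(text.lower())
        total += len(words)
        for position, word in enumerate(words):
            instances[word].append((entry_id, position))
    average = total / len(entries) if entries else 0.0
    return total, average, dict(instances)


total, average, instances = corpus_stats([
    "Tagging automation on TJC",
    "Please add regular expressions below",
])
print(total, average, instances["tjc"])  # 9 4.5 [(0, 3)]
```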


3 Answers


answered 2015-01-17 14:28:56 +0200

foss4ever

updated 2015-01-18 23:50:44 +0200

rdmo

No, please don't design or implement anything like this for the TJC forum; it just wouldn't work. Tags are like keywords or annotation metadata, which are best left to humans to add, edit, modify and manage.

Also, if you just need to see statistics or reports of tag utilization, you don't use regexps or awk for that. This is a website/forum, not a standalone Linux Bash script :/

Have you ever wondered why, e.g., FB, Twitter, or the millions of blog sites and forums out there (WordPress-based or similar) don't generate tags automatically from the content of messages, tweets, status updates, blog articles, etc.?



To be clear, this post is not about replacing tags added and edited by people but rather about adding to the automated tools already in use.

Please convert your answer to a comment, since that is what it is.

rdmo (2015-01-17 20:54:09 +0200)

Are you sure your (not very relevant) "proposal" shouldn't be a feature request rather than an "idea"?

foss4ever (2015-01-18 04:04:13 +0200)

answered 2015-01-19 01:39:48 +0200

Okw

Context analysis and statistics are nice. There's no arguing with that.

I would, however, much rather have all of that design and implementation effort put into Sailfish OS than into TJC. After all, TJC is a small community, and the benefits of context analysis etc. would be quite insignificant. Compared to the amount of effort, I don't think it would be worth it.

Natural language processing isn't just a bunch of regexps. It involves hidden Markov models (HMMs) and various kinds of word-sense disambiguation (WSD) algorithms.



Yes, but as you noted, this is a question about statistics from regular expressions, not natural language processing: more of a visually compelling spell-checker/indexer than Markov models.

rdmo (2015-01-19 12:39:02 +0200)

answered 2015-01-24 18:15:48 +0200

simo

Automation: the curse and the need. Yep, if built nicely, it sure would be a great feature; however, it's counter-productive for the community. Why? First, we all arrive here with very different experience of using tags. Some users have never heard of them; others have been using them actively for a long time. Right now, experienced PEOPLE help other PEOPLE use tags properly, which brings in discussion, arguments, agreements and disagreements, and, after all, something new to learn for both experienced and inexperienced users. Isn't this exactly what a community needs in order to be people-powered?

Yep, automation might reduce the time needed to get something done, but at the same time it might also reduce the values mentioned above.

As an answer: instead of automation, I'd like to see (at most) some automated tag suggestions, or (maybe better) just an automated reminder, "Are you sure about your tagging?", if the bot notices that tags are missing or seem totally unrelated to what's written in the question. Nothing more; let's keep the rest PEOPLE POWERED ;)
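The reminder suggested above could be as simple as checking whether any selected tag shares a word with the question. A minimal sketch under that assumption; `tag_reminder` and the wording of the messages are hypothetical, and a real check would need something better than exact word overlap (plural forms, synonyms, etc.).

```python
import re

WORD = re.compile(r"[a-z0-9]+")


def tag_reminder(question, tags):
    """Return a gentle reminder if tags are missing or all look unrelated, else None."""
    if not tags:
        return "Are you sure about your tagging? No tags were selected."
    question_words = set(WORD.findall(question.lower()))
    # A tag such as "sailfish-os" is split into its parts: {"sailfish", "os"}.
    related = [t for t in tags if set(WORD.findall(t.lower())) & question_words]
    if not related:
        return "Are you sure about your tagging? None of the tags appear in your question."
    return None


print(tag_reminder("Camera app crashes after update", ["camera", "bug"]))  # None
print(tag_reminder("Camera app crashes after update", ["wlan"]) is not None)  # True
```

Reminding only when every tag misses keeps the bot quiet in ordinary cases and leaves the actual tagging to people.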



Promoting manual tagging as a labour of love, or claiming that it adds community value, almost sells the community short by undercutting its potential attractiveness to future users and members. To make an analogy: a picture speaks thousands upon thousands of full-text academic essays, whereas not adding meta-information to a community's content is like un-inventing the find command. I suspect Google and Bing have readier (internal) access to meta-information about this subdomain than its own members do.

The point is not to reduce value but to add it, by adding varied perspectives.

rdmo (2015-03-27 18:45:00 +0200)

And besides, people are busy.

rdmo (2015-04-28 22:26:29 +0200)


Seen: 436 times

Last updated: Jan 24 '15