Lists: Sorting list by title but ignoring articles

  • 11
  • Idea
  • Updated 4 years ago
Is there any way to sort lists by title but have it ignore articles like "the", "a", etc.? A lot of movies begin with "the", and it's annoying having them all listed in the T's. I think MyMovies used to do this by default, but it does not seem to be implemented in the Lists. :-( If this is not presently possible, can it be added to the wishlist?

Jim Beam

  • 2 Posts
  • 0 Reply Likes
  • sad

Posted 7 years ago


Mark

  • 14 Posts
  • 2 Reply Likes
I agree and have just added my agreement to your comment/question.

After all the years of database sorting your suggestion should be the default.

Emperor, Champion

  • 6418 Posts
  • 3002 Reply Likes
It is a good idea. I'd say post it as a suggestion (if it is needed - this might be good enough).

Giancarlo Cairella, Official Rep

  • 1107 Posts
  • 1038 Reply Likes
Just to provide some historical perspective/context: until a few years ago, titles used to be stored internally with the article appended at the end, e.g.

Blues Brothers, The (1980)
Dolce vita, La (1960)
Boot, Das (1981)
Mariachi, El (1992)
Clockwork Orange, A (1971)
etc.

This was better when generating/displaying alphabetized lists (as it eliminated the inconvenience you are describing) but it was not devoid of complications.

For display purposes, the title needed to be converted to the more readable format (e.g. The Blues Brothers, Das Boot, La dolce vita). Storing the article at the end made sorting better in many cases, but presented its own set of challenges: the article needed to be put back at the beginning when displaying the title, and different languages have different rules for that (e.g. French titles beginning with an indefinite article needed to be alphabetized under the letter U, but those beginning with the definite article needed to be alphabetized under the first letter of the following noun). Capitalization rules also vary, creating all sorts of complications: when converting an Italian or Spanish title you'd have to change the first letter of the title to lowercase when prefixing an article, but you need to keep it uppercase for English and German.

Also, detecting which titles had an article that needed to be moved to the front versus those which didn't was a challenge in itself and required built-in exceptions (without them, a title like "Die, Mommie, Die" would be rendered as "Die Die, Mommie", because the system had no way of knowing that 'Die' here was not a German definite article).
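
As a rough illustration of the conversion described above, here is a minimal Python sketch. The names `display_title`, `KNOWN_ARTICLES`, `EXCEPTIONS` and `LOWERCASE_AFTER_ARTICLE` are made up for this example, the article list is deliberately tiny, and the year suffix is assumed to have been stripped already. It is not the code IMDb actually used.

```python
# Hypothetical, deliberately incomplete list of trailing articles.
KNOWN_ARTICLES = {"the", "a", "an", "la", "le", "das", "der", "die", "el"}

# Stored titles whose last comma-separated word only looks like an article.
EXCEPTIONS = {"Die, Mommie, Die"}

# Languages where the first word is lowercased once an article precedes it
# (Italian and Spanish, per the description above).
LOWERCASE_AFTER_ARTICLE = {"it", "es"}


def display_title(stored: str, language: str = "en") -> str:
    """Turn 'Blues Brothers, The' into 'The Blues Brothers'."""
    if stored in EXCEPTIONS:
        return stored
    # Split on the LAST ', ' so commas inside the title are left alone.
    head, sep, tail = stored.rpartition(", ")
    if not sep or tail.lower() not in KNOWN_ARTICLES:
        return stored  # nothing to move
    if language in LOWERCASE_AFTER_ARTICLE:
        head = head[:1].lower() + head[1:]
    return f"{tail} {head}"
```

Without the entry in `EXCEPTIONS`, "Die, Mommie, Die" would come out as "Die Die, Mommie", which is the failure mentioned above.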

Last but not least, a non-trivial number of users objected to this format and complained about the sort order being inaccurate because titles like 'La dolce vita' appeared under 'D' and not under 'L', where they wanted/expected them to be.

To be honest, I don't remember exactly when we switched to the current format (possibly around 2005). But I do know that apart from the 'alphabetical list' inconvenience of having many titles grouped under the letter T for 'The' etc. (which is not necessarily a problem for some users, as explained above) it did not generate significant complaints or feedback.

I'm not saying that the current system is necessarily better, although it did simplify the way we approached certain technical choices; just pointing out there is no clear-cut solution and that each approach has its shortcomings. Ideally of course we would be able to store both versions of a title (one for sorting, one for display) but adding/backfilling all that information would be a non-trivial task.

Peter, Champion

  • 6130 Posts
  • 7226 Reply Likes
The switch happened in April 2009.

DavidAH_Ca, Champion

  • 3261 Posts
  • 2917 Reply Likes
I recall a number of complaints when the switch was made - almost certainly more than the 'non-trivial' number who complained about the correct sort.

I have mentioned several times that a better method than moving the leading article or storing two different versions of the title is storing a number that indicates how many characters should be skipped when creating the sort key. (This system is used by libraries in North America -- the specific version is in the MARC 21 system.)

Most titles would have a code of 0, and creating the corrections for the English, French, & German articles should be easy to do automatically (although with a manual check for 'Die' and 'Les').

This system would also allow titles like *batteries not included to be moved to the B's instead of being at the beginning of the list.
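
To make the skip-count idea concrete, here is a small Python sketch. The `Title` record and its `skip` field are hypothetical names chosen for this example, not an existing IMDb or MARC structure, and the skip values are filled in by hand.

```python
from typing import NamedTuple


class Title(NamedTuple):
    text: str      # the title exactly as displayed
    skip: int = 0  # leading characters to ignore when building the sort key


def sort_key(title: Title) -> str:
    # Drop the first `skip` characters, then compare case-insensitively.
    return title.text[title.skip:].casefold()


titles = [
    Title("The Blues Brothers", 4),       # skip "The "
    Title("*batteries not included", 1),  # skip the asterisk
    Title("Das Boot", 4),                 # skip "Das "
    Title("Die Hard", 0),                 # "Die" is not an article here
    Title("Alien"),
]

ordered = [t.text for t in sorted(titles, key=sort_key)]
# ordered == ["Alien", "*batteries not included", "The Blues Brothers",
#             "Das Boot", "Die Hard"]
```

The displayed text is never modified; only the key used for comparison changes, so nothing has to be converted back for display.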

I think IMDb should seriously consider creating a system like this.

Emperor, Champion

  • 6418 Posts
  • 3002 Reply Likes
Indeed. You could make the manual check less laborious by keeping it language-specific: "die" would only be flagged up for German-language titles, which would avoid all the Die Hard and Die X Die false positives. Oddities like Les Miserables could also be decided upon manually - would most English speakers expect it to be under L? Quite possibly. Whereas would most people expect "Les liaisons dangereuses" to be sorted under Li?
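
A language-specific check along those lines might look like the Python sketch below. The table `ARTICLES_BY_LANGUAGE`, the `NEEDS_REVIEW` set and the function `suggest_skip` are all invented for illustration, cover only three languages, and ignore elided forms such as the French "l'".

```python
from typing import Tuple

# Hypothetical, incomplete article table keyed by language code.
ARTICLES_BY_LANGUAGE = {
    "en": ("the", "an", "a"),
    "de": ("der", "die", "das"),
    "fr": ("le", "la", "les"),
}

# (language, article) pairs that should still get a manual look.
NEEDS_REVIEW = {("de", "die"), ("fr", "les")}


def suggest_skip(title: str, language: str) -> Tuple[int, bool]:
    """Return (characters to skip, whether a human should double-check)."""
    first, sep, rest = title.partition(" ")
    if not sep or not rest:
        return (0, False)  # single-word title: nothing to skip
    word = first.lower()
    if word in ARTICLES_BY_LANGUAGE.get(language, ()):
        # Skip the article plus the space that follows it.
        return (len(first) + 1, (language, word) in NEEDS_REVIEW)
    return (0, False)
```

With this, "Die Hard" tagged as English gets a skip of 0 and is never flagged, while a German title starting with "Die" gets a suggested skip of 4 and is queued for review.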

I'd also add an option to let people opt out of the sorting that ignores the definite article. You can guarantee someone will moan about this, even though I'd imagine most people would appreciate it.

Lady Aleena

  • 16 Posts
  • 7 Reply Likes
DavidAH_Ca: I always include the * as part of the sort and put *Batteries Not Included at the top, though A, An, and The are always going to be the bane of sorting my lists because they are not ignored while sorting.

DavidAH_Ca, Champion

  • 3261 Posts
  • 2917 Reply Likes
Lady Aleena:

That is really a matter of personal preference: I prefer to see *batteries not included with the B's because when I say or think the title I think "batteries not included" not "asterisk batteries not included", so in my database I sort starting with the B. You might well prefer to sort on the asterisk.

Perhaps a better example is And God Created Woman. There are two versions of this with two different titles:
the 1956 version is ...And God Created Woman, but
the 1988 version is And God Created Woman.

In a straight sort of the titles in my database, the first title is listed right at the top (9th) while the 1988 version is well down into the A's (766th). Personally I would prefer to have them show up together, as I see the ellipsis as no more important than the article, so I have set my system to ignore it. This places the titles together (575th & 576th) which makes sense to me.
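
One way to get that effect, sketched in Python under the assumption that every leading non-alphanumeric character should be ignored (a per-title skip count would work just as well). The function name `punctuation_blind_key` is made up for this example.

```python
import re

# Any run of punctuation, symbols, whitespace or underscores at the start.
_LEADING_NOISE = re.compile(r"^[\W_]+")


def punctuation_blind_key(title: str) -> str:
    stripped = _LEADING_NOISE.sub("", title)
    # Fall back to the raw title if it consists of punctuation only.
    return (stripped or title).casefold()


titles = [
    "Apollo 13",
    "...And God Created Woman",
    "Alien",
    "And God Created Woman",
]

ordered = sorted(titles, key=punctuation_blind_key)
# ordered == ["Alien", "...And God Created Woman",
#             "And God Created Woman", "Apollo 13"]
```

Both versions produce the same key, so they end up next to each other; Python's sort is stable, so their relative order is simply the order they had in the input.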

Emperor, Champion

  • 6418 Posts
  • 3002 Reply Likes
I must admit, I assumed you'd kept the original sorting, so you could sort against that and display the other. You could have even offered people the option of sorting with or without the article.

The backfilling shouldn't be that difficult - you could hand-fill out some of the weird exceptions (as you mention, those that start with "Die") and then run some bespoke SQL to process the various titles that start with the various articles.
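
For what it's worth, such a backfill could be sketched like this in Python with SQLite. Everything here is an assumption made for illustration: the `titles` table, its `title` and `sort_title` columns, the short prefix list and the hand-filled dictionary have nothing to do with IMDb's real schema.

```python
import sqlite3

# Hypothetical, incomplete list of article prefixes (note the trailing space).
ARTICLE_PREFIXES = ["The ", "A ", "An ", "Das ", "Der ", "Die "]

# Hand-filled exceptions: title -> sort title that must be kept as given.
HAND_FILLED = {"Die Hard": "Die Hard"}


def backfill(conn: sqlite3.Connection) -> None:
    # Step 1: write the hand-checked exceptions first.
    conn.executemany(
        "UPDATE titles SET sort_title = ? WHERE title = ?",
        [(sort, title) for title, sort in HAND_FILLED.items()],
    )
    # Step 2: strip a leading article from every row that is still empty.
    for prefix in ARTICLE_PREFIXES:
        conn.execute(
            "UPDATE titles SET sort_title = substr(title, ?) "
            "WHERE sort_title IS NULL AND substr(title, 1, ?) = ?",
            (len(prefix) + 1, len(prefix), prefix),
        )
    # Step 3: everything else sorts under its title as written.
    conn.execute("UPDATE titles SET sort_title = title WHERE sort_title IS NULL")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title TEXT PRIMARY KEY, sort_title TEXT)")
conn.executemany(
    "INSERT INTO titles (title) VALUES (?)",
    [("The Blues Brothers",), ("Das Boot",), ("Die Hard",), ("Alien",)],
)
backfill(conn)
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM titles ORDER BY sort_title COLLATE NOCASE"
    )
]
# ordered == ["Alien", "The Blues Brothers", "Das Boot", "Die Hard"]
```

Because the exceptions are written before the bulk pass and the bulk pass only touches rows that are still NULL, the hand-made decisions are never overwritten.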

Alex Seaver

  • 4 Posts
  • 0 Reply Likes
This reply was created from a merged topic originally titled
Dont organize movies by the "The" in the title for our lists..

Unknown

  • 3 Posts
  • 1 Reply Like
The longer this isn't corrected the more of a 'non-trivial' task it becomes.

Possible to enlist the help of users?

Articles being included in sorting is absolutely dreadful. 

Mark

  • 14 Posts
  • 2 Reply Likes
I suggest that IMDb contact Collectorz.com, specifically Alwin Hoogerdijk, and see if they'll sell you the solution. They have solved all the IMDb issues that I've ever read about. Of course, they are a paid service specializing in books, movies, music, comics and games with a great feature set. Problems are usually corrected within a couple of days. I switched to their products almost a year ago after not getting any response to the article issue here.

DavidAH_Ca, Champion

  • 3261 Posts
  • 2917 Reply Likes
The problem isn't so much finding a solution as implementing it.

In the various solutions, a new field is needed for both Titles and Alternate Titles (either the number in my suggestion or a 'sort title' in others). Then there needs to be a way of updating that field, which means updating the Update system. Also, the sorts would all have to be changed to the new version.

IMDb is a unique database system; there is no way that they could just purchase and drop in a sub-system from another site.

Unknown

  • 3 Posts
  • 1 Reply Like
So, how do we start?

Craig Lish

  • 1 Post
  • 1 Reply Like
Either IMDB backfilled to combine the article with the title in the db's title field or it is still separate in the db and they are combining it before it gets sent to the client. If the latter is the case then simply separate it in the presentation bean and combine it in the client, at which point offering the option of prepending or appending would be easy enough. If they actually modified the title field in the db then they would be left with the decision of which articles to move and which to leave alone. Giving an option to leave it as is (default) or move the article to the end would put the choice in the user's hands and we'd get what we asked for.
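
If the article really is stored separately, the presentation-side choice could be as small as the Python sketch below. `StoredTitle` and `render` are hypothetical names for this example, and whether IMDb keeps the article in its own field is not known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredTitle:
    article: str  # "" when the title has no leading article
    body: str     # the rest of the title


def render(title: StoredTitle, article_last: bool = False) -> str:
    """Combine article and body according to the user's preference."""
    if not title.article:
        return title.body
    if article_last:
        return f"{title.body}, {title.article}"
    return f"{title.article} {title.body}"  # default: leave it as is


blues = StoredTitle("The", "Blues Brothers")
default_view = render(blues)                    # "The Blues Brothers"
moved_view = render(blues, article_last=True)   # "Blues Brothers, The"
```

Sorting on `body` while rendering with either setting would give each user the order and the display they prefer.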

I don't sort A-Z because of the clump at "The ...", which I think is ridiculous.

Lady Aleena

  • 16 Posts
  • 7 Reply Likes
I am a very amateur programmer, but here is the regex I use in my Perl sort subroutine.
s/^\s*(?:a|an|the)(?:_|\s)//i; # Strip off a leading article (in English); the ^ anchor leaves articles in mid-title alone.
I'm sure your programmers could take into account the title language and sort accordingly.

DavidAH_Ca, Champion

  • 3261 Posts
  • 2917 Reply Likes
That works well for English only, but when I changed my database to include an ignore field, I checked out the possibilities. I found a list that included 44 languages and 138 possible articles (not counting the multiple possible options for the Arabic 'al-'). And these give rise to 225 valid combinations.

Admittedly there are some languages that are less likely to be in IMDb Titles (for instance Tagalog) but even removing them there would still be a significant number of valid combinations that would need to be handled.

MARC was set up to handle book titles for libraries, so it had to handle any language even when it started in the States, and even more so now that it is an international standard. That system records the number of characters to be skipped in an indicator digit stored with the title field (the second indicator of field 245 in MARC 21), but the concept of recording this ignore count works equally well with a separate field.

I just think that with a concept that works so well for handling the sorting of titles, IMDb should consider adopting it. I do not think that the current mis-sorting is a good or even a reasonable option.

P.S. Someone above mentioned automating by language; however, this is not without its problems. For example, quite often German AKAs consist of the original English title followed by the German. As an example, The Final Cut (2004) has the German (DVD title) AKA set to The final Cut - Dein Tod ist erst der Anfang and the Portugal AKA set to The Final Cut - A Última Memória, so both of them need to be handled like English titles (I assume) in spite of the language.