Deprecation of encoding attributes for alternate titles

  • Announcement
  • Updated 1 month ago
We're pleased to announce a simplification to the contribution guidelines for alternate titles.

We recently upgraded our systems so that the following attributes are no longer needed to encode characters correctly. We have therefore replaced all of these attributes with "imdb display title" (or "alternative title" if there was already an imdb display title), and removed the options from the contribution form:
  • ISO-LATIN-2 title
  • Cyrillic KOI8-R title
  • Greek ISO-8859-7 title
  • Turkish ISO-8859-9 title
  • original ISO-LATIN-2 title
  • original Cyrillic KOI8-R title
  • original Turkish ISO-8859-9 title
  • original Greek ISO-8859-7 title
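
For readers unfamiliar with why these attributes existed: each one indicated which legacy 8-bit character set a title's bytes were stored in. The Python sketch below only illustrates that idea and is not IMDb's implementation. The mapping from attribute names to Python codec names and the sample titles are assumptions chosen for illustration.

```python
# Assumed mapping from the retired attribute names to Python codec names.
LEGACY_CODECS = {
    "ISO-LATIN-2 title": "iso8859_2",
    "Cyrillic KOI8-R title": "koi8_r",
    "Greek ISO-8859-7 title": "iso8859_7",
    "Turkish ISO-8859-9 title": "iso8859_9",
}


def to_unicode(raw: bytes, attribute: str) -> str:
    """Decode legacy bytes using the codec implied by the attribute."""
    return raw.decode(LEGACY_CODECS[attribute])


# Hypothetical sample titles, one per legacy encoding.
SAMPLES = {
    "ISO-LATIN-2 title": "Człowiek z żelaza",
    "Cyrillic KOI8-R title": "Солярис",
    "Greek ISO-8859-7 title": "Ζορμπάς",
    "Turkish ISO-8859-9 title": "Kış Uykusu",
}

for attribute, title in SAMPLES.items():
    raw = title.encode(LEGACY_CODECS[attribute])  # what a legacy store would hold
    # With the attribute, the bytes decode back to the intended text.
    assert to_unicode(raw, attribute) == title
    # Without it, a wrong guess (here Latin-1) produces mojibake.
    assert raw.decode("latin-1") != title
    # Stored as UTF-8, every title uses one encoding and needs no attribute.
    print(attribute, "->", title.encode("utf-8").decode("utf-8"))
```

Once every title is held in a single Unicode encoding, the per-title encoding hint carries no information, which is consistent with the attributes being retired.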

We have also changed the transliterated ISO-LATIN-1 attribute to just "transliterated title" and created a guideline for this attribute here.

As always, we would welcome details of any issues you find, or feedback you have via this thread.

Many thanks!
Mike

Mike, Employee

  • 14 Posts
  • 16 Reply Likes

Posted 2 months ago


Owen Rees

  • 252 Posts
  • 397 Reply Likes
Really good to see this. I have a good idea how much had to be done behind the scenes to make this possible.

Should this be an announcement rather than a question? It looks like it to me.

MAthePA

  • 2078 Posts
  • 3519 Reply Likes
Hi, Mike.

Issues concerning Ukrainian titles:

1) A contributor of films originally in the Ukrainian language keeps marking as "alternative transliteration" titles that are in fact "transliterated title". I'm getting tired of correcting these, there are so many... The most recent ones: Human with a Stool (I corrected this one) and Skvot32 (I left this one for the editors to examine more closely);

2) A data editor changes the "imdb display" attribute on some alternate titles that were submitted only to correct the encoding. Is this same editor going to revise all the Russian titles that used to be encoded as KOI8-R now that this attribute has been dropped? I believe not, so why does he/she change the attributes of Ukrainian titles this way when they are submitted for encoding correction only? Nonsense! Moreover, the attribute is often switched incorrectly to "alternative", so no localized title appears for movies that were screened in theaters under those titles. It's a kind of vandalism legalized by a data editor;

3) The whole process is sabotaged by a data editor who handles the submissions that automatically go for deeper review. I can say that 99% of those not approved quickly are then declined for no reason. When I explain just "Correction of Latin letters to Ukrainian letters", they decline as "duplicate", but the title is not changed at all, so it was not in fact a duplicate. When I explain "not a duplicate", they decline as "no reason" or "can not verify", and so on in a circle. Some of the titles (about a third) are approved after a second or third resubmission, but not all of them get the correct attribute.

PS: I also believe this same editor does not really care what he/she does, because release dates I submitted during this process were declined too, and those dates, fixed on the official BoxOffice, are DECLINED. Someone is just having fun on a paid basis.

Owen Rees
The change Mike describes at the top of this thread looks to me like an automated data conversion following a change to the underlying system that made it possible. I think it is very unlikely that a person has gone through all the alternate titles that had encoding attributes and manually edited them. I just had a look at the alternate title update form for Solyaris (1972) which has several entries that would have had encoding attributes. They are all gone.

As for submissions declined as 'duplicate', I suspect that detection of duplicates is mostly automated and there may be issues that still need to be resolved following the recent changes. Submissions made before this announcement that depend on the changes that made the announcement possible could easily have been caught up in some transitional state where either the data conversion or the changes to the processes that handle the data were not yet complete.

The ability to do things that were previously not possible has not been announced (beyond the Japanese titles announced separately) even though it seems likely that the change to the underlying systems makes them possible, at least to some extent. I do not think it is reasonable to complain that a feature that has not been announced is not working correctly in all cases even if it works in some cases.
 

MAthePA
With all due respect, Owen, do you really need screenshots of all 1200+ submissions?


PS: the titles that were changed automatically to "imdb display title" still need their Latin symbols corrected. Such changes are also partially sabotaged by a data editor.

Owen Rees
Why submit only that one correction for that title? There were several other alternate titles on just that one film that needed to be fixed and they have had the data converted and the attribute changed.

I had not seen any announcement before today that changes of the kind you submitted were possible but I did suspect that they might be once the Japanese title option had been announced.

Converting all the alternate titles with encoding attributes without shutting down contributions or having alternate titles go missing is an interesting technical problem. I would expect there to be a transitional phase that would last several days during which time some titles have been converted and others not and perhaps involving intermediate representations and data not visible outside the technical team. Some submissions working, others not, and repeats working where earlier attempts failed is what I would expect during the transitional phase. It might have been possible to avoid even that but the only people affected will be those making submissions for an as yet not publicly announced feature so it is not really worth the extra effort.

MAthePA
If that's the point, there is nothing to discuss because there has been no announcement yet:


I hope that at some point all the editors find the "extra time" to learn and understand the changes the system went through, and only then will there be an announcement. Thanks, no more extra time.

Mike, Employee
Hi

Many thanks for raising these issues!

> "A contributor of films originally in the Ukrainian language keeps marking as "alternative transliteration" titles that are in fact "transliterated title". I'm getting tired of correcting these, there are so many... The most recent ones: Human with a Stool (I corrected this one) and Skvot32 (I left this one for the editors to examine more closely);"

Agreed, it does seem the (alternative transliteration) attribute is unnecessary in these cases, and that (transliterated title) is more suitable. We will investigate how best to address this and respond in the new year.

> "A data editor changes the "imdb display" attribute on some alternate titles that were submitted only to correct the encoding. Is this same editor going to revise all the Russian titles that used to be encoded as KOI8-R now that this attribute has been dropped? I believe not, so why does he/she change the attributes of Ukrainian titles this way when they are submitted for encoding correction only? Nonsense!"
...and > "PS: the titles that were changed automatically to "imdb display title" still need their Latin symbols corrected. Such changes are also partially sabotaged by a data editor."

Please could you provide one or two examples? We just want to make sure we fully understand the problem here. We agree the Ukrainian titles should be fixed to replace the Latin characters with their Ukrainian equivalents. We could fix all of these in bulk if that seems like the right approach.

> "Moreover, the attribute is often switched incorrectly to "alternative", so no localized title appears for movies that were screened in theaters under those titles. It's a kind of vandalism legalized by a data editor;"

For all the existing alternate titles with attribute (Cyrillic KOI8-R title) and (original Cyrillic KOI8-R title), we changed the attribute to “imdb display title”. If there was already an imdb display title for the same country and language then we used “alternative title” instead. Are you still seeing any issues with these? One or two examples would really help us investigate.
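
The conversion rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual migration: the `AlternateTitle` record, its field names and the `migrate` function are hypothetical, and the handling of several legacy titles for the same country and language (the first becomes the display title, later ones become alternatives) is a guess.

```python
from dataclasses import dataclass, replace

# Attribute names taken from the announcement.
ENCODING_ATTRIBUTES = {
    "ISO-LATIN-2 title",
    "Cyrillic KOI8-R title",
    "Greek ISO-8859-7 title",
    "Turkish ISO-8859-9 title",
    "original ISO-LATIN-2 title",
    "original Cyrillic KOI8-R title",
    "original Turkish ISO-8859-9 title",
    "original Greek ISO-8859-7 title",
}


@dataclass(frozen=True)
class AlternateTitle:  # hypothetical record layout
    text: str
    country: str
    language: str
    attribute: str


def migrate(titles: list[AlternateTitle]) -> list[AlternateTitle]:
    """Replace encoding attributes for the alternate titles of one film."""
    # (country, language) pairs that already have a display title.
    taken = {
        (t.country, t.language)
        for t in titles
        if t.attribute == "imdb display title"
    }
    result = []
    for t in titles:
        if t.attribute in ENCODING_ATTRIBUTES:
            key = (t.country, t.language)
            if key in taken:
                new_attribute = "alternative title"
            else:
                new_attribute = "imdb display title"
                taken.add(key)  # assumption: only one display title per pair
            t = replace(t, attribute=new_attribute)
        result.append(t)
    return result
```

Under this reading, a title ends up as "alternative title" only when a display title for the same country and language already existed.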

> "The whole process is sabotaged by a data editor who handles the submissions that automatically go for deeper review. I can say that 99% of those not approved quickly are then declined for no reason. When I explain just "Correction of Latin letters to Ukrainian letters", they decline as "duplicate", but the title is not changed at all, so it was not in fact a duplicate. When I explain "not a duplicate", they decline as "no reason" or "can not verify", and so on in a circle. Some of the titles (about a third) are approved after a second or third resubmission, but not all of them get the correct attribute."

This seems strange. Sorry to ask, but again one or two examples (a submission reference) would really help us investigate.

Again, many thanks for raising these issues. We look forward to resolving them. We also appreciate your patience if responses are delayed until the new year.

Many thanks!

MAthePA
Hi

Referring to the above [ 2) A data editor changes the "imdb display" attribute on some alternate titles that were submitted only to correct the encoding... ]
I could gather more statistics after Christmas or in 2020, when I have more free time. For now, here are some examples I remember:

(I) https://contribute.imdb.com/updates?update=tt4574334:akas.correct
"Дивнi дива" (2016) (Ukraine) (alternative title)
as it is now: 

and how it was initially submitted (# 191207-233741-595000, 191208-125934-504000, 191209-115713-134000): 



(II) https://contribute.imdb.com/updates?update=tt1212450:akas.correct
Найп'янкiший округ у свiтi (2012) (Ukraine) (alternative title) 
as it is now: 

and how it was initially submitted (#191213-084016-745000): 


(III) https://contribute.imdb.com/updates?update=tt1216487:akas.correct
Дiвчина, яка грала з вогнем (2009) (Ukraine) (alternative title) 
as it is now: 

and how it was initially submitted (#191213-083648-857000): 


(IV) https://contribute.imdb.com/updates?update=tt1220888:akas.correct
Кримiнальна фiшка вiд Генрi (2011) (Ukraine) (alternative title) 
as it is now: 

and how it was initially submitted (#191213-082923-663000): 


None of the above were duplicates. None of the above were approved, so the titles kept the Latin symbols they had at the time of submission until I resubmitted them much later (today and a couple of days before).

MAthePA
Referring to the above [ 3) The whole process is sabotaged by a data editor who handles the submissions that automatically go for deeper review. I can say that 99% of those not approved quickly are then declined for no reason... ]
The bold-italic phrase above says it all. I'm not sure exactly what examples are expected from me. Could you please check my contribution history for the period December 3-12? On those days I mostly submitted corrections to alternate titles. The majority of those declined were approved after a 2nd or 3rd resubmission a few days later.

=======================================

Mike, please initiate the bulk processing for the rest of the Ukrainian Cyrillic alternate titles to replace the Latin "i" letters with Ukrainian Cyrillic "і" letters (respecting capitalization).

I have finished manually correcting all the complicated cases of substitutions that imitated the Ukrainian letters (і, ї, є), such as a doubled "ii", "i" followed by an apostrophe, Latin "e", and Russian "э". The final results were checked, submitted and approved.
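
The simple bulk replacement requested above (Latin "i" to Ukrainian "і", respecting capitalization) could be sketched in Python as follows. This is a minimal illustration, not IMDb's tooling: the function name `fix_latin_i` is hypothetical, and it assumes that a title is Ukrainian Cyrillic whenever it contains at least one character from the basic Cyrillic block. It deliberately ignores the complicated cases listed above ("ii", "i" plus apostrophe, Latin "e", Russian "э"), which need human judgement.

```python
import re

# Latin "i"/"I" (U+0069/U+0049) -> Ukrainian "і"/"І" (U+0456/U+0406).
LATIN_I_TO_UKRAINIAN = str.maketrans({"i": "\u0456", "I": "\u0406"})
HAS_CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def fix_latin_i(title: str) -> str:
    """Replace Latin i/I with the Ukrainian letters, but only in Cyrillic titles."""
    if not HAS_CYRILLIC.search(title):
        return title  # purely Latin titles are left untouched
    return title.translate(LATIN_I_TO_UKRAINIAN)


broken = "Дивн" + "i" + " дива"      # contains Latin U+0069
fixed = fix_latin_i(broken)
print(fixed == "Дивн\u0456 дива")     # True
print(fix_latin_i("Skvot32"))         # unchanged: no Cyrillic letters
```

A title that mixes Cyrillic text with a genuine Latin word containing "i" would be altered by this sketch, so such titles would still need manual review.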

At this moment, only three related submissions are pending: 191221-184711-477000, 191221-181136-809000, 191221-172444-563000. Once they are approved, the remaining titles should cause no problems in bulk processing.

Thank you

PS: The last submissions mentioned above have now been approved.

MAthePA
Mike, there is no need for bulk processing any more. The corrections have been done manually.

I also corrected the cases mentioned above (1 and 2) where "alternative" was used instead of "imdb display", as well as the "alternative transliteration" cases. It seems the "alternative transliteration" instances resulted from automatic corrections to titles that were initially contributed wrongly as English originals when they should have been contributed as transliterated Ukrainian titles.

MAthePA
Hi again

I was not going to go into this complex issue so deeply here, but the quality of responses from "Contact us" has dropped drastically, and it now takes much more time and effort to solve obvious, objective problems.

Mike, referring back to the interest you expressed above:
> "This seems strange. Sorry to ask, but again one or two examples (a submission reference) would really help us investigate."
please do something to make the data editor(s) think a bit more or become more knowledgeable so they can solve the related problems, if this is not a case of real sabotage on someone's end.

Some recent submissions on alternate titles are declined again:
200106-112132-051000  /  200106-180103-617000  /  200107-151503-641000  /  200107-211030-764000  /  200107-215412-969000  /  200107-225632-662000  /  200107-225738-328000  /  200107-225923-359000  /  200107-230128-022000  /  200107-230238-799000  /  200109-124306-001000  /  200109-132505-619000  /  200109-135444-065000  /  200109-160033-028000

Let's take the first one as an example. Suppose I am a data editor who knows nothing about the Latin-Cyrillic issue, but there are lots of submissions from a contributor all giving one and the same reason. OK, even if I know nothing about the issue, I could be a thinking person and check whether the reason holds. At least ONE check; it is not hard at all to just search for the Latin "i" in any modern browser or app:
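
The same one-off check can also be scripted. The Python sketch below is only an illustration: the helper name `latin_letters` is hypothetical, and it relies on the standard `unicodedata` character names to tell Latin letters from Cyrillic look-alikes.

```python
import unicodedata


def latin_letters(title: str) -> list[tuple[int, str]]:
    """Return (position, character) for every Latin letter in the title."""
    return [
        (index, ch)
        for index, ch in enumerate(title)
        if ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")
    ]


# A Cyrillic title with a Latin "i" in place of the Ukrainian letter.
print(latin_letters("Дивн" + "i" + " дива"))       # [(4, 'i')]
# The same title with the Ukrainian letter U+0456 is clean.
print(latin_letters("Дивн" + "\u0456" + " дива"))  # []
```

An empty result for a submitted title and a non-empty one for the existing title would show that the two differ even though they look identical on screen.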

Voila. As anyone can see, the submitted data differs from the existing data in IMDb. I can hardly believe that a person could be so unmotivated in a job they are paid for as to be too lazy to perform such a simple action once. It looks more like sabotage, and the specific working hours make me think that there are 1 (or 2) data editor(s) doing this. The same person(s) who declined all the submissions that were approved later after a 2nd or 3rd resubmission:
191203-235046-206000  /  191204-130050-914000  /  191204-162032-703000  /  191204-162524-209000  /  191204-165530-650000  /  191204-170209-715000  /  191204-183208-119000  /  191204-184107-251000  /  191204-184240-950000  /  191204-184328-151000  /  191204-200952-647000  /  191204-201356-150000  /  191204-201758-116000  /  191204-203246-816000  /  191204-203338-203000  /  191204-204452-295000  /  191204-211323-311000  /  191204-212928-248000  /  191204-215212-555000  /  191204-221954-884000  /  191204-222207-526000  /  191204-222328-296000  /  191204-230524-827000  /  191204-230620-460000  /  191204-230911-695000  /  191204-232408-959000  /  191204-232731-842000  /  191205-005039-789000  /  191207-141747-572000  /  191207-170802-822000  /  191207-200903-537000  /  191207-203328-998000  /  191207-203613-860000  /  191207-204716-370000  /  191207-204821-088000  /  191207-223338-723000  /  191207-224502-020000  /  191207-231824-569000  /  191207-231927-048000  /  191207-232239-251000  /  191207-233304-111000  /  191207-233647-532000  /  191207-233741-595000  /  191208-005402-770000  /  191208-005959-942000  /  191208-010258-146000  /  191208-010340-345000  /  191208-010458-041000  /  191208-010548-619000  /  191208-010838-361000  /  191208-010912-263000  /  191208-010953-212000  /  191208-011345-608000  /  191208-011638-226000  /  191208-011721-448000  /  191208-012540-978000  /  191208-100421-671000  /  191208-101535-877000  /  191208-102233-550000  /  191208-103414-713000  /  191208-103555-194000  /  191208-105148-755000  /  191208-115311-676000  /  191208-115432-040000  /  191208-115714-905000  /  191208-124006-815000  /  191208-125934-504000  /  191208-131040-247000  /  191208-132835-889000  /  191208-133702-870000  /  191208-142336-969000  /  191208-142412-020000  /  191208-161607-100000  /  191208-161706-841000  /  191208-162251-134000  /  191208-162323-045000  /  191208-162928-955000  /  191208-194822-781000  /  191208-200736-697000  /  191208-214244-070000  /  
191208-214543-087000  /  191208-214720-035000  /  191208-215106-290000  /  191208-221458-321000  /  191208-221720-319000  /  191208-221756-072000  /  191208-221851-519000  /  191208-222434-992000  /  191208-222515-065000  /  191208-222649-604000  /  191208-231523-560000  /  191208-232115-689000  /  191208-232933-291000  /  191208-233018-926000  /  191208-233130-944000  /  191208-233219-690000  /  191208-233352-255000  /  191208-233424-939000  /  191208-233545-139000  /  191208-233827-062000  /  191208-233858-428000  /  191208-233927-262000  /  191208-234353-511000  /  191208-235153-989000  /  191208-235313-651000  /  191208-235530-555000  /  191208-235605-539000  /  191209-001049-858000  /  191209-002516-815000  /  191209-002556-803000  /  191209-093711-815000  /  191209-110154-240000  /  191209-111144-501000  /  191209-111959-961000  /  191209-115713-134000  /  191209-184701-382000  /  191210-113922-316000  /  191210-114402-133000  /  191210-114437-791000  /  191210-114808-258000  /  191210-163246-122000  /  191210-174959-813000  /  191210-180134-698000  /  191210-180831-645000  /  191210-180910-801000  /  191210-181105-314000  /  191210-181140-449000  /  191210-181558-297000  /  191210-181738-918000  /  191210-182649-992000  /  191210-182941-786000  /  191210-183017-912000  /  191210-183634-146000  /  191210-183734-831000  /  191210-183900-373000  /  191210-183949-408000  /  191210-184034-638000  /  191210-184122-406000  /  191210-184200-469000  /  191210-184340-048000  /  191210-184653-827000  /  191210-184729-983000  /  191210-184815-821000  /  191210-185340-218000  /  191210-185514-692000  /  191210-193432-109000  /  191210-193704-438000  /  191210-195050-456000  /  191210-195451-054000  /  191210-195716-883000  /  191210-195847-865000  /  191210-195924-445000  /  191210-200534-309000  /  191210-200649-262000  /  191210-201120-841000  /  191210-201201-930000  /  191210-201519-279000  /  191210-204029-826000  /  191210-223624-618000  /  191210-232810-643000  /  
191210-235253-840000  /  191210-235658-624000  /  191212-092253-530000  /  191212-095604-588000  /  191212-095720-494000  /  191212-102149-578000  /  191212-102629-882000  /  191212-102849-317000  /  191212-104402-543000  /  191212-105930-955000  /  191212-110204-214000  /  191212-110258-438000  /  191212-115924-165000  /  191212-121303-172000  /  191212-122936-310000  /  191212-123239-925000  /  191213-082923-663000  /  191213-083648-857000  /  191213-084016-745000  /  191213-085350-450000  /  191215-205345-409000  /  191216-130548-694000  /  191216-222545-896000  /  200101-184402-247000  /  200102-110429-242000  /  200102-221842-034000  /  200105-155108-478000  /  200105-201245-352000  /  200105-204443-266000  /  200105-210156-435000

If IMDb's owners and officials do not care about the quality of the work done and of the staff they hire and pay, then contributors are investing their time and effort for nothing.

MAthePA
Issue concerning 2 Baltic letters (used to be ISO-LATIN-2): 
Šš Žž (code points U+0160, U+0161, U+017D, U+017E)

They can be entered and appear colored green after "Check this update", but in fact they are automatically turned into the non-accented Ss Zz. Everything else seems to work flawlessly for Baltic languages.
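
For anyone trying to reproduce this, the Python sketch below lists the four code points and shows one way accented letters can degrade to plain S/Z. The thread does not state the actual cause, so the accent-stripping step is purely an assumption for illustration, and the helper names `strip_accents` and `can_encode` are hypothetical.

```python
import unicodedata

BALTIC_LETTERS = "\u0160\u0161\u017D\u017E"  # Š š Ž ž


def strip_accents(text: str) -> str:
    """Decompose characters and drop combining marks (an assumed failure mode)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def can_encode(text: str, codec: str) -> bool:
    """Report whether every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for ch in BALTIC_LETTERS:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

print(strip_accents(BALTIC_LETTERS))             # SsZz
print(can_encode(BALTIC_LETTERS, "iso8859_2"))   # True
print(can_encode(BALTIC_LETTERS, "latin-1"))     # False
```

The last two lines show why these letters once needed the ISO-LATIN-2 attribute: they exist in ISO-8859-2 but not in ISO-8859-1.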

Jonny, Employee

  • 7 Posts
  • 17 Reply Likes
Hi,

Thank you for the report, this has now been fixed.

Thanks,
Jonny

MAthePA
I confirm that the whole character set for Baltic languages works now.
Thanks