Advanced Name Search: Please Help

  • Question
  • Updated 5 years ago
  • Answered
Dear Champions / Staff. I would like to ask you two questions concerning the 'Advanced Name Search' function:

A) Is it possible to specify one's search? For instance, if one is looking specifically for persons that have one or more directing credits to their name? I have tried all manner of criteria but can't seem to get the desired result.

B) The 'Advanced Name Search' - Sort by: A-Z function lists on a first name basis. Is there a way to change this listing to list family names in alphabetical order? This would make research and compiling lists infinitely easier.

Your help or suggestions in regards to this will be much appreciated.

Faust

  • 16 Posts
  • 5 Reply Likes

Posted 5 years ago


Dan Dassow, Champion

  • 14381 Posts
  • 15201 Reply Likes
I vaguely remember that there is a way to find people with at least one directing credit. If I remember how, I will post it here.

Currently titles and names are sorted strictly lexicographically. It would be nice to sort them alphabetically instead. If IMDb only had titles and names in English, it would be relatively simple to implement an algorithm to sort them in alphabetical order. Since IMDb contains titles and names in many different languages, it becomes complicated to sort them alphabetically. One method would be to maintain a sort key for each title and name, which would be used for alphabetical sorts.
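A minimal sketch of the sort-key idea, assuming each entry carries a separately maintained key. The `Entry` type, the `sort_key` field, and the sample keys are hypothetical and do not reflect IMDb's actual schema:

```python
from dataclasses import dataclass


@dataclass
class Entry:
    display: str   # the title or name exactly as shown to users
    sort_key: str  # hypothetical, separately maintained key used only for ordering


entries = [
    Entry("The Godfather", "godfather"),
    Entry("El Topo", "topo"),
    Entry("Casablanca", "casablanca"),
    Entry("Amadeus", "amadeus"),
]

# Strictly lexicographic order compares the displayed strings as-is.
lexicographic = [e.display for e in sorted(entries, key=lambda e: e.display)]

# Alphabetical order compares the maintained keys instead.
alphabetical = [e.display for e in sorted(entries, key=lambda e: e.sort_key)]
```

With these sample entries the lexicographic order puts "El Topo" before "The Godfather", while the key-based order files "The Godfather" under G and "El Topo" under T. The hard part, deciding the key for each language, still has to be done once per entry.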

DavidAH_Ca, Champion

  • 3261 Posts
  • 2917 Reply Likes
Unfortunately IMDb does store titles in a manner that pretty much forces lexicographic sorting.

There are problems storing titles with the leading article at the end in a multilingual situation: how is a piece of software supposed to know if Satana, Die should be left as is or has the German article and should be listed as Die Satana?

One site about alphabetizing titles lists 138 'articles' that should be ignored (and that count notes that al- is meant to cover all of the spellings in romanization [e.g., "as" in "as-sijill"]) depending on the language.

Libraries (at least in North America) use the MARC Standard, which includes a half-byte that indicates the number of characters at the beginning to be skipped when sorting. I really think IMDb needs to implement something similar.
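A minimal sketch of that approach, assuming each title is stored together with a count of leading characters to skip (in MARC 21 this is a single digit, 0 to 9). The function name `filing_form` and the sample counts are illustrative only:

```python
def filing_form(title: str, nonfiling: int) -> str:
    """Return the text used for sorting, skipping the first `nonfiling` characters."""
    if not 0 <= nonfiling <= 9:
        raise ValueError("nonfiling count must be a single digit (0-9)")
    return title[nonfiling:].casefold()


# The count is assigned by a person who knows the language of the title,
# so "Die" can be skipped as a German article and kept as an English verb.
titles = [
    ("Die Hard", 0),
    ("Die Blechtrommel", 4),
    ("La Strada", 3),
    ("Los Angeles Plays Itself", 0),
]

ordered = [t for t, n in sorted(titles, key=lambda item: filing_form(*item))]
```

Because the count is stored with the data, the software never has to guess whether a leading word is an article.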

Names, on the other hand, are stored in Surname, Givenname format specifically so that they can be sorted alphabetically, so there is no excuse not to order them this way. (Admittedly, they won't be able to create the language-specific order for names with diacritics.)

Faust

  • 16 Posts
  • 5 Reply Likes
Hi Dan Dassow

Thanks for your reply. As you mentioned, if you could find a way to find people with at least one directing credit, I would really appreciate it. Thank you. At the moment, finding directors' names is a long process of viewing titles, taking the directors' names and listing them under the applicable letter of the alphabet, and then sorting them alphabetically within each letter.
Or using external sites, then linking the name back to IMDb for confirmation and including it if applicable.
Your 'sort key' may prove to be a good idea.

Hi DavidAH

Likewise, thanks for your response. My questions above are specifically aimed at people's names, so thanks for addressing the 'Surname' issue. I feel this may prove to be less of a problem than listing titles, which are more inconsistent than surnames. Titles vary radically between countries and are further complicated by alternative titles, even within a specific language or country. Not to mention symbols and numbers that complicate matters even more. As far as diacritics in surnames are concerned, can one not have an algorithm that bypasses the diacritic and views the letter as is? And as far as extensive surnames are concerned, IMDb already has this information in their database, so new additions should not be a problem for that algorithm.
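A minimal sketch of such a diacritic-bypassing sort, using Python's standard `unicodedata` module and assuming names are stored as "Surname, Givenname". The helper names are hypothetical. Note that this deliberately ignores language-specific rules (Swedish, for example, files Å after Z), and letters such as ø or ł have no decomposition and would need an extra mapping:

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_key(stored_name: str) -> str:
    """Sort key that treats an accented letter as its base letter, ignoring case."""
    return strip_diacritics(stored_name).casefold()


names = [
    "Zeffirelli, Franco",
    "Étaix, Pierre",
    "Almodóvar, Pedro",
    "Åkerlund, Jonas",
]

by_code_point = sorted(names)               # accented capitals land after "Z"
by_base_letter = sorted(names, key=name_key)
```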

Anyway, many thanks Dan & David for your help. Hopefully a system can be implemented in the near future, to solve this issue.

With kindest regards
Dr-Faustus
(Edited)

White Phoenix

  • 7 Posts
  • 1 Reply Like
Considering that there are tons of productivity and office suites (which include databases) that can sort names and titles (including foreign names and titles), there really is no excuse why this cannot be done on IMDb. At the very least, they should do it for the paid pro version. All languages have rules for alphabetizing names and titles. The programmer would simply follow those rules. Works with multiple titles would have all titles listed and could be noted when they are an alternate title.

Faust

  • 16 Posts
  • 5 Reply Likes
Hi White Phoenix

Thanks for the information. I am not very knowledgeable about the inner workings of such a system, but I have done some research on the internet and, as you've mentioned, such a system can be implemented. But I guess, at the end of the day, IMDb staff will have to decide if it is viable to implement such changes.
Hopefully they will take it under advisement?

ljdoncel, Champion

  • 850 Posts
  • 1808 Reply Likes
Hi, Faust:

I know this is not an ideal solution but, by using the plain text file directors.list (version 2014.12.12 21:57) from the repository, I've created an Excel file with all the people with directing credits in IMDb. Note that I've kept the format surname,givenname in which IMDb stores names internally (I could switch to givenname,surname if you like) and items are classified according to the type of title. Also, credited (Cred.) vs uncredited (Uncr.) items are shown in every category.

Here's a preview of the listing (9018 pages!!!):


And here's a summary of the results:
Suspended directing credits (excluded from analysis): 8,913
TOTAL NAMES: 369,702
TOTAL DIRECTING CREDITS: 2,098,985 (8,494 uncredited)
  - General films: 782,029 (3,324 uncredited)
  - Made for video (V): 116,233 (760 uncredited)
  - Made for TV (TV): 100,915 (255 uncredited)
  - Videogames (VG): 5,613 (21 uncredited)
  - Series: 1,094,195 (4,134 uncredited)
      - Series level: 74,089 (82 uncredited)
      - Episode level: 1,020,106 (4,052 uncredited)

I'll be happy to post a link here for downloading the file from my Dropbox account (39.8 MB). However, since IMDb is the legal owner of the content of the list, I'd like an IMDb staffer to give me permission to do it first.


In the meantime, here are some "Did you know?" facts about directing credits:
(Edited)

Faust

  • 16 Posts
  • 5 Reply Likes
Hi ljdoncel

It's official! You're the best. Thank you very much for the spreadsheet. It will absolutely be of great use, not to mention the amount of time it will save me.
The Terms & Conditions of Use will be respected. Any information, in any form, will be used exclusively on IMDb.

Thanks for your continued help & support, it is appreciated.
Cheers!
Dr-Faustus
(Edited)

Dan Dassow, Champion

  • 14381 Posts
  • 15199 Reply Likes
ljdoncel,

Again great work. This is yet another example of why we need to have IMDb designate you as a Champion.

Since you were kind enough to post the data, here are plots of the distribution of the data that Dr-Faustus and you may find interesting. It's not surprising that the plots are Exponential distributions.









(Edited)

(closed account)

  • 379 Posts
  • 430 Reply Likes
ljdoncel mentioned this T&C page:
http://www.imdb.com/conditions

Was this Help page also mentioned?
http://www.imdb.com/help/show_leaf?usedatasoftware

Excerpt: > "... The data can only be used for personal and non-commercial use ...."   [Emphasis in original.]

I'm not a lawyer, nor any kind of expert.
(I am confused, though. ;-))

Does "personal" use allow for sharing?  Does the next text answer that?

Excerpt continued: > "... and must not be altered/republished/resold/repurposed to create any kind of online/offline database of movie information (except for individual personal use). ..."   [Emphasis in original.]

Would that mean that I get the data for "individual personal" use only, but yet would it be possible to alter or derive and then republish on a public link so that others can make "individual personal use" of it (and could they in turn share likewise)?

My confusion is circular.  On my first reading, I had thought "individual personal use" might preclude public sharing?  No?

Another bit from the aforementioned Help page:

Excerpt: > "Please refer to the copyright/license information enclosed in each file for further instructions and limitations on allowed usage."

The Directors List file's header includes this:

Excerpt: > "... 2. Each of the database files may  be  distributed individually but only in an unaltered form. [...] 3. ... the files may NOT be used to construct any kind of on-line database (except for individual personal use). Clearance for ALL such on-line data resources  must be requested ...."

"Clearance ... must be requested," and so, clearance was duly requested.  We've not yet seen a reply?  If there's no reply, would that constitute permission?  (Is that really how IMDb grants permission; by not responding to the request?) ;-)
(Edited)

Faust

  • 16 Posts
  • 5 Reply Likes
Hi Dan Dassow

Thanks for supplying the above data. It is of interest, and calls for some homework on my part for a better understanding of its workings.

Hi Lucas Anon

I understand your concern regarding the use of 'The Directors List' file. If I may explain: data will only be used, as stated, for personal and non-commercial use. No online/offline database is to be created. Information will be used solely to create a list of a number of directors that are contained within the pages of IMDb, and made visible to the public through the 'User Lists' feature. The information, to my understanding, remains the property of IMDb. 'The Directors List' file, in other words, will solely be used to gather information that is already readily available on IMDb, and put in the context of a public list on IMDb, with no alterations to any given information whatsoever.

I am sure what I am doing is not in violation of the 'Terms and Conditions of Use'. Yet I would not like my actions to be perceived as going over IMDb's head, and I especially would not want to damage ljdoncel's reputation.

Maybe it would be best to wait for clearance from an IMDb staff member? Any advice would be appreciated in this regard.
Yours sincerely
Dr-Faustus

ljdoncel, Champion

  • 850 Posts
  • 1808 Reply Likes
@Dan:

Thank you very much for those kind compliments that I don't deserve. The plots are great and should be seen by those who think that maths has nothing to do with real life. I'm sure you'll find the following graph interesting, made with the data I posted a couple of months ago (Benford rules!):
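For readers unfamiliar with Benford's law, it predicts that a leading digit d appears with probability log10(1 + 1/d). A minimal sketch of how observed counts could be compared against it follows; the `sample_counts` values are made up for illustration and are not taken from the IMDb data:

```python
import math
from collections import Counter


def benford_expected(digit: int) -> float:
    """Share of values expected to start with `digit` under Benford's law."""
    return math.log10(1 + 1 / digit)


def leading_digit_shares(values):
    """Observed share of each leading digit 1-9 among positive integers."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Hypothetical per-person credit counts, for illustration only.
sample_counts = [1, 1, 2, 3, 12, 15, 104, 27, 1, 9]

observed = leading_digit_shares(sample_counts)
for d in range(1, 10):
    print(d, round(observed[d], 3), round(benford_expected(d), 3))
```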



@Lucus (nice to see you're still around here!):

You're right that perhaps I was too hasty in publishing the link and that I should have waited for a reply from the staff before doing so (administrative silence means "no"; and, just in case, I've removed the file until further instructions). However, though I linked the wrong T&C, these words of Faust's:
"The Terms & Conditions of Use will be respected"
and, especially...
"Any information, in any form, will be used exclusively on IMDb."
convinced me that "his intentions were good", as he has confirmed later.


After many years of contributing to IMDb I have the feeling that the amount of data is increasing greatly but the consistency has suffered a little, and there's a lack of tools to detect those flaws easily. Whenever I use the plain text files from the repository, my only intention is providing the rest of well-meaning contributors an easy way of detecting possible errors in our beloved database in order to fix them and achieve IMDb's goal of being the world's most trusted and authoritative source of movie, TV and celebrity content. Growing is a great thing (IMDb stats are impressive thanks to people like us all), but, in my humble opinion, trust is even more important.

Here are some examples of what I mean:

1) Finding series without end date (related post in CHB): Marco (a long-time German contributor) asked for a listing of old series that had probably finished years ago but had no end date in IMDb. I made an Excel file that I shared in the thread (that time I waited for the boss to give me permission).


2) Describing the use of quotes around character names (related post in CHB): Before discussing a new policy it is important to set up the actual scenario first.

3) Assessing the consistency of labeling a title as 'complete' (related post in CHB): I analysed all the titles in IMDb marked as being 'complete' or 'complete and verified' and made a document (1,164 pages) detailing possible flaws.


4) Finding actors with errors in their birth/death date: About 15 months ago I privately contacted the lovely Luvs. She explained in this post how useful it would be for her research to have a spreadsheet or similar with birth dates and death dates, in order to identify mistakes and complete missing data. I sent her an Excel file (with over 550,000 names) and two PDFs (6,414 and 6,489 pages respectively, with names sorted by age or alphabetically).

(I'd never trust any site claiming that Bahadur Shah Zafar died at the age of 187 or that Raquel Bollo, the woman who right now is talking in the Spanish show where she works, is 224 years old - unless... "when there's no room left in hell...").
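A check like this is easy to automate once the dates are in tabular form. The sketch below is only an illustration: the `Person` layout, the placeholder names and years, the reference year, and the 120-year threshold are all assumptions, not the contents of the actual spreadsheet:

```python
from typing import List, NamedTuple, Optional, Tuple


class Person(NamedTuple):
    name: str
    birth_year: int
    death_year: Optional[int]  # None means no death date is recorded


MAX_PLAUSIBLE_AGE = 120  # assumed threshold; adjust to taste


def implausible_ages(people: List[Person], current_year: int) -> List[Tuple[str, int]]:
    """Return (name, age) pairs whose computed age is negative or above the threshold."""
    flagged = []
    for p in people:
        end_year = p.death_year if p.death_year is not None else current_year
        age = end_year - p.birth_year
        if age < 0 or age > MAX_PLAUSIBLE_AGE:
            flagged.append((p.name, age))
    return flagged


# Placeholder records, not real IMDb entries.
sample = [
    Person("Person A", 1775, 1962),  # died "aged" 187
    Person("Person B", 1790, None),  # still "alive" at 224
    Person("Person C", 1950, 2010),  # plausible
    Person("Person D", 1980, 1975),  # death before birth
]

flagged = implausible_ages(sample, current_year=2014)
```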

5) Detecting suspiciously inappropriate language in titles (related post in CHB).

6) Detecting discrepancies between the title's country of origin and the production companies' countries: This is a project I'm thinking about. A recent post in GS made me wonder how common the situations are where a film is credited as being from one country but none of its listed production companies are from that country (or vice versa). This could help to identify titles with missing production companies or, like in Blue Is the Warmest Color, titles with countries that shouldn't have been listed. So far, I've only made a descriptive table with the countries of origin, which shows how many films (I've excluded series and episodes) have been made in every country in the world and, in the case of co-productions, which countries have taken part (see previews below).
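The comparison described in item 6 boils down to two set differences per title. A minimal sketch, assuming country codes have already been extracted for each title and for its production companies; the function name and the sample catalog are hypothetical:

```python
from typing import Dict, Set, Tuple


def country_mismatches(
    title_countries: Set[str], company_countries: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (countries of origin with no matching company,
    company countries missing from the countries of origin)."""
    unsupported = title_countries - company_countries
    unlisted = company_countries - title_countries
    return unsupported, unlisted


# Placeholder titles: (countries of origin, countries of the listed companies).
catalog: Dict[str, Tuple[Set[str], Set[str]]] = {
    "Title X": ({"FR", "BE", "ES"}, {"FR", "BE"}),
    "Title Y": ({"US"}, {"US"}),
    "Title Z": ({"DE"}, {"DE", "AT"}),
}

# Keep only the titles where at least one of the two differences is non-empty.
report = {
    title: country_mismatches(origin, companies)
    for title, (origin, companies) in catalog.items()
    if any(country_mismatches(origin, companies))
}
```

Here "Title X" would be flagged because no listed company is from ES, and "Title Z" because a company from AT is not reflected in the countries of origin.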


The complete table has been calculated with an Excel sheet (47.6 MB) that contains over 60,000 formulae. Saved as a JPG at maximum compression (i.e. poorest quality), it occupies 6 MB and, for a bare minimum of legibility, the image has 7727x5554 pixels (approx. 2.73 x 1.96 meters at 72 ppi). This is how it looks when I open it with Windows Image Viewer:



Maybe I'm a bit naive but I don't see how someone could make "bad use" of the data contained in those files. I work as a cardiologist in a Spanish hospital and, luckily, watching films and contributing to IMDb in my spare time is just a hobby, so I'd never see a commercial value in it. On the contrary, retrieving information from the text files, compiling it, sorting it and formatting it is often hellish work that requires lots of effort and time and, unless I'm mistaken, I doubt that the internal tools of IMDb are ready to produce those listings easily so, in the end, IMDb might be the first to be interested in them.

It's true that the files can also be used to find out curiosities or "Did you know?" facts like the ones I posted about directing credits, but that's just an amusing "side effect". How is it possible for someone to direct almost 750 porn films in 20 years?? That's more than 3 porn films every month!!

Oh, that's the beauty of IMDb!! All in one place... And that's why we must take care of it...
(Edited)

This conversation is no longer open for comments or replies.