Friday, June 3, 2022
This week we introduced an improvement to the algorithm that identifies documents whose title element is written in a different language or script than the content, and selects a title that matches the document’s language and script. It is based on the general principle that the title of a document should be written in the language and script of its main content. This is one of the reasons why we may go beyond the title element when generating the title of a web result.
Multilingual titles repeat the same phrase in two different languages or scripts. The most common pattern is to add an English version to the original title text.
गीतांजलिकीजीवनी-Hindi Biography of Geetanjali
In this example, the title consists of two parts, separated by a hyphen, that express the same content in different languages (Hindi and English). The title is in both languages, but the document itself is written only in Hindi. Our system may detect such inconsistencies and use only the Hindi part of the title, which here is the text before the hyphen.
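To make the idea concrete, here is a minimal Python sketch of how a site owner might approximate this selection for their own pages. It is not the algorithm used in Search. The function names `dominant_script` and `pick_matching_segment`, the separator list, and the sample body text are all assumptions made for illustration. The script is guessed from the first word of each character's Unicode name, which is a rough heuristic.

```python
import re
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script among the letters in text, or None."""
    counts = Counter()
    for ch in text:
        # Count letters (L*) and combining marks (M*); skip spaces, digits, punctuation.
        if unicodedata.category(ch)[0] in "LM":
            name = unicodedata.name(ch, "")
            if name:
                # Heuristic: the first word of the Unicode name, e.g. "DEVANAGARI" or "LATIN".
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


def pick_matching_segment(title, body):
    """Return the first title segment whose script matches the body, else the full title."""
    body_script = dominant_script(body)
    # Assumed separators: hyphen and pipe. Hyphenated words would also be split.
    segments = [s.strip() for s in re.split(r"\s*[-|]\s*", title) if s.strip()]
    matches = [s for s in segments if dominant_script(s) == body_script]
    if matches and len(matches) < len(segments):
        return matches[0]
    return title


title = "गीतांजलिकीजीवनी-Hindi Biography of Geetanjali"
body = "यह गीतांजलि की जीवनी है।"  # hypothetical page content written in Hindi
print(pick_matching_segment(title, body))
```

With a body written in Devanagari, the sketch keeps only the Devanagari segment of the title. A title that is entirely in one script is returned unchanged.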
Latin script title
Transliteration is when content in one language is written in the script or alphabet of another language. For example, consider the page title of a song that was written in Hindi but transliterated to use Latin letters instead of Hindi’s native Devanagari script.
jis desh me holi kheli jati hai
In such cases, the system will try to find an alternative title using the dominant script on the page, which in this case is Devanagari.
In general, our system tends to use the page’s title element. For multilingual or transliterated titles, our system may look for alternatives that match the primary language of the page. Therefore, it’s a good idea to give each page a title that matches the language and script of its main content.
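As a rough self-check for this recommendation, the following Python sketch measures what share of a title's letters are written in the dominant script of the page content. It is only an illustration under stated assumptions, not how Search evaluates titles. The helper names `script_counts` and `title_script_share` and the sample body text are hypothetical, and the script of each character is approximated from the first word of its Unicode name.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters and combining marks per script (approximated by Unicode name)."""
    counts = Counter()
    for ch in text:
        if unicodedata.category(ch)[0] in "LM":
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts


def title_script_share(title, body):
    """Return the fraction of title letters in the body's dominant script, or None."""
    body_counts = script_counts(body)
    title_counts = script_counts(title)
    if not body_counts or not title_counts:
        return None  # nothing to compare
    body_script = body_counts.most_common(1)[0][0]
    return title_counts[body_script] / sum(title_counts.values())


body = "होली का त्योहार"  # hypothetical page content written in Devanagari
print(title_script_share("jis desh me holi kheli jati hai", body))
print(title_script_share("होली", body))
```

A share near 0 suggests the title is transliterated or written in another language, and a share near 1 suggests the title and content agree. Any cutoff between the two would be a judgment call for the site owner.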