This file describes how the usage of entities on wiki pages is tracked.

Tracking happens on two levels:
* the client wiki tracks which pages use (which aspect of) which entity (from which repo).
* each repo tracks which client uses which entity.

This is used to optimize change notifications on two levels:
* the repo sends notifications to the clients that use the modified entity in question.
* the client compares incoming notifications with its local tracking table to decide which pages to purge/update.

== Client side usage tracking ==

Each client wiki tracks which pages use (which aspect of) which entity (from which repo). The "aspect" is used to decide which kind of change is relevant for the given kind of usage, and what kind of update is needed for the page in question.

The following aspects are defined:
* sitelinks: only an item's sitelinks are used. This would be the case for a client page that is connected to an item on the repo via an incoming sitelink, but does not access any data of the item directly. A change to the sitelinks may be applied without re-parsing the page, by overwriting the sitelinks in the cached ParserOutput object.
* label: only the entity's label is used. This would be the case when a localized reference to the entity is shown on a page. It is also used in cases when a property is referenced by label. A page that uses a label should be updated when that label changes, but this kind of update may be considered low priority. The language in which the label is used is tracked along as a "modification" of the label aspect. In case language fallback is applied, all relevant languages are considered to be used on the page.
* all: any and all aspects of the entity may be used on the given page. This includes statements, claims, and labels. This kind of usage triggers a full re-parse on any change to the entity. This aspect of use is recorded when entity data is accessed via Lua or the #property parser function.
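
The aspect descriptions above can be read as a small decision rule: given how a page uses an entity and what kind of change arrived, which update (if any) is needed. The following is a minimal Python sketch of that reading, not the actual implementation. The names `Update` and `plan_update` and the change kind strings ("sitelinks", "label", "other") are hypothetical and chosen only for illustration.

```python
from enum import Enum
from typing import Optional


class Update(Enum):
    # Hypothetical labels for the three kinds of update described above.
    PATCH_SITELINKS = "overwrite sitelinks in the cached ParserOutput"
    LOW_PRIORITY_UPDATE = "update the page with low priority"
    FULL_REPARSE = "re-parse the page"


def plan_update(usage_aspect: str, change_kind: str,
                usage_lang: Optional[str] = None,
                change_lang: Optional[str] = None) -> Optional[Update]:
    """Return the update needed for one usage, or None if the change is irrelevant."""
    if usage_aspect == "all":
        # "all" usage triggers a full re-parse on any change to the entity.
        return Update.FULL_REPARSE
    if usage_aspect == "label" and change_kind == "label":
        # Label usage is tracked per language; other languages are irrelevant.
        if usage_lang == change_lang:
            return Update.LOW_PRIORITY_UPDATE
        return None
    if usage_aspect == "sitelinks" and change_kind == "sitelinks":
        # Sitelink changes can be applied without re-parsing the page.
        return Update.PATCH_SITELINKS
    return None


# A statement change (kind "other") does not affect a page that only shows a label.
label_only = plan_update("label", "other", usage_lang="en")
full_usage = plan_update("all", "other")
```

Here `label_only` is `None` and `full_usage` is `Update.FULL_REPARSE`, which matches the intent that only pages with the broadest kind of usage pay for a full re-parse.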

Entity usage on client pages is tracked using the following codes:
* sitelinks (S): the entity's sitelinks are used.
* label (L.xx): the entity's label in language xx is used.
* title (T): the title of the local page corresponding to the entity is used.
* all (X): other aspects (such as statement data), or all aspects, are or may be used.

Changes result in updates to pages that use the respective entity based on the aspect that is used. Changes are classified accordingly:
* sitelinks (S): any change to the entity's sitelinks. Pages that use the S or X aspect are updated.
* label (L.xx): the label in the language "xx" changed. Pages that use the L.xx or X aspect are updated.
* title (T): the sitelink corresponding to the local wiki was changed. Pages that use the S, T, or X aspect are updated.
* all (X): everything about the entity changed. All pages using the entity are updated.
* other (O): something else about the entity (such as statement data) changed. Only pages that use the X aspect are updated.

This way, editing e.g. statements will not cause pages that just show the entity's label to be purged.

The database table for tracking client side usage is called wbc_entity_usage, and can be thought of as a links table, just like templatelinks or imagelinks. It has the following fields:

  eu_entity_id VARBINARY(255) NOT NULL -- the ID of the entity being used
  eu_aspect VARBINARY(37) NOT NULL -- the aspect of the entity. See EntityUsage::XXX_USAGE for possible values.
  eu_page_id INT NOT NULL -- the ID of the page that uses the entity.
  eu_touched BINARY(14) NOT NULL DEFAULT '' -- timestamp corresponding to page.page_touched (currently unused)

The following indexes are provided for efficient access:

  UNIQUE INDEX ( eu_entity_id, eu_aspect, eu_page_id ) -- record one usage per page per aspect of an entity
  INDEX ( eu_page_id, eu_entity_id ) -- look up (and especially, delete) usage entries by page id

NOTE: when tracking usage of entities from multiple repos, we either need distinct ID prefixes, or one table per repo, or both. An additional eu_entity_repo column would introduce a huge amount of redundant data, and would not play well with indexes.

== Updating of usage entries ==

Usage tracking information on the client has to be updated when pages are edited (or created, deleted, or renamed) and when pages are re-rendered due to a change in a template. In addition, usages that become apparent only when the page is rendered with a specific target language need to be tracked when such a language specific rendering is committed to the parser cache. This is particularly important for per-language tracking of label usage on multilingual wikis.

Tracking information that needs to be added as new renderings of a page materialize is added to the tracking table after the corresponding parser cache entry has been saved.

Tracking data is discarded whenever the corresponding parser cache entries are invalidated. When that happens, we simply remove all records for that page in wbc_entity_usage except for the ones in the new (post invalidation) parser cache entry.

This implies that entity usage tracking is actually tracking usage in page renderings (technically, entries in the parser cache). It does not directly correspond to what is or is not present in a page's wikitext or templates. However, since a page is always rendered in at least one language when it is edited, this distinction is somewhat academic.
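
The add-then-prune behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: an in-memory SQLite database stands in for the client database, the column types are simplified to TEXT and INTEGER, eu_touched is omitted, and the function names `open_tracking_db`, `add_usages` and `replace_usages` as well as the entity IDs are hypothetical.

```python
import sqlite3


def open_tracking_db():
    """Create a simplified stand-in for wbc_entity_usage in memory."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE wbc_entity_usage ("
        " eu_entity_id TEXT NOT NULL,"
        " eu_aspect TEXT NOT NULL,"
        " eu_page_id INTEGER NOT NULL,"
        " UNIQUE (eu_entity_id, eu_aspect, eu_page_id))"
    )
    return db


def add_usages(db, page_id, usages):
    """Add (entity_id, aspect) pairs for a page, e.g. after a parser cache save."""
    db.executemany(
        "INSERT OR IGNORE INTO wbc_entity_usage"
        " (eu_entity_id, eu_aspect, eu_page_id) VALUES (?, ?, ?)",
        [(entity, aspect, page_id) for entity, aspect in usages],
    )


def replace_usages(db, page_id, usages):
    """On invalidation, keep only the usages of the new rendering.

    Returns the sorted list of (entity_id, aspect) pairs that were removed.
    """
    keep = set(usages)
    rows = db.execute(
        "SELECT eu_entity_id, eu_aspect FROM wbc_entity_usage WHERE eu_page_id = ?",
        (page_id,),
    ).fetchall()
    stale = [(e, a, page_id) for e, a in rows if (e, a) not in keep]
    db.executemany(
        "DELETE FROM wbc_entity_usage"
        " WHERE eu_entity_id = ? AND eu_aspect = ? AND eu_page_id = ?",
        stale,
    )
    add_usages(db, page_id, keep)
    return sorted((e, a) for e, a, _ in stale)


db = open_tracking_db()
add_usages(db, 7, [("Q42", "L.en")])   # first rendering, in English
add_usages(db, 7, [("Q42", "L.de")])   # a later rendering in German adds a usage
removed = replace_usages(db, 7, [("Q42", "L.en"), ("Q64", "X")])
```

After the last call, the German label usage recorded for the old rendering is gone, while the usages of the new rendering remain. Accumulating usages between invalidations is what makes the table reflect renderings rather than wikitext.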

Overview of events that trigger updates to usage tracking:
* LinksUpdateComplete: add usage entries from ParserOutput
* LinksUpdateComplete: prune all old entries, unsubscribe unused entries
* ParserCacheSave: add new usage entries from ParserOutput
* ArticleDeleteComplete: prune all entries, unsubscribe unused entries

== Repo side usage tracking ==

Each repo tracks which client uses which entity. This is done in the table wb_changes_subscription, which has the following structure:

  cs_row_id BIGINT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  cs_entity_id VARBINARY(255) NOT NULL, -- the ID of the entity subscribed to
  cs_subscriber_id VARBINARY(255) NOT NULL -- the ID of the subscriber (e.g. a domain name or database name)

The following indexes are provided for efficient access:

  UNIQUE INDEX ( cs_entity_id, cs_subscriber_id ) -- look up a subscription, or all subscribers of an entity
  INDEX ( cs_subscriber_id ) -- look up all subscriptions of a subscriber

This table is updated whenever the client side tracking table is updated. To do this, the client wiki must, whenever a page is edited, determine which entities are used (in order to record this in the local tracking table). In addition, it must detect whether an entity that was not previously used anywhere on the wiki is now used by the edited page, or whether the edit removed the last usage of any of the entities previously used on the page.
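
The detection step described above amounts to two set differences, filtered by what other pages on the wiki still use. The following is a minimal Python sketch under stated assumptions: usage is held in an in-memory dictionary from page ID to entity IDs rather than in wbc_entity_usage, and the function name `subscription_changes` and the entity IDs are hypothetical.

```python
from typing import Dict, Set, Tuple


def subscription_changes(usage_by_page: Dict[int, Set[str]],
                         page_id: int,
                         new_entities: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Record the entities used by an edited page and return
    (entities to subscribe to, entities to unsubscribe from)."""
    old_entities = usage_by_page.get(page_id, set())

    # Entities used by any other page on this wiki.
    used_elsewhere: Set[str] = set()
    for other_page, entities in usage_by_page.items():
        if other_page != page_id:
            used_elsewhere |= entities

    # Newly used on this page and not used anywhere else: first usage on the wiki.
    subscribe = (new_entities - old_entities) - used_elsewhere
    # Dropped from this page and not used anywhere else: last usage was removed.
    unsubscribe = (old_entities - new_entities) - used_elsewhere

    usage_by_page[page_id] = set(new_entities)
    return subscribe, unsubscribe


usage = {1: {"Q1", "Q2"}, 2: {"Q2"}}
# Page 1 is edited: it stops using Q1 and starts using Q3.
to_subscribe, to_unsubscribe = subscription_changes(usage, 1, {"Q2", "Q3"})
```

In this example the client would subscribe to Q3 and unsubscribe from Q1, while Q2 needs no change on the repo. A real implementation would answer the "used elsewhere" question with a query against the tracking table instead of scanning all pages.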