YouTube's Content ID blocks videos uploaded by YouTubers when it identifies protected content in them. The service relies on recognition software that compares newly uploaded videos against audio and video files that rights holders have deposited in a reference database.
Although the service was not designed to identify still images, nothing technically prevents building a Content ID for photography.
Here are the three main identification techniques, all existing and already in operation, on which such a system could be based.
In the early 1990s, the original IPTC-IIM (Information Interchange Model) scheme was developed to organize, systematize and unify the way information about an image, its metadata, was stored and transmitted with the image, in a format readable by all software.
The IPTC scheme quickly became the standard in all the post-production software (Photoshop, Lightroom, Capture One, Photo Mechanic, etc.) used by creators and producers of professional photographic content.
As a result, professional photographers today systematically fill in the IPTC fields for copyright, description, source, date, location, keywords, etc.
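To make these "fields" concrete, here is a minimal sketch of how IPTC-IIM stores them: each field is a small binary dataset made of the 0x1C tag marker, a record number, a dataset number and a two-byte length, followed by the value. The dataset numbers used below (2:80 By-line, 2:116 Copyright Notice, 2:25 Keywords, etc.) belong to the IIM Application Record; the helper names `encode_iim` and `decode_iim` are hypothetical, and the sketch assumes UTF-8 values, ignores extended-length datasets and leaves out how the block is wrapped inside a JPEG file.

```python
import struct

# Record 2 ("Application Record") dataset numbers from the IPTC-IIM specification.
APP_FIELDS = {
    5: "ObjectName",
    25: "Keywords",
    55: "DateCreated",
    80: "By-line",
    90: "City",
    115: "Source",
    116: "CopyrightNotice",
    120: "Caption-Abstract",
}


def encode_iim(fields):
    """Serialize (dataset_number, text) pairs as standard record-2 IIM datasets."""
    out = bytearray()
    for number, text in fields:
        # Assumption: values are UTF-8 (real files declare their charset in dataset 1:90).
        data = text.encode("utf-8")
        if len(data) > 0x7FFF:
            raise ValueError("extended-length datasets are not handled in this sketch")
        # Each dataset: tag marker 0x1C, record number, dataset number, big-endian length.
        out += struct.pack(">BBBH", 0x1C, 2, number, len(data))
        out += data
    return bytes(out)


def decode_iim(blob):
    """Parse standard IIM datasets and return the record-2 fields as (name, text) pairs."""
    fields = []
    pos = 0
    while pos + 5 <= len(blob) and blob[pos] == 0x1C:
        _, record, number, length = struct.unpack_from(">BBBH", blob, pos)
        if length & 0x8000:
            raise ValueError("extended-length datasets are not handled in this sketch")
        value = blob[pos + 5:pos + 5 + length].decode("utf-8")
        if record == 2:
            fields.append((APP_FIELDS.get(number, f"2:{number}"), value))
        pos += 5 + length
    return fields


blob = encode_iim([(80, "Jane Doe"), (116, "© 2020 Jane Doe"), (25, "press")])
print(decode_iim(blob))
# [('By-line', 'Jane Doe'), ('CopyrightNotice', '© 2020 Jane Doe'), ('Keywords', 'press')]
```

Nothing in these bytes is tied to the pixels themselves: dropping the block leaves the picture intact but anonymous, which is why metadata is so easy to erase.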
This metadata is crucial both for describing a photograph and for identifying it.
Photo agencies deliver images to publishers with all of this metadata attached, sometimes supplemented with the license type.
Metadata is therefore the de facto first level of photo identification. Unfortunately, metadata can be erased: apart from a minority of them, press publishers and sharing platforms strip it.
(Also read: State of Image Metadata in News Sites.)
The second level of identification is similarity. An algorithm compares the shapes and colors of an image with those of thousands of similar photos on the web.
This technology works well when an image is unique. For all others, it often returns a mass of near-identical photos that require substantial human verification.
On this point, most speakers at the Stakeholder Dialogue Meeting observed that similarity matching also produces errors and that the human labor it requires makes it expensive.
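The text does not say which similarity algorithm is meant. One widely used family is perceptual hashing, and the sketch below implements a simple average hash (aHash) purely as an illustration, not as the method of any particular service. The image is reduced to an 8x8 grid of mean brightness values, each cell becomes one bit (brighter or darker than the overall mean), and two images are compared by counting the bits that differ (Hamming distance). Images are plain lists of grayscale rows to keep the example dependency-free; `average_hash` and `hamming` are hypothetical names.

```python
def average_hash(pixels, size=8):
    """Average hash of a grayscale image given as a list of rows of 0-255 values."""
    h, w = len(pixels), len(pixels[0])
    if h < size or w < size:
        raise ValueError("image is smaller than the hash grid")
    cells = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            # Mean brightness of one grid cell (a crude downscale).
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if the cell is brighter than the image average.
        bits = (bits << 1) | (value > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


img = [[r * 5 + c * 2 for c in range(32)] for r in range(32)]
brighter = [[v + 20 for v in row] for row in img]
inverted = [[255 - v for v in row] for row in img]
print(hamming(average_hash(img), average_hash(brighter)))  # 0
print(hamming(average_hash(img), average_hash(inverted)))  # 64
```

A uniformly brightened copy hashes identically, which is the point of the technique. The same tolerance is also why different but visually close photos (a burst of shots, several photographers at one event) land at small distances and still need a human to tell them apart.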
The third level of identification is the watermark. A watermark is an invisible code spread across and hidden in the pixels; it makes it possible to certify an image's identity even when the image has been entirely stripped of its metadata.
To be identified by the servers, the watermark must withstand the standard manipulations images undergo on the web: compression, cropping, horizontal flipping, etc.
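How a code can be "dispersed" in the pixels is illustrated by the classic spread-spectrum idea sketched below: a secret key seeds a pseudo-random pattern of +1/-1 values, a faint multiple of that pattern is added to every pixel, and detection correlates the image with the same pattern. This textbook technique is chosen for illustration only and is not the scheme of any system named in this article; the function names, the `strength` and `threshold` values and the flat list of grayscale pixels are all assumptions. Unlike the production watermarks the text describes, this sketch survives only brightness shifts and mild noise, not cropping or flipping, which need additional synchronization machinery.

```python
import random


def keyed_pattern(key, n):
    """Pseudo-random +1/-1 sequence derived from a secret key (hypothetical scheme)."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]


def embed(pixels, key, strength=3.0):
    """Add a faint keyed pattern to a flat list of grayscale pixel values."""
    pattern = keyed_pattern(key, len(pixels))
    return [min(255.0, max(0.0, x + strength * s)) for x, s in zip(pixels, pattern)]


def detect(pixels, key, threshold=1.5):
    """Correlate the mean-removed image with the keyed pattern; return (found, score)."""
    pattern = keyed_pattern(key, len(pixels))
    mean = sum(pixels) / len(pixels)
    # Subtracting the mean makes the score insensitive to uniform brightness changes.
    score = sum((x - mean) * s for x, s in zip(pixels, pattern)) / len(pixels)
    return score > threshold, score


rng = random.Random(0)
host = [rng.uniform(40, 215) for _ in range(256 * 256)]  # stand-in for a 256x256 photo
marked = embed(host, "agency-key-42")
edited = [x + 10 + rng.uniform(-5, 5) for x in marked]  # brightened, slightly noisy copy
print(detect(edited, "agency-key-42")[0])  # True
print(detect(edited, "other-key")[0])      # False
print(detect(host, "agency-key-42")[0])    # False
```

The score of a marked image sits near `strength`, while an unmarked image or a wrong key scores near zero; spreading the signal over many pixels is what lets a change of a few gray levels per pixel stay invisible yet detectable.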
The first level of identification, metadata, is already in place among the vast majority of image creators and producers.
The third level, the watermark, firmly secures the link between an image and its metadata.
Combining levels 1 and 3 yields a highly reliable identification system that requires little human labor, and it is gaining adherents among news wires and large corporations worldwide.
Projects such as The New York Times' News Provenance Project and Adobe's Content Authenticity Initiative (CAI) are heading toward this solution, having ultimately set blockchain aside as too energy-hungry.
What remains is to enforce existing laws that prohibit deleting metadata. Metadata carries essential information and value: it is costly to produce, and it makes it possible to verify an image's source and authorship.
Preserving and displaying it involves no additional cost. Deleting it, by contrast, is a disaster for democratic information and an ideal breeding ground for the spread of fake news.
Some publishers keep and display metadata, which is ultimately in their own interest when it comes to monetizing their content. So do some platforms, notably Google, which recently started showing metadata in Google Images.
It also remains to persuade platforms and publishers alike to open their databases to collecting societies so that the content they use can be identified.