Detection and removal of child sexual abuse material (CSAM) by online service providers is one of the essential ways in which industry combats online child sexual exploitation and abuse (OCSEA). The majority of this detection is done voluntarily and proactively by companies such as those in the Tech Coalition.
There are three primary, voluntary mechanisms by which electronic service providers (ESPs) may detect CSAM on their platforms, regardless of jurisdiction:
- Hash-based detection for known CSAM: A hash is a digital fingerprint or signature that is unique to a piece of content (such as an image or video). Hashes of verified CSAM are stored in secure databases that companies match against to quickly surface previously seen content; a simplified sketch of this workflow appears after this list. Hashing technologies used by industry include PhotoDNA, MD5, PDQ, and CSAI Match, among others. This technology generates the vast majority of CSAM identifications and reports to the National Center for Missing and Exploited Children (NCMEC) or the proper authorities.
- Machine learning classifiers: Machine learning classifiers flag suspected but previously undetected/unhashed CSAM, which is then confirmed by human reviewers before being reported to NCMEC or proper authorities. After a piece of content has been verified as CSAM, it is hashed and can be shared with other companies through secure hash databases.
- User or third-party reporting: Users or third parties (such as parents or trusted flaggers) who encounter potentially violative content can report it to the ESP through its reporting flows. Following a review of this content (which often includes running it through hashing and classification tools), it, too, will be reported to NCMEC or the proper authorities, hashed, and securely shared.
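To make the hash-matching and review flow described above concrete, below is a minimal sketch in Python, assuming a placeholder hash database and using MD5 (one of the exact-match hashes named above) purely for illustration. It is not any company's production pipeline; real deployments typically rely on perceptual hashers such as PhotoDNA or PDQ, vetted hash databases, and trained human reviewers.

```python
import hashlib

# Placeholder for a secure database of hashes of previously verified CSAM
# (in practice, vetted hash lists such as those maintained by NCMEC or the IWF).
KNOWN_CSAM_HASHES: set[str] = set()


def md5_hash(image_bytes: bytes) -> str:
    """Exact-match cryptographic hash; perceptual hashes such as PhotoDNA
    or PDQ are used in practice to also catch slightly altered copies."""
    return hashlib.md5(image_bytes).hexdigest()


def triage_upload(image_bytes: bytes) -> str:
    """Route an uploaded image through the voluntary detection pipeline."""
    if md5_hash(image_bytes) in KNOWN_CSAM_HASHES:
        # Known CSAM: remove it and report to NCMEC or the proper authorities.
        return "known match: remove and report"
    # Unknown content: a machine learning classifier can flag suspected CSAM,
    # which trained human reviewers confirm before any report is filed.
    return "no match: classifier scoring and human review if flagged"
```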
According to data from the Tech Coalition’s 2023 annual member survey, voluntary detection methods are prevalent across the tech industry, with 89% of Tech Coalition members reporting use of at least one image hash-matcher, 59% reporting use of at least one video hash-matcher, and 57% reporting use of at least one classifier to detect previously undetected/unhashed CSAM.
Below, we’ve included insights and deep dives from some of our member companies, in their own words. Microsoft shared a deep dive into PhotoDNA, the hash-based detection technology it helped develop. Snap, TikTok, and Sony Interactive Entertainment provided descriptions of how they each pursue voluntary detection, among other efforts to protect children online.
Microsoft
In 2009, Microsoft partnered with Dartmouth College to create PhotoDNA, a tool to combat the online distribution of CSEA images. Hash-matching technologies like PhotoDNA enable detection and disruption efforts to address harm at scale across many billions of images and videos.
PhotoDNA enables online services to detect and disrupt the dissemination of CSAM by creating a unique digital signature (a hash) of images that may be uploaded or disseminated on a service and then comparing that hash against a database of previously hashed materials that have already been identified as CSAM (“known CSAM”). PhotoDNA does not interpret an image: it merely enables a determination that two images are the same. Many companies then employ human review of any hash matches to verify whether an image contains CSAM.
The hash signature is created by converting the image to grayscale and dividing it into smaller blocks. Each block is assigned a numerical value, and these values are linked together to generate the perceptual hash of the image. Minor alterations to the image produce a slightly different hash, but one that remains close to the original’s, which is what allows altered copies to be matched (as described below).
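As an illustration of the grayscale-and-blocks idea only, here is a toy block-average hash in Python. It is not PhotoDNA, whose actual algorithm is more robust and not public; the grid size and the Pillow dependency are assumptions.

```python
from PIL import Image  # Pillow, assumed here for image loading


def block_hash(path: str, grid: int = 8) -> list[int]:
    """Toy perceptual hash: convert to grayscale, resize, split into a grid
    of blocks, and record the average intensity of each block."""
    size = grid * 8                       # pixels per side after resizing
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    block = size // grid                  # pixels per block edge
    values = []
    for by in range(grid):
        for bx in range(grid):
            total = sum(
                pixels[y * size + x]
                for y in range(by * block, (by + 1) * block)
                for x in range(bx * block, (bx + 1) * block)
            )
            values.append(total // (block * block))
    return values  # the concatenated block averages form the hash
```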
PhotoDNA employs perceptual hash matching, which compares the similarity between the hashes of two different images through a “distance” metric. A company will set a threshold to determine when there is enough similarity between two hashes for the files to be considered a match. This allows modified versions of the original file to be detected, depending on the threshold used. Similar results can be achieved with videos by selecting individual frames, generating hashes for them, and individually matching each of those hashes. This process, enabled by PhotoDNA for Video, allows the identification of CSAM even if it has been edited or embedded into a video.
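Continuing the toy hash above, matching might look like the following sketch: compute a distance between two hash vectors, compare it to a company-chosen threshold, and, for video, hash sampled frames and match each frame individually. The distance metric and threshold value here are illustrative assumptions, not PhotoDNA's.

```python
def distance(h1: list[int], h2: list[int]) -> int:
    """Sum of absolute differences between corresponding block values."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def is_match(h1: list[int], h2: list[int], threshold: int = 300) -> bool:
    """Two images match if their hashes are close enough. The threshold
    trades recall (catching edited copies) against false positives."""
    return distance(h1, h2) <= threshold


def video_contains_known_image(frame_hashes: list[list[int]],
                               known_hashes: list[list[int]]) -> bool:
    """Hash sampled video frames and match each one individually, so known
    imagery that has been edited or embedded into a video can still be found."""
    return any(is_match(f, k) for f in frame_hashes for k in known_hashes)
```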
The effectiveness and accuracy of hash-matching technology rely on the quality of the hash database leveraged to detect harmful content. Collaboration between industry and non-governmental organizations, such as NCMEC and the Internet Watch Foundation, has been critical to helping develop and maintain such databases. The Tech Coalition also provides PhotoDNA licenses to its members.
Snap
Snap can become aware of child sexual exploitation and abuse imagery (CSEAI) on the platform in two main ways:
- Reactively: Reported to Snap, usually in-app, by a Snapchatter or a Trusted Flagger. Snap’s Trusted Flagger program has more than 40 participants and is continually expanding; the bulk of these Trusted Flaggers report CSEAI.
- Proactively: Through Snap’s own detection, including the use of PhotoDNA to detect confirmed, already-hashed images, and via CSAI Match technology to detect confirmed, already-hashed videos. Snap is also investigating the development of technology to assist in identifying novel CSEAI.
These tools are proven and reliable, and they help prevent the re-victimisation and re-circulation of known, illegal material. According to NCMEC, 67 percent of CSEAI survivors say the distribution of their images impacts them differently than the hands-on abuse they suffered, because the distribution never ends and the images are “permanent.”
Across industry and across the globe, hundreds of thousands of these images are identified, and hashes of those images are stored in databases that NGOs and companies compile and leverage to detect duplicates. Therefore, Snap can help prevent the re-circulation of this known illegal content and any follow-on re-victimisation.
With the help of these reliable and proven automated tools, combined with in-app reporting, in 2023, Snap took down approximately 1.6 million pieces of CSEAI content and submitted 690,000 CyberTip reports, in total, to NCMEC. Snap’s NCMEC reports also led to more than 1,000 arrests in 2023.
Reactively, Snap’s Trust and Safety teams work around the clock and around the globe to review reports and remove CSEAI content quickly, generally within 15 minutes of receiving an in-app report.
Snap also leverages NCMEC’s Take It Down database, which is intended to help minors stop the spread of their nude or sexually explicit images online.
TikTok
Content uploaded to TikTok is reviewed by automated technology that looks for CSEA before the content can be viewed. When TikTok becomes aware of suspected CSEA, whether through internal detection methods, community reports, or industry partnerships, they take immediate action to remove it, terminate accounts, and report cases to NCMEC.
TikTok partners with a range of specialist organizations tackling child sexual abuse online and has a dedicated team for law enforcement outreach. In addition to TikTok’s own technology, through close collaboration with industry partners, TikTok also utilizes:
- Google's Content Safety API: Google has developed machine-learning technology to support the proactive identification of never-before-seen CSAM imagery so that it can be reviewed and, if confirmed as CSAM, removed and reported as quickly as possible.
- YouTube's CSAI Match: This helps identify re-uploads of previously identified child sexual abuse material in videos.
- PhotoDNA: PhotoDNA creates a unique digital signature (known as a “hash”) of an image which is then compared against signatures (hashes) of other photos to find copies of the same image (see above for more details).
- NCMEC and IWF Hash Sharing Web Services: To enable the detection and removal of known CSAM at the point of upload.
- StopNCII.org and Take It Down: TikTok is a member of StopNCII.org and participates in NCMEC's Take It Down, both services that help remove intimate personal images from the internet.
TikTok also maintains a database of keywords that enqueue accounts for review. This database is supplemented with intelligence provided by Thorn and the IWF. Additionally, they use a URL database provided by the IWF to help block links to CSAM.
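As a rough illustration only (not TikTok's implementation), keyword and URL lists like those described above could be applied at posting time along these lines; the example terms, URLs, and queueing step are placeholders for partner-supplied intelligence.

```python
# Placeholder lists; in practice these are supplied and updated by partners
# such as Thorn (keyword intelligence) and the IWF (URL blocklist).
FLAGGED_KEYWORDS = {"example_term_1", "example_term_2"}
BLOCKED_URLS = {"https://example.invalid/blocked-page"}


def maybe_enqueue_for_review(account_id: str, text: str,
                             review_queue: list[str]) -> None:
    """Queue the account for human review if any flagged keyword appears."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in FLAGGED_KEYWORDS):
        review_queue.append(account_id)


def allow_link(url: str) -> bool:
    """Block links that appear on the URL blocklist."""
    return url not in BLOCKED_URLS
```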
By law, online platforms and companies like TikTok are required to file a report to NCMEC when they become aware of CSAM. They go beyond the basic requirements as set out in law by:
- Taking proactive steps to detect child sexual exploitation content.
- Submitting comprehensive information to NCMEC to help protect children and support law enforcement investigations.
- Continuously striving to improve the 'actionability' rate of reports. The vast majority (83%) of the reports TikTok submitted to NCMEC in 2022 were classified as actionable, meaning they contained quality information. NCMEC has said that in 2022, just over 50% of the 32.5 million reports submitted to the CyberTipline were informational (i.e. not actionable).
TikTok has taken deliberate safety-by-design decisions to make its app inhospitable to those intent on perpetrating CSEA. On TikTok:
- Direct messaging is not available to anyone under 16 and messaging is not end-to-end encrypted.
- Accounts for people under 16 are automatically set to private along with their content. Furthermore, their content cannot be downloaded and will not be recommended to people they do not know.
- Every teen under 18 has a screentime limit automatically set to 60 minutes.
- Only people over 18 are allowed to host a livestream.
Sony Interactive Entertainment (SIE)
Seventy-seven percent of Sony Interactive Entertainment’s NCMEC CyberTips in 2023 were the result of proactive image detection. While comprehensive 2023 CyberTip data is not yet available, for context, SIE reported 4,102 CyberTips in 2022.
Thanks to its NCMEC API integration, SIE is able to process each CyberTip in less than five minutes.
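For illustration, an automated submission step might look like the sketch below. The endpoint, payload fields, and authentication scheme are hypothetical placeholders, not NCMEC's actual API schema; the point is only that submitting reports programmatically, rather than through manual forms, is what keeps per-report handling time to a few minutes.

```python
import json
import urllib.request

# Hypothetical endpoint; NCMEC's real CyberTipline reporting API has its own
# schema, credentials, and submission workflow.
REPORT_URL = "https://example.invalid/cybertip/submit"


def submit_cybertip(incident: dict, api_key: str) -> int:
    """Submit one report programmatically and return the HTTP status code."""
    request = urllib.request.Request(
        REPORT_URL,
        data=json.dumps(incident).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # placeholder auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```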
SIE compares images against both the NCMEC and IWF hash databases. According to the Tech Coalition's survey data, these are the two most common databases used by member companies.
SIE also contributes its own findings back to hash databases: it relies on NCMEC to convert reported CSAM into hash files, which are then shared broadly with other ESPs and contribute to overall industry safety.