Product

CelesTLSH

Name: CelesTLSH
Author: Magonia Research

Pronounced: Celestial-S-H

A fuzzy hash database and antimalware scanner that identifies attack tools and malicious binaries through similarity-based detection.

100,000+ Unique TLSH Hashes

319+ Malware Families & Tools

~100 Organizations Using CelesTLSH

Background

What is Fuzzy Hashing?

Traditional cryptographic hashes like SHA-256 and MD5 produce an effectively unique fingerprint for a file. Change a single bit, and the hash changes completely. This makes them useful for exact matching but ineffective for detecting modified variants of the same malware.
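The effect described above is easy to reproduce with Python's standard hashlib module. The following is a minimal sketch; the 64-byte sample and the helper name bit_difference are made up for illustration and do not come from CelesTLSH.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Arbitrary 64-byte sample (illustrative only, not a real binary).
original = b"MZ" + b"\x90" * 62

# Flip exactly one bit in a copy of the sample.
patched = bytearray(original)
patched[10] ^= 0x01
modified = bytes(patched)

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(modified).digest()

print("input bits changed: ", bit_difference(original, modified))
print("digest bits changed:", bit_difference(digest_a, digest_b), "of 256")
```

The inputs differ by one bit, yet the two digests share no useful relationship, so an exact-match lookup on the first digest says nothing about the second file.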

Fuzzy hashing solves this by generating hashes that remain similar when the underlying content is similar. Two files that share structural overlap will produce hashes with a low "distance" score, even if portions of the file have been altered.

TLSH (Trend Micro Locality Sensitive Hash) is a locality-sensitive hashing algorithm created by Trend Micro. Given a byte stream of at least 50 bytes with sufficient variation in its content, TLSH generates a hash value that can be used for similarity comparisons. Comparing two hashes yields a distance score: 0 means the inputs are identical or nearly so, and larger scores indicate greater differences.

"TLSH is a fuzzy matching library. Similar objects will have similar hash values, enabling the detection of similar objects by comparing their hash values."
— Trend Micro, TLSH Project

This makes TLSH particularly valuable for threat detection: attackers routinely modify their tools to evade hash-based signatures, but the structural similarity remains detectable through fuzzy hashing.
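To make the idea of a distance score concrete, here is a toy similarity digest written with only the Python standard library. It is not TLSH and its output is not compatible with TLSH. It is loosely inspired by the same general approach of counting sliding-window byte patterns into buckets and quantising the counts by quartile. The bucket count, window size, and sample data are all assumptions chosen for this sketch.

```python
import hashlib
import random

BUCKETS = 64  # toy size chosen for this sketch


def toy_digest(data: bytes) -> list:
    """Count sliding 3-byte windows per bucket, then quantise to codes 0..3."""
    counts = [0] * BUCKETS
    for i in range(len(data) - 2):
        window = data[i:i + 3]
        # Stable mapping from a 3-byte window to a bucket index.
        bucket = hashlib.blake2b(window, digest_size=1).digest()[0] % BUCKETS
        counts[bucket] += 1
    ordered = sorted(counts)
    q1, q2, q3 = ordered[BUCKETS // 4], ordered[BUCKETS // 2], ordered[3 * BUCKETS // 4]
    # Each bucket becomes a 2-bit code according to the quartile it falls in.
    return [0 if c <= q1 else 1 if c <= q2 else 2 if c <= q3 else 3 for c in counts]


def toy_distance(a: list, b: list) -> int:
    """Sum of per-bucket code differences; 0 means identical digests."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Synthetic "binary": repetitive structured content (illustrative only).
original = b"".join(
    b"sub_%04d: push rbp; mov rbp, rsp; call helper; ret\n" % i for i in range(200)
)
# A lightly patched variant: four bytes overwritten in place.
variant = original[:500] + b"NOPS" + original[504:]
# Unrelated content of the same length.
rng = random.Random(0)
unrelated = bytes(rng.randrange(256) for _ in range(len(original)))

d_variant = toy_distance(toy_digest(original), toy_digest(variant))
d_unrelated = toy_distance(toy_digest(original), toy_digest(unrelated))
print("distance to patched variant:  ", d_variant)
print("distance to unrelated content:", d_unrelated)
```

The patched variant lands much closer to the original than the unrelated data does, which is the property a fuzzy hash relies on. For real work, use an actual TLSH implementation rather than this toy.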

How It Works

CelesTLSH Architecture

CelesTLSH operates as a LimaCharlie extension deployed across all major operating systems. It leverages LimaCharlie's Binary Library (BinLib) feature, which captures a single unique copy of any binary that generates a CODE_IDENTITY event.

Binary Execution Captured

When a binary executes on an endpoint, LimaCharlie's BinLib captures a unique copy and computes its TLSH hash along with rich metadata (ImpHash, SHA-256, code signing, file type).

Similarity Comparison

CelesTLSH compares the binary's TLSH hash against a well-vetted database of over 100,000 unique hashes from 319+ known attack tools and malware families, with more added daily.

Alert & Enrichment

When a match is found, CelesTLSH generates an event with the matched threat name, similarity score, and contextual metadata. By default, alerts trigger for matches with a distance score of 50 or less (configurable).
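The comparison and alerting steps above can be sketched as a nearest-match lookup with a configurable threshold. This is an illustration under stated assumptions, not the CelesTLSH implementation: the KnownHash and Match types, the best_match function, and the placeholder hash strings are hypothetical, and demo_distance is a stand-in so the sketch runs without third-party packages. A real pipeline would pass an actual TLSH distance function instead, for example tlsh.diff from the py-tlsh package.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

DEFAULT_MAX_DISTANCE = 50  # default alert threshold stated in this document


@dataclass(frozen=True)
class KnownHash:
    family: str  # tool or malware family name
    tlsh: str    # stored hash string


@dataclass(frozen=True)
class Match:
    family: str
    distance: int


def best_match(
    candidate: str,
    database: Iterable[KnownHash],
    distance: Callable[[str, str], int],
    max_distance: int = DEFAULT_MAX_DISTANCE,
) -> Optional[Match]:
    """Return the closest database entry within max_distance, or None."""
    best = None
    for entry in database:
        d = distance(candidate, entry.tlsh)
        if d <= max_distance and (best is None or d < best.distance):
            best = Match(entry.family, d)
    return best


def demo_distance(a: str, b: str) -> int:
    """Stand-in distance (count of differing characters). NOT the TLSH distance."""
    return sum(x != y for x, y in zip(a, b))


# Placeholder strings, not real TLSH hashes.
database = [KnownHash("Sliver", "T1AAAA"), KnownHash("Mimikatz", "T1ZZZZ")]
print(best_match("T1AAAB", database, demo_distance))
```

Returning only the closest entry keeps one alert per binary. Lowering max_distance trades detection rate for fewer false positives.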

A key advantage: if the same binary executes on a thousand systems, BinLib stores only one copy, so it is scanned only once. This sharply reduces duplicate alerts, and with them false positive volume, as well as computational overhead compared to scanning every execution independently.
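The scan-once behaviour can be illustrated with a small cache keyed on content hash. This is a sketch of the general deduplication idea only. It does not describe how BinLib is implemented, and the ScanOnce class and its use of SHA-256 as the key are assumptions made for this example.

```python
import hashlib
from typing import Callable, Dict


class ScanOnce:
    """Run an expensive scan at most once per unique binary content."""

    def __init__(self, scan: Callable[[bytes], str]) -> None:
        self._scan = scan
        self._results: Dict[str, str] = {}

    def submit(self, binary: bytes) -> str:
        key = hashlib.sha256(binary).hexdigest()  # exact-content identity
        if key not in self._results:
            self._results[key] = self._scan(binary)
        return self._results[key]


scan_calls = []


def fake_scan(binary: bytes) -> str:
    """Placeholder scanner that records how often it is invoked."""
    scan_calls.append(len(binary))
    return "no match"


scanner = ScanOnce(fake_scan)
same_binary = b"\x7fELF" + b"\x00" * 60

# Simulate the same binary executing on a thousand endpoints.
for _ in range(1000):
    scanner.submit(same_binary)

print("executions: 1000, scans performed:", len(scan_calls))
```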

Applications

Use Cases

Threat Hunting

Leverage TLSH distance scores to identify modified or derivative versions of known attack tools used in adversarial operations. Discover variants that evade traditional hash-based signatures.

Detection Engineering

Establish baseline similarity thresholds to build high-fidelity detections. Use fuzzy hash matches as enrichment context in risk-based alerting models rather than standalone signals.
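One way to treat a fuzzy hash match as enrichment rather than a standalone signal is to let it contribute points to a combined risk score. The sketch below is hypothetical throughout: the point values, the extra signals (unsigned binary, rare parent process), and the alert level of 60 are invented for illustration and would need tuning against real data.

```python
from typing import Optional

MAX_DISTANCE = 50  # default alert threshold cited in this document
ALERT_AT = 60      # hypothetical risk score at which an alert fires


def fuzzy_points(distance: Optional[int]) -> int:
    """Closer matches earn more points: 40 at distance 0, 10 at distance 50."""
    if distance is None or distance > MAX_DISTANCE:
        return 0
    return 40 - (30 * distance) // MAX_DISTANCE


def risk_score(distance: Optional[int], unsigned: bool, rare_parent: bool) -> int:
    """Combine the fuzzy match with other (hypothetical) context signals."""
    score = fuzzy_points(distance)
    score += 20 if unsigned else 0
    score += 30 if rare_parent else 0
    return score


# A moderate fuzzy match alone, or with one weak signal, stays below the bar.
print(risk_score(25, unsigned=True, rare_parent=False) >= ALERT_AT)
# The same match plus a second corroborating signal crosses it.
print(risk_score(25, unsigned=True, rare_parent=True) >= ALERT_AT)
```

With this design a borderline similarity score never alerts on its own, but it can tip the balance when other context already looks suspicious.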

Incident Response

Quickly assess whether artifacts discovered in an environment are likely variants of known tools. A binary that falls within a small TLSH distance of a known Sliver or CobaltStrike sample is an immediate investigation priority.

Event Enrichment

The future of fuzzy hashing is enrichment, not standalone alerting. CelesTLSH adds similarity context to process execution events, enabling more sophisticated detection logic.

Reference

TLSH Distance Score Thresholds

The table below shows how the TLSH distance threshold trades false positive rate against detection rate. Lower distance scores indicate greater similarity. The default CelesTLSH alert threshold is a score of 50 or less.

TLSH distance score thresholds showing false positive rate and detection rate
Score	False Positive Rate	Detection Rate
< 30	0.002%	32.2%
< 40	0.07%	49.6%
< 50	0.52%	65.3%
< 60	1.09%	76.0%
< 80	2.93%	89.0%
< 100	6.43%	94.5%
< 150	24.33%	98.1%

Source: TLSH — A Locality Sensitive Hash (Oliver, Cheng, Chen, 2013)
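The table can also be used programmatically to pick a threshold for a given false positive budget. The sketch below copies the rows above verbatim. The function names are made up for this example, and the rates come from the cited paper's data set, so they should be read as a rough guide rather than a prediction for any particular environment.

```python
from typing import Optional, Tuple

# (distance threshold, false positive rate %, detection rate %) from the table above.
ROWS = [
    (30, 0.002, 32.2),
    (40, 0.07, 49.6),
    (50, 0.52, 65.3),
    (60, 1.09, 76.0),
    (80, 2.93, 89.0),
    (100, 6.43, 94.5),
    (150, 24.33, 98.1),
]


def loosest_threshold(max_fp_percent: float) -> Optional[Tuple[int, float, float]]:
    """Return the highest-threshold row whose false positive rate fits the budget."""
    eligible = [row for row in ROWS if row[1] <= max_fp_percent]
    return max(eligible, key=lambda row: row[0]) if eligible else None


def expected_false_positives(unique_binaries: int, fp_percent: float) -> float:
    """Simple arithmetic: unique binaries scanned times the false positive rate."""
    return unique_binaries * fp_percent / 100


row = loosest_threshold(1.0)  # budget: at most 1% false positives
print("chosen row:", row)
print("expected false positives per 10,000 unique binaries:",
      expected_false_positives(10_000, row[1]))
```

With a 1% budget the selection lands on the 50 row, and 0.52% of 10,000 unique binaries works out to about 52 false positives. Because the count scales with unique binaries rather than executions, deduplication matters as much as the threshold itself.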

Open Source

FOSS Attack Tool Hash Database

The CelesTLSH Hash Database is open source and freely available. It tracks TLSH hashes for a wide array of publicly hosted attack tools and C2 frameworks, sourced directly from their official GitHub repositories and continuously updated.

The database currently covers tools including Mimikatz, CobaltStrike (Ghostpack), Sliver, Empire, BloodHound, Impacket, CrackMapExec, Responder, Rubeus, SharpHound, LaZagne, and hundreds more. Each tool directory contains TLSH hashes for all releases and variants.
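For readers who want to consume the database in their own pipeline, the sketch below loads a directory-per-tool layout into memory. The actual file format of the repository is not described here, so everything about the layout is an assumption: this example supposes each tool directory holds .txt files with one hash per line and optional # comments. Check the repository and adjust the parsing before relying on it.

```python
from pathlib import Path
from typing import Dict, Set


def load_hash_database(root: Path) -> Dict[str, Set[str]]:
    """Map each tool directory name to the set of hash strings found beneath it.

    ASSUMED layout (hypothetical): <root>/<tool>/**/*.txt, one hash per line.
    """
    database: Dict[str, Set[str]] = {}
    for tool_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        hashes: Set[str] = set()
        for path in tool_dir.rglob("*.txt"):
            for line in path.read_text().splitlines():
                line = line.strip()
                if line and not line.startswith("#"):  # skip blanks and comments
                    hashes.add(line)
        database[tool_dir.name] = hashes
    return database
```

A call such as load_hash_database(Path("path/to/local/clone")) would return a dictionary keyed by tool name, which can then feed a threshold-based lookup.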

Contributions are welcome. If you have suggestions for additional tools to monitor or improvements to the feed, submit an issue or pull request on the project repository.

Research

Publications & References

Magonia Research

Fuzzy Hashing Research: A Paper Highlight with Practitioner's Notes — Real-world lessons from running CelesTLSH in production across ~100 organizations, with practical techniques for reducing TLSH false positives.
Maximizing the Value of Indicators of Compromise and Reimagining Their Role in Modern Detection — Why the future of IOCs is enrichment, not alerting, and where fuzzy hashes fit in the modern Pyramid of Pain.

Academic Papers

Oliver, J., Cheng, C., & Chen, Y. (2013). TLSH — A Locality Sensitive Hash. 4th Cybercrime and Trustworthy Computing Workshop, Sydney.
Oliver, J., Forman, S., & Cheng, C. (2014). Using Randomization to Attack Similarity Digests. ATIS 2014, pp. 199–210.
Oliver, J., Ali, M., & Hagen, J. (2020). HAC-T and fast search for similarity in security. International Conference on Omni-layer Intelligent Systems (COINS), IEEE.
(2025). Bytewise approximate matching: Evaluating common scenarios for executable files. ScienceDirect.

Industry

Target Corporation. Implementing TLSH-Based Detection at Scale.
Bianco, D. (2022). Stop Using Hashes for Detection (and When You Should Use Them).

Get Started with CelesTLSH

Deploy fuzzy hash detection across your organization with the LimaCharlie extension, or use the open-source hash database in your own detection pipeline.
