CelesTLSH
Pronounced: Celestial-S-H
A fuzzy hash database and antimalware scanner that identifies attack tools and malicious binaries through similarity-based detection.
What is Fuzzy Hashing?
Traditional cryptographic hashes like SHA-256 and MD5 produce an effectively unique fingerprint for a file. Change a single bit, and the hash changes completely. This makes them useful for exact matching but ineffective for detecting modified variants of the same malware.
Fuzzy hashing solves this by generating hashes that remain similar when the underlying content is similar. Two files that share structural overlap will produce hashes with a low "distance" score, even if portions of the file have been altered.
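The exact-match limitation is easy to see with Python's standard library. The sketch below flips one bit in a stand-in byte string (an invented placeholder, not a real binary) and counts how many bits of the SHA-256 digest change as a result.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Placeholder content standing in for a binary (assumption for illustration).
original = b"MZ" + bytes(range(256)) * 4

# Flip a single bit somewhere in the middle of the content.
tampered = bytearray(original)
tampered[100] ^= 0x01
modified = bytes(tampered)

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(modified).digest()

input_bits_changed = bit_difference(original, modified)
digest_bits_changed = bit_difference(digest_a, digest_b)

print(f"input bits changed:  {input_bits_changed}")
print(f"digest bits changed: {digest_bits_changed} of {len(digest_a) * 8}")
```

A one-bit edit typically changes roughly half of the digest's bits, so the two SHA-256 values carry no usable signal that the inputs were nearly identical.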
TLSH (Trend Micro Locality Sensitive Hash) is a locality-sensitive hashing algorithm created by Trend Micro. Given a byte stream of at least 50 bytes, TLSH generates a hash value that can be used for similarity comparisons. Comparing two hashes yields a distance score: smaller values indicate closer similarity and larger values indicate greater differences.
"TLSH is a fuzzy matching library. Similar objects will have similar hash values, enabling the detection of similar objects by comparing their hash values."
— Trend Micro, TLSH Project
This makes TLSH particularly valuable for threat detection: attackers routinely modify their tools to evade hash-based signatures, but the structural similarity remains detectable through fuzzy hashing.
CelesTLSH Architecture
CelesTLSH operates as a LimaCharlie extension deployed across all major operating systems. It leverages LimaCharlie's Binary Library (BinLib) feature, which captures a single unique copy of any binary that generates a CODE_IDENTITY event.
Binary Execution Captured
When a binary executes on an endpoint, LimaCharlie's BinLib captures a unique copy and computes its TLSH hash along with rich metadata (ImpHash, SHA-256, code signing, file type).
Similarity Comparison
CelesTLSH compares the binary's TLSH hash against a well-vetted database of over 100,000 unique hashes from 319+ known attack tools and malware families, with more added daily.
Alert & Enrichment
When a match is found, CelesTLSH generates an event with the matched threat name, similarity score, and contextual metadata. By default, alerts trigger for matches with a distance score of 50 or less (configurable).
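The comparison and alerting steps can be sketched as a threshold lookup. This is a minimal illustration, not CelesTLSH's actual implementation: `find_matches`, `Match`, and `toy_distance` are hypothetical names, the hash strings are invented placeholders rather than real TLSH digests, and `toy_distance` is a stand-in metric used only so the example runs without extra dependencies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

DEFAULT_THRESHOLD = 50  # default alert threshold described above


@dataclass
class Match:
    threat_name: str
    known_hash: str
    distance: int


def toy_distance(a: str, b: str) -> int:
    """Stand-in metric (NOT TLSH): summed per-character difference of hex digits."""
    return sum(abs(int(x, 16) - int(y, 16)) for x, y in zip(a, b))


def find_matches(
    candidate_hash: str,
    known_hashes: Dict[str, str],
    distance_fn: Callable[[str, str], int],
    threshold: int = DEFAULT_THRESHOLD,
) -> List[Match]:
    """Return known hashes within `threshold` of the candidate, closest first."""
    matches = []
    for known_hash, threat_name in known_hashes.items():
        distance = distance_fn(candidate_hash, known_hash)
        if distance <= threshold:
            matches.append(Match(threat_name, known_hash, distance))
    return sorted(matches, key=lambda m: m.distance)


# Placeholder database: hash -> threat name (invented values).
known = {
    "AAAA0000": "ToolA (placeholder)",
    "FFFF9999": "ToolB (placeholder)",
}

results = find_matches("AAAA0003", known, toy_distance)
for match in results:
    print(match.threat_name, match.distance)
```

In a real pipeline the distance function would be a TLSH comparison. Assuming the `py-tlsh` package is installed, its `tlsh.diff` function could be passed as `distance_fn` in place of the toy metric.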
A key advantage: if the same binary executes on a thousand systems, BinLib only stores and scans one copy. This dramatically reduces false positive volume and computational overhead compared to scanning every execution independently.
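The scan-once behaviour can be illustrated by keying work on a binary's SHA-256. This sketch does not reflect how BinLib is implemented internally; `scan_unique` and `fake_scan` are hypothetical names, and the byte strings are placeholders.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple


def scan_unique(
    executions: Iterable[Tuple[str, bytes]],
    scan_fn: Callable[[bytes], str],
) -> Dict[str, str]:
    """Scan each distinct binary once, keyed by SHA-256, however many hosts ran it."""
    verdicts: Dict[str, str] = {}
    for _host, content in executions:
        sha256 = hashlib.sha256(content).hexdigest()
        if sha256 not in verdicts:
            verdicts[sha256] = scan_fn(content)  # only reached for unseen binaries
    return verdicts


scan_calls: List[int] = []


def fake_scan(content: bytes) -> str:
    """Placeholder scanner that records each invocation."""
    scan_calls.append(len(content))
    return "suspicious" if content.startswith(b"EVIL") else "clean"


# The same binary on 1,000 hosts, plus one other binary on a single host.
executions = [(f"host-{i}", b"EVIL" + b"\x00" * 64) for i in range(1000)]
executions.append(("host-0", b"GOOD" + b"\x00" * 64))

verdicts = scan_unique(executions, fake_scan)
print(f"{len(executions)} executions, {len(scan_calls)} scans")
```

Because the scanner runs once per unique binary, a single false positive produces one alert to triage rather than one per endpoint.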
Use Cases
Threat Hunting
Leverage TLSH distance scores to identify modified or derivative versions of known attack tools used in adversarial operations. Discover variants that evade traditional hash-based signatures.
Detection Engineering
Establish baseline similarity thresholds to build high-fidelity detections. Use fuzzy hash matches as enrichment context in risk-based alerting models rather than standalone signals.
Incident Response
Quickly assess whether artifacts discovered in an environment are likely variants of known tools. A binary within a low TLSH distance of a known Sliver or CobaltStrike build is an immediate investigation priority.
Event Enrichment
The future of fuzzy hashing is enrichment, not standalone alerting. CelesTLSH adds similarity context to process execution events, enabling more sophisticated detection logic.
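The enrichment approach described in these use cases can be sketched as follows: a fuzzy hash match is attached to a process execution event and contributes to a risk score alongside other signals, rather than alerting on its own. Every field name, weight, and cutoff here is an assumption chosen for illustration, not CelesTLSH's or LimaCharlie's event schema.

```python
from typing import Any, Dict


def enrich_event(event: Dict[str, Any], threat_name: str, distance: int) -> Dict[str, Any]:
    """Return a copy of the event with fuzzy-match context attached."""
    enriched = dict(event)
    enriched["fuzzy_match"] = {"threat_name": threat_name, "tlsh_distance": distance}
    return enriched


def risk_score(event: Dict[str, Any]) -> int:
    """Combine several weak signals into one score (illustrative weights)."""
    score = 0
    match = event.get("fuzzy_match")
    if match is not None:
        # A closer match contributes more risk than a distant one.
        score += 60 if match["tlsh_distance"] <= 30 else 30
    if not event.get("signed", True):
        score += 25  # unsigned binary
    if event.get("parent_process") in {"winword.exe", "excel.exe"}:
        score += 25  # spawned by an Office application
    return score


ALERT_AT = 100  # hypothetical alerting cutoff

event = {"file_path": "C:\\Users\\Public\\update.exe",
         "signed": False,
         "parent_process": "winword.exe"}

enriched = enrich_event(event, "Sliver", 22)
print(risk_score(event), risk_score(enriched), risk_score(enriched) >= ALERT_AT)
```

With these example weights, neither the fuzzy match nor the other signals reach the cutoff alone, but together they do, which is the point of treating similarity as context.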
TLSH Distance Score Thresholds
The table below shows the relationship between the TLSH distance threshold and detection accuracy. Lower distance scores indicate greater similarity. The default CelesTLSH alert threshold is a score of 50 or less.
| TLSH Distance Score | False Positive Rate | Detection Rate |
|---|---|---|
| < 30 | 0.002% | 32.2% |
| < 40 | 0.07% | 49.6% |
| < 50 | 0.52% | 65.3% |
| < 60 | 1.09% | 76.0% |
| < 80 | 2.93% | 89.0% |
| < 100 | 6.43% | 94.5% |
| < 150 | 24.33% | 98.1% |
Source: TLSH — A Locality Sensitive Hash (Oliver, Cheng, Chen, 2013)
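One way to use these figures is to pick the loosest threshold that stays within a false positive budget. The sketch below encodes the table rows as data; `ThresholdRow` and `best_threshold` are hypothetical names, and rates measured in the cited paper may not transfer directly to a given environment.

```python
from typing import List, NamedTuple, Optional


class ThresholdRow(NamedTuple):
    max_distance: int          # alert when distance is below this value
    false_positive_pct: float  # false positive rate, in percent
    detection_pct: float       # detection rate, in percent


# Rows copied from the table above.
TABLE: List[ThresholdRow] = [
    ThresholdRow(30, 0.002, 32.2),
    ThresholdRow(40, 0.07, 49.6),
    ThresholdRow(50, 0.52, 65.3),
    ThresholdRow(60, 1.09, 76.0),
    ThresholdRow(80, 2.93, 89.0),
    ThresholdRow(100, 6.43, 94.5),
    ThresholdRow(150, 24.33, 98.1),
]


def best_threshold(max_false_positive_pct: float) -> Optional[ThresholdRow]:
    """Return the row with the highest detection rate within the FP budget."""
    eligible = [row for row in TABLE if row.false_positive_pct <= max_false_positive_pct]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row.detection_pct)


for budget in (0.1, 1.0, 5.0):
    row = best_threshold(budget)
    print(f"FP budget {budget}% -> distance < {row.max_distance}, "
          f"detection {row.detection_pct}%")
```

A 1% false positive budget selects the 50 threshold, which lines up with the CelesTLSH default.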
FOSS Attack Tool Hash Database
The CelesTLSH Hash Database is open source and freely available. It tracks TLSH hashes for a wide array of publicly hosted attack tools and C2 frameworks, sourced directly from their official GitHub repositories and continuously updated.
The database currently covers tools including Mimikatz, CobaltStrike (Ghostpack), Sliver, Empire, BloodHound, Impacket, CrackMapExec, Responder, Rubeus, SharpHound, LaZagne, and hundreds more. Each tool directory contains TLSH hashes for all releases and variants.
Contributions are welcome. If you have suggestions for additional tools to monitor or improvements to the feed, submit an issue or pull request on the project repository.
Publications & References
Magonia Research
- Fuzzy Hashing Research: A Paper Highlight with Practitioner's Notes — Real-world lessons from running CelesTLSH in production across ~100 organizations, with practical techniques for reducing TLSH false positives.
- Maximizing the Value of Indicators of Compromise and Reimagining Their Role in Modern Detection — Why the future of IOCs is enrichment, not alerting, and where fuzzy hashes fit in the modern Pyramid of Pain.
Academic Papers
- Oliver, J., Cheng, C., & Chen, Y. (2013). TLSH — A Locality Sensitive Hash. 4th Cybercrime and Trustworthy Computing Workshop, Sydney.
- Oliver, J., Forman, S., & Cheng, C. (2014). Using Randomization to Attack Similarity Digests. ATIS 2014, pp. 199–210.
- Oliver, J., Ali, M., & Hagen, J. (2020). HAC-T and fast search for similarity in security. International Conference on Omni-layer Intelligent Systems (COINS), IEEE.
- (2025). Bytewise approximate matching: Evaluating common scenarios for executable files. ScienceDirect.
Industry
- Target Corporation. Implementing TLSH-Based Detection at Scale.
- Bianco, D. (2022). Stop Using Hashes for Detection (and When You Should Use Them).
Get Started with CelesTLSH
Deploy fuzzy hash detection across your organization with the LimaCharlie extension, or use the open-source hash database in your own detection pipeline.