tiny-lang-detect

Generate tiny models for language detection  https://p.ce9e.org/tiny-lang-detect/demo/
git clone https://git.ce9e.org/tiny-lang-detect.git

commit  27981b2f5adbba86e975add9b04cc0df32c62c9e
parent  d4608364ed033cd73186aa7f05ec76b777bbf42c
Author  Tobias Bengfort <tobias.bengfort@posteo.de>
Date    2025-05-12 07:07

README: tweak "how does it work"

Diffstat

M README.md 4 ++--

1 file changed, 2 insertions, 2 deletions


diff --git a/README.md b/README.md

@@ -56,5 +56,5 @@ pre-processing, and they use the euclidean distance to find the best match.
 This is ultimately a trade-off between accuracy and simplicity.
 
 To simplify the model, `gen_model.py` filters out all but the most significant
-n-grams. N-grams are considered more significant if the absolute difference of
-their frequencies in the candidate language is big.
+n-grams. N-grams are considered more significant if their frequencies have a
+large absolute difference beween the candidate languages.
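
Review note: the reworded README sentence describes ranking n-grams by how much their frequencies differ between the candidate languages. The sketch below illustrates one way to read that. It is not taken from `gen_model.py`: the function names, the use of character n-grams, and the choice of "largest gap between any two languages" as the score are all assumptions made for illustration, and the real script may compute significance differently.

```python
from collections import Counter


def ngram_freqs(text, n=2):
    """Return relative frequencies of character n-grams in text (hypothetical helper)."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def significance(gram, profiles):
    """Largest absolute frequency difference between any two languages.

    Assumption: max - min over all languages, which equals the largest
    pairwise absolute difference. An n-gram missing from a profile counts as 0.
    """
    freqs = [profile.get(gram, 0.0) for profile in profiles.values()]
    return max(freqs) - min(freqs)


def select_ngrams(profiles, keep):
    """Keep the `keep` n-grams whose frequencies differ most between languages."""
    grams = set().union(*profiles.values())
    # Sort by descending significance; break ties alphabetically for determinism.
    ranked = sorted(grams, key=lambda g: (-significance(g, profiles), g))
    return ranked[:keep]


# Toy profiles: "an" and "e " are equally common in both languages, so they
# carry no signal; "th" and "ch" each occur in only one language.
profiles = {
    "en": {"th": 0.5, "an": 0.3, "e ": 0.2},
    "de": {"ch": 0.5, "an": 0.3, "e ": 0.2},
}
selected = select_ngrams(profiles, keep=2)
```

With these toy profiles, the n-grams that both languages use equally score zero and are dropped, which matches the intent described in the README: only n-grams that help tell the languages apart stay in the model.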