tiny-lang-detect

Generate tiny models for language detection
https://p.ce9e.org/tiny-lang-detect/demo/
git clone https://git.ce9e.org/tiny-lang-detect.git

commit
d4608364ed033cd73186aa7f05ec76b777bbf42c
parent
430f0819379897f02e6e8af12bb744f17030b2e1
Author
Tobias Bengfort <tobias.bengfort@posteo.de>
Date
2025-05-12 07:07
README: inline classifier code

Diffstat

M README.md 14 ++++++++++++--

1 files changed, 12 insertions, 2 deletions


diff --git a/README.md b/README.md

@@ -29,8 +29,18 @@ A model might look like this:
   29    29 }
   30    30 ```
   31    31 
   32    -1 For examples how to use a model to classify languages, see `test.py` and
   33    -1 `demo/demo.js`.
   -1    32 You can use the model like this:
   -1    33 
   -1    34 ```py
   -1    35 def dist(a, b):
   -1    36     return sum((av - bv) ** 2 for av, bv in zip(a, b))
   -1    37 
   -1    38 
   -1    39 def classify(model, text):
   -1    40     n = len(text) + 1
   -1    41     freq = [text.count(g) / (n - len(g)) for g in model['ngrams']]
   -1    42     return min(model['freq'], key=lambda lang: dist(model['freq'][lang], freq))
   -1    43 ```
   34    44 
   35    45 ## How does it work?
   36    46
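As a rough illustration of the classifier added in this commit, the sketch below repeats `dist` and `classify` from the diff and runs them on a toy model. The name `toy_model`, its n-grams, and all of its frequency values are made up for this example and are not taken from the repository. Only the `ngrams` and `freq` keys are implied by the classifier code itself.

```python
def dist(a, b):
    # Squared Euclidean distance between two frequency vectors.
    return sum((av - bv) ** 2 for av, bv in zip(a, b))


def classify(model, text):
    # Relative frequency of each n-gram: occurrences divided by the
    # number of positions where an n-gram of that length could start.
    n = len(text) + 1
    freq = [text.count(g) / (n - len(g)) for g in model['ngrams']]
    # Pick the language whose stored frequency vector is closest.
    return min(model['freq'], key=lambda lang: dist(model['freq'][lang], freq))


# Hypothetical model: three n-grams and invented per-language frequencies.
toy_model = {
    'ngrams': ['th', 'ch', 'e'],
    'freq': {
        'en': [0.03, 0.005, 0.10],
        'de': [0.002, 0.03, 0.14],
    },
}

print(classify(toy_model, 'the thing that they thought'))  # en
print(classify(toy_model, 'ich mache doch nichts'))  # de
```

Note that `n - len(g)` is zero when the text is exactly one character shorter than an n-gram (for example an empty text with a 1-gram), so the function as written raises `ZeroDivisionError` on such inputs. A caller may want to check the text length first.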