- commit: 2f92e8aef5503af2ab121000b74fe1c1b3bf160a
- parent: 27981b2f5adbba86e975add9b04cc0df32c62c9e
- Author: Tobias Bengfort <tobias.bengfort@posteo.de>
- Date: 2025-05-12 07:08
README: add single-ngram classifier
Diffstat

| Status | File | Lines changed | Graph |
|---|---|---|---|
| M | README.md | 13 | +++++++++++++ |

1 file changed, 13 insertions, 0 deletions
````diff
diff --git a/README.md b/README.md
@@ -42,6 +42,19 @@ def classify(model, text):
     return min(model['freq'], key=lambda lang: dist(model['freq'][lang], freq))
 ```
 
+## An even simpler classifier
+
+To take this idea to the exteme, you could reduce the model to the single most
+siginificant n-gram:
+
+```py
+def classify(text):
+    freq = text.count('o') / len(text)
+    return 'en' if freq > 0.05 else 'de'
+```
+
+This classifier still has an accuracy of 82.1% on the test data.
+
 ## How does it work?
 
 `langdetect` works by comparing n-gram frequencies. For example, the 3-gram
````
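The single-ngram idea added in this commit can be sketched end to end: pick the n-gram whose average relative frequency differs most between the two languages, put the threshold halfway between the two averages, and classify by comparing a text's frequency against it. This is a hypothetical sketch, not the README author's method: `ngram_freqs`, `fit_single_ngram`, `classify_single` and `accuracy` are illustrative names that appear neither in the README nor in `langdetect`, and both the "largest mean difference" selection rule and the midpoint threshold are assumptions. It also assumes the README's two-language (`en`/`de`) setup.

```python
from collections import Counter


def ngram_freqs(text, n=1):
    """Relative frequency of each (overlapping) character n-gram in `text`."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return Counter()  # empty input: every frequency is 0
    counts = Counter(grams)
    return Counter({g: c / len(grams) for g, c in counts.items()})


def fit_single_ngram(samples, n=1):
    """Hypothetical training step for labeled (text, lang) pairs, lang in {'en', 'de'}.

    Assumption: the "most significant" n-gram is the one whose mean
    frequency differs most between the two languages; the threshold
    is the midpoint of the two means.
    """
    by_lang = {'en': [], 'de': []}
    for text, lang in samples:
        by_lang[lang].append(ngram_freqs(text, n))

    def mean(lang, gram):
        rows = by_lang[lang]
        return sum(row[gram] for row in rows) / len(rows)  # missing keys count as 0

    grams = set().union(*by_lang['en'], *by_lang['de'])
    best = max(grams, key=lambda g: abs(mean('en', g) - mean('de', g)))
    threshold = (mean('en', best) + mean('de', best)) / 2
    high_lang = 'en' if mean('en', best) > mean('de', best) else 'de'
    return best, threshold, high_lang


def classify_single(model, text):
    """Return the language whose side of the threshold `text` falls on."""
    gram, threshold, high_lang = model
    low_lang = 'de' if high_lang == 'en' else 'en'
    freq = ngram_freqs(text, len(gram))[gram]
    return high_lang if freq > threshold else low_lang


def accuracy(model, samples):
    """Fraction of labeled (text, lang) pairs classified correctly."""
    hits = sum(classify_single(model, text) == lang for text, lang in samples)
    return hits / len(samples)


# The README's hard-coded classifier expressed as a model tuple.
README_MODEL = ('o', 0.05, 'en')
```

With `README_MODEL`, `classify_single` matches the README's `classify` on any non-empty string, because `ngram_freqs(text, 1)['o']` equals `text.count('o') / len(text)`. The one difference is that an empty string returns `'de'` instead of raising `ZeroDivisionError`.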