<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>tiny-lang-detect, branch HEAD</title>
<subtitle>Generate tiny models for language detection</subtitle>
<entry>
<id>e9340ad6f80c3c09853e5860ef694ba036de52ba</id>
<published>2026-01-28T07:30:27Z</published>
<updated>2026-01-28T07:30:27Z</updated>
<title type="text">add long cli arguments</title>
<link rel="alternate" type="text/html" href="commit/e9340ad6f80c3c09853e5860ef694ba036de52ba.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">add long cli arguments
</content>
</entry>
<entry>
<id>1698810e8a923192a8f15347644155f7a39eafe5</id>
<published>2025-11-08T08:14:25Z</published>
<updated>2025-11-08T08:14:25Z</updated>
<title type="text">README: typos</title>
<link rel="alternate" type="text/html" href="commit/1698810e8a923192a8f15347644155f7a39eafe5.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">README: typos
</content>
</entry>
<entry>
<id>7ce818b79da3ba429b8c3f176271cd51997124e1</id>
<published>2025-05-26T18:53:15Z</published>
<updated>2025-05-26T18:53:41Z</updated>
<title type="text">demo: update model</title>
<link rel="alternate" type="text/html" href="commit/7ce818b79da3ba429b8c3f176271cd51997124e1.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">demo: update model
</content>
</entry>
<entry>
<id>1a380f7c5ba83ef5a151e6b0bf4efa431d08db8a</id>
<published>2025-05-26T18:17:10Z</published>
<updated>2025-05-26T18:17:10Z</updated>
<title type="text">typo</title>
<link rel="alternate" type="text/html" href="commit/1a380f7c5ba83ef5a151e6b0bf4efa431d08db8a.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">typo
</content>
</entry>
<entry>
<id>fe18df939733a8d8e6286631930220e1ca5c1315</id>
<published>2025-05-26T15:40:33Z</published>
<updated>2025-05-26T18:15:58Z</updated>
<title type="text">probability: assume frequencies to be independent</title>
<link rel="alternate" type="text/html" href="commit/fe18df939733a8d8e6286631930220e1ca5c1315.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">probability: assume frequencies to be independent

For dependent probabilities that sum to 1, we get the total probability
by calculating:

n! * \prod(q_i^{k_i} / k_i!)

If we want to compare two of these probabilities, we can leave out
anything that does not depend on q. With $p_i = k_i/n$, this results in:

\prod(q_i^{p_i})

This is what we have done so far. However, our probabilities do not sum to
1. If we only look at 1-grams, we can add an &quot;everything else&quot; factor:

(1 - \sum q_i)^{1 - \sum p_i}

While 1-grams, 2-grams and 3-grams are not totally independent, they are
also not dependent in the sense that one implies not-the-other. So we
could calculate the probability of each group individually and multiply
the results.

In practice, treating everything as independent seems to work just as
well and is much simpler.
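
The first simplification step can be checked numerically. This is a
minimal sketch, not the code from this repository: the function names
and the example counts and models are made up for illustration. It shows
that the ratio of two multinomial probabilities equals the ratio of the
simplified scores raised to the power n, so both rank models identically.

```python
import math


def multinomial_probability(counts, q):
    # n! * prod(q_i**k_i / k_i!)
    n = sum(counts)
    result = math.factorial(n)
    for k, qi in zip(counts, q):
        result *= qi ** k / math.factorial(k)
    return result


def simplified_score(counts, q):
    # prod(q_i**p_i) with p_i = k_i / n
    n = sum(counts)
    return math.prod(qi ** (k / n) for k, qi in zip(counts, q))


# hypothetical observation (k_i) and two hypothetical models (q_i)
counts = [5, 3, 2]
model_a = [0.4, 0.4, 0.2]
model_b = [0.1, 0.1, 0.8]

n = sum(counts)
ratio_full = (
    multinomial_probability(counts, model_a)
    / multinomial_probability(counts, model_b)
)
ratio_simple = (
    simplified_score(counts, model_a)
    / simplified_score(counts, model_b)
)

# the factors n! and k_i! cancel, so these two values agree
print(ratio_full, ratio_simple ** n)
```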
</content>
</entry>
<entry>
<id>9c128468b32a9d10a1ac0dea0908e38d036832df</id>
<published>2025-05-26T15:37:12Z</published>
<updated>2025-05-26T15:37:24Z</updated>
<title type="text">allow utf-8 in json output</title>
<link rel="alternate" type="text/html" href="commit/9c128468b32a9d10a1ac0dea0908e38d036832df.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">allow utf-8 in json output
</content>
</entry>
<entry>
<id>6d8c914ce3795f6e5d2fc26800b3514000c8e4dd</id>
<published>2025-05-19T11:36:43Z</published>
<updated>2025-05-19T11:44:47Z</updated>
<title type="text">further simplify</title>
<link rel="alternate" type="text/html" href="commit/6d8c914ce3795f6e5d2fc26800b3514000c8e4dd.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">further simplify
</content>
</entry>
<entry>
<id>c70e23e44137b076ce3db100ac66b940358e4bd8</id>
<published>2025-05-19T09:17:30Z</published>
<updated>2025-05-19T11:08:03Z</updated>
<title type="text">simplify distance</title>
<link rel="alternate" type="text/html" href="commit/c70e23e44137b076ce3db100ac66b940358e4bd8.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">simplify distance

`prod (pi/qi)**pi` is how much more likely it is that observation p was
created by model p than by model q. Applying log() turns this into the
KL-divergence.

`prod qi**pi` is the probability that observation p was created by model
q.

`prod (pi/qi)**pi * prod qi**pi = prod pi**pi` is the same for each q.
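
The identity can be checked with a small example. This is a minimal
sketch under stated assumptions: the function names, the observation p
and the two models are hypothetical and not taken from this repository.
It shows that minimizing the KL-divergence and maximizing `prod qi**pi`
pick the same model.

```python
import math


def likelihood_ratio(p, q):
    # prod (pi/qi)**pi
    return math.prod((pi / qi) ** pi for pi, qi in zip(p, q))


def likelihood(p, q):
    # prod qi**pi
    return math.prod(qi ** pi for pi, qi in zip(p, q))


def kl_divergence(p, q):
    # sum pi * log(pi/qi), skipping terms with pi == 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi)


# hypothetical observation and models
p = [0.5, 0.3, 0.2]
models = {
    "a": [0.4, 0.4, 0.2],
    "b": [0.1, 0.1, 0.8],
}

# prod pi**pi does not depend on q
constant = math.prod(pi ** pi for pi in p)

best_by_likelihood = max(models, key=lambda name: likelihood(p, models[name]))
best_by_divergence = min(models, key=lambda name: kl_divergence(p, models[name]))

print(best_by_likelihood, best_by_divergence)
```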
</content>
</entry>
<entry>
<id>8184ad526ec1080060f790a1fa99b0533fdb1b8a</id>
<published>2025-05-14T05:10:30Z</published>
<updated>2025-05-14T05:43:59Z</updated>
<title type="text">switch from euclidean distance to Kullback–Leibler divergence</title>
<link rel="alternate" type="text/html" href="commit/8184ad526ec1080060f790a1fa99b0533fdb1b8a.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">switch from euclidean distance to Kullback–Leibler divergence

This has better results and is also more in line with the Bayesian
approach used by langdetect.
</content>
</entry>
<entry>
<id>2f92e8aef5503af2ab121000b74fe1c1b3bf160a</id>
<published>2025-05-12T07:08:10Z</published>
<updated>2025-05-12T07:11:03Z</updated>
<title type="text">README: add single-ngram classifier</title>
<link rel="alternate" type="text/html" href="commit/2f92e8aef5503af2ab121000b74fe1c1b3bf160a.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">README: add single-ngram classifier
</content>
</entry>
<entry>
<id>27981b2f5adbba86e975add9b04cc0df32c62c9e</id>
<published>2025-05-12T07:07:54Z</published>
<updated>2025-05-12T07:08:06Z</updated>
<title type="text">README: tweak &quot;how does it work&quot;</title>
<link rel="alternate" type="text/html" href="commit/27981b2f5adbba86e975add9b04cc0df32c62c9e.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">README: tweak &quot;how does it work&quot;
</content>
</entry>
<entry>
<id>d4608364ed033cd73186aa7f05ec76b777bbf42c</id>
<published>2025-05-12T07:07:38Z</published>
<updated>2025-05-12T07:07:47Z</updated>
<title type="text">README: inline classifier code</title>
<link rel="alternate" type="text/html" href="commit/d4608364ed033cd73186aa7f05ec76b777bbf42c.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">README: inline classifier code
</content>
</entry>
<entry>
<id>430f0819379897f02e6e8af12bb744f17030b2e1</id>
<published>2025-05-10T18:34:26Z</published>
<updated>2025-05-10T18:34:26Z</updated>
<title type="text">tweak readme</title>
<link rel="alternate" type="text/html" href="commit/430f0819379897f02e6e8af12bb744f17030b2e1.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">tweak readme
</content>
</entry>
<entry>
<id>e160d8cb04fb6d6e6c762edcb27cbc4f704fb1d1</id>
<published>2025-05-10T18:32:16Z</published>
<updated>2025-05-10T18:32:16Z</updated>
<title type="text">gen_model: most significant first</title>
<link rel="alternate" type="text/html" href="commit/e160d8cb04fb6d6e6c762edcb27cbc4f704fb1d1.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">gen_model: most significant first
</content>
</entry>
<entry>
<id>315458d0a09b6e790689e7e56ccb0428eca4a4c2</id>
<published>2025-05-10T18:29:51Z</published>
<updated>2025-05-10T18:29:51Z</updated>
<title type="text">gen_model: allow passing more than two languages</title>
<link rel="alternate" type="text/html" href="commit/315458d0a09b6e790689e7e56ccb0428eca4a4c2.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">gen_model: allow passing more than two languages
</content>
</entry>
<entry>
<id>281071942a7647e36686aae2deb2511b2079b637</id>
<published>2025-05-10T18:07:41Z</published>
<updated>2025-05-10T18:07:41Z</updated>
<title type="text">add some explanation</title>
<link rel="alternate" type="text/html" href="commit/281071942a7647e36686aae2deb2511b2079b637.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">add some explanation
</content>
</entry>
<entry>
<id>2479e1d6933082faf7d4fc5527d76f55c5087f3e</id>
<published>2025-05-10T16:50:14Z</published>
<updated>2025-05-10T16:50:14Z</updated>
<title type="text">limit precision</title>
<link rel="alternate" type="text/html" href="commit/2479e1d6933082faf7d4fc5527d76f55c5087f3e.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">limit precision
</content>
</entry>
<entry>
<id>127bc181a6b4c10fd266504951d68dbc2e9aa4ef</id>
<published>2025-05-10T15:40:52Z</published>
<updated>2025-05-10T15:40:52Z</updated>
<title type="text">tweak test output</title>
<link rel="alternate" type="text/html" href="commit/127bc181a6b4c10fd266504951d68dbc2e9aa4ef.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">tweak test output
</content>
</entry>
<entry>
<id>3f86e6fb263df69ee5a826710b4c22496916add0</id>
<published>2025-05-10T15:36:15Z</published>
<updated>2025-05-10T15:36:15Z</updated>
<title type="text">README: rm redundant line</title>
<link rel="alternate" type="text/html" href="commit/3f86e6fb263df69ee5a826710b4c22496916add0.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">README: rm redundant line
</content>
</entry>
<entry>
<id>49e8c884b8b27f1ae43ed1908aa228897c6c6007</id>
<published>2025-05-06T06:30:07Z</published>
<updated>2025-05-06T06:30:07Z</updated>
<title type="text">demo: add heading</title>
<link rel="alternate" type="text/html" href="commit/49e8c884b8b27f1ae43ed1908aa228897c6c6007.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">demo: add heading
</content>
</entry>
<entry>
<id>5698b03845de35c9d8a99ede35bc7a090d5f40d2</id>
<published>2025-05-06T06:29:17Z</published>
<updated>2025-05-06T06:29:17Z</updated>
<title type="text">include example model in README</title>
<link rel="alternate" type="text/html" href="commit/5698b03845de35c9d8a99ede35bc7a090d5f40d2.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">include example model in README
</content>
</entry>
<entry>
<id>60da7d4b74c54273652a4bbdb84408bb9f055886</id>
<published>2025-05-06T06:22:42Z</published>
<updated>2025-05-06T06:25:00Z</updated>
<title type="text">add README</title>
<link rel="alternate" type="text/html" href="commit/60da7d4b74c54273652a4bbdb84408bb9f055886.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">add README
</content>
</entry>
<entry>
<id>a3e48ff6c5886c46a18cbb9fa15646a3b1ce4041</id>
<published>2025-05-06T06:10:24Z</published>
<updated>2025-05-06T06:25:00Z</updated>
<title type="text">add js demo</title>
<link rel="alternate" type="text/html" href="commit/a3e48ff6c5886c46a18cbb9fa15646a3b1ce4041.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">add js demo
</content>
</entry>
<entry>
<id>1bf85abcec4588d0e3f0806016148a68ffb13918</id>
<published>2025-05-06T05:43:34Z</published>
<updated>2025-05-06T06:25:00Z</updated>
<title type="text">add test.py</title>
<link rel="alternate" type="text/html" href="commit/1bf85abcec4588d0e3f0806016148a68ffb13918.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">add test.py
</content>
</entry>
<entry>
<id>39930b5dc497f10d174802bfc943c5c18cdb4c6f</id>
<published>2025-05-06T05:20:08Z</published>
<updated>2025-05-06T06:25:00Z</updated>
<title type="text">add gen_model.py</title>
<link rel="alternate" type="text/html" href="commit/39930b5dc497f10d174802bfc943c5c18cdb4c6f.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">add gen_model.py
</content>
</entry>
<entry>
<id>3db712ebeea5d1322b6ef9d9470955a566710262</id>
<published>2025-05-06T05:06:16Z</published>
<updated>2025-05-06T06:25:00Z</updated>
<title type="text">convert to shell script</title>
<link rel="alternate" type="text/html" href="commit/3db712ebeea5d1322b6ef9d9470955a566710262.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">convert to shell script
</content>
</entry>
<entry>
<id>c2755acb5d4af6c990d63f827e18c83e33e20c69</id>
<published>2025-05-06T05:03:50Z</published>
<updated>2025-05-06T06:25:00Z</updated>
<title type="text">download data</title>
<link rel="alternate" type="text/html" href="commit/c2755acb5d4af6c990d63f827e18c83e33e20c69.html" />
<author>
<name>Tobias Bengfort</name>
<email>tobias.bengfort@posteo.de</email>
</author>
<content type="text">download data
</content>
</entry>
</feed>
