blog

git clone https://git.ce9e.org/blog.git

commit
f6bb96e25ebc47d8f800c7a57ef4bde656377094
parent
835add61864211ccad823dd0dd92ff0f4040fd02
Author
Tobias Bengfort <tobias.bengfort@posteo.de>
Date
2025-02-02 14:49
post: oidc

Diffstat

A _content/posts/2025-01-07-oidc/index.md 453 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

1 files changed, 453 insertions, 0 deletions


diff --git a/_content/posts/2025-01-07-oidc/index.md b/_content/posts/2025-01-07-oidc/index.md

@@ -0,0 +1,453 @@
   -1     1 ---
   -1     2 title: I don't really like OIDC
   -1     3 date: 2025-01-07
   -1     4 tags: [code, security]
   -1     5 description: "I will look into this single sign-on protocol and figure out why it is so darn complicated."
   -1     6 ---
   -1     7 
   -1     8 When an organization grows, centralized account management becomes an important
   -1     9 issue. The modern protocol to do single sign-on (SSO) is called [OpenID
   -1    10 Connect](https://openid.net/specs/openid-connect-core-1_0.html#) (OIDC). In
   -1    11 this post I will look into this protocol and figure out why it is so darn
   -1    12 complicated.
   -1    13 
   -1    14 ## My naive expectation
   -1    15 
   -1    16 This is going to be quite a long post. But let's kick things off with a
   -1    17 concrete example that illustrates what I expected, based on some basic
   -1    18 knowledge about SSO:
   -1    19 
   -1    20 1.  When I try to access the application, I get redirected to an SSO login form:
   -1    21 
   -1    22         $ curl https://myapp.example/
   -1    23         < HTTP/1.1 303 See Other
   -1    24         < Location: https://sso.example/login/?client_id=myapp
   -1    25 
   -1    26 2.  I authenticate (e.g. by providing a username and password) and get
   -1    27     redirected back to the application.
   -1    28 
   -1    29         $ curl https://sso.example/login/?client_id=myapp
   -1    30             --form "username=tobias"
   -1    31             --form "password=…"
   -1    32         < HTTP/1.1 303 See Other
   -1    33         < Location: https://myapp.example/?code=ABC123
   -1    34 
   -1    35 3.  I am back at the application, but now with a `code` parameter. To verify
   -1    36     this authorization code, the application (not my browser!) sends it back
   -1    37     to the SSO provider:
   -1    38 
   -1    39         $ curl https://sso.example/verify/
   -1    40             --form "code=ABC123"
   -1    41 
   -1    42 4.  The SSO provider verifies the authorization code and responds with some
   -1    43     information about my account, most notably a unique identifier.
   -1    44 
   -1    45         < HTTP/1.1 200 OK
   -1    46         < Content-Type: application/json
   -1    47         {
   -1    48             "username": "tobias",
   -1    49             "email": …,
   -1    50             "name": …,
   -1    51             "groups": […]
   -1    52         }
   -1    53 
   -1    54 That's it.
   -1    55 
   -1    56 
   -1    57 ## The actual protocol
   -1    58 
   -1    59 In reality, OpenID Connect has some additional steps:
   -1    60 
   -1    61 1.  The application fetches some information needed to interact with the
   -1    62     SSO provider:
   -1    63 
   -1    64         $ curl https://sso.example/.well-known/openid-configuration
   -1    65         < HTTP/1.1 200 OK
   -1    66         {
   -1    67             "issuer": "https://sso.example",
   -1    68             "authorization_endpoint": "https://sso.example/login/",
   -1    69             "token_endpoint": "https://sso.example/token/",
   -1    70             "userinfo_endpoint": "https://sso.example/userinfo/",
   -1    71             "jwks_uri": "https://sso.example/jwks/",
   -1    72             "response_types_supported": ["code"],
   -1    73             "grant_types_supported": ["authorization_code"],
   -1    74             "id_token_signing_alg_values_supported": ["RS256"],
   -1    75             "token_endpoint_auth_methods_supported": ["client_secret_post"],
   -1    76             "code_challenge_methods_supported": ["S256"]
   -1    77
   -1    78         }
   -1    79 
   -1    80 2.  When I try to access the application, I get redirected to the authorization
   -1    81     endpoint:
   -1    82 
   -1    83         $ curl https://myapp.example/
   -1    84         < HTTP/1.1 303 See Other
   -1    85         < Location: https://sso.example/login/
   -1    86             ?client_id=myapp
   -1    87             &response_type=code
   -1    88             &scope=openid+email+profile
   -1    89             &redirect_uri=https%3A%2F%2Fmyapp.example%2F
   -1    90             &state=XXX
   -1    91             &nonce=YYY
   -1    92             &code_challenge=ZZZ
   -1    93             &code_challenge_method=S256
   -1    94 
   -1    95 3.  I authenticate (e.g. by providing a username and password) and get
   -1    96     redirected back to the application.
   -1    97 
   -1    98         $ curl https://sso.example/login/
   -1    99             ?client_id=myapp
   -1   100             &response_type=code
   -1   101             &scope=openid+email+profile
   -1   102             &redirect_uri=https%3A%2F%2Fmyapp.example%2F
   -1   103             &state=XXX
   -1   104             &nonce=YYY
   -1   105             &code_challenge=ZZZ
   -1   106             &code_challenge_method=S256
   -1   107             --form "username=tobias"
   -1   108             --form "password=…"
   -1   109         < HTTP/1.1 303 See Other
   -1   110         < Location: https://myapp.example/?code=ABC123&state=XXX
   -1   111 
   -1   112     As part of this authentication, I also explicitly consent that the
   -1   113     application may access my information on the SSO provider.
   -1   114 
   -1   115 4.  I am back at the application, but now with `code` and `state` parameters.
   -1   116     First, the application checks if the `state` parameter matches the one
   -1   117     it sent in step 2. After that, to verify the authorization code, the
   -1   118     application (not my browser!) sends it to the token endpoint:
   -1   119 
   -1   120         $ curl https://sso.example/token/
   -1   121             --form "client_id=myapp"
   -1   122             --form "client_secret=…"
   -1   123             --form "code=ABC123"
   -1   124             --form "code_verifier=…"
   -1   125             --form "grant_type=authorization_code"
   -1   126 
   -1   127 5.  The SSO provider checks that the `client_secret` and `code_verifier`
   -1   128     parameters match and that the authorization code is both valid and has not
   -1   129     been used before. Then it responds with some tokens.
   -1   130 
   -1   131         < HTTP/1.1 200 OK
   -1   132         < Content-Type: application/json
   -1   133         < Cache-Control: no-store
   -1   134         {
   -1   135             "id_token": …,
   -1   136             "access_token": "TTT",
   -1   137             "token_type": "Bearer"
   -1   138         }
   -1   139 
   -1   140 6.  The ID token is a JWT (basically a signed JSON blob) that contains some
   -1   141     additional information:
   -1   142 
   -1   143         {
   -1   144             "iss": "https://sso.example",
   -1   145             "iat": 1736000000,
   -1   146             "exp": 1736000020,
   -1   147             "aud": "myapp",
   -1   148             "nonce": "YYY"
   -1   149         }
   -1   150 
   -1   151     The application now does all kinds of verification:
   -1   152 
   -1   153     -   check the signature of the JWT (using the keys received from `jwks_uri`
   -1   154         in step 1)
   -1   155     -   check that `iss` matches the `issuer` from step 1
   -1   156     -   check that the token has been issued in the past (`iat`) and that it
   -1   157         has not yet expired (`exp`)
   -1   158     -   check that this token was created for this client (`aud`)
   -1   159     -   check that the nonce matches the one that was sent in step 2
   -1   160 
   -1   161 7.  Finally, the application fetches the user information from the userinfo
   -1   162     endpoint, using the access token received in step 5:
   -1   163 
   -1   164         $ curl https://sso.example/userinfo/ -H 'Authorization: Bearer TTT'
   -1   165         < HTTP/1.1 200 OK
   -1   166         < Content-Type: application/json
   -1   167         {
   -1   168             "sub": "tobias",
   -1   169             "email": …,
   -1   170             "name": …,
   -1   171             "groups": […]
   -1   172         }
   -1   173 
   -1   174 This protocol is obviously much more complicated than my naive expectation
   -1   175 (though the basic structure is the same). In the following sections I want
   -1   176 to examine all the little differences and ask: Why is it there and is it really
   -1   177 necessary?
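
To make step 2 a bit more concrete, here is a minimal Python sketch of how an application might assemble that redirect. The function name `build_authorization_url` is made up for illustration, and the arguments are the placeholders from the example above, not real values.

```python
from urllib.parse import urlencode


def build_authorization_url(authorization_endpoint, client_id, redirect_uri,
                            state, nonce, code_challenge):
    """Assemble the redirect target for step 2 (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        # urlencode turns the spaces into "+", as in the example above
        "scope": "openid email profile",
        "redirect_uri": redirect_uri,
        "state": state,
        "nonce": nonce,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
    }
    return authorization_endpoint + "?" + urlencode(params)
```

Called with `"https://sso.example/login/"`, `"myapp"`, `"https://myapp.example/"`, and the placeholders `XXX`, `YYY`, and `ZZZ`, this yields the same query string as the `Location` header in step 2.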
   -1   178 
   -1   179 ## OAuth legacy
   -1   180 
   -1   181 As a first step it is important to understand that OpenID Connect is based on
   -1   182 [OAuth 2](https://www.rfc-editor.org/rfc/rfc6749).
   -1   183 
   -1   184 OAuth is not really an authentication protocol by itself. I feel like most
   -1   185 explanations are overly complicated, so I will use an example instead:
   -1   186 
   -1   187 *There is a cool new service called awesome-meetings.example. I want to start
   -1   188 using it immediately, but first it needs access to my calendar. So I press a
   -1   189 button and get redirected to serious-calendar.example, where I verify that I
   -1   190 indeed want to share my calendar with awesome-meetings.example. I get
   -1   191 redirected back and can start scheduling meetings.[^1]*
   -1   192 
   -1   193 [^1]: For another great example, see [this stackoverflow
   -1   194     answer](https://stackoverflow.com/questions/4727226/#32534239).
   -1   195     Another good introduction is [OAuth from First
   -1   196     Principles](https://stack-auth.com/blog/oauth-from-first-principles).
   -1   197 
   -1   198 What happens in the background is basically the same as the protocol I
   -1   199 described above. awesome-meetings.example ends up with an access token that it
   -1   200 can use to access my calendar. The `scope` parameter restricts what the token
   -1   201 can be used for. In this example, the token can only be used to access my
   -1   202 calendar, but not my address book.
   -1   203 
   -1   204 The OpenID Connect authors squinted at this and decided that being allowed to
   -1   205 access a user's data is really the same as authentication. They also figured
   -1   206 that big companies like Google, Facebook, or Microsoft would probably want to
   -1   207 provide both SSO and resource access. So combining the two seemed like a good
   -1   208 fit.
   -1   209 
   -1   210 OpenID Connect mostly adds the concept of the ID token, as well as the `nonce`
   -1   211 parameter. We will discuss both later in this article. They also add the
   -1   212 `.well-known/openid-configuration` endpoint, which makes sense given all the
   -1   213 available options.
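
As a rough illustration of what an application might do with that configuration document once it has been fetched and parsed, here is a small Python sketch. The function name and the choice of which keys to require are my assumptions, based on the flow described in this post; the one rule I am sure of is that the `issuer` in the document has to match the issuer the application expected.

```python
# Keys that the flow described in this post relies on (an assumption of
# this sketch, not an exhaustive list from the spec).
REQUIRED_KEYS = (
    "issuer",
    "authorization_endpoint",
    "token_endpoint",
    "jwks_uri",
)


def check_discovery_document(config, expected_issuer):
    """Sanity-check an already parsed openid-configuration document."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    # The issuer must be exactly the one we asked for, otherwise a
    # different provider could be substituted.
    if config["issuer"] != expected_issuer:
        raise ValueError("issuer mismatch")
    return config
```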
   -1   214 
   -1   215 Because oh boy are there options. The protocol I described above is just one of
   -1   216 many possible ways to do it. There are many different and incompatible
   -1   217 authentication schemes built on top of OAuth. OpenID Connect standardizes some
   -1   218 of that, and [OAuth 2.1](https://datatracker.ietf.org/doc/draft-ietf-oauth-v2-1/)
   -1   219 (still a draft) removes some further options.
   -1   220 
   -1   221 Even though some options have been removed, there are still plenty left. For
   -1   222 example, there are at least two ways to pass user information to applications
   -1   223 (neither of which matches my expectation): It can be included in the ID token or
   -1   224 received from a separate userinfo endpoint. I have seen both in the wild.
   -1   225 Realistically, SSO services need to do both to be compatible.
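
Seen from the application side, coping with both variants could look roughly like the following Python sketch. The function name `collect_user_claims` is hypothetical. The one check I would not skip is that the `sub` from the userinfo endpoint matches the `sub` in the ID token, so that the two responses cannot describe different users.

```python
def collect_user_claims(id_token_claims, userinfo=None):
    """Merge user information from the ID token and, if present, userinfo."""
    if "sub" not in id_token_claims:
        raise ValueError("ID token has no 'sub' claim")
    claims = dict(id_token_claims)
    if userinfo is not None:
        # Both responses must talk about the same user.
        if userinfo.get("sub") != id_token_claims["sub"]:
            raise ValueError("userinfo 'sub' does not match ID token 'sub'")
        claims.update(userinfo)
    return claims
```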
   -1   226 
   -1   227 ## Terminology
   -1   228 
   -1   229 Quick note on naming things:
   -1   230 
   -1   231 -   SAML uses the terms "service provider" (SP) and "identity provider" (IdP)
   -1   232 -   OAuth uses the terms "client", "authorization server" (AS), and "resource server" (RS)
   -1   233 -   OpenID uses the terms "relying party" (RP) and "OpenID Provider" (OP)
   -1   234 -   I talk above about "application" and "SSO provider".
   -1   235 
   -1   236 I am sorry for adding yet another set of terms, but I find all the others
   -1   237 really confusing.
   -1   238 
   -1   239 ## Threat Analysis
   -1   240 
   -1   241 In non-SSO login, there are two main attack vectors: Either you manage to trick
   -1   242 the login (e.g. by guessing the password) or you manage to steal a session
   -1   243 cookie. Both of these vectors are exactly the same with SSO.
   -1   244 
   -1   245 The benefits are that you only have a single login implementation, so you can
   -1   246 focus on making that really robust. You also only expose the password to a
   -1   247 single service, which is an improvement over older SSO mechanisms such as LDAP,
   -1   248 where the password was given to each application which verified it with the
   -1   249 SSO provider in the background.
   -1   250 
   -1   251 But there is also new attack surface. Authorization codes are sufficient to log
   -1   252 in, and they are easily stolen (e.g. from the browser history). It is therefore
   -1   253 crucial that they expire quickly and become invalid as soon as they have been
   -1   254 used. They should also not contain any personal information about the user.
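
On the SSO provider side, these properties could be sketched like this in Python. Everything here (the class name, the in-memory dict, the 60 second lifetime) is an assumption for illustration; a real provider would use persistent storage and its own choice of lifetime.

```python
import secrets
import time


class AuthorizationCodeStore:
    """In-memory store for short-lived, single-use authorization codes."""

    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._codes = {}

    def issue(self, client_id, user_id):
        # The code is an opaque random string, it carries no user data.
        code = secrets.token_urlsafe(32)
        self._codes[code] = (client_id, user_id, self._clock() + self._ttl)
        return code

    def redeem(self, code, client_id):
        # pop() removes the entry, so a code can only be redeemed once.
        entry = self._codes.pop(code, None)
        if entry is None:
            return None
        issued_for, user_id, expires_at = entry
        if issued_for != client_id or self._clock() >= expires_at:
            return None
        return user_id
```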
   -1   255 
   -1   256 A second, less obvious attack is that an attacker could get a user to click a
   -1   257 link with a crafted authorization code. As a result, the user might do
   -1   258 something using the attacker's account, while thinking they are using their own.
   -1   259 
   -1   260 Of course, misconfigured applications may also allow attackers to bypass SSO,
   -1   261 or even to register new accounts. Correct configuration is crucial.
   -1   262 
   -1   263 ## Threat Mitigations: State, Nonce, and Code Challenge
   -1   264 
   -1   265 These three parameters can be used to further limit the risk of authorization
   -1   266 code injection. They all work very similarly: A random value is stored in the
   -1   267 application session, and a cryptographic hash is sent in the initial request
   -1   268 and then passed along. When it comes time to check the value it is compared to
   -1   269 the hash of the value in the session again.
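
For `code_challenge` with the `S256` method, this mechanism can be sketched in a few lines of Python. The function names are made up; the transformation itself (SHA-256, then unpadded URL-safe base64) is what RFC 7636 specifies.

```python
import base64
import hashlib
import hmac
import secrets


def make_code_verifier():
    """Random value that stays in the application session."""
    return secrets.token_urlsafe(64)


def code_challenge_s256(code_verifier):
    """Hash that is sent along with the authorization request."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verifier_matches(code_verifier, stored_challenge):
    """Check done by the SSO provider when the code is redeemed."""
    return hmac.compare_digest(code_challenge_s256(code_verifier), stored_challenge)
```

The application keeps the verifier to itself and only reveals it in the token request, at which point the SSO provider recomputes the hash and compares it with the challenge it stored earlier.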
   -1   270 
   -1   271 This way the whole transaction is bound to the application session. Even if an
   -1   272 attacker were to steal the authorization code, they could only use it if they
   -1   273 also manage to steal the session cookie (e.g. by getting physical access to the
   -1   274 device), by which point they don't really need the authorization code anymore.
   -1   275 
   -1   276 These mechanisms also significantly raise the bar for supplying crafted
   -1   277 authorization codes, because attackers need to include parameters that match
   -1   278 the ones in the user's session (e.g. by witnessing the initial authentication
   -1   279 request).
   -1   280 
   -1   281 The differences between these parameters are small: `state` is checked in step
   -1   282 4, so it can prevent making the token request. `code_challenge` is checked in
   -1   283 step 5, so the token request is made, but the application does not receive
   -1   284 tokens. `nonce` is checked in step 6, at the very end.
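
The `state` check in step 4 is the simplest of the three. A minimal Python sketch, assuming the session and the query parameters are plain dicts and using a made-up session key `oidc_state`:

```python
import hmac


def check_callback(session, query):
    """Return the authorization code if `state` matches the session."""
    # pop() makes sure a stored state is only accepted once.
    expected = session.pop("oidc_state", None)
    received = query.get("state")
    if expected is None or received is None:
        raise ValueError("missing state")
    if not hmac.compare_digest(expected.encode(), received.encode()):
        raise ValueError("state mismatch")
    code = query.get("code")
    if code is None:
        raise ValueError("missing code")
    # Only now would the application make the token request.
    return code
```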
   -1   285 
   -1   286 One benefit of `code_challenge` is that it is checked by the SSO provider, so
   -1   287 by requiring it you can be sure that it is implemented correctly everywhere. Of
   -1   288 course that requires that all applications are compatible.
   -1   289 
   -1   290 So which one should you implement? This is another case where I wish the spec
   -1   291 had fewer options. Right now, for the sake of compatibility, it is probably best
   -1   292 to support all of them. On the other hand, this increases the risk of downgrade
   -1   293 attacks.
   -1   294 
   -1   295 ## ID token
   -1   296 
   -1   297 The main addition of OpenID Connect on top of OAuth is the ID token. From what
   -1   298 I understand, it is completely redundant.
   -1   299 
   -1   300 -   Its cryptographic signature can be used to verify the authorization code,
   -1   301     but we have already done that by sending it to the token endpoint over a
   -1   302     TLS connection.
   -1   303 -   It can contain information about the user, but we can also get that from
   -1   304     the userinfo endpoint.
   -1   305 
   -1   306 In an alternate world, we would receive the ID token directly instead of taking
   -1   307 the detour of using an authorization code (this is called the "implicit flow"
   -1   308 in OAuth). We would then validate the ID token and extract the user info, no
   -1   309 additional requests necessary.
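
Whichever flow delivers the ID token, the claim checks from step 6 look roughly like the following Python sketch. It assumes the signature has already been verified by a JWT library against the keys from `jwks_uri` (not shown here); the function name and the `now` parameter are mine.

```python
import time


def validate_id_token_claims(claims, issuer, client_id, nonce, now=None):
    """Check the claims of an ID token whose signature was already verified."""
    now = time.time() if now is None else now

    if claims.get("iss") != issuer:
        raise ValueError("unexpected issuer")

    # `aud` may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if client_id not in audiences:
        raise ValueError("token was not issued for this client")

    if "iat" not in claims or claims["iat"] > now:
        raise ValueError("token claims to be issued in the future")
    if "exp" not in claims or claims["exp"] <= now:
        raise ValueError("token has expired")

    if claims.get("nonce") != nonce:
        raise ValueError("nonce does not match the session")

    return claims
```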
   -1   310 
   -1   311 My main issue, again, is that there are too many options. We should pick one.
   -1   312 And we should certainly not have to support both, that is just unnecessary
   -1   313 complexity.
   -1   314 
   -1   315 In the implicit flow, the tokens are passed in the URL and end up in the
   -1   316 browser history, from where they can easily be stolen. This is not so much an
   -1   317 issue for the SSO usecase, because the tokens have limited use there. But in
   -1   318 the OAuth usecase, this is a real issue. I don't want people to steal the
   -1   319 access token to my calendar.
   -1   320 
   -1   321 OAuth 2.1 therefore went ahead and removed the implicit flow completely. This
   -1   322 is a huge step in the right direction (which would also make the
   -1   323 `response_type=code` parameter obsolete if it wasn't for backwards
   -1   324 compatibility). If the OpenID Connect spec got rebased onto that, it could be
   -1   325 simplified massively. Maybe the ID token could even be removed.
   -1   326 
   -1   327 ## Dynamic Redirects
   -1   328 
   -1   329 The authorization endpoint receives both a `client_id` and a `redirect_uri`
   -1   330 parameter. However, it would be insecure to allow arbitrary values for
   -1   331 `redirect_uri`. This would, for example, allow an attacker to redirect to a
   -1   332 URI under their control and steal the authorization code.
   -1   333 
   -1   334 Of course, always redirecting to the application start page would be annoying
   -1   335 for users. When I open a link and need to log in before accessing the page, I
   -1   336 want to get redirected to that page after login.
   -1   337 
   -1   338 In the end, only the application can decide which redirect URIs are safe. So
   -1   339 the best solution is to always redirect to a pre-defined URI and let the
   -1   340 application handle the rest. In the meantime, the application could store the
   -1   341 original URI in the session.
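
A rough Python sketch of that idea, with the session modelled as a plain dict and all names (`safe_local_target`, `start_login`, `finish_login`, the `next` key) invented for illustration. The application only remembers targets that are local paths, so the stored value cannot be abused as an open redirect.

```python
from urllib.parse import urlsplit


def safe_local_target(target, default="/"):
    """Accept only local paths such as "/reports/42?tab=1"."""
    # Reject backslashes and control characters, which some browsers
    # treat leniently.
    if "\\" in target or any(ch < " " for ch in target):
        return default
    parts = urlsplit(target)
    if parts.scheme or parts.netloc:
        return default
    if not parts.path.startswith("/") or parts.path.startswith("//"):
        return default
    return target


def start_login(session, requested_path):
    """Remember where the user wanted to go before redirecting to SSO."""
    session["next"] = safe_local_target(requested_path)


def finish_login(session):
    """After the fixed callback URI was hit, return the remembered target."""
    return session.pop("next", "/")
```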
   -1   342 
   -1   343 In other words: The `redirect_uri` parameter is completely dispensable.
   -1   344 
   -1   345 ## Client Secret
   -1   346 
   -1   347 The token endpoint receives a `client_secret` parameter. This allows the SSO
   -1   348 provider to verify that the request comes from the same application for which
   -1   349 the authorization code has been created. This is of course important for the
   -1   350 OAuth usecase, because you don't want the wrong application to receive the
   -1   351 access token for your calendar.
   -1   352 
   -1   353 For the SSO usecase, this is less relevant though. What is the worst thing that
   -1   354 could happen? A malicious client learns that I can successfully authenticate?
   -1   355 That doesn't sound so bad. The token endpoint may give you access to [some
   -1   356 limited information about the
   -1   357 user](https://openid.net/specs/openid-connect-core-1_0.html#StandardClaims)
   -1   358 though.
   -1   359 
   -1   360 There may be more attacks that I don't see right now. Protecting the user
   -1   361 information alone might be worth it. So I don't really mind it.
   -1   362 
   -1   363 But again, there are way too many options: "the authorization server MAY accept
   -1   364 any form of client authentication meeting its security requirements (e.g.,
   -1   365 password, public/private key pair)."
   -1   366 
   -1   367 ## Native Applications
   -1   368 
   -1   369 So far I mostly assumed that the application is a server-side web application.
   -1   370 If instead the application is a SPA or a native app, things get more
   -1   371 complicated:
   -1   372 
   -1   373 -   The client secret is exposed
   -1   374 -   The values for `state`, `code_challenge`, and `nonce` are exposed
   -1   375 -   The request to the token endpoint uses the user's network, which makes MITM
   -1   376     attacks much simpler
   -1   377 -   The authorization endpoint cannot simply redirect to a native app as you
   -1   378     would to a web application
   -1   379 
   -1   380 I will not go into more detail here. The OAuth spec has a [whole section on
   -1   381 native applications](https://www.ietf.org/archive/id/draft-ietf-oauth-v2-1-12.html#name-native-applications).
   -1   382 Just be aware that they are special.
   -1   383 
   -1   384 ## Logout
   -1   385 
   -1   386 One nice feature of SSO is that you may not even notice it: Clicking the login
   -1   387 button in an application may seemingly just refresh the page and log you in.
   -1   388 This is because the authorization endpoint can just redirect you back
   -1   389 immediately if you are already logged in at your SSO provider.
   -1   390 
   -1   391 However, there is an issue: Users may not realize that they are logged in at
   -1   392 the SSO provider. Imagine someone using a shared computer in a library. They
   -1   393 log in to their email account using SSO, then log out of the email account
   -1   394 again. But they are still logged in on the SSO provider. The next person using
   -1   395 the device could trivially log back in.
   -1   396 
   -1   397 I can think of multiple solutions:
   -1   398 
   -1   399 -   When I log out of any application, I am also logged out of the SSO
   -1   400     provider.
   -1   401 -   When I log out of any application, I am also logged out of the SSO provider
   -1   402     and all other applications.
   -1   403 -   The SSO provider does not keep a session. When I want to log in to a second
   -1   404     service I have to authenticate again.
   -1   405 -   Just don't use shared devices.
   -1   406 
   -1   407 I believe the issue here is that we do not have a shared mental model of how
   -1   408 SSO logout should work. It may also depend on context. For example, I sometimes
   -1   409 use github for SSO, but I also use github for other things, so I know that I
   -1   410 have a session there. On the other hand, I would not remember to log out of
   -1   411 keycloak because that is literally only used for SSO.
   -1   412 
   -1   413 ## Zombie Sessions
   -1   414 
   -1   415 Having centralized account management is nice. When a person leaves your
   -1   416 organization, you can simply remove their account and they immediately lose
   -1   417 access.
   -1   418 
   -1   419 However, as I described so far, SSO is only used for initial authentication.
   -1   420 After that, each application has its own session. People might hold on to their
   -1   421 sessions long after the SSO account has been removed.
   -1   422 
   -1   423 In the OAuth usecase, the access tokens connected to the central account would
   -1   424 also expire. But in the SSO usecase, there is no standardized solution that I
   -1   425 know of. Each application must be handled individually.
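
One thing an individual application could do is cap the lifetime of its own sessions, so that users are sent through the SSO provider again every so often. This is purely my assumption of a possible mitigation, not something the protocol prescribes; the session key `authenticated_at` and the eight hour limit in this Python sketch are made up.

```python
import time

# Hypothetical limit: force a fresh SSO round trip after eight hours.
MAX_SESSION_AGE = 8 * 60 * 60


def session_needs_reauthentication(session, now=None, max_age=MAX_SESSION_AGE):
    """Return True if the user should be redirected to the SSO provider again."""
    now = time.time() if now is None else now
    authenticated_at = session.get("authenticated_at")
    if authenticated_at is None:
        return True
    return now - authenticated_at > max_age
```

A removed account would then lose access after at most `MAX_SESSION_AGE` seconds, at the cost of more frequent redirects for everyone else.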
   -1   426 
   -1   427 ## Permission management
   -1   428 
   -1   429 When you have centralized account management, you may also want to do
   -1   430 centralized permission management. To a degree this is possible.
   -1   431 
   -1   432 On a basic level, you can configure to which applications an account even has
   -1   433 access. You could also configure groups at the SSO provider that get mapped to
   -1   434 application groups. But in my experience, this only gets you so far. You will
   -1   435 probably still have some application specific permission management.
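
The group mapping could be as simple as the following Python sketch. The group names and the mapping itself are invented; the point is only that groups the application does not know about are dropped instead of being created on the fly.

```python
# Hypothetical mapping from SSO provider groups to application groups.
GROUP_MAPPING = {
    "staff": "editors",
    "it-admins": "administrators",
}


def map_groups(sso_groups, mapping=GROUP_MAPPING):
    """Translate the `groups` from the SSO provider into application groups."""
    return sorted({mapping[group] for group in sso_groups if group in mapping})
```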
   -1   436 
   -1   437 ## Conclusion
   -1   438 
   -1   439 OpenID Connect is a solid SSO protocol. It also comes with a semi-automatic
   -1   440 [conformance test suite](https://www.certification.openid.net), which is great.
   -1   441 Unfortunately, it suffers from far too many options and some missed
   -1   442 opportunities. The job of a standard is not to show the set of possibilities,
   -1   443 but to restrict it. This is especially true for security sensitive protocols
   -1   444 such as this one.
   -1   445 
   -1   446 I do understand that some things should be pluggable. Cryptographic primitives
   -1   447 need regular updates. But that's basically it.
   -1   448 
   -1   449 OAuth 2.1 is a great step in the right direction. I am really looking forward
   -1   450 to it. It seems to be active, even though it has been in draft state for a long
   -1   451 time.
   -1   452 
   -1   453 But it still has way too many options.