- commit
- f6bb96e25ebc47d8f800c7a57ef4bde656377094
- parent
- 835add61864211ccad823dd0dd92ff0f4040fd02
- Author
- Tobias Bengfort <tobias.bengfort@posteo.de>
- Date
- 2025-02-02 14:49
post: oidc
Diffstat
| A | _content/posts/2025-01-07-oidc/index.md | 453 | ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
1 files changed, 453 insertions, 0 deletions
diff --git a/_content/posts/2025-01-07-oidc/index.md b/_content/posts/2025-01-07-oidc/index.md
@@ -0,0 +1,453 @@
-1 1 ---
-1 2 title: I don't really like OIDC
-1 3 date: 2025-01-07
-1 4 tags: [code, security]
-1 5 description: "I will look into this single sign-on protocol and figure out why it is so darn complicated."
-1 6 ---
-1 7
-1 8 When an organization grows, centralized account management becomes an important
-1 9 issue. The modern protocol to do single sign-on (SSO) is called [OpenID
-1 10 Connect](https://openid.net/specs/openid-connect-core-1_0.html#) (OIDC). In
-1 11 this post I will look into this protocol and figure out why it is so darn
-1 12 complicated.
-1 13
-1 14 ## My naive expectation
-1 15
-1 16 This is going to be quite a long post. But let's kick things off with a
-1 17 concrete example that illustrates what I expected, based on some basic
-1 18 knowledge about SSO:
-1 19
-1 20 1. When I try to access the application, I get redirected to an SSO login form:
-1 21
-1 22 $ curl https://myapp.example/
-1 23 < HTTP/1.1 303 See Other
-1 24 < Location: https://sso.example/login/?client_id=myapp
-1 25
-1 26 2. I authenticate (e.g. by providing a username and password) and get
-1 27 redirected back to the application.
-1 28
-1 29 $ curl https://sso.example/login/?client_id=myapp
-1 30 --form "username=tobias"
-1 31 --form "password=…"
-1 32 < HTTP/1.1 303 See Other
-1 33 < Location: https://myapp.example/?code=ABC123
-1 34
-1 35 3. I am back at the application, but now with a `code` parameter. To verify
-1 36 this authorization code, the application (not my browser!) sends it back
-1 37 to the SSO provider:
-1 38
-1 39 $ curl https://sso.example/verify/
-1 40 --form "code=ABC123"
-1 41
-1 42 4. The SSO provider verifies the authorization code and responds with some
-1 43 information about my account, most notably a unique identifier.
-1 44
-1 45 < HTTP/1.1 200 OK
-1 46 < Content-Type: application/json
-1 47 {
-1 48 "username": "tobias",
-1 49 "email": …,
-1 50 "name": …,
-1 51 "groups": […]
-1 52 }
-1 53
-1 54 That's it.
-1 55
-1 56
-1 57 ## The actual protocol
-1 58
-1 59 In reality, OpenID Connect has some additional steps:
-1 60
-1 61 1. The application fetches some information needed to interact with the
-1 62 SSO provider:
-1 63
-1 64 $ curl https://sso.example/.well-known/openid-configuration/
-1 65 < HTTP/1.1 200 OK
-1 66 {
-1 67 "issuer": "https://sso.example",
-1 68 "authorization_endpoint": "https://sso.example/login/",
-1 69 "token_endpoint": "https://sso.example/token/",
-1 70 "userinfo_endpoint": "https://sso.example/userinfo/",
-1 71 "jwks_uri": "https://sso.example/jwks/",
-1 72 "response_types_supported": ["code"],
-1 73 "grant_types_supported": ["authorization_code"],
-1 74 "id_token_signing_alg_values_supported": ["RS256"],
-1 75 "token_endpoint_auth_methods_supported": ["client_secret_post"],
-1 76 "code_challenge_methods_supported": ["S256"]
-1 77 …
-1 78 }
-1 79
-1 80 2. When I try to access the application, I get redirected to the authorization
-1 81 endpoint:
-1 82
-1 83 $ curl https://myapp.example/
-1 84 < HTTP/1.1 303 See Other
-1 85 < Location: https://sso.example/login/
-1 86 ?client_id=myapp
-1 87 &response_type=code
-1 88 &scope=openid+email+profile
-1 89 &redirect_uri=https%3A%2F%2Fmyapp.example%2F
-1 90 &state=XXX
-1 91 &nonce=YYY
-1 92 &code_challenge=ZZZ
-1 93 &code_challenge_method=S256
-1 94
-1 95 3. I authenticate (e.g. by providing a username and password) and get
-1 96 redirected back to the application.
-1 97
-1 98 $ curl https://sso.example/login/
-1 99 ?client_id=myapp
-1 100 &response_type=code
-1 101 &scope=openid+email+profile
-1 102 &redirect_uri=https%3A%2F%2Fmyapp.example%2F
-1 103 &state=XXX
-1 104 &nonce=YYY
-1 105 &code_challenge=ZZZ
-1 106 &code_challenge_method=S256
-1 107 --form "username=tobias"
-1 108 --form "password=…"
-1 109 < HTTP/1.1 303 See Other
-1 110 < Location: https://myapp.example/?code=ABC123&state=XXX
-1 111
-1 112 As part of this authentication, I also explicitly consent that the
-1 113 application may access my information on the SSO provider.
-1 114
-1 115 4. I am back at the application, but now with `code` and `state` parameters.
-1 116 First, the application checks if the `state` parameter matches the one
-1 117 it sent in step 2. After that, to verify the authorization code, the
-1 118 application (not my browser!) sends it to the token endpoint:
-1 119
-1 120 $ curl https://sso.example/token/
-1 121 --form "client_id=myapp"
-1 122 --form "client_secret=…"
-1 123 --form "code=ABC123"
-1 124 --form "code_verifier=…"
-1 125 --form "grant_type=authorization_code"
-1 126
-1 127 5. The SSO provider checks that the `client_secret` and `code_verifier`
-1 128 parameters match and that the authorization code is both valid and has not
-1 129 been used before. Then it responds with some tokens.
-1 130
-1 131 < HTTP/1.1 200 OK
-1 132 < Content-Type: application/json
-1 133 < Cache-Control: no-store
-1 134 {
-1 135 "id_token": …,
-1 136 "access_token": "TTT",
-1 137 "token_type": "Bearer"
-1 138 }
-1 139
-1 140 6. The ID token is a JWT (basically a signed JSON blob) that contains some
-1 141 additional information:
-1 142
-1 143 {
-1 144 "iss": "https://sso.example",
-1 145 "iat": 1736000000,
-1 146 "exp": 1736000020,
-1 147 "aud": "myapp",
-1 148 "nonce": "YYY"
-1 149 }
-1 150
-1 151 The application now does all kinds of verification:
-1 152
-1 153 - check the signature of the JWT (using the keys received from `jwks_uri`
-1 154 in step 1)
-1 155 - check that "iss" matches the "issuer" from step 1
-1 156 - check that the token has been issued in the past (`iat`) and that it
-1 157 has not yet expired (`exp`).
-1 158 - check that this token was created for this client (`aud`)
-1 159 - check that the nonce matches the one that was sent in step 2
-1 160
-1 161 7. Finally, the application fetches the user information from the userinfo
-1 162 endpoint, using the access token received in step 5:
-1 163
-1 164 $ curl https://sso.example/userinfo/ -H 'Authorization: Bearer TTT'
-1 165 < HTTP/1.1 200 OK
-1 166 < Content-Type: application/json
-1 167 {
-1 168 "sub": "tobias",
-1 169 "email": …,
-1 170 "name": …,
-1 171 "groups": […]
-1 172 }
-1 173
-1 174 This protocol is obviously much more complicated than my naive expectation
-1 175 (though the basic structure is the same). In the following sections I want
-1 176 to examine all the little differences and ask: Why is it there and is it really
-1 177 necessary?
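
To make step 2 a bit more concrete, here is a minimal Python sketch of how an application might assemble the authorization request. The function name `build_authorization_url` is hypothetical, and the sketch assumes that `state`, `nonce`, and `code_challenge` have already been generated and tied to the application session.

```python
from urllib.parse import urlencode


def build_authorization_url(authorization_endpoint, client_id, redirect_uri,
                            state, nonce, code_challenge):
    """Assemble the redirect target from step 2 (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        # urlencode() turns the spaces into "+", as in the example above
        "scope": "openid email profile",
        "redirect_uri": redirect_uri,
        "state": state,
        "nonce": nonce,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
    }
    return authorization_endpoint + "?" + urlencode(params)
```

Calling it with the values from the example (`https://sso.example/login/`, `myapp`, `https://myapp.example/`, `XXX`, `YYY`, `ZZZ`) should produce the same query string as shown in step 2, just on a single line.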
-1 178
-1 179 ## OAuth legacy
-1 180
-1 181 As a first step it is important to understand that OpenID Connect is based on
-1 182 [OAuth 2](https://www.rfc-editor.org/rfc/rfc6749).
-1 183
-1 184 OAuth is not really an authentication protocol by itself. I feel like most
-1 185 explanations are overly complicated, so I will use an example instead:
-1 186
-1 187 *There is a cool new service called awesome-meetings.example. I want to start
-1 188 using it immediately, but first it needs access to my calendar. So I press a
-1 189 button and get redirected to serious-calendar.example, where I verify that I
-1 190 indeed want to share my calendar with awesome-meetings.example. I get
-1 191 redirected back and can start scheduling meetings.[^1]*
-1 192
-1 193 [^1]: For another great example, see [this Stack Overflow
-1 194 answer](https://stackoverflow.com/questions/4727226/#32534239).
-1 195 Another good introduction is [OAuth from First
-1 196 Principles](https://stack-auth.com/blog/oauth-from-first-principles).
-1 197
-1 198 What happens in the background is basically the same as the protocol I
-1 199 described above. awesome-meetings.example ends up with an access token that it
-1 200 can use to access my calendar. The `scope` parameter restricts what the token
-1 201 can be used for. In this example, the token can only be used to access my
-1 202 calendar, but not my address book.
-1 203
-1 204 The OpenID Connect authors squinted at this and decided that being allowed to
-1 205 access a user's data is really the same as authentication. They also figured
-1 206 that big companies like Google, Facebook, or Microsoft would probably want to
-1 207 provide both SSO and resource access. So combining the two seemed like a good
-1 208 fit.
-1 209
-1 210 OpenID Connect mostly adds the concept of the ID token, as well as the `nonce`
-1 211 parameter. We will discuss both later in this article. They also add the
-1 212 `.well-known/openid-configuration/` endpoint, which makes sense given all the
-1 213 available options.
-1 214
-1 215 Because oh boy are there options. The protocol I described above is just one of
-1 216 many possible ways to do it. There are many different and incompatible
-1 217 authentication schemes built on top of OAuth. OpenID Connect standardizes some
-1 218 of that, and [OAuth 2.1](https://datatracker.ietf.org/doc/draft-ietf-oauth-v2-1/)
-1 219 (still a draft) removes some further options.
-1 220
-1 221 Even though some options have been removed, there are still plenty left. For
-1 222 example, there are at least two ways to pass user information to applications
-1 223 (none of which match my expectation): It can be included in the ID token or
-1 224 received from a separate userinfo endpoint. I have seen both in the wild.
-1 225 Realistically, SSO services need to do both to be compatible.
-1 226
-1 227 ## Terminology
-1 228
-1 229 Quick note on naming things:
-1 230
-1 231 - SAML uses the terms "service provider" (SP) and "identity provider" (IdP)
-1 232 - OAuth uses the terms "client", "authorization server" (AS), and "resource server" (RS)
-1 233 - OpenID uses the terms "relying party" (RP) and "OpenID Provider" (OP)
-1 234 - I talk above about "application" and "SSO provider".
-1 235
-1 236 I am sorry for adding yet another set of terms, but I find all the others
-1 237 really confusing.
-1 238
-1 239 ## Threat Analysis
-1 240
-1 241 In non-SSO login, there are two main attack vectors: Either you manage to trick
-1 242 the login (e.g. by guessing the password) or you manage to steal a session
-1 243 cookie. Both of these vectors are exactly the same with SSO.
-1 244
-1 245 The benefits are that you only have a single login implementation, so you can
-1 246 focus on making that really robust. You also only expose the password to a
-1 247 single service, which is an improvement over older SSO mechanisms such as LDAP,
-1 248 where the password was given to each application which verified it with the
-1 249 SSO provider in the background.
-1 250
-1 251 But there is also new attack surface. Authorization codes are sufficient to log
-1 252 in, and they are easily stolen (e.g. from the browser history). It is therefore
-1 253 crucial that they expire quickly and become invalid as soon as they are used. They
-1 254 should also not contain any personal information about the user.
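
On the SSO provider side, these requirements could look roughly like the following Python sketch. It is an in-memory illustration only: the class `AuthorizationCodeStore`, its method names, and the 60 second lifetime are assumptions of mine, not something the spec prescribes.

```python
import secrets
import time


class AuthorizationCodeStore:
    """Hypothetical in-memory store for short-lived, single-use codes."""

    def __init__(self, lifetime_seconds=60, clock=time.monotonic):
        self._lifetime = lifetime_seconds
        self._clock = clock
        self._codes = {}

    def issue(self, user_id, client_id):
        # The code is purely random, so it reveals nothing about the user.
        code = secrets.token_urlsafe(32)
        expires_at = self._clock() + self._lifetime
        self._codes[code] = (user_id, client_id, expires_at)
        return code

    def redeem(self, code, client_id):
        # pop() removes the code on the first attempt, even a failed one,
        # so it can never be redeemed a second time.
        entry = self._codes.pop(code, None)
        if entry is None:
            return None
        user_id, issued_to, expires_at = entry
        if issued_to != client_id or self._clock() >= expires_at:
            return None
        return user_id
```

A real provider would keep this in shared storage and also clean up codes that are never redeemed.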
-1 255
-1 256 A second, less obvious attack is that an attacker could get a user to click a
-1 257 link with a crafted authorization code. As a result, the user might do
-1 258 something using the attacker's account, while thinking they are using their own.
-1 259
-1 260 Of course, misconfigured applications may also allow attackers to bypass SSO,
-1 261 or maybe even to register new accounts. Correct configuration is crucial.
-1 262
-1 263 ## Threat Mitigations: State, Nonce, and Code Challenge
-1 264
-1 265 These three parameters can be used to further limit the risk of authorization
-1 266 code injection. They all work very similarly: A random value is stored in the
-1 267 application session, and a cryptographic hash is sent in the initial request
-1 268 and then passed along. When it comes time to check the value it is compared to
-1 269 the hash of the value in the session again.
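
For `code_challenge`, this pattern can be sketched in a few lines of Python. The sketch assumes the `S256` method, where the challenge is the base64url-encoded SHA-256 hash of the verifier without padding. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import secrets


def new_code_verifier():
    """Random value that stays in the application session."""
    # 64 random bytes become 86 URL-safe characters
    return secrets.token_urlsafe(64)


def code_challenge_s256(code_verifier):
    """Hash of the verifier that is sent in the initial request."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verifier_matches(code_verifier, code_challenge):
    """Check done by the SSO provider when the code is redeemed."""
    expected = code_challenge_s256(code_verifier)
    return hmac.compare_digest(expected.encode(), code_challenge.encode())
```

The application would call `new_code_verifier()` once per login attempt, send only `code_challenge_s256(...)` in step 2, and reveal the verifier itself in step 4.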
-1 270
-1 271 This way the whole transaction is bound to the application session. Even if an
-1 272 attacker were to steal the authorization code, they could only use it if they
-1 273 also manage to steal the session cookie (e.g. by getting physical access to the
-1 274 device), by which point they don't really need the authorization code anymore.
-1 275
-1 276 These mechanisms also significantly raise the bar for supplying crafted
-1 277 authorization codes, because attackers need to include parameters that match
-1 278 the ones in the user's session (e.g. by witnessing the initial authentication
-1 279 request).
-1 280
-1 281 The differences between these parameters are small: `state` is checked in step
-1 282 4, so it can prevent making the token request. `code_challenge` is checked in
-1 283 step 5, so the token request is made, but the application does not receive
-1 284 tokens. `nonce` is checked in step 6, at the very end.
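
The `state` check in step 4 is the only one of the three that happens before any request is made. A minimal sketch in Python, assuming the application has the full callback URL and the expected value from the session at hand (`extract_code` is a hypothetical helper):

```python
import hmac
from urllib.parse import parse_qs, urlsplit


def extract_code(callback_url, expected_state):
    """Return the authorization code, but only if `state` matches."""
    query = parse_qs(urlsplit(callback_url).query)
    state = query.get("state", [""])[0]
    # An empty expected value means this session never started a login.
    if not expected_state or not hmac.compare_digest(
        state.encode(), expected_state.encode()
    ):
        raise ValueError("state mismatch, not contacting the token endpoint")
    codes = query.get("code", [])
    if len(codes) != 1:
        raise ValueError("expected exactly one code parameter")
    return codes[0]
```

With the callback from step 3, `extract_code("https://myapp.example/?code=ABC123&state=XXX", "XXX")` hands back the code, while any other `state` raises before a token request is made.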
-1 285
-1 286 One benefit of `code_challenge` is that it is checked by the SSO provider, so
-1 287 by requiring it you can be sure that it is implemented correctly everywhere. Of
-1 288 course that requires that all applications are compatible.
-1 289
-1 290 So which one should you implement? This is another case where I wish the spec
-1 291 had fewer options. Right now, for the sake of compatibility, it is probably best
-1 292 to support all of them. On the other hand, this increases the risk of downgrade
-1 293 attacks.
-1 294
-1 295 ## ID token
-1 296
-1 297 The main addition of OpenID Connect on top of OAuth is the ID token. From what
-1 298 I understand, it is completely redundant.
-1 299
-1 300 - Its cryptographic signature can be used to verify the authorization code,
-1 301 but we have already done that by sending it to the token endpoint over a
-1 302 TLS connection.
-1 303 - It can contain information about the user, but we can also get that from
-1 304 the userinfo endpoint.
-1 305
-1 306 In an alternate world, we would receive the ID token directly instead of taking
-1 307 the detour of using an authorization code (this is called the "implicit flow"
-1 308 in OAuth). We would then validate the ID token and extract the user info, no
-1 309 additional requests necessary.
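
Either way, validating the ID token boils down to the checks listed in step 6. The following Python sketch covers only the claim checks. It assumes that the signature has already been verified with a proper JWT library and that `claims` is the decoded payload. The function name and its parameters are hypothetical.

```python
import hmac
import time


def check_id_token_claims(claims, issuer, client_id, expected_nonce, now=None):
    """Claim checks from step 6. Signature verification is NOT done here."""
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise ValueError("unexpected issuer")
    aud = claims.get("aud")
    # `aud` may be a single string or a list of strings
    audiences = aud if isinstance(aud, list) else [aud]
    if client_id not in audiences:
        raise ValueError("token was created for a different client")
    iat, exp = claims.get("iat"), claims.get("exp")
    if not isinstance(iat, (int, float)) or iat > now:
        raise ValueError("token was not issued in the past")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise ValueError("token has expired")
    nonce = claims.get("nonce")
    if not isinstance(nonce, str) or not hmac.compare_digest(
        nonce.encode(), expected_nonce.encode()
    ):
        raise ValueError("nonce mismatch")
    return claims
```

In practice a few seconds of clock skew tolerance is probably needed, which this sketch leaves out.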
-1 310
-1 311 My main issue, again, is that there are too many options. We should pick one.
-1 312 And we should certainly not have to support both, that is just unnecessary
-1 313 complexity.
-1 314
-1 315 In the implicit flow, the tokens are passed in the URL and end up in the
-1 316 browser history, from where they can easily be stolen. This is not so much an
-1 317 issue for the SSO use case, because the tokens have limited use there. But in
-1 318 the OAuth use case, this is a real issue. I don't want people to steal the
-1 319 access token to my calendar.
-1 320
-1 321 OAuth 2.1 therefore went ahead and removed the implicit flow completely. This
-1 322 is a huge step in the right direction (which would also make the
-1 323 `response_type=code` parameter obsolete if it wasn't for backwards
-1 324 compatibility). If the OpenID Connect spec got rebased onto that, it could be
-1 325 simplified massively. Maybe the ID token could even be removed.
-1 326
-1 327 ## Dynamic Redirects
-1 328
-1 329 The authorization endpoint receives both a `client_id` and a `redirect_uri`
-1 330 parameter. However, it would be insecure to allow arbitrary values for
-1 331 `redirect_uri`. This would, for example, allow an attacker to redirect to a
-1 332 URI they control and steal the authorization code.
-1 333
-1 334 Of course, always redirecting to the application start page would be annoying
-1 335 for users. When I open a link and need to log in before accessing the page, I
-1 336 want to get redirected to that page after login.
-1 337
-1 338 In the end, only the application can decide which redirect URIs are safe. So
-1 339 the best solution is to always redirect to a pre-defined URI and let the
-1 340 application handle the rest. In the meantime, the application could store the
-1 341 original URI in the session.
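
A minimal sketch of that idea in Python, assuming `session` is a dict-like object provided by the web framework. The helper names and the `post_login_target` key are made up for this example.

```python
from urllib.parse import urlsplit


def remember_target(session, requested_path):
    """Call before redirecting to the SSO provider."""
    session["post_login_target"] = requested_path


def pop_target(session, default="/"):
    """Call in the fixed callback handler after a successful login."""
    target = session.pop("post_login_target", default)
    parts = urlsplit(target)
    # Only accept paths on this site: no scheme, no host, no "//host" or
    # backslash tricks that some browsers treat as an absolute URL.
    if (
        parts.scheme
        or parts.netloc
        or not target.startswith("/")
        or target.startswith("//")
        or "\\" in target
    ):
        return default
    return target
```

The SSO provider only ever sees the one pre-registered callback URI, and the decision about where to go afterwards stays entirely within the application.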
-1 342
-1 343 In other words: The `redirect_uri` parameter is completely dispensable.
-1 344
-1 345 ## Client Secret
-1 346
-1 347 The token endpoint receives a `client_secret` parameter. This allows the SSO
-1 348 provider to verify that the request comes from the same application for which
-1 349 the authorization code has been created. This is of course important for the
-1 350 OAuth use case, because you don't want the wrong application to receive the
-1 351 access token for your calendar.
-1 352
-1 353 For the SSO use case, this is less relevant though. What is the worst thing that
-1 354 could happen? A malicious client learns that I can successfully authenticate?
-1 355 That doesn't sound so bad. The token endpoint may give you access to [some
-1 356 limited information about the
-1 357 user](https://openid.net/specs/openid-connect-core-1_0.html#StandardClaims)
-1 358 though.
-1 359
-1 360 There may be more attacks that I don't see right now. Protecting the user
-1 361 information alone might be worth it. So I don't really mind it.
-1 362
-1 363 But again, there are way too many options: "the authorization server MAY accept
-1 364 any form of client authentication meeting its security requirements (e.g.,
-1 365 password, public/private key pair)."
-1 366
-1 367 ## Native Applications
-1 368
-1 369 So far I mostly assumed that the application is a server-side web application.
-1 370 If instead the application is a SPA or a native app, things get more
-1 371 complicated:
-1 372
-1 373 - The client secret is exposed
-1 374 - The values for `state`, `code_challenge`, and `nonce` are exposed
-1 375 - The request to the token endpoint uses the user's network, which makes MITM
-1 376 attacks much simpler
-1 377 - The authorization endpoint cannot simply redirect to a native app as you
-1 378 would to a web application
-1 379
-1 380 I will not go into more detail here. The OAuth spec has a [whole section on
-1 381 native applications](https://www.ietf.org/archive/id/draft-ietf-oauth-v2-1-12.html#name-native-applications).
-1 382 Just be aware that they are special.
-1 383
-1 384 ## Logout
-1 385
-1 386 One nice feature of SSO is that you may not even notice it: Clicking the login
-1 387 button in an application may seemingly just refresh the page and log you in.
-1 388 This is because the authorization endpoint can just redirect you back
-1 389 immediately if you are already logged in at your SSO provider.
-1 390
-1 391 However, there is an issue: Users may not realize that they are logged in at
-1 392 the SSO provider. Imagine someone using a shared computer in a library. They
-1 393 log in to their email account using SSO, then log out of the email account
-1 394 again. But they are still logged in on the SSO provider. The next person using
-1 395 the device could trivially log back in.
-1 396
-1 397 I can think of multiple solutions:
-1 398
-1 399 - When I log out of any application, I am also logged out of the SSO
-1 400 provider.
-1 401 - When I log out of any application, I am also logged out of the SSO provider
-1 402 and all other applications.
-1 403 - The SSO provider does not keep a session. When I want to log in to a second
-1 404 service I have to authenticate again.
-1 405 - Just don't use shared devices.
-1 406
-1 407 I believe the issue here is that we do not have a shared mental model of how
-1 408 SSO logout should work. It may also depend on context. For example, I sometimes
-1 409 use GitHub for SSO, but I also use GitHub for other things, so I know that I
-1 410 have a session there. On the other hand, I would not remember to log out of
-1 411 Keycloak because that is literally only used for SSO.
-1 412
-1 413 ## Zombie Sessions
-1 414
-1 415 Having centralized account management is nice. When a person leaves your
-1 416 organization, you can simply remove their account and they immediately lose
-1 417 access.
-1 418
-1 419 However, as I described so far, SSO is only used for initial authentication.
-1 420 After that, each application has its own session. People might hold on to their
-1 421 sessions long after the SSO account has been removed.
-1 422
-1 423 In the OAuth use case, the access tokens connected to the central account would
-1 424 also expire. But in the SSO use case, there is no standardized solution that I
-1 425 know of. Each application must be handled individually.
-1 426
-1 427 ## Permission management
-1 428
-1 429 When you have centralized account management, you may also want to do
-1 430 centralized permission management. To a degree this is possible.
-1 431
-1 432 On a basic level, you can configure to which applications an account even has
-1 433 access. You could also configure groups at the SSO provider that get mapped to
-1 434 application groups. But in my experience, this only gets you so far. You will
-1 435 probably still have some application-specific permission management.
-1 436
-1 437 ## Conclusion
-1 438
-1 439 OpenID Connect is a solid SSO protocol. It also comes with a semi-automatic
-1 440 [conformance test suite](https://www.certification.openid.net), which is great.
-1 441 Unfortunately, it suffers from far too many options and some missed
-1 442 opportunities. The job of a standard is not to show the set of possibilities,
-1 443 but to restrict it. This is especially true for security-sensitive protocols
-1 444 such as this one.
-1 445
-1 446 I do understand that some things should be pluggable. Cryptographic primitives
-1 447 need regular updates. But that's basically it.
-1 448
-1 449 OAuth 2.1 is a great step in the right direction. I am really looking forward
-1 450 to it. It seems to be active, even though it has been in draft state for a long
-1 451 time.
-1 452
-1 453 But it still has way too many options.