- commit: f6bb96e25ebc47d8f800c7a57ef4bde656377094
- parent: 835add61864211ccad823dd0dd92ff0f4040fd02
- Author: Tobias Bengfort <tobias.bengfort@posteo.de>
- Date: 2025-02-02 14:49
post: oidc
Diffstat
A | _content/posts/2025-01-07-oidc/index.md | 453 | ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
1 files changed, 453 insertions, 0 deletions
diff --git a/_content/posts/2025-01-07-oidc/index.md b/_content/posts/2025-01-07-oidc/index.md
@@ -0,0 +1,453 @@
---
title: I don't really like OIDC
date: 2025-01-07
tags: [code, security]
description: "I will look into this single sign-on protocol and figure out why it is so darn complicated."
---

When an organization grows, centralized account management becomes an important
issue. The modern protocol to do single sign-on (SSO) is called [OpenID
Connect](https://openid.net/specs/openid-connect-core-1_0.html#) (OIDC). In
this post I will look into this protocol and figure out why it is so darn
complicated.

## My naive expectation

This is going to be quite a long post. But let's kick things off with a
concrete example that illustrates what I expected, based on some basic
knowledge about SSO:

1.  When I try to access the application, I get redirected to an SSO login form:

        $ curl https://myapp.example/
        < HTTP/1.1 303 See Other
        < Location: https://sso.example/login/?client_id=myapp

2.  I authenticate (e.g. by providing a username and password) and get
    redirected back to the application.

        $ curl https://sso.example/login/?client_id=myapp
            --form "username=tobias"
            --form "password=…"
        < HTTP/1.1 303 See Other
        < Location: https://myapp.example/?code=ABC123

3.  I am back at the application, but now with a `code` parameter. To verify
    this authorization code, the application (not my browser!) sends it back
    to the SSO provider:

        $ curl https://sso.example/verify/
            --form "code=ABC123"

4.  The SSO provider verifies the authorization code and responds with some
    information about my account, most notably a unique identifier.

        < HTTP/1.1 200 OK
        < Content-Type: application/json
        {
            "username": "tobias",
            "email": …,
            "name": …,
            "groups": […]
        }

That's it.


## The actual protocol

In reality, OpenID Connect has some additional steps:

1.  The application fetches some information needed to interact with the
    SSO provider:

        $ curl https://sso.example/.well-known/openid-configuration/
        < HTTP/1.1 200 OK
        {
            "issuer": "https://sso.example",
            "authorization_endpoint": "https://sso.example/login/",
            "token_endpoint": "https://sso.example/token/",
            "userinfo_endpoint": "https://sso.example/userinfo/",
            "jwks_uri": "https://sso.example/jwks/",
            "response_types_supported": ["code"],
            "grant_types_supported": ["authorization_code"],
            "id_token_signing_alg_values_supported": ["RS256"],
            "token_endpoint_auth_methods_supported": ["client_secret_post"],
            "code_challenge_methods_supported": ["S256"],
            …
        }

2.  When I try to access the application, I get redirected to the authorization
    endpoint:

        $ curl https://myapp.example/
        < HTTP/1.1 303 See Other
        < Location: https://sso.example/login/
            ?client_id=myapp
            &response_type=code
            &scope=openid+email+profile
            &redirect_uri=https%3A%2F%2Fmyapp.example%2F
            &state=XXX
            &nonce=YYY
            &code_challenge=ZZZ
            &code_challenge_method=S256

3.  I authenticate (e.g. by providing a username and password) and get
    redirected back to the application.

        $ curl https://sso.example/login/
            ?client_id=myapp
            &response_type=code
            &scope=openid+email+profile
            &redirect_uri=https%3A%2F%2Fmyapp.example%2F
            &state=XXX
            &nonce=YYY
            &code_challenge=ZZZ
            &code_challenge_method=S256
            --form "username=tobias"
            --form "password=…"
        < HTTP/1.1 303 See Other
        < Location: https://myapp.example/?code=ABC123&state=XXX

    As part of this authentication, I also explicitly consent that the
    application may access my information on the SSO provider.

4.  I am back at the application, but now with `code` and `state` parameters.
    First, the application checks if the `state` parameter matches the one
    it sent in step 2. After that, to verify the authorization code, the
    application (not my browser!) sends it to the token endpoint:

        $ curl https://sso.example/token/
            --form "client_id=myapp"
            --form "client_secret=…"
            --form "code=ABC123"
            --form "code_verifier=…"
            --form "grant_type=authorization_code"

5.  The SSO provider checks that the `client_secret` and `code_verifier`
    parameters match and that the authorization code is both valid and has not
    been used before. Then it responds with some tokens.

        < HTTP/1.1 200 OK
        < Content-Type: application/json
        < Cache-Control: no-store
        {
            "id_token": …,
            "access_token": "TTT",
            "token_type": "Bearer"
        }

6.  The ID token is a JWT (basically a signed JSON blob) that contains some
    additional information:

        {
            "iss": "https://sso.example",
            "iat": 1736000000,
            "exp": 1736000020,
            "aud": "myapp",
            "nonce": "YYY"
        }

    The application now does all kinds of verification:

    - check the signature of the JWT (using the keys received from `jwks_uri`
      in step 1)
    - check that `iss` matches the `issuer` from step 1
    - check that the token has been issued in the past (`iat`) and that it
      has not yet expired (`exp`)
    - check that this token was created for this client (`aud`)
    - check that the nonce matches the one that was sent in step 2

7.  Finally, the application fetches the user information from the userinfo
    endpoint, using the access token received in step 5:

        $ curl https://sso.example/userinfo/ -H 'Authorization: Bearer TTT'
        < HTTP/1.1 200 OK
        < Content-Type: application/json
        {
            "sub": "tobias",
            "email": …,
            "name": …,
            "groups": […]
        }

This protocol is obviously much more complicated than my naive expectation
(though the basic structure is the same). In the following sections I want
to examine all the little differences and ask: Why is it there and is it really
necessary?

## OAuth legacy

As a first step it is important to understand that OpenID Connect is based on
[OAuth 2](https://www.rfc-editor.org/rfc/rfc6749).

OAuth is not really an authentication protocol by itself. I feel like most
explanations are overly complicated, so I will use an example instead:

*There is a cool new service called awesome-meetings.example. I want to start
using it immediately, but first it needs access to my calendar. So I press a
button and get redirected to serious-calendar.example, where I verify that I
indeed want to share my calendar with awesome-meetings.example. I get
redirected back and can start scheduling meetings.[^1]*

[^1]: For another great example, see [this Stack Overflow
    answer](https://stackoverflow.com/questions/4727226/#32534239).
    Another good introduction is [OAuth from First
    Principles](https://stack-auth.com/blog/oauth-from-first-principles).

What happens in the background is basically the same as the protocol I
described above. awesome-meetings.example ends up with an access token that it
can use to access my calendar. The `scope` parameter restricts what the token
can be used for. In this example, the token can only be used to access my
calendar, but not my address book.

The OpenID Connect authors squinted at this and decided that being allowed to
access a user's data is really the same as authentication. They also figured
that big companies like Google, Facebook, or Microsoft would probably want to
provide both SSO and resource access. So combining the two seemed like a good
fit.

OpenID Connect mostly adds the concept of the ID token, as well as the `nonce`
parameter. We will discuss both later in this article. It also adds the
`.well-known/openid-configuration/` endpoint, which makes sense given all the
available options.

Because oh boy are there options. The protocol I described above is just one of
many possible ways to do it. There are many different and incompatible
authentication schemes built on top of OAuth. OpenID Connect standardizes some
of that, and [OAuth 2.1](https://datatracker.ietf.org/doc/draft-ietf-oauth-v2-1/)
(still a draft) removes some further options.

Even though some options have been removed, there are still plenty left. For
example, there are at least two ways to pass user information to applications
(neither of which matches my expectation): It can be included in the ID token or
received from a separate userinfo endpoint. I have seen both in the wild.
Realistically, SSO services need to do both to be compatible.

## Terminology

Quick note on naming things:

- SAML uses the terms "service provider" (SP) and "identity provider" (IdP)
- OAuth uses the terms "client", "authorization server" (AS), and "resource server" (RS)
- OpenID uses the terms "relying party" (RP) and "OpenID Provider" (OP)
- I talk above about "application" and "SSO provider".

I am sorry for adding yet another set of terms, but I find all the others
really confusing.

## Threat Analysis

In non-SSO login, there are two main attack vectors: Either you manage to trick
the login (e.g. by guessing the password) or you manage to steal a session
cookie. Both of these vectors are exactly the same with SSO.

The benefits are that you only have a single login implementation, so you can
focus on making that really robust. You also only expose the password to a
single service, which is an improvement over older SSO mechanisms such as LDAP,
where the password was given to each application which verified it with the
SSO provider in the background.

But there is also new attack surface. Authorization codes are sufficient to log
in, and they are easily stolen (e.g. from the browser history). It is therefore
crucial that they expire quickly and become invalid once they have been used.
They should also not contain any personal information about the user.
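
To make these three requirements concrete, here is a minimal sketch of how an SSO provider might manage authorization codes. It is not taken from any real implementation: the class `AuthorizationCodeStore`, its method names, and the 60 second lifetime are all assumptions chosen for illustration, and a real provider would use shared storage rather than a Python dict.

```python
import secrets
import time


class AuthorizationCodeStore:
    """Hypothetical in-memory store for short-lived, single-use codes."""

    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self._ttl = ttl_seconds  # assumed lifetime, not mandated by the post
        self._clock = clock      # injectable so expiry can be tested
        self._codes = {}         # code -> (user_id, client_id, expires_at)

    def issue(self, user_id, client_id):
        # The code is pure randomness, so it carries no personal information.
        code = secrets.token_urlsafe(32)
        self._codes[code] = (user_id, client_id, self._clock() + self._ttl)
        return code

    def redeem(self, code, client_id):
        # pop() removes the entry, so a second attempt with the same code fails.
        entry = self._codes.pop(code, None)
        if entry is None:
            return None
        user_id, issued_to, expires_at = entry
        # Reject codes issued to another client and codes that have expired.
        if issued_to != client_id or self._clock() >= expires_at:
            return None
        return user_id
```

The important design choice is that `redeem()` removes the code before checking anything else, so even a failed attempt burns the code.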

A second, less obvious attack is that an attacker could get a user to click a
link with a crafted authorization code. As a result, the user might do
something using the attacker's account, while thinking they are using their own.

Of course, misconfigured applications may also allow attackers to bypass SSO,
or maybe even to register new accounts. Correct configuration is crucial.

## Threat Mitigations: State, Nonce, and Code Challenge

These three parameters can be used to further limit the risk of authorization
code injection. They all work very similarly: A random value is stored in the
application session, and a cryptographic hash is sent in the initial request
and then passed along. When it comes time to check the value it is compared to
the hash of the value in the session again.

This way the whole transaction is bound to the application session. Even if an
attacker were to steal the authorization code, they could only use it if they
also managed to steal the session cookie (e.g. by getting physical access to the
device), by which point they don't really need the authorization code anymore.

These mechanisms also significantly raise the bar for supplying crafted
authorization codes, because attackers need to include parameters that match
the ones in the user's session (e.g. by witnessing the initial authentication
request).

The differences between these parameters are small: `state` is checked in step
4, so it can prevent making the token request. `code_challenge` is checked in
step 5, so the token request is made, but the application does not receive
tokens. `nonce` is checked in step 6, at the very end.
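
As a rough sketch of the mechanics, this is how the `S256` code challenge from step 2 can be derived and later checked in step 5. The function names are made up for this example. The derivation follows RFC 7636 (base64url-encoded SHA-256 of the verifier, without padding). For `state` I simply compare the stored and the returned value, which is an assumption on my part; an application may just as well send a hash of the stored value, as described above.

```python
import base64
import hashlib
import hmac
import secrets


def new_code_verifier():
    # 32 random bytes encode to a 43-character URL-safe string.
    return secrets.token_urlsafe(32)


def code_challenge_s256(code_verifier):
    # S256: BASE64URL(SHA256(ASCII(code_verifier))) without "=" padding.
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def provider_accepts(code_verifier, stored_challenge):
    # Step 5: the SSO provider recomputes the challenge from the verifier
    # and compares it to the challenge it stored in step 2.
    expected = code_challenge_s256(code_verifier)
    return hmac.compare_digest(expected.encode(), stored_challenge.encode())


def state_matches(session_state, returned_state):
    # Step 4: the application compares the value from its session with the
    # one that came back in the redirect, in constant time.
    return hmac.compare_digest(session_state.encode(), returned_state.encode())
```

The application keeps the verifier in its session and only ever sends the challenge through the browser, so someone who sees the redirect URLs cannot redeem the authorization code.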

One benefit of `code_challenge` is that it is checked by the SSO provider, so
by requiring it you can be sure that it is implemented correctly everywhere. Of
course that requires that all applications are compatible.

So which one should you implement? This is another case where I wish the spec
had fewer options. Right now, for the sake of compatibility, it is probably best
to support all of them. On the other hand, this increases the risk of downgrade
attacks.

## ID token

The main addition of OpenID Connect on top of OAuth is the ID token. From what
I understand, it is completely redundant.

- Its cryptographic signature can be used to verify the authorization code,
  but we have already done that by sending it to the token endpoint over a
  TLS connection.
- It can contain information about the user, but we can also get that from
  the userinfo endpoint.

In an alternate world, we would receive the ID token directly instead of taking
the detour of using an authorization code (this is called the "implicit flow"
in OAuth). We would then validate the ID token and extract the user info, no
additional requests necessary.

My main issue, again, is that there are too many options. We should pick one.
And we should certainly not have to support both, that is just unnecessary
complexity.

In the implicit flow, the tokens are passed in the URL and end up in the
browser history, from where they can easily be stolen. This is not so much an
issue for the SSO use case, because the tokens have limited use there. But in
the OAuth use case, this is a real issue. I don't want people to steal the
access token to my calendar.

OAuth 2.1 therefore went ahead and removed the implicit flow completely. This
is a huge step in the right direction (which would also make the
`response_type=code` parameter obsolete if it wasn't for backwards
compatibility). If the OpenID Connect spec got rebased onto that, it could be
simplified massively. Maybe the ID token could even be removed.

## Dynamic Redirects

The authorization endpoint receives both a `client_id` and a `redirect_uri`
parameter. However, it would be insecure to allow arbitrary values for
`redirect_uri`. This would, for example, allow redirecting to an
attacker-controlled URI that steals the authorization code.

Of course, always redirecting to the application start page would be annoying
for users. When I open a link and need to log in before accessing the page, I
want to get redirected to that page after login.

In the end, only the application can decide which redirect URIs are safe. So
the best solution is to always redirect to a pre-defined URI and let the
application handle the rest. In the meantime, the application could store the
original URI in the session.

In other words: The `redirect_uri` parameter is completely dispensable.

## Client Secret

The token endpoint receives a `client_secret` parameter. This allows the SSO
provider to verify that the request comes from the same application for which
the authorization code has been created. This is of course important for the
OAuth use case, because you don't want the wrong application to receive the
access token for your calendar.

For the SSO use case, this is less relevant though. What is the worst thing that
could happen? A malicious client learns that I can successfully authenticate?
That doesn't sound so bad. The token endpoint may give you access to [some
limited information about the
user](https://openid.net/specs/openid-connect-core-1_0.html#StandardClaims)
though.

There may be more attacks that I don't see right now. Protecting the user
information alone might be worth it. So I don't really mind it.

But again, there are way too many options: "the authorization server MAY accept
any form of client authentication meeting its security requirements (e.g.,
password, public/private key pair)."

## Native Applications

So far I mostly assumed that the application is a server-side web application.
If instead the application is a SPA or a native app, things get more
complicated:

- The client secret is exposed
- The values for `state`, `code_challenge`, and `nonce` are exposed
- The request to the token endpoint uses the user's network, which makes MITM
  attacks much simpler
- The authorization endpoint cannot simply redirect to a native app as you
  would to a web application

I will not go into more detail here. The OAuth spec has a [whole section on
native applications](https://www.ietf.org/archive/id/draft-ietf-oauth-v2-1-12.html#name-native-applications).
Just be aware that they are special.

## Logout

One nice feature of SSO is that you may not even notice it: Clicking the login
button in an application may seemingly just refresh the page and log you in.
This is because the authorization endpoint can just redirect you back
immediately if you are already logged in at your SSO provider.

However, there is an issue: Users may not realize that they are logged in at
the SSO provider. Imagine someone using a shared computer in a library. They
log in to their email account using SSO, then log out of the email account
again. But they are still logged in on the SSO provider. The next person using
the device could trivially log back in.

I can think of multiple solutions:

- When I log out of any application, I am also logged out of the SSO
  provider.
- When I log out of any application, I am also logged out of the SSO provider
  and all other applications.
- The SSO provider does not keep a session. When I want to log in to a second
  service I have to authenticate again.
- Just don't use shared devices.

I believe the issue here is that we do not have a shared mental model of how
SSO logout should work. It may also depend on context. For example, I sometimes
use GitHub for SSO, but I also use GitHub for other things, so I know that I
have a session there. On the other hand, I would not remember to log out of
Keycloak because that is literally only used for SSO.

## Zombie Sessions

Having centralized account management is nice. When a person leaves your
organization, you can simply remove their account and they immediately lose
access.

However, as I described so far, SSO is only used for initial authentication.
After that, each application has its own session. People might hold on to their
sessions long after the SSO account has been removed.

In the OAuth use case, the access tokens connected to the central account would
also expire. But in the SSO use case, there is no standardized solution that I
know of. Each application must be handled individually.

## Permission management

When you have centralized account management, you may also want to do
centralized permission management. To a degree this is possible.

On a basic level, you can configure to which applications an account even has
access. You could also configure groups at the SSO provider that get mapped to
application groups. But in my experience, this only gets you so far. You will
probably still have some application-specific permission management.

## Conclusion

OpenID Connect is a solid SSO protocol. It also comes with a semi-automatic
[conformance test suite](https://www.certification.openid.net), which is great.
Unfortunately, it suffers from far too many options and some missed
opportunities. The job of a standard is not to show the set of possibilities,
but to restrict it. This is especially true for security-sensitive protocols
such as this one.

I do understand that some things should be pluggable. Cryptographic primitives
need regular updates. But that's basically it.

OAuth 2.1 is a great step in the right direction. I am really looking forward
to it. It seems to be active, even though it has been in draft state for a long
time.

But it still has way too many options.