All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Calendar Versioning.
24.5.2 (2024-09-30)
- add hash to bootstrap (#1168) (4b0be5f)
- bolt: add workflow commands to bolt (#1131) (a47b296)
- builds: add prewarm ats for builds (#1176) (0a269b4)
- allocation sizes for nomad (#1127) (2e3217e)
- bolt: add forwarded and persistent db shells (#1130) (f9b6707)
- builds: allow null tags (#1177) (1c75e3b)
- builds: fix exclusive tags query (#1173) (b8c323a)
- clusters: allow dns deletion when draining and tainted (#1132) (c808d08)
- cluster: skip pruning servers without provider server id (#1133) (ca43432)
- ds, mm: hard code disk per core (#1134) (5ee2809)
- ds: add back runc cleanup (#1172) (8e08889)
- fix build tags (#1190) (6e2d214)
- fix documentation link for errors (#1174) (eb7fdaf)
- job-run: fix dupe allocs, re-enable drain all (#1128) (d019e01)
- mm, ds: fix dupe alloc killing (#1124) (dcdb06a)
- more accurate job-run cpu metrics (#1122) (312958e)
- reduce scheduler skew on distributed clusters (#1175) (2794e09)
- workflows: add silence ts (#1129) (06d965b)
- workflows: add error message for max sql retries (#1125) (80a33f0)
- workflows: add retry delay for txn errors (#1138) (614846b)
- workflows: use unions instead of OR (#1170) (1ca8ab6)
- add back node exporter metrics (#1136) (1eeedcb)
- enable batch ssh commands (#1119) (505c09c)
- increase install timeout (#1139) (38584c9)
- increase nomad heartbeat ttl (#1140) (437494a)
- linode: pin kernel version (#1123) (48686f7)
- release 24.5.2 (90318ca)
- remove bolt templates (#1135) (f0925f0)
- revert new node exporter metrics (#1118) (07b6095)
24.5.1 (2024-09-04)
- clusters: add drain padding to nomad (#1100) (01ee21b)
- clusters: fix list lost op (#1110) (8ae85d2)
- clusters: gracefully handle node not found (#1099) (b460374)
- clusters: remove nomad drain complete signal (#1101) (c117224)
- clusters: switch from drain to ineligible system (#1102) (09f5143)
- ds: change nomad prefix (#1113) (705a470)
- ds: implement nomad monitors with signals (#1105) (238a8e9)
- fix signal history divergence (#1115) (3cbfc1b)
- job-run: delete second allocation immediately (#1104) (78b73fd)
- nomad: readd allocation metrics (#1109) (600d4fb)
- update api endpoint names (#1080) (33e780d)
- workflows: add retry to internal sql queries (#1112) (ef010d0)
- workflows: implement backoff for timeouts (#1111) (6659b34)
- main: release 24.5.0 (#1103) (7652421)
- release 24.5.0 (1657c7c)
- release 24.5.1 (12f7ee9)
- update all uses of workflows to new syntax (#1108) (0079be8)
- workflows: clean up internal contexts (#1107) (2148f9e)
24.5.0 (2024-08-27)
- add json cache (#939) (7c2897a)
- add ready_ts to servers endpoint (#1006) (8b44a7c)
- add server logs endpoint (#1005) (a23073b)
- better_uptime: allow disabling notifications (#923) (7eb12b0)
- bolt: add k9s to nix-shell (#903) (7668942)
- bolt: add lost servers list and prune commands (#1096) (0480702)
- bolt: build svcs as docker containers locally (#945) (11f4258)
- bolt: run tests in containers (#947) (08a53e3)
- clusters: add toggle for prebakes (#932) (09890e5)
- clusters: convert clusters to new workflow system (#974) (0c5558b)
- clusters: gg monitor for better uptime (#921) (152c55b)
- combine ops and workers into one svc type (#957) (774da5c)
- ds: add datacenters endpoint (#1065) (32d448e)
- ds: add server create failed message (#1068) (82daf2d)
- ds: rewrite dynamic servers on workflows (#1060) (c9b5578)
- infra: auto-create dev tunnel & public ip (#979) (0d82155)
- infra: enable configuring min & max cockroach pool conns (#922) (e8e7255)
- runtime: switch from json to logfmt (#984) (10a0e6c)
- svc: add servers create endpoint (#740) (77f1b3f)
- update billing to use tiers (#900) (918038a)
- workflows, clusters: add workflow backfill service (#1000) (e69b767)
- workflows: add api ctx for workflows (#865) (1a468d3)
- workflows: add loops (#1001) (272a09d)
- workflows: add message and signal history (#987) (0003acc)
- workflows: add messages (#977) (38c1171)
- workflows: add metrics (#1008) (a4837e2)
- workflows: add nats worker wake (#1039) (1fc72f1)
- workflows: add observe workflows fn (#901) (22a1ebd)
- workflows: add operations service type (#898) (0a0d377)
- workflows: add sleep fn (#1077) (c477ba9)
- workflows: add tags (#956) (36494eb)
- workflows: allow changing tags in workflow (#962) (01ecf86)
- workflows: implement retry backoff for activity errors (#999) (6e8560e)
- add ip whitelist to tunnels (#930) (88ce4b3)
- add players and servers db indexes (#960) (53dc398)
- add priority class to nats (#1019) (954d864)
- api: move cors verification to endpoint level (#1094) (4a4b4fe)
- backfill script, crdb usage type (#1089) (ad0a260)
- better_uptime: handle null verify_ssl (#950) (e9d8edb)
- bolt: correctly hash untracked files (#1047) (2b885e5)
- bolt: exclude volumes when using native docker builder (#969) (8ac0a55)
- bolt: explicitly handle no nomad leader error (#971) (20822fc)
- bolt: update opengb -> backend env var name (#1058) (4250808)
- bolt: validate hub regex in ns config (#1093) (b2d5cca)
- cache: mixed values in Cache::fetch_all (#927) (d69a072)
- captcha: sanitize form body (#1098) (9b56efc)
- chirp: write message tail when history is disabled (#997) (9f377ba)
- cloud: add clean timeout for matchmaker logs (#942) (a395e3f)
- cluster: dc-get column mismatch (#958) (53e276a)
- cluster: dns creation (#1066) (1ef72e6)
- clusters: add network_out metrics for hardware (#1016) (30d15c3)
- clusters: backfill json columns (#1015) (2292103)
- clusters: continue provisioning a server even when marked for deletion (#924) (8b551f4)
- clusters: dont delete servers immediately with linode (#1040) (6142837)
- clusters: fix backfill signal names (#1086) (2c8ae1c)
- clusters: fix dc scale job downscale logic, prebake disk waiting (#1078) (bda60e0)
- clusters: fix dns and unrecoverable error bugs (#1083) (273e5a3)
- clusters: fix linode cleanup logic (#1034) (f7d021c)
- clusters: fix linode-gc query (#1063) (eb0223c)
- clusters: fix tls renew query (#1026) (81a7b7a)
- clusters: fix trafficserver run dir permissions on reboot (#1021) (746198b)
- clusters: fix vlan ip query (#911) (0ab1ec9)
- cluster: split up backfill query from schema change (#1023) (4987029)
- clusters: query vlan ips per datacenter (#961) (c2a7e3f)
- clusters: resolve ip by create ts (#1037) (7033c6e)
- clusters: run scale workflow instead of signal (#1041) (cbe6f89)
- clusters: update pools in dc-update (#959) (9b31345)
- disable job migrations and reschedules (#1017) (91e869d)
- ds: add back allocation signal (#1069) (453a19b)
- ds: cache traefik routes (#1081) (4b3a1ab)
- ds: disable retries for nomad monitors (#1091) (945b5bb)
- ds: fix destroy query (#1067) (f67150f)
- ds: fix ds tests, traefik, nomad monitors, job server drain (#1085) (d29bb3f)
- ds: fix logs (#1074) (21dbd6c)
- ds: fix server list & nomad monitor alloc plan queries (#1071) (eb0252c)
- ds: fix servers (#1061) (4e8185b)
- ds: remove reschedule block (#1082) (4488c74)
- ds: update auth endpoints (#1044) (11416c4)
- fix ds echo build (#1032) (ad1146e)
- group: require > 1 use count on invites (#985) (b37565a)
- infra: dynamically generate nomad server count in install script (#981) (9c433d8)
- infra: force linux/amd64 platform for building job-runner artifact (#937) (1a32f90)
- infra: pass dynamic tunnel host port to cluster-server-install (#980) (8be472f)
- infra: re-run sshd config if dev tunnel machine recreated (#978) (7fa5cff)
- infra: remove dep on unused api_route secret (#935) (7fca24b)
- infra: remove k8s_infra -> cockroach_k8s circular dependency (#936) (41b6cdb)
- infra: resolve correct cockroachdb remote state (#976) (8413349)
- ip: cache ip queries (#907) (c36d150)
- k3d: mount host volume for PVCs (#1018) (07fae51)
- loops and cache (#1010) (bccce31)
- mm: clean up players from gc zset (#914) (d6d05f6)
- mm: move runtime aggregate logic into query (#966) (e545271)
- mm: skip prewarming ats if no nodes booted (#970) (61e9f14)
- opengb: opengb. -> backend. (#919) (dfe5f8b)
- remove trailing slash from endpoint (#1012) (b3bd44f)
- revert hotfix (#934) (115f02e)
- servers cors (#1013) (e46edfb)
- servers: fix broken insert (#1033) (6e79bc7)
- servers: remove migrate block (#1027) (eab8ec4)
- servers: use correct timeout for sleeping (#1076) (0c58f83)
- ssh: force user for ssh commands (#949) (ba02a16)
- update cloudflare crate (#1009) (4e478f1)
- workflow ts hotfix (#933) (20796db)
- workflow: fix sleep logic (#1084) (3202fdf)
- workflows: add back location bump to catch unrec (#1087) (4816533)
- workflows: add idx (#1038) (d825483)
- workflows: add limit to pulling workflows (#1020) (6766ea0)
- workflows: add sql retries, improve history diverged errors (#995) (9b0724f)
- workflows: add ts dt (#943) (1b362fd)
- workflows: dont delete signal rows (#965) (be67080)
- workflows: fix backfill (#1025) (6f7c94c)
- workflows: fix docs on macros (#1075) (1175ae5)
- workflows: fix gc, event history graph, internal naming (#963) (8b97b32)
- workflows: fix invalid error wrapping (#1092) (7014d1b)
- workflows: fix invalid event history graph (#996) (fe2c38e)
- workflows: fix listening traits (#988) (0e56121)
- workflows: fix loops queries (#1042) (63a7601)
- workflows: increase metrics publish interval (#1050) (b46300c)
- workflows: rename signals, improve failure handling for server install (#1043) (40cb84a)
- workflows: Throw errors for duplicate workflows (#1011) (53c3aeb)
- add build get endpoint (#1046) (e4f03fb)
- add game id to server endpoints (#1014) (31f586f)
- add historical server query (#1056) (c3d7c96)
- add lines to provisioning metrics (#912) (d0371e0)
- add sqlx max connection timeout jitter (#916) (4513a1f)
- archive old linode servers table (#1052) (f6126f6)
- bolt: add color to cargo build with docker (#1035) (7c324e5)
- bolt: update lockfile (#1029) (2140c0a)
- bolt: upgrade rust to 1.80.0 (#1028) (44f6aa7)
- build: add patching build tags (#1048) (812b7e2)
- cache mm-config-version-get (#913) (3b24383)
- clean up fern naming (#1045) (f4c13a8)
- cleanup runtime aggregate op (#902) (538d9b8)
- cloud: update default version format to not use special characters (#1003) (accb1d8)
- cluster: cache datacenter-get and datacenter-location-get (#908) (8863a8b)
- clusters: remove git as a dependency for cluster util (#931) (7c7eec3)
- ds: fix started_at server conversion (#1073) (ec498fb)
- ds: split up destroy wf + add progress msg (#1072) (fb3168b)
- fern: update fern (#1022) (e6fe279)
- fix dynamic servers merge (#1007) (07c4a75)
- fix monolith worker out of date (#1055) (387ee6b)
- group better uptime monitors (#972) (f57ba69)
- handle game version configs with bad proto migrations (#926) (853cf06)
- increase sql conn acquire rate limits (#915) (deca712)
- increase ttl of public tokens (#905) (93e705c)
- increase workflow tick interval (#941) (fb75556)
- infra: pin k3d image version (#975) (088e05e)
- k3d: disable volumes if using use_local_repo (#954) (c375325)
- make logs query consistent with nanoseconds (#862) (4ffab51)
- migrate from game service to env service tokens (#1054) (2bf6db2)
- migrate servers to use envs (#1053) (6b50e9e)
- read job-runner from ats (#968) (3fa0611)
- remove duplicate smithy code (#946) (7ebe1f1)
- remove servers webhook (#1051) (6c6282d)
- rename lib/types -> lib/types-proto (#986) (c4d40af)
- tls: remove unneeded acme registration (#953) (9c2e884)
- traffic-server forward script (#909) (a3528db)
- tweak pool opts (#1002) (74e36c0)
- tweak pool opts (#1004) (786829f)
- update opengb -> backend rename (#1049) (a5febc2)
- update opengb cf worker names (#1064) (904c024)
- update start_ts to be set when networking is ready (#1062) (22b3fec)
- update typescript sdk (#1031) (0e6d5fb)
- update workspace (#1030) (f738b17)
- workflows: add workflow name to logs (#928) (a3b31e0)
- workflows: clean up imports (#998) (9498cab)
- workflows: clean up internals (#899) (b840019)
- workflows: remove foo pkg (#964) (7165aed)
24.4.1 (2024-06-06)
- add compat layer between old ctx and new workflows (#788) (787971b)
- add ray ids to workflows, clean up types (#787) (3072bdc)
- add workflows (#783) (378d528)
- global error raw variant (#784) (4b11578)
- run sub workflows in the same process (#789) (717e096)
- workflows: add retries and timeouts (#860) (cc0b893)
- workflows: add worker instance failover (#854) (c5a32a3)
- cast workflow errors to raw global errors (#785) (c90d939)
- draining and tainted server grafana chart (#855) (d0cdb38)
- mm: add index for run_proxied_ports (#868) (e0785e9)
- mm: call mm-lobby-cleanup from mm-gc even for preemptive lobbies without sql row (#856) (5315a9a)
- mm: correctly handle lobby not found error if joining directly to lobby id that doesn't exist (#867) (af3513a)
- mm: require specifying matchmaker config for new game versions (#895) (92d86fd)
- tls: provision cloudflare cert pack if opengb enabled (#869) (1dafa9e)
24.4.0 (2024-06-04)
- Cleanup API definitions, module imports (#534)
- add 1password integration docs (#595) (29045ea)
- Add cluster admin cli (#644) (5b1de57)
- add crdb data source to grafana (#732) (f22694f)
- add env update error (#814) (48a5883)
- add hacky secondary ingress route for game lobbies (#567) (8bb6bd6)
- add internal api monolith (#641) (f25ffe4)
- Add managed OpenGB (#535) (9085d51)
- add opengb to bootstrap (#844) (ebd3c7b)
- add operation to list all clusters (#717) (1f4b169)
- add patch method to router (#744) (ed6596c)
- add pool filter to cluster dashboard (#830) (5436461)
- add provider api token to all linode calls (#613) (3882047)
- add provisioning dashboard (#733) (a1f9dcc)
- add ray id to error body (#833) (c115d6f)
- add region list/resolve per game (#633) (92275d8)
- Add script to run cargo clean (#700) (0f653e2)
- add toggle for load tests (#583) (a78d682)
- add vector http source (#800) (f4f2734)
- api-admin: add server destroy endpoint (#838) (4ff616b)
- bolt: list datacenter CLI command (#728) (c4a88de)
- bolt: update datacenters from CLI (#727) (083cd19)
- configurable drain ts per pool (#684) (f88c457)
- dynamic TLS generation (#635) (66e49dd)
- grafana: rivet logs dashboard (#724) (9a43f3a)
- infra: add ability to provision dev tunnel (#692) (659f8a1)
- Infra: Loops welcome email (b2e4006)
- nix: skip building bolt in nix with NIX_SKIP_BOLT (#664) (8e16a94)
- svc: resolve cluster name id op (#751) (58200ec)
- add last upload id (#745) (d10d917)
- add min count to autoscaler (#826) (9fe12a1)
- add patch to CORS (#848) (09f3ddc)
- add region to dns for path routing (#574) (e10ad25)
- add transactions (#689) (f55b7e6)
- add transactions and locks (#696) (477ade5)
- api admin hub endpoint is incorrect (#660) (0aff347)
- api-status: auto-delete lobby after testing connection (#770) (9803f39)
- ats: don't send requests to ats nodes without install_complete_ts (#807) (618a429)
- bolt: copy & install git in docker for cluster build.rs (#769) (12bf1d4)
- bolt: correctly check for existing env var (#705) (ca4e48d)
- bolt: dont fully parse config when pulling (#816) (d22b08b)
- bolt: uncomment provisioning check (#749) (f25bead)
- bolt: update rust test package_id parsing (#622) (3d987ab)
- Change sdks linguist-vendored to linguist-generated (#662) (602749f)
- change test relative path (#754) (daf1d07)
- check for draining before installing/creating dns (#773) (cbe450b)
- chirp: add bypass for recursive messages (#708) (566088f)
- CI regression (#713) (636f0d3)
- claims (#672) (d61e290)
- clean up nomad jobs per test (#596) (6d7f0ee)
- Cleanup API definitions, module imports (#534) (0e0660a)
- cluster: delete dns record after failure to create (#827) (35fc6fe)
- cluster: don't taint servers that have cloud_destroy_ts (#839) (e5256f1)
- cluster: gg dns records leak if server destroyed before install complete (#842) (e63f242)
- cluster: handle failed tls issuing gracefully (#825) (9aa424b)
- cluster: disable prebake images (#813) (cdb6133)
- contention bugs (#707) (d8a5d33)
- datacenter taint draining too soon, datacenter update not updating drain timeout (#763) (55073a4)
- default build creation (#582) (1ec0ba5)
- delegate more functionality to dc-scale (#674) (a5be980)
- detect-secrets: pin detect secrets version (#786) (9db9d3c)
- docs (#667) (c5b33fa)
- encode query parameters in migrations (#579) (17ba1d1)
- expand prebake image variant system (#628) (af41308)
- feature flag more tests (#581) (be0e3e9)
- fern: remove dupe fern gen from bad merge (#725) (982d388)
- Fix nix build of bolt on macOS (Darwin) (#589) (3343b06)
- fix user relationship test (#616) (4edd90c)
- force reload tls certs (#736) (599cb8b)
- game guard ingress routes getting cobbled (#569) (bd3a73f)
- game, ip, and job tests (#566) (1607c40)
- get all api tests passing or disabled (#565) (431bfa5)
- get mm tests working again with provisioning (#711) (0b27dc2)
- get tests working with new target (#737) (3d3e37a)
- get todo tests working (#573) (38ed2da)
- get upload tests working (#572) (ace12d9)
- gracefully delete secondary dns record (#828) (94cc2ae)
- grafana: add back default prometheus dashboards (#771) (30f41ee)
- grafana: fix circular dependency between grafana <-> cockroachdb_managed (#760) (46e3bf0)
- grafana: fix pool_type query on cluster nomad panels (#840) (d99d466)
- hotfix check ci (#719) (974b7f4)
- increase default api-route resources for distributed (#559) (dc6cd79)
- infra: gg tls certs timer & precreate tls dir (#812) (b4b707e)
- infra: remove high cardinality prometheus metrics (#835) (e554984)
- infra: upgrade karpenter to 0.32 & disable compaction (#834) (0976245)
- ip-info test (#631) (5fc1e16)
- job-run: add index for run_meta_nomad.node_id (#810) (4559152)
- job-run: correctly clean up leaked proxied ports (#832) (824936f)
- job-run: don't write job proxied port if job already stopped (#841) (4466d82)
- job-run: fix leaking jobs with wrong param order (#815) (6350c72)
- job: gc was not stopping jobs which failed to stop on nomad (#617) (67ab5eb)
- k8s_infra: resolve invalid tf types (#742) (565b044)
- leaked dns records (#765) (163beaf)
- make default cluster opt in (#632) (c98e6aa)
- make nsfw check verbose error optional (#746) (3fb5195)
- mm fixes (#731) (c987736)
- mm tests (#570) (c99a410)
- mm: broken cache (#806) (12ac484)
- mm: only add to available spots if lobby is running (#843) (9b15294)
- move crdb user grants to post migration query (#757) (fbb474d)
- move grafana to its own helm chart (#741) (1be990b)
- node draining (#721) (2432a40)
- nomad: increase storage size to recommended capacity (#818) (9f78ba5)
- only generate path proxied port for https routes (#587) (29985ce)
- only select primary hostname in mm endpoints (#577) (3d8e55d)
- opengb: add dedicated error for neon projects exceeded (#847) (95b7711)
- pass tags to lobby create (#619) (fd7d90c)
- patch signal endpoint with nomad client (#712) (2891b0f)
- reenable better stack (#669) (31d0e43)
- remove /join regression (#687) (0b4af4c)
- remove absolute path from http vector sink (#851) (58c21fc)
- remove duplicate trace in op ctx (#845) (dc9812c)
- remove erroneous dep on linode & cloudflare tokens (#649) (259abd8)
- remove hardcoded eks role (#586) (f1546c6)
- Remove old module code (#533) (689d203)
- remove trace from ops (#780) (d4b80f6)
- rename api-route -> api-traefik-provider (#697) (3bf5a1f)
- require tunnel before rivet hook (#714) (22f962f)
- resolve minio url within k8s when using loopback cluster ip (#580) (9bd3c83)
- revert #800, add http vector filter (#821) (b154bb6)
- route and access token tests (#578) (4d8816a)
- run all tests in one pod (#615) (3db1a8c)
- server sql (#715) (7c0418d)
- standardize token ttl (#686) (f17d652)
- start dns creation after installation (#829) (e4e7e21)
- svc: change cluster name_id to be unique (#752) (cea1fe7)
- taint logic for job nodes with no nomad node (#774) (97f6b72)
- team tests (#571) (3265c66)
- test isolation and install script hashing (#671) (495a7a5)
- tls install script not running on first boot (#764) (c13a3ed)
- tunnel: add legacy route for api-route for gg nodes (#767) (f2e05ab)
- universal region backwards compatibility regression (#792) (44d4c0d)
- update rust nix pkg (#648) (91792d0)
- user-presence: broken redis query (#802) (a899774)
- verify different tags give different lobby (#620) (8228371)
- add api scope to dev tunnel docs (#747) (86a45f7)
- Add doc about creating new endpoints (#645) (f8f4ccc)
- Fern installation instructions for script (#643) (e07ddb3)
- update debugging loki command (#852) (ef20e84)
- updating readme pricing information (#850) (21d3a4e)
- Disable Prettier checking on changelog for now (#563) (8bfad8f)
- Fix release please not adding all items to changelog (#560) (7191325)
- Add Cargo.lock to generated list (#710) (ec1c842)
- add comments, region consistency (#685) (9fe643f)
- add datacenter location get test (#673) (79ac6e2)
- add forwarding script for vector (#836) (ae7299d)
- add plugins to readme (#781) (354ab1d)
- add target directory in dockerfile (#755) (27ab366)
- api: move games/builds to game/docker/builds (#759) (0e169ad)
- apply prettier formatting (#849) (5caada5)
- bolt: add server filters & update admin api + cli (#804) (e789bf0)
- bolt: upgrade rust to 1.77.2 (#768) (5cc18f0)
- change devcontainer user off root (#743) (af3566a)
- cherry pick billing feature (#597) (afe4dd0)
- cherry pick req extensions (#738) (a014955)
- clean up dev docs, update readme (#661) (e306a77)
- clean up ip types (#709) (64eefd9)
- clean up server install scripts (#682) (2564c12)
- cleanup (#670) (1c2666c)
- cleanup hash code (#639) (fc17cee)
- clippy fix pass (#790) (4e95737)
- cluster: increase storage reserved for system on ats (#723) (0945af7)
- dev: move rust-analyzer CARGO_TARGET_DIR to separate dir (#680) (abe64a8)
- dev: respect CARGO_TARGET_DIR in bolt & use non-mounted target in dev container (#675) (eb1a6cf)
- doc drain & kill timeouts (#646) (332f88c)
- dont destroy anything (#683) (2e50434)
- fix deprecated analytics events fields (#777) (e771f91)
- fix queries and install script (#735) (90b7fc6)
- grafana: clean up provisioning dashboard (#820) (3b1d123)
- infra: disable vpa for prometheus & traffic server (#817) (5da29a4)
- infra: increase better uptime check interval to 1m b/c we already have 4x regions (#819) (6727bdf)
- job: gc orphaned jobs from mm (#627) (a6ce505)
- k8s: update priority classes to play nice with karpenter & preemption (#801) (831044d)
- misc fixes (#706) (875b249)
- move bolt cluster subcommand to root (#803) (345d26d)
- move region_config.json to configmap (#621) (49e439e)
- publish user-create-complete (#539) (b2e4006)
- push-notification: remove unused push notification code (#776) (ee2893e)
- release 24.4.0 (#853) (ab2ee63)
- remove cluster_id from servers (#695) (0ca61a8)
- remove unnecessary files (#668) (c5d0f81)
- remove unused code (#778) (e2f4f13)
- replace auto-generate public ip with 127.0.0.1 (#650) (21d2ad1)
- Run cleaning (#701) (4955e28)
- run imports formatting (#779) (1c0bbf8)
- standardize custom image list size (#688) (8086559)
- update baseline secrets (#663) (54f3135)
- update default builds (#824) (a6d5854)
- update devcontainer docker base image (#739) (e91d538)
- update recovery & confirmation period for better uptime (#716) (ee7547b)
- Update sdks (#642) (8dbcfc5)
- vector: filter unneeded go & prom metrics (#837) (041ae05)
24.3.0 (2024-03-01)
- bolt: add region filter to ssh command (#537) (af274a8)
- expose nomad dashboard via cloudflare tunnels (#543) (3a574c0)
- Main: Added Devcontainer files (9bb97db)
- mm: add config to opt-in individual games for host networking & root containers (#549) (be9ddd6)
- add checksum annotations to cloudflared deployment (#542) (f2d847b)
- bolt: clarify 1password service token warning (#541) (eb2e7d5)
- correct hcaptcha length (#548) (748aaa8)
- inaccessible admin routes (#555) (9896b09)
- revert to redis-rs v0.23.3 with panic patch (#552) (3780eaa)
- updated docs error url (#544) (7099658)
- Reduced minimal infrastructure required to get Rivet running:
- Made K8s Dashboard disabled by default
- Made Prometheus and friends (Vector, Loki, Promtail) disabled by default
- Made Clickhouse disabled by default
- Made NSFW Check API disabled by default
- Made Image Resizing (via Imagor) disabled by default
- Infra Added Better Uptime monitor
- Bolt Add Docker RUN cache to distributed deploys to improve deploy speeds
- Infra Prometheus VPA
- Infra Apache Traffic Server VPA
- api-cloud Admins can view all teams & games in a cluster
- Added automatic deploy CI for staging
- Infra Added compactor and GC to Loki
- api-status Test individual Game Guard nodes to ensure all nodes have the correct configuration
- Generate separate SDKs for runtime (lightweight, essentials for running a game) and full (heavy, includes cloud APIs)
- Metrics for cache operations as well as a Grafana dashboard
- Bolt Added namespace config and secrets sync with bolt config pull and bolt config push via 1Password
- GROUP_DEACTIVATED error now shows reasons for deactivation. Added docs for deactivation reasons
- /health/essential endpoint to test connectivity to all essential services
- Added error when trying to deploy a distributed cluster on a non-linux-x86 machine (not supported)
- api-status More comprehensive status check that both creates a lobby & connects to it
- More details in CLAIMS_MISSING_ENTITLEMENT error
- API Added 120s timeout to reading request body and writing response to all requests going through Traefik
- Infra Update Promtail logs to match k8s semantics
- Infra Added Cache-Control: no-cache to 400 responses from CDN
- [BREAKING] Infra Removed config-less hCaptcha. You are now required to provide a site key and secret key for the hCaptcha config in your game version matchmaker config for all future versions (old versions will remain operational using our own hCaptcha site key).
- Internal Updated source hash calculation to use git diff and git rev-parse HEAD
- API Removed x-fern-* headers from generated TypeScript clients
- Implemented liveness probe to check connectivity to essential services
- Remove public-facing health check endpoints
- API Removed ability to choose a name id when creating a game. One will be generated based on the given display name
- Infra Reduced allocated cache size on ATS nodes to prevent disk exhaustion
- Bolt Prompt prod won't prompt if it does not have user control
- Bolt Exclude copying bloat from infra/tf/ to distributed Docker builds
- Invalid JWT tokens now return explicit TOKEN_INVALID error instead of 500
- Infra Remove debug logging from traefik-tunnel
- Game lobby logs now ship even when the lobby fails immediately
- Fixed CLAIMS_MISSING_ENTITLEMENT not formatting correctly (reason given was ?)
- Added role ARN to exec commands in k8s-cluster-aws tf provider to properly authenticate
- Change email attached to Stripe on group ownership change
- Enable keep-alive on redis crate
- Update redis crate to mitigate panic on connection failure during AUTH
- Wrong grace period for GG config to update after mm::msg::lobby_ready
- Resolve RUSTSEC-2024-0003
- Infra New job-runner crate responsible for managing the OCI bundle runtime & log shipping on the machine
- Infra Jobs now log an explicit rate message when logs are rate limited & truncated
- Infra infra-artifacts Terraform plan & S3 bucket used for automating building & uploading internal binaries, etc.
- Infra Aiven Redis provider
- Bolt bolt secret set <path> <value> command
- Bolt bolt.confirm_commands to namespace to confirm before running commands on a namespace
- watch-requests load test
- mm-sustain load test
- Infra Automatic server provisioning system
- Matchmaker Allow excluding matchmaker.regions in order to enable all regions
- Matchmaker Lowered internal overhead of log shipping for lobbies
- Matchmaker Game mode names are now more lenient to include capital letters & underscores
- API Return API_REQUEST_TIMEOUT error after 50s (see docs/infrastructure/TIMEOUTS.md for context)
- API Move generated client APIs to sdks/
- API Lower long poll timeout from 60s -> 40s
- Bolt Moved additional project roots to Bolt.toml
- types Support multiple project roots for reusing Protobuf types
- Infra Switch from AWS ELB to NLB to work around surge queue length limitation
- Infra Loki resources are now configurable
- pools Allow infinite Redis reconnection attempts
- pools Set Redis client names
- pools Ping Redis every 15 seconds
- pools Enable test_before_acquire on SQLx
- pools Decrease SQLx idle_timeout to 3 minutes
- pools Set ClickHouse idle_timeout to 15 seconds
- api-helper Box path futures for faster compile times
- Upgrade async-nats
- test-mm-lobby-echo now handles SIGTERM and exits immediately, allowing for less resource consumption while testing lobbies
- mm Dynamically sleep based on lobby's create_ts for Traefik config to update
- Infra Update Traefik tunnel client & server to v3.0.0-beta5
- Infra Update Traefik load balancer to v2.10.7
- Resolve RUSTSEC-2023-0044
- Infra runc rootfs is now a writable file system
- Matchmaker Logs not shipping if lobby exits immediately
- Matchmaker Returning lnd-atl instead of dev-lcl as the mocked region ID in the region list
- API 520 error when long polling
- api-cloud Returning wrong domain for domains.cdn
- Infra Fix Prometheus storage retention conversion between mebibytes and megabytes
- Infra Fix typo in Game Guard Traefik config not exposing API endpoint
- Infra Kill signal for servers was SIGINT instead of SIGTERM
- Infra NATS cluster not getting enabled
- Infra Redis Kubernetes error when using non-Kubernetes provider
- api-helper Remove excess logging
- user_identity.identities not getting purged on create & delete
- Bolt Error when applying Terraform when a plan is no longer required
- api-helper Instrument path futures
- Infra CNI ports not being removed from the nat iptable, therefore occasionally causing failed connections
- Infra Disable nativeLB for Traefik tunnel
- Infra Update default Nomad storage to 64Gi
- Infra Tunnel now exposes each Nomad server individually so the Nomad client can handle failover natively instead of relying on Traefik
- Infra Traefik tunnel not respecting configured replicas
- Bolt ClickHouse password generation now includes required special characters
- Infra Lobby tagging system for filtering lobbies in /find
- Infra Dynamically configurable max player count in /find and /create
- Bolt Added bolt admin login to allow for logging in without an email provider setup. Automatically turns the user into an admin for immediate access to the developer dashboard.
- Bolt Fixed bolt db migrate create
- Infra Added user-admin-set service for creating an admin user
- api-cloud /bootstrap properties for access and login_methods
- Bolt Removed bolt admin team-dev create. You can use bolt admin login and the hub to create a new dev team
- Infra Turnstile CAPTCHA_CAPTCHA_REQUIRED responses now include a site key
- Infra Turnstile is no longer configurable by domain (instead configured by Turnstile itself)
- Infra Job log aggregating to use Vector under the hood to insert directly into ClickHouse
- Matchmaker Players automatically remove after extended periods of time to account for network failures
- Infra Job logs occasionally returning duplicate log lines
- Matchmaker /list returning no lobbies unless include_state query parameter is true
- Matchmaker Players remove correctly when the player fails to be inserted into the Cockroach database and only exists in Redis
- Chirp `tail_all` default timeouts are now lower than the `api-helper` timeout
- api-kv Batch operation timeouts are now lower than the `api-helper` timeout
- Bolt Development cluster can now be booted without any external services (i.e. no Linode & Cloudflare account required, does not require LetsEncrypt cert)
- Infra Autoscale non-singleton services based on CPU & memory
- Infra Support for running ClickHouse on ClickHouse Cloud
- Infra Support for running CockroachDB on Cockroach Cloud
- Infra Support for running Redis on AWS ElastiCache & MemoryDB
- Infra Dynamically provisioned core cluster using Karpenter
- Infra Dual-stack CNI configuration for game containers
- Infra Job iptables firewall on the job pool that whitelists inbound traffic from Game Guard to the container
- Infra Job iptables rules to set the minimize-delay TOS for traffic without a TOS
- Infra Job iptables rules to set the maximize-throughput TOS for traffic from ATS
- Infra job Linux traffic control filters to prioritize game traffic over other background traffic
- Infra Prewarm the Traffic Server cache when a game version is published for faster cold start times on the first booted lobby in each region
- Infra Envoy Maglev load balancing for traffic to edge Traffic Server instances to maximize cache hits
- Bolt Timeout for tests
- Bolt New summary view of test progress
- Bolt `config show` command
- Bolt `ssh pool --all <COMMAND>` command
- Bolt Validation that the correct pools exist in the namespace
- Bolt Validation that the matchmaker delivery method is configured correctly depending on whether ATS servers exist
- Dev Bolt automatically builds with Nix shell
- Bolt `--no-purge` flag to `test` to prevent purging Nomad jobs
- Matchmaker Expose hardware metrics to container with `RIVET_CPU`, `RIVET_MEMORY`, and `RIVET_MEMORY_OVERSUBSCRIBE`
- api-cloud `GET /cloud/bootstrap` to provide initial config data to the hub
- api-cloud Dynamically send Turnstile site key to hub
- Infra Rate limit on creating new SQL connections to prevent stampeding connections
- Cleaned up onboarding experience for open source users, see docs/getting_started/DEVELOPMENT.md
- Infra Moved default API routes from `{service}.api.rivet.gg/v1` to `api.rivet.gg/{service}`
- Infra Removed the version segment from API request paths
- Bolt Tests are built in batch and binaries are run in parallel in order to speed up test times
- Bolt Tests run inside a Kubernetes pod in the cluster, removing the need for port forwarding for tests
- Bolt Remove `disable_cargo_workspace` flag since it is seldom used
- Bolt Remove `skip_dependencies`, `force_build`, and `skip_generate` flags from the `bolt up` and `bolt test` commands since they are no longer relevant
- api-route Split up routes into `/traefik/config/core` and `/traefik/config/game-guard`
- Imagor CORS now mirrors the default CORS configured for S3
- Dev `git lfs install` automatically runs in `shellHook`
- Dev Removed `setup.sh` in lieu of `shellHook`
- Replaced `cdn.rivet.gg` domains with presigned requests directly to the S3 provider
- api-matchmaker Gracefully disable automatic region selection when coords are not obtainable
- Infra Disabling DNS uses the `X-Forwarded-For` header for the client IP
- Infra Pool connections are now created in parallel for faster tests & service start times
- Infra Connections from edge <-> core services are now done over mTLS with Traefik instead of cloudflared
- Infra ClickHouse database connections now use TLS
- Infra CockroachDB database connections now use TLS
- Infra Redis database connections now use TLS
- Infra Redis now uses Redis Cluster for everything
- Infra Cloudflare certificate authority from DigiCert to Let's Encrypt
- Infra Removed 1.1.1.1 & 1.0.0.1 as resolvers from Nomad jobs due to reliability issues
- Infra Added IPv6 DNS resolvers to Nomad jobs
- Infra CNI network for jobs from bridge to ptp for isolation & performance
- Infra Remove requirement of `Content-Type: application/x-tar` for builds because of new compression types
- Matchmaker Expose API origin to lobby containers via the `RIVET_API_ENDPOINT` env var
- [BREAKING] Infra Removed undocumented environment variables exposed by Nomad (i.e. anything prefixed with `NOMAD_`)
- `LC_ALL: cannot change locale` error from glibc
- Dev Bolt uses `write_if_different` for auto-generated files to prevent cache purging
- Revert Fern TypeScript generator to 0.5.6 to fix bundled export
- Don't publish internal Fern package on tag to prevent duplicate pushes
- Update to Fern 0.15.0-rc7
- Update Fern TypeScript, Java, and Go generators
- Matchmaker Support custom lobbies
- Matchmaker Support lobby state
- Matchmaker Support external verification
- Library Support Java library
- Library Support Go library
- Cloud Support multipart uploads for builds
- Infra Support configuring multiple S3 providers
- Infra Support multipart uploads
- Infra Replace Promtail-based log shipping with native Loki Docker driver
- Infra Local Traefik Cloudflare proxy daemon for connecting to Cloudflare Access services
- Infra Upload service builds to default S3 provider instead of hardcoded bucket
- Infra Enable Apache Traffic Server pull-through cache for Docker builds
- Bolt Support for connecting to Redis databases with `bolt redis sh`
- Bolt Confirmation before running any command in the production namespace
- Bolt `--start-at` flag for all infra commands
- Bolt Explicit database dependencies in services to reduce excess database pools
- Infra Update CNI plugins to 1.3.0
- Infra Update ClickHouse to 23.7.2.25
- Infra Update Cockroach to 23.1.7
- Infra Update Consul Exporter to 1.9.0
- Infra Update Consul to 1.16.0
- Infra Update Imagor to 1.4.7
- Infra Update NATS server to 2.9.20
- Infra Update Node Exporter server to 1.6.0
- Infra Update Nomad to 1.6.0
- Infra Update Prometheus server to 2.46.0
- Infra Update Redis Exporter to 1.52.0
- Infra Update Redis to 7.0.12
- Infra Update Traefik to 2.10.4
- Bolt PostHog events are now captured in a background task
- Bolt Auto-install rsync on Salt Master
- Bolt Recursively add dependencies from overridden services when using additional roots
- KV Significantly rate limit of all endpoints
- Resolve RUSTSEC-2023-0044
- Resolve RUSTSEC-2022-0093
- Resolve RUSTSEC-2023-0053
- Portal Skip captcha if no Turnstile key provided
- Infra Missing dependency on mounting volume before setting permissions of /var/* for Cockroach, ClickHouse, Prometheus, and Traffic Server
- Chirp Empty message parameters now have a placeholder so NATS doesn't throw an error
- Chirp Messages with no parameters no longer have a trailing dot
- Bolt Correctly resolve project root when building services natively
- Bolt Correctly determine executable path for `ExecServiceDriver::UploadedBinaryArtifact` with different Cargo names