Call for positions: Baseline definition #374

Closed
ddbeck opened this issue Oct 3, 2023 · 12 comments
Labels
baseline definition (Issues related to the definition of Baseline), meeting agenda (Governance group meeting agenda item)

Comments

@ddbeck
Collaborator

ddbeck commented Oct 3, 2023

If you've joined the recent WebDX calls or participated in the Matrix chat, then you have heard that we're moving toward having the governance group make a decision about the definition of Baseline, so that we can try out a revised definition in front of actual developers. The governance group expects to have the first meeting on or after Tuesday, 10 October 2023.

How you can help: summarize your position

There have been many, many discussion threads over the past couple of months. It's been difficult to keep up. To help the governance group make a decision that considers everyone's viewpoint, please write or link to a summary of your position and chief concerns in this issue before the governance meeting.

Given the large number of participants in these discussions, please forgive me for emphasizing that a summary is highly desirable.

Prior discussions

To review past discussions, see the label https://github.com/web-platform-dx/web-features/labels/baseline-definition. I've added every discussion that was directly related to the definition. I excluded related topics, such as usage data quality or other status definitions (e.g., deprecated). But if I've missed (or mis-categorized) an issue, please let me know.

@ddbeck added the "baseline definition" and "meeting agenda" labels Oct 3, 2023
@dfabulich
Collaborator

My position is that Baseline should be time-based (N months/years after the date all browsers support the feature, aka "keystone"), it should include a broad set of browsers (including Opera and Samsung), and the time period should be long (I'm aiming for three years).

  • Time-based: As opposed to based directly on marketshare data (because we have no trustworthy source of data), browser versions (because they're harder to count than time), or purely by fiat. I wrote a doc about why this compromise makes sense. https://docs.google.com/document/d/16Gpw1bRYPZhWtrnkT6KZiRnPXUC8UDTKknVbwgjSJ2s/edit
  • Include Opera and Samsung: Specifically, I'm recommending that Baseline features should work in all of the browsers in the BCD table at the bottom of MDN. In addition to the four major browsers (Chrome, Edge, Firefox, and Safari) that would add Opera and Samsung. Opera and Samsung are Chromium-based browsers, so it shouldn't have a huge effect on Baseline.
  • Three years long: I have several reasons why.
    • Aiming for 95%: I have a chart showing caniuse marketshare percentages based on Keystone years. https://dfabulich.github.io/baseline-calculator/keystone-chart-sample.html When I eyeball that data, I note that some features that keystoned less than three years ago have less than 95% marketshare, e.g. css-logical-props has a 93.6% marketshare. More data and methodology is available here. https://github.com/dfabulich/baseline-calculator
    • Simplify the definition: We want the definition of Baseline to be simple, but we also don't want to treat a feature as Baseline if it doesn't work in the Firefox ESR release, if it only works in Safari on the latest version of macOS, etc. A long timeline allows us not to have to worry about these details.
    • Developers who care about Baseline want a long one: Some folks have argued that the Baseline timeline should be short, because most developers just ship whatever works in the latest version of their primary browser (often Chrome). But I argue that those developers don't actually care about Baseline. Developers who do care about Baseline want to avoid compatibility surprises, and that means a longer wait.
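
To make the time-based rule above concrete, here is a minimal sketch in Python. It assumes a feature is described by a mapping from browser name to the date it first shipped in a stable release; the function names, the browser keys, and the ship dates are all hypothetical and only illustrate the arithmetic (keystone = the latest first-ship date, Baseline = keystone plus N years).

```python
from datetime import date
from typing import Dict, Optional


def add_years(d: date, years: int) -> date:
    """Shift a date forward by whole years (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only raised for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def keystone_date(first_shipped: Dict[str, Optional[date]]) -> Optional[date]:
    """Date the last browser shipped the feature, or None if any browser lacks it."""
    dates = list(first_shipped.values())
    if not dates or any(d is None for d in dates):
        return None
    return max(dates)


def baseline_date(first_shipped: Dict[str, Optional[date]],
                  wait_years: int = 3) -> Optional[date]:
    """Keystone date plus the waiting period (three years in this position)."""
    keystone = keystone_date(first_shipped)
    return None if keystone is None else add_years(keystone, wait_years)


# Made-up ship dates, for illustration only.
example = {
    "chrome": date(2019, 10, 22),
    "edge": date(2020, 1, 15),
    "firefox": date(2018, 9, 5),
    "safari": date(2020, 9, 16),
    "opera": date(2019, 11, 7),
    "samsung": date(2020, 4, 2),
}
example_baseline = baseline_date(example)
```

A feature with any missing browser has no keystone date at all, so it never becomes Baseline under this sketch, however old the other implementations are.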

@tidoust
Member

tidoust commented Oct 4, 2023

Same as @dfabulich with one clarification and a few additional conditions, most of which were in the draft document that @ddbeck prepared:

  • Define "keystone" as the date all main browser engines support the feature. In other words, do not wait until Opera and Samsung support a feature for the computation of the "keystone" date. Still include Opera and Samsung in the list of browsers that need to support any baseline feature. The rationale is that the number of months is going to be high enough not to have to worry about the effective release dates of browsers that are based on the browser engines that we track.
  • Include "The feature has support in each release in pinned releases" (Firefox ESR). Pinned releases wouldn't count for the computation of the "keystone" date. The rule is just there to make it explicit that we consider these releases in our definition, even though, in practice, the rule is not going to have any impact, unless something degenerates somewhere.
  • Include "The feature is not defined with a deprecation notice, obsolescence warning, or legacy tag." For instance, appcache remained with a deprecation notice for a long period of time before it was removed from browsers, it seems a good idea to avoid flagging such a feature as baseline when there is already a plan to get rid of it.
  • Include something along the lines of "No major interoperability bugs with the feature are known to exist". This criterion is more subjective than the others but is again only meant to catch degenerative scenarios.

I don't feel strongly on the effective number of years to start with but would err on the longer part of the spectrum: 2 years, 2.5 years, or 3 years. As far as I can tell, with the browser usage data that we have at hand, we cannot evaluate marketshare precisely, so I read percentages like 93.6% and 95% as describing the same reality. I view the marketshare phases more as a rapidly growing phase followed by a (high enough) plateau, which the analyses that were made seem to confirm. I would aim for the plateau.

@ddbeck's document also contained a "The feature has a known specification, published and open to inspection by web developers". I don't feel that's needed in that I think interoperability trumps the need for a specification (with my standardization hat on, I would definitely try to "fix" such situations though!).

My expectation is that this definition is a starting point and will be refined over time based on feedback (it does not have to be refined, it just may).

@Elchi3
Collaborator

Elchi3 commented Oct 4, 2023

Thanks Daniel! I've read through the documents again and I have to say I'm learning new things whenever I dive into it or we talk about it, but right now I would summarize my position like this:

To me, a baseline feature:

  • has high marketshare (~95% or more). Given the lack of clear marketshare data, a time-based proxy is used. This seems to be 3 years after it shipped in all main browsers (*). (Whether 3 years is still a good proxy should be re-evaluated regularly.) In BCD and on MDN, we have had good experience with 2 years as a time-based proxy for removing irrelevant features and docs (features removed from all BCD browsers 2 or more years ago). I think any time-based proxy between 2 and 3 years should be okay. (Is there data on how software usually sets "LTS" timelines etc?)
    • (*) Main browsers: Those that are in BCD except WebViews, see below. And maybe except the Operas, for which we received requests for removal (I evaluated that against BCD's browser removal guidelines). BCD has guidelines for browser additions and removals. I guess my concern is that Baseline has no clear guidelines for which browsers to include now, or for how a browser gets added or removed in the future. (Pointers appreciated if there are any.)
  • is not excluded in ESR releases (e.g. ServiceWorkers weren't in Firefox ESRs at first). This might not be relevant if the time-based proxy is long enough.
  • has no deprecation flag.
  • has no partial implementation flag. In BCD this flag is set if a feature has major bugs and/or if it's only shipping on a particular OS (e.g. Windows only), among other things.
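
As a rough illustration of the two flag checks above, the sketch below walks a BCD-style `__compat` object held as a plain Python dict. It relies on the `status.deprecated`, `version_added`, and `partial_implementation` fields of BCD's schema; the function names are hypothetical, and the sketch deliberately ignores prefixes, runtime flags, and removed versions, and assumes the first statement in a list is the current one.

```python
from typing import Any, Dict, Iterable


def current_statement(support: Any) -> Dict[str, Any]:
    """Pick the support statement to inspect for one browser.

    A browser's support data may be a single statement or a list of
    statements; this sketch assumes the first entry is the current one.
    """
    if isinstance(support, list):
        return support[0] if support else {}
    return support or {}


def passes_flag_checks(compat: Dict[str, Any], browsers: Iterable[str]) -> bool:
    """True if the feature is not deprecated and fully shipped in every browser."""
    if compat.get("status", {}).get("deprecated"):
        return False
    for browser in browsers:
        statement = current_statement(compat.get("support", {}).get(browser))
        # None/False mean unsupported or unknown; "preview" is not a stable release.
        if statement.get("version_added") in (None, False, "preview"):
            return False
        if statement.get("partial_implementation"):
            return False
    return True
```

A check like this would only be one input among several: it says nothing about the waiting period or about ESR releases.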

In a later iteration of baseline, I would like to add:

  • ships interoperably in WebView browsers

Some research about Webviews was brought to us, but more investigation (and more/better compat data in BCD) is needed so I don't think now is a good time to incorporate that into baseline.

@captainbrosset
Contributor

Our (Microsoft Edge’s) position is less on the definition of Baseline than it is on its fundamental concept.

  • Discoverability/adoption of features

    Developers might learn to recognize Baseline as a stamp of approval, and therefore learn to ignore non-Baseline features. This leads to multiple risks:

    • Baseline risks discouraging early adopters. Browser vendors often need their help.
    • Certain features can be used, without any problem, very early as progressive enhancements, or by using polyfills. Baseline might make these features harder to discover, and therefore less used.
    • If many web developers learn to only use Baseline features, this essentially gives browser vendors who don't prioritize a feature a veto.
  • Trust issues

    Because Baseline wants to be a very condensed status, it can only be an approximation, and can’t represent all the nuances of the web. There will be cases when Baseline is wrong about a feature. E.g., a Baseline feature that, in fact, doesn’t work on a specific browser, or a feature that’s not Baseline yet but is already available everywhere.
    This might erode user trust in the Baseline concept, making it harder for developers to rely on it without checking caniuse.com or BCD.

  • Incompatibility with how feature promotion currently happens

    Web developers often hear about a feature when it gets supported in one browser, and then again when it gets supported in all major browsers. Baseline happens several years after this. It’s likely that a number of developers would have made their choices about a particular feature way before Baseline status occurred.
    This will make it harder for these developers to find Baseline useful.

  • Lack of status granularity and call to action for developers

    Baseline is currently envisioned as a traffic light style signal that mostly draws attention to features that have achieved Baseline status vs. those that haven’t.
    The lifecycle of a feature is very granular, and developer attention at every step of the way is something the web platform depends on. If a feature is missing in one engine, we need developers to ask for its implementation. If a feature is experimental, it needs to be tested.
    Not representing these states, and not giving developers clear calls to action to help, risks slowing down the development of the web platform.

The following measures might make Baseline less risky:

  • Baseline should represent the multiple stages of a feature with more precision than with a simple red/green traffic light. For example, it should say if a feature is experimental in just one browser, if it’s usable as a progressive enhancement, if it’s only missing in one browser, and offer links to the various bug trackers, etc.
  • Baseline should show how long a feature has been supported in all browsers for (and possibly show its market share) to let users make their own decisions even if the Baseline status hasn’t been reached. The status could get gradually “greener” until the feature hits the 2 to 3 years mark.
    Additionally, this would allow creating different visualizations of the Baseline data: all features that have just been implemented in all browsers, all features that have been implemented in all browsers for 3 years, etc.
  • Baseline should probably be based on the same browsers that BCD uses.

@jgraham
Collaborator

jgraham commented Oct 5, 2023

Mozilla Baseline Definition Proposal

Proposal

Baseline should be defined as the longest time matching each of the following criteria:

  • Supported in all security-supported releases of included browsers
  • At least 30 months since the feature was present in the stable channel of all major browser engines.
  • No cross vendor agreement that the feature is considered deprecated and has a suitable replacement that has reached baseline.
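
One way to read "the longest time matching each of the following criteria" is as the latest of the dates on which each criterion becomes true. The sketch below follows that reading. It is an interpretation, not part of the proposal: the function and parameter names are hypothetical, and it assumes the security-support criterion can be expressed as the date on which the last release lacking the feature left security support.

```python
import calendar
from datetime import date
from typing import Dict, Optional


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping to the month's last day."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def proposed_baseline_date(
    engine_stable_dates: Dict[str, Optional[date]],
    last_unsupporting_release_eol: Dict[str, date],
    deprecated_with_baseline_replacement: bool = False,
    wait_months: int = 30,
) -> Optional[date]:
    """Latest date at which every criterion holds, or None if one never does."""
    if deprecated_with_baseline_replacement:
        return None
    if any(d is None for d in engine_stable_dates.values()):
        return None
    # Criterion: at least `wait_months` since the last engine shipped to stable.
    time_criterion = add_months(max(engine_stable_dates.values()), wait_months)
    # Criterion: no security-supported release still lacks the feature.
    security_criterion = max(
        last_unsupporting_release_eol.values(), default=date.min
    )
    return max(time_criterion, security_criterion)
```

Keeping the two criteria as separate terms mirrors the proposal's point that they represent different considerations, even if the time-based term is the larger one in practice.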

In addition, the criteria should be subject to a regular revision process. Initially this should be based on user sentiment analysis, to ensure that the specification of baseline is achieving the desired goals. Thereafter periodic — at least annual — recalibrations of the 30 months number should be performed so that it continues to match the rate of rollout of new browser versions to end users.

If at some point it becomes practical to adopt a baseline definition that is based directly on user reach, that should be preferred over this definition.

Definitions

  • Major browser engines: We consider this to be any browser engine that's powering a widely used general purpose browser. Currently that's Blink, Gecko and WebKit.
  • Included browsers: Those which are powered by a major browser engine, have representation within the WebDX CG or the Baseline Governance group, and are accepted by the Governance Group as representing significant general purpose user agents that have enough market share to be relevant to the user reach of features.

Rationale

Baseline should represent a time at which it's safe for developers to use a feature for all their users, without fear of compatibility issues, and without additional considerations (e.g. use of polyfills).

Ideally this definition would be directly based on two considerations:

  • We cannot tell developers that a feature is safe to use unconditionally whilst telling users that a browser without support for that feature is still current / supported.
  • The feature is measured to have reached a critical mass of users e.g. via marketshare data.

For the first, we can use the "security supported" status for browsers that are participating in the baseline effort and which have a policy that we can incorporate into decision making.

For the second, however, the existing public browser marketshare sources appear to have some methodological problems (e.g. incorrect classification of browser versions, incomplete removal of non-browser traffic from the data). Private data is sometimes available, but not always, and relying extensively on private data lacks transparency.

Therefore we propose the use of a time-based proxy for the time for features to reach a large user population. The time after the feature reaches the stable release of the last browser to implement seems like a workable measure, as long as the corresponding value is picked to be conservative i.e. based on the slowest browser to update.

Based on the analysis done by Dan Fabulich, and an analysis of Firefox telemetry data, we believe that a period after that final release of at least 30 months is required for the feature to reach enough of the user population to reach the stated goal of authors being able to use it without worrying about lack of user support.

Arguably 30 months is somewhat optimistic; in Dan's analysis based on Statcounter data that corresponds to 93% user reach for features released in 2019. As such we would also be happy to increase this to 36 months, which corresponds to 94-95% of users in the Statcounter analysis and more than 95% of users in Firefox telemetry data.

Regular reassessments of this methodology, and of the numerical parameters chosen, are necessary both because there's considerable uncertainty about whether we have chosen the right approach and calibrated the initial definition correctly, and because we expect the time for features to achieve widespread user reach to itself change over time. For example browsers might switch to a more rapid upgrade cadence, thus moving users to newer versions faster and decreasing the time to baseline. Or browsers might drop support for older devices which are still widely used, leaving a large group of users on older browser releases, and increasing the time to baseline.

Finally we note that we expect the 30 months criterion to always overrule the security support criterion at present. However we think that the criteria should include both explicitly since they represent fundamentally different considerations, and any future changes to the baseline definition should not allow it to be shorter than the time for which browser versions continue to receive official support.

@foolip
Collaborator

foolip commented Oct 9, 2023

Google's position is that Baseline needs to reflect two important points in time:

  1. When a feature is available in stable releases of all major browsers.
    • In some discussions this has been called the "interop moment".
    • The main consideration is which browsers are included.
    • We support the original proposal for the core browser set: Chrome, Edge, Firefox, and Safari.
  2. When a feature is widely available to end users.
    • This is what most of the discussion has revolved around.
    • A time-based proxy seems like the most pragmatic approach.
    • We support 24 months as the definition.

These definitions should be revisited annually based on web developer feedback.

In other words, we are proposing that there are stages of Baseline, not a single state. Regarding presentation, the two stages should be sufficiently distinct and should not encourage developers to depend on new features as if they are already widely available. But it is important to distinguish between features that are on track to be widely available and those that aren't, and the "interop moment" is what puts a feature on that track.
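
A small sketch of how the two stages could be derived from a single date, assuming the "interop moment" is already known. The stage labels and function names here are hypothetical; only the 24-month threshold comes from the position above.

```python
from datetime import date
from typing import Optional


def months_elapsed(start: date, end: date) -> int:
    """Whole calendar months between two dates (partial months do not count)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def baseline_stage(interop_date: Optional[date], today: date,
                   widely_available_months: int = 24) -> str:
    """Classify a feature into one of three hypothetical stage labels."""
    if interop_date is None or today < interop_date:
        return "not_baseline"
    if months_elapsed(interop_date, today) >= widely_available_months:
        return "widely_available"
    return "interoperable"
```

Because the stage is computed from the interop date rather than stored, revisiting the 24-month figure each year would only mean changing one parameter.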

We share some of the concerns of @captainbrosset (Microsoft Edge) about Baseline potentially discouraging early adopters, and think that these stages help with that.

@ddbeck
Collaborator Author

ddbeck commented Oct 9, 2023

Speaking for myself (though I’m on contract to Google), the top four things I’m thinking about when it comes to a Baseline definition are:

  • I want a status that is guiding but ultimately descriptive. I want to simplify information about a feature’s availability—information that developers could find on their own, but sparing them the trouble. (This also means I expect the definition to change incrementally over time, in response to changes in the platform and developers' experience of it.)
  • I want a time-based approach to capture the idea of wider support, to avoid alternatives that are poorer imitations (counting releases) or suffer from practicality and trust issues (usage data).
  • I want the time range to fall 1 to 3 years after a feature has shipped in every engine, to reflect when the overwhelming majority of support availability has already been captured. I’d prefer to use a shorter or middle time frame (1 to 2 years, approximating a 90% usage threshold to reflect some uncertainty about usage and optimism about the future), but I’m not fixated on a target.
  • I want editorial overrides to deal with deprecated features, bugs, and other hazards rather than say anything outright barbarous (apologies to Orwell). A definition is a tool to help us communicate faster and more easily; it shouldn't box us into things that don't make sense. If we can think this hard about the definition, I hope we’ll bring half as much energy to making sure the individual features are fairly represented.

I’d like to acknowledge that I’ve come around to agreeing to a number of issues that I originally did not think much of, including:

  • Tracking the duration/extent of interoperability, such as noting the “keystone” date
  • Adding more browsers to the core browser set (provided we have quality data)
  • Considering usage data (even if it can’t drive the status today)
  • Considering device/browser support durations (and related matters, like the carbon intensity of favoring newer/older devices)

I’m very grateful to everyone who has participated so far—I think we’re likely to get better results than we would have if I were left to my own impulses.

Finally, there are a number of ideas that I’d like to consider later (or to encourage the WebDX community to take up as separate efforts), but I’m happy with putting them off for now:

  • Explicitly defining other statuses or annotations (e.g., single implementation, call-for-input, polyfillable, deprecated, etc.)
  • Cultivating new or better data sources for browser, operating system, device usage data, and polyfills
  • Considering web views

@bkardell

bkardell commented Oct 10, 2023

TL;DR: Our[1] position is that baseline should be defined as "reached on the day when it first shipped in all browsers with more than 1% of global marketshare according to {an agreed upon set of counters}". We propose that, to start, the set of counters would be Statcounter and Cloudflare. We believe that the best way to display baseline on MDN is a banner saying the feature achieved baseline status “{relative time} ago”.
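
The definition above can be sketched in a few lines of Python. Several details are assumptions made only for illustration: a browser is included if any one counter reports it above the threshold, the share figures passed in would come from the agreed counters (none are given here), and the wording of the banner text is made up.

```python
from datetime import date
from typing import Dict, Optional, Set


def browsers_over_threshold(shares_by_counter: Dict[str, Dict[str, float]],
                            threshold: float = 1.0) -> Set[str]:
    """Browsers that any counter reports above `threshold` percent (assumption)."""
    included: Set[str] = set()
    for shares in shares_by_counter.values():
        included |= {b for b, pct in shares.items() if pct > threshold}
    return included


def baseline_reached(shipped: Dict[str, Optional[date]],
                     included: Set[str]) -> Optional[date]:
    """Day the feature first shipped in all included browsers, if it has."""
    dates = [shipped.get(b) for b in included]
    if not dates or any(d is None for d in dates):
        return None
    return max(dates)


def relative_time(since: date, today: date) -> str:
    """Approximate banner text such as 'about 1 year ago'."""
    days = (today - since).days
    if days >= 365:
        n, unit = days // 365, "year"
    elif days >= 30:
        n, unit = days // 30, "month"
    else:
        return f"{days} day{'' if days == 1 else 's'} ago"
    return f"about {n} {unit}{'' if n == 1 else 's'} ago"
```

Note that under this definition the browser set itself moves with the share data, so the date a feature "reached baseline" could change when a browser crosses the threshold in either direction.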

In more detail:

We believe the stated positions which call for something like 30-36 months after “universally shipping” are describing a different milestone than most developers are interested in. We believe the shipping-plus-three-years measure would be better termed "ubiquitous", and that it's interesting - and very important to a minority of developers - but is not what most developers will find helpful. This straw poll's numbers indicate only 25-30% have a measure of “years” in their thinking.

We regret that the group managed to get painted into a corner where we have already created a single term and publicly suggested what it would mean in the abstract and how it would be used. We understand how it happened, and aren't trying to point fingers. It just leaves us in an unfortunate place, because Igalia believes there are multiple milestones that are interesting, rather than one, and that the emerging consensus in the group seems to be at odds with what most developers will expect.

[1] "Our" here should be read as @meyerweb's and mine. While we believe many Igalians would agree, we did not have sufficient time to feel good about saying this was Igalia's position.

@meyerweb
Contributor

After posting, Brian and I noticed our position roughly aligns with what @foolip posted. Sorry to not have noticed that before posting; we’d have written what we said a little differently.

@romainmenke
Contributor

romainmenke commented Oct 11, 2023

TL;DR: I mostly agree with Brian and Eric, but I am concerned about the requirement to consider all browsers that have more than N% marketshare.

I think that the interop moment is the most important moment to communicate to developers.
I also think that the time since the interop moment is the best way to present this to developers.

I think there should be room in the definition to override the N% marketshare requirement.
If a browser is discontinued (e.g. IE) then it could be decided to remove it from the list even if it still has some amount greater than N.


I think that the time-based definition with 2-3 years after shipping in all engines is an overcompensation for the current definition. I don't want a web where developers always wait 2-3 years before they start using new features. I also don't want a web where websites only work in the latest releases of browsers.

The current definition is a binary label that lacks any context or nuance and still aims to communicate that a feature is safe. It leaves very little room for developers to use their own judgement.

My concern is that developers are being told that something is safe to use, when the real answer always is: "it depends".

The important nuance for me is the difference between:

  1. I want to easily see if a feature has been supported for 2-3 years
  2. I want everyone to only consider those features safe that have been supported for 2-3 years

I think there is more value in letting go of the binary label.

As Brian said:

We believe that the best way to display baseline on MDN is a banner saying the feature achieved baseline status “{relative time} ago”.

This already holds enough context to leave room for developers to use their own judgement.

It communicates that all implementers agree on what the feature is, how it must work, ...
It also communicates a single data point that people can use for heuristic/instinctual decision making.


I also agree with @ddbeck that there should be plenty of escape hatches for editorial edits. The intention is to help developers. If there are edge cases where baseline is causing harm then it must be possible to address those issues without having to alter the definition itself.

@ddbeck
Collaborator Author

ddbeck commented Oct 12, 2023

Hi folks, a brief schedule update: the governance group held its first meeting on Tuesday, 10 October, as planned. However, we ran out of time before we could cover all the points of discussion, so we'll be continuing soon (date TBD). Thanks to everyone who has given their input so far—it's been very helpful in organizing our discussion.

@tidoust
Member

tidoust commented Dec 18, 2023

Closing the issue as addressed: the governance group converged on a new baseline definition, taking positions expressed here as input. The definition will be reviewed at least once per year.

@tidoust tidoust closed this as completed Dec 18, 2023
10 participants