The future of files-based metadata in moz.build files


The future of files-based metadata in moz.build files

Gregory Szorc-3
moz.build files have a mechanism for associating metadata with files [1]. We use the "with Files()" primitive [2] for:

* Associating files with Bugzilla components
* Defining how files impact certain CI components
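
For readers who have not seen one, here is a rough, self-contained sketch of what a Files() block does. The `Files` class and `bug_component_for` helper below are toy stand-ins written for this illustration, not the real mozbuild implementation, and the component names are made up. In a real moz.build file you write `with Files('**'):` and assign `BUG_COMPONENT` directly inside the block; the sandbox takes care of the rest. Pattern matching here uses `fnmatch`, which is only an approximation of the real matching rules.

```python
import fnmatch


class Files:
    """Toy stand-in for the moz.build Files sub-context (hypothetical)."""

    registry = []  # every Files block evaluated so far, in order

    def __init__(self, pattern):
        self.pattern = pattern
        self.BUG_COMPONENT = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Record the block once its body has run.
        Files.registry.append(self)
        return False


def bug_component_for(path):
    """Return the BUG_COMPONENT of the last matching block, or None."""
    result = None
    for block in Files.registry:
        if fnmatch.fnmatchcase(path, block.pattern) and block.BUG_COMPONENT:
            result = block.BUG_COMPONENT
    return result


# Roughly what a moz.build author expresses (component names are illustrative).
with Files("**") as f:
    f.BUG_COMPONENT = ("Core", "General")
with Files("docs/**") as f:
    f.BUG_COMPONENT = ("Core", "Docs")

print(bug_component_for("docs/index.rst"))
print(bug_component_for("src/main.cpp"))
```

The "last matching block wins" rule is an assumption of this sketch; it is there only to show that some combining rule has to exist.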

In the future, I fully anticipate us wanting to have a primitive that allows us to associate files with modules and/or reviewers. This would enable tools to automatically pick reviewer(s) for files that changed in a commit, for example.

Putting this files-based metadata in moz.build files seemed like the correct decision initially. moz.build files existed and it seemed better to reuse an existing solution in the same space than to invent a new wheel.

However, some deep cracks have formed that are causing me to question whether continuing to shoehorn this metadata into moz.build files is the right way to go.

"If you build it they will come." The mapping of files to bug components is valuable and people wanted an easy way to query it. So, there's an endpoint on hg.mozilla.org that allows callers to specify a set of filenames and resolve the moz.build Files() metadata [3]. Because moz.build files are Python files, this involves running a heavily sandboxed process on the server with a snapshot of the Python code for evaluating moz.build files. A big problem with this is that this Python code can only evaluate moz.build files that were valid at the time the code snapshot was made. So when we upgrade the Python code, we may be unable to read moz.build files on older revisions. And when we add new primitives to moz.build files, the old Python code may choke reading moz.build files from newer revisions. It's a giant mess and is causing problems with automated tooling that needs moz.build Files metadata lookups to "just work."

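The version-skew failure can be reproduced in miniature. The sketch below is a toy, assuming a hypothetical evaluator that only knows a fixed set of symbols; `evaluate`, `OLD_SNAPSHOT`, and `NEW_PRIMITIVE` are invented names, and `exec` with empty builtins is not a real sandbox. It only shows why a file using a newer primitive cannot be read by an older snapshot of the evaluation code.

```python
def evaluate(source, known_symbols):
    """Execute moz.build-like source with only the symbols this snapshot knows."""
    namespace = {"__builtins__": {}}
    for name in known_symbols:
        namespace[name] = []
    exec(source, namespace)
    return {name: namespace[name] for name in known_symbols}


# Hypothetical: the server's code snapshot predates NEW_PRIMITIVE.
OLD_SNAPSHOT = {"SOURCES"}
NEW_FILE = "SOURCES += ['a.cpp']\nNEW_PRIMITIVE += ['x']\n"

try:
    evaluate(NEW_FILE, OLD_SNAPSHOT)
    outcome = "ok"
except NameError as e:
    # The whole file fails, even though the caller only wanted Files metadata.
    outcome = "failed: %s" % e

print(outcome)
```
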
In bug 1402010, I wrote some PoC patches that change moz.build evaluation of Files metadata to strip the Python AST of all nodes that aren't relevant to "with Files()" blocks. This drastically reduces the surface area of things that can cause moz.build evaluation to fail when using a snapshot of the Python code for evaluating moz.build files. While that solution does work, I have my doubts that it is the right step forward. For one, you lose some of the Python niceties, such as the ability to reference a variable or higher-level primitive when using "with Files()." For example, you may want to iterate over a list of patterns to define common metadata. You could no longer do that if the AST is rewritten in Files evaluation mode. That action-at-a-distance would be confusing to people editing moz.build files.
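
To make the trade-off concrete, here is a minimal sketch of the AST-stripping idea using the standard `ast` module. It is not the code from the patches in bug 1402010; `strip_to_files_blocks` is a hypothetical helper that keeps only top-level `with Files(...)` statements. Note how the loop over patterns is silently dropped, which is exactly the kind of surprise described above. `ast.unparse` needs Python 3.9 or newer.

```python
import ast


def strip_to_files_blocks(source):
    """Return a module AST containing only top-level `with Files(...)` blocks."""
    tree = ast.parse(source)
    kept = []
    for node in tree.body:
        if not isinstance(node, ast.With):
            continue
        call = node.items[0].context_expr
        if (isinstance(call, ast.Call)
                and isinstance(call.func, ast.Name)
                and call.func.id == "Files"):
            kept.append(node)
    tree.body = kept
    return tree


SOURCE = '''
DIRS += ["foo"]
some_new_primitive("bar")

with Files("**"):
    BUG_COMPONENT = ("Core", "General")

for pattern in ["a/**", "b/**"]:
    with Files(pattern):
        BUG_COMPONENT = ("Core", "Other")
'''

stripped = strip_to_files_blocks(SOURCE)
# Only the first Files block survives; the loop (and its Files blocks) is gone.
print(ast.unparse(stripped))
```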

Anyway, I'm really tempted to declare the shoehorning of files-based metadata that isn't essential to the build system into moz.build files to be a failed experiment. I'm tempted to say we should extract this static data that is non-essential to builds into standalone data files (e.g. YAML files). Essentially, we would invent a new data format for representing files-based metadata and then port moz.build Files() blocks - namely those declaring BUG_COMPONENT - to the new format. We would update tooling that evaluates files-based metadata to consume the new files. The new format would strive to be backwards and forwards compatible so that anyone with a snapshot of the code for evaluating the files would be able to evaluate very old and newer files to some degree. We may or may not keep Files() in moz.build files. The CI scheduling primitives in there are arguably the domain of the build system and should continue to live in moz.build files. But we could have that discussion.
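
As a strawman for what "backwards and forwards compatible" could mean in practice, here is a small sketch of a tolerant reader. Everything in it is hypothetical: the record layout, the `pattern` and `bug_component` keys, and the `resolve` helper are invented for illustration, and the records are shown as plain Python dicts, as they might look after loading a YAML file. The point is only that a reader can skip keys and records it does not understand instead of failing.

```python
import fnmatch

# Keys this (hypothetical) reader version understands.
KNOWN_KEYS = {"bug_component"}

# Hypothetical records, as they might look after loading a data file.
RECORDS = [
    {"pattern": "**", "bug_component": ["Core", "General"]},
    {"pattern": "docs/**", "bug_component": ["Core", "Docs"],
     "reviewers": ["someone"]},             # key added by a newer format version
    {"bug_component": ["Core", "Broken"]},  # malformed: no pattern
]


def resolve(path, records):
    """Merge known metadata from every record matching path; later records win."""
    metadata = {}
    for record in records:
        pattern = record.get("pattern")
        if not isinstance(pattern, str):
            continue  # skip records this reader cannot interpret
        if not fnmatch.fnmatchcase(path, pattern):
            continue
        for key in KNOWN_KEYS & record.keys():
            metadata[key] = record[key]  # unknown keys are ignored, not errors
    return metadata


print(resolve("docs/index.rst", RECORDS))
```

Because the data is static, a lookup like this needs no sandbox and no snapshot of build system code.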

This post is basically me saying "I'm not sure what we should do." I still like the idea of shoehorning things into moz.build files because it is convenient and doesn't fragment where such knowledge is defined. But at the same time, that is technically difficult to support and that difficulty is actively causing pain due to existing solutions being brittle.

I'm curious how others feel about this issue. And if we wanted to talk about what a standalone file would look like, be named, etc, I think that would be a constructive exercise.


_______________________________________________
dev-builds mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-builds

Re: The future of files-based metadata in moz.build files

jmaher-2
I pushed hard to ensure we had a mostly complete set of BUG_COMPONENT data in moz.build files. This data is useful for many things. The primary consumer is the intermittent bug filer, which uses it to file a bug in the proper component for the failing test. It also gives us a path to ownership for our tests, which is a first step in getting the right people to look at issues with tests.

I am not personally tied to the idea of making moz.build files the source of truth. It works for now; if there is a better proposal, that is great. Having the data in-tree, plus a system to notify you when new files lack a BUG_COMPONENT, seems useful. I like the idea of the generated artifacts; we use them in code coverage to determine different types of files and ownership.

-Joel



Re: The future of files-based metadata in moz.build files

mozilla
In reply to this post by Gregory Szorc-3
On Tue, Mar 20, 2018 at 11:45 AM Gregory Szorc <[hidden email]> wrote:
> The CI scheduling primitives in there are arguably the domain of the build system and should continue to live in moz.build files. But we could have that discussion.

From a code standpoint, the only place that appears to use this information is the taskcluster code, and that is the only point where that code calls into mozbuild (other than to reuse a couple of utility functions). Depending on mozbuild here has caused a bit of difficulty for Thunderbird in using the scheduling code.

There is also the code for building Sphinx documentation, which actually just uses the Python AST method of parsing the moz.build files, bringing with it all the trouble you mention.

-- Tom


Re: The future of files-based metadata in moz.build files

Dustin Mitchell
In reply to this post by Gregory Szorc-3
I agree that the fit with moz.build files is awkward.  Other issues
(besides those Tom raised) include:
 - two different methods of reading files can give different results
 - the method of combining different `with Files` clauses, even in
different files, is not clear
 - *very* deep Python magic used in the implementation makes it
difficult to understand or modify (and I'm no Python n00b)

We haven't gotten very far with SCHEDULES partly because it's not
clear (to me or anyone) how to express more fine-grained components in
moz.build files.

I'd be happy with another solution that provides better expressiveness
and clearer semantics, whatever form that takes.

Dustin


On Tue, Mar 20, 2018 at 1:44 PM, Gregory Szorc <[hidden email]> wrote:

> [...]
>
> [1] https://firefox-source-docs.mozilla.org/build/buildsystem/files-metadata.html
> [2] https://firefox-source-docs.mozilla.org/build/buildsystem/mozbuild-symbols.html#sub-context-files
> [3] https://mozilla-version-control-tools.readthedocs.io/en/latest/hgmo/mozbuildinfo.html

Re: The future of files-based metadata in moz.build files

Christopher Manchester
I'll only add that moving this to a separate file set would have the benefit of significantly simplifying the moz.build reading code, which is quite complex and somewhat slow. I don't have a profile offhand that points to Files() as a bottleneck, but it could become a concern if we were to continue extending it to be everything we want it to be.

Moving generally to moz.build files containing almost only things needed to execute the build would be a welcome trend as far as I'm concerned.

Chris


Re: The future of files-based metadata in moz.build files

Gregory Szorc-3
In reply to this post by Gregory Szorc-3
On Tue, Mar 20, 2018 at 10:44 AM, Gregory Szorc <[hidden email]> wrote:
> I'm curious how others feel about this issue. And if we wanted to talk about what a standalone file would look like, be named, etc, I think that would be a constructive exercise.


We discussed this at our team meeting yesterday. We pretty much all agreed that continuing to shoehorn this data into moz.build files isn't the right approach. We think a new, standalone file that is static, safe, backwards and forwards compatible, and human friendly is the best approach.

We think YAML (or at least a safe subset of it) is a reasonable format for the new file type. Byron is already planning to use YAML to define project vendoring metadata in standalone files. It is likely that the file he's designing will either support files metadata from the outset or be co-opted at a later date: having a single file defining static metadata seems better than N+1 files.

We're not yet sure of the timetable for the transition away from moz.build files, or who will work on it.

Bug 1449604 has been filed to track this transition.

Thank you to everyone who provided feedback!
