Reports regzbot tracking#1925
Conversation
| "--action=new_issues", | ||
| "--to=kernelci-results@groups.io", | ||
| "--cc=gus@collabora.com", | ||
| "--cc=linux@leemhuis.info, ben.copeland@linaro.org", |
There was a problem hiding this comment.
Please use regressions@leemhuis.info for me instead.
Or maybe even better: regressions@lists.linux.dev – I monitor that closely, so it's just as good as mailing me directly. And that way other can join in easily if they want to.
There was a problem hiding this comment.
Thanks! It already goes out to this list, so I have just removed the CC here.
| {% if builds and builds[0]["last_pass_commit"] -%} | ||
| #regzbot introduced: {{ builds[0]["last_pass_commit"] }}..{{ issue["git_commit_hash"] }} | ||
| {% endif -%} | ||
| #regzbot title: build regression on {{ issue["tree_name"] }}/{{ issue["git_repository_branch"] }} | ||
| #regzbot link: https://d.kernelci.org/i/{{ issue["id"] }} | ||
|
|
There was a problem hiding this comment.
Without an "#regzbot introduced: …" regzbot won't act (at least for now, but changing that would not be easy), because it needs a commit or an range where the regressions started. So I'd the "{% endif -%}" needs to move down three lines afaics.
There was a problem hiding this comment.
Have changed in the second review, I have moved the endif down three lines. It will only trigger when there's an introduced change (the full introduced, title and link) and when there's no last-good commit (the regzbot block is omitted). So it'll only appear when the last-good/test query finds a prior passing. If nothing is found within the window (18 days) there's no range so there won't be a regzbot block.
| AND b.compiler = %(compiler)s | ||
| AND b.status = 'PASS' | ||
| AND c.origin = %(origin)s | ||
| AND b._timestamp >= NOW() - INTERVAL %(interval)s |
There was a problem hiding this comment.
might be safer to just use start_time instead of _timestamp, as it will retrieve all builds compiled in the given interval.
There was a problem hiding this comment.
Good idea, have switched to start_time thanks!
| if last_build: | ||
| incident["last_pass"] = last_build[0]["start_time"] | ||
| incident["last_pass_commit"] = last_build[0]["git_commit_hash"] | ||
| incident["last_pass_id"] = last_build[0]["id"] |
There was a problem hiding this comment.
could we call it last_pass_build_id? just to be clear on which id we are talking here.
| platform=platform, | ||
| test_start_time=test["start_time"], | ||
| config_name=test["config_name"], | ||
| field_timestamp=test["_timestamp"], |
There was a problem hiding this comment.
this call is missing a required parameter group_size that limits how far in history we go. We might need to either make this an optional parameter (allowing to return full history) or pass a value large enough here.
There was a problem hiding this comment.
Added group_size=10. Does that fit? A fail streak longer than that will probably lead to no range, which I feel is a safe outcome here. Can bump more though if you like.
There was a problem hiding this comment.
I think it might suffice, the only possible downside of the group size might be that the bigger it is, the slower that query might get.
But since this is a notification command, we are probably not that worried about bigger latency.
2835e76 to
322375b
Compare
|
Commit 2 and 3 are inverted (you are using last good in the templates before you added it) 🙈 |
322375b to
fec68a5
Compare
How does it look now? Good spot thanks |
base.txt opened with a blank line and an empty header block, so every report rendered with several blank lines before the first content line. Trim the top of base.txt so reports start at their first line, and refresh the metrics example fixture to match. Signed-off-by: Ben Copeland <ben.copeland@linaro.org>
Add kcidb_last_build_without_issue() to fetch the most recent passing build of the same config (filtered by start_time) that does not carry the issue, and set last_pass_commit on build incidents. For standalone test reports, derive the last passing commit from the status history (passing the required group_size). This lets the mails emit a real last_good..HEAD introduced range. Signed-off-by: Ben Copeland <ben.copeland@linaro.org>
Move the Reported-by and #kernelci trailer up under the summary, and add a regzbot tracking block (introduced/title/link) in its own blank-line-delimited paragraph as the handling-regressions docs require. The block is emitted only when a last known good commit is available, so regzbot is never handed a single unbisected commit or a bare title/link it cannot act on. Signed-off-by: Ben Copeland <ben.copeland@linaro.org>
The regressions list is already added in code for mainline/next, so the explicit Cc is not needed. Signed-off-by: Ben Copeland <ben.copeland@linaro.org>
fec68a5 to
fd6f3b8
Compare
|
Rebased |
|
Could you please post the diff for one report in here? Or better yet, add a test that uses a fixture similar to backend/kernelCI_app/tests/unitTests/commands/fixtures/metrics_notifications_example.txt |
https://gist.github.com/bhcopeland/be0edaf87d7f6034ec898ae1642563b5 Although, after our chat yesterday, maybe we should drop tests+boots and just focus on build issues for now. @knurd FYI can you review this. Since these are the ones we should mainly focus on with a issue associated to it. I am not sure which way to push, so leaving it open to discussion. |
Here we go; quotes from the linked page from here on out:
Minor: make that one line, save percious space. :-D
Side note: in that particular case the error is not much helpful (and kernel devs might be to busy to follow any links to look into details), so it might be good to have a full dmesg at hand right in the mail -- and/or maybe even a diff to a dmesg from a working system to see what's different.
I'd say this statement in it's current form is likely counter-productive (and there are no special regzbot tags needed in the fix -- and what's in the section that follow it are tags to make regzbot track the issue). Better use something like this: This obviously requires you to know that msgid ahead of sending, hope that doesn't make things hard for you.
Is the title used the same as the mail's subject? Then just omit it, regzbot will take it from there if not specified. And move that section to the end of the mail somewhere, it's just for the bot, so no need to waste the attention of humans. :-D
Do you want that to end up in the git footer of a fix? Pretty sure that is not going to fly (Linus and others are pretty restrictive when it comes to the footer and I suspect some review tools like checkpatch will complain, too). If you want to automate things, automate on the "Link:" tag I mentioned above, that is established and kinda mandatory (nevertheless some subsystem never add them).
Not a regzbot-specific thing, but while at it: How long might the "Log excerpt:" get? If it's always this short I'd say "scratch the one-line summary near the top and put the log there, as that's the thing people care about"
Even if it get's longer I'd say: move that stuff downwards. And maybe make it easier to grasp for developers that have never interacted with kernelci by saying what to find where (e.g. "For complete logs and details about the environment used, see https://d.kernelci.org/i/maestro:98c2d110d805d1a864a89edd1e2c447175476af8). And explain how that is different from the other d.kernelci.org link. Even I'm not sure if I understand this corrently: one is the error in general, the others are specific CI tests where the error occurred? HTH |
|
Thanks for you feedback @knurd !
Dropped!
Yes this should be added, I believe though this might be tricky as we don't have this in template context, so will need a way to wire it from pipeline. I'd say bank this for future work.
This is doable. Message-ID can be get ahead of time, just need to change the ordering, but should be fine. It'll use something like lore.kernel.org/all/. The reports go up to https://lore.kernel.org/kernelci/, and mainline/next reports additionally via the regressions list Cc. So I believe this should be all fine for regzbot tracking + this change.
Ack, will drop the title.
:D Will move this to bottom.
Hm, no, it was never intended as a git-footer tag, it's a body marker for KernelCI's own tracking; it now lives at the very bottom with the bot block to make that obvious
Yeah, logspec isn't always short, so I've kept the one-line summary but moved the excerpt directly under it (so show the error first). The dashboard/giturl/commit bullets have moved down with a lead-in as you suggested. And yes, your reading is right: /i/ is the error in general (full logs, history, everything affected), the /build/ and /test/ links are the individual CI runs that hit it. The new lead-in text should make that clearer. https://gist.github.com/bhcopeland/a41ceb117bc1101fdeebeb5087d04f0c Haven't pushed code yet, all local changes, but here is output of above changes.
|
Better, but if you ask me, move it even lower. E.g. add another "--" or maybe even something "-- For the bots: " below the "This is an experimental report [...]" and put that stuff there right at the end.
Thx, LGTM! |
Rework the issue and test report mails so the details humans need come first and the machine-readable tags come last. Drop the "Hello," greeting, move the log excerpt up next to the issue description, list the dashboard details consistently, and move the #regzbot and #kernelci tags to the very end of the body. Generate the Message-ID before rendering and pass it through to the templates and the smtp helper, so each report can carry a Link: to its own thread on lore.kernel.org next to the Reported-by tag. Signed-off-by: Ben Copeland <ben.copeland@linaro.org>
Render the issue and test report templates with fixed data and compare the output against committed example files, so template changes show up as reviewable diffs. Add a refresh_report_examples management command to regenerate the fixtures when a template change is intentional. Signed-off-by: Ben Copeland <ben.copeland@linaro.org>
We need to sanity-check the SQL command that returns a sensible "last good build".
Other than that fixes are pretty simple: