Contributors list auto-updater #36
base: master
Changes from all commits
60275fb
c3556bc
257dbb4
0c4e23a
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,147 @@ | ||
| name: Update People List | ||
|
|
||
| on: | ||
| schedule: | ||
| # Runs automatically daily at midnight UTC | ||
| - cron: '0 0 * * *' | ||
| workflow_dispatch: | ||
| # Allows you to trigger the workflow manually from the Actions tab | ||
|
|
||
| permissions: | ||
| contents: write # Necessary to allow the bot to commit and push changes | ||
|
|
||
| jobs: | ||
| update-people-md: | ||
| runs-on: ubuntu-latest | ||
|
|
||
| steps: | ||
| - name: Checkout repository | ||
| uses: actions/checkout@v5 | ||
|
|
||
| - name: Set up Python | ||
| uses: actions/setup-python@v6 | ||
| with: | ||
| python-version: '3.14' | ||
|
|
||
| - name: Process TSV and Update Markdown | ||
| run: | | ||
| python3 - <<'EOF' | ||
| import urllib.request | ||
| import csv | ||
| import os | ||
| from collections import OrderedDict | ||
|
|
||
| # Define the source TSV URL from the main SasView repo | ||
| url = "https://raw.githubusercontent.com/SasView/sasview/main/build_tools/contributors.tsv" | ||
|
Review comment: Does this also need the equivalent files for
||
|
|
||
| try: | ||
| req = urllib.request.Request(url) | ||
| with urllib.request.urlopen(req) as response: | ||
| lines = [line.decode('utf-8') for line in response.readlines()] | ||
|
Review comment: my
||
| except Exception as e: | ||
| print(f"Error fetching TSV: {e}") | ||
| exit(1) | ||
|
|
||
| # Parse the TSV | ||
| reader = csv.DictReader(lines, delimiter='\t') | ||
|
|
||
| # Dynamically map headers to be case-insensitive to ensure resilience | ||
| headers = {name.lower().strip(): name for name in reader.fieldnames if name} | ||
|
|
||
| name_col = headers.get('name', 'Name') | ||
| creator_col = headers.get('creator', 'Creator') | ||
| producer_col = headers.get('producer', 'Producer') | ||
| related_person_col = headers.get('relatedperson', 'RelatedPerson') | ||
|
|
||
| # Fallback through possible affiliation header names | ||
| affil_col = headers.get('affiliation', headers.get('institution', 'Affiliation')) | ||
|
|
||
| creators = [] | ||
| producers = [] | ||
|
|
||
| # Categorize users by their roles | ||
| for row in reader: | ||
| raw_name = row.get(name_col, '').strip() | ||
| creator = row.get(creator_col, '').strip().lower() | ||
| producer = row.get(producer_col, '').strip().lower() | ||
| related_person = row.get(related_person_col, '').strip().lower() | ||
| affiliation = row.get(affil_col, '').strip() | ||
|
|
||
| if not raw_name: | ||
|
Review comment: move earlier to fail this iteration earlier? (Performance is absolutely not a consideration - I just spent time reading the code to try to work out why the other
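A minimal sketch of the reordering suggested here, assuming the same tab-separated layout as the workflow script. The helper `iter_named_rows` and the sample rows are hypothetical and only illustrate checking the name before any other column is read.

```python
import csv


def iter_named_rows(lines, name_col="Name"):
    """Yield (name, row) pairs, skipping rows with an empty name first."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        # Check the name before touching any other column.
        raw_name = (row.get(name_col) or "").strip()
        if not raw_name:
            continue
        yield raw_name, row


# Hypothetical sample data, not taken from contributors.tsv.
sample = [
    "Name\tCreator\tAffiliation\n",
    "Example, Alice\tx\tSome Institute\n",
    "\tx\tNo Name Institute\n",
]
names = [name for name, _ in iter_named_rows(sample)]
```

With the check at the top of the loop, a reader sees immediately that nameless rows never reach the role or affiliation handling.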
||
| continue | ||
|
|
||
| # Convert "Last Name, First Name" to "F. Last Name" | ||
|
Review comment: This sort of transformation is really problematic. The general advice for something like this is "don't", with a follow-up of "names are really complicated and they don't work the way this code assumes" and "we mustn't tell people that they are spelling their names wrong". It will do the wrong thing for lots of different folks - we might not currently have such people in the list, but we will one day. A couple of quick examples off the top of my head
Overall advice - put the name in the preferred form in the tsv file and do not try to programmatically manipulate it. If both a short form and a long form are really needed, then store both. If you really want programmatic manipulation then you need all manner of escape mechanisms to process the names safely (see Yes it is possible to store the name without a comma in it already, which solves all but one of the above, but that just highlights there's already a way to make the data right so that name manipulations aren't needed. Related good read: https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
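One way to follow this advice, sketched under assumptions: the TSV carries the preferred display form in its own column, plus an optional sort key, and the script copies the display form verbatim instead of deriving it. The column names `DisplayName` and `SortKey`, the helper `load_people`, and the sample rows are all hypothetical and are not part of the current contributors.tsv.

```python
import csv


def load_people(lines):
    """Return people with names exactly as stored, never rewritten."""
    reader = csv.DictReader(lines, delimiter="\t")
    people = []
    for row in reader:
        display = (row.get("DisplayName") or "").strip()
        if not display:
            continue
        # Fall back to the display name when no explicit sort key is given.
        sort_key = (row.get("SortKey") or "").strip() or display
        people.append({"name": display, "sort_key": sort_key})
    # casefold() gives a case-insensitive ordering without altering the names.
    people.sort(key=lambda p: p["sort_key"].casefold())
    return people


# Hypothetical sample data for illustration only.
sample = [
    "DisplayName\tSortKey\n",
    "B. Example\tExample, B\n",
    "Ana María de la Cruz\tCruz\n",
]
people = load_people(sample)
```

The design choice is that each person decides how their name is shown and sorted, and the code never splits, abbreviates, or re-cases it.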
||
| if ',' in raw_name: | ||
| last_name, first_name = raw_name.split(',', 1) | ||
| last_name = last_name.strip() | ||
| first_name = first_name.strip() | ||
|
|
||
| if first_name: | ||
| formatted_name = f"{first_name[0].upper()}. {last_name}" | ||
| else: | ||
| formatted_name = last_name | ||
| else: | ||
| formatted_name = raw_name | ||
|
|
||
| # Store raw_name as a sort_key so it still alphabetizes by Last Name | ||
| entry = {'name': formatted_name, 'sort_key': raw_name, 'affiliation': affiliation} | ||
|
Review comment: this is assuming
||
|
|
||
| if 'x' in creator: | ||
| creators.append(entry) | ||
| elif 'x' in producer or 'x' in related_person: | ||
| producers.append(entry) | ||
|
|
||
| # Sort alphabetically by the original Last Name format | ||
| creators.sort(key=lambda x: x['sort_key'].lower()) | ||
| producers.sort(key=lambda x: x['sort_key'].lower()) | ||
|
|
||
| affil_dict = OrderedDict() | ||
|
|
||
| # Helper function to generate publication-style citation numbers | ||
| def format_section(people): | ||
| formatted_names = [] | ||
| for p in people: | ||
| # Assuming multiple affiliations might be separated by semicolons | ||
| affiliations = [a.strip() for a in p['affiliation'].split(';') if a.strip()] | ||
| superscripts = [] | ||
| for a in affiliations: | ||
| if a not in affil_dict: | ||
| affil_dict[a] = len(affil_dict) + 1 | ||
| superscripts.append(str(affil_dict[a])) | ||
|
|
||
| if superscripts: | ||
| formatted_names.append(f"{p['name']}<sup>{','.join(superscripts)}</sup>") | ||
| else: | ||
| formatted_names.append(p['name']) | ||
| return ", ".join(formatted_names) | ||
|
|
||
|
|
||
| creators_str = format_section(creators) | ||
| producers_str = format_section(producers) | ||
|
|
||
| # Format affiliations block (e.g. 1 Institution A) | ||
| affil_str = ", ".join([f"<sup>{idx}</sup> _{affil}_" for affil, idx in affil_dict.items()]) | ||
|
|
||
| # Ensure the _includes directory exists | ||
| os.makedirs('_includes', exist_ok=True) | ||
|
|
||
| # Write out the three files | ||
| with open('_includes/creators.html', 'w', encoding='utf-8') as f: | ||
| f.write(creators_str + "\n") | ||
| print("Updated _includes/creators.html") | ||
|
|
||
| with open('_includes/producers.html', 'w', encoding='utf-8') as f: | ||
| f.write(producers_str + "\n") | ||
| print("Updated _includes/producers.html") | ||
|
|
||
| with open('_includes/affiliations.html', 'w', encoding='utf-8') as f: | ||
| f.write(affil_str + "\n") | ||
| print("Updated _includes/affiliations.html") | ||
|
|
||
| EOF | ||
|
|
||
| - name: Commit and Push Changes | ||
|
Review comment: Does the commit then trigger a website rebuild, or is the updated file immediately live? (I have never looked at how the rest of this works.)
||
| uses: stefanzweifel/git-auto-commit-action@v5 | ||
| with: | ||
| commit_message: "docs: automatic update of people from the SasView contributors.tsv" | ||
| file_pattern: "_includes/*" | ||
|
Review comment: missing EOL before EOF
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| <sup>1</sup> _University of Luxembourg_, <sup>2</sup> _Institut Laue-Langevin_, <sup>3</sup> _University of Tennessee, Knoxville_, <sup>4</sup> _ISIS Neutron and Muon Source_, <sup>5</sup> _Oak Ridge National Laboratory_, <sup>6</sup> _Technical University, Delft_, <sup>7</sup> _National Institute of Standards and Technology_, <sup>8</sup> _University of Delaware_, <sup>9</sup> _Paul Scherrer Institute_, <sup>10</sup> _Charles University_, <sup>11</sup> _The Debian Project_, <sup>12</sup> _University of Cologne_, <sup>13</sup> _ETH Zurich_, <sup>14</sup> _European Spallation Source_, <sup>15</sup> _California Institute of Technology_, <sup>16</sup> _Diamond Light Source_, <sup>17</sup> _University of Maryland_, <sup>18</sup> _University of Copenhagen_, <sup>19</sup> _Brookhaven National Laboratory_, <sup>20</sup> _Aarhus University_, <sup>21</sup> _Australian National Science and Technology Organisation_, <sup>22</sup> _Lund University_, <sup>23</sup> _SciLifeLab at Lund University_, <sup>24</sup> _University of New South Wales_, <sup>25</sup> _University of Princeton_, <sup>26</sup> _Columbia University_ | ||
|
Review comment: missing EOL before EOF (but the code should have added one?)
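A quick way to check this locally, sketched with a hypothetical helper `ends_with_newline`. The workflow script writes each string followed by "\n", so a file it produced should pass; the temporary file below merely stands in for `_includes/affiliations.html`.

```python
import os
import tempfile


def ends_with_newline(path):
    """Return True if the file's last byte is a newline."""
    with open(path, "rb") as f:
        data = f.read()
    return data.endswith(b"\n")


# Write a file the same way the workflow script does (content + "\n").
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "affiliations.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<sup>1</sup> _Example Institute_" + "\n")
    result = ends_with_newline(path)
```

If the committed file fails this check, the likely explanation is that it was not generated by the script as written.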
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| M. Adams<sup>1</sup>, N. Agouzal<sup>2</sup>, G. Alina<sup>3</sup>, Z. Attala<sup>4</sup>, M. Backman<sup>5</sup>, J. Bakker<sup>6</sup>, P. Beaucage<sup>7</sup>, J. Berger<sup>8</sup>, R. Bourne<sup>4</sup>, W. Bouwman<sup>6</sup>, I. Bressler<sup>9</sup>, P. Butler<sup>7</sup>, I. Cadwallader-Jones<sup>2</sup>, K. Campbell<sup>4</sup>, J. Cho<sup>3</sup>, R. Conan<sup>10</sup>, T. Cooper-Benun<sup>4</sup>, R. Cortes Hernandez<sup>3</sup>, J. Crake-Merani<sup>4</sup>, A. Detiste<sup>11</sup>, M. Doucet<sup>5</sup>, J. Doutch<sup>4</sup>, D. Dresen<sup>12</sup>, G. Drosos<sup>13</sup>, C. Durniak<sup>14</sup>, C. Farrow<sup>15</sup>, R. Ferraz Leal<sup>5</sup>, R. Ford<sup>15</sup>, L. Forster<sup>16</sup>, J. Gaudet<sup>17</sup>, M. Gerina<sup>10</sup>, P. Gilbert<sup>7</sup>, M. Gonzalez<sup>2</sup>, O. Hammond<sup>14</sup>, T. Hansen<sup>18</sup>, R. Heenan<sup>4</sup>, S. Henson<sup>5</sup>, E. Hewins<sup>4</sup>, A. Hicks<sup>5</sup>, D. Honecker<sup>4</sup>, A. Jackson<sup>14</sup>, G. Jensen<sup>7</sup>, P. Juhas<sup>19</sup>, J. Karliczek<sup>2</sup>, P. Kienzle<sup>7</sup>, S. King<sup>4</sup>, S. Kline<sup>7</sup>, J. Krzywon<sup>7</sup>, J. Lin<sup>15</sup>, Y. Liu<sup>7</sup>, R. Lopes<sup>4</sup>, D. Lozano<sup>2</sup>, K. Lytje<sup>20</sup>, D. Mannicke<sup>21</sup>, B. Maranville<sup>7</sup>, A. Markvardsen<sup>4</sup>, N. Martinez<sup>2</sup>, M. McKerns<sup>15</sup>, B. Miller<sup>7</sup>, K. Mothander<sup>22</sup>, R. Murphy<sup>7</sup>, A. Nelson<sup>21</sup>, T. Nielsen<sup>14</sup>, L. O'Driscoll<sup>4</sup>, M. Oakley<sup>4</sup>, H. Park<sup>7</sup>, P. Parker<sup>4</sup>, M. Patrou<sup>5</sup>, P. Peterson<sup>5</sup>, W. Potrzebowski<sup>23</sup>, S. Prescott<sup>24</sup>, M. Rakitin<sup>19</sup>, T. Richter<sup>16</sup>, J. Rooks<sup>8</sup>, P. Rozyczko<sup>14</sup>, X. Shan<sup>7</sup>, P. Sharp<sup>4</sup>, S. Shrestha<sup>4</sup>, T. Snow<sup>16</sup>, A. Stellhorn<sup>14</sup>, S. Teixeira<sup>7</sup>, J. Tumarkin<sup>3</sup>, A. Washington<sup>4</sup>, K. Weigandt<sup>7</sup>, R. Whitley<sup>4</sup>, L. Wilkins<sup>4</sup>, C. Wolf<sup>7</sup>, A. Zhang<sup>25</sup>, A. Zheng<sup>7</sup> |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| A. Anuchitanukul<sup>4</sup>, P. Corona<sup>26</sup>, G. Fragneto<sup>14</sup>, B. Fultz<sup>15</sup>, M. Knudsen<sup>18</sup>, S. Krueger<sup>7</sup>, A. Larsen<sup>18</sup>, S. Lee<sup>27</sup>, T. Narayanan<sup>28</sup>, D. Parsons<sup>29</sup>, B. Pauw<sup>30</sup>, T. Perring<sup>4</sup>, L. Porcar<sup>2</sup>, L. Pozzo<sup>31</sup>, S. Prevost<sup>2</sup>, A. Rennie<sup>32</sup>, G. Roberts<sup>33</sup>, T. Rod<sup>14</sup>, Y. Shang<sup>5</sup>, J. Taylor<sup>14</sup>, L. Udby<sup>34</sup>, D. Zakoutna<sup>10</sup> |
Review comment: I'm not wild about Python embedded inside YAML. Editing (syntax highlighting?), debugging, and even just running it are harder than they need to be, and it never gets linted or formatted by ruff, etc. Would breaking it into a script be reasonable?
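A rough sketch of what such a standalone script could look like, assuming the default column names used in the workflow (`Name`, `Creator`, `Producer`, `RelatedPerson`, `Affiliation`). The file name `scripts/update_people.py` and the function names `fetch_lines` and `split_roles` are hypothetical. Splitting the logic into plain functions means it can be linted, formatted, and tested without running the workflow.

```python
"""Hypothetical scripts/update_people.py: the workflow logic as plain functions."""
import csv
import urllib.request

# Source URL as given in the workflow file.
TSV_URL = "https://raw.githubusercontent.com/SasView/sasview/main/build_tools/contributors.tsv"


def fetch_lines(url=TSV_URL):
    """Download the TSV and return it as a list of text lines."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8").splitlines()


def split_roles(lines):
    """Split rows into (creators, producers) using the 'x' role markers."""
    creators, producers = [], []
    for row in csv.DictReader(lines, delimiter="\t"):
        name = (row.get("Name") or "").strip()
        if not name:
            continue
        entry = {
            "name": name,
            "affiliation": (row.get("Affiliation") or "").strip(),
        }
        creator = (row.get("Creator") or "").lower()
        producer = (row.get("Producer") or "").lower()
        related = (row.get("RelatedPerson") or "").lower()
        if "x" in creator:
            creators.append(entry)
        elif "x" in producer or "x" in related:
            producers.append(entry)
    return creators, producers


# Offline usage with hypothetical sample rows; the real script would call
# split_roles(fetch_lines()) from an `if __name__ == "__main__":` block.
sample = [
    "Name\tCreator\tProducer\tRelatedPerson\tAffiliation",
    "Example, A\tx\t\t\tInstitute One",
    "Sample, B\t\tx\t\tInstitute Two",
]
creators, producers = split_roles(sample)
```

The workflow step would then shrink to a single `run:` line invoking the script, and the parsing functions could be exercised in unit tests with in-memory lines like the sample above.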