147 changes: 147 additions & 0 deletions .github/workflows/update_people.yml
@@ -0,0 +1,147 @@
name: Update People List

on:
  schedule:
    # Runs automatically daily at midnight UTC
    - cron: '0 0 * * *'
  workflow_dispatch:
    # Allows you to trigger the workflow manually from the Actions tab

permissions:
  contents: write # Necessary to allow the bot to commit and push changes

jobs:
  update-people-md:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v5

      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: '3.14'

      - name: Process TSV and Update Markdown
        run: |

I'm not wild about Python embedded inside YAML. Editing (syntax highlighting?), debugging, and even just running it are harder than they need to be, and it never gets linted or formatted by ruff. Would breaking it out into a separate script be reasonable?
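
One way the split could look, as a minimal sketch: the path `scripts/update_people.py` and the function names below are hypothetical and do not come from this PR; only the URL is copied from the workflow. Separating the download from the parsing means the parsing can be run and tested locally without network access.

```python
import csv
import sys
import urllib.request

# URL copied from the workflow in this PR
TSV_URL = "https://raw.githubusercontent.com/SasView/sasview/main/build_tools/contributors.tsv"


def fetch_lines(url):
    """Download the TSV and return its lines as decoded strings."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8").splitlines()


def parse_rows(lines):
    """Parse tab-separated lines into one dict per row (no network needed)."""
    return list(csv.DictReader(lines, delimiter="\t"))


def main(url=TSV_URL):
    """Fetch and parse the TSV; return a process exit code."""
    try:
        lines = fetch_lines(url)
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        print(f"Error fetching TSV: {exc}", file=sys.stderr)
        return 1
    print(f"Parsed {len(parse_rows(lines))} rows")
    return 0
```

The workflow step would then shrink to something like `run: python3 scripts/update_people.py`, with the script ending in the usual `if __name__ == "__main__":` guard that calls `main()`, and ruff could lint the file like any other module.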

          python3 - <<'EOF'
          import urllib.request
          import csv
          import os
          from collections import OrderedDict

          # Define the source TSV URL from the main SasView repo
          url = "https://raw.githubusercontent.com/SasView/sasview/main/build_tools/contributors.tsv"

Does this also need the equivalent files for sasdata and sasmodels?
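
If the other repositories do carry equivalent files, the rows could be merged before categorising. A minimal sketch, assuming (unconfirmed by this PR) that sasdata and sasmodels ship TSV files with the same columns; `merge_rows` is a hypothetical helper that keeps the first row seen for each name.

```python
def merge_rows(*row_lists, name_col="Name"):
    """Combine parsed TSV rows from several sources, keeping the first row per name."""
    merged = {}
    for rows in row_lists:
        for row in rows:
            # csv.DictReader may supply None for missing fields, hence the "or"
            name = (row.get(name_col) or "").strip()
            if name and name not in merged:
                merged[name] = row
    return list(merged.values())
```

Keeping the first occurrence is only one possible policy; if roles can differ between repositories, the rows would need to be combined rather than deduplicated.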


          try:
              req = urllib.request.Request(url)
              with urllib.request.urlopen(req) as response:
                  lines = [line.decode('utf-8') for line in response.readlines()]

My ruff comment above was prompted by noticing the mixture of " and ' in the code. That is not a blocker to merging, but we do have an active discussion about using ruff on new code.

          except Exception as e:
              print(f"Error fetching TSV: {e}")
              exit(1)

          # Parse the TSV
          reader = csv.DictReader(lines, delimiter='\t')

          # Dynamically map headers to be case-insensitive to ensure resilience
          headers = {name.lower().strip(): name for name in reader.fieldnames if name}

          name_col = headers.get('name', 'Name')
          creator_col = headers.get('creator', 'Creator')
          producer_col = headers.get('producer', 'Producer')
          related_person_col = headers.get('relatedperson', 'RelatedPerson')

          # Fallback through possible affiliation header names
          affil_col = headers.get('affiliation', headers.get('institution', 'Affiliation'))

          creators = []
          producers = []

          # Categorize users by their roles
          for row in reader:
              raw_name = row.get(name_col, '').strip()
              creator = row.get(creator_col, '').strip().lower()
              producer = row.get(producer_col, '').strip().lower()
              related_person = row.get(related_person_col, '').strip().lower()
              affiliation = row.get(affil_col, '').strip()

              if not raw_name:

Move this check earlier so the iteration is skipped sooner? (Performance is absolutely not a consideration. I just spent time reading the code to work out why the other row.get() calls needed to run, eventually concluding that they didn't.)
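
A minimal sketch of the suggested ordering, with the empty-name check placed directly after the name is read so that nothing else is touched for skipped rows. The helper name `iter_named_rows` is hypothetical; the `or ""` is an addition that guards against `csv.DictReader` filling short rows with `None`.

```python
def iter_named_rows(rows, name_col="Name"):
    """Yield (name, row) pairs, skipping nameless rows before reading other columns."""
    for row in rows:
        raw_name = (row.get(name_col) or "").strip()
        if not raw_name:
            continue  # skip immediately; no other column is read for this row
        yield raw_name, row
```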

                  continue

              # Convert "Last Name, First Name" to "F. Last Name"

This sort of transformation is really problematic. The general advice for something like this is "don't", with a follow-up of "names are really complicated and they don't work the way this code assumes" and "we mustn't tell people that they are spelling their names wrong".

It will do the wrong thing for lots of different folks. We might not currently have such people in the list, but we will one day. A few quick examples off the top of my head:

  • commas in their names (I've seen this break name-handling code before: John Smith, IV)
  • people who don't use their first name (Oppenheimer, J. Robert)
  • names where the first name is not the given name and should not be abbreviated (Shinzo, Abe; many names, particularly from southern and eastern Asia, perhaps westernised or perhaps not)
  • names where the first name just should not be abbreviated (initials are a very western construction and make little sense in many other cultures, eliciting a "what are you doing to my name, stop that" response)
  • assuming that the first name starts with an upper-case letter (or making it so in the abbreviation) is also problematic
  • assuming that the person even has first names or last names is problematic, to be honest, although they probably won't have a comma in the field
  • [there are many more]

Overall advice: put the name in the preferred form in the TSV file and do not try to manipulate it programmatically. If both a short form and a long form are really needed, then store both. If you really want programmatic manipulation, then you need all manner of escape mechanisms to process the names safely (see BibTeX .bst files, for example), and it mostly comes down to "put the name in the form you want it in the source data".

Yes, it is already possible to store the name without a comma in it, which solves all but one of the above, but that just highlights that there is already a way to get the data right so that name manipulations aren't needed.

Related good read: https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
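
The "store both forms" advice could be sketched as follows. The `DisplayName` column is hypothetical (the TSV is not known to have one); the point is only that the script emits whatever the source data says and never splits, reorders, or abbreviates a name.

```python
def display_name(row, display_col="DisplayName", name_col="Name"):
    """Return the name exactly as stored, preferring the (hypothetical) display column."""
    preferred = (row.get(display_col) or "").strip()
    if preferred:
        return preferred
    # No display form supplied: fall back to the stored name, unmodified
    return (row.get(name_col) or "").strip()
```

With this approach a contributor who wants "J. R. Oppenheimer" or an unabbreviated name simply writes that in the display column, and no escape mechanism is needed.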

              if ',' in raw_name:
                  last_name, first_name = raw_name.split(',', 1)
                  last_name = last_name.strip()
                  first_name = first_name.strip()

                  if first_name:
                      formatted_name = f"{first_name[0].upper()}. {last_name}"
                  else:
                      formatted_name = last_name
              else:
                  formatted_name = raw_name

              # Store raw_name as a sort_key so it still alphabetizes by Last Name
              entry = {'name': formatted_name, 'sort_key': raw_name, 'affiliation': affiliation}

This assumes raw_name is in "last, first" form, but the code above implies it might not be, so won't the sort order be a problem?
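
A small sketch of the concern, using made-up names. Sorting on the raw string only gives family-name order when every entry is in "Last, First" form. An explicit sort key stored alongside the name avoids guessing; here that is a hypothetical `sort_key` field populated from the source data, which the TSV is not known to provide.

```python
# Raw names in mixed forms: two "Last, First" entries and one "First Last" entry
raw_names = ["Smith, John", "Jane Doe", "Adams, Mary"]

# Sorting on the raw string files "Jane Doe" under J rather than D
sorted_by_raw = sorted(raw_names, key=str.lower)


def sort_entries(entries):
    """Sort by an explicit sort key when present, otherwise by the stored name."""
    return sorted(entries, key=lambda e: (e.get("sort_key") or e["name"]).casefold())


entries = [
    {"name": "John Smith", "sort_key": "Smith"},
    {"name": "Jane Doe", "sort_key": "Doe"},
    {"name": "Mary Adams", "sort_key": "Adams"},
]
sorted_names = [e["name"] for e in sort_entries(entries)]
```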


              if 'x' in creator:
                  creators.append(entry)
              elif 'x' in producer or 'x' in related_person:
                  producers.append(entry)

          # Sort alphabetically by the original Last Name format
          creators.sort(key=lambda x: x['sort_key'].lower())
          producers.sort(key=lambda x: x['sort_key'].lower())

          affil_dict = OrderedDict()

          # Helper function to generate publication-style citation numbers
          def format_section(people):
              formatted_names = []
              for p in people:
                  # Assuming multiple affiliations might be separated by semicolons
                  affiliations = [a.strip() for a in p['affiliation'].split(';') if a.strip()]
                  superscripts = []
                  for a in affiliations:
                      if a not in affil_dict:
                          affil_dict[a] = len(affil_dict) + 1
                      superscripts.append(str(affil_dict[a]))

                  if superscripts:
                      formatted_names.append(f"{p['name']}<sup>{','.join(superscripts)}</sup>")
                  else:
                      formatted_names.append(p['name'])
              return ", ".join(formatted_names)


          creators_str = format_section(creators)
          producers_str = format_section(producers)

          # Format affiliations block (e.g. 1 Institution A)
          affil_str = ", ".join([f"<sup>{idx}</sup> _{affil}_" for affil, idx in affil_dict.items()])

          # Ensure the _includes directory exists
          os.makedirs('_includes', exist_ok=True)

          # Write out the three files
          with open('_includes/creators.html', 'w', encoding='utf-8') as f:
              f.write(creators_str + "\n")
          print("Updated _includes/creators.html")

          with open('_includes/producers.html', 'w', encoding='utf-8') as f:
              f.write(producers_str + "\n")
          print("Updated _includes/producers.html")

          with open('_includes/affiliations.html', 'w', encoding='utf-8') as f:
              f.write(affil_str + "\n")
          print("Updated _includes/affiliations.html")

          EOF

      - name: Commit and Push Changes

Does the commit then trigger a website rebuild, or is the updated file immediately live? (I have never looked at how the rest of this works.)

        uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: "docs: automatic update of people from the SasView contributors.tsv"
          file_pattern: "_includes/*"

missing EOL before EOF

1 change: 1 addition & 0 deletions _includes/affiliations.html
@@ -0,0 +1 @@
<sup>1</sup> _University of Luxembourg_, <sup>2</sup> _Institut Laue-Langevin_, <sup>3</sup> _University of Tennessee, Knoxville_, <sup>4</sup> _ISIS Neutron and Muon Source_, <sup>5</sup> _Oak Ridge National Laboratory_, <sup>6</sup> _Technical University, Delft_, <sup>7</sup> _National Institute of Standards and Technology_, <sup>8</sup> _University of Delaware_, <sup>9</sup> _Paul Scherrer Institute_, <sup>10</sup> _Charles University_, <sup>11</sup> _The Debian Project_, <sup>12</sup> _University of Cologne_, <sup>13</sup> _ETH Zurich_, <sup>14</sup> _European Spallation Source_, <sup>15</sup> _California Institute of Technology_, <sup>16</sup> _Diamond Light Source_, <sup>17</sup> _University of Maryland_, <sup>18</sup> _University of Copenhagen_, <sup>19</sup> _Brookhaven National Laboratory_, <sup>20</sup> _Aarhus University_, <sup>21</sup> _Australian National Science and Technology Organisation_, <sup>22</sup> _Lund University_, <sup>23</sup> _SciLifeLab at Lund University_, <sup>24</sup> _University of New South Wales_, <sup>25</sup> _University of Princeton_, <sup>26</sup> _Columbia University_

missing EOL before EOF (but the code should have added one?)

1 change: 1 addition & 0 deletions _includes/creators.html
@@ -0,0 +1 @@
M. Adams<sup>1</sup>, N. Agouzal<sup>2</sup>, G. Alina<sup>3</sup>, Z. Attala<sup>4</sup>, M. Backman<sup>5</sup>, J. Bakker<sup>6</sup>, P. Beaucage<sup>7</sup>, J. Berger<sup>8</sup>, R. Bourne<sup>4</sup>, W. Bouwman<sup>6</sup>, I. Bressler<sup>9</sup>, P. Butler<sup>7</sup>, I. Cadwallader-Jones<sup>2</sup>, K. Campbell<sup>4</sup>, J. Cho<sup>3</sup>, R. Conan<sup>10</sup>, T. Cooper-Benun<sup>4</sup>, R. Cortes Hernandez<sup>3</sup>, J. Crake-Merani<sup>4</sup>, A. Detiste<sup>11</sup>, M. Doucet<sup>5</sup>, J. Doutch<sup>4</sup>, D. Dresen<sup>12</sup>, G. Drosos<sup>13</sup>, C. Durniak<sup>14</sup>, C. Farrow<sup>15</sup>, R. Ferraz Leal<sup>5</sup>, R. Ford<sup>15</sup>, L. Forster<sup>16</sup>, J. Gaudet<sup>17</sup>, M. Gerina<sup>10</sup>, P. Gilbert<sup>7</sup>, M. Gonzalez<sup>2</sup>, O. Hammond<sup>14</sup>, T. Hansen<sup>18</sup>, R. Heenan<sup>4</sup>, S. Henson<sup>5</sup>, E. Hewins<sup>4</sup>, A. Hicks<sup>5</sup>, D. Honecker<sup>4</sup>, A. Jackson<sup>14</sup>, G. Jensen<sup>7</sup>, P. Juhas<sup>19</sup>, J. Karliczek<sup>2</sup>, P. Kienzle<sup>7</sup>, S. King<sup>4</sup>, S. Kline<sup>7</sup>, J. Krzywon<sup>7</sup>, J. Lin<sup>15</sup>, Y. Liu<sup>7</sup>, R. Lopes<sup>4</sup>, D. Lozano<sup>2</sup>, K. Lytje<sup>20</sup>, D. Mannicke<sup>21</sup>, B. Maranville<sup>7</sup>, A. Markvardsen<sup>4</sup>, N. Martinez<sup>2</sup>, M. McKerns<sup>15</sup>, B. Miller<sup>7</sup>, K. Mothander<sup>22</sup>, R. Murphy<sup>7</sup>, A. Nelson<sup>21</sup>, T. Nielsen<sup>14</sup>, L. O'Driscoll<sup>4</sup>, M. Oakley<sup>4</sup>, H. Park<sup>7</sup>, P. Parker<sup>4</sup>, M. Patrou<sup>5</sup>, P. Peterson<sup>5</sup>, W. Potrzebowski<sup>23</sup>, S. Prescott<sup>24</sup>, M. Rakitin<sup>19</sup>, T. Richter<sup>16</sup>, J. Rooks<sup>8</sup>, P. Rozyczko<sup>14</sup>, X. Shan<sup>7</sup>, P. Sharp<sup>4</sup>, S. Shrestha<sup>4</sup>, T. Snow<sup>16</sup>, A. Stellhorn<sup>14</sup>, S. Teixeira<sup>7</sup>, J. Tumarkin<sup>3</sup>, A. 
Washington<sup>4</sup>, K. Weigandt<sup>7</sup>, R. Whitley<sup>4</sup>, L. Wilkins<sup>4</sup>, C. Wolf<sup>7</sup>, A. Zhang<sup>25</sup>, A. Zheng<sup>7</sup>
1 change: 1 addition & 0 deletions _includes/producers.html
@@ -0,0 +1 @@
A. Anuchitanukul<sup>4</sup>, P. Corona<sup>26</sup>, G. Fragneto<sup>14</sup>, B. Fultz<sup>15</sup>, M. Knudsen<sup>18</sup>, S. Krueger<sup>7</sup>, A. Larsen<sup>18</sup>, S. Lee<sup>27</sup>, T. Narayanan<sup>28</sup>, D. Parsons<sup>29</sup>, B. Pauw<sup>30</sup>, T. Perring<sup>4</sup>, L. Porcar<sup>2</sup>, L. Pozzo<sup>31</sup>, S. Prevost<sup>2</sup>, A. Rennie<sup>32</sup>, G. Roberts<sup>33</sup>, T. Rod<sup>14</sup>, Y. Shang<sup>5</sup>, J. Taylor<sup>14</sup>, L. Udby<sup>34</sup>, D. Zakoutna<sup>10</sup>
9 changes: 7 additions & 2 deletions people.md
@@ -8,15 +8,20 @@ Many people have contributed to this SAS analysis software over the years, and w
### Developers and contributors:
In addition to the core development, contributions include documentation writing and editing, tutorial writing and editing, performing code reviews, setting up and maintaining a variety of related services such as the marketplace, and admin tasks required to keep the infrastructure of the collaboration running.

M. Adams, N. Agouzal, G. Alina, Z. Attala, M. Backman, J. Bakker, P. Beaucage, J. Berger, R. Bourne, W. Bouwman, I. Breßler, P. Butler, I. Caddy-Jones, K. Campbell, R. Charles, J-H. Cho, T. Cooper-Bennun, R. Cortes Hernandez, J. Crake-Merani, A. Detiste, M. Doucet, J. Doutch, D. Dresen, G. Drosos, C. Durniak, C. Farrow, R. Ferraz Leal, R. Ford, L. Forster, J. Gaudet, M. Gerina, P. Gilbert, M. Gonzalez, O. Hammond, T. Hansen, R. Heenan, S. Henson, E. Hewins, A. Hicks, D. Honecker, A. Jackson, G. Jensen, P. Juhas, J. Karliczek, P. Kienzle, S. King, S. Kline, J. Krzywon, A. Larsen, S. Lee, J. Lin, Y. Liu, R. Lopes, D. Lozano, K. Lytje, D. Mannicke, B. Maranville, N. Martinez, M. McKerns, B. Miller, K. Mothander, R. Murphy, A. Nelson, T. Nielsen, M. Oakley, L. O'Driscoll, H. Park, P. Parker, M. Patrou, P. Peterson, W. Potrzebowski, S. Prescott, M. Rakitin, T. Richter, J. Rooks, P. Rozyczko, X. Shan, S. Shrestha, P. Sharp, T. Snow, A. Stellhorn, S, Teixeira, J. Tumarkin, A. Washington, K. Weigandt, R. Whitley, L. Wilkins, C. Wolf, A. Zhang, A. Zheng.
{% include creators.html %}

### Community contributors and external collaborators:
The SasView collaboration goes well beyond the contributions listed above. In particular we wish to thank the following people from the scattering community and beyond for their time in providing feedback and ideas and being advocates for the collaboration.

A. Anuchitanukul, P. Corona, G. Fragneto, M. Knudsen, S. Krueger, A. Markvardsen, T. Narayanan, D. Parsons, B. Pauw, R. Pellicelli, T. Perring, L. Porcar, L. Pozzo, S. Prevost, A. Rennie, G. Roberts, T. Holm Rod, Y. Shang, J. Taylor, L. Udby, D. Zakoutna and J. Zhou.
{% include producers.html %}

Last, but by no means least, we thank all those colleagues and SasView users that help further the collaboration by assisting us with beta testing, by reporting bugs and other idiosyncrasies of the program, and by suggesting improvements and new features.

### Affiliations
Contributions have come from the following facilities and institutions, in the order contributors appear in the above lists.

{% include affiliations.html %}

### Funding
In addition to the staff time and support for code camps provided by the collaboration partners, the following people have been instrumental in obtaining funding for development of SasView.
- B. Fultz (DANSE - <a href="https://www.nsf.gov/awardsearch/showAward?AWD_ID=0412074">NSF Award #0412074</a>)