Revisiting material type classifications and presentation of material types #282

@akthom

We should revise the presentation of the facets to display the hierarchical relationships in the controlled vocabulary. I'll focus on Material type first.

The values in "Material type" were all mapped to a controlled vocabulary available here: https://isamples.org/models/generated/vocabularies/material_type.html.

The Material Type controlled vocabulary is hierarchical, but the iSamples Explorer displays the facet values as if they were all at the same level. They should instead be presented as a nested hierarchy in alphabetical order: top-level categories sorted alphabetically, and second-level values sorted alphabetically within each top-level category.

Does the data contain only the lowest-level category, or the full hierarchy? If the hierarchy can't be read from the files, can you use the vocabulary docs linked above to work out the hierarchy and arrange the facets accordingly?

The facets for Object Type and Sampled Feature should be presented the same way. Their vocabularies are at https://isamples.org/models/generated/vocabularies/material_sample_object_type.html (Object Type) and https://isamples.org/models/generated/vocabularies/sampled_feature_type.html (Sampled Feature).
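To make the requested ordering concrete, here is a minimal sketch of how a flat list of (child, parent) pairs could be turned into nested, alphabetized facet rows. The labels and their parent/child relationships below are illustrative placeholders, not the actual iSamples vocabulary, and `nested_facet_rows` is a hypothetical helper rather than anything in the Explorer codebase.

```python
from collections import defaultdict

# Illustrative (child, parent) pairs only; NOT the real iSamples hierarchy.
BROADER = [
    ("Rock", "Natural solid material"),
    ("Mineral", "Natural solid material"),
    ("Natural solid material", "Material"),
    ("Anthropogenic metal material", "Anthropogenic material"),
    ("Other anthropogenic material", "Anthropogenic material"),
    ("Anthropogenic material", "Material"),
]


def nested_facet_rows(pairs, root):
    """Return (depth, label) rows, depth-first, siblings sorted alphabetically.

    The root itself is not emitted; its direct children are depth 0.
    """
    children = defaultdict(list)
    for child, parent in pairs:
        children[parent].append(child)

    rows = []

    def walk(label, depth):
        # Case-insensitive sort so "rock" and "Rock" would order the same way.
        for child in sorted(children[label], key=str.casefold):
            rows.append((depth, child))
            walk(child, depth + 1)

    walk(root, 0)
    return rows


ROWS = nested_facet_rows(BROADER, "Material")
for depth, label in ROWS:
    print("  " * depth + label)
```

Sorting siblings at each level, rather than sorting the whole list once, is what keeps second-level values grouped under their own top-level category.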

✅ Fixed in production — closing with evidence

@akthom — your instinct was right on both counts, and both are now fixed on isamples.org:

1. The nonsense "Material" entry (build-side read bug). The old derived build took the first entry of each sample's p__has_material_category array — but source records (especially SESAR) carry the whole SKOS ancestry there, so the root concept (label literally "Material") leaked into the facet. The rebuilt pipeline (#274) selects the first non-root concept instead. Verified against the production facet file this morning:

SELECT * FROM read_parquet('https://data.isamples.org/isamples_202606_facet_summaries.parquet')
WHERE facet_value LIKE '%material/1.0/material'
-- → 0 rows  (the root "Material" entry is gone)
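The difference between the old and rebuilt selection rule can be sketched in a few lines. This is only an illustration under assumptions: the URIs are hypothetical `example.org` values, `first_non_root` is a made-up helper name, and matching the root by suffix simply mirrors the `LIKE` pattern in the query above; how the actual pipeline in #274 identifies the root is not stated here.

```python
# Suffix taken from the verification query above; the pipeline's real
# root test may differ (assumption).
ROOT_SUFFIX = "material/1.0/material"


def first_non_root(categories, root_suffix=ROOT_SUFFIX):
    """Return the first concept URI that is not the vocabulary root, or None."""
    for uri in categories:
        if not uri.endswith(root_suffix):
            return uri
    return None


# Hypothetical record carrying the full ancestry, root first.
SAMPLE = [
    "https://example.org/vocabulary/material/1.0/material",
    "https://example.org/vocabulary/material/1.0/rock",
]

OLD_PICK = SAMPLE[0]                # old rule: first entry, which is the root
NEW_PICK = first_non_root(SAMPLE)   # new rule: skip the root concept

print(OLD_PICK)
print(NEW_PICK)
```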

2. Inaccurate material values (e.g. pottery as "anthropogenic metal") were a data-vintage problem in the frozen export — fixed by overlaying Eric's current OC concept mappings (#275, tracked in #272): material corrected on 502K samples, object type on 954K. Spot-check: the #260 sample now reads "Other anthropogenic material" on prod.

Follow-on, deliberately kept open: the counting semantics question (one material per sample vs counting a sample under every material it carries) is tracked in #276 — input welcome there.
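For anyone weighing in on #276, a small sketch of the two counting semantics may help. The sample IDs and material labels are invented for illustration, and both function names are hypothetical; neither is taken from the production build.

```python
from collections import Counter

# Invented samples: each maps to the list of materials it carries.
SAMPLES = {
    "s1": ["rock", "mineral"],
    "s2": ["rock"],
    "s3": ["sediment"],
}


def count_primary(samples):
    """One material per sample: count only the first material listed."""
    return Counter(mats[0] for mats in samples.values() if mats)


def count_all(samples):
    """Count a sample once under every distinct material it carries."""
    return Counter(mat for mats in samples.values() for mat in set(mats))


PRIMARY = count_primary(SAMPLES)
EVERY = count_all(SAMPLES)

print(sum(PRIMARY.values()), dict(PRIMARY))
print(sum(EVERY.values()), dict(EVERY))
```

Under the first rule the facet counts sum to the number of samples; under the second they can exceed it, which matters if users read the facet totals as sample totals.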

— posted by 🤖 rbotyee; claims verified by executed queries against production; RY reviewed

Originally posted by @rdhyee in #265
