Version 1.5 changes
Our feeds system has been updated because the CaltechAUTHORS
repository migrated from EPrints to Invenio RDM. As a result, some
things on feeds.library.caltech.edu had to change. The intended
changes are listed below.
- Record IDs for CaltechAUTHORS changed between the two systems, so
links to those records look different
- The recent directory no longer has a “recent 25” list for
CaltechTHESIS content. Because we graduate people in “class of”
groupings, there is no sensible answer to which 25 people should be
listed, so that list has been removed
- The People directory only lists Caltech People who have publications
in CaltechAUTHORS
- With the migration from one system to another, many Caltech Groups
have revised names and identifiers, e.g. /groups/TCCON became
/groups/Total-Carbon-Column-Observing-Network
- Minor HTML markup changes make the feeds site more accessible
(e.g. the A to Z list in groups now uses a “menu” element instead of a
paragraph with pipe delimiters)
- A few additional JSON documents are included in the htdocs tree;
they are used to render content in Markdown, HTML and HTML Include
formats
- Dataset collections are no longer published and the *.keys files
are no longer generated
- Some legacy JSON documents have been preserved when possible but may
go away in a future release
- Pandoc is used exclusively to render Markdown, HTML, HTML Includes,
BibTeX and RSS files from JSON files rendered from the repositories and
collections
- Pandoc templates can be found in the “templates” directory. Their
file extensions correspond to the format they are intended to render
- The generated Markdown is used to render both HTML and HTML
Includes
- HTML Include is generated directly by Pandoc without a template
- BibTeX and RSS require their own templates
- The “recent” directories and their content under individual groups
and people are no longer generated
- The software that generates the feeds website has been completely
rewritten. There are inevitably changes I have failed to catalog.
- The Caltech Groups list is based on a group having records in either
CaltechAUTHORS, CaltechTHESIS or CaltechDATA and being identified within
the metadata as a Caltech Group
- The Caltech People list is based on those individuals that are
related to Caltech and have records in the CaltechAUTHORS
repository
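The Caltech Groups and Caltech People rules above reduce to simple set logic. The following is a minimal sketch, not the feeds code: the record shape (the groups, people, caltech_group and caltech keys) and the sample identifiers are hypothetical stand-ins for the real repository metadata.

```python
from typing import Iterable, Mapping

# Hypothetical record shape (NOT the real CaltechAUTHORS/THESIS/DATA schema):
# {"groups": [{"id": str, "caltech_group": bool}],
#  "people": [{"id": str, "caltech": bool}]}
Record = Mapping[str, list]


def caltech_groups(repositories: Mapping[str, Iterable[Record]]) -> set[str]:
    """Groups flagged as Caltech Groups that have a record in ANY repository."""
    found: set[str] = set()
    for records in repositories.values():
        for record in records:
            for group in record.get("groups", []):
                if group.get("caltech_group"):
                    found.add(group["id"])
    return found


def caltech_people(authors_records: Iterable[Record]) -> set[str]:
    """Caltech-related people drawn from CaltechAUTHORS records only."""
    return {
        person["id"]
        for record in authors_records
        for person in record.get("people", [])
        if person.get("caltech")
    }


# Hypothetical sample data keyed by repository name.
repos = {
    "CaltechAUTHORS": [
        {"groups": [{"id": "Total-Carbon-Column-Observing-Network", "caltech_group": True}],
         "people": [{"id": "Doe-J", "caltech": True}, {"id": "Roe-R", "caltech": False}]},
    ],
    "CaltechTHESIS": [
        {"groups": [{"id": "Example-Thesis-Group", "caltech_group": True}],
         "people": [{"id": "Smith-A", "caltech": True}]},
    ],
    "CaltechDATA": [
        {"groups": [{"id": "Example-External-Group", "caltech_group": False}], "people": []},
    ],
}

print(sorted(caltech_groups(repos)))
print(sorted(caltech_people(repos["CaltechAUTHORS"])))
```

Note the asymmetry: groups may come from any of the three repositories, while people are drawn from CaltechAUTHORS alone, so Smith-A (who appears only in a thesis record) is not listed.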
New Feature
- Pagefind provides search across the feeds site’s HTML pages
Organizational data flow changes for website content
- Everything in the “htdocs” tree is generated, so that directory can
be safely removed and recreated as needed
- Static files that need to be included in the “htdocs” tree live in
the “static” directory (they are simply copied into “htdocs” when
needed)
- The generation order in the “htdocs” tree is as follows
- Static content is copied into place
- JSON documents
- CSV documents
- Markdown (which is no longer linked in the feed pages)
- HTML/HTML Include
- BibTeX
- RSS
- Pagefind indexing is done after the tree is populated
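Because “htdocs” is entirely generated, a rebuild can start by deleting it, copying “static” back in, and then running each stage in the order listed above. The sketch below illustrates that ordering in Python rather than the Bash the project actually uses; the stage names and the no-op stage functions are hypothetical placeholders for the real JSON, CSV, Markdown, HTML, BibTeX and RSS rendering steps. Only the Pagefind command line (its --site option) reflects a real CLI.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical stage names, in the order listed above ("static" is handled first).
STAGE_ORDER = ("json", "csv", "markdown", "html", "bibtex", "rss")

# A stage renders one kind of output into the htdocs tree.
Stage = Callable[[Path], None]


def rebuild(htdocs: Path, static_dir: Path, stages: dict[str, Stage]) -> list[str]:
    """Recreate htdocs from scratch and run the stages in STAGE_ORDER."""
    # Everything under htdocs is generated, so it is safe to delete it.
    if htdocs.exists():
        shutil.rmtree(htdocs)
    # Static files are simply copied in first (copytree creates htdocs).
    shutil.copytree(static_dir, htdocs)
    ran = ["static"]
    for name in STAGE_ORDER:
        stages[name](htdocs)
        ran.append(name)
    return ran


def pagefind_command(htdocs: Path) -> list[str]:
    """Indexing comes last, once every HTML page exists."""
    return ["pagefind", "--site", str(htdocs)]


# Demo with no-op stages in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "static").mkdir()
    (root / "static" / "site.css").write_text("body {}\n")
    (root / "htdocs").mkdir()
    (root / "htdocs" / "stale.html").write_text("old\n")
    order = rebuild(root / "htdocs", root / "static",
                    {name: (lambda path: None) for name in STAGE_ORDER})
    copied = (root / "htdocs" / "site.css").exists()
    stale_gone = not (root / "htdocs" / "stale.html").exists()

print(order)  # ['static', 'json', 'csv', 'markdown', 'html', 'bibtex', 'rss']
print(pagefind_command(Path("htdocs")))  # ['pagefind', '--site', 'htdocs']
```

Deleting the tree up front is what makes stale pages (here stale.html) disappear; the Pagefind command would be passed to subprocess.run only after every stage has finished.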
System requirements
Building Feeds v1.5 requires the following software
- irdmtools >= 0.0.59 (use the latest release)
- dataset >= 2.1.6 (use the latest release)
- py_dataset > 2.1.2 (use the latest release)
- pylint
- progressbar2
- PyYAML
- pybtex >= 0.24
- feedgen >= 0.9
- datatools >= 1.2.5 (use the latest release)
- Bash >= 3 (or equivalent POSIX shell)
- Pandoc >= 3
- Postgres >= 12 (prefer 16)
- Python >= 3.10
- Pagefind >= 1.0.3
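Before a build it can help to confirm that the installed tools meet the minimum versions above. This is a minimal, hypothetical pre-flight sketch, not part of the feeds tooling: it compares version strings numerically and assumes you have already captured each tool’s version text yourself (for example the first line of pandoc --version, which typically reads like “pandoc 3.1.9”). py_dataset is omitted because its requirement is a strict “>”.

```python
import re

# Minimum versions transcribed from the ">=" entries above.
MINIMUMS = {
    "irdmtools": "0.0.59",
    "dataset": "2.1.6",
    "pybtex": "0.24",
    "feedgen": "0.9",
    "datatools": "1.2.5",
    "bash": "3",
    "pandoc": "3",
    "postgres": "12",
    "python": "3.10",
    "pagefind": "1.0.3",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Pull the first dotted number out of a string like 'pandoc 3.1.9' or 'v1.0.3'."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(tool: str, reported: str) -> bool:
    """True if the reported version is >= the minimum listed for that tool."""
    have = parse_version(reported)
    need = parse_version(MINIMUMS[tool])
    # Pad with zeros so "3" compares equal to "3.0.0".
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


print(meets_minimum("pandoc", "pandoc 3.1.9"))
print(meets_minimum("python", "Python 3.9.18"))
```

Comparing integer tuples rather than raw strings matters here: as strings "3.9" sorts after "3.10", but as tuples (3, 9) correctly sorts before (3, 10).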
Bash scripts orchestrate most of the processing. Python is used to
transform the legacy data shapes into the needed forms and to generate
Markdown content via Pandoc.