MaRDI TA5 Milestones Meeting 23.02.2024 @ ZIB
Goals of the meeting
We have a plan for reaching the official milestones
Everybody is aware of their personal milestones
Some (ideally all) technical points are discussed or solved
We have a plan for improving our documentation
Agenda
Welcome
Mission clarification
What IS our mission in TA5? Connect papers/software AND data-sets?
--> Make it easy to access and find the data produced by MaRDI TAs 1-4
Milestone Planning
What are our 2024 goals for MaRDI?
--> Bring in content from TAs 1-4
What are the official 2024 milestones?
Who is doing what? (-> Personal Milestone Planning, Part 1: Presentations)
Open (technical) topics (see below)
Documentation
How to improve internal documentation?
--> If possible, update the upstream documentation (e.g. MediaWiki) and link to it from our Wiki
--> Use Rim as a test person to check whether all needed information is documented on our Wiki
How to improve documentation for external users?
--> Collect technical questions from other TAs and create documentation for them
--> Start with an FAQ-style document (potentially linking to more detailed documentation from there)
Outreach (to other SFBs, Math+, Libraries, ...)
--> Connect better with: Math+, LifeDocs (Christoph Lehrenfeld), TU Darmstadt Library (Jens Freund)
(Technical) Topics to discuss
How to define item types? --> Add a property ("mardi-profile") to each item that identifies the item's type (software, formula, publication, ...)
How to define profile types?
Formulae
Papers
Currently, papers are selected in SPARQL queries by "has zbMath ID"; is there a better way? --> Solved through the new "mardi-profile" property
How to link from a paper, as in "cites software"? / "uses dataset"?
How to link to a paper, as in "This data-set / software was used in this paper" (Now: in software-item we use "is described in" and in )
Datasets
Which properties to use?
--> Larissa made a first draft; check its compatibility with Zenodo items, then implement it
How to link to a paper, as in "was used in paper"? (Is this necessary?)
--> Same approach as for software
Software items (How can we query all of them - "instance of X" - what is X?)
Does "instance of software" violate the Wikidata hierarchy? (Software is a very high-level class)
--> Solved by using the new "mardi-profile" property
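A minimal sketch of how a SPARQL query could select items via the new "mardi-profile" property instead of "has zbMath ID". The property ID and the profile item IDs are hypothetical placeholders (the real IDs are not fixed in these notes), and the `wd:`/`wdt:` prefixes assume a standard Wikibase query-service setup on the portal.

```python
# Hypothetical identifiers: the real "mardi-profile" property ID and the
# item IDs of the profile types are NOT defined in these notes.
MARDI_PROFILE_PROPERTY = "P_MARDI_PROFILE"  # placeholder
PROFILE_ITEMS = {
    "publication": "Q_PUBLICATION_PROFILE",  # placeholder
    "software": "Q_SOFTWARE_PROFILE",        # placeholder
    "formula": "Q_FORMULA_PROFILE",          # placeholder
    "dataset": "Q_DATASET_PROFILE",          # placeholder
}


def items_by_profile_query(profile: str, limit: int = 100) -> str:
    """Build a SPARQL query returning all items with the given mardi-profile.

    Assumes wd:/wdt: are mapped to the MaRDI Wikibase, as in a standard
    Wikibase query service.
    """
    if profile not in PROFILE_ITEMS:
        raise ValueError(f"unknown profile type: {profile!r}")
    return f"""SELECT ?item ?itemLabel WHERE {{
  ?item wdt:{MARDI_PROFILE_PROPERTY} wd:{PROFILE_ITEMS[profile]} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""
```

The point of the design is that one property answers "what kind of item is this?" for every type, so queries no longer depend on side properties such as "has zbMath ID" or on where "software" sits in the Wikidata class hierarchy.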
arXiv Importer
What is the plan?
--> Use zbMath data on arXiv paper metadata (blocker: the API does not yet provide this information)
Import of formulae (Can we use an LLM to describe a particular formula, e.g. its parameters?)
--> Do this on a small sub-set of arXiv papers to showcase the idea
Import of paper metadata? (-> disambiguation)
Next steps?
--> Take 2 to 10 arXiv papers, extract their formulae, add them to the MaRDI KG, and try Moritz's formula search service
--> Discuss results and see whether this is useful at all
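A minimal sketch of the "extract formulae" step, assuming we work on the LaTeX source of an arXiv paper and only want display math (equation/align/gather/multline environments, $$...$$ and \[...\]). Inline math, user-defined macros and custom environments are deliberately ignored; the function name is hypothetical and this is not the method of any existing MaRDI service.

```python
import re

# Display-math patterns (an assumption about what counts as a "formula" here).
_DISPLAY_MATH = re.compile(
    r"\\begin\{(equation|align|gather|multline)\*?\}(.*?)\\end\{\1\*?\}"
    r"|\$\$(.*?)\$\$"
    r"|\\\[(.*?)\\\]",
    re.DOTALL,
)
_LABEL = re.compile(r"\\label\{[^}]*\}")


def extract_display_formulas(latex_source: str) -> list[str]:
    """Return the bodies of display-math blocks, without labels and with
    whitespace (including newlines) collapsed to single spaces."""
    formulas = []
    for m in _DISPLAY_MATH.finditer(latex_source):
        # Only one of the three body groups is set, depending on the match.
        body = m.group(2) or m.group(3) or m.group(4) or ""
        body = " ".join(_LABEL.sub("", body).split())
        if body:
            formulas.append(body)
    return formulas
```

Usage: `extract_display_formulas(r"\begin{equation}\label{eq:a} u_t = \Delta u \end{equation}")` returns `['u_t = \\Delta u']`, which could then be added to the KG and passed to the formula search service.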
LLMs for MaRDI portal
What is the overall plan?
What is the status?
Chatbot (LLM to query the portal)
How to integrate more of the cool Scholia stuff? (Simple example: number of citations of a paper, see e.g. https://scholia.portal.mardi4nfdi.de/work/Q25938997 )
--> Define what "cool" Scholia stuff is
--> For the citation example: use available services such as OpenCitations.net to obtain the required metadata
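A minimal sketch of fetching a paper's citation count from OpenCitations by DOI. The endpoint URL and the response shape (`[{"count": "..."}]`) are assumptions based on the COCI REST API v1 and should be checked against the current OpenCitations documentation; all function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint (OpenCitations COCI REST API v1); verify before use.
COCI_CITATION_COUNT = "https://opencitations.net/index/coci/api/v1/citation-count/"


def citation_count_url(doi: str) -> str:
    # Assumption: the endpoint takes the raw DOI in the path, so "/" stays unescaped.
    return COCI_CITATION_COUNT + urllib.parse.quote(doi.strip(), safe="/")


def parse_citation_count(payload: str) -> int:
    """Parse a response of the assumed form [{"count": "12"}] into an int."""
    data = json.loads(payload)
    if not data:
        return 0
    return int(data[0]["count"])


def fetch_citation_count(doi: str, timeout: float = 10.0) -> int:
    """Network call; keeps parsing separate so it can be tested offline."""
    with urllib.request.urlopen(citation_count_url(doi), timeout=timeout) as resp:
        return parse_citation_count(resp.read().decode("utf-8"))
```

Keeping URL building and parsing separate from the HTTP call makes it easy to cache counts in the portal instead of querying OpenCitations on every page view.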
Zenodo importer (for Math+ integration)
What is the plan?
Set up a workflow to harvest the items of the Math+ Zenodo community
Workflows for periodic updates (for any source we have)
--> Use the Zenodo example as a demonstrator
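A minimal sketch of the Zenodo harvest with periodic, incremental updates: page through the Zenodo records API filtered by community and keep only records updated since the last run. The community identifier is a hypothetical placeholder; the `communities`/`page`/`size`/`sort` query parameters, the `hits.hits` and `updated` response fields, and the page-size limit are assumptions about the Zenodo REST API. `since` must be a timezone-aware datetime.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

ZENODO_RECORDS = "https://zenodo.org/api/records"
# Hypothetical placeholder: look up the real Math+ community identifier.
COMMUNITY_ID = "mathplus-community-id"


def records_page_url(community: str, page: int, size: int = 25) -> str:
    query = urllib.parse.urlencode(
        {"communities": community, "page": page, "size": size, "sort": "mostrecent"}
    )
    return f"{ZENODO_RECORDS}?{query}"


def new_or_updated(hits: list[dict], since: datetime) -> list[dict]:
    """Keep records whose 'updated' timestamp is later than the last harvest."""
    fresh = []
    for record in hits:
        updated = datetime.fromisoformat(record["updated"])
        if updated.tzinfo is None:  # treat naive timestamps as UTC
            updated = updated.replace(tzinfo=timezone.utc)
        if updated > since:
            fresh.append(record)
    return fresh


def harvest(community: str, since: datetime, max_pages: int = 50) -> list[dict]:
    """Fetch pages until an empty page is returned; return only fresh records."""
    fresh = []
    for page in range(1, max_pages + 1):
        with urllib.request.urlopen(records_page_url(community, page), timeout=30) as resp:
            hits = json.load(resp)["hits"]["hits"]
        if not hits:
            break
        fresh.extend(new_or_updated(hits, since))
    return fresh
```

A periodic job (e.g. cron) could store the timestamp of each successful run and pass it as `since` next time; the same "fetch, filter by last update, import" pattern would carry over to other sources.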
zbMath MSC Keyword import? (We only have the IDs)
--> Put the ID<->keyword relations in an SQL database to avoid license issues
Wikidata graph split
--> If this happens, many Scholia queries might stop working
Licensing
Put a general page describing our licensing strategy on our Wiki
Author disambiguation
OKMaps
Environmental footprint