SRIML: A Document Tagging Language

Posted Monday, March 14, 2022 by Sri. Tagged PROTOTYPE
EDIT STATUS:new

The SRIML Document Tagging Language is based on my desire to convert blocks of text into referenceable data in a human-readable and pretty text format. This is where I deviate from other efforts in the area.

Concept: Federation and Block Addressing

I was exposed to Matrix recently, and I like the idea of using a federated protocol to address content on various home servers. I think the idea of a data garden as its own address is neat. But how do I address the individual block elements in a garden, particularly since they can move around freely?

Concept: Anonymous Persistent Block Addresses

It would be nice to have a way of identifying the "life of a block of text" based on its signature. Git is able to do something like this with its files, so that would be worth studying. Git appears to use lstat to check whether files in the current 'working set' have changed, and it addresses file contents by their hash.

Concept: Permalink

Every page needs an "x-permalink", which is the canonical link outside of a category directory. I'm imagining there is a directory called x that contains every HTML article by its base filename, so the link doesn't change, for SEO reasons.
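As a rough illustration of the permalink idea, here is a minimal sketch that maps a source path to a link under the x directory using only the base filename. The function name `xPermalink` and the example paths are hypothetical, and the `.html` extension is an assumption taken from the description above.

```typescript
// Hypothetical helper: derive the canonical x-permalink from a source path.
// Only the base filename is kept, so moving the file between category
// directories leaves the permalink unchanged.
function xPermalink(sourcePath: string): string {
  const segments = sourcePath.split("/");
  const fileName = segments[segments.length - 1] ?? "";
  const base = fileName.replace(/\.[^.]+$/, ""); // strip the extension, if any
  return `/x/${base}.html`;
}

console.log(xPermalink("journal/2022/sriml-tagging.md")); // → /x/sriml-tagging.html
console.log(xPermalink("prototypes/sriml-tagging.md"));   // → /x/sriml-tagging.html
```

One consequence of this scheme is that base filenames would have to be unique across the whole site.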

Concept: Cluster Tagging

Instead of using section end/marker tags, maybe I could use a simpler preprocessor that just looks for @@ markers and does a few things according to some rules:

implicit cluster text

Blah blah blah @@text of any length@@ blah blah

The preprocessor trims the text between the @@ markers and creates a 'cluster tag' from it. We want to save the paragraph text markers to a cluster tag page. Since the text is short, it can be treated as a tag that is mapped to the x-permalink. We'll call these cluster tags.
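A minimal sketch of the implicit case, assuming markers open and close on the same line. The names `ClusterTag` and `extractImplicitClusterTags` are hypothetical, as is the example permalink.

```typescript
// Hypothetical record: one cluster tag mapped to the page it came from.
interface ClusterTag {
  tag: string;       // trimmed text found between the @@ markers
  permalink: string; // x-permalink of the page containing the tag
}

// Scan text for inline @@...@@ spans and return one ClusterTag per span.
// Assumption: a span never crosses a line break ("." does not match "\n").
function extractImplicitClusterTags(text: string, permalink: string): ClusterTag[] {
  const pattern = /@@(.+?)@@/g;
  const tags: ClusterTag[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const tag = (match[1] ?? "").trim();
    if (tag.length > 0) {
      tags.push({ tag, permalink });
    }
  }
  return tags;
}

const found = extractImplicitClusterTags(
  "Blah blah blah @@ text of any length @@ blah blah",
  "/x/example.html"
);
console.log(found[0]?.tag); // → text of any length
```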

explicit cluster text

@@ short tag [, short tag]
text lines
text lines
@@ --

This creates a named hash of a chunk of text. If a short tag with that name already exists, this page's permalink is added to it, and it is also marked as cluster text.

processing clusters

Each clusterTag produces a `clusterText` and a `clusterTextHash`. The hash is used to find matching text across pages.
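The explicit form and the hashing step could be sketched as follows. Everything here is an assumption layered on the notes above: the names `Cluster`, `parseExplicitClusters`, and `addToTagIndex` are hypothetical, and the FNV-1a hash (computed over UTF-16 code units) is only a dependency-free stand-in for whatever real hash, such as SHA-256, the preprocessor ends up using.

```typescript
// Stand-in hash: 32-bit FNV-1a over UTF-16 code units, as 8 hex digits.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

interface Cluster {
  tags: string[];          // short tags from the opening "@@ tag, tag" line
  clusterText: string;     // trimmed text between the opening and "@@ --"
  clusterTextHash: string; // hash of clusterText, used to match text
}

// Collect every "@@ tag [, tag] ... @@ --" block in a source string.
function parseExplicitClusters(source: string): Cluster[] {
  const clusters: Cluster[] = [];
  let tags: string[] | null = null; // null means "not inside a block"
  let body: string[] = [];
  for (const line of source.split("\n")) {
    const trimmed = line.trim();
    const isClose = /^@@\s*--$/.test(trimmed);
    if (tags === null) {
      // An opener starts with exactly two @ and has no other @ on the line.
      const open = /^@@(?!@)([^@]+)$/.exec(trimmed);
      if (open !== null && !isClose) {
        tags = (open[1] ?? "").split(",").map((t) => t.trim()).filter((t) => t.length > 0);
        body = [];
      }
    } else if (isClose) {
      const clusterText = body.join("\n").trim();
      clusters.push({ tags, clusterText, clusterTextHash: fnv1a(clusterText) });
      tags = null;
    } else {
      body.push(line);
    }
  }
  return clusters;
}

// Add this page's permalink to each short tag, creating the tag if needed.
function addToTagIndex(index: Map<string, string[]>, clusters: Cluster[], permalink: string): void {
  for (const cluster of clusters) {
    for (const tag of cluster.tags) {
      const pages = index.get(tag) ?? [];
      if (pages.indexOf(permalink) === -1) pages.push(permalink);
      index.set(tag, pages);
    }
  }
}
```

Two pages that contain the same trimmed cluster text would produce the same `clusterTextHash`, which is what would make cross-page matching possible; a 32-bit hash is too short for real use because of collisions.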

Concept: Area Tagging

I like the idea, but I don't like the syntax. Visually, the document markup should resemble something that strongly delineates document blocks within the same file, and let markdown be markdown inside.

@@@ mark: "marker-slug"
    title: "This is my Section Title"
    subtitle: "This is my section subtitle"

## This is my article title section

Here I am typing stuff on my normal way, using Markdown Extended with the
trimmings

@@@ end

The algorithm would gather all the text lines between tag markers, trim them, and produce three pieces of output:

  1. a start,end index key into the original file, pointing to the props
  2. the captured text
  3. a hash for the captured text
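The three outputs listed above can be sketched like this. It assumes props are the `key: "value"` pair on the opening line plus any indented pairs directly after it, that line indexes are 0-based, and that FNV-1a stands in for the real hash. `Area`, `readProp`, and `parseAreas` are hypothetical names.

```typescript
// Stand-in hash: 32-bit FNV-1a over UTF-16 code units, as 8 hex digits.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

interface Area {
  start: number;                 // 0-based line index of the opening "@@@" line
  end: number;                   // 0-based line index of the "@@@ end" line
  props: Record<string, string>; // mark, title, subtitle, ...
  text: string;                  // trimmed text captured between the markers
  hash: string;                  // hash of the captured text
}

// Parse one `key: "value"` pair into props; returns false if the line is not a pair.
function readProp(line: string, props: Record<string, string>): boolean {
  const m = /^\s*([\w-]+):\s*"?(.*?)"?\s*$/.exec(line);
  if (m === null) return false;
  props[m[1] ?? ""] = m[2] ?? "";
  return true;
}

function parseAreas(source: string): Area[] {
  const lines = source.split("\n");
  const areas: Area[] = [];
  let i = 0;
  while (i < lines.length) {
    // An opener is "@@@ " followed by anything except "end".
    const open = /^@@@\s+(?!\s*end\s*$)(.+)$/.exec(lines[i] ?? "");
    if (open === null) { i++; continue; }
    const start = i;
    const props: Record<string, string> = {};
    readProp(open[1] ?? "", props);
    i++;
    // Indented pairs directly after the opener are also props.
    while (i < lines.length && /^\s+\S/.test(lines[i] ?? "") && readProp(lines[i] ?? "", props)) {
      i++;
    }
    const body: string[] = [];
    while (i < lines.length && !/^@@@\s+end\s*$/.test(lines[i] ?? "")) {
      body.push(lines[i] ?? "");
      i++;
    }
    if (i >= lines.length) break; // unterminated area: ignored in this sketch
    const text = body.join("\n").trim();
    areas.push({ start, end: i, props, text, hash: fnv1a(text) });
    i++;
  }
  return areas;
}
```

The captured `text` is what would be handed back for regular Markdown processing, while the `start`, `end`, `props`, and `hash` fields are what would be written out to the separate file.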

Implemented as a shortcode, this is written to another file outside of the source directory, and the captured text is returned for regular Eleventy processing. The saved file should be in some kind of data format, so it can be turned into a network graph-like display.

A separate Eleventy config processes the saved files into a tag glossary.

Concept: Term Tagging

I think I can extend Markdown-It to recognize %% as a symbol tagger. This could automatically generate an inline entry like %%Sri of the Future%%. Or maybe @@Sri of the Future@@ looks better. Hard to say. [[Sri of the Future]] also is a possibility.

A term tag would auto-create a reference to the original chunk and an entry in some kind of term table. It would also be nice if there was markup that could further scope the reference:

example [[sri of the future]] use in text
--- bottom matter ---
[[sri of the future]] scope:@personal_dev, [[ghdr]]

This could create both flat tags and a hierarchy.
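Here is one possible reading of the example above, as a sketch rather than a settled design. It assumes the `[[...]]` syntax, that the literal line `--- bottom matter ---` separates body from bottom matter, that `@name` items after `scope:` are flat tags, and that `[[term]]` items after `scope:` name a parent term. `TermEntry` and `parseTerms` are hypothetical names, and terms are lowercased so that differently cased uses share one entry.

```typescript
interface TermEntry {
  term: string;      // normalized (trimmed, lowercased) term text
  uses: number;      // how many times the term appears in the body
  scopes: string[];  // flat tags from "@name" items in the bottom matter
  parents: string[]; // parent terms from "[[term]]" items in the bottom matter
}

const BOTTOM_MATTER = "--- bottom matter ---";

function parseTerms(source: string): Map<string, TermEntry> {
  const table = new Map<string, TermEntry>();
  // Look up a term's entry, creating it on first sight.
  const entry = (raw: string): TermEntry => {
    const key = raw.trim().toLowerCase();
    let found = table.get(key);
    if (found === undefined) {
      found = { term: key, uses: 0, scopes: [], parents: [] };
      table.set(key, found);
    }
    return found;
  };

  const lines = source.split("\n");
  const split = lines.findIndex((l) => l.trim() === BOTTOM_MATTER);
  const bodyLines = split === -1 ? lines : lines.slice(0, split);
  const bottomLines = split === -1 ? [] : lines.slice(split + 1);

  // Body: every [[term]] counts as one use.
  const termPattern = /\[\[([^\[\]]+)\]\]/g;
  for (const line of bodyLines) {
    let m: RegExpExecArray | null;
    while ((m = termPattern.exec(line)) !== null) {
      entry(m[1] ?? "").uses++;
    }
  }

  // Bottom matter: "[[term]] scope:@tag, [[parent]]".
  for (const line of bottomLines) {
    const m = /^\s*\[\[([^\[\]]+)\]\]\s*scope:(.*)$/.exec(line);
    if (m === null) continue;
    const target = entry(m[1] ?? "");
    for (const raw of (m[2] ?? "").split(",")) {
      const item = raw.trim();
      const parent = /^\[\[([^\[\]]+)\]\]$/.exec(item);
      if (parent !== null) {
        target.parents.push(entry(parent[1] ?? "").term);
      } else if (item.startsWith("@")) {
        target.scopes.push(item.slice(1));
      }
    }
  }
  return table;
}
```

Under this reading, the `scopes` list gives the flat tags and the `parents` list gives the hierarchy.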