Toward an SVG-Native Pikchr

(1) By sam atman (mnemnion) on 2022-08-22 17:57:14 [link] [source]

Hi! I've been looking for a way to generate railroad diagrams of Parsing Expression Grammars, "just like the ones SQLite has", so when it finally came to my attention that drh had implemented a drawing language based on Kernighan's earlier work, I knew my search was done.

As it happens, that and related work has led me to spend a lot of time with the SVG specification, writing a tool (incomplete!) which will validate SVG based on the XSD schema.

I was a couple hundred lines into implementing LOGO at one point. Pikchr is better.

This convergence has led me to some tentative proposals for extending the language, which I consider compatible with the purpose and scope of the Pikchr project. There are three changes, one of which is invisible to the syntax.

The goal is to provide a minimal and composable way to add semantics to the generated SVG, without compromising the core structure of Pikchr: a single-pass compiler of a simple language, which generates SVG without backtracking, and an absolute minimum of state.

I hope you will see these proposals as in keeping with the spirit of the program. I'll begin with the simplest.

Grouping Groups

I've made one change successfully, one which doesn't change the language, and that is grouping any labeled object with <g id="LABEL"></g>.

A sentence like Gizmo: [oval "A Gizmo" fit], which would generate this:

<path d="M17,32L56,32A15 15 0 0 0 71 17A15 15 0 0 0 56 2L17,2A15 15 0 0 0 2 17A15 15 0 0 0 17 32Z"  style="fill:none;stroke-width:2.16;stroke:rgb(0,0,0);" />
<text x="36" y="17" text-anchor="middle" fill="rgb(0,0,0)" dominant-baseline="central">A Gizmo</text>

Now generates this:

<g id="Gizmo">
   <path d="M17,32L56,32A15 15 0 0 0 71 17A15 15 0 0 0 56 2L17,2A15 15 0 0 0 2 17A15 15 0 0 0 17 32Z"  style="fill:none;stroke-width:2.16;stroke:rgb(0,0,0);" />
   <text x="36" y="17" text-anchor="middle" fill="rgb(0,0,0)" dominant-baseline="central">A Gizmo</text>
</g>

Without the indentation, which I added for clarity. The difference in --svg-only output is 20 bytes: 350 for the latest pikchr binary and 370 with the groups. Working backward from the length, each group adds 15 bytes plus the length of the label. A small price to pay!
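The byte count above can be checked with a small sketch. This is not the actual patch: group_overhead and wrap_in_group are hypothetical names, and the sketch assumes one newline after each of the two tags, which is what makes the fixed cost come out to 15 bytes.

```c
#include <stdio.h>
#include <string.h>

/* Bytes added by wrapping an object in <g id="LABEL">\n ... </g>\n.
** Assumption: one newline after each tag, which matches the 20-byte
** difference reported for the five-letter label "Gizmo". */
size_t group_overhead(const char *zLabel){
  return strlen("<g id=\"\">\n") + strlen("</g>\n") + strlen(zLabel);
}

/* Write the wrapped element into zOut (at most nOut bytes, always
** NUL-terminated when nOut>0).  Returns the length the complete
** output needs, exactly as snprintf() does. */
int wrap_in_group(char *zOut, size_t nOut,
                  const char *zLabel, const char *zBody){
  return snprintf(zOut, nOut, "<g id=\"%s\">\n%s</g>\n", zLabel, zBody);
}
```

For the label Gizmo the overhead is 15 + 5 = 20 bytes, the same difference as between the 350-byte and 370-byte outputs.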

If we say a label averages 8 bytes, surely generous, it would take 146 of these groups before SQLite would have to allocate another page in response to the bloat.

For my purposes, this is an easy win, since I can now pick up the labeled objects with a selector and do whatever I want to them once pikchr returns the document. The difference in terms of what can be done with the output is considerable.

I've run this modified pikchr binary on the example documents, and the output is identical in appearance. I like that pikchr has basically no configuration, and would propose that this grouping should be added to the output automatically. It adds a small handful of bytes per label and changes the SVG from being opaque to being transparent, from the perspective of a script.

Since pikchr allows labels to be redefined, I want to draw attention to the fact that XML which has duplicate .id fields is valid. It's not something one would want to do on purpose, but it has no effect on rendering, a browser will print it just the same, and an XML validator is encouraged to complain but will pass it.

I have a Fossil repo with the changes, and asked in another thread about the development model here. I won't put any patches in this thread, lest it complicate matters, let me just say that I'm confident the original author could work backward from the output to the patch in less than an hour.

This brings me to a proposal for an actual change in the grammar. The complement of the id attribute is the class attribute, and one change to support classes would complete the Pikchr language.

Class Lists

Let me begin by saying that it was a great choice to reimplement PIC with modest improvements. It should be possible to learn a diagramming language, with modest effort, to the point where the documentation is not constantly open. Limiting the scope of the language is critical, and the right answer to "how about even bigger text?" is "no".

The thing is that SVG is absolutely vast, just enormous, and what Pikchr provides is a solid subset of those styles, one which is defensible as a reimplementation of PIC, but fairly arbitrary, looking at it in the other direction.

The general-purpose solution to this is a syntax for adding classes to generated elements. This opens up the output for the entire panoply of SVG wizardry, and it also seals the language against requests for feature enhancements, because there's no further need.

There are two things I can do with a class attribute very easily: add attributes to it, directly or with CSS, or find it and add any elements I want as children. That's basically all XML has to offer; one can imagine extending pikchr's ability to do layout, but the question of style is finished. It composes beautifully with id labels on groups, as SVG is designed to do.

This calls for an addition to the grammar, and I've figured out lemon and pikchr.y well enough to start to see how. I work on parsing engines as part of my job, so making all the necessary changes is plausibly something I can accomplish, but modifying this program is daunting enough that it would pay to get the syntax firm before trying it.

My goal is to add exactly one mechanism to Pikchr, one which opens it to all the rest of SVG's capabilities, without requiring the whole universe to be supported by the code generator. That would violate the simple and easy-to-learn maxims in the mission statement.

The proposed extension is class lists. We change the test sentence to read Gizmo: %sprocket %gear [oval "A Gizmo" fit], expecting:

<g id="Gizmo" class="sprocket gear">
   <path d="M17,32L56,32A15 15 0 0 0 71 17A15 15 0 0 0 56 2L17,2A15 15 0 0 0 2 17A15 15 0 0 0 17 32Z"  style="fill:none;stroke-width:2.16;stroke:rgb(0,0,0);" />
   <text x="36" y="17" text-anchor="middle" fill="rgb(0,0,0)" dominant-baseline="central">A Gizmo</text>
</g>

Extending further: Gizmo: %sprocket %gear [%fancyOval oval "A Gizmo" fit] gives

<g id="Gizmo" class="sprocket gear">
   <path class="fancyOval" d="M17,32L56,32A15 15 0 0 0 71 17A15 15 0 0 0 56 2L17,2A15 15 0 0 0 2 17A15 15 0 0 0 17 32Z"  style="fill:none;stroke-width:2.16;stroke:rgb(0,0,0);" />
   <text x="36" y="17" text-anchor="middle" fill="rgb(0,0,0)" dominant-baseline="central">A Gizmo</text>
</g>

Which is exactly enough SVG to be able to do anything else. Want to apply a gradient, do a transform, update colors semantically at runtime, respond to clicks? No problem. We have classes, we can do this.

The only subtlety here is in classes being assigned to the group between a label and an object, and to the object in an unlabeled context. I even have two or three commented-out lemon rules which might do the job, with the requisite modifications to the parser and a custom pik_add_class function.

Classes very frequently use a hyphen, and it would be nice to support that, but supporting only a subset of valid class names is more than sufficient.
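A rough sketch of what collecting and emitting a class list could look like follows. Every name in it (ClassList, class_name_ok, class_list_add, class_list_emit) is hypothetical and none of it is taken from pikchr.y. It accepts only letters, digits and underscores, leaving the hyphen out as the conservative subset, and it writes all names into a single space-separated class attribute, because XML allows each attribute name only once per element.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical per-object list of XML class names. */
typedef struct ClassList {
  int n;                        /* Number of names stored */
  const char *az[MAX_CLASSES];  /* The names, owned by the caller */
} ClassList;

/* Accept a conservative subset of valid class names: a letter first,
** then letters, digits or '_'.  Hyphens are rejected in this sketch. */
int class_name_ok(const char *z){
  if( !isalpha((unsigned char)z[0]) ) return 0;
  for(z++; *z; z++){
    if( !isalnum((unsigned char)*z) && *z!='_' ) return 0;
  }
  return 1;
}

/* Append a name.  Returns 1 on success, 0 if the name is rejected
** or the list is full. */
int class_list_add(ClassList *p, const char *zName){
  if( p->n>=MAX_CLASSES || !class_name_ok(zName) ) return 0;
  p->az[p->n++] = zName;
  return 1;
}

/* Write ` class="a b c"` into zOut, or an empty string when the list
** is empty.  Returns the number of bytes the text needs. */
int class_list_emit(const ClassList *p, char *zOut, size_t nOut){
  size_t used = 0;
  int i;
  if( nOut==0 ) return 0;
  zOut[0] = 0;
  if( p->n==0 ) return 0;
  used += snprintf(zOut, nOut, " class=\"");
  for(i=0; i<p->n && used<nOut; i++){
    used += snprintf(zOut+used, nOut-used, "%s%s", i ? " " : "", p->az[i]);
  }
  if( used<nOut ) used += snprintf(zOut+used, nOut-used, "\"");
  return (int)used;
}
```

The leading space in the emitted text lets the caller paste it directly after the element name or after an existing id attribute.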

I do have one remaining suggestion, which would do a lot for reuse. Nope, it's not modules or includes or namespaces, pikchr doesn't need to know about any of that stuff.

Rather, it is the SVG-native alternative to macros: symbol and use.

Of Symbols and Macros

Richard, I bet you had a rueful chuckle when someone showed you the macrobomb at the heart of PIC. It's just a very early-Unix problem to have: combinatoric expansion bombs were a feature, not a bug.

And I don't have to tell you that, for all the footguns and gotchas, there's a "you'll get used to it" to the Unix school of inlining-rewrite-macros, it has a power and simplicity which punch way above weight for minimal implementation complexity.

But particularly in a stateful imperative language like this, it gives a lot of the power of conditions and loops without the danger and complexity of providing them. Imposing a compile-time limit on tokens is a pragmatic guard against blowup.

There is an identifiable application, however, where macros do too much, and not enough. This is the case where the user wants to make something which behaves just like file or another object class. This can be done with macros, clearly, and a few paragraphs in the user manual about the trick of making, say, a five-pointed star behave just like a box, would help.

The SVG native way to handle this kind of reuse is with defs, symbol, and use. So we can give symbol a syntax similar to define, which might look like symbol @button { oval } for a trivial example. Then we can have sentences like ButtonAccept: @button "Accept" fit.

I used one of the user prefixes as a reminder that this would need to be accounted for in the SVG output, since $ and @ aren't legal in identifiers.

Instead of inlining, each symbol is generated in-place with <defs><symbol id="button">, and subsequent references are looked up like any other variable, but a T_SYMBOL is returned, generating something like <use x=:x, y=:y, xlink:href="#button">. The 'prepared statement' syntax is just a gesture at the fact that the relevant attributes can all be filled in here, the inheritance rules for use show the effect of this combination.

It's better form to have one set of defs at the top of an SVG element, which are used in the rest of the body, but it's perfectly legal to use one defs wrapper per symbol or anything else. As is extremely common with XML, the behavior when child elements are duplicated or arrive out-of-order is underdefined and will vary by platform, but in my opinion pikchr shouldn't pay enough attention to the stream to protect the user from making dodgy SVG. It will already do bad things if someone redefines a macro, it can afford to do less drastically bad things with a redefined symbol. Post-processing to give the SVG a cleaner layout isn't necessary at all, but becomes possible, because the defs use pattern preserves the semantics of a separate definition and use, which macros cannot do.

The important part is that symbols and their references can be generated in-place without lookahead. Given the construction of a use element, the variable list would only have to return a value of "this is a defined symbol", since the stream just has to construct "#button" and already has the button part.
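To make the "no lookahead" point concrete, here is a minimal sketch of the two emitters. emit_symbol and emit_use are hypothetical names, not functions in Pikchr. The sketch keeps the xlink:href spelling used above, so it assumes the root svg element declares the xlink namespace.

```c
#include <stdio.h>

/* Emit a definition in place, the moment it is parsed:
**   <defs><symbol id="NAME">BODY</symbol></defs>
** Returns the length needed, as snprintf() does. */
int emit_symbol(char *zOut, size_t nOut,
                const char *zName, const char *zBody){
  return snprintf(zOut, nOut,
      "<defs><symbol id=\"%s\">%s</symbol></defs>\n", zName, zBody);
}

/* Emit a reference.  Only the name and a position are required, so
** the lookup needs to answer nothing beyond "this is a symbol". */
int emit_use(char *zOut, size_t nOut,
             const char *zName, double x, double y){
  return snprintf(zOut, nOut,
      "<use x=\"%g\" y=\"%g\" xlink:href=\"#%s\"/>\n", x, y, zName);
}
```

Neither function reads anything that comes later in the input, which is the property that matters here.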

This will interact smoothly with define, because the symbol is generated in place and never again, so the macro executes when it's encountered.

This might constitute a breaking change for any script which has an existing symbol variable, but it shouldn't, right? At most the assignment would shadow the keyword.

Conclusion

My goodness, that's a lot of words for someone who has been using this program for three days!

I mentioned at the start that I went so far as to start a parser for a LOGO, just to generate this kind of diagram. I knew about MetaPost and TikZ, I've written some Asymptote if anyone knows what that is, I've generated acceptable graphviz a few dozen times and occasionally I've pounded out a frustrating and ugly bit of PlantUML.

I think taking these three steps closer to SVG will unlock the greater potential of the Pikchr project. It accomplishes this with one new token category, an additional keyword, and an invisible change to the output. The changes are backward-compatible, providing only the fundamental tools for post-processing an SVG in detail: ids, classes, and defined symbols.

There's some trepidation in showing up out of nowhere with a major proposal like this. I think I could add the class lists, but the symbol rule exceeds my grasp of the layout of the program. It's valid SVG to embed a symbol in a symbol, so that doesn't have to be excluded, but the syntax wouldn't support the macro variables, so this is several changes to (a copy of) one of the more complex parts of the source code.

Thanks for reading to the end, and thanks for another great piece of software.

(2) By Stephan Beal (stephan) on 2022-08-22 18:14:18 in reply to 1 [link] [source]

FWIW, i don't have any informed thoughts on most of what you've said, but this bit stuck out at me...

id="Gizmo"

Keeping in mind that IDs must be unique per HTML page. It would be onerous to require that the pikchr author keep track of all symbol names for all pikchrs in a document to avoid collisions. Perhaps either using class="pikchr-Gizmo" or id="${idprefix}Gizmo" would be more appropriate, where idprefix is a new var the user could set on a per-pikchr basis, defaulting to an empty string or "pikchr-" or some such.

(3.1) By Warren Young (wyoung) on 2022-08-22 18:22:05 edited from 3.0 in reply to 2 [link] [source]

The ID could be based on the object hierarchy and the nearest preceding label:

B1: box "Foo"
    circle "Bar"

The latter could be ID "pikchr-circle-B1". If you wanted it to be "…-C1", you'd add another label.

If there's a conflict, add a disambiguation character. If you add "circle "Qux"" to the above pikchr, the ID might be "pikchr-circle-B1a" or something.
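As a sketch only, the naming scheme could be expressed like this. make_object_id is a hypothetical helper, the "pikchr-" prefix is taken from the examples above, and the caller is assumed to know how many earlier objects already produced the same base ID.

```c
#include <stdio.h>

/* Build "pikchr-<class>-<label>" plus an optional suffix letter.
** nDup is the number of earlier objects that already produced the
** same base ID: 0 gives no suffix, 1 gives 'a', 2 gives 'b', etc.
** Only 26 suffixes are handled; beyond that the sketch returns -1.
** Otherwise returns the length needed, as snprintf() does. */
int make_object_id(char *zOut, size_t nOut,
                   const char *zClass, const char *zLabel, int nDup){
  if( nDup<=0 ){
    return snprintf(zOut, nOut, "pikchr-%s-%s", zClass, zLabel);
  }
  if( nDup>26 ) return -1;
  return snprintf(zOut, nOut, "pikchr-%s-%s%c",
                  zClass, zLabel, 'a'+nDup-1);
}
```

With class "circle" and label "B1" this yields pikchr-circle-B1 for the first circle and pikchr-circle-B1a for the second.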

(5) By sam atman (mnemnion) on 2022-08-22 22:45:37 in reply to 3.1 [link] [source]

I hadn't considered putting IDs on anything without a label, but that would be a good way to generate them. I'm still puzzling out the program flow, but the engine would kind of have to be able to look back like that to resolve references in the first place, right?

I'm also not at all opposed to a cheap mechanism to guarantee the IDs are unique within the SVG, if one can be devised. In a sibling comment I was emphasizing that duplicate .id fields are valid, that doesn't make them a good idea.

(4) By sam atman (mnemnion) on 2022-08-22 22:41:42 in reply to 2 [link] [source]

I linked to the relevant portion of the XML specification, but just as the single word 'valid', so that was easy to miss.

Here's the bare link https://www.w3.org/TR/xml-id/#processing, the relevant part is the use of the word MUST for assuring the well-formedness of the ID string, and SHOULD for the uniqueness of .id fields.

So much for the specification: on paper, duplicate ids are valid, just discouraged. In practice, browsers will render them without a hint of complaint, and trouble sets in if a script tries to interact with something on the basis of .id.

That trouble can be avoided by not duplicating labels when the contents of the id-field is important. Someone generating a fragment of an HTML document has the job of keeping id fields unique, no matter what's generating the id field.

I did consider getting namespacing involved but decided against it, both in the explanation and the code. It rarely proves its worth in practice, especially in the context of HTML rather than XML.

Any code which tried to keep track of previous label references would work against the simplicity of shadowing, and something like that would be needed to make the SVG ids unique, something the user can do themselves.

Similarly, if a prefix needs to be applied to the SVG ids, or it's expedient to alter them in some other way, post-processing will do the job handily.

(6) By Stephan Beal (stephan) on 2022-08-22 23:14:44 in reply to 4 [link] [source]

So much for the specification: on paper, duplicate ids are valid, just discouraged. In practice, browsers will render them without a hint of complaint, and trouble sets in if a script tries to interact with something on the basis of .id.

Not just script code, but CSS selectors. My assumption is that CSS is the main motivation behind adding an id/class to the SVG at all? Scripting of these seems like a far fringe use case, but styling is much more useful.

(7) By sam atman (mnemnion) on 2022-08-23 11:59:08 in reply to 6 [link] [source]

The approach I settled on is informed by a few factors:

It's valid to duplicate IDs the way that Pikchr duplicates labels. What that means is that the simplest algorithm (always add a label, verbatim, as .id) will produce valid SVG under all circumstances. That's important because it means that objects can be grouped and labeled automatically, no configuration or tracking is needed: the result will be valid SVG, which is the most important invariant to preserve.
Doing anything useful with the ids requires them to be unique in the SVG, but unique labels are something which is under the user's control. This patch empowers the creation of unique IDs but doesn't ensure it. There are ways to force that to be true in a given Pikchr output, but they complicate the code generation. On an object level, SQLite clearly has no (immediate!) need for CSS selectors on the generated diagrams, so it's important that any use of duplicate labels not affect the SQLite project in any other way than adding a few bytes of markup.
There's no mechanism available to assure the uniqueness of IDs within an HTML document which has multiple subcomponent SVGs, and that's a requirement as well. Prefixes and namespaces are basically the same thing here, they mean two IDs between SVGs won't clash unless the prefix or namespace match, so that's just pushing the problem around. However, post-processing an SVG with unique IDs, so that the IDs are unique within a document, is easy to accomplish in whatever automated way fits with the rest of the page framework/schema/whatever you'd like to call it.
During such post-processing, it is also very easy for me to find duplicate group IDs and make them unique. This is a one-liner, in fact, in many languages. What I can't do, with pikchr as-is, is anything involving treating a sentence like ThreeOvals: [oval; move; oval; move; oval;] as a single unit, because it's rendered as five opaque SVG elements. Finding fifty id="ThreeOvals" and making them unique is easy, however.

That said, it can be done, and it might be the right choice. Label definition could look up the label to see if it's been defined previously, and keep some kind of counter on the next label so that generating group ids ends up with Foo.... Foo_01.... Foo_FF, but I don't know that the benefit would be compelling, and it does come with the additional logic of checking to make sure a fresh label isn't identical to a generated one which already exists: that's a lot of extra work for a benefit the user may assure for themselves.
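For a sense of the cost, here is a minimal sketch of the counter scheme. IdTable and unique_group_id are hypothetical, the table is a fixed-size array with a linear scan, and the sketch deliberately omits the extra check mentioned above: it does not notice when a fresh label happens to equal a previously generated id such as Foo_01.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LABELS 64

/* Hypothetical table of labels seen so far and how often.  Label
** strings are borrowed from the caller and must outlive the table. */
typedef struct IdTable {
  int n;
  struct { const char *zLabel; unsigned nSeen; } a[MAX_LABELS];
} IdTable;

/* Write a group id for zLabel into zOut.  The first use is verbatim;
** later uses get "_01", "_02", ... in two-digit upper-case hex.
** Returns 0 on success, -1 if the table is full. */
int unique_group_id(IdTable *p, const char *zLabel,
                    char *zOut, size_t nOut){
  int i;
  for(i=0; i<p->n; i++){
    if( strcmp(p->a[i].zLabel, zLabel)==0 ){
      snprintf(zOut, nOut, "%s_%02X", zLabel, p->a[i].nSeen++);
      return 0;
    }
  }
  if( p->n>=MAX_LABELS ) return -1;
  p->a[p->n].zLabel = zLabel;
  p->a[p->n].nSeen = 1;
  p->n++;
  snprintf(zOut, nOut, "%s", zLabel);
  return 0;
}
```

Even this stripped-down version adds a table, a scan per label and a capacity limit, which is the kind of retained state the paragraph above is weighing against the benefit.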

(8) By sam atman (mnemnion) on 2022-08-30 15:38:45 in reply to 1 [source]

In the interest of not flooding the forum with new posts on basically the same topic, I'm replying here, to better explain my goals, and gesture in the direction of what I'll be doing next.

Please don't read any urgency into the pace of my posts. I'm getting paid to do this, and have an application which needs some changes to Pikchr to make that possible. I will release all changes under MIT 0 when the time is right.

I'll admit that it almost feels rude to barrel forward with large changes to someone else's project. I go where the spirit takes me, and I have a self-imposed deadline which this project has grown to involve. This imposes no obligation on anyone but myself, but perhaps explains what could be seen as haste on my own part.

The Pikchr project has a clear mission, namely:

Pikchr generates diagrams for technical documentation written in Markdown or similar markup languages using an enduring language that is easy for humans to read and maintain using a generic text editor.

Even better, some statements about what it isn't:

Pikchr is not intended for marketing graphics. Pikchr strives to present information in a dry and mathematical style. The objective of Pikchr is to convey truth, not feeling.

Pikchr is not intended for generating charts and graphs. It could perhaps be used for this. One might propose extensions to make it more suitable for this. But that is not its current purpose.

What I'm working on is exactly the generation of diagrams for technical documentation in a language not unlike Markdown.

The changes I'm making are all responses to blockers I hit fairly quickly in working on Pikchr diagrams, in the context of what I intend to do with them next. The appearance of the output, and the experience of writing it, is a real delight after the terrible things I've done to myself in the salt mines of GraphViz. But the structure of the output was a non-starter, there was no way to take the next step.

In a way the class syntax is a distraction: it turns out to be necessary (I think) but I'm going to end up with major changes to the SVG output for identical inputs. All of which, let me stress, will not change a single pixel of the image.

The reason I have to do this to move forward with my project is simple. Pikchr generates an image in SVG format, and what I need is an SVG of the pik document. Well, what's the difference?

Pikchr has labels; the SVG doesn't have those labels, but it does have an id field which I can put labels on. Pikchr has groups, but they aren't in the output. Pikchr has built in classes (they're even called classes!), but those classes aren't in the output, something I added yesterday afternoon to the ~~Pikchr branch.

Here's a small, and non-exhaustive, list of what technical writers can do with a semantic SVG:

Accessibility, using aria classes to assist screen readers
Tooltips
URLs
State machines which change state when you interact with them
Swimlanes which can be run with Javascript to illustrate the flow directly
Collapsing and showing parts of a diagram: such as a complex object which is represented as a single box, which blows up to show all the parts of the object
Theming, including a dark mode which is based on prefers-color-scheme
Providing a set of grayscales so that a color diagram can be printed without colors of similar brightness looking the same
Providing accessibility palettes for users with one of the various sorts of colorblindness. It is often the case that the good colors for limited vision are completely different from a good default for the trichromatic.
Changing components on hover, including ones which aren't hovered over: as a toy example, consider a chess board where mousing over any piece highlights all the squares it can move to. Or a circuit diagram where you can click to change the jumpers and it shows the active circuit when you mouse over any component which is in that circuit.
Or a syntax diagram, where when you mouse over a component, it lights up all the valid subsequent rules, while coloring them as mandatory, optional, or one of several mandatory choices.

None of these things are practical with Pikchr's current output, because there's no good way to find your components again after changing the diagram. All of this depends on being able to hit the target with a selector, and it's impractical to post-process the SVG to add the semantics back, such a tool would literally need a pikchr parser and decent knowledge of the output to find the pieces which ~~Pikchr provides during rendering, when it has access to it.

Speaking of hitting the target with the selector: inlining styles works against inheritance in several ways, which brings me to the next phase of what I'm doing, which I expect will take a while.

CSS Custom Properties

The next thing I'm aiming to do is to take a simple diagram like this:

  oval; move; file; arrow;
cylinder fill blue; move; box "textual element" fit;

And remove all styling from the elements, moving them to a CSS block with variables aka custom properties.

This is a hand-modified SVG, derived from the ~~Pikchr output of the above program, which illustrates what I'm trying to accomplish.

<svg xmlns='http://www.w3.org/2000/svg' class="pikchr" viewBox="0 0 687.283 112.32">
<style type="text/css"><![CDATA[
  svg {
   --p-stroke: black;
   --p-fill-bg: transparent;
   --p-fill-fg: black;
   --p-blue : blue;
   stroke: var(--p-stroke, black);
   fill: var(--p-fill-bg, transparent);
   stroke-width:  var(--p-strokewid, 2.16);
  }
  polygon {
    fill:  var(--p-fill-fg, black);
  }
  text {
     pointer-events: none;
     --p-t-stroke: transparent;
     fill: var(--p-fill-fg, black);
     stroke: var(--p-t-stroke, transparent);
     text-anchor: var(--t-anchor, middle);
     dominant-baseline:  var(--t-baseline, central);
  }
]]></style>
<path class="oval" d="M38,92L110,92A36 36 0 0 0 146 56A36 36 0 0 0 110 20L38,20A36 36 0 0 0 2 56A36 36 0 0 0 38 92Z" fill="none"/>
<g class="file" >
  <path class="file" d="M218,110L290,110L290,23L268,2L218,2Z"/>
  <path class="file" d="M268,2L268,23L290,23"/>
</g>
<g class="arrow" >
  <path class="arrow" d="M290,56L362,56" fill="none"/>
  <polygon class="arrow" points="362,56 350,60 350,51"/>
</g>
<path class="cylinder" d="M362,30L362,81A54 10 0 0 0 470 81L470,30A54 10 0 0 0 362 30A54 10 0 0 0 470 30" style="fill:var(--p-blue);"/>
<path class="box" d="M542,71L685,71L685,41L542,41Z"/>
<text class="box" x="613" y="56">textual element</text>
</svg>

I haven't added indenting to ~~Pikchr, don't worry, it's for legibility. It's what the DOM sees. I haven't grouped elements with appended text because I'm not entirely sure I should.

What I've done here is move all the defaults to CSS, and name the exception, not the rule.

Let's compare the results with the current output of Pikchr.

<svg xmlns='http://www.w3.org/2000/svg' class="pikchr" viewBox="0 0 687.283 112.32">
<path d="M38,92L110,92A36 36 0 0 0 146 56A36 36 0 0 0 110 20L38,20A36 36 0 0 0 2 56A36 36 0 0 0 38 92Z"  style="fill:none;stroke-width:2.16;stroke:rgb(0,0,0);" />
<path d="M218,110L290,110L290,23L268,2L218,2Z"  style="fill:none;stroke-width:2.16;stroke:rgb(0,0,0);" />
<path d="M268,2L268,23L290,23"  style="fill:none;stroke-width:2.16;stroke:rgb(0,0,0);" />
<polygon points="362,56 350,60 350,51" style="fill:rgb(0,0,0)"/>
<path d="M290,56L356,56"  style="fill:none;stroke-width:2.16;stroke:rgb(0,0,0);" />
<path d="M362,30L362,81A54 10 0 0 0 470 81L470,30A54 10 0 0 0 362 30A54 10 0 0 0 470 30"  style="fill:rgb(0,0,255);stroke-width:2.16;stroke:rgb(0,0,0);" />
<path d="M542,71L685,71L685,41L542,41Z"  style="fill:none;stroke-width:2.16;stroke:rgb(0,0,0);" />
<text x="613" y="56" text-anchor="middle" fill="rgb(0,0,0)" dominant-baseline="central">textual element</text>
</svg>

What this does is take the decisions made by the user in writing the diagram and transfer them to SVG, which is an accommodating format for that.

I take it as axiomatic that Pikchr must remain single-pass, with an absolute minimum of superlinear behavior of any sort, or retained state.

I believe that's possible here, without changing anything we don't have to. My strategy would be to add a sentinel on builtins which is set when they're looked at, that's how we know we'll need a --p-blue variable later. It inverts some assumptions in the code, which will need to be handled carefully. It will add only constant factors to existing complexity, and modest ones at that, or it isn't worth doing.
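The sentinel idea might look roughly like the following. This is a sketch under assumptions, not Pikchr's real color table: Builtin, builtin_lookup and emit_custom_properties are hypothetical names, the lookup is a plain linear scan, and the emitter is assumed to run once all lookups have happened and before the style block is written.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical builtin color entry with a "seen" sentinel. */
typedef struct Builtin {
  const char *zName;   /* Name as written in the pik source */
  const char *zCss;    /* CSS value to emit for it */
  int seen;            /* Set to 1 the first time it is looked up */
} Builtin;

/* Look a name up and mark it as referenced.  NULL if not found. */
Builtin *builtin_lookup(Builtin *a, int n, const char *zName){
  int i;
  for(i=0; i<n; i++){
    if( strcmp(a[i].zName, zName)==0 ){
      a[i].seen = 1;
      return &a[i];
    }
  }
  return NULL;
}

/* Write one "  --p-NAME: VALUE;" line per referenced builtin into
** zOut.  Returns the number of bytes the text needs. */
int emit_custom_properties(const Builtin *a, int n,
                           char *zOut, size_t nOut){
  size_t used = 0;
  int i;
  if( nOut==0 ) return 0;
  zOut[0] = 0;
  for(i=0; i<n && used<nOut; i++){
    if( !a[i].seen ) continue;
    used += snprintf(zOut+used, nOut-used,
                     "  --p-%s: %s;\n", a[i].zName, a[i].zCss);
  }
  return (int)used;
}
```

The flag costs one store per lookup and one pass over the table at output time, so it adds only a constant factor.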

I bet this will shave 10% off the total file size for all the SQLite SVGs, maybe 20%. While this isn't a goal, it's illustrative of the DRY nature of the output. The example is longer, but most diagrams are not that simple, and you can see how this format catches up as things get longer.

The outcome is an SVG which is compliant with the structure CSS needs for selectors to do expected things. It also means that blue can be treated as an idea, rather than just a shade: the blue in question can be changed from CSS blue to whatever we might want. That has real accessibility consequences, among the other solid reasons for doing it.

It isn't a small change. Builtins must keep track of whether they've been seen, objects have to have a pointer to where they got certain data, rather than just the value of it. There's a judgement call in terms of how far to go, not unlike how normal a database should be. There's a fairly bright line between the structural and presentation attributes of an SVG element, which will be my guide here.
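One way to picture "a pointer to where they got certain data" is a property that carries its source alongside its value. The FillProp struct and emit_fill below are hypothetical and are not fields of the real PObj. The sketch assumes the default (no fill) lives in the CSS block, so only exceptions produce output.

```c
#include <stdio.h>

/* Hypothetical fill property: the numeric color, plus the name of
** the variable it came from when there is one. */
typedef struct FillProp {
  int rgb;               /* 0xRRGGBB, or negative for "no fill" */
  const char *zVarName;  /* e.g. "blue", or NULL for a literal */
} FillProp;

/* Write the fill declaration for one object into zOut.
**   - named source:  fill:var(--p-NAME);
**   - literal color: fill:rgb(R,G,B);
**   - no fill:       nothing, the stylesheet default applies
** Returns the number of bytes the text needs. */
int emit_fill(const FillProp *p, char *zOut, size_t nOut){
  if( p->zVarName ){
    return snprintf(zOut, nOut, "fill:var(--p-%s);", p->zVarName);
  }
  if( p->rgb<0 ){
    if( nOut>0 ) zOut[0] = 0;
    return 0;
  }
  return snprintf(zOut, nOut, "fill:rgb(%d,%d,%d);",
                  (p->rgb>>16)&0xff, (p->rgb>>8)&0xff, p->rgb&0xff);
}
```

A fill that came from the builtin blue comes out as fill:var(--p-blue); while the same color given as a literal comes out as fill:rgb(0,0,255);.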

If you woke up tomorrow, and all your SVGs looked like Exhibit A instead of Exhibit B, would you be unhappy? There's at least one good reason you might be: every modern browser supports this syntax, but IE doesn't, and while it's straightforward to downgrade an SVG in any number of ways, that is extra work. But I'm reasonably confident in saying that whatever we're using to look at things in 50 years will render it identically to what I'm using today.

I acknowledge that for SQLite, it might be a contractual requirement that IE 6 display the documentation correctly, or you might want the documentation to be compatible with a whole long-tail of browsers which won't support 2017 additions to CSS. That adds to the maintenance burden of the documentation and is a good reason to be wary. It's practical to solve this problem in a couple ways, but it would have to be solved.

Most of what I'm doing is below the tip of the iceberg. The visible change to the syntax is one way of providing classes, which are a necessary primitive for ~~Pikchr to do the things I'm aiming to do with it. Just the invisible changes would make it possible to generate or write pik files which could have the SVG trawled over to markup with classes, but it wouldn't be fun to write or maintain.

I'm keenly interested in any feedback on what I've posted. While I will end up with a version of Pikchr that does what I need it to, I would like to do so in harmony with the vision which has guided it so far. I apologize if anything I've said about Pikchr in its current state comes across as critical, as this is far from my view or intention. Thanks for reading.

(9) By Alhadis on 2022-08-30 16:17:00 in reply to 8 [link] [source]

What I've done here is move all the defaults to CSS, and name the exception, not the rule.

The SVG code you've posted contains presentational attributes (e.g., fill="none"). These have lower priority than CSS, so it's better to avoid mixing them when possible.

Also, I'm surprised that Pikchr isn't using a <rect/> primitive for oval shapes. Rounded corners can be applied using the rx and/or ry attributes:

<rect class="oval" width="152" height="80" rx="36" />

Providing a set of greyscales so that a colour diagram can be printed without colours of similar brightness looking the same

This would also be useful for devices that lack colour support (targeted using @media not (color)).

Pikchr is not intended for generating charts and graphs. […] One might propose extensions to make it more suitable for this. But that is not its current purpose.

In lieu of inventing an entirely new syntax for charts and graphs, might I suggest a grap(1)-compatible format? I understand that compatibility with GNU pic(1) isn't a priority for Pikchr, but it will eliminate friction for users who already know grap(1) (as well as vice versa).

(10) By sam atman (mnemnion) on 2022-08-31 11:31:00 in reply to 9 [link] [source]

The SVG code you've posted contains presentational attributes (e.g., fill="none"). These have lower priority than CSS, so it's better to avoid mixing them when possible.

It does, that was an accident. The intended split leaves all structural traits on the element and moves presentation attributes to variables. Any deviation from the defaults is intended to be in a style attribute with or without a var. If you scan both SVGs, you'll see I clipped most of them, but that was made by hand, I didn't even get sed involved.

I have little opinion on why rect isn't used for boxes or ovals. Given the sort of sweeping changes I've been making under the hood, "don't change a single pixel of what the browser renders for existing scripts" is a hard requirement.

(11) By Alhadis on 2022-08-31 11:38:47 in reply to 10 [link] [source]

"don't change a single pixel of what the browser renders for existing scripts" is a hard requirement.

That's wise. I only mentioned the oval/rect thing as an aside. It probably merits a separate discussion.

(12) By sam atman (mnemnion) on 2022-08-31 14:42:41 in reply to 8 [link] [source]

When preparing for a complex change to a codebase, I write out notes such as the following, which I'm sharing because it seems broadly sensible to put this in the open. No feedback is expected, though any would be gratefully received.

Colors and Variables

Here's a stripped down PObj with the fields of interest.

/* A single graphics object */
struct PObj {
  const PClass *type;      /* Object type or class */

  PXmlClass *pXmlClass;    /* Optional list of assigned XML classes */
  char *zName;             /* Name assigned to this statement */

  PNum sw;                 /* "thickness" property. (Mnemonic: "stroke width")*/
  PNum dotted;             /* "dotted" property.   <=0.0 for off */
  PNum dashed;             /* "dashed" property.   <=0.0 for off */
  PNum fill;               /* "fill" property.  Negative for off */
  PNum color;              /* "color" property */

  unsigned char nTxt;      /* Number of text values */
  unsigned mProp;          /* Masks of properties set so far */
  unsigned mCalc;          /* Values computed from other constraints */
  PToken aTxt[5];          /* Text with .eCode holding TP flags */

};

I don't understand any of the text value code well enough to comment at the moment, so let's look at the PNums in the middle.

When is a Variable Variable

I've been reading through the code, looking for minimal changes which could allow for the effect I'm pursuing. To be able to write the output in linear time, we need the variables to know if they've been seen, so that the CSS block can be complete.

All user variables have been seen, as far as I'm concerned: something defined and not referenced may as well be in the SVG, since it's in the pik. If that's an error, it isn't a rendering error and shouldn't be detected then. I haven't figured out how the non-color builtins work in enough detail to be sure they're looked up before rendering, so that we can follow the same strategy; it might be that some are taken for granted, and we would then do likewise.

If we reduce redundancy at the cost of the occasional extraneous variable, and it's more complex to do otherwise, that's probably worth it. I'm excluding cavalier approaches like putting all the colors in every SVG: one to maybe four variables which might not get used is the limit, and zero is ideal.

Either way, all builtins have a flag now which says they've been seen. I just said user variables are seen already, but there's a use for a seen flag there as well: some are set several times.
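As a rough sketch of what a seen flag on builtins buys us: the CSS block can be written in one pass over the table, skipping anything never looked up. The Builtin struct, builtin_lookup, emit_seen_vars, and the "--name:value;" output format are all hypothetical names and choices for illustration, not Pikchr's actual table or output.

```c
#include <stdio.h>
#include <string.h>

typedef double PNum;

/* Hypothetical builtin entry: an assumption for this sketch, not
** Pikchr's real builtin table. */
typedef struct Builtin {
  const char *zName;   /* Name of the builtin variable */
  PNum val;            /* Default value */
  int seen;            /* Set to 1 the first time the value is looked up */
} Builtin;

/* Look up a builtin by name and mark it as seen. Returns NULL on a miss. */
static Builtin *builtin_lookup(Builtin *a, int nBuiltin, const char *zName){
  int i;
  for(i=0; i<nBuiltin; i++){
    if( strcmp(a[i].zName, zName)==0 ){
      a[i].seen = 1;
      return &a[i];
    }
  }
  return NULL;
}

/* Write one CSS custom property per seen builtin into zOut.
** Unseen builtins are skipped. Returns the number of properties written. */
static int emit_seen_vars(const Builtin *a, int nBuiltin,
                          char *zOut, size_t nOut){
  int i, nWritten = 0;
  size_t used = 0;
  if( nOut>0 ) zOut[0] = 0;
  for(i=0; i<nBuiltin; i++){
    int n;
    if( !a[i].seen ) continue;
    if( used>=nOut ) break;
    n = snprintf(zOut+used, nOut-used, "--%s:%g;", a[i].zName, a[i].val);
    if( n<0 || (size_t)n>=nOut-used ){
      zOut[used] = 0;          /* Out of space: drop the partial entry */
      break;
    }
    used += (size_t)n;
    nWritten++;
  }
  return nWritten;
}
```

The emit step is a single walk over the table, so it stays linear in the number of builtins regardless of how many times each was referenced.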

Mutants

Pikchr mutates variables, and it doesn't have to, but I don't intend to change that. Making them immutable would change the big O of Pikchr, because macros can generate many more redefinitions than they can distinct tokens (I think?), and keeping the linked list no longer than the number of names seems wise.

Furthermore, all we could do with an immutable plist is generate distinct CSS variables for each of them, and this is of dubious utility: we give the user more than they asked for. It's not so bad on a big O level, because we have to find whatever we're making a new one of anyway, so that the new value can be calculated if it's in, e.g., a += context; while we're there we could generate a unique label. An idea worth considering, and probably not worth doing.

If they've changed, we have to use the value we saw at the time, not a later mutation, to keep existing semantics.

So let's consider that a seen flag on a variable means it's been both defined and mutated, and that we probably need to know which at some point.
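One way to know which is which, sketched under assumptions: treat seen as a small bit set. The PVAR_DEFINED and PVAR_MUTATED flags and the pvar_set helper are hypothetical names, and the real assignment code in Pikchr is more involved than this. Mutation happens in place, so the list never grows past one node per name.

```c
#include <stdlib.h>
#include <string.h>

typedef double PNum;

/* Hypothetical bit flags for the "seen" field (assumed for this sketch). */
#define PVAR_DEFINED  0x01   /* Variable has been assigned at least once */
#define PVAR_MUTATED  0x02   /* Variable was reassigned after definition */

typedef struct PVar PVar;
struct PVar {
  const char *zName;       /* Name of the variable */
  PNum val;                /* Current value of the variable */
  int seen;                /* Combination of PVAR_ flags */
  PVar *pNext;             /* Next variable in a list of them all */
};

/* Assign val to the variable named zName in the list at *ppList.
** An existing variable is mutated in place and flagged PVAR_MUTATED;
** otherwise a new node is pushed on the head of the list.
** Returns the variable, or NULL if allocation fails. */
static PVar *pvar_set(PVar **ppList, const char *zName, PNum val){
  PVar *p;
  for(p=*ppList; p; p=p->pNext){
    if( strcmp(p->zName, zName)==0 ){
      p->val = val;
      p->seen |= PVAR_MUTATED;
      return p;
    }
  }
  p = malloc(sizeof(*p));
  if( p==NULL ) return NULL;
  p->zName = zName;          /* Caller keeps the name alive */
  p->val = val;
  p->seen = PVAR_DEFINED;
  p->pNext = *ppList;
  *ppList = p;
  return p;
}
```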

Returning PVar, not PNum

Here's PVar, with a seen flag.

struct PVar {
  const char *zName;       /* Name of the variable */
  PNum val;                /* Value of the variable */
  int seen;                /* Whether the variable has been seen aka mutated */
  PVar *pNext;             /* Next variable in a list of them all */
};

The other side of marking variables, builtin or otherwise, is returning the whole variable, not the value of it. That's an extensive change, but a consistent one. The plan is to then copy the value as soon as it's seen, so that we can compare them and choose how to render the element. The value gets the original name, the variable gets a new one, so there should be no cascading consequences on the calculation.

I think this also involves making builtins proper instances of PVar with a null pNext field. There's a part of me that feels they should point to the next element in the array, despite the lack of any compelling reason for that.

There are properties on the PObj where the variable per se isn't interesting. This suggests two approaches: two functions for retrieval, one of which merely unwraps the PVar to its PNum, or changing all the call sites. The first one is much easier and can be refactored into the latter.

Move all of pik_value into this subroutine:

static PVar *pik_variable(Pik *p, const char *z, int n, int *pMiss);

With a separate pik_get_val for PNum retrieval. This also gets us some flexibility in deciding which variables become variables and which we sink, which I hope we won't need but which we might. We have the necessary information to see if a value has been looked at in a variable-preserving context, and make a note of that fact.
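A minimal sketch of that split, with heavy simplification: Pik here is a stand-in holding only the variable list, builtins and color names are ignored, and the signature of pik_get_val is an assumption (only pik_variable's is given above). The pik_find_var helper is hypothetical. In this sketch seen is used to note that a variable was fetched in a variable-preserving context.

```c
#include <stddef.h>
#include <string.h>

typedef double PNum;
typedef struct PVar PVar;
struct PVar { const char *zName; PNum val; int seen; PVar *pNext; };

/* Stand-in for the real parser context: only the variable list. */
typedef struct Pik { PVar *pVar; } Pik;

/* Hypothetical shared lookup: find the variable whose name is the
** n bytes at z. No side effects. */
static PVar *pik_find_var(Pik *p, const char *z, int n){
  PVar *pVar;
  for(pVar=p->pVar; pVar; pVar=pVar->pNext){
    if( strncmp(pVar->zName, z, (size_t)n)==0 && pVar->zName[n]==0 ){
      return pVar;
    }
  }
  return NULL;
}

/* Return the whole variable and note that it was looked at in a
** variable-preserving context. Sets *pMiss to 1 on a miss. */
static PVar *pik_variable(Pik *p, const char *z, int n, int *pMiss){
  PVar *pVar = pik_find_var(p, z, n);
  if( pVar ){
    pVar->seen = 1;
  }else if( pMiss ){
    *pMiss = 1;
  }
  return pVar;
}

/* Unwrap to a PNum for call sites that only want the number.
** Does not touch the seen flag. Returns 0.0 on a miss. */
static PNum pik_get_val(Pik *p, const char *z, int n, int *pMiss){
  PVar *pVar = pik_find_var(p, z, n);
  if( pVar==NULL ){
    if( pMiss ) *pMiss = 1;
    return 0.0;
  }
  return pVar->val;
}
```

Keeping the side effect in pik_variable alone is what gives the flexibility to decide later which lookups should produce a CSS variable and which should sink to a plain number.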

If we got a value from a given variable, and that variable still has the same value, we'll get correct results by referring to it, since the CSS generator doesn't have access to intermediate values.

If we've kept track of which variables have been modified, then we have the option of doing the simplest thing, which is to treat it like any other variable: the result is that anything which references the variable after the last redefinition gets a variable, and everything else gets inlined.
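The simple strategy can be sketched as a comparison at render time: copy the value when it's used, keep a pointer to the variable it came from, and emit a reference only if the two still agree. ColorRef, color_capture, and color_render are hypothetical names, and the var(--name) naming is an assumption; the rgb(...) fallback mirrors the form already present in Pikchr's output, treating the color as a 24-bit RGB number.

```c
#include <stdio.h>

typedef double PNum;
typedef struct PVar PVar;
struct PVar { const char *zName; PNum val; int seen; PVar *pNext; };

/* Hypothetical per-property record: the value copied at the time of
** use, plus the variable it came from (NULL for a literal). */
typedef struct ColorRef {
  PNum val;            /* Value as it was when the object was parsed */
  const PVar *pVar;    /* Source variable, or NULL */
} ColorRef;

/* Snapshot the variable's current value and remember where it came from. */
static ColorRef color_capture(const PVar *pVar){
  ColorRef r;
  r.val = pVar->val;
  r.pVar = pVar;
  return r;
}

/* Render the color into zOut. If the source variable still holds the
** captured value, refer to it by name; otherwise inline the captured
** value so a later mutation cannot change what this object renders as.
** Returns the snprintf() result. */
static int color_render(const ColorRef *pRef, char *zOut, size_t nOut){
  if( pRef->pVar && pRef->pVar->val==pRef->val ){
    return snprintf(zOut, nOut, "var(--%s)", pRef->pVar->zName);
  }else{
    int c = (int)pRef->val;
    return snprintf(zOut, nOut, "rgb(%d,%d,%d)",
                    (c>>16)&0xff, (c>>8)&0xff, c&0xff);
  }
}
```

A variable that is changed and later set back to its earlier value would also get a reference under this test, which is still correct, since the CSS variable ends up holding that value.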

It would call for extra complexity to inline every variable which has been both defined and mutated, but the output would be more consistent. It's a judgement call, but one where the more consistent approach could be added if the simpler one is inadequate.