Funky warning, array subscript has type ‘char’, explained and ignorable
(1.1) By Larry Brasfield (larrybr) on 2021-08-29 01:32:12 edited from 1.0 [link] [source]
A couple of casts cure that strange, "array subscript has type ‘char’" warning from a pikchr compilation that Mr. Rouillard worried over. See https://fossil-scm.org/forum/forumpost/493bf000cf98fc5b?t=h
The isspace() implementation used by the OP is clearly broken. Per the C standard, the argument type is int which should have been promoted to from char. And obviously, character classification should accept char input. For these reasons, I do not see curing the compiler warning as warranted.
Edit to 1.1: This is the same issue raised 10 months ago in this forum. Unresolved then and could be resolved now as "Will not fix", IMHO.
(2.1) By Scott Robison (casaderobison) on 2021-08-29 02:13:50 edited from 2.0 in reply to 1.1 [source]
Looking at the definition of isspace at cppreference.com (which documents the C version as well): https://en.cppreference.com/w/c/string/byte/isspace
It reads "The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF."
So I believe the arguments to isspace must be cast from "signed char" to "unsigned char" to have defined behavior. A negative signed character value might work on a given toolchain, but it might not.
Note that casting it to an int will not suffice, because a negative signed char will match a negative signed int on any platform I'm familiar with. The linked page says that the only valid input values for isspace are either valid unsigned char values or EOF. As isspace and its cousins are often implemented as macros, and this is C, it's not like automatic type coercion will get it to the right type.
Alternatively: It seems that isspace might be locale dependent. I'm not aware of any implementation that differs in whitespace classification, but if pikchr is or should be ASCII dependent, perhaps isspace (or other is functions/macros) should be replaced by something that is guaranteed to work for ASCII and not care about locale specificity.
(3) By Larry Brasfield (larrybr) on 2021-08-29 14:14:16 in reply to 2.1 [link] [source]
... casting it to an int will not suffice ...
I agree with that for the reasons you state, which is why the post I linked mentions an intermediate cast to unsigned char.
In retrospect, because the character classification functions take an int argument which could be outside of the 0 .. UCHAR_MAX range if a char with a set sign bit is passed on a platform where char is a signed value, I think the cast I recommended ( (int)(unsigned char) ) is appropriate in those isspace() calls.
There is little said in the Pikchr docs about expected character set. At https://pikchr.org/home/doc/trunk/doc/userman.md , under "Automatic Sizing To Fit Text Annotations", there is a clear implication that "non-ASCII" characters are anticipated. I think this means that UTF-8 is expected, which means set char sign bits should be accepted without exciting undefined behavior.
(4.1) By Scott Robison (casaderobison) on 2021-08-29 17:43:16 edited from 4.0 in reply to 3 [link] [source]
All reasonable. I'd be inclined to omit the final (int) cast just because it is automatic, but that's as much because I'm lazy as anything. :)
More than anything, I know we all have strong feelings about over aggressive warnings on many platforms that provide far more noise than signal. This is one of the relatively rare useful warnings I've seen reported as a defect in any of the drh led forums.
Edit: Note that I completely agreed with the original assessment of "will not fix" and I think it is because I'm so used to over aggressive warnings being reported. It wasn't until follow up posts that I read it more carefully and thought a bit more about it. The noise of pointless warnings almost led me to ignore a "legitimate" report because of historical bias.