The basis for an xtalk engine [I/we] control

tperry2x · Post by **tperry2x** » Tue Apr 01, 2025 5:40 pm

I'm not disagreeing with any of your points above, but you do know the regex version is already capable of this, don't you?

[Attachment: regex-matching.png (35.76 KiB)]

You can already be a bit fuzzy/esoteric in your script too... just not as much as in HyperCard.

[Attachment: a-bit-fuzzy.png (10.31 KiB)]

OpenXTalkPaul · Post by **OpenXTalkPaul** » Tue Apr 01, 2025 5:48 pm

Do I go with a Tokenizer approach, or do I stick with a Regex approach?

A tokenized version still needs to employ regex for splitting a line into tokens, right? Then it needs to process each token and resolve the values in order. So in the Hello, world example, after the command 'put' it expects an expression that evaluates to a string. It would first try to find that as 'all text between two quotes' and use it as a string-literal value. If there are no quotes in that spot, it would also need to check whether it is something that evaluates to a string: a property name (expression begins with 'the'), a chunk expression (expression begins with word, char, line, paragraph, etc.), or a container (begins with field/fld, btn/button, etc.). So it needs to check multiple things to see whether the expression evaluates to a string.

In the case of the missing space in the "Hello, world" example, it simply did not tokenise correctly.
'Put' (command) should be the first token and 'an expression that evaluates to a string' ("Hello, world") should be the second token.

I don't really think of the tokenised version and the regex version as separate. You are tokenizing using regex; it's just not centralized or nested, so I think some code is going to be essentially repeated. For example, you have 'ask' and, separately, 'ask password' or other forms of the ask command. It could be one 'ask' evaluation that uses case/switch to look at the second token for another 'form' ('password') and branches out there, or checks whether it's an expression that fits the next value it might be looking for (an expression that evaluates to a string).
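As a rough illustration of that "one 'ask' evaluation that branches on the second token" idea, here is a minimal JavaScript sketch. The token shape (`{type, value}`), the function name `parseAsk`, and the returned object are all assumptions made up for this example, not anything taken from the actual interpreter.

```javascript
// Hypothetical sketch: a single 'ask' handler that inspects the second token.
// Assumes tokens look like {type: 'word' | 'string', value: '...'}.
function parseAsk(tokens) {
  if (tokens.length === 0 || tokens[0].value.toLowerCase() !== 'ask') {
    throw new Error('expected "ask"');
  }
  let index = 1;
  let form = 'plain';
  const second = tokens[index];
  // Branch on the second token: is it a known sub-form keyword?
  if (second && second.type === 'word') {
    switch (second.value.toLowerCase()) {
      case 'password':
        form = 'password';
        index++;
        break;
      default:
        break; // not a form keyword, so it starts the prompt expression
    }
  }
  const prompt = tokens[index];
  if (!prompt) {
    throw new Error('ask: expected an expression that evaluates to a string');
  }
  return { command: 'ask', form, prompt };
}
```

Adding another form would then mean adding one more `case` rather than a separate top-level pattern.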

tperry2x · Post by **tperry2x** » Tue Apr 01, 2025 6:04 pm

Yes, the tokenized version is still splitting things up using regex pattern matching.
The thing with using a tokenized version is you kind of have to guess at how the interpreter might be... well... interpreting what you've just written.

However, with a regex match used on a portion of a string (rather than lots of little tokens), it's a lot easier to relay this to the corresponding function.

The disadvantage of using just regex is that you can't be as 'wooly' with your scripting, like that example of HyperTalk you showed me before.

For example, I can't use:

Code:

put round(char 1 of the time /0.25) into toutput
put toutput

but it DOES understand what I'm trying to do at least, and comes back with:
Error: round: argument must be a number
So, this works:

Code:

put char 1 of the time into tNum
put round(tNum /0.25) into toutput
put toutput

Because it's less 'wooly' - I specified that tNum contains a number by putting a number into it on line 1 first.
Yeah, I know - not strictly necessary you'd think - and that's the drawback of purely using regex.
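For what it's worth, xTalk values are usually treated as typeless strings that count as numbers whenever they look like numbers. The thread does not say what actually triggers the error in the regex version, so the following is only a guess at one ingredient: a coercion step applied to an argument after it has been evaluated. The names `toNumber` and `xRound` are hypothetical.

```javascript
// Hypothetical sketch: accept numbers and numeric-looking strings alike.
function toNumber(value, who) {
  if (typeof value === 'number' && Number.isFinite(value)) return value;
  const text = String(value).trim();
  if (text !== '' && Number.isFinite(Number(text))) return Number(text);
  // Same wording as the error quoted in the post.
  throw new Error(who + ': argument must be a number');
}

// A round() that coerces its (already evaluated) argument first.
function xRound(value) {
  return Math.round(toNumber(value, 'round'));
}
```

With something like this, a value such as "1" pulled out of a chunk expression could be divided and rounded without first being stored in a variable, provided the inner expression is evaluated before round() sees it.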

Plus the tokenizer has no idea how to process it because it's getting the equivalent of a 'bag-o'-words' and can't make sense which order they go into at the moment, so doesn't know how to match that up to the relevant function. It's then failing with the "t is null" error, because it doesn't know what it should have run in the first place.

That's what I'm currently struggling with: you'd still end up with regex matching being parsed from the tokenizer, so I don't know if it makes much difference? (this is my first go at ever building this kind of thing from the ground up, after all)

Both approaches have their advantages and disadvantages as far as I can see.

OpenXTalkPaul · Post by **OpenXTalkPaul** » Tue Apr 01, 2025 6:11 pm

I do know the non-centralized tokenizer version already works for many things. It is really great work, and it is certainly a usably fast interpreter already. But I have had some parsing errors with things like mixing in chunk or container expressions. I'm not sure which version I was trying. I could not set the left of btn X to the left of button Y (it might have been 'the loc' that I was trying). I know it's all a work in progress, so I tend to chalk problems up to that.

[Attachment: Screen Shot 2025-04-01 at 2.09.25 PM.png (29.72 KiB)]

OpenXTalkPaul · Post by **OpenXTalkPaul** » Tue Apr 01, 2025 6:18 pm

tperry2x wrote: "...and can't make sense which order they go into at the moment"

The 'bag of words', isn't it a list/array? You should be able to iterate over the tokens in order by index number.
So with put "Hello, world" the first element, bagOfTokens[1], would be 'put' and the second element, bagOfTokens[2], would be 'Hello, world'.

tperry2x · Post by **tperry2x** » Tue Apr 01, 2025 6:31 pm

But this is where something like that Winkler test comes in. HyperCard / HyperTalk was perhaps too forgiving in this regard.
I mean, you can already be pretty random:

[Attachment: pretty-random.png (29.74 KiB)]

Where you really confuse a tokenizer is with something like this:

Code:

put word 3 to 4 of "word of word with word and word"

It splits up all those "word of word with word and word" into separate tokens and gets properly confused, whereas the regex version returns:

Code:

word with

Which is correct.
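To make the expected result concrete, here is a small JavaScript sketch of what evaluating `word 3 to 4 of <string>` amounts to once the quoted text has been kept whole. The function name `wordRange` is made up for illustration, and joining with single spaces is a simplification (a real engine would probably preserve the original spacing between words).

```javascript
// Hypothetical sketch: return words `from`..`to` (1-based, inclusive) of a string.
function wordRange(text, from, to = from) {
  // Split on runs of whitespace; filter(Boolean) drops the empty entry
  // produced when the text is empty.
  const words = text.trim().split(/\s+/).filter(Boolean);
  return words.slice(from - 1, to).join(' ');
}

// wordRange('word of word with word and word', 3, 4) gives 'word with'
```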

The tokenizer JUST needs to see the thing between quotes as a single string, but then I'd be using regex matching to find the quotes... so it's kind of parsing each thing twice, as it's effectively looping back over itself, which would make it slower.

But as I say, the suggestion to tokenize everything does make sense, and I'm sure it's the better approach in the long run. This is why I'd really like to get it working properly, rather than using regex pattern matching. (Even if regex seems easier and faster at the moment). This is all new to me, so kind of finding my way as I go here.

dandandandan · Post by **dandandandan** » Tue Apr 01, 2025 6:57 pm

You don’t tokenize begin quote, each individual token, end quote. The whole quoted string is one token.
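A single regex can do this in one pass, so the quoted string never gets split and nothing is scanned twice. The following JavaScript sketch is only an assumption about how it could look; the token types and the `tokenize` name are invented here. It uses the sticky (`y`) flag so each match has to start exactly where the previous one ended.

```javascript
// Hypothetical sketch: one-pass tokenizer where a quoted string is ONE token.
function tokenize(line) {
  // Alternatives, tried in order: "string" | number | identifier | any other char.
  const re = /\s*(?:"([^"]*)"|(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|(\S))/y;
  const tokens = [];
  let m;
  while (re.lastIndex < line.length && (m = re.exec(line)) !== null) {
    if (m[1] !== undefined) tokens.push({ type: 'string', value: m[1] });
    else if (m[2] !== undefined) tokens.push({ type: 'number', value: Number(m[2]) });
    else if (m[3] !== undefined) tokens.push({ type: 'word', value: m[3] });
    else tokens.push({ type: 'symbol', value: m[4] });
  }
  return tokens;
}
```

Because the string alternative is tried first, a line such as `put"Hello, world"` (no space after put) still comes out as two tokens. An unterminated quote falls through to the `symbol` case in this sketch; a real tokenizer would probably want to report that as an error instead.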

OpenXTalkPaul · Post by **OpenXTalkPaul** » Tue Apr 01, 2025 9:29 pm

tperry2x wrote: ↑Tue Apr 01, 2025 6:31 pm Where you really confuse a tokenizer is with something like this:
Code:
put word 3 to 4 of "word of word with word and word"
It splits up all those "word of word with word and word" into separate tokens and gets properly confused, whereas the regex version returns:
Code:
word with
Which is correct.

The tokenizer JUST needs to see the thing between quotes as a single string, but then I'd be using regex matching to find the quotes... so it's kind of parsing each thing twice, as it's effectively looping back over itself, which would make it slower.

But as I say, the suggestion to tokenize everything does make sense, and I'm sure it's the better approach in the long run. This is why I'd really like to get it working properly, rather than using regex pattern matching. (Even if regex seems easier and faster at the moment). This is all new to me, so kind of finding my way as I go here.

Yeah, what Dan said (and I alluded to earlier): anything between two double-quote marks should be a single token, and the tokenizer should not try to parse anything in between them (at least not until we get to syntax like 'do <script>'). Once a scanner/tokenizer hits a double-quote character it should go into a repeat loop and continue scanning forward, chomping off chars/bytes (bytes because Unicode) and appending them to the current token until it runs into another double-quote mark, which terminates the string literal. I think it should first check whether the expected expression is a quoted string literal, and if it is, chomp off the whole string and move on. This way the scanner/tokenizer would only need to check whether the parameter passed is some type of container/variable/chunk if it's not a string literal. So it would need something like case/switch, and the first case would be a check for a string literal (begins with a double-quote mark).

The tokens of:
put word 3 to 4 of "word of word with word and word"
should be:
[1] put -- xTalk lines usually start with a command/action verb
[2] word -- keyword that begins a chunk expression
[3] 3 -- expression that evaluates to a number
[4] to -- keyword indicating the current chunk is a range rather than a single word
[5] 4 -- expression that evaluates to a number (OXT can use negative numbers to scan a string in reverse, from right to left)
[6] of -- keyword indicating that an expression evaluating to a container follows
[7] word of word with word and word -- the container in this case is a string literal
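The scanning loop described in this post (hit a double quote, then keep chomping characters until the closing quote) could look roughly like the JavaScript sketch below. It is a hand-rolled illustration, not the project's code; the `scan` name and token shape are assumptions. JavaScript strings are indexed by UTF-16 code unit rather than by byte, which is fine here because the double-quote character is a single unit.

```javascript
// Hypothetical sketch: character-by-character scanner for one script line.
function scan(line) {
  const tokens = [];
  let i = 0;
  while (i < line.length) {
    const ch = line[i];
    if (ch === ' ' || ch === '\t') { i++; continue; } // skip whitespace
    if (ch === '"') {
      // First case: string literal. Chomp until the closing quote.
      let value = '';
      i++; // skip the opening quote
      while (i < line.length && line[i] !== '"') {
        value += line[i];
        i++;
      }
      if (i >= line.length) throw new Error('unterminated string literal');
      i++; // skip the closing quote
      tokens.push({ type: 'string', value });
      continue;
    }
    // Otherwise: read up to the next whitespace or quote.
    let value = '';
    while (i < line.length && line[i] !== ' ' && line[i] !== '\t' && line[i] !== '"') {
      value += line[i];
      i++;
    }
    const type = /^-?\d+(\.\d+)?$/.test(value) ? 'number' : 'word';
    tokens.push({ type, value });
  }
  return tokens;
}
```

Run on the example line, this yields the seven tokens listed in the post, in order, with the whole quoted text as the last one.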

OpenXTalkPaul · Post by **OpenXTalkPaul** » Tue Apr 01, 2025 10:00 pm

If speed is a concern, I think that regardless of the parsing/tokenizing system used there could be optimizations specific to xTalk.
For example I think it's accurate to say that every line in xTalk starts with one of:
1) an action verb (put, set, answer, speak, play, go, etc.)
2) or is part of a control structure (if, then, else, repeat, case/switch, etc.)
3) or is a handler definition/handler termination (on, function, end)

So I was thinking a good strategy might be to check the first word for an action verb against a list ordered from most commonly used keywords to least commonly used. So 'put' would be at the top of that list. Once you know what command a line is executing, you have a sort of formula for what parameters that command requires.
For example, 'put' would always be followed by a parameter that is 'an expression that evaluates to a string' (even if that string contains numeric characters), followed by an optional 'into' and then 'a container that can store a string' (variable, property, fld, etc.). So the formula there is something like:
<command> <string or container> (optionally) 'into' <container>
'Put' is a little odd because there's the automatic 'default' container of the msg box (or 'std out' / console log in some cases) if no 'into' container is specified.
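The 'formula' for put could be sketched in JavaScript along these lines. Everything here is an assumption for illustration: the `parsePut` name, the `{type, value}` token shape, and the `{kind: 'messageBox'}` marker for the default container. It also ignores other forms of put (such as before/after) that xTalk dialects generally have.

```javascript
// Hypothetical sketch of: put <expression> [into <container>]
function parsePut(tokens) {
  const isWord = (t, w) => t.type === 'word' && t.value.toLowerCase() === w;
  if (tokens.length === 0 || !isWord(tokens[0], 'put')) {
    throw new Error('expected "put"');
  }
  // Only a bare word 'into' counts; 'into' inside a string token is ignored.
  const intoAt = tokens.findIndex((t, i) => i > 0 && isWord(t, 'into'));
  const end = intoAt === -1 ? tokens.length : intoAt;
  const source = tokens.slice(1, end);
  if (source.length === 0) throw new Error('put: expected an expression');
  if (intoAt === -1) {
    // No 'into': fall back to the default container (the message box).
    return { command: 'put', source, target: { kind: 'messageBox' } };
  }
  const container = tokens.slice(intoAt + 1);
  if (container.length === 0) {
    throw new Error('put: expected a container after "into"');
  }
  return { command: 'put', source, target: { kind: 'container', tokens: container } };
}
```

The source and container token lists would still need to be evaluated afterwards; this only splits the line according to the formula.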

Dan did mention in that doc that we could use his BNFtoLPEG JS tool, which is used for HC Sim.
The docs on Dan's site:
https://hypervariety.com/BNFToLPEG/
Source on Github:
https://github.com/hyperhello/BNFToLPEG
It's probably a good idea to work with that and improve on it if possible, but it already seems to work very well for HyperTalk.
I really like what Dan has done with the 'SimSCript', which makes it super easy to expand on the interpreter's vocabulary.

https://www.jaedworks.com/hypercard/scr ... k-bnf.html
https://en.wikipedia.org/wiki/Backus–Naur_form
https://en.wikipedia.org/wiki/Parsing_e ... on_grammar
https://peggyjs.org
https://github.com/peggyjs/peggy
https://github.com/pegjs/pegjs/tree/master
https://coderwall.com/p/316gba/beginnin ... ith-peg-js
https://pest.rs/book/examples/csv.html
https://medium.com/@gvanrossum_83706/ad ... e00fa1092f
https://www.youtube.com/watch?v=XR36rbD6tRM
https://www.inf.puc-rio.br/~roberto/docs/peg.pdf
https://berthub.eu/articles/posts/pract ... g-parsing/
https://stackoverflow.com/questions/334 ... use-peg-js
https://stackoverflow.com/questions/524 ... arser?rq=4

My eyes start to glaze over when I look at those sorts of grammar expression definitions and try to make sense of them, but you can see in the Wikipedia PEG examples that there is sort-of-regex-like pattern matching mixed in there.

Expr ← Sum
Sum ← Product (('+' / '-') Product)*
Product ← Power (('*' / '/') Power)*
Power ← Value ('^' Power)?
Value ← [0-9]+ / '(' Expr ')'
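If the notation is hard to read, it can help to see that each rule maps almost one-to-one onto a small function. Here is a hand-written recursive-descent evaluator in JavaScript that mirrors the five rules above. It is only a sketch for intuition, it is not generated by Peggy, LPEG, or BNFToLPEG, and like the grammar itself it does not allow spaces in the input.

```javascript
// Sketch: one function per grammar rule (Expr, Sum, Product, Power, Value).
function evaluate(src) {
  let pos = 0;
  const peek = () => src[pos];

  // Expr <- Sum
  function expr() { return sum(); }

  // Sum <- Product (('+' / '-') Product)*
  function sum() {
    let v = product();
    while (peek() === '+' || peek() === '-') {
      const op = src[pos++];
      const right = product();
      v = op === '+' ? v + right : v - right;
    }
    return v;
  }

  // Product <- Power (('*' / '/') Power)*
  function product() {
    let v = power();
    while (peek() === '*' || peek() === '/') {
      const op = src[pos++];
      const right = power();
      v = op === '*' ? v * right : v / right;
    }
    return v;
  }

  // Power <- Value ('^' Power)?   (right-associative via recursion)
  function power() {
    const base = value();
    if (peek() === '^') {
      pos++;
      return Math.pow(base, power());
    }
    return base;
  }

  // Value <- [0-9]+ / '(' Expr ')'
  function value() {
    if (peek() === '(') {
      pos++;
      const v = expr();
      if (peek() !== ')') throw new Error('expected ")" at ' + pos);
      pos++;
      return v;
    }
    const start = pos;
    while (pos < src.length && src[pos] >= '0' && src[pos] <= '9') pos++;
    if (pos === start) throw new Error('expected a number at ' + pos);
    return Number(src.slice(start, pos));
  }

  const result = expr();
  if (pos !== src.length) throw new Error('unexpected input at ' + pos);
  return result;
}
```

The `*` repetition becomes a `while` loop, the `?` becomes an `if`, and the `/` ordered choice becomes an if/else, which is where the regex-like feel comes from.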

I just noticed the L in BNFtoLPEG stands for Lua, that's interesting.

Oh, and another point I was thinking about with the parser/tokenizer: eventually we'd want it to maintain a record of the scanner/parser 'coordinates' (line/character) so we can find and highlight an offending script line when debugging our scripts.
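One low-effort way to get those 'coordinates' is for the scanner to remember only a character offset for each token and convert it to line/character when an error needs reporting. A minimal JavaScript sketch, with `positionAt` and `scriptError` as made-up helper names:

```javascript
// Hypothetical sketch: convert a 0-based offset into 1-based line/column.
function positionAt(source, offset) {
  let line = 1;
  let column = 1;
  for (let i = 0; i < offset && i < source.length; i++) {
    if (source[i] === '\n') {
      line++;
      column = 1;
    } else {
      column++;
    }
  }
  return { line, column };
}

// Build an Error that carries the position, so an editor could highlight it.
function scriptError(source, offset, message) {
  const { line, column } = positionAt(source, offset);
  const err = new Error(message + ' (line ' + line + ', char ' + column + ')');
  err.line = line;
  err.column = column;
  return err;
}
```

This assumes lines are separated by '\n'; scripts with other line endings would need normalising first.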

You may have already seen these, but here's some links pertaining to Parsing Expression Grammars:
https://nathanpointer.com/blog/introToPeg
https://itnext.io/create-a-custom-parse ... e697313926
https://www.youtube.com/watch?v=EubNzfhZS_E
https://medium.com/@gvanrossum_83706/bu ... 869b5958fb
Here are some things that may be alternatives:
https://ohmjs.org https://ohmjs.org/pubs/live2016/
And an xTalk-related Node.js project built on that, which looks similar to what's brewing here:
https://github.com/dkrasner/Simpletalk
https://simpletalk.systems
Antlr4 HyperTalk Grammar:
https://github.com/antlr/grammars-v4/bl ... perTalk.g4
https://news.ycombinator.com/item?id=2331234
PEG for C++: https://berthub.eu/articles/posts/pract ... g-parsing/

dandandandan · Post by **dandandandan** » Wed Apr 02, 2025 2:15 am

Don’t bother thinking about the parsing speed. If you’re optimizing for performance at all, the parsing is done once and the execution is streamlined in more detailed ways to be faster inside loops.
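The "parse once, execute many times" idea can be sketched in JavaScript as a cache keyed by the source line. The `parseLine` function here is just a stand-in (a whitespace split that counts its calls); a real parser would return whatever structure the interpreter executes. All names are hypothetical.

```javascript
// Hypothetical sketch: cache parsed lines so loop bodies are parsed only once.
const parseCache = new Map();
let parseCount = 0; // only here to show how often parsing actually happens

// Stand-in parser; a real one would build tokens or a syntax tree.
function parseLine(line) {
  parseCount++;
  return line.trim().split(/\s+/);
}

function getParsed(line) {
  let parsed = parseCache.get(line);
  if (parsed === undefined) {
    parsed = parseLine(line);
    parseCache.set(line, parsed);
  }
  return parsed;
}

// Run a loop body `times` times; parsing happens before the loop starts.
function runLoop(bodyLines, times, execute) {
  const compiled = bodyLines.map(getParsed);
  for (let n = 0; n < times; n++) {
    for (const parsed of compiled) execute(parsed);
  }
}
```

With this shape, the cost of parsing is paid once per distinct line, so any remaining optimisation effort goes into `execute`.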

OpenXTalkPaul · Post by **OpenXTalkPaul** » Wed Apr 02, 2025 4:37 am

I think it should first check that the expected expression is a quoted string literal, and if it is then chomp off the whole string and move on, this way the scanner/tokenizer would need only go into checking to see if the parameter passed is some type of container/variable/chunk only if it's not a string-literal. So it would need something like case/switch, and the first case would be a check for string literal (begins with double-quote mark).

Quoting myself here. This wouldn't help anything anyway, because parsing would still need to do more checks for something like a compound expression, since there can be concatenation like so:

Code:

put "H" & "ello" & comma && "world!"

With the OXT or LCS interpreter you can also use a comma a bit like & and &&,
for example:

Code:

put the short name of button 1, the first item of the backColor of button 1 -- would put something like "My Button Name,255"
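As a sketch of how such a compound expression might be evaluated once it has been tokenized, the JavaScript below walks an alternating list of operands and operators from left to right. The token shape, the `concatenate` name, and the small constants table are assumptions for illustration; property lookups like "the short name of button 1" are left out entirely.

```javascript
// Hypothetical sketch: evaluate operand (operator operand)* left to right.
const CONSTANTS = new Map([['comma', ','], ['space', ' '], ['quote', '"']]);

function resolve(operand) {
  if (operand.type === 'string') return operand.value;
  const name = String(operand.value).toLowerCase();
  if (CONSTANTS.has(name)) return CONSTANTS.get(name);
  throw new Error('cannot resolve "' + operand.value + '"');
}

function concatenate(parts) {
  if (parts.length % 2 === 0) throw new Error('expected an operand');
  let result = resolve(parts[0]);
  for (let i = 1; i < parts.length; i += 2) {
    const op = parts[i].value;
    const right = resolve(parts[i + 1]);
    if (op === '&') result += right;              // join directly
    else if (op === '&&') result += ' ' + right;  // join with a space
    else if (op === ',') result += ',' + right;   // join with a comma
    else throw new Error('unknown operator "' + op + '"');
  }
  return result;
}
```

Fed the tokens of `"H" & "ello" & comma && "world!"`, this produces `Hello, world!`.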

tperry2x · Post by **tperry2x** » Wed Apr 02, 2025 7:03 am

Certainly suffering with information overload now.
Leave it all with me - see you in about a week.

OpenXTalkPaul · Post by **OpenXTalkPaul** » Wed Apr 02, 2025 4:57 pm

Sorry for that flood of articles/links. To some degree I posted them for myself to look at later when I have time.
That one research project is particularly interesting to me:
https://simpletalk.systems
That's a live editable 'stack'; right-click on something on that page and you'll see.
It's definitely an xTalk that does xCard / UI stuff using JS / web tech. It might be worth looking at the source code (although its UI is a little sluggish compared to HC Sim, and there's some weirdness to how they've implemented properties):
https://github.com/dkrasner/Simpletalk

tperry2x · Post by **tperry2x** » Thu Apr 03, 2025 7:09 am

Just a quick progress update: I have implemented the parsing expression grammar (PEG), and (using Dan's BNF as inspiration) I've also created an independent expressionParser. Together with the tokenizer and the PEG, this means you don't end up with a huge monolithic interpreter.js file that solely uses regex (regular expressions). Now each function has its own .js file (which is dynamically loaded and unloaded), so there's no need to declare the *.js files in the HTML file either.
(I'm making this as modular, extendable, simple to modify and, most importantly, easy to diagnose as I can.)
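For readers wondering what "dynamically loaded" might look like, here is one possible shape in JavaScript using the standard dynamic `import()` expression. The post does not say how the loading is actually done, so this is purely a guess: the `createCommandLoader` name and the `./commands/<name>.js` path layout are hypothetical. The importer is passed in so the caching logic can be exercised without real files.

```javascript
// Hypothetical sketch: load one module per command on first use, then cache it.
function createCommandLoader(importModule) {
  const cache = new Map();
  return {
    // Returns a promise for the command's module.
    load(name) {
      if (!cache.has(name)) {
        cache.set(name, importModule('./commands/' + name + '.js'));
      }
      return cache.get(name);
    },
    // Drops our own reference so the next load() imports again.
    unload(name) {
      return cache.delete(name);
    },
  };
}

// In a browser this might be wired up as:
//   const loader = createCommandLoader((path) => import(path));
//   const mod = await loader.load('put');
```

As far as I know, browsers keep an imported module in their own module map, so `unload` here only forgets the interpreter's reference; it does not necessarily free the module itself.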

What I'm currently working on is modifying the functions I've already written, as they need some slight tweaking.
There aren't many apparent visual changes from what I shared previously, Paul, but one of them is the addition of a "parserLogic" field on the index page. This will show you in real time what the tokenizer and parser are 'thinking'. I'm going to make this a toggle option (same as the message box) with a keyboard shortcut.

More to follow in the next few days...

[Attachment: v133-screenshot.png (61.12 KiB)]

edit: (meant to add earlier)
This also resizes correctly now for mobile/tablet devices:

[Attachment: dynamic-resizing-and-chunk.png (41.76 KiB)]

TerryL · Post by **TerryL** » Thu Apr 03, 2025 5:00 pm

https://pepa.holla.cz/wp-content/upload ... ition1.pdf
Remarkable progress. I've been reading a JavaScript beginner's guide (PDF). It's well written, with examples and detailed chapters on arrays and regular expressions. Maybe it will give Tom and Paul some ideas.

Not that you don't already have your hands full, and juggling life issues too, so forgive me. I checked WebTalk Doc-128 and couldn't find whether you've worked out...
- the target, target() --similar to 'me' but probably hard to translate
- result() --the result synonym
- global/local (script local and within a handler) declarations for container names.
- switch/case/break
- arrays --maybe include split and combine
- repeat while <condition>, repeat until <condition>, next repeat --example: repeat while intersect(fld "A", grc "Ball")
- specialFolderPath("desktop") --desktop, documents, resources
- window.find() //launch browser's find dialog
- window.confirm() //answer dialog with cancel and ok btns
- Can js be coaxed to translate:
answer "Please select a color." with "Cancel" or "Red" or "Green" or "Blue" --oxt: max = 7 buttons
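On the last question: `window.confirm()` only ever offers OK and Cancel, so a multi-button `answer` would need a custom dialog (for example one built with the HTML `<dialog>` element). The parsing half is straightforward, though. The JavaScript sketch below pulls the prompt and button labels out of such a line; the `parseAnswer` name, the default 'OK' button, and the assumption that any trailing `--` comment has already been stripped are all mine.

```javascript
// Hypothetical sketch: answer "<prompt>" [with "<btn>" or "<btn>" ...]
const MAX_BUTTONS = 7; // the OXT limit mentioned in the post

function parseAnswer(line) {
  const m = /^\s*answer\s+"([^"]*)"(?:\s+with\s+(.+))?\s*$/i.exec(line);
  if (!m) throw new Error('answer: expected a quoted prompt');
  let buttons = ['OK']; // assumed default when no buttons are given
  if (m[2] !== undefined) {
    // Split on 'or' only when it is followed by an opening quote.
    buttons = m[2].split(/\s+or\s+(?=")/i).map((part) => {
      const q = /^"([^"]*)"$/.exec(part.trim());
      if (!q) throw new Error('answer: bad button label ' + part);
      return q[1];
    });
  }
  if (buttons.length > MAX_BUTTONS) {
    throw new Error('answer: at most ' + MAX_BUTTONS + ' buttons');
  }
  return { prompt: m[1], buttons };
}
```

The returned `{prompt, buttons}` object could then drive whatever dialog the UI layer provides.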

Kdjanz · Post by **Kdjanz** » Fri Apr 04, 2025 4:13 am

https://lynxjs.org

This may be a complete red herring, but I ran across this today and it has just been open sourced and moved to GitHub. I'm not sure if it has any relevance to the new direction here, but they talk about building for the web and for native Android and iOS as well. It is based on JS of some sort, with a custom JS engine claimed to be lightning fast while still being tiny. What makes all of this somewhat credible to me is that it is being offered by ByteDance, who are famous for TikTok, so they obviously know something about doing things at scale and with serious quality control. Not your usual coders in a basement. I'd be very interested to have the gurus take a look at it and report back (explain it like I'm 5, please) on whether this is useful or going off in another direction. I don't know how this ties into Emscripten etc. and whether this could cut through that knot. But I hope that 15 minutes on their site will either give a quick thumbs down as useless or a big desire to take a closer look.

Hoping this might be the Ferrari engine that slides in under the sleek new body that is taking shape as we watch.

Kdjanz · Post by **Kdjanz** » Fri Apr 04, 2025 4:46 am

https://bare.pears.com

I promise I will quit now!

But here is another ?engine? that makes relevant sounding noises to this novice:

Actually
Run Javascript Everywhere

Bare is a small and modular JavaScript runtime for desktop and mobile. Like Node.js, it provides an asynchronous, event-driven architecture for writing applications in the lingua franca of modern software. Unlike Node.js, it makes embedding and cross-device support core use cases, aiming to run just as well on your phone as on your laptop. The result is a runtime ideal for networked, peer-to-peer applications that can run on a wide selection of hardware.

The modular aspect means that you only add to the tiny core what you actually use - HTML, file system access, or whatever. So only the essentials have to ship underneath our code, not the monster of Node or Electron.
So could someone tell me if this is useful or not?

OpenXTalkPaul · Post by **OpenXTalkPaul** » Fri Apr 04, 2025 6:50 am

Kdjanz wrote: ↑Fri Apr 04, 2025 4:46 am The modular aspect means that you only add to the tiny core what you actually use - HTML, file system access, or whatever. So only the essentials have to ship underneath our code, not the monster of Node or Electron.
So could someone tell me if this is useful or not?

Ideally I'd like a wrapper 'app' that uses a webview with whatever web engine the OS comes preinstalled with as I believe most OSes do nowadays. On macOS and recent 'Buntus that would be Webkit. I don't think it would be too difficult to make a shell app that creates a web view and loads the 'ide' / index.html, but Electron includes APIs for doing standalone .app type things, like direct file-system access, shell(), etc.

It could be. I'm certainly interested in alternatives to Electron that use WebKit or Gecko instead of the full-blown and rather large Chromium Embedded Framework.
Perhaps this subject (web app to standalone app) should be in its own topic.

OpenXTalkPaul · Post by **OpenXTalkPaul** » Fri Apr 04, 2025 7:05 am

tperry2x wrote: ↑Thu Apr 03, 2025 7:09 am Just a quick progress update: I have implemented the parsing expression grammar (PEG), and (using Dan's BNF as inspiration) I've also created an independent expressionParser. Together with the tokenizer and the PEG, this means you don't end up with a huge monolithic interpreter.js file that solely uses regex (regular expressions). Now each function has its own .js file (which is dynamically loaded and unloaded), so there's no need to declare the *.js files in the HTML file either.
(I'm making this as modular, extendable, simple to modify and, most importantly, easy to diagnose as I can.)

Fantastic! Thanks, looking forward to testing it out.
In the meantime I've been doing some tinkering around with version 132.

I was wondering why the 'play' command wasn't working in Safari (but worked in Chrome and Firefox). I thought it was the 'user interaction required' problem (not sure if you're familiar with that mostly-Safari issue), but I WAS interacting with the document, so that shouldn't be an issue. It turns out it's just that Safari has no built-in support for playing .ogg files. I copied my boing .wav into the sounds folder and the play command works fine with that. For compressed samples, .mp3 or MPEG-4 .m4a work fine too.
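One way a play command could cope with this is to ask the browser which formats it can play before choosing a file. Media elements expose `canPlayType(mimeType)`, which returns an empty string for unsupported types and 'maybe' or 'probably' otherwise. In the JavaScript sketch below the probe is passed in as a function so the selection logic stands alone; the `pickPlayable` name, the MIME table, and the `sounds/boing` path are illustrative assumptions.

```javascript
// Hypothetical sketch: choose the first sound file format the browser accepts.
const MIME_BY_EXTENSION = {
  ogg: 'audio/ogg',
  wav: 'audio/wav',
  mp3: 'audio/mpeg',
  m4a: 'audio/mp4',
};

// canPlay(mime) should behave like HTMLMediaElement.canPlayType:
// '' means unsupported, 'maybe' or 'probably' means worth trying.
function pickPlayable(baseName, extensions, canPlay) {
  for (const ext of extensions) {
    const mime = MIME_BY_EXTENSION[ext];
    if (mime && canPlay(mime) !== '') return baseName + '.' + ext;
  }
  return null; // nothing playable among the candidates
}

// In a browser this might be used as:
//   const probe = new Audio();
//   const file = pickPlayable('sounds/boing', ['ogg', 'wav', 'mp3'],
//                             (mime) => probe.canPlayType(mime));
```

This only helps if the same sound is shipped in more than one format, which is a packaging decision rather than an interpreter one.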

Eventually it would be good to employ the Web Audio API there. You use it to build audio-processing graphs (a LOT like Apple's Core Audio), which means we can have audio effects like reverb, pitch-shift sounds, stream audio, have multiple input sources, etc. Of course I'm also going to want playPMD's extended 'playSentence' and MIDI I/O... eventually.

richmond62 · Post by **richmond62** » Fri Apr 04, 2025 10:59 am

So? How on earth can one get anything to run on ALL browsers on ALL operating systems ALL of the time? That might be a very tall order.

Oregano on RISCOS?
