SjASMPlus by z00m

Quote:

Posted by Bedazzle

And it is precisely external makefiles and scripts that are more flexible: you can edit them however and whenever you like, instead of waiting for something to be built into a flexible assembler.

Can makefiles do floating point?
Do they have logarithms, powers?

https://github.com/ped7g/ZXSpectrumN...acked.i.asm#L3

There was some ongoing discussion about packing 5bit 0..31 values (aka "strings" with limited charset) to bitstream, and how long the decoding routine has to be.
From the initial 22B by Busy we went down to 19B (thanks to Baze and Zilog), and I collected it all into my "code snippets" project (it's targeting ZX Next, but the particular 5b-decoding is classic Z80 only).

There's also a sjasmplus macro to encode regular string literals... I'm so glad I did add WHILE in v1.18.0 ... it's becoming really handy :D (I could have used `DUP inputdata_size` in this case too, but WHILE reads better to me, to understand what is happening).

Have fun. :) (I have some suspicion this is "old news" for you, but consider there are many newcomers trying to learn Z80 programming; the snippets are targeting those, as examples of small routines)
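For readers who want to see the packing idea outside the assembler, here is a minimal Python sketch of a 5-bit encoder and decoder. It is not the format used in the linked snippets project: the bit order (most significant bit first), the zero padding of the last byte, and the names `pack5` and `unpack5` are all assumptions made for illustration.

```python
def pack5(values):
    """Pack 5-bit values (0..31) into bytes, most significant bit first (assumed order)."""
    out = bytearray()
    acc = 0      # bit accumulator
    nbits = 0    # number of pending bits held in acc
    for v in values:
        if not 0 <= v <= 31:
            raise ValueError(f"value {v} does not fit into 5 bits")
        acc = (acc << 5) | v
        nbits += 5
        if nbits >= 8:                 # a full byte is ready
            nbits -= 8
            out.append(acc >> nbits)
            acc &= (1 << nbits) - 1    # keep only the pending bits
    if nbits:                          # flush the tail, zero-padded on the right
        out.append(acc << (8 - nbits))
    return bytes(out)


def unpack5(data, count):
    """Read `count` 5-bit values back from the packed bytes."""
    out = []
    acc = 0
    nbits = 0
    source = iter(data)
    for _ in range(count):
        if nbits < 5:                  # refill from the next byte
            acc = (acc << 8) | next(source)
            nbits += 8
        nbits -= 5
        out.append(acc >> nbits)
        acc &= (1 << nbits) - 1
    return out
```

With this layout five characters take four bytes instead of five; the decoder needs the value count (or a terminator value) because the padding bits are indistinguishable from data.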

Quote:

Posted by NEO SPECTRUMAN

Can makefiles do floating point?
Do they have logarithms, powers?

Yes. Anything at all, through any software.

Quote:

Posted by NEO SPECTRUMAN

Can makefiles do floating point?
Do they have logarithms, powers?

As far as I remember, we started with merging several binaries into one.
Where do you need a power or a logarithm here?

And if we are talking about scripts in general, any modern language can do that.
And yes, it will be more flexible than asm.

Quote:

Posted by Bedazzle

As far as I remember, we started with merging several binaries into one.
Where do you need a power or a logarithm here?

In the place where it was claimed that the compiler should not do anything.

Quote:

Posted by NEO SPECTRUMAN

In the place where it was claimed that the compiler should not do anything.

Merging three ready-made binary chunks into a whole?
Yes, doing that outside the assembler is a well-proven practice.

There are several different approaches possible.

One MSX project I have seen sources of does this:
using `--longptr` option to have 24bit address space and `output -> outend` to produce binary 2MB file in one go (assembling machine code sequentially, emitting the file byte by byte). Then the mapping looks like `ld a,address>>14 ; bank num from long address` and `ld hl,address&$3FFF` to have offset into it (+ adjust to particular target area in memory) ... or something like that, I hope I remember it correctly.
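The arithmetic behind that `>>14` / `&$3FFF` mapping can be checked with a small Python sketch. The 16 KiB bank size follows from the shift and mask quoted above; the `window_base` parameter (where the bank is paged in at runtime) and the function name are hypothetical.

```python
BANK_BITS = 14                      # 16 KiB banks, as implied by `>>14` and `&$3FFF`
BANK_MASK = (1 << BANK_BITS) - 1    # 0x3FFF


def split_long_address(address, window_base=0x8000):
    """Return (bank, cpu_address) for a linear address inside the big output file.

    window_base is a hypothetical runtime address where the bank gets mapped.
    """
    bank = address >> BANK_BITS          # bank number from the long address
    offset = address & BANK_MASK         # offset inside that bank
    return bank, window_base + offset
```

For example, the linear address 0x12345 lands in bank 4 at offset 0x2345, so with the assumed window at 0x8000 the CPU would read it from 0xA345.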

I would personally go the virtual DEVICE route, making the virtual mapping match my runtime mapping, and using 16b labels (with page info):

Code:

    DEVICE ZXSPECTRUMNEXT       ; Next has 8ki pages
    MMU 0 n, 26, $0000          ; map page 26 into $0000..$1FFF , org $0000, enable auto-wrap
spriteData1:
    incbin "spr1.bin"           ; spriteData1 == $0000, $$spriteData1 == 26 (page num)
spriteData2:
    incbin "spr2.bin"
    ...
    ORG $8000
    ; runtime mapping of data2 into $0000..$3FFF area (in case it does cross 8ki boundary while incbin)
    nextreg $50,$$spriteData2   ; maps $0000..$1FFF to page of spriteData2 symbol
    nextreg $51,$$spriteData2+1 ; maps $2000..$3FFF to page+1 (to cover for 8ki cross)
    ld hl,spriteData2           ; some 0000-1FFF offset to the beginning of the data

Prepare the full 1/2 MB memory state this way, and then use SAVENEX to write the "NEX" file, which works well with this pattern.

For multi-load one can prepare each block in virtual memory, SAVEBIN/SAVEDEV it, then ORG back at the beginning and start over with second part...
All of these DEVICE "SAVE" directives are saving the *current* state, local to their position in code, so you can produce different binary blocks from the same virtual memory area in one assembling. This is contrary to the OUTPUT way, where you are emitting a stream of bytes, making it quite tricky to "go back" (it's possible by opening the OUTPUT for rewrite, and then using FPOS to seek to the position where you want to overwrite data; possible, but a bit cumbersome).

The script from a few messages back, which merges several bin blocks together into one big file using OUTPUT + INCBIN, should definitely work (if the small parts are less than 64kiB, then you will not even cross the $10000 boundary, so no warning, but even with larger files it should work.. or you can avoid the warning by some other trick, like INCBIN into an MMU wrap-around page while also having OUTPUT active, or by using a separate asm file without code, just to build the final big ROM, assembled with the --longptr switch).

....

But it also depends what you need at runtime, and how you plan to address those data... The example in this answer is slightly Next-specific for NEX file output, where I know the loader ".nexload" did already load all banks into memory, and I can just switch banks and data are there. Although I believe the principle does apply also to regular zx128 (but loader is up to you).

- - - Updated - - -

If you are just asking the basic "can sjasmplus produce 1MB ROM files?", then the answer is simply yes.

It should not only be possible, it should also feel well supported, through multiple possible approaches. If you hit some issue with that, please report it.

If you have trouble choosing the initial approach, then describe in more detail what you are trying to do, and add some examples of what you expect while writing runtime code, and what form of input data you want to assemble together. :)

- - - Updated - - -

Quote:

Posted by NEO SPECTRUMAN

Is it possible to append to a file using SAVEBIN?

Thinking about it second time, while answering the stuff above... you can also use DEVICE with large-enough memory, INCBIN everything into memory into continuous memory block, and SAVEDEV to big file. (I forgot to mention SAVEDEV before)

@Ped7g just wanted to thank you for your continuing support of this assembler :) I believe you've spent hundreds of hours of your free time on it. Some people here were somewhat harsh, but you were always nice and sympathetic. Thank you! And keep up the good work :)

Quote:

Posted by Ped7g

SAVEBIN "part1.bin",$0000,$2000

Does savebin happen only during pass 3?

Quote:

Posted by NEO SPECTRUMAN

Does savebin happen only during pass 3?

Yes, it's saving bytes from DEVICE memory, which is not used in first passes during assembling, so it's completely zeroed in pass 1/2, and there's nothing to save.

The machine code production happens in third pass (also for OUTPUT/OUTEND/SAVETAP/... all the others).
(production = writing it... the machine code is "dry-simulated" in first 2 passes, ie. the assembler calculates how many bytes the instruction takes, to adjust labels, but then it throws away the opcode)

So in recent weeks I was abusing sjasmplus script a lot (as a form of test/exercise of the scripting implementation, some of the experience went into improving it just before v1.18.0 release)...

If you are serious about sjasmplus scripting, maybe you can pick up a trick or two from these:
https://github.com/ped7g/adventofcode

Code:

tmp_cnt = 0
    dup 10
        if ((tmp_cnt = 0) || (tmp_cnt = 5) || (tmp_cnt = 7) || (tmp_cnt = 8))
            display tmp_cnt," skip"
        else
            display tmp_cnt," ok"
        endif
tmp_cnt = tmp_cnt +1
    edup
    display " "
tmp_cnt = 0
    dup 10
        if ((tmp_cnt != 0) || (tmp_cnt != 5) || (tmp_cnt != 7) || (tmp_cnt != 8))
            display tmp_cnt," ok"
        else
            display tmp_cnt," skip"
        endif
tmp_cnt = tmp_cnt +1
    edup
    display " "
tmp_cnt = 0
    dup 10
        if ((tmp_cnt =! 0) || (tmp_cnt =! 5) || (tmp_cnt =! 7) || (tmp_cnt =! 8))
            display tmp_cnt," ok"
        else
            display tmp_cnt," skip"
        endif
tmp_cnt = tmp_cnt +1
    edup
    display " "
tmp_cnt = 0
    dup 10
        if ((tmp_cnt <> 0) || (tmp_cnt <> 5) || (tmp_cnt <> 7) || (tmp_cnt <> 8))
            display tmp_cnt," ok"
        else
            display tmp_cnt," skip"
        endif
tmp_cnt = tmp_cnt +1
    edup

Code:

> 0x0000 skip
> 0x0001 ok
> 0x0002 ok
> 0x0003 ok
> 0x0004 ok
> 0x0005 skip
> 0x0006 ok
> 0x0007 skip
> 0x0008 skip
> 0x0009 ok
>
> 0x0000 ok
> 0x0001 ok
> 0x0002 ok
> 0x0003 ok
> 0x0004 ok
> 0x0005 ok
> 0x0006 ok
> 0x0007 ok
> 0x0008 ok
> 0x0009 ok
>
> 0x0000 ok
> 0x0001 skip
> 0x0002 skip
> 0x0003 skip
> 0x0004 skip
> 0x0005 skip
> 0x0006 skip
> 0x0007 skip
> 0x0008 skip
> 0x0009 skip
>
test.asm(52): error: Syntax error: > 0) || (tmp_cnt <> 5) || (tmp_cnt <> 7) || ( tmp_cnt <> 8))
test.asm(52): error: ')' expected

Why doesn't != work? o_O
Why does =! give some kind of result?

Quote:

Posted by NEO SPECTRUMAN

Why doesn't != work? o_O

Looks all correct to me?

The first part is doing "skip" when cnt is 0 or 5 or 7 or 8

The second part is doing "ok" every time (first two "(cnt is not 0) or (cnt is not 5)" are enough to be always true for any cnt value)

The third part "=!" is parsed as: "=" is equivalence operator, "!" is logical not, so 0 becomes -1, and non-zero values become 0
That leads to "ok" when cnt is -1 or 0

Fourth is just syntax error, the "<>" operator doesn't exist in sjasmplus.

Did you want in second block negation of first block? Then: !(A || B) = (!A) && (!B)
(note the "logical or" becomes "logical and" when whole expression is negated)

Code:

tmp_cnt = 0
    dup 10
        if ((tmp_cnt != 0) && (tmp_cnt != 5) && (tmp_cnt != 7) && (tmp_cnt != 8))
            display tmp_cnt," ok"
        else
            display tmp_cnt," skip"
        endif
tmp_cnt = tmp_cnt +1
    edup
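The four readings discussed above can be replayed outside the assembler. The Python sketch below only models the evaluation rules described in this post (true is -1, `=!` parses as "equals the logical-not of"); the function names are made up for illustration and this is not sjasmplus code.

```python
VALUES = (0, 5, 7, 8)


def sj_not(x):
    """Logical not as described in the post: zero becomes -1, non-zero becomes 0."""
    return -1 if x == 0 else 0


def variant_eq_or(c):
    # (c = 0) || (c = 5) || (c = 7) || (c = 8)   -> true branch displays "skip"
    return "skip" if any(c == v for v in VALUES) else "ok"


def variant_ne_or(c):
    # (c != 0) || (c != 5) || ...                -> true for every c
    return "ok" if any(c != v for v in VALUES) else "skip"


def variant_eq_not(c):
    # (c =! 0) || (c =! 5) || ...                -> parsed as c = (!v)
    return "ok" if any(c == sj_not(v) for v in VALUES) else "skip"


def variant_ne_and(c):
    # (c != 0) && (c != 5) && ...                -> the intended negation
    return "ok" if all(c != v for v in VALUES) else "skip"


if __name__ == "__main__":
    for c in range(10):
        print(c, variant_eq_or(c), variant_ne_or(c), variant_eq_not(c), variant_ne_and(c))
```

Running it for 0..9 gives the same three columns as the DISPLAY output quoted earlier, and the last column matches the first, which is the point of replacing `||` with `&&` when negating.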

Quote:

Posted by Ped7g

Looks all correct to me?

So my understanding of || and of the (human) && is wrong,
and that means (|| human-and &&) != ((human OR) human-and (human AND)) :)

logical or (T = true [-1 in sj], F = false [0 in sj]):
F || F = F
T || F = T
F || T = T
T || T = T

logical and:
F && F = F
T && F = F
F && T = F
T && T = T

logical not:
!F = T
!T = F

And the general math rules for propagating logical not in complex expressions:
!(A < B) <=> A >= B
!(A == B) <=> A != B
!(A || B) <=> !A && !B
!(A && B) <=> !A || !B
...
https://en.wikipedia.org/wiki/Negation#Distributivity

sjasmplus treats any non-zero value as true, and zero value as false, but when boolean-true is calculated, it is represented by ~0 (-1).
value = 4 ; value = 4
value = !4 ; value = 0
value = !!4 ; value = -1 (true)
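As a quick sanity check of the rules listed above, the following Python sketch models the "any non-zero is true, computed true is -1" convention and verifies both negation laws over a few sample values. The helper names `lnot`, `lor` and `land` are invented here; they only mimic the behaviour described in this post.

```python
from itertools import product

TRUE, FALSE = -1, 0   # representation of computed booleans described above


def lnot(a):
    return TRUE if a == 0 else FALSE


def lor(a, b):
    return TRUE if (a != 0 or b != 0) else FALSE


def land(a, b):
    return TRUE if (a != 0 and b != 0) else FALSE


def de_morgan_holds(samples=(0, -1, 4)):
    """Check !(A || B) == !A && !B and !(A && B) == !A || !B for all sample pairs."""
    return all(
        lnot(lor(a, b)) == land(lnot(a), lnot(b))
        and lnot(land(a, b)) == lor(lnot(a), lnot(b))
        for a, b in product(samples, repeat=2)
    )
```

Including a value like 4 in the samples matters: it is "true" as an input, but it is not the canonical -1, which is exactly the `!!4` case shown above.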

( "людское ponimanje" ("human understanding") => no idea what you are talking about, the math way is the only way I think about logic, even when talking in human language ... but there are many people not able to correctly negate a predicate ... for example they ask you if you like the blue colour, and you say "no", and they will think you hate blue (which is an incorrect negation: "!like" is not "hate")) :)

Quote:

Posted by Ped7g

"людское ponimanje" => no idea what you are talking about,

I don't know how it is where you are,
but here, when people say

if
A is not equal to 1
or
A is not equal to 5
then
Ы = 7
else
Ы = 0

this is what is meant (and it is correct)

Code:

if A <> 1
    if A <> 5
        Ы = 7
    else
        Ы = 0
    endif
else
    Ы = 0
endif

and the expected result is
1 Ы=0
2 Ы=7
3 Ы=7
4 Ы=7
5 Ы=0

This is never what is meant:

Code:

if A <> 1
    temp1 = 1
else
    temp1 = 0
endif
if A <> 5
    temp2 = 1
else
    temp2 = 0
endif
if temp1 = 1
    Ы = 7
else
    if temp2 = 1
        Ы = 7
    else
        Ы = 0
    endif
endif

and a result like this
1 temp1=0 or temp2=1 = Ы=7
2 temp1=1 or temp2=1 = Ы=7
3 temp1=1 or temp2=1 = Ы=7
4 temp1=1 or temp2=1 = Ы=7
5 temp1=1 or temp2=0 = Ы=7
is completely unexpected

Human words describe a completely different process,
and this variant is correct only from the point of view of machine logic,
and describing it in human words is much harder (it would take quite a lot of words)

No complaints against you,
it's just a not entirely obvious aspect

Quote:

Posted by NEO SPECTRUMAN

this is what is meant (and it is correct)

Code:

if A <> 1
    Ы = 7
else
    if A <> 5
        Ы = 7
    else
        Ы = 0
    endif
endif

Broken code:
else
Ы = 0
will never execute

Quote:

Posted by Bedazzle

will never execute

fixed

That's exactly why we need a damn list of conditions like if a=1,5,6
and not three-storey IFs
in which you can overlook something

or we need elementary for and goto
so that one doesn't have to build a mess out of dups and ifs,
DUMBLY trying to replicate their functionality

yeah yeah, I know, now YOU will start telling me that
the compiler shouldn't do this,
that this should be done by cpp, the linker, the whatever-er and other heresy,
and the whole source should consist of batch files,
and the compiler should only replace nop with $00 and nothing more,
and shouldn't even do jumps to forward labels,
and all jump labels should be declared in advance at the beginning of the source,
because, you see, the compiler doesn't yet know where to jump when the code looks like this
jp l1
l1
:v2_tong2:

Quote:

Posted by NEO SPECTRUMAN

I don't know how it is where you are,
but here, when people say

if
A is not equal to 1
or
A is not equal to 5
then
Ы = 7
else
Ы = 0

yeah, we say "if not equal to 1 and not equal to 5". I guess some people could use "or" there, but they are just using the language wrong, or not understanding the logic expressions. (mind you, while I didn't finish my university studies, I made it far enough to have all exams from math basics, like logic evaluation, so for me it's easy to use the language correctly in logic constructs ... but from my experience with people who didn't study it, it's common they don't understand the rules correctly, and say the wrong thing, while they think something else - very common by non-math people, it's not easy to formulate the rule correctly in human language, even by intelligent people, if they are missing the formal math teaching).

But I think your example is actually correct if read in certain way, like this: "if A not equal to (1 or 5), then B = 7, else B = 0" -> now this makes some sense also from math point of view, a bit like incomplete distribution of negation, so you end with something like "IF A != (1 || 5)", but if you expand it to simpler statements, it needs the || negation, so "IF A != 1 && A != 5"

Anyway, in sjasmplus source you can always try to pick the "simpler to read" form, so if you are more familiar with "1 or 5", then write the positive form and negate the final result, like this:

Code:

    IF !(A == 1 || A == 5 || A == 7)
        ;; other values of A (not 1 or 5 or 7)
    ELSE
        ;; when A is 1 or 5 or 7
    ENDIF
    ;; notice the "!" at beginning of the expression
    ;; flipping the final result of the test

Ped7g, could you add a mode
that issues warnings when something gets overwritten?

so that code like this

Code:

    org $C000
    xor a
    xor b
    xor c
    xor d
    org $C001
    xor e

would output
warning $C001 was overwritten

and if you don't want to keep a separate array for this,
then at least check for $00 before writing

there aren't that many nops in code,
and some check is better than none...

Current status: no such feature

Future:
- not as default for sure, because there are projects using the pattern of device memory re-use to build different binary blocks at the same position (savebin/savetap between).
- as option: possible... sounds like some work, to develop some simple and fast code for it, as there is no simple test (except having a usage-bitmap for the whole device memory, but that sounds ugly). But maybe there is some other way, maybe some lists of ranges and checking for overlaps.
- workaround: use less `ORG`, use more `ds/align` to advance to next block of code, use ASSERT or MMU with "w" or "e" mode to guard you against making current section bigger than what you expected, etc...
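The "lists of ranges and checking for overlaps" idea from the list above could look roughly like the Python sketch below. This is only an illustration of the bookkeeping, not how sjasmplus works or would implement it; the `(start, length)` write records and the function name are assumptions.

```python
def find_overwrites(writes):
    """Return overlapping half-open address ranges among emitted blocks.

    writes: iterable of (start, length) pairs in emission order (hypothetical
    records of what each ORG-delimited block wrote into device memory).
    """
    seen = []        # (start, end) ranges already emitted
    overlaps = []
    for start, length in writes:
        end = start + length
        for s, e in seen:
            lo, hi = max(start, s), min(end, e)
            if lo < hi:                     # the two ranges share at least one byte
                overlaps.append((lo, hi))
        seen.append((start, end))
    return overlaps
```

For the example from the request (four one-byte instructions at $C000, then one byte at $C001) it reports the single range $C001..$C001. The quadratic scan is fine for a handful of blocks; a sorted interval list would be the obvious next step if it ever became a real feature.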

But it sounds like non-trivial thing to add, and also you should be able to work-around it with ASSERT quite well. Switching one option ON would be even better, but doesn't feel to me like critical feature needed ASAP.

But I will keep it in mind... (maybe I'll add an enhancement issue on github so I don't forget about it :D )

Quote:

Posted by Ped7g

- not as default for sure

This is needed only for debugging and verification purposes,
since it's not always clear whether everything is placed in memory correctly or not,
and there is no way to check

100% of my orgs are generated with lua,
and it's unclear how correctly that works in combination with variable-length macros

- - - Added - - -

Quote:

Posted by Ped7g

use more `ds/align`

they use memory inefficiently

This erroneous construct produces no error and generates no code:

Code:

    macro test
        test
    endm
    org $8000
    test

all that is visible on screen is

Quote:

SjASMPlus Z80 Cross-Assembler v1.18.0 (https://github.com/z00m128/sjasmplus)

and finding this error is hard

This incorrect code fails to work in just the same way:

Code:

    macro exx
        exx
    endm
    org $8000
    exx

while this construct works perfectly:

Code:

    macro EXX
        exx
    endm
    org $8000
    EXX

I used something like this for a long time and didn't even notice that it is a looping macro :)

For stuff like this - intentionally wanting to substitute legal instruction by your own macro:

Code:

    macro exx
        ; exx       ; infinite cycle
        @exx        ; Z80 exx, avoiding infinite cycle of macro calling
    endm

you can use the "@" operator to prevent macro matching, ie. `@exx` inside macro definition to do real Z80 `exx`, and in your code you can then:

Code:

    exx     ; does the macro
    @exx    ; does the Z80 exx even if "exx" macro exists

The error messages and state recovery in case of infinite loops in DEFINE/MACRO definitions are really bad: sometimes it will hit the max-nesting limit, sometimes it will just crash the assembler. I know about it, but I currently have no mood to work on this stuff.

Code:

    macro test
        test
    endm
    org $8000
    test

^^ this needs just few changes to be legit code, for example:

Code:

    macro test
nesting = nesting - 1
        IF 0 < nesting
            test
        ENDIF
    endm
    org $8000
nesting = 4
    test

^^^ this is valid source, and will compile fine, but I don't see how you would consider it "different" from your infinite example (except hitting some guardian-value to notice it has been cycling too much).

So it's not so easy to guard against it... there are some protections in sjasmplus which should catch infinite substitutions and loops, but they are not working that good at this moment, sorry. There's a room for improvement in this area, I agree.

v1.18.1 released

https://github.com/z00m128/sjasmplus...es/tag/v1.18.1

Big-Endian hosts support (experimental and not tested continuously)
added "listall", "listact" commands to OPT - to switch between listing types
WHILE has optional argument to set explicit guardian-counter
ASSERT has optional argument (to add description/notes for expression)
SLOT and MMU will now accept also starting address of slot instead of its number
fix: option --sym was not exporting labels starting with underscore
fix: SAVENEX BMP-loader bug when certain builds of sjasmplus were unable to open BMP files
fix: after STRUCT instance the "main" label is not polluted by last field of STRUCT
minor bugfixes in parser, windows cmake-builds have now icon
docs: adding "Index" section
docs: adding some missing information (__DATE__, __TIME__), fixing HTML anchor names

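Code:

13    0007 DD 8C        ADC A,ixh  ;* DD8C
14    0009
all_ops.asm(15): error: Label not found: IXh
15    0009 CE 00        ADC A,IXh  ;* DD8C

Yes, in sjasmplus an instruction or register name has to be written in one case (all lowercase or all uppercase); mixed-case `IXh` is not recognized as a register, hence the "Label not found" error. Labels are case sensitive.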

This style of writing is used in
The Undocumented Z80 Documented (Sean Young)

and in places in the original documentation too
https://jpegshare.net/images/97/92/9...75351bf1e2.png

that doesn't look good somehow...

- - - Added - - -

At first I thought that not all undoc opcodes were supported...

I could be nitpicking that you are pointing onto "operation description", not source-syntax (that's "LD IY, (nn)")...

but I guess you are somewhat right, even the "IYh" in "operation" is unfortunate, and I see how you could expect sjasmplus to accept it.

But official documentation does only describe instruction, it does not define full syntax of "assembly" language, and it's quite common for assemblers to fill the gaps with their own rules. The same-case rule for instructions and registers is something I actually like, because then PascalCaseLabels can't be misunderstood as instructions by accident.

The most disturbing thing about your post is, that `OPT --syntax=i` will not accept "IYh" either, i.e. `Ld IYh,3` will fail even with `--syntax=i`, when `Ld iyh,3` works...
So I'm trying to make up my mind whether this is a bug or not. (do you have an opinion?)
Otherwise what you see is "works as designed", not a bug.

But if we are talking about *you* writing *new* sjasmplus source, please just use same-case and avoid --syntax=i, I don't see how having mixed-case instructions/registers makes it better, IMO such source is harder to read. I know this is strongly personal and subjective topic, but I would suggest for new source style:

Code:

    ORG $879A                   ; uppercase directives
PascalCaseLabelsWithColon:
    ld iyh,3                    ; lowercase instructions and registers
    ld hl,12                    ; HL = 12 (with uppercase reg in comments)

I can see how adding options to sjasmplus helps with legacy sources you don't want to edit, but for new sources I would stick to *some* style (even if you don't like my suggestion above, feel free to modify it to your liking, but I have difficult time to accept that all-case instructions+registers are a problem, especially as when writing assembly the actual writing of instructions usually takes very little time, so fixing formatting of source is just few seconds, compared to hours of thinking and debugging).

Quote:

Posted by NEO SPECTRUMAN

Code:

13    0007 DD 8C        ADC A,ixh  ;* DD8C
14    0009
all_ops.asm(15): error: Label not found: IXh
15    0009 CE 00        ADC A,IXh  ;* DD8C

https://github.com/z00m128/sjasmplus...6ebadef44d6f9a
"parser: --syntax=i makes now also registers case insensitive"

Code:

... (listing)
12    0008              OPT --syntax=i  ; test the syntax options "i"
13    0008              Label1:
14    0008 C3 08 00     Jp Label1
14    000B C3 08 00     jP Label1       ; instructions should be case insensitive
15    000E 00 00        Align 4
15    0010              aLiGN 4         ; directives too
16    0010 EB           ex De,Hl        ; registers should be also case insensitive
17    0011 DD 45 DD 4C  ld b,IXl,c,IXh  ; BTW this is actual way how Zilog describes half-ix regs
...

Still NOT recommended (by me), but possible, if you insist on it, and switch on the `--syntax=i` mode.

https://github.com/z00m128/sjasmplus...es/tag/v1.18.2

new exist operator to check label existence
the --syntax=i mode makes now also register parsing case insensitive
minor bugfixes (predefined values, savenex BMP loader less strict about "colors used" content)

Am I the only one who thinks this thread should have been pinned under "Important:" long ago, alongside "SjASMPlus Z80 cross assembler", or even as the more relevant of the two?

@Ped7g as a Linux user I am very interested in converting characters in "ENCODING" from UTF-8 to one of these (WIN/DOS). But unfortunately it does not work it seems. Is it possible to implement? Or maybe I am missing something here. Actually it would be nice to have a full-fledged iconv or similar, as now it is limited to russian encodings...

- - - Added - - -

Quote:

Posted by Dart Alver

Am I the only one who thinks this thread should have been pinned under "Important:" long ago, alongside "SjASMPlus Z80 cross assembler", or even as the more relevant of the two?

Yes, done. It just kept getting bumped, so I didn't notice.

Quote:

Posted by Shadow Maker

@Ped7g as a Linux user I am very interested in converting characters in "ENCODING" from UTF-8 to one of these (WIN/DOS). But unfortunately it does not work it seems. Is it possible to implement? Or maybe I am missing something here. Actually it would be nice to have a full-fledged iconv or similar, as now it is limited to russian encodings...

I'm also on linux and used to utf-8 everywhere, so I do get your idea, but I'm not sure what/how to add to sjasmplus.

As ZX asm programmer I expect everything to be 8bit, so the utf8 -> 8bit conversion makes sense, but by what rules? In our ZX demos we often use custom encoding, not even some old DOS or win CP page, but completely custom one, so no generic tool will help much with that.

Maybe with Russian texts you have less mixing and use only few encodings, but I'm still not sure how to define one.

And finally the sjasmplus project currently doesn't contain any utf-8 implementation, so if I would link against iconv/similar, it will grow the dependencies list, so it should be rather something very well working feature, to make that cost worth it.

BTW if you need some standard win/dos encoding, I guess you can add into your makefile/build script the pre-build step using iconv itself, for example having .asm in utf8, and "building" .a80 files by implicit makefile rule using iconv similar to this:

Code:

$ echo "Мне одному кажется" | iconv -f UTF-8 -t CP1251 | hd
00000000  cc ed e5 20 ee e4 ed ee  ec f3 20 ea e0 e6 e5 f2  |... ...... .....|
00000010  f1 ff 0a                                          |...|

and then build the ".a80" with sjasmplus. If you use the utf-8 chars only within double quotes or comments, it should work.
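Where `iconv` is not available, the same pre-build step can be approximated with a few lines of Python. This is a sketch, assuming Python's built-in `cp1251` codec is an acceptable stand-in for `iconv -t CP1251`; the function names and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def utf8_to_cp1251(data: bytes) -> bytes:
    """Re-encode UTF-8 source bytes as CP1251; raises if a character has no CP1251 code."""
    return data.decode("utf-8").encode("cp1251")


def convert_source(src: str, dst: str) -> None:
    """Read the UTF-8 file `src` and write the CP1251 version to `dst`."""
    Path(dst).write_bytes(utf8_to_cp1251(Path(src).read_bytes()))
```

A build script would call something like `convert_source("strings.asm", "strings.a80")` before invoking sjasmplus on the converted file. For the test string used above, the function yields the same bytes as the hex dump (without the trailing newline that `echo` adds).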

So under linux, for general conversion like utf-8 -> cp1251 I don't feel sjasmplus needs any change, you can easily work around that (I have no idea if other OS have iconv and other powerful tools, maybe it's more difficult on other systems).

But if you guys have some cool ideas, how to bring into sjasmplus something even better, something what would do even things which iconv can't cover and|or makes life easier for people who use completely custom encoding, let me know.

But I can't imagine any particular nice syntax for some new directive covering these special cases, and when I recently was helping on one sjasmplus project which was "scrambling" text strings with custom xor-scheme to make strings hidden from simple view, I did end writing macro using the {b adr} memory read, changing the regular `db "some text"` into final scrambled bytes in DUP-loop produced by encoding macro. So even many of custom encodings (following some simple formula for 90% of chars and having only few special rules) could be done quite easily in sjasmplus with post-process macros.

Feels to me like it's not very difficult to resolve any of this use-cases even with current sjasmplus, but I have difficult time to imagine change which would help and be also elegant and worth implementing. (except the obvious utf8->cp1251 internal conversion, but that feels to me a bit useless, as I can use `iconv` for that already).

- - - Updated - - -

edit: to make that command line more complete, including the sjasmplus... :D (just for my own amusement and test)

Code:

$ echo "txt: db \"Мне одному кажется\"" | iconv -f UTF-8 -t WINDOWS-1251 | sjasmplus - --raw=- | hd
SjASMPlus Z80 Cross-Assembler v1.18.2 (https://github.com/z00m128/sjasmplus)
Pass 1 complete (0 errors)
Pass 2 complete (0 errors)
Pass 3 complete
Errors: 0, warnings: 0, compiled: 2 lines, work time: 0.001 seconds
00000000  cc ed e5 20 ee e4 ed ee  ec f3 20 ea e0 e6 e5 f2  |... ...... .....|
00000010  f1 ff                                             |..|

Quote:

Posted by Ped7g

I'm also on linux and used to utf-8 everywhere, so I do get your idea, but I'm not sure what/how to add to sjasmplus.

Thanks for your detailed reply :) Yes, sure, I know how to do it without sjasmplus, but I feel shame that we have commands which are not really usable :)
In russian texts we usually use native dos or win russian encoding, custom encoding is also used, but it is not really common nowadays, when everyone have access to PC and code with sjasm, why invent something else :)
Only idea I have then is custom table approach (so basically a replacement list), so have it like:

Code:

    replace_start "table1"
    db "тест тест"
    replace_end

or maybe this (but not sure what best way would be to select individual lines, so I added as a comment)

Code:

db "тест тест" ;replace "table1"

but not sure if it is really usable.

I'm not sure what you mean by those proposals.

Current sjasmplus reads source code in binary 8-bit mode, so whatever is inside double quotes in DB except few control codes which need to be escaped (`\0\n`) will be 1:1 assembled into machine code (I should probably do the test doing full 0..255 char string to be sure it works like that, yeah, I will add one).

So as ZX SW author, you have some encoding on mind (CP1251 or DOS-866 or some custom like 131 = "star" and 132-133-134 "group logo"), and you need to put those values (CP1251 azbuka chars or 131..134 bytes) into machine code, usually DB statement.

That's the result side. And the way how you edit those texts, for example Russian strings, is nice with UTF8 because of modern text editors using utf8 by default.

Current sjasmplus has this ENCODING directive, which makes possible auto-conversion from cp1251 to DOS-866 - it does nothing else, by default it does nothing, and if you do `ENCODING DOS` or use `--dos866` CLI option, any 128+ byte value in source code is transformed by hard-coded table converting CP1251 to DOS-866 (or "damaging" anything else, like UTF8).
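To make the effect of such a table concrete, here is a Python sketch of a CP1251 to DOS-866 byte conversion. It uses Python's own `cp1251` and `cp866` codecs, which is an assumption: the hard-coded table inside sjasmplus may treat non-Cyrillic bytes above 127 differently. The function name is made up for illustration.

```python
def cp1251_to_dos866(data: bytes) -> bytes:
    """Map CP1251-encoded bytes to DOS-866; ASCII stays as is, unmappable bytes become '?'."""
    text = data.decode("cp1251", errors="replace")
    return text.encode("cp866", errors="replace")
```

Feeding it CP1251 bytes such as `CC ED E5` gives `8C AD A5`, while feeding it UTF-8 bytes produces garbage rather than the expected DOS-866 text, which is the "damaging" effect mentioned above.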

So in case of Russian string, the task is "simple", do the utf8 -> cp1251 (or DOS-866) of source before the DB is assembled.

(BTW let's make clear one thing - I'm not going to add UTF8 support for symbol names, so anything outside of quotes does not need any extra support by sjasmplus, as anything non-ASCII is bug in source - the reasons are mostly overlapping with what I will write below, plus extra pain of sjasmplus processing source code heavily with custom implementation, not re-using common C++ library, so utf-8 symbols would cause major rewrite of parser - if somebody wants to do that, ok, but not me, the benefit of unreadable symbol names doesn't attract me at all, even if the code change would be simple)

But this "simple" case means the sjasmplus code will have to learn utf-8, and have the conversion table. And if there is Russian table, why not to add also some german, czech, etc...
... and you end up implementing `iconv` - which is by no means simple task, implementing utf-8 support correctly is major pain, I know of one C++ framework avoiding iconv and using custom code, and it took few years to polish that implementation enough to mostly work as expected.

And at that moment I don't see any benefit of putting iconv into sjasmplus, if I can call the iconv externally as I have shown in that example in previous post.

I can see some benefit of some magic directive which would allow me to define custom encoding, i.e. that ★ is 130 and ☈☋☑ is 131,132,134, but I don't see any elegant and simple syntax for that, and the implementation again requires adding all the important parts of what iconv does, i.e. understanding utf-8 encoding correctly.

So if you want just the "simple" utf-8 to cp1251 or dos-866 conversion, I'm failing to see why to bloat the sjasmplus code, and not call the external `iconv` as intermediate step before assembling. The result is same, but iconv is more robust and could handle all the common encodings, while sjasmplus will be always very limited in what it knows, unless I re-implement whole iconv into it (and go from ~300kB binary to ~5MB assembler).

- - - Updated - - -

So, I added the test to verify that "anything 8bit inside quotes (except the sensitive control codes) works":
https://github.com/z00m128/sjasmplus...t_encoding.asm

The "sensitive control codes" contains three values: 0, 10 and 13 ("\0\n\r" escape sequences within double-quotes).

All the other 0..255 values are assembled 1:1 to machine code.

So this part of sjasmplus works "as intended" and there's no extra bug involved or any issue.

The [utf8 text source] -> [8bit source] conversion is IMO lot easier to handle externally, and I'm slightly against adding this functionality into sjasmplus (you can of course try to change my mind, but I don't see enough arguments at this moment).

I can see the convenience of such addition, but considering the current size of sjasmplus code and its build-dependencies, I find it not worth of adding utf-8 support, especially as the external usage of `iconv` is trivial in case of non-custom encoding, and does cover LOT MORE than just Russian encodings.

Maybe on non-linux OS the benefit of built-in conversion would be even bigger (if they don't have easy tool like `iconv`), but then again, it's lot more easier to install linux and use it for assembling ZX project, than to modify sjasmplus sources, so my general advice is to use modern OS, and not to reinvent the wheel again and again inside sjasmplus just because some other SW is obsolete. :v2_dizzy_snowball:

- - - Updated - - -

One more note... I was even going to propose calling `iconv` from the source with SHELLEXEC (to convert some small "strings.asm" and then INCLUDE the converted one), but it turns out that's not so easy. SHELLEXEC executes only in the last (third) pass, so the converted strings are not available in the earlier passes. I guess you can still do this in a lua script, calling `iconv` in the first pass to generate "strings.8b.asm" (cp1251 encoded) from "strings.asm" (utf8, what you edit in the editor), and then INCLUDE the converted file. Let me know if you need an example of how to do this.

But I generally prefer not to use lua scripts in my asm, so I would rather create a Makefile with a rule that produces the converted file before the main project is assembled. :)
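For readers without `iconv` at hand, the same pre-build step can be sketched in a few lines of Python. This is only an illustration of the "convert externally, then assemble" idea; the function names `reencode` and `convert_file` are made up for this example, and the file names are the hypothetical ones from the post above. With `iconv` itself the equivalent would be `iconv -f UTF-8 -t CP1251 strings.asm > strings.8b.asm`.

```python
"""Pre-build step sketch: re-encode a UTF-8 source into a single-byte code page."""
from pathlib import Path


def reencode(data: bytes, src: str = "utf-8", dst: str = "cp1251") -> bytes:
    # Strict decoding/encoding on purpose: malformed UTF-8, or a character
    # that does not exist in the target code page, raises an exception and
    # fails the build instead of silently emitting wrong bytes.
    return data.decode(src).encode(dst)


def convert_file(src_path: str, dst_path: str, dst_encoding: str = "cp1251") -> None:
    # Read and write raw bytes so no newline translation happens on the way.
    converted = reencode(Path(src_path).read_bytes(), dst=dst_encoding)
    Path(dst_path).write_bytes(converted)
```

A build script or Makefile rule would call something like `convert_file("strings.asm", "strings.8b.asm")` before running sjasmplus on the main source, which then INCLUDEs the converted file.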

What about some kind of "translate" command, where you define two strings:

Code:

source = "АБЦДЕФГ....."
target = "ABCDEFG...."

And later in the code:

Code:

call print_string translate("АБЦД 123")

Sjasm would translate the string char by char, and would emit the same symbol when it is not found in the translation strings (like "123" in the example)?

That doesn't help much until you implement utf-8 parsing. It could cover custom 8bit -> 8bit encodings, but then your syntax doesn't explain how you would define, for example, A -> 1, B -> 2, etc., i.e. values which are not easy to enter inside quotes.

Thinking about it, these are two different issues. One is "utf-8 anything", and my answer is "no": I don't see how to add utf-8 support to sjasmplus without either adding a lot of my own code or linking against some ICU-like library, in either case raising the complexity and size of the sjasmplus binary by a whole order of magnitude. And you can resolve utf-8 by simply converting the source with `iconv` in the build script to some classic 8bit encoding, which sjasmplus then assembles correctly (I don't see anything problematic about this external way).

The second issue is "custom 8bit encoding". I had to resolve a few of these in my own ZX projects, and usually I enter the text as numbers or post-process the data with a script written as an sjasmplus macro. If the conversion were a lot more complex, you can always do something very similar to what you propose with "translate(...)" in lua. So the status on this one is that you can resolve it in current sjasmplus, but if somebody shows me a more elegant syntax (to define a custom encoding), I may implement it. Right now all the syntax I can imagine for such a feature doesn't feel very attractive: I would have to study the docs before using it anyway, to use it correctly, and in that case I could probably write the post-processing macro changing the values in the classic way in script code in a similar amount of time.

It just doesn't feel like I can add something meaningful to sjasmplus that will help in most use cases and be easy to use. It feels like I can add something that will work well for one specific use case but will be mostly ignored by everyone else. Also I don't remember any nice solution in other assemblers that I could just copy.
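As an illustration of the external route for custom 8bit encodings, here is a minimal Python sketch of the proposed "translate" behaviour: map char by char and keep the original code when a character is not in the table. Everything in it (`make_table`, `translate`, `to_db`) is a hypothetical helper for this example, not an sjasmplus feature. Allowing the target to be a list of numbers is one possible answer to the "A -> 1, B -> 2" case.

```python
def make_table(source, target):
    """Build a char -> byte-value table; target may be a string or a list of ints."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    table = {}
    for ch, t in zip(source, target):
        table[ch] = t if isinstance(t, int) else ord(t)
    return table


def translate(text, table):
    """Translate char by char; characters missing from the table keep their own code."""
    out = []
    for ch in text:
        code = table.get(ch, ord(ch))
        if not 0 <= code <= 255:
            raise ValueError(f"no 8bit value defined for {ch!r}")
        out.append(code)
    return bytes(out)


def to_db(data):
    """Format the bytes as one assembler line of numeric data."""
    return "    db " + ", ".join(str(b) for b in data)
```

For example, `to_db(translate("AB!", make_table("AB", [1, 2])))` yields a `db 1, 2, 33` line that can be written to a generated include file before assembling.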

The same bug as in define:

Code:

    macro coord_x x
var_x   di : halt
data_x  nop
sdfgj_x = var_x
    endm
    org $8000
    coord_x ($5+1)

Code:

test.asm(8): error: Invalid labelname: var_($5
test.asm(17): ^ emitted from here
test.asm(8): error: Unrecognized instruction: ) di
test.asm(17): ^ emitted from here
test.asm(8): error: Unexpected: ) di
test.asm(17): ^ emitted from here
test.asm(9): error: Invalid labelname: data_($5
test.asm(17): ^ emitted from here
test.asm(9): error: Unrecognized instruction: ) nop
test.asm(17): ^ emitted from here
test.asm(9): error: Unexpected: ) nop
test.asm(17): ^ emitted from here
test.asm(10): error: Invalid labelname: sdfgj_($5
test.asm(17): ^ emitted from here
test.asm(10): error: Unrecognized instruction: ) = var_($5+1)
test.asm(17): ^ emitted from here
test.asm(10): error: Unexpected: ) = var_($5+1)
test.asm(17): ^ emitted from here

Code:

17 8000 coord_x ($5+1)
17 8000 >
test.asm(8): error: Invalid labelname: var_($5
test.asm(17): ^ emitted from here
test.asm(8): error: Unrecognized instruction: ) di
test.asm(17): ^ emitted from here
test.asm(8): error: Unexpected: ) di
test.asm(17): ^ emitted from here
17 8000 >var_($5+1) di
17 8000 76 > halt
test.asm(9): error: Invalid labelname: data_($5
test.asm(17): ^ emitted from here
test.asm(9): error: Unrecognized instruction: ) nop
test.asm(17): ^ emitted from here
test.asm(9): error: Unexpected: ) nop
test.asm(17): ^ emitted from here
17 8001 >data_($5+1) nop
test.asm(10): error: Invalid labelname: sdfgj_($5
test.asm(17): ^ emitted from here
test.asm(10): error: Unrecognized instruction: ) = var_($5+1)
test.asm(17): ^ emitted from here
test.asm(10): error: Unexpected: ) = var_($5+1)
test.asm(17): ^ emitted from here
17 8001 >sdfgj_($5+1) = var_($5+1)
17 8001 >

And while this can be forgiven for define, for macro this behaviour is completely unacceptable.

Although a separate replaceallmacro with exactly this functionality wouldn't hurt.

By the way, it looks like it has been this way for a long time:

Code:

sjasmplus-1.11.0
# file opened: test.asm
test.asm(1): error: Invalid labelname:
6 0000 macro mcr x
7 0000 ~
8 0000 ~ label_x = 1
9 0000 ~
10 0000 endm
11 0000
15 0000 org $8000
16 8000
17 8000 mcr 4
17 8000 >
17 8000 >label_x = 1
17 8000 >

Code:

sjasmplus-1.12.0+
6 0000 macro mcr x
7 0000 ~
8 0000 ~ label_x = 1
9 0000 ~
10 0000 endm
11 0000
15 0000 org $8000
16 8000
17 8000 mcr 4
17 8000 >
17 8000 >label_4 = 1
17 8000 >

Yes, 1.11.0 (and older) had inconsistent behaviour, sometimes substituting the "x" macro argument into `label_x`, sometimes not. I don't remember the exact details of how to trigger it, but it was fairly trivial to modify your example a tiny bit so that it would start substituting in `label_x` in 1.11.0 as well. IIRC all it takes is to have, for example, another label `xmax`, which doesn't get substituted itself but affects `label_x`, or something like that. It's been two years since I fixed it, so I would have to check the old code to be sure how to trigger the old bug.

1.12+ consistently substitutes sub-words (every time); what you see is the "fix". :D

To fix your source for the later versions, don't use trivial argument names like `x`. I personally suggest `x?` for macro arguments, or you can prefix the argument name with an underscore, like `_x`, to prevent mid-word substitution.
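To make the effect of the naming advice easier to see, here is a small Python model of the substitution as it is described in this thread. It is a deliberately simplified sketch and not the actual sjasmplus implementation: it assumes that identifiers are split on underscores, that every part equal to the argument name is replaced, and that a name starting with an underscore matches only a whole identifier (a simplification of "only at the beginning of an identifier"). The `substitute` helper and its regex are invented for this example.

```python
import re

# Assumed identifier shape for this model: a letter or underscore, followed by
# letters, digits, underscores or '?'. The lookbehind avoids matching inside
# numbers such as 0x10 or $5.
_IDENT = re.compile(r"(?<![A-Za-z0-9_?$])[A-Za-z_][A-Za-z0-9_?]*")


def substitute(line, name, value):
    """Replace macro argument `name` with `value`, including sub-word matches."""
    def repl(match):
        ident = match.group(0)
        if ident == name:
            return value  # whole-identifier match
        # Sub-word match: replace underscore-separated parts equal to the name.
        # A name containing '_' or '?' can never equal a plain part here.
        return "_".join(value if part == name else part
                        for part in ident.split("_"))
    return _IDENT.sub(repl, line)
```

Under these assumptions `substitute("sdfgj_x = var_x", "x", "($5+1)")` reproduces the broken line from the listing above, while the same line with the argument renamed to `x?` or `_x` is left untouched.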

Unfortunately the current state is based on a huge misunderstanding. The original patch (to one of the 1.07 RC versions, I think) had a bug that made it substitute `x` in `label_x` as well when certain conditions were met (a collision on the first letter in the define hashtable, affecting the size of the "bucket" and making "x" found even when it should have been ignored). And there was a test in the old test suite testing the bugged behaviour!

So initially I was confused by the inconsistency, and I fixed it to make the substitution work always, but I fixed it the way the old test was verifying it. And defines/macro args starting with an underscore can substitute only at the beginning of an identifier, not in the middle.

A few versions later I finally understood how the patch was originally meant: it was supposed to do sub-word substitution the opposite way, only with identifiers which start with an underscore. But unfortunately the original author of the patch didn't put any comments into the code and didn't provide any tests, and later somebody added the wrong test, testing the bugged behaviour. If the original author had documented his idea, I would have fixed it the correct way.