Can makefiles do floating point?
Logarithms, powers?
https://github.com/ped7g/ZXSpectrumN...acked.i.asm#L3
There was some ongoing discussion about packing 5-bit 0..31 values (aka "strings" with a limited charset) into a bitstream, and how long the decoding routine has to be.
From the initial 22B by Busy we went down to 19B (thanks to Baze and Zilog), and I collected it all into my "code snippets" project (it's targeting ZX Next, but this particular 5-bit decoding is classic Z80 only).
There's also an sjasmplus macro to encode regular string literals... I'm so glad I added WHILE in v1.18.0 ... it's becoming really handy :D (I could have used `DUP inputdata_size` in this case too, but WHILE reads better to me when trying to understand what is happening).
Have fun. :) (I suspect this is "old news" for you, but consider that there are many newcomers trying to learn Z80 programming; the snippets target those, as examples of small routines.)
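To make the bitstream format concrete, here is a Python model of packing 0..31 values into bytes and reading them back. It assumes MSB-first bit order and zero padding of the last byte, which may not match the Z80 snippet linked above; it only illustrates the data layout, not the 19B routine itself.

```python
def pack5(values):
    """Pack 5-bit values (0..31) into bytes, MSB-first; the last byte is zero-padded."""
    out = bytearray()
    acc = 0    # bit accumulator
    nbits = 0  # number of not-yet-emitted bits in acc
    for v in values:
        if not 0 <= v <= 31:
            raise ValueError(f"value {v} does not fit into 5 bits")
        acc = (acc << 5) | v
        nbits += 5
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not emitted yet
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack5(data, count):
    """Read `count` 5-bit values back from an MSB-first bitstream."""
    values = []
    acc = 0
    nbits = 0
    stream = iter(data)
    for _ in range(count):
        while nbits < 5:
            acc = (acc << 8) | next(stream)  # refill with the next byte
            nbits += 8
        nbits -= 5
        values.append((acc >> nbits) & 0x1F)
        acc &= (1 << nbits) - 1
    return values


packed = pack5([1, 2, 3])
print(packed.hex())        # 0886
print(unpack5(packed, 3))  # [1, 2, 3]
```

Eight 5-bit values fit into exactly five bytes, which is where the saving over one byte per character comes from.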
There are several different approaches possible.
One MSX project I have seen sources of does this:
using `--longptr` option to have 24bit address space and `output -> outend` to produce binary 2MB file in one go (assembling machine code sequentially, emitting the file byte by byte). Then the mapping looks like `ld a,address>>14 ; bank num from long address` and `ld hl,address&$3FFF` to have offset into it (+ adjust to particular target area in memory) ... or something like that, I hope I remember it correctly.
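A quick Python model of that bank/offset split, assuming 16 KiB banks as in `address>>14` / `address&$3FFF`. The helper names are hypothetical, and `window_base` stands in for the "+ adjust to particular target area" part, since the actual target area isn't given.

```python
BANK_SHIFT = 14                       # 16 KiB banks: bank = address >> 14
OFFSET_MASK = (1 << BANK_SHIFT) - 1   # 0x3FFF


def split_long_address(address):
    """Split a 24-bit linear address into (bank number, offset inside bank)."""
    if not 0 <= address < (1 << 24):
        raise ValueError("address must fit into 24 bits")
    return address >> BANK_SHIFT, address & OFFSET_MASK


def runtime_pointer(address, window_base=0x8000):
    """Bank to page in, plus the CPU address of the data inside the paging window.

    window_base is a hypothetical example value; use wherever your target
    machine pages the 16 KiB bank in.
    """
    bank, offset = split_long_address(address)
    return bank, window_base + offset


print([hex(x) for x in split_long_address(0x12345)])  # ['0x4', '0x2345']
```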
I would personally go the virtual DEVICE route, making the virtual mapping match my runtime mapping, and using 16b labels (with page info):
prepare the full 1/2 MB memory state, and then SAVENEX it into a "NEX" file, which works well with this pattern.
Code:
DEVICE ZXSPECTRUMNEXT ; Next has 8KiB pages
MMU 0 n, 26, $0000 ; map page 26 into $0000..$1FFF , org $0000, enable auto-wrap
spriteData1: incbin "spr1.bin" ; spriteData1 == $0000, $$spriteData1 == 26 (page num)
spriteData2: incbin "spr2.bin"
...
ORG $8000
; runtime mapping of data2 into $0000..$3FFF area (in case it does cross 8ki boundary while incbin)
nextreg $50,$$spriteData2 ; maps $0000..$1FFF to page of spriteData2 symbol
nextreg $51,$$spriteData2+1 ; maps $2000..$3FFF to page+1 (to cover for 8ki cross)
ld hl,spriteData2 ; some 0000-1FFF offset to the beginning of the data
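Why mapping both `$$spriteData2` and `$$spriteData2+1` is enough can be checked with a little arithmetic. This Python sketch (hypothetical helper names) counts how many 8 KiB pages a blob touches, given where it starts inside its first page; two mapped slots cover it whenever the answer is at most 2.

```python
PAGE_SIZE = 0x2000  # ZX Next MMU pages are 8 KiB


def pages_spanned(offset_in_page, length):
    """Number of consecutive 8 KiB pages touched by a blob that starts at
    offset_in_page ($0000..$1FFF) inside its first page."""
    if not 0 <= offset_in_page < PAGE_SIZE or length <= 0:
        raise ValueError("bad offset or length")
    last_byte = offset_in_page + length - 1
    return last_byte // PAGE_SIZE + 1


def covered_by_two_slots(offset_in_page, length):
    """True if mapping the label's page and page+1 into two slots is enough."""
    return pages_spanned(offset_in_page, length) <= 2


print(pages_spanned(0x1F00, 0x200))  # 2: the blob crosses one 8 KiB boundary
```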
For multi-load, one can prepare each block in virtual memory, SAVEBIN/SAVEDEV it, then ORG back to the beginning and start over with the second part...
All of these DEVICE "SAVE" directives save the *current* state, local to their position in the code, so you can produce different binary blocks from the same virtual memory area in one assembling. That's contrary to the OUTPUT way, where you are emitting a stream of bytes, which makes it quite tricky to "go back" (it's possible by opening the OUTPUT for rewrite and then using FPOS to move to the position where you want to overwrite data, but it's a bit cumbersome).
The script a few messages back, which merges several bin blocks into one big file using OUTPUT + INCBIN, should definitely work (if the small parts are less than 64KiB, you will not even cross the $10000 boundary, so no warning, but it should work even with larger files... or you can avoid the warning with some other trick, like INCBIN into an MMU wrap-around page while also having OUTPUT active, or by using a separate asm file without code, just to build the final big ROM, assembled with the --longptr switch).
....
But it also depends on what you need at runtime, and how you plan to address that data... The example in this answer is slightly Next-specific because of the NEX file output, where I know the ".nexload" loader has already loaded all banks into memory, so I can just switch banks and the data is there. Although I believe the principle also applies to the regular ZX 128 (but the loader is up to you).
- - - Updated - - -
If you are just asking the basic "can sjasmplus produce 1MB ROM files?", then the answer is a simple yes.
It should not only be possible, it should feel well supported, with multiple possible approaches. If you hit some issue with that, please report it.
If you have trouble choosing an initial approach, describe in more detail what you are trying to do, and add some examples of what you expect to write in the runtime code and what form the input data you want to assemble together has. :)
- - - Updated - - -
Thinking about it a second time, while answering the stuff above... you can also use a DEVICE with large-enough memory, INCBIN everything into one continuous memory block, and SAVEDEV it to a big file. (I forgot to mention SAVEDEV before.)
@Ped7g just wanted to thank you for your continuing support of this assembler :) I believe you've spent hundreds of hours of your free time on it. Some people here were somewhat harsh, but you were always nice and sympathetic. Thank you! And keep up the good work :)
Yes, it's saving bytes from DEVICE memory, which is not filled during the first passes of assembling, so it's completely zeroed in passes 1/2, and there's nothing to save.
The machine code production happens in the third pass (also for OUTPUT/OUTEND/SAVETAP/... all the others).
(production = writing it... the machine code is "dry-simulated" in the first 2 passes, i.e. the assembler calculates how many bytes each instruction takes, to adjust labels, but then it throws away the opcode)
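A toy Python model of the "dry-simulate first, emit later" idea: an earlier pass only sizes instructions to fix label values, and the final pass writes opcodes, which is what lets forward references like `jp l1` resolve. It uses two passes and a made-up two-instruction set (`nop`, `jp label`), so it is a simplification, not how sjasmplus is structured internally.

```python
def assemble(program, org):
    """Toy assembler: lines are "label:", "nop" (1 byte) or "jp <label>" (3 bytes)."""
    sizes = {"nop": 1, "jp": 3}

    # Pass 1: only compute addresses; no opcodes are kept.
    labels = {}
    pc = org
    for line in program:
        if line.endswith(":"):
            labels[line[:-1]] = pc
        else:
            pc += sizes[line.split()[0]]

    # Final pass: every label is known now, so forward references resolve.
    out = bytearray()
    for line in program:
        if line.endswith(":"):
            continue
        op, *args = line.split()
        if op == "nop":
            out.append(0x00)
        elif op == "jp":
            target = labels[args[0]]
            out += bytes([0xC3, target & 0xFF, target >> 8])  # little-endian nn
    return bytes(out)


print(assemble(["jp l1", "l1:", "nop"], 0x8000).hex())  # c3038000
```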
So in recent weeks I have been abusing sjasmplus scripting a lot (as a form of test/exercise of the scripting implementation; some of the experience went into improving it just before the v1.18.0 release)...
If you are serious about sjasmplus scripting, maybe you can pick up a trick or two from these:
https://github.com/ped7g/adventofcode
Code:
tmp_cnt = 0
dup 10
if ((tmp_cnt = 0) || (tmp_cnt = 5) || (tmp_cnt = 7) || (tmp_cnt = 8))
display tmp_cnt," skip"
else
display tmp_cnt," ok"
endif
tmp_cnt = tmp_cnt +1
edup
display " "
tmp_cnt = 0
dup 10
if ((tmp_cnt != 0) || (tmp_cnt != 5) || (tmp_cnt != 7) || (tmp_cnt != 8))
display tmp_cnt," ok"
else
display tmp_cnt," skip"
endif
tmp_cnt = tmp_cnt +1
edup
display " "
tmp_cnt = 0
dup 10
if ((tmp_cnt =! 0) || (tmp_cnt =! 5) || (tmp_cnt =! 7) || (tmp_cnt =! 8))
display tmp_cnt," ok"
else
display tmp_cnt," skip"
endif
tmp_cnt = tmp_cnt +1
edup
display " "
tmp_cnt = 0
dup 10
if ((tmp_cnt <> 0) || (tmp_cnt <> 5) || (tmp_cnt <> 7) || (tmp_cnt <> 8))
display tmp_cnt," ok"
else
display tmp_cnt," skip"
endif
tmp_cnt = tmp_cnt +1
edup
Why doesn't != work? o_O
Code:
> 0x0000 skip
> 0x0001 ok
> 0x0002 ok
> 0x0003 ok
> 0x0004 ok
> 0x0005 skip
> 0x0006 ok
> 0x0007 skip
> 0x0008 skip
> 0x0009 ok
>
> 0x0000 ok
> 0x0001 ok
> 0x0002 ok
> 0x0003 ok
> 0x0004 ok
> 0x0005 ok
> 0x0006 ok
> 0x0007 ok
> 0x0008 ok
> 0x0009 ok
>
> 0x0000 ok
> 0x0001 skip
> 0x0002 skip
> 0x0003 skip
> 0x0004 skip
> 0x0005 skip
> 0x0006 skip
> 0x0007 skip
> 0x0008 skip
> 0x0009 skip
>
test.asm(52): error: Syntax error: > 0) || (tmp_cnt <> 5) || (tmp_cnt <> 7) || (tmp_cnt <> 8))
test.asm(52): error: ')' expected
And why does =! give any result at all?
It all looks correct to me.
The first part does "skip" when cnt is 0 or 5 or 7 or 8.
The second part does "ok" every time (the first two, "(cnt is not 0) or (cnt is not 5)", are enough to be always true for any cnt value).
The third part "=!" is parsed as "=" (the equality operator) followed by "!" (logical not) applied to the number, so !0 becomes -1, and the non-zero values 5, 7, 8 become 0.
That leads to "ok" when cnt is -1 or 0.
The fourth is just a syntax error; the "<>" operator doesn't exist in sjasmplus.
Did you want the second block to be the negation of the first block? Then: !(A || B) = (!A) && (!B)
(note that "logical or" becomes "logical and" when the whole expression is negated)
Code:
tmp_cnt = 0
dup 10
if ((tmp_cnt != 0) && (tmp_cnt != 5) && (tmp_cnt != 7) && (tmp_cnt != 8))
display tmp_cnt," ok"
else
display tmp_cnt," skip"
endif
tmp_cnt = tmp_cnt +1
edup
logical or (T = true [-1 in sj], F = false [0 in sj]):
F || F = F
T || F = T
F || T = T
T || T = T
logical and:
F && F = F
T && F = F
F && T = F
T && T = T
logical not:
!F = T
!T = F
And the general math rules for propagating logical not in complex expressions:
!(A < B) <=> A >= B
!(A == B) <=> A != B
!(A || B) <=> !A && !B
!(A && B) <=> !A || !B
...
https://en.wikipedia.org/wiki/Negation#Distributivity
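These rules can be checked mechanically. A small Python sketch that brute-forces the identities above over all true/false combinations (and the comparison rules over a small integer range), plus the mistaken "or" version for contrast:

```python
from itertools import product


def equivalent(f, g, domain=(False, True)):
    """True if two 2-argument functions agree on every pair from `domain`."""
    return all(f(a, b) == g(a, b) for a, b in product(domain, repeat=2))


RULES = {
    "!(A || B) <=> !A && !B": (lambda a, b: not (a or b), lambda a, b: (not a) and (not b)),
    "!(A && B) <=> !A || !B": (lambda a, b: not (a and b), lambda a, b: (not a) or (not b)),
}
# Comparison rules, checked on a small range of integers.
NUMERIC = {
    "!(A < B) <=> A >= B": (lambda a, b: not (a < b), lambda a, b: a >= b),
    "!(A == B) <=> A != B": (lambda a, b: not (a == b), lambda a, b: a != b),
}
# The mistaken negation discussed in the thread: !(A || B) is NOT !A || !B.
WRONG = (lambda a, b: not (a or b), lambda a, b: (not a) or (not b))

for name, (lhs, rhs) in RULES.items():
    print(name, equivalent(lhs, rhs))
for name, (lhs, rhs) in NUMERIC.items():
    print(name, equivalent(lhs, rhs, domain=range(-3, 4)))
print("!(A || B) <=> !A || !B", equivalent(*WRONG))
```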
sjasmplus treats any non-zero value as true and zero as false, but when a boolean true is calculated, it is represented by ~0 (-1).
value = 4 ; value = 4
value = !4 ; value = 0
value = !!4 ; value = -1 (true)
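To see the four blocks from the question side by side, here is a Python model of the rules above: a computed true is -1, any non-zero value counts as true, and `=!` is read as `=` followed by a unary `!`. It is an emulation written for this thread, not sjasmplus code.

```python
TRUE, FALSE = -1, 0  # a computed boolean true is ~0 == -1


def b(cond):
    """Turn a Python bool into sjasmplus-style -1 / 0."""
    return TRUE if cond else FALSE


def l_not(x):
    return b(x == 0)  # any non-zero value counts as true


def l_or(*xs):
    return b(any(x != 0 for x in xs))


def l_and(*xs):
    return b(all(x != 0 for x in xs))


def run(cond, if_true, if_false):
    """Mimic the DUP 10 / IF / ELSE loops from the post for tmp_cnt = 0..9."""
    return [if_true if cond(c) != 0 else if_false for c in range(10)]


first = run(lambda c: l_or(b(c == 0), b(c == 5), b(c == 7), b(c == 8)), "skip", "ok")
second = run(lambda c: l_or(b(c != 0), b(c != 5), b(c != 7), b(c != 8)), "ok", "skip")
# "tmp_cnt =! 5" is read as "tmp_cnt = (!5)", i.e. equality with l_not(5) == 0
third = run(lambda c: l_or(b(c == l_not(0)), b(c == l_not(5)),
                           b(c == l_not(7)), b(c == l_not(8))), "ok", "skip")
fixed = run(lambda c: l_and(b(c != 0), b(c != 5), b(c != 7), b(c != 8)), "ok", "skip")

print(first)   # "skip" for 0, 5, 7, 8
print(third)   # "ok" only for 0, because !0 == -1 and !5 == !7 == !8 == 0
```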
("human understanding" => no idea what you are talking about; the math way is the only way I think about logic, even when talking in human language... but there are many people who are not able to correctly negate a predicate... for example, they ask you if you like the colour blue, you say "no", and they think you hate blue (which is an incorrect negation: "!like" is not "hate")) :)
I don't know how it is where you are,
but here, when people say
if
A is not equal to 1
or
A is not equal to 5
then
Ы = 7
else
Ы = 0
this is what is meant (and it's correct):
Code:
if A <> 1
if A <> 5
Ы = 7
else
Ы = 0
endif
else
Ы = 0
endif
and the expected result:
1 Ы=0
2 Ы=7
3 Ы=7
4 Ы=7
5 Ы=0
this is never what is meant:
Code:
if A <> 1
temp1 = 1
else
temp1 = 0
endif
if A <> 5
temp2 = 1
else
temp2 = 0
endif
if temp1 = 1
Ы = 7
else
if temp2 = 1
Ы = 7
else
Ы = 0
endif
endif
and this result:
1 temp1=0 or temp2=1 = Ы=7
2 temp1=1 or temp2=1 = Ы=7
3 temp1=1 or temp2=1 = Ы=7
4 temp1=1 or temp2=1 = Ы=7
5 temp1=1 or temp2=0 = Ы=7
is completely unexpected.
Human words describe a completely different process,
and this variant is correct only from the point of view of machine logic,
and describing it in human words is much harder (it would take quite a lot of words).
I have no complaints about you,
it's just not a very obvious aspect.
I fixed it.
That's exactly why we need a damn list of conditions like if a=1,5,6
instead of three-storey IFs
in which you can overlook something,
or we need elementary for and goto,
so we don't have to build a jungle of dups and ifs
STUPIDLY trying to replicate their functionality.
Yeah yeah, I know, now YOU will start telling me that
the compiler shouldn't do this,
this should be done by cpp, the linker, the shminker and other such heresy,
and the whole source should consist of batch files,
and the compiler should only replace nop with $00 and nothing more,
and it shouldn't even do jumps to forward labels,
and all jump labels must be declared in advance at the beginning of the source,
because, after all, the compiler doesn't know yet where to jump when the code looks like this:
jp l1
l1
:v2_tong2:
Yeah, we say "if not equal to 1 and not equal to 5". I guess some people could use "or" there, but they are just using the language wrong, or not understanding logic expressions. (Mind you, while I didn't finish my university studies, I made it far enough to pass all the exams in math basics, like logic evaluation, so for me it's easy to use the language correctly in logic constructs... but from my experience, people who didn't study it often don't understand the rules correctly and say the wrong thing while thinking something else. This is very common among non-math people; it's not easy to formulate the rule correctly in human language, even for intelligent people, if they are missing the formal math teaching.)
But I think your example is actually correct if read in a certain way, like this: "if A not equal to (1 or 5), then B = 7, else B = 0" -> now this makes some sense also from the math point of view, a bit like an incomplete distribution of the negation, so you end up with something like "IF A != (1 || 5)", but if you expand it into simpler statements, it needs the || negation, so "IF A != 1 && A != 5".
Anyway, in sjasmplus source you can always pick the "simpler to read" form, so if you are more familiar with "1 or 5", write the positive form and negate the final result, like this:
Code:
IF !(A == 1 || A == 5 || A == 7)
;; other values of A (not 1 or 5 or 7)
ELSE
;; when A is 1 or 5 or 7
ENDIF
;; notice the "!" at beginning of the expression
;; flipping the final result of the test
Ped7g, could you add a mode
that issues warnings when something gets overwritten?
So that code like this:
Code:
org $C000
xor a
xor b
xor c
xor d
org $C001
xor e
would output:
warning $C001 was overwritten
And if you don't want to keep a separate array for this,
then at least check for $00 before writing;
there aren't that many nops in code,
and some check is better than none...
Current status: no such feature
Future:
- not as default for sure, because there are projects using the pattern of device memory re-use to build different binary blocks at the same position (savebin/savetap between).
- as an option: possible... sounds like some work to develop simple and fast code for it, as there is no simple test (except having a usage bitmap for the whole device memory, but that sounds ugly). But maybe there is some other way, e.g. lists of ranges and checking them for overlaps.
- workaround: use fewer `ORG`s, use more `ds/align` to advance to the next block of code, use ASSERT or MMU with the "w" or "e" mode to guard against making the current section bigger than you expected, etc...
But it sounds like a non-trivial thing to add, and you should be able to work around it with ASSERT quite well. Switching one option ON would be even better, but it doesn't feel to me like a critical feature needed ASAP.
But I will keep it in mind... (maybe add an enhancement issue on GitHub so I don't forget about it :D )
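The "lists of ranges and checking for overlaps" idea could look roughly like this Python sketch. `WriteTracker` is a hypothetical name, not an sjasmplus feature; it ignores device paging and keeps every range in a flat list, so each check is linear in the number of writes.

```python
class WriteTracker:
    """Remember emitted address ranges and report any byte written twice."""

    def __init__(self):
        self.ranges = []  # half-open (start, end) intervals already written

    def write(self, start, length):
        """Record a write; return the overlapping sub-ranges (empty if none)."""
        end = start + length
        overlaps = [(max(s, start), min(e, end))
                    for s, e in self.ranges if s < end and start < e]
        self.ranges.append((start, end))
        return overlaps


tracker = WriteTracker()
pc = 0xC000
for _ in range(4):                       # org $C000 : xor a / xor b / xor c / xor d
    assert tracker.write(pc, 1) == []
    pc += 1
for s, e in tracker.write(0xC001, 1):    # org $C001 : xor e
    print(f"warning ${s:04X} was overwritten")
```

A per-byte bitmap would make each check constant-time instead, at the cost of one flag per byte of device memory.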
this is needed only for debugging/verification purposes,
since it's not always clear whether everything is placed in memory correctly or not,
and there's no way to check it.
100% of my orgs are generated with lua,
and how correctly that works in combination with variable-length macros is unclear.
- - - Updated - - -
they use memory inefficiently
this erroneous construct doesn't report any error and doesn't generate any code:
Code:
macro test
test
endm
org $8000
test
that's all you see on the screen, and this error is hard to find.
Likewise, this incorrect code doesn't work either:
Code:
macro exx
exx
endm
org $8000
exx
but this construct works perfectly:
Code:
macro EXX
exx
endm
org $8000
EXX
I used something like this for a long time and didn't even notice it was a looping macro :)
For stuff like this, where you intentionally want to substitute a legal instruction with your own macro, you can use the "@" operator to prevent macro matching, i.e. `@exx` inside the macro definition does the real Z80 `exx`:
Code:
macro exx
; exx ; infinite cycle
@exx ; Z80 exx, avoiding infinite cycle of macro calling
endm
and in your code you can then:
Code:
exx ; does the macro
@exx ; does the Z80 exx even if "exx" macro exists
The error messages and state recovery in case of infinite loops in DEFINE/MACRO definitions are really bad: sometimes it will raise the max-nesting limit, sometimes it will just crash the assembler. I know about it, but currently I'm not in the mood to work on this stuff.
Code:
macro test
test
endm
org $8000
test
^^ this needs just a few changes to be legit code, for example:
Code:
macro test
nesting = nesting - 1
IF 0 < nesting
test
ENDIF
endm
org $8000
nesting = 4
test
^^^ this is valid source and will compile fine, but I don't see how you would consider it "different" from your infinite example (except by hitting some guardian value to notice it's cycling too much).
So it's not so easy to guard against it... there are some protections in sjasmplus which should catch infinite substitutions and loops, but they are not working that well at the moment, sorry. There's room for improvement in this area, I agree.
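The point about guardian values can be illustrated with a Python model in which each "macro" may expand another one. The depth limit of 20 is a made-up value, not sjasmplus's real limit, and all names are hypothetical.

```python
class NestingError(RuntimeError):
    pass


def make_expander(max_depth=20):
    """Return a function that expands a "macro" with a nesting guard.

    A macro is a Python function(state, invoke) that may call invoke(other)
    to expand a nested macro. max_depth is a hypothetical guardian value.
    """
    def expand(macro, state, depth=0):
        if depth >= max_depth:
            raise NestingError(f"macro nesting deeper than {max_depth}")
        macro(state, lambda inner: expand(inner, state, depth + 1))
    return expand


def infinite_test(state, invoke):
    invoke(infinite_test)            # "test" inside "test": never ends


def counted_test(state, invoke):
    state["nesting"] -= 1            # nesting = nesting - 1
    state["expansions"] += 1
    if 0 < state["nesting"]:         # IF 0 < nesting : test : ENDIF
        invoke(counted_test)


expand = make_expander()
state = {"nesting": 4, "expansions": 0}
expand(counted_test, state)
print(state["expansions"])           # 4
```

The guard stops `infinite_test`, but with `nesting = 100` the perfectly valid `counted_test` hits the same limit, which is why a simple counter can't reliably tell the two cases apart.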
https://github.com/z00m128/sjasmplus...es/tag/v1.18.1
- Big-Endian hosts support (experimental and not tested continuously)
- added "listall", "listact" commands to OPT - to switch between listing types
- WHILE has optional argument to set explicit guardian-counter
- ASSERT has optional argument (to add description/notes for expression)
- SLOT and MMU will now accept also starting address of slot instead of its number
- fix: option --sym was not exporting labels starting with underscore
- fix: SAVENEX BMP-loader bug when certain builds of sjasmplus were unable to open BMP files
- fix: after STRUCT instance the "main" label is not polluted by last field of STRUCT
- minor bugfixes in parser, windows cmake-builds now have an icon
- docs: adding "Index" section
- docs: adding some missing information (__DATE__, __TIME__), fixing HTML anchor names
Code:
13 0007 DD 8C ADC A,ixh ;* DD8C
14 0009
all_ops.asm(15): error: Label not found: IXh
15 0009 CE 00 ADC A,IXh ;* DD8C
Yes, instructions and registers have to be written in a single case (all lowercase or all uppercase) in sjasmplus. Labels are case sensitive.
this spelling style is used in
The Undocumented Z80 Documented (Sean Young)
and in places even in the original documentation
https://jpegshare.net/images/97/92/9...75351bf1e2.png
this doesn't work out well somehow...
- - - Updated - - -
at first I thought that not all undocumented opcodes were supported...
I could nitpick that you are pointing at the "operation" description, not the source syntax (that's "LD IY, (nn)")...
but I guess you are somewhat right; even the "IYh" in the "operation" is unfortunate, and I see why you could expect sjasmplus to accept it.
But the official documentation only describes the instructions; it does not define the full syntax of an "assembly" language, and it's quite common for assemblers to fill the gaps with their own rules. The same-case rule for instructions and registers is something I actually like, because then PascalCaseLabels can't be mistaken for instructions by accident.
The most disturbing thing about your post is that `OPT --syntax=i` will not accept "IYh" either, i.e. `Ld IYh,3` will fail even with `--syntax=i`, while `Ld iyh,3` works...
So I'm trying to make up my mind whether this is a bug or not (do you have an opinion?).
Otherwise what you see is "works as designed", not a bug.
But if we are talking about *you* writing *new* sjasmplus source, please just use same-case and avoid --syntax=i. I don't see how mixed-case instructions/registers make it better; IMO such source is harder to read. I know this is a strongly personal and subjective topic, but I would suggest this style for new source:
Code:
ORG $879A ; uppercase directives
PascalCaseLabelsWithColon:
ld iyh,3 ; lowercase instructions and registers
ld hl,12 ; HL = 12 (with uppercase reg in comments)
I can see how adding options to sjasmplus helps with legacy sources you don't want to edit, but for new sources I would stick to *some* style (even if you don't like my suggestion above, feel free to modify it to your liking). I have a difficult time accepting that same-case instructions+registers are a problem, especially as, when writing assembly, the actual typing of instructions usually takes very little time, so fixing the formatting of the source is just a few seconds, compared to hours of thinking and debugging.
https://github.com/z00m128/sjasmplus...6ebadef44d6f9a
"parser: --syntax=i makes now also registers case insensitive"
Code:
... (listing)
12 0008 OPT --syntax=i ; test the syntax options "i"
13 0008 Label1:
14 0008 C3 08 00 Jp Label1
14 000B C3 08 00 jP Label1 ; instructions should be case insensitive
15 000E 00 00 Align 4
15 0010 aLiGN 4 ; directives too
16 0010 EB ex De,Hl ; registers should be also case insensitive
17 0011 DD 45 DD 4C ld b,IXl,c,IXh ; BTW this is actual way how Zilog describes half-ix regs
...
Still NOT recommended (by me), but possible, if you insist on it and switch on the `--syntax=i` mode.
https://github.com/z00m128/sjasmplus...es/tag/v1.18.2
- new exist operator to check label existence
- the --syntax=i mode makes now also register parsing case insensitive
- minor bugfixes (predefined values, savenex BMP loader less strict about "colors used" content)
Is it just me, or should this thread have been pinned in the "Important:" section long ago, alongside "SjASMPlus Z80 cross assembler", or even as the more up-to-date one?
@Ped7g as a Linux user I am very interested in converting characters with "ENCODING" from UTF-8 to one of these (WIN/DOS). But unfortunately it seems it does not work. Is it possible to implement? Or maybe I am missing something here. Actually it would be nice to have a full-fledged iconv or similar, as right now it is limited to Russian encodings...
- - - Updated - - -
Yes, done. It just kept getting bumped to the top, so I didn't notice.
I'm also on Linux and used to utf-8 everywhere, so I do get your idea, but I'm not sure what to add to sjasmplus, or how.
As a ZX asm programmer I expect everything to be 8bit, so the utf8 -> 8bit conversion makes sense, but by what rules? In our ZX demos we often use a custom encoding, not even some old DOS or Windows code page, but a completely custom one, so no generic tool will help much with that.
Maybe with Russian texts you have less mixing and use only a few encodings, but I'm still not sure how to define one.
And finally, the sjasmplus project currently doesn't contain any utf-8 implementation, so if I linked against iconv or similar, the dependency list would grow, so it would have to be a really well-working feature to make that cost worth it.
BTW if you need some standard Windows/DOS encoding, I guess you can add a pre-build step using iconv itself into your makefile/build script, for example keeping the .asm in utf8 and "building" .a80 files with an implicit makefile rule using iconv, similar to this:
Code:
$ echo "Мне одному кажется" | iconv -f UTF-8 -t CP1251 | hd
00000000 cc ed e5 20 ee e4 ed ee ec f3 20 ea e0 e6 e5 f2 |... ...... .....|
00000010 f1 ff 0a |...|
and then build the ".a80" with sjasmplus. If you use the utf-8 chars only within double quotes or comments, it should work.
So under Linux, for a general conversion like utf-8 -> cp1251, I don't feel sjasmplus needs any change; you can easily work around it (I have no idea if other OSes have iconv and other powerful tools, maybe it's more difficult on other systems).
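If Python is easier to get than `iconv` on some system, the same pre-build step can be sketched with the standard codecs. `to8bit.py` and both function names are hypothetical; note that `encode` raises an error for characters the target code page doesn't have, instead of silently damaging them.

```python
import sys


def to_8bit(text, codec="cp1251"):
    """Re-encode source text to an 8-bit code page (cp1251, cp866, ...)."""
    return text.encode(codec)  # raises UnicodeEncodeError for unmappable chars


def convert_file(src_path, dst_path, codec="cp1251"):
    """Read a UTF-8 source file and write it out in the 8-bit code page."""
    with open(src_path, encoding="utf-8", newline="") as src:  # keep line endings
        text = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_8bit(text, codec))


if __name__ == "__main__" and len(sys.argv) >= 3:
    # e.g.: python to8bit.py strings.asm strings.a80 cp1251
    convert_file(*sys.argv[1:4])
```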
But if you guys have some cool ideas how to bring into sjasmplus something even better, something that would do even things which iconv can't cover and/or make life easier for people who use a completely custom encoding, let me know.
But I can't imagine any particularly nice syntax for a new directive covering these special cases. When I was recently helping on one sjasmplus project which was "scrambling" text strings with a custom xor scheme to hide them from simple view, I ended up writing a macro using the {b adr} memory read, which changes the regular `db "some text"` into the final scrambled bytes in a DUP loop. So even many custom encodings (following some simple formula for 90% of chars and having only a few special rules) could be done quite easily in sjasmplus with post-process macros.
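The post doesn't show that project's actual scrambling formula, so the key and the "xor with key + index" rule below are invented. This Python sketch only mirrors the approach described: the plain `db` bytes are already in memory, and a loop rewrites them in place, one byte at a time.

```python
def scramble_in_place(memory, start, length, key=0x5A):
    """XOR each byte with (key + index) & 0xFF, rewriting memory in place.

    Hypothetical scheme: running it twice restores the original bytes.
    """
    for i in range(length):
        memory[start + i] ^= (key + i) & 0xFF


memory = bytearray(0x100)                 # stand-in for device memory
text = b"some text"
memory[0x10:0x10 + len(text)] = text      # like: db "some text" at $0010
scramble_in_place(memory, 0x10, len(text))
print(memory[0x10:0x10 + len(text)].hex())
```

Because XOR is its own inverse, the Z80 runtime can descramble with the same formula.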
It feels to me like it's not very difficult to resolve any of these use cases even with current sjasmplus, but I have a difficult time imagining a change which would help and also be elegant and worth implementing (except the obvious utf8->cp1251 internal conversion, but that feels a bit useless to me, as I can already use `iconv` for that).
- - - Updated - - -
edit: to make that command line more complete, including the sjasmplus... :D (just for my own amusement and test)
Code:
$ echo "txt: db \"Мне одному кажется\"" | iconv -f UTF-8 -t WINDOWS-1251 | sjasmplus - --raw=- | hd
SjASMPlus Z80 Cross-Assembler v1.18.2 (https://github.com/z00m128/sjasmplus)
Pass 1 complete (0 errors)
Pass 2 complete (0 errors)
Pass 3 complete
Errors: 0, warnings: 0, compiled: 2 lines, work time: 0.001 seconds
00000000 cc ed e5 20 ee e4 ed ee ec f3 20 ea e0 e6 e5 f2 |... ...... .....|
00000010 f1 ff |..|
Thanks for your detailed reply :) Yes, sure, I know how to do it without sjasmplus, but it feels a shame that we have commands which are not really usable :)
In Russian texts we usually use the native DOS or Windows Russian encoding. Custom encodings are also used, but they're not really common nowadays: when everyone has access to a PC and codes with sjasm, why invent something else :)
The only idea I have then is a custom table approach (so basically a replacement list), something like:
Code:
replace_start "table1"
db "тест тест"
replace_end
or maybe this (but I'm not sure what the best way to select individual lines would be, so I added it as a comment):
Code:
db "тест тест" ;replace "table1"
but I'm not sure it is really usable.
I'm not sure what you mean by those proposals.
Current sjasmplus reads source code in binary 8-bit mode, so whatever is inside double quotes in DB, except a few control codes which need to be escaped (`\0\n`), will be assembled 1:1 into machine code (I should probably write a test with a full 0..255 char string to be sure it works like that; yeah, I will add one).
So as a ZX SW author, you have some encoding in mind (CP1251 or DOS-866 or some custom one like 131 = "star" and 132-133-134 "group logo"), and you need to put those values (CP1251 azbuka chars or 131..134 bytes) into machine code, usually a DB statement.
That's the result side. And the way you edit those texts, for example Russian strings, is nice with UTF8, because modern text editors use utf8 by default.
Current sjasmplus has the ENCODING directive, which makes an auto-conversion from cp1251 to DOS-866 possible. It does nothing else (by default it does nothing at all), and if you do `ENCODING DOS` or use the `--dos866` CLI option, any 128+ byte value in the source code is transformed by a hard-coded table converting CP1251 to DOS-866 (or "damaging" anything else, like UTF8).
So in case of Russian string, the task is "simple", do the utf8 -> cp1251 (or DOS-866) of source before the DB is assembled.
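For illustration, Python's codecs can approximate what `ENCODING DOS` does to 128+ bytes. This is a sketch of the mapping only: sjasmplus uses its own hard-coded table, which may treat bytes outside the Cyrillic range differently, and `encoding_dos` is a hypothetical name.

```python
def encoding_dos(data):
    """Approximate ENCODING DOS: treat bytes 128+ as CP1251 and re-map them
    to CP866; ASCII bytes stay unchanged, unmappable bytes become '?'."""
    return data.decode("cp1251", errors="replace").encode("cp866", errors="replace")


cp1251_text = "Мне".encode("cp1251")   # cc ed e5
print(encoding_dos(cp1251_text).hex())  # 8cada5
```

Feeding UTF-8 bytes through the same mapping produces garbage, which is the "damaging anything else" case.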
(BTW let's make one thing clear: I'm not going to add UTF8 support for symbol names, so anything outside of quotes does not need any extra support by sjasmplus, as anything non-ASCII there is a bug in the source. The reasons mostly overlap with what I will write below, plus the extra pain of sjasmplus processing source code heavily with a custom implementation, not re-using the common C++ library, so utf-8 symbols would cause a major rewrite of the parser. If somebody wants to do that, ok, but not me; the benefit of unreadable symbol names doesn't attract me at all, even if the code change were simple.)
But this "simple" case means the sjasmplus code would have to learn utf-8 and have the conversion table. And if there is a Russian table, why not add also some German, Czech, etc...
... and you end up implementing `iconv`, which is by no means a simple task; implementing utf-8 support correctly is a major pain. I know of one C++ framework avoiding iconv and using custom code, and it took a few years to polish that implementation enough to mostly work as expected.
And at that point I don't see any benefit in putting iconv into sjasmplus, if I can call iconv externally as I have shown in the example in the previous post.
I can see some benefit in a magic directive which would allow me to define a custom encoding, i.e. that ★ is 130 and ☈☋☑ are 131,132,134, but I don't see any elegant and simple syntax for that, and the implementation again requires adding all the important parts of what iconv does, i.e. understanding utf-8 encoding correctly.
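A sketch of what such a custom-encoding table does once the source is decoded as text: the characters and codes (★ = 130, ☈☋☑ = 131,132,134) are the ones from the example above, ASCII passes through unchanged, and anything else is an error. Python, with hypothetical names; it is not a proposal for sjasmplus syntax.

```python
CUSTOM = {"★": 130, "☈": 131, "☋": 132, "☑": 134}  # example values from the post


def encode_custom(text, table):
    """Map characters through `table` first, pass other ASCII through,
    and reject anything that has no 8-bit code."""
    out = bytearray()
    for ch in text:
        if ch in table:
            out.append(table[ch])
        elif ord(ch) < 128:
            out.append(ord(ch))
        else:
            raise ValueError(f"no 8-bit code for {ch!r}")
    return bytes(out)


print(encode_custom("Hi ★☈☋☑", CUSTOM).hex())
```

Because the table is checked first, it can also remap ASCII, e.g. `{"A": 1, "B": 2}`, which covers values that are awkward to type inside quotes.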
So if you want just the "simple" utf-8 to cp1251 or dos-866 conversion, I fail to see why to bloat the sjasmplus code instead of calling the external `iconv` as an intermediate step before assembling. The result is the same, but iconv is more robust and can handle all the common encodings, while sjasmplus would always be very limited in what it knows, unless I re-implement the whole of iconv in it (and go from a ~300kB binary to a ~5MB assembler).
- - - Updated - - -
So, I added the test to verify that "anything 8bit inside quotes (except the sensitive control codes) works":
https://github.com/z00m128/sjasmplus...t_encoding.asm
The "sensitive control codes" contains three values: 0, 10 and 13 ("\0\n\r" escape sequences within double-quotes).
All the other 0..255 values are assembled 1:1 to machine code.
So this part of sjasmplus works "as intended" and there's no extra bug involved or any issue.
The [utf8 text source] -> [8bit source] conversion is IMO a lot easier to handle externally, and I'm slightly against adding this functionality into sjasmplus (you can of course try to change my mind, but I don't see enough arguments at this moment).
I can see the convenience of such an addition, but considering the current size of the sjasmplus code and its build dependencies, I find utf-8 support not worth adding, especially as the external usage of `iconv` is trivial in the case of non-custom encodings, and covers A LOT MORE than just Russian encodings.
Maybe on non-Linux OSes the benefit of built-in conversion would be even bigger (if they don't have an easy tool like `iconv`), but then again, it's a lot easier to install Linux and use it for assembling a ZX project than to modify sjasmplus sources, so my general advice is to use a modern OS and not to reinvent the wheel again and again inside sjasmplus just because some other SW is obsolete. :v2_dizzy_snowball:
- - - Updated - - -
One more note... I was even going to propose calling `iconv` from the source with SHELLEXEC (to convert some small "strings.asm" and then INCLUDE the converted one), but it turns out that's not so easy: SHELLEXEC executes only in the last (third) pass, so the converted strings are not available in the earlier passes. I guess you can still do this in a lua script, calling `iconv` in the first pass to generate "strings.8b.asm" (cp1251 encoded) from "strings.asm" (utf8, what you edit in the editor), and then INCLUDE the converted file. Let me know if you need an example of how to do this.
But I generally prefer to not use lua scripts in my asm, so I would instead rather create Makefile with the rule to produce that converted file before assembling of main project. :)
What about some kind of "translate" command, where you define two strings:
Code:
source = "АБЦДЕФГ....."
target = "ABCDEFG...."
And later in code:
Code:
call print_string
translate("АБЦД 123")
Sjasm would translate the string char by char, and put the same symbol if it is not found in the translation strings (like for 123 in the example)?
That doesn't help that much until utf-8 parsing is implemented. It could cover custom 8bit -> 8bit encodings, but then your syntax doesn't explain how you would define, for example, A -> 1, B -> 2, etc., values which are not easy to enter into quotes.
Thinking about it, these are two different issues. One is "utf-8 anything", and my answer is "no": I don't see how to add utf-8 support to sjasmplus without either adding a lot of my own code or linking against some ICU-like library, and in either case raising the complexity and size of the sjasmplus binary by a whole order of magnitude. And you can resolve utf-8 by simply converting the source with `iconv` in the build script to some classic 8bit encoding, which is then assembled by sjasmplus correctly (I don't see anything problematic about this external way).
The second issue is "custom 8bit encoding". I had to resolve a few of these in my own ZX projects, and usually I enter the text as numbers or post-process the data with a script written as an sjasmplus macro. If the conversion were a lot more complex, you can always do something very similar to what you propose with "translate(...)" in lua. So the status on this one is that you can resolve it in current sjasmplus, but if somebody shows me a more elegant syntax (to define a custom encoding), I may implement that. Right now all the syntax I can imagine for such a feature doesn't feel very attractive: I would have to study the docs before using it anyway, to use it correctly, and in that case I could probably write the post-process macro changing the values in the classic way in script code in similar time.
It just doesn't feel like I can add something meaningful to sjasmplus that will help in most of the use cases and be easy to use; it feels like I can add something that will work well for a specific use case, but will be mostly ignored by everyone else. Also I don't remember any nice solution in other assemblers that I could just copy.
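For comparison, the proposed `translate(...)` semantics are easy to model outside the assembler. This Python sketch (not lua, and with hypothetical names) maps each `source` char to the `target` char at the same index and keeps everything else, so "123" passes through.

```python
def translate(text, source, target):
    """Replace each char of `source` by the char at the same index in `target`;
    characters not found in `source` are kept unchanged."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    return text.translate(str.maketrans(source, target))


print(translate("АБЦД 123", "АБЦДЕФГ", "ABCDEFG"))  # ABCD 123
```

Note it only maps characters to characters; codes like A -> 1 still need a numeric table, which is the limitation raised above.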
the same bug as in define
Code:
macro coord_x x
var_x di : halt
data_x nop
sdfgj_x = var_x
endm
org $8000
coord_x ($5+1)
Code:
test.asm(8): error: Invalid labelname: var_($5
test.asm(17): ^ emitted from here
test.asm(8): error: Unrecognized instruction: ) di
test.asm(17): ^ emitted from here
test.asm(8): error: Unexpected: ) di
test.asm(17): ^ emitted from here
test.asm(9): error: Invalid labelname: data_($5
test.asm(17): ^ emitted from here
test.asm(9): error: Unrecognized instruction: ) nop
test.asm(17): ^ emitted from here
test.asm(9): error: Unexpected: ) nop
test.asm(17): ^ emitted from here
test.asm(10): error: Invalid labelname: sdfgj_($5
test.asm(17): ^ emitted from here
test.asm(10): error: Unrecognized instruction: ) = var_($5+1)
test.asm(17): ^ emitted from here
test.asm(10): error: Unexpected: ) = var_($5+1)
test.asm(17): ^ emitted from here
Here is the listing. This might be forgivable for DEFINE:
17 8000 coord_x ($5+1)
17 8000 >
test.asm(8): error: Invalid labelname: var_($5
test.asm(17): ^ emitted from here
test.asm(8): error: Unrecognized instruction: ) di
test.asm(17): ^ emitted from here
test.asm(8): error: Unexpected: ) di
test.asm(17): ^ emitted from here
17 8000 >var_($5+1) di
17 8000 76 > halt
test.asm(9): error: Invalid labelname: data_($5
test.asm(17): ^ emitted from here
test.asm(9): error: Unrecognized instruction: ) nop
test.asm(17): ^ emitted from here
test.asm(9): error: Unexpected: ) nop
test.asm(17): ^ emitted from here
17 8001 >data_($5+1) nop
test.asm(10): error: Invalid labelname: sdfgj_($5
test.asm(17): ^ emitted from here
test.asm(10): error: Unrecognized instruction: ) = var_($5+1)
test.asm(17): ^ emitted from here
test.asm(10): error: Unexpected: ) = var_($5+1)
test.asm(17): ^ emitted from here
17 8001 >sdfgj_($5+1) = var_($5+1)
17 8001 >
For MACRO, however, this is completely unacceptable behaviour.
That said, a separate replaceallmacro with exactly this functionality would not hurt.
By the way, it looks like this has been the case for a long time already:
sjasmplus-1.11.0:
# file opened: test.asm
test.asm(1): error: Invalid labelname:
6 0000 macro mcr x
7 0000 ~
8 0000 ~ label_x = 1
9 0000 ~
10 0000 endm
11 0000
15 0000 org $8000
16 8000
17 8000 mcr 4
17 8000 >
17 8000 >label_x = 1
17 8000 >
sjasmplus-1.12.0+:
6 0000 macro mcr x
7 0000 ~
8 0000 ~ label_x = 1
9 0000 ~
10 0000 endm
11 0000
15 0000 org $8000
16 8000
17 8000 mcr 4
17 8000 >
17 8000 >label_4 = 1
17 8000 >
Yes, 1.11.0 (and older) had inconsistent behaviour: sometimes it substituted the "x" macro argument in `label_x`, sometimes not. I don't remember the exact details of how to trigger it, but it was fairly trivial to modify your example a tiny bit so that 1.11.0 would start substituting in `label_x` too ... IIRC all it takes is, for example, another label like `xmax`, which doesn't get substituted itself but affects `label_x` ... or something like that. It's now two years since I fixed it, so I would have to check the old code to be sure how to trigger the old bug.
1.12+ consistently does sub-word substitution (every time); what you see is the "fix". :D
To fix your source for the newer versions, don't use trivial argument names like `x`. I personally suggest `x?` for macro arguments, or you can prefix the argument name with an underscore, like `_x`, to prevent mid-word substitution.
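To see why `x?` or `_x` avoids the problem, here is a toy Python model of the substitution rules as described in this thread. It is not sjasmplus's actual implementation; the assumptions are that sub-words are delimited by `_`, that an argument name without a leading underscore replaces any equal sub-word, that a name with a leading underscore is replaced only at the beginning of an identifier, and that identifiers use a simplified character set.

```python
import re

# Simplified identifier pattern for this model only (assumption; real
# sjasmplus accepts more characters in labels).
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_?]*")


def substitute(line, args):
    """Toy model of sub-word substitution of macro arguments in one line."""

    def replace_ident(match):
        ident = match.group(0)
        for name, value in args.items():
            if ident == name:
                return value  # whole identifier matches the argument name
            if name.startswith("_"):
                # underscore-prefixed name: only at the beginning of the identifier
                if ident.startswith(name + "_"):
                    ident = value + ident[len(name):]
            else:
                # plain name: replace every '_'-delimited sub-word equal to it
                parts = ident.split("_")
                ident = "_".join(value if part == name else part for part in parts)
        return ident

    return IDENT.sub(replace_ident, line)
```

In this model `substitute("var_x di : halt", {"x": "($5+1)"})` yields `var_($5+1) di : halt`, which is the invalid label reported above, while the argument names `x?` and `_x` leave `label_x` untouched.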
Unfortunately, the current state is based on a huge misunderstanding. The original patch (to one of the 1.07 RC versions, I think) had a bug that caused it to substitute `x` in `label_x` too when certain conditions were met (a collision on the first letter in the define hashtable affected the size of the "bucket", so "x" was found even when it should have been ignored). And there was a test in the old test suite testing the bugged behaviour!
So initially I was confused by the inconsistency and fixed it to make the substitution work always, but I fixed it the way the old test was verifying it. As a result, defines/macro arguments starting with an underscore can be substituted only at the beginning of an identifier, not in the middle.
A few versions later I finally understood how the patch was originally meant: it was supposed to do sub-word substitution the opposite way, only for identifiers that start with an underscore. Unfortunately, the original author of the patch didn't put any comments into the code or provide any tests, and later somebody added the wrong test for the bugged behaviour. If the original author had documented his idea, I would have fixed it the correct way.