My task is to count the characters in a .txt file whose name the user enters, and to edit characters if needed. I am new to x86 assembly, so I need some help with opening a file and reading its characters.

As my code below shows, I use int 21h function 3Dh to open the file and int 21h function 3Fh to read it. But I don't understand how to read the characters correctly: if my txt file contains 100 random characters, how do I read them one by one and count them all?

My code:

.data
fname_input db 255,?,255 dup("$")
buff db 255,?,255 dup("$")
endl     db 13,10,"$"
handle dw ?

.code
start:

    mov dx, @data
    mov ds, dx

    mov ah, 0Ah
    mov dx, offset fname_input ;input put in to buffer
    int 21h
    
    mov ah, 3fh 
    mov al, 00 ;only read
    mov dx, offset fname_input ; name of the file to open
    int 21h
    
    mov ah,3fh 
    mov bx,[handle]
    mov cx,4             
    mov dx,offset buff
    int 21h
    
    mov ax, 4c00h ;exit
    int 21h 

end start
NewAtC
  • The `buff` you have is for reading from console via int21/0a. For reading from a file you only need a single byte and read that in a loop. It's unclear if you even need to read the file contents, don't you just need the file size? – Jester Nov 25 '21 at 01:03
  • "Symbols" is not a technical term in this context. Files are just binary blobs where everything is either an octet byte or some multiple of - or fraction of - a hardware word. – Dai Nov 25 '21 at 02:16
  • AH=42h is seek, you can use it to go to the end of the file. http://spike.scu.edu.au/~barry/interrupts.html#ah42 says it returns the file position in DX:AX, which would be the length after seeking to the end. (This is the DOS equivalent of C stdio `fseek(f, 0, SEEK_END)` / `ftell`, or of Unix `lseek(fd, 0, SEEK_END)`). You don't need to actually read each byte one at a time, or into a buffer, to find the file's length. Finding the length of a 1 GiB file can be done just as quickly as for a 2-byte file. – Peter Cordes Nov 25 '21 at 03:03
  • @Jester I need the file size, but I also have to modify the file contents, and that's why I need to scan byte by byte until EOF is reached. – NewAtC Nov 26 '21 at 07:19
  • @PeterCordes Thanks, but as far as I understand, can I use AH=42h to scan each byte in the file? Because I need to modify the file. – NewAtC Nov 26 '21 at 07:26
  • No, of course if you need to modify the file's data, you need to read it. The efficient way to do that is to read a whole block of the file (or the whole file if it can't be huge) into memory, loop over the buffer, then write it back out. It's very inefficient to read 1 char at a time, seek back, and overwrite it. Everyone assumed you might only need the file length because you described an inefficient way to get that. – Peter Cordes Nov 26 '21 at 08:03
  • @NewAtC I could answer your question today, but why would I do so? You did not respond to my previous answers (https://stackoverflow.com/questions/69916450/how-to-skip-spaces-at-the-beginning-of-the-string-in-assembly-x86 on Nov 10 and https://stackoverflow.com/questions/69924339/dont-get-full-buffer-with-dos-input-string-output-string-calls on Nov 11). You could have commented, voted, or accepted. – Sep Roland Nov 26 '21 at 15:26
  • @SepRoland Sorry, maybe I missed your answers, or I used previous answers from other users. I have read your answers, thanks for your help. – NewAtC Nov 29 '21 at 23:11
  • @PeterCordes Which interrupt should I use to read a block of data from a file? – NewAtC Nov 29 '21 at 23:27
  • `int 21h`, the DOS interrupt, with one of the service numbers for a read-file function. http://spike.scu.edu.au/~barry/interrupts.html – Peter Cordes Nov 30 '21 at 00:52

1 Answer

Corrections to the code.

    mov ah, 3fh
    mov al, 00 ;only read
    mov dx, offset fname_input ; name of the file to open
    int 21h

    mov ah,3fh
    mov bx,[handle]
    mov cx,4
    mov dx,offset buff
    int 21h

  • It may be a typo, but the DOS.OpenFile function is 3Dh, not 3Fh.
  • The filename is not at the address offset fname_input. That is where you defined the input structure for the DOS.BufferedInput function 0Ah.
    The actual filename starts 2 bytes higher up in memory and, for now, is terminated by a carriage return (code 13). You must change this code to 0 before you can pass the name to the DOS.OpenFile function.
  • Always check for errors reported by DOS! These functions return with the carry flag set when they fail.
  • Your DOS.ReadFile function 3Fh uses the handle variable before you have initialized it. Store the handle that DOS.OpenFile returns in AX first.

Way to solve the task

The simplest(1) solution reads the file one byte at a time until the read function reports that it could not fulfil the request. That happens at the end of the file.
For every byte you receive, increment a counter to establish the file length. If you find that the byte needs changing, set the file pointer one position back and write the new character code to the file. Because you need write access as well as read access, you have to ask DOS for read/write access when you open the file.

    mov  si, offset TheBuffer
    mov  word ptr [si], 0050h    ; Set both lengths for DOS.BufferedInput
    mov  dx, si
    mov  ah, 0Ah                 ; DOS.BufferedInput
    int  21h
    xor  bx, bx
    mov  bl, [si + 1]            ; Length of the filename
    mov  [si + 2 + bx], bh       ; Changing carriage return 13 into zero-terminator 0

    lea  dx, [si + 2]            ; ASCIIZ Filename
    mov  ax, 3D02h               ; DOS.OpenFile for read/write
    int  21h                     ; -> AX CF
    jc   ERROR
    mov  [handle], ax

MainLoop:
    mov  dx, offset TheBuffer
    mov  cx, 1
    mov  bx, [handle]
    mov  ah, 3Fh                 ; DOS.ReadFile
    int  21h                     ; -> AX CF
    jc   ERROR
    cmp  ax, cx
    jb   EOF

    ...

    jmp  MainLoop

EOF:
    mov  bx, [handle]
    mov  ah, 3Eh                 ; DOS.CloseFile
    int  21h                     ; -> AX CF
    
    mov  ax, 4C00h               ; DOS.Terminate
    int  21h 

TheBuffer db 512 dup (0)

At the ellipsis in the above code snippet, you can do whatever you need to do with the one byte that you received.
To update the file with the new character that you prepared in TheBuffer, first set the file pointer one position back with the DOS.MoveFilepointer function 42h. Use it with a 32-bit offset of -1 in CX:DX. The write that follows advances the file pointer by one byte again, so the next read continues with the following byte.

    mov  dx, -1
    mov  cx, -1
    mov  bx, [handle]
    mov  ax, 4201h               ; DOS.MoveFilepointer from current position
    int  21h                     ; -> DX:AX CF
    jc   ERROR

    mov  dx, offset TheBuffer
    mov  cx, 1
    mov  bx, [handle]
    mov  ah, 40h                 ; DOS.WriteFile
    int  21h                     ; -> AX CF
    jc   ERROR

(1) A solution that reads more than 1 byte at a time will be more efficient, albeit somewhat more involved. In that case, defining a buffer of 512 bytes is best: it nicely matches the disk sector size and the buffers that DOS maintains.
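
As a rough illustration of footnote (1), here is a minimal sketch of a block-based counting loop. It assumes the file is already open, and it reuses handle, TheBuffer, and the ERROR and EOF labels from the snippets above. The BlockLoop label and the choice of DI:SI as a 32-bit counter are made up for this example. It only counts bytes; changing data would also require seeking back by the number of bytes just read and writing the block out again.

    xor  si, si                  ; DI:SI = 32-bit byte count (assumed register choice)
    xor  di, di
BlockLoop:
    mov  dx, offset TheBuffer
    mov  cx, 512                 ; Request one full block
    mov  bx, [handle]
    mov  ah, 3Fh                 ; DOS.ReadFile
    int  21h                     ; -> AX CF
    jc   ERROR
    test ax, ax                  ; AX = number of bytes actually read
    jz   EOF                     ; Zero bytes read means end of file
    add  si, ax                  ; Add to the low word of the count
    adc  di, 0                   ; Propagate the carry into the high word
    jmp  BlockLoop

Looping until DOS returns zero bytes keeps the end-of-file test simple: the last block is usually shorter than 512 bytes, and the read after it returns AX=0.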

Sep Roland