ARM-DSP Bridge

May 2007

Revision History
Revision 1.0, May 10th, 2007, Christiaan Baaij

First release

Revision 1.1, July 5th, 2007, Christiaan Baaij

First Minimal Implementation

Revision 1.2, August 16th, 2007, Christiaan Baaij

DMA transfers using circular buffer implemented


Table of Contents

1. Bridge Architecture
1.1. Memory Map
1.2. Boot Sequence
1.3. External Memory
2. Linux API
2.1. Overview
2.2. Bridge Control Device
2.3. Bridge Memory Device
2.4. Bridge Communication Device
3. DSP Bridge library
3.1. Overview
3.2. Bridge Communication
3.3. Memory Functions
3.4. String Functions
3.5. Framework
4. Bridge Control Utility
4.1. Overview
4.2. Commands
5. Building the bridge & C54x development
5.1. Building the kernel module
5.2. Building the DSP bridge library
5.3. Building the bridge control utility 'bridgectl'
5.4. Installing the bridge on the OSD
5.5. Using the bridge
5.6. C54x Development

List of Figures

1.1. Mapping of Internal DSP Memory

Chapter 1. Bridge Architecture

1.1. Memory Map

The C5409-based DSP on the DM320 architecture has 3 types of internal memory: Dual Access RAM (DARAM), Single Access RAM (SARAM) and On-Chip ROM. The DARAM and SARAM can be accessed from the ARM. The DSP has 32 KW (kilo words, 1 word = 16 bits) of DARAM that can be mapped into Program and Data Space simultaneously, 16 KW of Page 1 Program SARAM and 16 KW of Data SARAM. The DARAM can be mapped exclusively to Data Space by setting the OVLY bit to 0. On-Chip ROM is not visible in ARM memory space and will therefore not be used by the bridge.
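The sizes above are given in words, while the ARM side usually thinks in bytes. The following minimal sketch does the conversion. It assumes that 1 KW means 1024 words; the function names are made up for illustration and are not part of the bridge.

```c
#include <assert.h>
#include <stddef.h>

/* One C54x word is 16 bits, i.e. 2 bytes (as stated in the text). */
#define DSP_WORD_BYTES 2u
/* Assumption: 1 KW (kilo word) means 1024 words. */
#define WORDS_PER_KW 1024u

static_assert(DSP_WORD_BYTES * 8 == 16, "a C54x word is 16 bits");

/* Convert a size in kilo words to a size in bytes. */
size_t kw_to_bytes(size_t kw)
{
    return kw * WORDS_PER_KW * DSP_WORD_BYTES;
}

/* Bytes of internal DSP memory visible to the ARM:
 * 32 KW DARAM + 16 KW Program SARAM + 16 KW Data SARAM. */
size_t arm_visible_bytes(void)
{
    return kw_to_bytes(32) + kw_to_bytes(16) + kw_to_bytes(16);
}
```

Under that assumption the 32 KW of DARAM corresponds to 64 KB, and the three ARM-visible blocks together to 128 KB.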

Figure 1.1. Mapping of Internal DSP Memory

1.2. Boot Sequence

The ARM downloads a specific boot image for the DSP to execute. There can be many boot images, one for every different task.

The boot sequence is as follows:

  • The ARM holds the DRST bit of the HPIBCTL register low for at least 2 DSP cycles to make the DSP go into reset. The ARM then brings it out of reset.

  • DSP status register PMST is initialized to move the vector table to 7F80h, all the interrupts are disabled except for INT0 and the DSP is set to IDLE1 mode.

  • While the DSP is in IDLE1, the ARM loads Program code and Data values to their respective memories.

  • When the ARM finishes downloading the DSP code, it wakes up the DSP from IDLE1 mode by asserting INT0.

  • The DSP then branches to address 7F80h, where the new interrupt vector table is located. The ARM should have loaded this location with at least a branch to the start code.
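The ordering of the ARM-side steps can be sketched as a small driver-independent routine. This is only an illustration: struct dsp_boot_ops, dsp_boot and the demo callbacks are hypothetical names, and on a real system each callback would have to wrap the appropriate call on the bridge devices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks, one per ARM-side boot step. */
struct dsp_boot_ops {
    int (*hold_reset)(void *ctx);    /* drive DRST low */
    int (*release_reset)(void *ctx); /* DSP leaves reset and waits in IDLE1 */
    int (*load_image)(void *ctx);    /* write program code and data */
    int (*assert_int0)(void *ctx);   /* wake the DSP; it branches to 7F80h */
};

/* Run the steps in order and stop at the first failure.
 * Returns 0 on success, or the 1-based number of the failing step. */
int dsp_boot(const struct dsp_boot_ops *ops, void *ctx)
{
    assert(ops != NULL);
    int (*const steps[])(void *) = {
        ops->hold_reset, ops->release_reset,
        ops->load_image, ops->assert_int0,
    };
    for (size_t i = 0; i < sizeof steps / sizeof steps[0]; ++i) {
        if (steps[i] == NULL || steps[i](ctx) != 0)
            return (int)(i + 1);
    }
    return 0;
}

/* Demo callbacks that only record the order in which they run. */
struct boot_trace { char log[8]; size_t len; };

static int record(void *ctx, char tag)
{
    struct boot_trace *t = ctx;
    if (t->len + 1 >= sizeof t->log)
        return -1;
    t->log[t->len++] = tag;
    t->log[t->len] = '\0';
    return 0;
}

static int demo_hold(void *ctx)    { return record(ctx, 'H'); }
static int demo_release(void *ctx) { return record(ctx, 'R'); }
static int demo_load(void *ctx)    { return record(ctx, 'L'); }
static int demo_int0(void *ctx)    { return record(ctx, 'I'); }

const struct dsp_boot_ops demo_boot_ops = {
    demo_hold, demo_release, demo_load, demo_int0,
};
```

Calling dsp_boot(&demo_boot_ops, &trace) with a zeroed struct boot_trace leaves the call order in trace.log, which makes it easy to check that the image is loaded only after the reset has been released and before INT0 is asserted.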

1.3. External Memory

External SDRAM can only be accessed through one of the DMA controllers. Unlike the original C5409 on which it is based, the DM320 DSP cannot access this external memory directly as described in the C5409 and C54x documents. The DMA controller of the HPIB bridge can transfer data between internal DSP RAM and external SDRAM. Co-Processor DMA (COP) can transfer data between SDRAM and the internal RAM of the peripherals on the DM320 SoC (such as the image buffers).

Chapter 2. Linux API

2.1. Overview

The kernel driver is just responsible for resetting and running the DSP and for allowing userland applications to access the internal memory of the DSP. This means that application developers have to design their own communication interface between ARM and DSP code. An example of such a communication interface would be a flag in the DSP internal memory that the DSP sets when it is done reading a buffer, and that the ARM polls so that it knows when it can write new data into the buffer again.
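A minimal sketch of such a flag-based handshake, seen from the ARM side, could look as follows. The shared region is modelled here as a plain C struct so that the logic can be tried out on its own; the layout, the names and the buffer size are all assumptions, and a real program would access the flag and the buffer through the bridge memory device instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_WORDS 8  /* hypothetical buffer size in 16-bit words */

/* Hypothetical layout of a region in DSP internal memory used by both sides. */
struct shared_region {
    volatile uint16_t dsp_done;  /* DSP sets this to 1 when it is done reading buf */
    uint16_t buf[BUF_WORDS];
};

/* ARM side: refill the buffer only if the DSP has signalled that it is done.
 * Returns 1 if new data was written, 0 if the DSP is still busy. */
int arm_try_refill(struct shared_region *r, const uint16_t *src, size_t words)
{
    assert(words <= BUF_WORDS);
    if (r->dsp_done == 0)
        return 0;                /* DSP still reading: leave the buffer alone */
    memcpy(r->buf, src, words * sizeof r->buf[0]);
    r->dsp_done = 0;             /* hand the buffer back to the DSP */
    return 1;
}
```

The ARM would call arm_try_refill() periodically; the DSP program sets the flag once it has consumed the buffer contents.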

2.2. Bridge Control Device

The DSP control device provides a DSP control API for Linux userland applications. Applications can reset the DSP and perform other control operations through this device. For an example of its usage, see the source code of the bridge control utility.

2.2.1. ioctl()

The following ioctl commands are defined:

  • DSP_CTL_IOCRESET

  • DSP_CTL_IOCRUN

    Resets the DSP or releases the reset.

  • DSP_CTL_IOCINT0

    Sends an interrupt to the INT0 interrupt line of the DSP

  • DSP_CTL_IOCNMI

    Sends an interrupt to the NMI interrupt line of the DSP

2.3. Bridge Memory Device

The DSP memory device provides access to the DSP memory space for the DSP program loader in Linux userland. The DSP program loader loads the DSP binary image into the DSP internal memories (i.e. DARAM and SARAM) through this device.

CAUTION: On the ARM, the internal DSP memory is only addressable in 16-bit words, to resemble the DSP addressing. Reading and writing should therefore be done 1 word at a time, not 1 byte at a time (the read and write calls will simply fail otherwise). Section 2.3.4 shows how to read one 16-bit word at address 0x264 and print it to the console.

2.3.1. read()

Reads data from DSP internal memory space

2.3.2. write()

Writes data to the DSP internal memory space

2.3.3. lseek()

Seeks in the DSP's internal memory

2.3.4. Example

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <fcntl.h>

#include "arm-dsp_bridge.h"

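int main(int argc, char **argv) {
    int fd;
    unsigned short buf;    /* one 16-bit DSP word */

    (void)argc;
    (void)argv;

    if ((fd = open(DSPMEMDEVNM, O_RDWR)) < 0) {
        perror("open memory device");
        return -1;
    }

    /* Seek to the word to read; lseek() returns -1 on failure */
    if (lseek(fd, 0x264, SEEK_SET) == (off_t)-1) {
        perror("seek memory device");
        close(fd);
        return -1;
    }

    /* Always transfer whole 16-bit words, never single bytes */
    if (read(fd, &buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
        perror("read memory device");
        close(fd);
        return -1;
    }
    printf("value at word address 0x264: 0x%x\n", buf);

    close(fd);
    return 0;
}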

2.4. Bridge Communication Device

The bridge communication device is the Linux counterpart of the bridge library on the DSP. It supports the display of debug messages sent from the DSP, and reading from and writing to developer-specified buffers on the DSP using DMA transfers. A circular buffer scheme (in SDRAM) on the Linux side provides asynchronous transfers between the ARM and the DSP. Because this device only supports transfers between the two memories, it is non-seekable. At the end of this chapter there is an example of how to use the read() and write() calls.

2.4.1. poll() / select()

Checks whether a read and/or write call would block. A read call blocks if the circular read buffer is empty. A write call blocks if the circular write buffer is full.
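As a rough illustration, a program could use poll() to wait with a timeout before calling read(). The sketch below uses only the standard POSIX poll() interface; wait_readable is a made-up helper name, and it is an assumption that the driver reports readiness through the usual POLLIN event.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait until fd is readable or timeout_ms milliseconds have passed.
 * Returns 1 if a read would not block, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&p, 1, timeout_ms);
    if (rc < 0)
        return -1;               /* poll() itself failed */
    if (rc == 0)
        return 0;                /* timeout: the read buffer is still empty */
    return (p.revents & POLLIN) ? 1 : -1;
}
```

The same pattern with POLLOUT in the events field would check whether a write call is going to block.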

2.4.2. write()

Writes data to the circular write buffer. A program on the DSP must make a _hpib_read call to transfer this data from the circular write buffer to a buffer on the DSP. This call blocks if the write buffer is full, and keeps blocking until the DSP reads the previous data. Because write() uses DMA transfers internally, transfer sizes have to be a multiple of 4 bytes (32 bits) as demanded by the HPIB DMA controller (the write call will fail otherwise).

NB: You will not be notified whether the data reached the DSP correctly, so you will need to implement this functionality yourself if needed.

2.4.3. read()

Reads data from the circular read buffer. A program on the DSP must first make a _hpib_write call to transfer data from a buffer on the DSP to this circular read buffer. This call blocks if the circular read buffer is empty, and keeps blocking until the DSP places data in the buffer. Because read() uses DMA transfers internally, transfer sizes have to be a multiple of 4 bytes (32 bits) as demanded by the HPIB DMA controller (the read call will fail otherwise).
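Since both calls fail for sizes that are not a multiple of 4 bytes, it can help to check or pad the transfer size before calling them. The helpers below are a small sketch with made-up names; whether a zero-length transfer is accepted by the driver is not covered here.

```c
#include <assert.h>
#include <stddef.h>

#define HPIB_DMA_ALIGN 4u  /* transfer sizes must be a multiple of 4 bytes */

/* Return 1 if nbytes is a multiple of 4, 0 otherwise. */
int dma_size_ok(size_t nbytes)
{
    return nbytes % HPIB_DMA_ALIGN == 0;
}

/* Round nbytes up to the next multiple of 4 so a buffer can be padded. */
size_t dma_round_up(size_t nbytes)
{
    assert(nbytes <= (size_t)-1 - (HPIB_DMA_ALIGN - 1));
    return (nbytes + HPIB_DMA_ALIGN - 1) / HPIB_DMA_ALIGN * HPIB_DMA_ALIGN;
}
```

For example, a message of 6 bytes would have to be sent from a buffer padded to dma_round_up(6), which is 8 bytes.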

2.4.4. Example

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <fcntl.h>

#include "arm-dsp_bridge.h"

int main(int argc, char **argv) {
    int fd;
    int i = 0;
    unsigned short* writebuf;
    unsigned short* readbuf;

    char str1[]="Hello World from the DSP: through the DMA void!!";
    char str2[]="ffffffffffffffffffffffffffffffffffffffffffffffff";
    /**
     * Each char is sent as one 16-bit word, so an even number of chars
     * makes the transfer size a multiple of 4 bytes
     */
    writebuf = (unsigned short *)malloc(strlen(str1)*sizeof(short));
    readbuf = (unsigned short *)malloc(strlen(str2)*sizeof(short));
    
    if ((fd = open(DSPCOMDEVNM, O_RDWR)) < 0) {
        perror("open communications device");
        free(writebuf);
        free(readbuf);
        return -1;
    }

    for(i=0; i < strlen(str1); ++i){
        writebuf[i]=str1[i];
    }

    if (write(fd,writebuf,strlen(str1)*sizeof(short)) != strlen(str1)*sizeof(short)) {
        perror("write com device");
        free(writebuf);
        free(readbuf);
        return -1;
    }

    if (read(fd,readbuf,strlen(str2)*sizeof(short)) != strlen(str2)*sizeof(short)) {
        perror("read com device");
        free(writebuf);
        free(readbuf);
        return -1;
    }

    for(i=0; i < strlen(str2); ++i){
        str2[i] = readbuf[i];
    }

    printf("Read from DSP: %s\n",str2);

    free(writebuf);
    free(readbuf);

    close(fd);

    return 0;
}

Chapter 3. DSP Bridge library

3.1. Overview

The bridge library provides very basic memory, string and debug functions. It also provides the DSP counterpart of the Linux bridge communication device. To use the bridge communication functions, a DSP application has to be based on the accompanying framework. The framework and the library are currently written in assembly, though future versions might be done in C, depending on whether there will ever be a free compiler for the Texas Instruments C5409 DSP. Also note that there is currently no multi-process support at all, meaning that only one single-threaded program can be loaded and run in DSP memory at a time.

3.2. Bridge Communication

3.2.1. hpibio.asm

Currently implements the counterparts of the read() and write() calls of the bridge communication device on the ARM (Linux) side. At the end of this section there is an example application using the _hpib_read and _hpib_write functions. These functions use the _hpiDma_transfer function (for internal use only) to transfer the bytes between SDRAM and internal DSP memory.

3.2.1.1. _hpib_read

  • Argument 1: Buffer location in internal DSP RAM

  • Argument 2: Count (in bytes) to transfer from circular write buffer to DSP buffer

  • Returns: Amount (in bytes) actually transferred

This function is the counterpart of the write() call of the bridge communication device on the ARM (Linux) side. This call blocks (goes into a tight loop) if the circular write buffer is empty and keeps blocking until the ARM writes data into the circular write buffer. The call also blocks temporarily if there is still a DMA transfer in progress. The _handle_hpiDma_end function of the framework gets called whenever a transfer finishes (more on the framework later). Count has to be a multiple of 4 bytes, or the function will simply return, reporting that it has read zero bytes.

NB: Due to the linear nature of the DMA transfer controller, a _hpib_read will only read up to the 'end' of the circular buffer, even if there are more filled bytes available at the 'beginning' of the circular buffer. This means _hpib_read will have to be called again (once the previous DMA transfer finishes) to read the remaining bytes. This can easily be implemented by filling in the _handle_hpiDma_end function.
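The effect of the wrap-around can be illustrated with a small C model of a circular buffer. The bookkeeping below (a head index plus a fill count) is an assumption made for the sake of the example; the actual layout of the bridge's communication structure is defined in dspcom.inc and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a circular buffer of `size` bytes:
 * `head` is the index of the next byte to read, `fill` the number of
 * bytes currently stored. */
struct ring {
    size_t size;
    size_t head;
    size_t fill;
};

/* Bytes a single linear transfer can read: never past the end of the buffer. */
size_t ring_contiguous(const struct ring *r)
{
    assert(r->head < r->size && r->fill <= r->size);
    size_t to_end = r->size - r->head;
    return r->fill < to_end ? r->fill : to_end;
}

/* Account for n bytes having been read; head wraps around to the start. */
void ring_consume(struct ring *r, size_t n)
{
    assert(n <= ring_contiguous(r));
    r->head = (r->head + n) % r->size;
    r->fill -= n;
}
```

With a 16-byte buffer, head at 12 and 8 bytes stored, the first transfer can only cover 4 bytes; a second transfer, started after the first one finishes, picks up the remaining 4 bytes from the beginning of the buffer.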

3.2.1.2. _hpib_write

  • Argument 1: Buffer location in internal DSP RAM

  • Argument 2: Count (in bytes) to transfer from DSP buffer to circular read buffer

  • Returns: Amount (in bytes) actually transferred

This function is the counterpart of the read() call of the bridge communication device on the ARM (Linux) side. This call blocks (goes into a tight loop) if the circular read buffer is full and keeps blocking until the ARM reads data from the circular read buffer. The call also blocks temporarily if there is still a DMA transfer in progress. The _handle_hpiDma_end function of the framework gets called whenever a transfer finishes (more on the framework later). Count has to be a multiple of 4 bytes, or the function will simply return, reporting that it has written zero bytes.

NB: Due to the linear nature of the DMA transfer controller, a _hpib_write will only write up to the 'end' of the circular buffer, even if there are more free bytes available at the 'beginning' of the circular buffer. This means _hpib_write will have to be called again (once the previous DMA transfer finishes) to write the remaining bytes. This can easily be implemented by filling in the _handle_hpiDma_end function.

NB: You will not be notified whether the data reached the ARM correctly, so you will need to implement this functionality yourself if needed.

3.2.2. debug.asm

Currently implements only the _debug function.

3.2.2.1. _debug

  • Argument 1: Location of the message to send

  • Returns: nothing

This function sends a debug message to the ARM; its delivery is, however, not assured. The string you send can be at most 255 characters long. _debug uses the unsafe _strcpy function, which won't stop copying until it finds a null character, so be certain that your strings are zero-terminated.

3.2.3. Example

;*-----------------------------------------------------------------------------*
;*  Main Program                                                               *
;*-----------------------------------------------------------------------------*
            .mmregs
;*--------------------------- YOUR CODE STARTS HERE ---------------------------*
            ; Any definitions or references you need to add go here
            .include "main.inc"
            .global _inputBuffer
            .bss    _inputBuffer,__INPUT_BUFFER_LEN,0,0

            .global _memset
            .global _debug
            .global _hpib_read
            .global _hpib_write

;*--------------------------- YOUR CODE ENDS HERE -----------------------------*
            ; External Functions
            .global _init_dspCom

            .text
            ; Internal functions
            .global _main
_main:
;*--------------------------- YOUR CODE STARTS HERE ---------------------------*
            ; Call any hardware initialization routines here

;*--------------------------- YOUR CODE ENDS HERE -----------------------------*
            CALL    #_init_dspCom           ; Initialize DSP Communication struct

;*--------------------------- YOUR CODE STARTS HERE ---------------------------*
            ; Your main program starts here

            ; Clear input buffer
            STM     #_inputBuffer,AR1       ; Starting at location of _inputBuffer
            NOP
            PSHM    AR1                     ; Store it on the stack
            STM     #0,AR1                  ; Set value to 0
            NOP
            PSHM    AR1                     ; Store it on the stack
            STM     #__INPUT_BUFFER_LEN,AR1 ; Set length to __INPUT_BUFFER_LEN
            NOP
            PSHM    AR1                     ; Store it on the stack
            CALL    #_memset                ; Call memset function 

            ; Call read function
            STM     #_inputBuffer,AR1 
            NOP
            PSHM    AR1
            STM     #__INPUT_BUFFER_LEN,AR1
            NOP
            ; __INPUT_BUFFER_LEN is in words, so multiply by 2
            LDM     AR1,A
            SFTL    A,#1,A
            STLM    A,AR1
            PSHM    AR1
            CALL    #_hpib_read

            POPM    AR1                     ; Get return value from _hpib_read
            PSHM    AR1                     ; And put it back on the stack

            ; debug("Read finished")
            STM     #__SL1,AR1
            NOP
            PSHM    AR1
            CALL    #_debug
            
            ; debug(&_inputBuffer)
            STM     #_inputBuffer,AR1
            NOP
            PSHM    AR1
            CALL    #_debug

            ; Write back the content of _inputBuffer
            POPM    AR2                     ; Get return value from previous _hpib_read
            STM     #_inputBuffer,AR1
            NOP
            PSHM    AR1
            PSHM    AR2
            CALL    #_hpib_write

            POPM    AR1                     ; Get return value from _hpib_write         

            ; Just loop forever
MAIN_LOOP:
            B       MAIN_LOOP

;*--------------------------- YOUR CODE ENDS HERE -----------------------------*

            RET

            .sect   ".const"
;*--------------------------- YOUR CODE STARTS HERE ---------------------------*
            ; Put any strings here
__SL1:      .string "Read finished",0
;*--------------------------- YOUR CODE ENDS HERE -----------------------------*

3.3. Memory Functions

3.3.1. _memset

  • Argument 1: Location of the buffer

  • Argument 2: Value to set the buffer to

  • Argument 3: Length of the buffer

  • Returns: nothing

This function simply sets 'Length' words of the buffer at 'Location' to 'Value'.
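For readers more at home in C, the behaviour described above can be modelled as follows. This is only a reference model with a made-up name, not the library's implementation (which is written in assembly). Note that, unlike the C standard library's memset(), the length is counted in 16-bit words rather than bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of _memset: set `length` 16-bit words at `location` to `value`. */
void memset_words(uint16_t *location, uint16_t value, size_t length)
{
    assert(location != NULL || length == 0);
    for (size_t i = 0; i < length; ++i)
        location[i] = value;
}
```

The argument order (location, value, length) matches the order in which the assembly example in section 3.2.3 pushes the arguments before calling _memset.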

3.4. String Functions

3.4.1. _strcpy

  • Argument 1: String Source Location

  • Argument 2: String Destination Location

  • Returns: nothing

Copies the string from 'Source' to 'Destination'. Note that this function is unsafe and won't stop copying until it finds a null character, so make sure your strings are zero-terminated. Defining strings as ".string <string>, 0" does the trick.

3.5. Framework

To use the communication features, your DSP application has to be based on the framework supplied with the bridge. It just gives you a basic starting point.

3.5.1. boot.asm

This file gets called when the program is loaded and started. It sets the stack pointer and starts your _main function. If your _main function isn't a loop, it will return to this file and loop forever in the _exit function. If you want to change the stack size, this file is the place to do that; the current default stack size is 1000 words.

3.5.2. main.asm

This is where your _main function resides; it is called once the stack has been set up. Any further hardware initialisation should occur here.

3.5.3. vectors.asm

This is the interrupt vector table of the C5409 DSP. Currently only four interrupts are plugged with interrupt handlers: reset, nmi, int0 and hpib_dma. You should leave these interrupts as they are, unless you know what you're doing. These four interrupts are enabled by the _init_dspCom function called in _main.

3.5.4. interrupthandler.asm

This file contains the functions _handle_int0 and _handle_nmi. If you want to implement these handlers, uncomment the CONTEXT_SAVE and CONTEXT_RESTORE macros and place the functionality between those two macros.

3.5.5. hpidmahandler.asm

This interrupt handler gets called whenever a DMA transfer finishes. This handler could, for example, start a new _hpib_read or _hpib_write call once a DMA transfer finishes. Be careful though: these functions block, and you are working in an interrupt context.

3.5.6. dspcom.inc

An include file used by the inner workings of the bridge communication functions. It describes certain constants and the communication structure.

3.5.7. contextswitch.inc

Contains the CONTEXT_SAVE and CONTEXT_RESTORE macros (which basically push and pop all the registers).

3.5.8. linker.cmd

Linker command specification file for the Binutils linker. It defines where the different memory sections will be placed in DSP memory.

3.5.9. Makefile

Basic makefile that builds a COFF2 executable of the framework files.

Chapter 4. Bridge Control Utility

4.1. Overview

The bridge control utility 'bridgectl' is used to load DSP programs into internal DSP memory and to start/stop the DSP. The programs can either be in the form <program.bin> <interruptvector.bin> or <program.out>, where .bin files are in a binary format generated by 'c54x-objcopy' and .out files are in COFF2 format generated either by the binutils c54x assembler or by Texas Instruments Code Composer Studio.

4.2. Commands

The following bridge commands are specified:

  • start <COFF2-program.out>

    Resets the DSP, loads the specified <COFF2-program.out> (COFF2 format) into internal DSP memory and then releases the reset to start the program

  • stop

    Resets the DSP.

  • load <program.bin> <interruptvector.bin>

    Loads the <program.bin> (binary format) at address 0x80 and the <interruptvector.bin> (binary format) at address 0x7F80

  • loadcoff <coffile>

    Loads the specified <coffile> (COFF2 format) into internal DSP memory

  • run (=unreset)

    Releases the reset making the program start

  • reset

    Resets the DSP.

Chapter 5. Building the bridge & C54x development

5.1. Building the kernel module

  • Get the correct toolchain to build kernel modules and userspace programs for an ARM processor

    The easiest way is to install the VMware image made by Crweb; it works for me, so it should work for you.

  • Get the bridge software

    • Either get the latest build from svn: svn co https://svn.neurostechnology.com/hackers/darchon/arm-dsp_bridge/

    • Or download the archive: wget https://svn.neurostechnology.com/hackers/darchon/arm-dsp_bridge.tgz

  • Configure the build process for the DM320 DSP module (dm320dsp_module). Edit the Makefile for both modules to make KERNELDIR point to the kernel source tree that is used to build the kernel for the OSD

  • Build the module by running make

5.2. Building the DSP bridge library

  • Make sure the binutils C54x toolchain is installed (search Google for instructions on how to do this).

  • The bridge library can be found in the ./bridgelib directory of the source archive.

  • Copy the directory content and run make in the destination directory to build the library.

5.3. Building the bridge control utility 'bridgectl'

For this process I'm assuming you are using the Crweb VMware image; if not, you're on your own.

  • Copy the 'bridgectl' directory from the archive to ~/Scratchbox-Home/

  • Type /scratchbox/login to enter the scratchbox building environment

  • Enter the 'bridgectl' directory you just copied and run make inside of it

5.4. Installing the bridge on the OSD

I'm assuming you are using the VMWare image and are netbooting your OSD from this image for this step.

  • Copy the compiled kernel module, the load/unload scripts (dm320_dsp_load, dm320_dsp_unload) and the bridgectl utility to /srv/neuros-osd-rootfs/root/

  • Login to the OSD through the serial port

  • Unload ALL Ingenient kernel modules

  • Run the dm320_dsp_load script to install and load the bridge kernel modules

    NB: The scripts need 'awk' to run. If 'awk' is not built into your busybox config, either rebuild your busybox environment or execute the commands in the script manually.

5.5. Using the bridge

To make your Linux programs use the DSP bridge, include the 'arm-dsp_bridge.h' header that is in the ./include directory of the source archive. To make your DSP programs use the bridge, link against the bridge library and use the framework. Both can be found in the ./bridgelib directory of the source archive. To load compiled DSP binaries on the DSP, use the 'bridgectl' utility.

Some things to keep in mind when using the bridge:

  • DSP programs are always single-threaded, and only one program can be loaded and run in DSP memory at a time.

  • NEVER use/overwrite the data memory locations 0x90 - 0x94. These are used internally by the bridge.

  • The bridge library functions overwrite registers without regard for their content. So if there is valuable data in them, be sure to save them before calling the bridge functions.

  • Do NOT use HPIB DMA directly, always use the _hpib_read and _hpib_write functions.
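A loader or test tool could guard against accidentally touching the reserved locations with a simple range check. The sketch below assumes that 0x90 - 0x94 is an inclusive range of word addresses in data memory; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Data memory locations used internally by the bridge
 * (assumed to be an inclusive range of word addresses). */
#define BRIDGE_RESERVED_FIRST 0x90u
#define BRIDGE_RESERVED_LAST  0x94u

/* Return 1 if the word range [start, start + words) touches 0x90 - 0x94. */
int touches_bridge_reserved(uint32_t start, uint32_t words)
{
    if (words == 0)
        return 0;                          /* an empty range touches nothing */
    assert(words - 1 <= UINT32_MAX - start);
    uint32_t last = start + (words - 1);   /* last word address in the range */
    return start <= BRIDGE_RESERVED_LAST && last >= BRIDGE_RESERVED_FIRST;
}
```

For example, a 16-word buffer placed at 0x80 ends at 0x8F and is safe, while a 17-word buffer at the same address would reach 0x90 and overwrite bridge data.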

5.6. C54x Development

When you are developing for the C54x DSP, there are two ways to go:

  • Texas Instruments Code Composer Studio

    This is a really useful piece of software. It compiles C code and has lots of libraries and debugging support. There is too much to explain here; just check out the Texas Instruments website. CCS compiles to COFF2 format by default.

  • Binutils Assembler

    The only open-source tool for the C54x DSP is the binutils assembler. To use binutils for the C54x DSP you will need to compile it specifically for this architecture; search Google for instructions on how to do this, it's really easy. The binutils assembler/linker builds to COFF0 by default; use "c54x-objcopy <input> -O coff2-c54x <output>" to convert an assembled binary to COFF2 format.