
Assembler in C

First of all, what is an assembler? [1] [2]

Typically a modern assembler creates object code by translating assembly instruction mnemonics into opcodes, and by resolving symbolic names for memory locations and other entities. The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications.

Types of assembler:

There are two types of assemblers based on how many passes through the source are needed to produce the executable program.

  • One-pass assemblers go through the source code once and assume that all symbols will be defined before any instruction that references them.
  • Two-pass assemblers create a table with all symbols and their values in the first pass, then use the table in a second pass to generate code. The assembler must at least be able to determine the length of each instruction on the first pass so that the addresses of symbols can be calculated.

The advantage of the two-pass assembler is that symbols can be defined anywhere in program source code, allowing programs to be defined in more logical and meaningful ways, making two-pass assembler programs easier to read and maintain.
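To make the difference concrete, here is a small sketch of the two-pass idea, separate from the assembler developed later in this post. The struct and function names are illustrative only, and it assumes every line occupies exactly one word:

```c
#include <string.h>

/* Hypothetical miniature program: each line has an optional label and an
   optional symbol operand. Every line is assumed to occupy one word. */
struct line {
    const char *label;    /* NULL when the line has no label */
    const char *operand;  /* NULL when the line references no symbol */
};

struct symbol {
    const char *name;
    int address;
};

/* Pass 1: give every label the address of its line. Returns the number
   of symbols stored in tab. */
int build_symbols(const struct line *prog, int n, int start, struct symbol *tab)
{
    int count = 0;
    int i;
    for (i = 0; i < n; i++) {
        if (prog[i].label != NULL) {
            tab[count].name = prog[i].label;
            tab[count].address = start + i;
            count++;
        }
    }
    return count;
}

/* Pass 2 helper: look a name up. Returns -1 if it was never defined. */
int resolve(const struct symbol *tab, int count, const char *name)
{
    int i;
    for (i = 0; i < count; i++) {
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].address;
    }
    return -1;
}
```

Because the whole table exists before any operand is resolved, a line can refer to a label that is defined further down, which a strict one-pass scheme cannot handle.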

To build an assembler, one first has to define its input and output formats, that is, what the assembler takes as input and what kind of output it produces. Here I am building an assembler based on the book “Systems Programming and Operating Systems” by D M Dhamdhere, so I assume the following input form:

         start 101
         read n
         mover breg,one
         movem breg,term
again:   mult breg,term
         add creg,one
         movem creg,term
         comp creg,four
         bc le,again
         div breg,two
         movem breg,result
         print result
         stop
n ds 2
result ds 1
one dc '1'
term ds 1
two dc '2'
four dc '4'
end

And output form:

101) 90113
102) 42116
103) 52117
104) 32117
105) 13116
106) 53117
107) 63119
108) 72104
109) 82118
110) 52115
111) 100115
112) 0 000
113)
115)
116)00 0 001
117)
118)00 0 001
119)00 0 001
120)
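Reading the sample output against the input, each instruction line appears to be the location counter followed by three fields run together: the opcode, a register or condition code, and the operand address. For example, "102) 42116" is mover (4), breg (2) and the address of one (116). A minimal sketch of that layout follows. The names encode_word and decode_word are mine, not part of the original code, and it assumes every address has exactly three digits, which holds for this sample because it starts at 101:

```c
/* Packs opcode, register (or condition code) and address into one decimal
   number. Assumes 0 <= reg <= 9 and 100 <= addr <= 999. */
int encode_word(int opcode, int reg, int addr)
{
    return opcode * 10000 + reg * 1000 + addr;
}

/* Splits a packed word back into its three fields. */
void decode_word(int word, int *opcode, int *reg, int *addr)
{
    *opcode = word / 10000;
    *reg = (word / 1000) % 10;
    *addr = word % 1000;
}
```

The assembler in this post does not pack numbers this way; it prints the fields one after another with "%d", so the two only agree while the three-digit assumption holds.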

First, declare the fixed tables that hold the predefined information the assembler uses.

struct directive{
     char symbol[10];
     int code;
}dir[5]={"start",1,"end",0,"origin",1,"ltorg",2};

struct mnemtab{
     char mnem[20];
     int opcode;
     int len;
}l[12]={"stop",0,1,"add",1,1,"sub",2,1,"mult",3,1,"mover",4,1,"movem",5,1,
        "comp",6,1,"bc",7,1,"div",8,1,"read",9,1,"print",10,1};

struct regtab{
     char regsym[10];
     int code;
}reg[4]={"areg",1,"breg",2,"creg",3,"dreg",4};

struct dclcode{
     char mne[5];
     int address;
}dcl[2]={"ds",1,"dc",2};

struct compcode{
     char sym[5];
     int code;
}ccode[7]={"lt",1,"le",2,"eq",3,"gt",4,"ge",5,"any",6};

The code above declares the structs that hold the predefined information used by the program. The first, “directive”, holds the assembler directives and their codes. The second, “mnemtab”, holds each mnemonic with its opcode and length. The third, “regtab”, maps register names to their codes. The fourth, “dclcode”, holds the declaration statements (ds and dc) and their codes. The last, “compcode”, holds the condition codes used by the bc instruction. Besides these, we need a few more structures to store information while the input file is processed:

struct buffer{
     char lbl[10];
     char m[10];
     char op1[10];
     char op2[10];
}buf;

struct symtab{
     char symbol[10];
     int add;
}sym[50];

struct littab{
     char lit[10];
     int address;
}literal[20];

struct tab_inc{
     int index;
     long fpos;
     int type;
}t_inc[10];

Of the four structs above, the first is a buffer that holds the fields of the line currently being processed. The second is the symbol table, which stores every symbol with its address. The third, “littab”, stores literals and their addresses. The fourth, “tab_inc”, records for each operand reference the index of its table entry, the position in the output file where the address belongs, and a type (0 for a literal, 1 for a symbol), so that pass 2 can fill the address in. Next come some lookup and update functions for these structs. The first returns a directive’s code given its name:

int get_dir(char a[10]){
      int i;
      for(i=0;i<4;i++){
            if(!strcmp(a,dir[i].symbol))
                 return dir[i].code;
      }
      return -1;
}
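The functions from here on use several names that the listing never declares (lc, sc, ltc, tc, lnctr and c), and it never shows its includes either. The headers below are the standard ones that provide the library calls used in this post. The variable declarations are my assumption, inferred from how the names are used, and would need to sit at file scope near the top of the source:

```c
#include <stdio.h>   /* FILE, getc, printf, fprintf, ftell, fseek */
#include <stdlib.h>  /* atoi */
#include <string.h>  /* strcmp, strcpy, strlen */

/* Assumed declarations; the original post does not show them. */
int lc = 0;     /* location counter, set by the start directive */
int sc = 0;     /* number of entries used in sym[] */
int ltc = 0;    /* number of entries used in literal[] */
int tc = 0;     /* number of entries used in t_inc[] */
int lnctr = 1;  /* source line counter; starting at 1 is a guess */
int c;          /* last character read by parser(); int so it can hold EOF */
```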

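update_symtab sets the address of a symbol when its declaration is reached. It only updates symbols that are already in the table, that is, ones that some earlier instruction has referenced. It also checks for duplicate declarations: if the symbol already has an address, it reports “Multiple declaration of [symbolName]”.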

void update_symtab(FILE *fe){
      int i,j;
      for(i=0;i<sc;i++){
            if(!strcmp(sym[i].symbol,buf.m)){
                   if(sym[i].add==0)
                         sym[i].add=lc;
                   else{
                         printf("\nMultiple declaration of %s",buf.m);
                         fprintf(fe,"\nMultiple declaration of %s",buf.m);
                   }
                   return;
            }
      }
}

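get_sym returns the table index of an operand, adding it if it is not there yet. An operand starting with '=' is treated as a literal, one ending with ':' as a label defined at the current location counter, and anything else as a symbol reference.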

int get_sym(char s[10]){
      int i,len=strlen(s);
      if(s[0]=='=') {
            for(i=0;i<ltc;i++){
                  if(!strcmp(literal[i].lit,s)) {
                        t_inc[tc].type=0;
                        return i;
                  }
            }
            strcpy(literal[ltc].lit,s);
            ltc++;
            t_inc[tc].type=0;
            return (ltc-1);
      }
      else if(s[len-1]==':'){
            s[len-1]='\0';
            strcpy(sym[sc].symbol,s);
            sym[sc].add=lc;
            sc++;
            return (sc-1);
      }
      else{
           for(i=0;i<sc;i++){
                 if(!strcmp(s,sym[i].symbol)) {
                        t_inc[tc].type=1;
                        return i;
                 }
           }
           strcpy(sym[sc].symbol,s);
           t_inc[tc].type=1;
           sym[sc].add=0;
           sc++;
           return (sc-1);
      }
}
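get_sym copies names into fixed 10-byte fields and never checks whether sym[] is full, so a long name or a large program would overflow them. Below is a hedged sketch of a bounds-checked find-or-add step for the plain symbol case only. The names find_or_add, sym_table and sym_count are illustrative, and the -1 error return is my own convention rather than anything the original code uses:

```c
#include <string.h>

#define MAX_SYMBOLS 50
#define MAX_NAME 10

struct sym_entry {
    char symbol[MAX_NAME];
    int add;
};

static struct sym_entry sym_table[MAX_SYMBOLS];
static int sym_count = 0;

/* Returns the index of name, adding it with address 0 if it is absent.
   Returns -1 when the name does not fit or the table is full. */
int find_or_add(const char *name)
{
    int i;
    if (strlen(name) >= MAX_NAME)
        return -1;
    for (i = 0; i < sym_count; i++) {
        if (strcmp(sym_table[i].symbol, name) == 0)
            return i;
    }
    if (sym_count == MAX_SYMBOLS)
        return -1;
    strcpy(sym_table[sym_count].symbol, name);  /* safe: length checked above */
    sym_table[sym_count].add = 0;               /* 0 means "address not known yet" */
    return sym_count++;
}
```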

Next is getting register code based on its name.

int get_reg(char a[]) {
      int i;
      for(i=0;i<4;i++) {   /* reg[] has four entries, including dreg */
            if(!strcmp(a,reg[i].regsym))
                  return reg[i].code;
      }
      return 0;
}

Get a mnemonic’s opcode based on its name.

int get_btcode(char a[20]) {
      int i;
      for(i=0;i<11;i++) {
            if(strcmp(l[i].mnem,a)==0)
                  return l[i].opcode;
      }
      return -1;
}

Get declaration statement code.

int get_dcl(char a[10]) {
      int i;
      for(i=0;i<2;i++) {   /* dcl[] has only two entries */
            if(!strcmp(dcl[i].mne,a))
                  return dcl[i].address;
      }
      return -1;
}

Clear the buffer once a line has been processed.

void clean_buf() {
      strcpy(buf.lbl,"");
      strcpy(buf.m,"");
      strcpy(buf.op1,"");
      strcpy(buf.op2,"");
}

Get comparison statement code based on its name.

int get_ccode(char a[5]){
      int i;
      for(i=0;i<6;i++) {   /* only six of the seven slots are filled */
            if(!strcmp(ccode[i].sym,a))
                   return ccode[i].code;
      }
      return -1;
}

So far I have declared the data structures and lookup helpers for the assembler; now let’s build the processing itself. First, the input file has to be parsed. The following function tokenizes one line of the file per call and stores each token in the buffer:

void parser(FILE *fi) {
      char a[20];
      int i;
      int ctr=0;
      while((c=getc(fi))!=EOF && (c!='\n')) {
            if((c!=' ') && (c!='\t') && (c!='\n')) {
                  i=0;
                  a[i++]=c;
                  while(((c=getc(fi))!=' ') && (c!='\t') && (c!='\n') && (c!=EOF) && (c!=',')) {
                         a[i++]=c;
                  }
                  a[i]='\0';
                  if(a[0]!='\0'){
                        if(ctr==0){
                             if(a[i-1]==':'){
                                    strcpy(buf.lbl,a);
                                    printf("\nlbl:%s",buf.lbl);
                                    continue;
                              }
                              else{
                                    strcpy(buf.m,a);
                                    printf("\nm:%s",buf.m);
                                    ctr++;
                              }
                        }
                        else if(ctr==1){
                              strcpy(buf.op1,a);
                              printf("\nop1:%s",buf.op1);
                              ctr++;
                        }
                        else if(ctr==2){
                              strcpy(buf.op2,a);
                              printf("\nop2:%s",buf.op2);
                              ctr=0;
                       }
                 }
                 if(c=='\n'){
                       ctr=0;
                       lnctr++;
                       return;
                 }
           }
      }
}
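The same splitting can be done on a line that has already been read into memory. The sketch below is an alternative and not the author’s parser: it uses the standard strtok function, the struct and function names are hypothetical, and it truncates each token to 9 characters so the 10-byte fields cannot overflow:

```c
#include <string.h>

struct fields {
    char lbl[10];
    char m[10];
    char op1[10];
    char op2[10];
};

/* Copies at most 9 characters and always terminates the destination. */
static void copy_field(char *dst, const char *src)
{
    strncpy(dst, src, 9);
    dst[9] = '\0';
}

/* Splits one source line into label, mnemonic and up to two operands.
   The line is modified in place because strtok writes terminators into it. */
void split_line(char *line, struct fields *f)
{
    const char *delims = " \t,\n";
    char *tok;

    memset(f, 0, sizeof *f);
    tok = strtok(line, delims);
    /* strtok never returns an empty token, so strlen(tok) - 1 is valid. */
    if (tok != NULL && tok[strlen(tok) - 1] == ':') {
        copy_field(f->lbl, tok);
        tok = strtok(NULL, delims);
    }
    if (tok != NULL) {
        copy_field(f->m, tok);
        tok = strtok(NULL, delims);
    }
    if (tok != NULL) {
        copy_field(f->op1, tok);
        tok = strtok(NULL, delims);
    }
    if (tok != NULL)
        copy_field(f->op2, tok);
}
```

Like the parser above, this puts the name of a declaration such as "n ds 2" in the mnemonic field and "ds" in the first operand, which is the layout pass 1 expects.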

Now that a line has been parsed, we can process it. In pass 1 the assembler builds a table of all symbols and their values. It must at least be able to determine the length of each instruction in this pass so that the addresses of symbols can be calculated. I have written the following function for pass 1:

void pass1(FILE *fe,FILE *fo){
      int code,opcode,dcode;
      static int fstop=1,fend=1;
      long fpos;
      if(fstop){
            if(strcmp(buf.lbl,""))
                  get_sym(buf.lbl);
            opcode=get_btcode(buf.m);
            if(opcode==0){
                  fstop=0;
                  fprintf(fo,"%d) 0 000 \n",lc);
                  lc++;
                  return;
            }
            if(opcode==7){
                  fprintf(fo,"%d) %d",lc,opcode);
                  code=get_ccode(buf.op1);
                  fprintf(fo,"%d",code);
                  code=get_sym(buf.op2);
                  fpos=ftell(fo);
                  t_inc[tc].index=code;
                  t_inc[tc].fpos=fpos;
                  tc++;
                  fprintf(fo,"   \n");
                  lc++;
                  return;
            }
            if(opcode!=-1){
                  fprintf(fo,"%d) %d",lc,opcode);
                  code=get_reg(buf.op1);
                  if(code==0){
                        strcpy(buf.op2,buf.op1);
                  }
                  fprintf(fo,"%d",code);
                  code=get_sym(buf.op2);
                  fpos=ftell(fo);
                  t_inc[tc].index=code;
                  t_inc[tc].fpos=fpos;
                  tc++;
                  lc++;
                  fprintf(fo,"   \n");
            }
            else{
                  dcode=get_dir(buf.m);
                  if(dcode!=-1) {
                        switch(dcode){
                              case 1:
                                    lc=atoi(buf.op1);
                                    break;
                        }
                  }
                  else{
                        fprintf(fo,"%d) ***",lc);
                        code=get_reg(buf.op1);
                        if(code==0) {
                              if(!strcmp(buf.op2,""))
                                    strcpy(buf.op2,buf.op1);
                              else{
                                    fprintf(fo," *** ");
                                    fprintf(fe," Error at line %d :: Illegal register %s",lnctr-1,buf.op1);
                                    code=get_sym(buf.op2);
                                    fpos=ftell(fo);
                                    t_inc[tc].index=code;
                                    t_inc[tc].fpos=fpos;
                                    tc++;
                                    lc++;
                              }
                        }
                        fprintf(fo," %d",code);
                        code=get_sym(buf.op2);
                        fpos=ftell(fo);
                        t_inc[tc].index=code;
                        t_inc[tc].fpos=fpos;
                        tc++;
                        lc++;
                        fprintf(fo,"   \n");
                        printf("\n Error at line %d :: Unknown mnemonics: %s",lnctr-1,buf.m);
                        fprintf(fe,"\n Error at line %d :: Unknown mnemonics: %s",lnctr-1,buf.m);
                  }
            }
      }
      else{
            if(fend) {
                  if(!strcmp(buf.m,"end")){
                        fprintf(fo,"%d)   \n",lc);
                        lc++;
                        fend=0;
                  }
                  else{
                        code=get_dcl(buf.op1);
                        switch(code) {
                              case 1:
                                    fprintf(fo,"%d)  \n",lc);
                                    update_symtab(fe);
                                    lc+=atoi(buf.op2);
                                    break;
                              case 2:
                                    fprintf(fo,"%d)00 0 001  \n",lc);
                                    update_symtab(fe);
                                    lc++;
                                    break;
                        }
                  }
            }
            else if(strcmp(buf.m,"")) {
                  code=decode_lit(buf.m,fe);
                  fprintf(fo,"%d) 00 0 %d  \n",lc,code);
                  lc++;
            }
      }
      clean_buf();
      return;
}

The final piece is pass 2. In pass 2 the assembler uses the tables built in pass 1 to fill in the addresses that pass 1 left blank in the output file. I have written the following function for it:

void pass2(FILE *fop){
      int i,adr,x;
      long pos;
      for(i=0;i<tc;i++){
            pos=t_inc[i].fpos;
            adr=t_inc[i].index;
            fseek(fop,pos,0);
            if(t_inc[i].type==0) {
                  if(literal[adr].address!=0)
                        fprintf(fop,"%d \t",literal[adr].address);
                  else{
                        fprintf(fop,"***");
                        printf("\n Undeclared literal: %s",literal[adr].lit);
                        fprintf(fop,"\n Undeclared literal: %s",literal[adr].lit);
                  }
            }
            else{
                  if(sym[adr].add!=0)
                        fprintf(fop,"%d\t",sym[adr].add);
                  else{
                        fprintf(fop,"***");
                        printf("\n Undeclared symbol: %s",sym[adr].symbol);
                        fprintf(fop,"\n Undeclared symbol: %s",sym[adr].symbol);
                  }
            }
      }
}
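Pass 1 and pass 2 cooperate through a simple backpatching scheme: pass 1 leaves a blank gap where an address belongs and remembers its file offset with ftell, and pass 2 seeks back to that offset and writes the address. Here is a stripped-down sketch of just that mechanism. The helper names are hypothetical, and it assumes a three-character gap and addresses of at most three digits:

```c
#include <stdio.h>

/* Writes "<lc>) <opcode><reg>" followed by a three-character blank gap and
   a newline. Returns the file offset of the gap so it can be filled later. */
long emit_with_gap(FILE *out, int lc, int opcode, int reg)
{
    long pos;
    fprintf(out, "%d) %d%d", lc, opcode, reg);
    pos = ftell(out);
    fprintf(out, "   \n");
    return pos;
}

/* Seeks back to a recorded offset, overwrites the gap with the address,
   then returns to the end of the file so later writes append as before. */
void fill_gap(FILE *out, long pos, int addr)
{
    fseek(out, pos, SEEK_SET);
    fprintf(out, "%3d", addr);   /* exactly three characters for addr <= 999 */
    fseek(out, 0, SEEK_END);
}
```

The gap has to be at least as wide as the text written back into it; otherwise the write runs over the newline that ends the line.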

And that’s it: the assembler is ready to run. Go ahead and try it, and let me know what you think.

References:-

  1. http://en.wikipedia.org/wiki/Assembly_language#Assembler
  2. http://searchdatacenter.techtarget.com/definition/assembler
  3. http://www.cse.iitb.ac.in/~dmd/

Note: This is not a perfect assembler and may have some bugs. It is meant only to illustrate basic assembler concepts, and the code above is not optimized.
