13 de enero de 2014

Most used functions in Racket bytecode

Introduction

The idea is to count how many times each function appears in the bytecode of compiled code of Racket (version 5.3.6).
When compiling a Racket program, several steps are performed automatically. First, all macros are expanded, then the code is compiled to a specific bytecode that is later optimized and the result is saved in a .zo file. The optimization includes constant propagation, functions inlining (including functions defined in other modules) and more. So between the original program and the compiled bytecode version the difference is very large. Functions can be primitive (defined in C) or be defined in other modules.
The functions can be assigned to variables or passed as parameters to other functions. We will count only how many times each function appears in the direct application position . For example, in (zero? (+ 1 2)) we will count zero? and +, but in (cons' sum +) we will only have cons, but not +.
Furthermore, we will ignore the application of functions that are arguments to other functions, for example in (define (apply-to-random f) (f (random))) will ignore f. Although it is likely that this code is inlined by the optimizer and if the original code is (apply-to-random add1) will get (add1 (random)) and in this case we will count add1.
This is a static count of how many times each function, the number of calls to each function when executed may be very different appears.

Results

In the directory Racket (version 5.3.6) there are 4575 .zo files and 3145659 function calls. Each file has an average of 688 function calls.
Calls
 
Unique Functions Calls per Function Calls per File
Prim 2042289 65% 1033 4% 1977 446
NoPrim 1103370 35% 23992 96% 46 241
Total 3145659 25025 126 688
Although only 4% of the found functions are primitive, but 65% of the calls to refers to these functions. This is because each primitive function appears 1977 times, while each non-primitive function appears in only 46 calls. All of this is not very surprising because the optimizer inlines many non-primitive functions.
Counting the functions we obtained the following results. The non-primitive functions are in italics.
Pos Count Name Prim
1 309802 datum->syntax #t
2 305884 list #t
3 118896 cons #t
4 96201 unsafe-cdr #t
5 91229 null? #t
6 81452 car #t
7 77059 eq? #t
8 71870 unsafe-car #t
9 62725 _*keep-s-expr:p@(lib "scribble/private/manual-scheme.rkt") #f
10 57761 _to-element15.25:p@(lib "scribble/racket.rkt") #f
11 45041 vector #t
12 42930 syntax-e #t
13 42924 values #t
14 42063 pair? #t
15 41244 _to-paragraph31.31:p@(lib "scribble/racket.rkt") #f
16 38956 cdr #t
17 36267 _check-pre-part:P@(lib "scribble/doclang.rkt") #f
18 30222 + #t
19 29284 _find-method/who:p@(lib "racket/private/class-internal.rkt") #f
20 26117 _*Value:f@(lib "typed-racket/rep/type-rep.rkt") #f
21 25417 _make-var-id:mk@(lib "scribble/racket.rkt") #f
22 22954 _flat-named-contract:p@(lib "racket/contract/private/misc.rkt") #f
23 19851 vector-ref #t
24 19697 srcloc #t
25 18502 list? #t
The most popular functions are datum->syntax and list. They appear almost three times more than the next functions. The following functions also handle lists. I think this is because the code of Racket many functions that are the definition macros. When you compile something simple like
{define-syntax-rule (repeat n
                       body ...)
  (for ([i (in-range n)])
     body ...)}
we get an expanded version that uses primitive functions like datum->syntax and much list management code. I guess this in because we are analyzing the code that defines Racket from a more simplified version of Racket and therefore uses many macros. Also, in the code there are also several aditonal languages built on top of Racket, adding more macros. I guess the normal code has usually less macros.
To see the counting of all the function, we can order them from the most common to the least common and plot the number of occurrences of each one. To see the general behavior clearly, we use logarithmic scales on both axes. On this scale the straight lines represent functions of the type xk.

We see that the only big jump is between the first two functions and the rest, the x3 corresponds to half scale mark. The primitive and non-primitive functions are fairly interleaved, although there are many non-primitive that appear only once.
We can divide it into two groups. The amount of occurrences of the first 100 follow a formula of the form x-0.997. Instead the others follow a formula of the form x-1.53. The coefficients are about -1.0 and -1.5. There seems to be a change in behavior at that point, but I don't know the reason. It may be only numerology and a direct application of the Mar's law: Everything is linear if plotted log-log with a fat magic marker. My corollary is:. It's easier with two markers.

Modifying decompile

The bytecode can translated to an almost human readable format using decopile. It's not really a direct translation, because some of the bytecode instructions do not have a direct representation in Racket. Do not take it literally, but it is generally understandable and it gives a good idea of what is is optimized and what remained.
This is much similar to what we want to do, so we just take the source code of decompile, change all decompile to explore and start making changes. For example:
  • No need to see what's in the provide and require, because they only serve to connect the functions of a module with functions in another module, not running code.
  • No need to see what's inside of the syntax, because they have data that is used to generate new code in macros.
  • Some functions have two versions, one for running and one for inlining. We ignore the inlining version.
  • We ignore direct values as numbers, #t and #f, strings, quote.
  • We don't need to return any results, so we remove the lists representing the decompiled code.
The following is one of the main functions. The arguments are:
  • expr is the expression to be analyzed.
  • glob have some global definitions, for example the variables (and function) and globally defined and defined in other modules.
  • stack has local variables, roughly corresponds to what is usually saved on the stack
  • closed is has the closed functions that were analyzed analized
The idea is simply to look at every possible structure and recursively analyze the parts that interest us.
{define (explore-expr expr globs stack closed); short version, with some omissions.
  (match expr
    [(struct assign (id rhs undef-ok?))
     (explore-expr rhs globs stack closed)]
    [(? lam?)
     (explore-lam expr globs stack closed)]
    [(struct case-lam (name lams))
     (for ([lam (in-list lams)])
       (explore-lam lam globs stack closed))]
    [(struct let-one (rhs body type unused?))
     (let ([id (or (extract-id rhs) (gensym (or type (if unused? 'unused 'local))))])
       (explore-expr rhs globs (cons id stack) closed)
       (explore-expr body globs (cons id stack) closed))]
    [(struct let-void (count boxes? body))
     (let ([ids (extract-ids body count)])
       (let ([vars (for/list ([id (in-list ids)])
                     (or id (gensym (if boxes? 'localvb 'localv))))])
         (explore-expr body globs (append vars stack) closed)))]
    [(struct let-rec (procs body))
     (begin
       (for ([proc (in-list procs)])
         (explore-expr proc globs stack closed))
       (explore-expr body globs stack closed))]
    [(struct install-value (count pos boxes? rhs body))
     (begin
       (explore-expr rhs globs stack closed)
       (explore-expr body globs stack closed))]
    [(struct boxenv (pos body))
     (explore-expr body globs stack closed)]
    [(struct branch (test then else))
     (begin
       (explore-expr test globs stack closed)
       (explore-expr then globs stack closed)
       (explore-expr else globs stack closed))]
    [(struct application (rator rands))
     (let ([vars (for/list ([i (in-list rands)]) (gensym 'rand))])
       (explore-var rator globs (append vars stack) closed)
       (explore-expr rator globs (append vars stack) closed)
       (for ([rand (in-list rands)])
         (explore-expr rand globs (append vars stack) closed)))]
    [(struct apply-values (proc args-expr))
     (begin
       (explore-var proc globs stack closed)
       (explore-expr proc globs stack closed)
       (explore-expr args-expr globs stack closed))]
    [(struct seq (exprs))
     (for ([expr (in-list exprs)])
       (explore-expr expr globs stack closed))]
    [(struct beg0 (exprs))
     (for ([expr (in-list exprs)])
       (explore-expr expr globs stack closed))]
    [(struct closure (lam gen-id))
     (unless (hash-ref closed gen-id #f)
       (hash-set! closed gen-id #t)
       (explore-expr lam globs stack closed))]
    [else (void)])}


We will recursively travel across the structure that represents the bytecode, reaching the applications of functions (the normal version and the apply-values version). In these two cases we call the new explore-var function that computes the function name and sends it to the main develop using a function stored in a parameter.
The parameters used to define almost-global variables. In general it is better to return the results as the result of the function. In this case I thought it was better to return the result using that parameter because the values were obtained very deeply  in recursive function calls and did not want to change to much the parameters of these functions. Another advantage is that you can show the variables as they found, instead of waiting until the end.
Also, you can change the function stored in the parameter, without having to modify the code of explore. The parameter code is much simpler and does not depend on the internal details of the bytecode, making it easy to separate them and only focus on the data visualization part.
(define current-explore-found (make-parameter void))

{define (explore-var expr globs stack closed)
  (let ([v (match expr
             [(struct toplevel (depth pos const? ready?))
              (list-ref/protect globs pos 'toplevel)]
             [(struct primval (id))
              (hash-ref primitive-table id {lambda () (error "unknown primitive")})]
             [(struct localref (unbox? offset clear? other-clears? type))
              (list-ref/protect stack offset 'localref)]
             [else #f])])
    ((current-explore-found) v))}


Looking for the .zo

The main program calls these functions to analyze each file zo..
First we define some auxiliary functions to find the Racket directory and filter the .zo files..
#lang racket
(require compiler/decompile
         compiler/zo-parse)
(require racket/match)
(require "explore.rkt")

{define (get-racket-directory)
  (let-values ([(dir file must-be-dir?)
                (split-path (find-system-path 'exec-file))])
    dir)}

{define (zo-file? f)
  (and (file-exists? f)
       (filename-extension f)
       (bytes=? (filename-extension f) #"zo"))}

We define a hash to store the results and a function that saves  there the symbols that appear, keeping only the interned to filter the arguments and local variables.
(define found (make-hasheq))

{define (count-founded v)
  (when (and v (symbol-interned? v))
    (hash-set! found v (add1 (hash-ref found v {lambda () #;(displayln v) 0})))
    #;(displayln (list v (hash-ref found v))))}

Now we combine them all, we get the files and we analyze them.
(parameterize ([current-explore-found count-founded])
  (for ([file (sequence-filter zo-file? (in-directory (get-racket-directory)))])
    #;(newline)
    (displayln file)
    {define program-zo (let ([port (open-input-file file)])
                        (begin0
                          (zo-parse port)
                          (close-input-port port)))}
    (explore program-zo)
    #;(pretty-display (decompile program-zo))
    #;(pretty-display program-zo)))

Finally, it sorts the data in the hash and displays it.
(newline)
(newline)
(for ([name/count (in-list (sort (hash->list found) > #:key cdr))])
  (display (cdr name/count)) (display " ") (displayln (car name/count)))
  

Ideas

  • I would like to analyze combinations of functions. For example, counting the expressions of the form (f (g ?)) ignoring the value of ?. Also combinations of special forms and functions. For example, in the if's
  • I deleted a lot of code of decompile, mostly to concentrate on understanding the part that I wanted to understand. It would be nice to have a version that keeps a bigger part of the initial information, so when we get to an interesting point we can see the decompiled version. (I should modify the decompile module to export the auxiliary functions and think what to do with closed functions (which are not closures).

Complete code

Complete code of the explore.rkt module that defines explore. It's based in the decompile module.
#lang racket/base
(require compiler/zo-parse
         syntax/modcollapse
         racket/port
         racket/match
         racket/list
         racket/path)

(provide explore
         current-explore-found)

;; ----------------------------------------
(define current-explore-found (make-parameter void))
;; ----------------------------------------

{define primitive-table
  ;; Figure out number-to-id mapping for kernel functions in `primitive'
  (let ([bindings
         (let ([ns (make-base-empty-namespace)])
           (parameterize ([current-namespace ns])
             (namespace-require ''#%kernel)
             (namespace-require ''#%unsafe)
             (namespace-require ''#%flfxnum)
             (namespace-require ''#%extfl)
             (namespace-require ''#%futures)
             (for/list ([l (namespace-mapped-symbols)])
               (cons l (with-handlers ([exn:fail? {lambda (x) #f}])
                         (compile l))))))]
        [table (make-hash)])
    (for ([b (in-list bindings)])
      (let ([v (and (cdr b)
                    (zo-parse
                     (open-input-bytes
                      (with-output-to-bytes
                          {lambda () (write (cdr b))}))))])
        (let ([n (match v
                   [(struct compilation-top (_ prefix (struct primval (n)))) n]
                   [else #f])])
          (hash-set! table n (car b)))))
    table)}

{define (list-ref/protect l pos who)
  (list-ref l pos)
  #;
  (if (pos . < . (length l))
      (list-ref l pos)
      `(OUT-OF-BOUNDS ,who ,pos ,(length l) ,l))}

;; ----------------------------------------

;; Main entry:
{define (explore top)
  (let ([stx-ht (make-hasheq)])
    (match top
      [(struct compilation-top (max-let-depth prefix form))
       (let ([globs (explore-prefix prefix stx-ht)])
         (explore-form form globs '(#%globals) (make-hasheq) stx-ht))]
      [else (error 'explore "unrecognized: ~e" top)]))}

{define (explore-prefix a-prefix stx-ht)
  (match a-prefix
    [(struct prefix (num-lifts toplevels stxs))
     (let ([lift-ids (for/list ([i (in-range num-lifts)])
                       (gensym 'lift))]
           [stx-ids (map (lambda (i) (gensym 'stx))
                         stxs)])
       (append
        (map {lambda (tl)
               (match tl
                 [#f '#%linkage]
                 [(? symbol?) (string->symbol (format "_~a" tl))]
                 [(struct global-bucket (name))
                  (string->symbol (format "_~a" name))]
                 [(struct module-variable (modidx sym pos phase constantness))
                  (if (and (module-path-index? modidx)
                           (let-values ([(n b) (module-path-index-split modidx)])
                             (and (not n) (not b))))
                      (string->symbol (format "_~a" sym))
                      (string->symbol (format "_~s~a@~s~a"
                                              sym
                                              (match constantness
                                                ['constant ":c"]
                                                ['fixed ":f"]
                                                [(function-shape a pm?)
                                                 (if pm? ":P" ":p")]
                                                [(struct-type-shape c) ":t"]
                                                [(constructor-shape a) ":mk"]
                                                [(predicate-shape) ":?"]
                                                [(accessor-shape c) ":ref"]
                                                [(mutator-shape c) ":set!"]
                                                [else ""])
                                              (mpi->string modidx)
                                              (if (zero? phase)
                                                  ""
                                                  (format "/~a" phase)))))]
                 [else (error 'explore-prefix "bad toplevel: ~e" tl)])}
             toplevels)
        stx-ids
        (if (null? stx-ids) null '(#%stx-array))
        lift-ids))]
    [else (error 'explore-prefix "huh?: ~e" a-prefix)])}



{define (mpi->string modidx)
  (cond
   [(symbol? modidx) modidx]
   [else
    (collapse-module-path-index modidx (build-path
                                        (or (current-load-relative-directory)
                                            (current-directory))
                                        "here.rkt"))])}

{define (explore-module mod-form orig-stack stx-ht mod-name)
  (match mod-form
    [(struct mod (name srcname self-modidx prefix provides requires body syntax-bodies unexported
                       max-let-depth dummy lang-info internal-context flags pre-submodules post-submodules))
     (let ([globs (explore-prefix prefix stx-ht)]
           [stack (append '(#%modvars) orig-stack)]
           [closed (make-hasheq)])
       `(,mod-name ,(if (symbol? name) name (last name)) ....
           ,@(if (null? flags) '() (list `(quote ,flags)))
           ,@(let ([l (apply
                       append
                       (for/list ([req (in-list requires)]
                                  #:when (pair? (cdr req)))
                         (define l (for/list ([mpi (in-list (cdr req))])
                                     (define p (mpi->string mpi))
                                     (if (path? p)
                                         (let ([d (current-load-relative-directory)])
                                           (path->string (if d
                                                             (find-relative-path (simplify-path d #t)
                                                                                 (simplify-path p #f)
                                                                                 #:more-than-root? #t)
                                                             p)))
                                         p)))
                         (if (eq? 0 (car req))
                             l
                             `((,@(case (car req)
                                    [(#f) `(for-label)]
                                    [(1) `(for-syntax)]
                                    [else `(for-meta ,(car req))])
                                ,@l)))))])
               (if (null? l)
                   null
                   `((require ,@l))))
          ,@(for/list ([submod (in-list pre-submodules)])
              (explore-module submod orig-stack stx-ht 'module))
          ,@(for/list ([b (in-list syntax-bodies)])
              (let loop ([n (sub1 (car b))])
                (if (zero? n)
                    (cons 'begin
                          (for/list ([form (in-list (cdr b))])
                            (explore-form form globs stack closed stx-ht)))
                    (list 'begin-for-syntax (loop (sub1 n))))))
          ,@(map {lambda (form)
                   (explore-form form globs stack closed stx-ht)}
                 body)
          ,@(for/list ([submod (in-list post-submodules)])
              (explore-module submod orig-stack stx-ht 'module*))))]
    [else (error 'explore-module "huh?: ~e" mod-form)])}

{define (explore-form form globs stack closed stx-ht)
  (match form
    [(? mod?)
     (explore-module form stack stx-ht 'module)]
    [(struct def-values (ids rhs))
     (if (inline-variant? rhs)
         (explore-expr (inline-variant-direct rhs) globs stack closed)
         (explore-expr rhs globs stack closed))]
    [(struct def-syntaxes (ids rhs prefix max-let-depth dummy))
     (let ([globs (explore-prefix prefix stx-ht)])
       (explore-form rhs globs '(#%globals) closed stx-ht))]
    [(struct seq-for-syntax (exprs prefix max-let-depth dummy))
     (let ([globs (explore-prefix prefix stx-ht)])
       (for/list ([rhs (in-list exprs)])
         (explore-form rhs globs '(#%globals) closed stx-ht)))]
    [(struct seq (forms))
     (map {lambda (form)
            (explore-form form globs stack closed stx-ht)}
          forms)]
    [(struct splice (forms))
     (map {lambda (form)
            (explore-form form globs stack closed stx-ht)}
          forms)]
    [(struct req (reqs dummy))
     (void)]
    [else
     (explore-expr form globs stack closed)])}

{define (extract-name name)
  (if (symbol? name)
      (gensym name)
      (if (vector? name)
          (gensym (vector-ref name 0))
          #f))}

{define (extract-id expr)
  (match expr
    [(struct lam (name flags num-params arg-types rest? closure-map closure-types tl-map max-let-depth body))
     (extract-name name)]
    [(struct case-lam (name lams))
     (extract-name name)]
    [(struct closure (lam gen-id))
     (extract-id lam)]
    [else #f])}

{define (extract-ids body count)
  (define (extract-ids! body ids)
    (match body
      [(struct let-rec (procs body))
       (for ([proc (in-list procs)]
             [delta (in-naturals)])
         (when (< -1 delta (vector-length ids))
           (vector-set! ids delta (extract-id proc))))
       (extract-ids! body ids)]
      [(struct install-value (val-count pos boxes? rhs body))
       (extract-ids! body ids)]
      [(struct boxenv (pos body))
       (extract-ids! body ids)]
      [else #f]))
  (define ids (make-vector count #f))
  (extract-ids! body ids)
  (vector->list ids)}

{define (explore-var expr globs stack closed)
  (let ([v (match expr
             [(struct toplevel (depth pos const? ready?))
              (list-ref/protect globs pos 'toplevel)]
             [(struct primval (id))
              (hash-ref primitive-table id {lambda () (error "unknown primitive")})]
             [(struct localref (unbox? offset clear? other-clears? type))
              (list-ref/protect stack offset 'localref)]
             [else #f])])
    ((current-explore-found) v))}

{define (explore-expr expr globs stack closed)
  (match expr
    [(struct assign (id rhs undef-ok?))
     (explore-expr rhs globs stack closed)]
    [(? lam?)
     (explore-lam expr globs stack closed)]
    [(struct case-lam (name lams))
     (for ([lam (in-list lams)])
       (explore-lam lam globs stack closed))]
    [(struct let-one (rhs body type unused?))
     (let ([id (or (extract-id rhs) (gensym (or type (if unused? 'unused 'local))))])
       (explore-expr rhs globs (cons id stack) closed)
       (explore-expr body globs (cons id stack) closed))]
    [(struct let-void (count boxes? body))
     (let ([ids (extract-ids body count)])
       (let ([vars (for/list ([id (in-list ids)])
                     (or id (gensym (if boxes? 'localvb 'localv))))])
         (explore-expr body globs (append vars stack) closed)))]
    [(struct let-rec (procs body))
     (begin
       (for ([proc (in-list procs)])
         (explore-expr proc globs stack closed))
       (explore-expr body globs stack closed))]
    [(struct install-value (count pos boxes? rhs body))
     (begin
       (explore-expr rhs globs stack closed)
       (explore-expr body globs stack closed))]
    [(struct boxenv (pos body))
     (explore-expr body globs stack closed)]
    [(struct branch (test then else))
     (begin
       (explore-expr test globs stack closed)
       (explore-expr then globs stack closed)
       (explore-expr else globs stack closed))]
    [(struct application (rator rands))
     (let ([vars (for/list ([i (in-list rands)]) (gensym 'rand))])
       (explore-var rator globs (append vars stack) closed)
       (explore-expr rator globs (append vars stack) closed)
       (for ([rand (in-list rands)])
         (explore-expr rand globs (append vars stack) closed)))]
    [(struct apply-values (proc args-expr))
     (begin
       (explore-var proc globs stack closed)
       (explore-expr proc globs stack closed)
       (explore-expr args-expr globs stack closed))]
    [(struct seq (exprs))
     (for ([expr (in-list exprs)])
       (explore-expr expr globs stack closed))]
    [(struct beg0 (exprs))
     (for ([expr (in-list exprs)])
       (explore-expr expr globs stack closed))]
    [(struct with-cont-mark (key val body))
      (begin
        (explore-expr key globs stack closed)
        (explore-expr val globs stack closed)
        (explore-expr body globs stack closed))]
    [(struct closure (lam gen-id))
     (unless (hash-ref closed gen-id #f)
       (hash-set! closed gen-id #t)
       (explore-expr lam globs stack closed))]
    [else (void)])}

{define (explore-lam expr globs stack closed)
  (match expr
    [(struct closure (lam gen-id)) (explore-lam lam globs stack closed)]
    [(struct lam (name flags num-params arg-types rest? closure-map closure-types tl-map max-let-depth body))
     (let ([vars (for/list ([i (in-range num-params)]
                            [type (in-list arg-types)])
                   (gensym (format "~a~a-"
                                   (case type
                                     [(ref) "argbox"]
                                     [(val) "arg"]
                                     [else (format "arg~a" type)])
                                   i)))]
           [rest-vars (if rest? (list (gensym 'rest)) null)]
           [captures (map {lambda (v)
                            (list-ref/protect stack v 'lam)}
                          (vector->list closure-map))])
       (explore-expr body globs (append captures vars rest-vars) closed))])}