Improving Internal Call Machinery

In this, today’s edition of PSP CLR project updates, I’m going to discuss the internal call improvements I made over the weekend. Reminder: an internal call, in the context of the Common Language Infrastructure, is a method (or an invocation of the same) whose body is provided by the runtime itself. It is not to be confused with a call of a method with the internal access specifier.

The original internal call implementation in PSPCLR was a huge hack, and existed only so that I could provide some semblance of a Console.Write method to print “Hello, World!” on the screen. The first PSPCLR loader I discussed publicly was important mainly because it could finally parse all the CLR metadata successfully, not because it could do anything useful. The initial internal call dispatch system was designed around the method’s qualified name: I’d planned on simply storing a std::map that associated names with function pointers, and doing a map lookup and invoke when I ran into a method with the internal call flag set.

The immediate problem with that approach is that it is string-based — fully-qualified names alone are not sufficient to cope with overloaded methods, so some sort of encoding would be needed to bake the parameter types into the map key. Since these would all be in string literals on the CLR side, compile-time checking of the formatting of those literals was impossible. Additionally, the dispatch being a map lookup meant that the dispatch time (at least for the first call) was going to be logarithmic.
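To make the problem concrete, here is a minimal sketch of what such a name-keyed table might have looked like. The InternalCallFn signature, the function names, and the “(String)” / “(Int32)” key encoding are all invented for illustration; none of this is the actual PSPCLR code.

```cpp
#include <map>
#include <string>

// Hypothetical signature for a runtime-provided method body.
using InternalCallFn = void (*)();

// Hypothetical native implementations of two Console.Write overloads.
void consoleWriteString() {}
void consoleWriteInt32() {}

// Keyed by qualified name plus a hand-written parameter encoding.
// The key format is made up for this sketch; nothing checks it at compile time.
const std::map<std::string, InternalCallFn>& internalCallTable() {
    static const std::map<std::string, InternalCallFn> table = {
        {"System.Console::Write(String)", &consoleWriteString},
        {"System.Console::Write(Int32)", &consoleWriteInt32},
    };
    return table;
}

// Logarithmic-time lookup. A missing or misspelled key is only
// discovered at run time, where it shows up as a null pointer.
InternalCallFn lookupInternalCall(const std::string& key) {
    const auto& table = internalCallTable();
    const auto it = table.find(key);
    return it == table.end() ? nullptr : it->second;
}
```

Note that a key such as "System.Console::Write(int32)" (wrong capitalization) compiles without complaint and simply fails to resolve when the call is dispatched.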

Fortunately, I don’t think this approach got very far off the ground at all. It’s entirely likely that the system was simply hard-coded to accept only Write and crash spectacularly for any other calls (it’s been a while, and my memory of the subject is fuzzy).

An Invariant

The thing is, since the CLR is going to provide the bodies of internal calls, the set of calls needs to be defined at the time the CLR is built. In theory one could decorate any old method with the internal call attribute, but in practice the results are not going to be useful — effectively, only the assemblies that support the CLR itself can have internal calls. Thus, reflecting over the support assemblies at build time and generating the internal call dispatch machinery is an option — as it turns out, a very attractive option, since it gives us constant time dispatch and compile (actually, link) time enforcement that all internal calls have backing functions in the CLR.

My plan was to simply use the .NET reflection interfaces in an MSBuild task, grab all the appropriate methods, and generate a function that returned pointers to the appropriate implementations based on the method’s index in the assembly (which is now fixed, since the assembly has been compiled). Interestingly enough, however, that won’t work. There are some places where the managed reflection APIs cannot go. Apparently they attempt to load the types to a certain extent, even if you specify a “reflection only load” of the assembly, and they produce errors for some types when they can’t create a mapping to the System.Object type they expect. This happens for a lot of those core types I discussed last time, and obnoxiously enough, most of those types will eventually need internal calls.

COM to the Rescue

It turns out there is a solution, fortunately: there is a native COM API for reflecting CLI metadata without actually loading anything. IMetaDataImport in particular has the good stuff.

The API is a little cumbersome to use, but not horribly so, especially if you’ve already been knee-deep in the relational swamp of CLR data tables for a while anyway. I did not want to mess around with more native code than I had to, so I took advantage of the COM interop facilities offered by .NET and built a wrapper. I must give credit to Petrény Zsolt for having some posts in his blog archives that explained the process of wrapping the native reflection APIs quite well. The only thing that really threw me there was that the methods in the wrapper must be defined in the correct order; reordering them broke the interface.

So, with the COM wrangling out of the way, I was able to produce a build task for mapping internal call implementations to their MethodDef objects at construction time. The generated code is, as I hinted at before, a massive switch statement that returns function pointers. The name of each function is suitably mangled such that it can uniquely identify overloaded methods (it looks something like “internalCall_System_Console_Write_taking_Int32”). The generated code declares the prototypes of those functions so that it can return them, but does not implement any of them; it expects the implementations to be provided elsewhere. That way, if you don’t provide one, you get a linker error, which should help ensure that the CLR loader never lacks support for an internal call defined in its core library.
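The shape of that generated code can be sketched roughly as follows. This is an illustration under assumptions, not the real generator output: the InternalCallFn signature, the getInternalCall name, and the row numbers 17 and 18 are made up, and the String overload is added only to show two mangled names side by side.

```cpp
#include <cstdint>

// Hypothetical common signature; a real runtime would pass call state.
using InternalCallFn = void (*)();

// ---- What the build task would generate (sketch) ----
// Prototypes only: the generator emits no bodies for these.
void internalCall_System_Console_Write_taking_String();
void internalCall_System_Console_Write_taking_Int32();

// Maps a MethodDef row number to its native implementation.
// The row numbers here are invented for the example.
InternalCallFn getInternalCall(std::uint32_t methodDefRow) {
    switch (methodDefRow) {
        case 17: return &internalCall_System_Console_Write_taking_String;
        case 18: return &internalCall_System_Console_Write_taking_Int32;
        default: return nullptr;  // not an internal call
    }
}

// ---- What is written by hand elsewhere in the runtime ----
// Removing either definition leaves the address taken above unresolved,
// so the build should fail at link time rather than at run time.
void internalCall_System_Console_Write_taking_String() {}
void internalCall_System_Console_Write_taking_Int32() {}
```

Because the switch is over small integers, a compiler will typically lower it to a jump table or a short chain of comparisons, which is where the constant-time dispatch comes from.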

What’s Next?

In the course of rejiggering all this stuff, I stumbled on two serious problems that have yet to be addressed. First, I have no sane support for strings. Second, even though the internal call system supports it, I can’t actually call overloaded methods via a MemberRef token (which happens when the method being called is not in the executing assembly).

Both of these problems are rooted in the whole “quick hack to get Hello, World on the screen” problem I mentioned above. I’ll probably tackle one or both of these issues in the coming week.

Posted 01/01/1970 in Development.