Partitioning Your Code Base Through .NET Assemblies and Visual Studio Projects

Should every Visual Studio project really be in its own assembly? And what does 'Copy Local=True' really mean? Patrick Smacchia is no stranger to large .NET projects, and is well placed to lay a few myths to rest and to give advice that promises up to a tenfold increase in compilation speed.

This article is aimed at

  • Providing a list of DOs and DON’Ts when it comes to partitioning a code base into .NET assemblies and Visual Studio projects.
  • Shedding light on .NET code componentization and packaging.
  • Suggesting ways of organizing the development environment more effectively.

The aim of this advice is to increase the speed of .NET developer tools, including VS and the C# and VB.NET compilers, by up to an order of magnitude, merely by rationalizing the development of a large code base. This will significantly increase productivity and decrease the maintenance cost of the .NET application.

This advice is gained from years of real-world consulting and development work and has proved to be effective in several settings and on many occasions.

Why create another .NET assembly?

The design of Visual Studio .NET (VS) has always encouraged the idea that there is a one-to-one correspondence between assemblies and VS projects. It is easy to assume from using the Visual Studio IDE that VS projects are components of your application, and that you can create projects on a whim, since by default VS proposes to take care of managing project dependencies.

Assemblies, .exe and .dll files, are physical units: they are units of deployment. By contrast, a component is better understood as a logical unit of development and testing. A component is therefore a finer-grained concept than an assembly: an assembly will typically contain several components.

Today, most .NET development teams end up having hundreds, and possibly thousands, of VS projects. The task of maintaining a one-to-one relationship between assembly and component has these consequences:

  • Developers’ tools will slow down by up to an order of magnitude. The whole stack of .NET tooling infrastructure, including VS and the C# and VB.NET compilers, works much faster with fewer, larger assemblies than it does with many smaller assemblies.
  • Deployment packaging will become a complex task and therefore more error-prone.
  • Installation time and application start-up time will increase because of the overhead cost per assembly.
  • In the case of an API whose public surface is spread across several assemblies, client API consumers will waste time figuring out which assemblies to reference.

All these common .NET development problems are a consequence of using a physical unit, the assembly, to implement a logical concept, the component. So, if we shouldn’t automatically create a new assembly for each component, what are the good reasons to create an assembly? And what common practices don’t constitute good reasons to do so?

Common valid reasons to create an assembly

  • Tier separation, if there is a requirement to run different pieces of code in different AppDomains, different processes, or different machines. The idea is to avoid overwhelming the precious Windows process memory with large pieces of code that are not needed. In this case, an assembly is often created especially to contain the shared interfaces used for communication across tiers.
  • AddIn/PlugIn model, if there is a need for a physical separation of interface/factory/implementation. As with tier separation, an assembly is often dedicated to containing the shared interfaces used for communication between the plugin and its host environment (a sketch of such a shared-interfaces assembly follows this list).
  • Potential for loading large pieces of code on-demand: the CLR loads an assembly only when a type or a resource contained in it is needed for the first time. Because of this optimization, you don’t want to overwhelm your Windows process memory with large amounts of code that are seldom if ever required.
  • Framework features separation: with very large frameworks, users shouldn’t be forced to embed every feature into their deployment package. For example, most of the time an ASP.NET process doesn’t use Windows Forms and vice-versa, hence the need for the two assemblies System.Web.dll and System.Windows.Forms.dll. This is valid only for large frameworks with assemblies sized in MB.
    A quote from Jeremy Miller, renowned .NET developer, explains this perfectly:
     Nothing is more irritating to me than using 3rd party toolkits that force you to reference a dozen different assemblies just to do one simple thing.
  • Large pieces of code that don’t often evolve (often automatically generated code) can become a drain on developer productivity if they are continuously handled in the developer environment. It is better to isolate them in a dedicated assembly within a dedicated VS solution that only rarely needs to be opened and compiled on the developer workstation.
  • Test/application code separation. If only the assemblies are released rather than the source code, it is likely that tests should be nested in one or several dedicated test assemblies.
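As announced in the AddIn/PlugIn item above, here is a minimal C# sketch of a shared-interfaces assembly; all names are illustrative and the reflection details will vary:

    // MyApp.PluginContracts.dll -- the only assembly referenced by both
    // the host and the plugins; host and plugins never reference each other.
    public interface IPlugin
    {
        string Name { get; }
        void Execute();
    }

    // The host locates and loads plugin assemblies at run-time, e.g.:
    //   Assembly a = Assembly.LoadFrom(@"Plugins\MyPlugin.dll");
    // and instantiates the types implementing IPlugin through reflection.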

Common invalid reasons to create an assembly

  • Assembly as unit of development, or as a unit of test. Modern source control systems make it easy for several developers to work simultaneously on the same assembly (i.e. the same Visual Studio project). The unit should, in this case, be the source file. One might think that, by having fewer and bigger VS projects, you’d increase contention over sharing the VS .sln, .csproj and .vbproj files; but as usual, the best practice is to keep these files checked out just for the few minutes required to tweak project properties or add new empty source files.
  • Automatic detection of dependency cycles between assemblies by MSBuild and Visual Studio. It is important to avoid dependency cycles between components, but you can still do this if your components are not assemblies but subsets of assemblies. Tools such as NDepend can detect dependency cycles between components within assemblies.
  • Usage of internal visibility to hide implementation details. The public/internal visibility level is useful when developing a framework where it is necessary to hide the implementation details from the rest of the world. Your team is not the rest of the world, so you don’t need to create assemblies especially to hide implementation details from them.
    In order to prevent usage and restrict visibility of implementation details, a common practice is to define sub-namespaces named Impl, and to use tooling such as NDepend to restrict usage of the Impl sub-namespaces, as sketched below.
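For instance, the convention can look like this in C#; the names are illustrative, and the restriction is enforced by tooling rather than by the compiler:

    namespace MyCompany.Orders
    {
        // Public surface of the component, usable from the whole code base.
        public class OrderService { /* delegates to types in the Impl sub-namespace */ }
    }

    namespace MyCompany.Orders.Impl
    {
        // Implementation detail: by convention, only code in MyCompany.Orders
        // may use this type; a tool such as NDepend can flag any other usage.
        public class OrderRepository { /* ... */ }
    }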

Merging assemblies

There are two different ways to merge the contents of several assemblies into a single one.

  • Usage of the ILMerge tool to merge several assemblies into a single one. With ILMerge, merged assemblies lose their identity (name, version, culture, and public key).
  • Embedding several assemblies as resources in a single assembly, and using the AppDomain.CurrentDomain.AssemblyResolve event to load them at runtime (see the sketch after this list). The difference from ILMerge is that all assemblies keep their identity.
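Here is a minimal C# sketch of the second approach, assuming each merged assembly has been embedded as a resource named ‘<AssemblyName>.dll’; resource naming and error handling will vary:

    using System;
    using System.Reflection;

    static class EmbeddedAssemblyResolver
    {
        // Call once at startup, before any type from an embedded assembly is used.
        public static void Install()
        {
            AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
            {
                // Assumption: each merged assembly was embedded as a resource
                // named "<AssemblyName>.dll" in the executing assembly.
                string resource = new AssemblyName(args.Name).Name + ".dll";
                using (var stream = Assembly.GetExecutingAssembly()
                                            .GetManifestResourceStream(resource))
                {
                    if (stream == null) return null;
                    var raw = new byte[stream.Length];
                    stream.Read(raw, 0, raw.Length);
                    // Loaded from memory: the assembly keeps its identity.
                    return Assembly.Load(raw);
                }
            };
        }
    }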

Be aware, though, that merging assemblies this way doesn’t solve the problems caused by having too many VS projects: the significant slowdown of VS and of the compilers’ execution time.

Reducing the number of assemblies

Technically speaking, the task of merging the source code of several assemblies into one is a relatively light one that takes just a few hours. The tricky part is to define the proper new partition of code across assemblies. At that point you’ll certainly notice that there are groups of assemblies with high cohesion. These are certainly candidates to be merged together into a single assembly.

By looking at assembly dependencies with a Dependency Structure Matrix (DSM) such as the one in NDepend, these groups of cohesive assemblies form obvious square patterns around the matrix diagonal. Here is a DSM taken from more than 700 assemblies within a real-world application:

[Figure: NDepend Dependency Structure Matrix of more than 700 assemblies; groups of cohesive assemblies appear as square patterns around the diagonal]

 

Increase Visual Studio solution compilation performance

You can use a simple technique to reduce the compilation time of most real-world VS solutions by up to an order of magnitude, especially once you have already merged several VS projects into a few.

On a modern machine, the optimal performance of the C# and VB.NET compilers is about 20K logical lines of code per second, so you can measure the room for improvement ahead. A logical line of code (LoC) represents a sequence point. A sequence point is the code excerpt highlighted in dark red in the VS code editor window when you create a breakpoint. Most .NET developer tools, including VS and NDepend, measure lines of code through sequence points.

By default, VS stores each VS project in its own directory. Typically, VS suggests the following folder hierarchy for a project, named here MyVSProject:

[Figure: the default folder hierarchy that VS suggests for the project MyVSProject]

A VS solution typically has several VS projects and, by default, each VS project lives in its own directory hierarchy. At compilation time, each project builds its assembly in its own bin\Debug or bin\Release directory. By default, when a project A references a project B, the project B is compiled before A; however, the assembly B is then duplicated in the bin\Debug or bin\Release directory of A. This duplication is the consequence of the default value ‘True’ for the ‘Copy Local’ option of an assembly reference. Whereas this makes sense for a small solution, it soon causes problems for larger applications:

[Figure: at compile time, the assembly of a referenced project B is duplicated into project A’s bin\Debug or bin\Release directory]

As the size and complexity of solutions increase, this practice of duplicating assemblies at compilation time becomes extremely costly in terms of performance. In other words:

 

Copy Local = true is evil


Imagine a VS solution with 50 projects, and imagine also that there is a core project used by the 49 other projects. At Rebuild-All time, the core assembly is compiled first, and then duplicated 49 times. Not only is this a huge waste of disk space, but also of time. Indeed, the C# and VB.NET compilers don’t seem to have any checksum-and-caching algorithm to detect whether the metadata of an assembly has already been parsed. As a result, the core assembly has its metadata parsed 49 times; this takes a lot of time and can actually consume most of the compilation resources.

From now on, when adding a reference to a VS project, make sure, first, to add an assembly reference rather than a project reference and, second, that Copy Local is set to False (which is not always the case by default):

[Figure: the assembly reference properties, with Copy Local set to False]
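For reference, in the .csproj file such an assembly reference appears as a Reference element whose Private child element, the MSBuild name for Copy Local, is set to False; the assembly name and path below are illustrative:

    <Reference Include="MyCore">
      <HintPath>..\bin\Debug\MyCore.dll</HintPath>
      <!-- 'Private' is the MSBuild name for 'Copy Local' -->
      <Private>False</Private>
    </Reference>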

A slight drawback to referencing assemblies directly, rather than their corresponding VS projects, is that it is now your responsibility to define the build order of the VS projects. This can be achieved through the Project Dependencies panel:

[Figure: the Project Dependencies panel, where the build order of VS projects is defined]

Organize the development environment

When Copy Local is set to true, top-level assemblies, typically executable assemblies, automatically end up with the whole set of assemblies that they use duplicated in their own .\bin\Debug directory. When the user starts an executable assembly, it just works: there is no FileNotFoundException, since all the assemblies that are needed are in the same directory.

If you set ‘Copy Local = false’, VS will, unless you tell it otherwise, place each assembly alone in its own .\bin\Debug directory. Because of this, you will need to configure VS to place all assemblies together in the same directory. To do so, for each VS project, go to VS > Project Properties > Build tab > Output path, and set the Output path to ..\bin\Debug for the debug configuration, and ..\bin\Release for the release configuration.

[Figure: the project properties Build tab, with Output path set to ..\bin\Debug]
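If you prefer to edit the .csproj directly, the same setting is the MSBuild OutputPath property, defined once per configuration; a sketch for the usual Debug/Release pair:

    <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|AnyCPU' ">
      <OutputPath>..\bin\Debug\</OutputPath>
    </PropertyGroup>
    <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|AnyCPU' ">
      <OutputPath>..\bin\Release\</OutputPath>
    </PropertyGroup>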

Now that all assemblies of the solution reside in the same directory, there is no duplication and VS works and compiles much faster.

Organization of Assemblies

If there are many library assemblies and just a few executable assemblies, it might be useful to have only the executable ones in the output directory ..\bin\Debug (and in ..\bin\Release as well), with the library assemblies stored in a dedicated sub-directory ..\bin\Debug\Lib (and ..\bin\Release\Lib). This way, when users browse the directory, they see only the executables, without the dll assemblies, and can start any executable straight away. This is the strategy we adopted for the three NDepend executable assemblies:

[Figure: the NDepend installation directory, with the three executable assemblies at the root and library assemblies in the Lib sub-directory]

If you wish to nest libraries in a sub-lib directory, it is necessary to tell the CLR how to locate, at run-time, the library assemblies in the sub-directory .\Lib. For that you can use the AppDomain.CurrentDomain.AssemblyResolve event, this way:
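The following is a minimal sketch of such a program; the Lib directory name comes from the layout above, while the resolver details are illustrative and error handling is omitted:

    using System;
    using System.IO;
    using System.Reflection;

    static class Program
    {
        static void Main()
        {
            // Bind the resolve event first, then transfer control to SubMain().
            AppDomain.CurrentDomain.AssemblyResolve += ResolveFromLibDir;
            SubMain();
        }

        // Library types must not appear before SubMain(): if they were used in
        // Main(), the CLR would try to resolve the library assemblies before
        // Main() runs, hence before the AssemblyResolve event is bound.
        static void SubMain()
        {
            // ... application code that uses types from the .\Lib assemblies ...
        }

        static Assembly ResolveFromLibDir(object sender, ResolveEventArgs args)
        {
            // Build the .\Lib sub-directory path from the executing assembly location.
            string exeDir = Path.GetDirectoryName(
                Assembly.GetExecutingAssembly().Location);
            string path = Path.Combine(
                exeDir, "Lib", new AssemblyName(args.Name).Name + ".dll");
            return File.Exists(path) ? Assembly.LoadFrom(path) : null;
        }
    }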

In this piece of code you will notice how we:

  • construct the path of the sub-directory by relying on the property Assembly.GetExecutingAssembly().Location;
  • bind the AppDomain.CurrentDomain.AssemblyResolve event immediately in the Main() method, and then call a SubMain() method;
  • need the SubMain() method because, if library types were called from the Main() method, the CLR would try to resolve the library assemblies even before the method Main() is called, hence even before the AppDomain.CurrentDomain.AssemblyResolve event is bound.

Instead of relying on the AppDomain.CurrentDomain.AssemblyResolve event, it is also possible to use an executableAssembly.exe.config file for each executable. Each file will redirect the CLR probing for assemblies to the sub-directory .\Lib. To do so, just add an Application Configuration File for each executable assembly and put the following content in it:
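A minimal sketch of such a configuration file, assuming the libraries live in the .\Lib sub-directory:

    <?xml version="1.0" encoding="utf-8" ?>
    <configuration>
      <runtime>
        <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
          <!-- Tell CLR assembly probing to also look in the Lib sub-directory -->
          <probing privatePath="Lib" />
        </assemblyBinding>
      </runtime>
    </configuration>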

 

The solution involving the AppDomain.CurrentDomain.AssemblyResolve event is usually preferable, because it avoids having to deploy the extra .config files in the redistributable.

Test assemblies organization

The other advantage of concentrating all the assemblies of your application in the same ..\bin\Debug directory is that test assemblies can be put directly under the ..\bin directory. This way, at test execution time, the test assemblies run against the application assemblies directly under the ..\bin\Debug directory. Therefore, it is not necessary to duplicate the application assemblies just for tests.

To do so, we suggest that you use the Application Configuration File trick we’ve just described to redirect CLR probing to sub-directories of the ..\bin directory. If you’ve put your NUnit assemblies (or equivalent) in a ..\bin\NUnit directory, the probing XML element of the Application Configuration File for each test assembly (which can be a library or an executable assembly) will look like this:
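Assuming the application assemblies are in ..\bin\Debug and ..\bin\Debug\Lib, and the NUnit assemblies in ..\bin\NUnit, a plausible probing element for a test assembly living in ..\bin is:

    <probing privatePath="Debug;Debug\Lib;NUnit" />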

The directories listed in the privatePath XML attribute can only be sub-directories of the current directory. This is why the directory ..\bin, which contains all the application assemblies in its sub-directories, is well suited to contain the test assemblies.

VS solution files and common build actions

When the code base reaches a certain size, you are forced to spread the code across several VS solutions. This is because, even with few VS projects and solid hardware, VS slows down significantly when hosting a solution with more than 50K LoC.

We recommend that you put all the VS solution files in the same root directory, the one that contains the .\bin folder described above. If each VS solution’s files are stored in their own hierarchy of folders, it quickly becomes a considerable task to find them.

It saves a great deal of time if you put a few .bat files in this root folder that execute common build actions. This way, the developer is one click away from:

  • Rebuilding all in Debug mode
  • Rebuilding all in Release mode
  • Running all tests and producing a report
  • Running all tests with code coverage and producing a report
  • Rebuilding all in Debug mode, running all tests and producing a report

Rebuilding a solution with a .bat file is as simple as writing in the .bat file:
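For example, a minimal sketch; the MSBuild path and solution name are illustrative:

    REM RebuildAllDebug.bat -- rebuild the solution in Debug configuration
    "%WINDIR%\Microsoft.NET\Framework\v4.0.30319\MSBuild.exe" MySolution.sln /target:Rebuild /property:Configuration=Debug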


Guidelines

  • Drastically reduce the number of assemblies in your code base.
  • Create a new assembly only when this is justified by a specific requirement for physical separation.
  • In a Visual Studio project, use ‘reference by assembly’ instead of ‘reference by Visual Studio project’.
  • Never use the Visual Studio referencing option ‘Copy Local = True’.
  • Put all VS solutions and build-action .bat files in a $rootDir$ directory.
  • Compile all assemblies into the directories $rootDir$\bin\Debug and $rootDir$\bin\Release.
  • Use the directory $rootDir$\bin to host test assemblies.



  • Per Salmi

    Third Party assemblies and references?
    Ok, this seems very interesting.

    But what to do with third party assemblies and the copy local = false setting?

    And when you say references should be added as ‘reference by assembly’, do you create the reference to the bin\Debug or bin\Release assembly then?

  • Patrick Smacchia

    Re: Third Party assemblies and references?
    Good questions indeed. For third-party assemblies, put them in a $rootDir$\bin\thirdparty directory.

    Then during your build process, before compiling, copy all of them into $rootDir$\bin\Debug\Lib and $rootDir$\bin\Release\Lib (and before that, make sure to clean the $rootDir$\bin\Debug and $rootDir$\bin\Release dirs).

    You have the choice to reference a third-party assembly from a VS project, from $rootDir$\bin\thirdparty or from $rootDir$\bin\Debug\Lib (the first option is a bit cleaner, since in this dir assemblies are more ‘immutable’).

    >And when you say references should be added as ‘reference by assembly’ do you create the reference to the bin\Debug or bin\Release assembly then?

    All dev-related things (tests, contracts…) are mostly concerned with the Debug build. So what we do is reference assemblies from $rootDir$\bin\Debug and $rootDir$\bin\Debug\Lib, even in the Release VS configuration.

    Notice that we also have an obfuscation phase that occurs after the whole compilation process. In that case, of course, assembly referencing is made in a special $rootDir$\bin\ReleaseObfuscated dir, but it is the work of the obfuscator to do this referencing automatically, at obfuscation time.

  • Anonymous

    Round peg square hole
    It sounds like a tooling issue, sadly. If you have a large, complex architecture, it really seems that you should use something else.

  • T

    Any Special instructions for Silverlight and Ria?
    Sounds Interesting…

    How would this idea apply to Silverlight Projects using assembly caching?

    Also any idea how to improve things when using Ria / Entity Framework to do the model and code generation?

  • Patrick Smacchia

    Re: Any Special instructions for Silverlight and Ria?
    I am not familiar with the Silverlight assembly caching technology. But if I understand it correctly, it is a technology to load extra Silverlight core runtime assemblies on-demand. Thus extra Silverlight core assemblies are referenced and downloaded on-demand automatically, isn’t it? How is it related to the topic of the article?

  • Frank Quednau

    http://realfiction.net
    Just a remark that may be a ReSharper limitation rather than anything else… when Copy Local = false in project reference properties, a test project fails to see the project reference it is coded against when using the ReSharper unit test runner.

  • Patrick Smacchia

    Re: http://realfiction.net
    Thanks Frank for the feedback. I use R# extensively but don’t use the R# testing features (still faithful to TestDriven.NET).

    Could you report that to the R# team? I don’t think it’d be difficult for them to fix: the referencing info is still there, it is just a matter of being a bit smart by comparing the assembly referenced and the VS project’s output assembly name.

  • Dan

    Many LoC
    Patrick, this is a great article, very comprehensive and well written.

    One question though, when you say “VS slows down significantly when hosting a VS solution with more than 50K LoC,” do you mean the IDE or the compile time?

  • Patrick Smacchia

    RE: Many LoC
    Dan, glad you found this content useful!

    >when you say “VS slows down significantly when hosting a VS solution with more than 50K LoC,” do you mean the IDE or the compile time?

    I have a powerful machine, with an SSD disk and everything, and in my experience, when a solution hosts more than 50K LoC, VS slows down significantly at sln load time and compile time, but also at code edit time.

    I guess having R# hosted in my VS biases this impression, and a naked VS should be faster.

    Also, my VS hosts NDepend, but NDepend begins to slow down at a higher threshold, like 500 KLoC.


  • David Brabant

    “References by assemblies” instead of “References by projects”
    Could you elaborate on why one should do that? I was convinced that the opposite was the way to go.

  • Patrick Smacchia

    http://www.NDepend.com
    >When you say “VS slows down significantly when hosting a VS solution with more than 50K LoC,” do you mean the IDE or the compile time?

    Dan, I mean that with a fast machine, having VS + popular add-ins like R# and NDepend, with more than 50K LoC VS slows down too much at sln loading time, compilation, and even other common tasks (code editing, some debugging operations…).

    To sustain developer productivity, the IDE shouldn’t freeze for more than 5 seconds. Otherwise the temptation for the developer to get distracted (coffee, facebook, email…) is too high.

  • Patrick Smacchia

    http://www.NDepend.com
    >”References by assemblies” instead of “References by projects” Could you elaborate on why one should do that?

    David, I am a firm believer that developers should understand a technology 100%. The problem with “References by projects” is the default VS behavior: assemblies get compiled in their own dir and CopyLocal is set to true. Then the developer needs to fix this wrong behavior.

    By using “References by assemblies”, the developer gets closer to the grunt work of fixing the VS default behavior. There is no apparent magic.

    Also, as stated in the article, with “References by assemblies” the developer needs to fix the Project Dependencies/Build Order manually. Some might see a loss of productivity there. Personally I prefer doing so, to gain more understanding of how VS works with my code base.

  • TriSys

    Copy Local = true is evil
    Patrick, you are a star – many thanks for this article. This has certainly helped with some of our VS2010 woes.

    Regards, Garry@TriSys

  • kubs

    References by assemblies
    Hi Patrick,
    I’m not sure I understand the “reference by assembly” issue. Do you mean that I should add a reference to an assembly by browsing to its output location (using ‘Add Reference’–>’Browse’ dialog)? if so, how do you distinguish between release/debug modes?

    Thanks!

  • DoingWhatEyeDo

    Another issue – Obfuscation
    Another issue with multiple assemblies to add under the list in section “Why create another .NET assembly?” is obfuscation. The more assemblies you have the more effort you have to put into obfuscation development and testing.

  • Patrick Smacchia

    http://www.NDepend.com
    >I’m not sure I understand the “reference by assembly” issue. Do you mean that I should add a reference to an assembly by browsing to its output location (using ‘Add Reference’–>’Browse’ dialog)?

    Yes, you need to browse to the Debug version of the assembly, which typically contains the public API of the Release version of the assembly.

    >if so, how do you distinguish between release/debug modes?

    There is no need to distinguish. A Release assembly is compiled by referencing the Debug assemblies already compiled. This doesn’t provoke any problem, since the Debug version of a referenced assembly contains the public API of the Release version of the assembly.

  • Patrick Smacchia

    http://www.NDepend.com
    >The more assemblies you have the more effort you have to put into obfuscation development and testing.

    100% agree. Whatever the obfuscation technology used, it is a sensitive task to set up the obfuscation process right, and the more assemblies are obfuscated, the more error-prone this process is.

  • Anonymous

    CopyLocal=false AND Output Path?
    In my experience, once you’ve configured the output path to be a common folder it’s not necessary to also set CopyLocal=false. VS is intelligent enough to see that the referenced project assembly has already been placed in the output path and so doesn’t copy it again.

    Am I wrong?

  • it3xl.ru

    Tip for Silverlight.
    It is simple for Silverlight.
    You should use another shared folder for projects targeting the Silverlight .NET Framework.
    For example, use ..\bin and a ..\bin\Silverlight for the Silverlight project and its class libraries.
    Usually Silverlight means another tier (client logic) in the same solution as a web-server tier (ASP.NET). That is why you must use different shared folders too.

  • juanagui

    Project reference + copy local false
    I’m using project references w/ copy local false, hence getting the build order for free plus no copies. Imho, this is the best combination, any reasons why you recommend using assembly references?
    Thanks,
    Juan


  • mglaze

    Confusing and misleading
    Hi Patrick, thank you for your informative article. I would like to register my complaint that the information presented here is confusing and misleading. I have just joined a development team (of 20+ developers on a fairly large codebase) who has apparently based their entire development process on the recommendations you’ve made in this article. Let me tell you: it is a disaster. One of the many disastrous results of this approach is that to work on this team, resharper is more or less a requirement – as it is the only way to effectively get around many of the limitations in the IDE due to file references.

    Your recommendations lead to confusion between the issue of copylocal=true/false and the issue of file/project references – these are two unrelated things. Just because project references give you copylocal=true by default doesn’t mean you must switch to file references to get copylocal=false.

    If you google for “project references vs file references” you will find numerous pieces of evidence, almost all supporting the proper usage of project references – so I won’t bother to repeat all of the reasons for that, as it is easily found. But suffice it to say, besides the well established majority opinion you can find on google, you will also find among the top few links Microsoft’s own recommendations, which have been to use project references for projects within solutions – since their original best practices articles in 2002, and into their latest documentation for VS 2012. I’m certainly not one to take all advice from Microsoft directly without considering better approaches – but in this case it seems to be fairly well accepted by the developer community as a whole, except for a few select individuals who have been misled by some of the ideas presented here.

    Your ideas about reducing the number of assemblies, optimising compile time, etc. are all appreciated – but throwing file references into that mix for no explainable reason is just confusing and misleading. It would be great if you would address the file references issue on its own, and perhaps either justify or retract that particular part of this statement. Thanks!

  • pckujawa

    Another vote for project references
    I found this article quite informative, but I have to agree with mglaze (Confusing and misleading) about project references. (And for completeness, http://msdn.microsoft.com/en-us/library/Ee817674(pandp.10).aspx is a link to MS’s own best practices on structuring solutions and projects.)

    I tried to implement the advice in this article and ran into a number of issues (one of which, as mentioned by other commentors, is what to do about separating debug and release, because “release is not debug” (http://www.hanselman.com/blog/ReleaseISNOTDebug64bitOptimizationsAndCMethodInliningInReleaseBuildCallStacks.aspx)). In the end, I’ve found that using project references makes life so, so much easier. But I think most of the rest of the advice in the article is great.