11 July 2014

Catching Bad Data in Entity Framework

Any website that rejects the users' input without giving enough information to correct what they're doing is doomed to be unpopular. Entity Framework offers three different ways of validating data before writing it to the database. As well as describing how to harness these validation methods Jon Smith shows how capture these errors to make the feedback to the user less cryptic.

Most of the time, bad data gets to a system because our users get things wrong with data entry. For them, we need to provide meaningful error messages to help them quickly correct their input otherwise they will get annoyed or, on ecommerce sites, they may give up and not buy anything. There are other ways that bad data will arrive: it could mean an attack by hackers, for example. Whatever the cause, we need to prevent bad data reaching the database.

Entity Framework’s (EF) data validation is part of the solution for catching bad data in an application. By default, EF validates all data before it is written to the database, using a wide range of data validation methods. However, because EF comes after the user-interface data validation, its error feedback is often relegated to catching any exceptions that  EF throws and showing a generic ‘there was a problem’ message. I want to show you how you can use EF’s data validation to improve your error checking and how to pass EF’s error messages back to the user. That way you have a cleaner database and, hopefully, less frustrated users.

The many levels of EF’s Data Validation

In EF there are four possible checks that can be applied to data before it is written to the database. Three of these are inside EF and are easy to set up while the last one involved capturing errors reported by the underlying database.

The four EF data validations are:

  • Data Annotations, e.g. [MinLength(100)] to check the length of a string.
  • IValidatableObject. This is a method where to can code your own checks.
  • ValidateEntity. A DbContext method that allows access to the database while checking.
  • DbUpdateException Capturing database-generated errors

By default, EF checks for all three of these data validations on any entry that has been added or modified. If found it will run the requested validations and only commit the database update if all checks pass. 

To help you understand how to use each of these techniques, let me go through each of them in turn with an example.

Data Annotations

Data annotations are attributes that add validation rules to a property or class (see System.ComponentModel.DataAnnotations). These data annotations define rules that a separate validation stage checks against the current content of the property. Many of the .NET frameworks include validation via data annotations, like Windows 8 apps and ASP.NET MVC. (see Dino Esposito’s article on data annotations in MVC https://www.simple-talk.com/dotnet/asp.net/asp.net-mvc-annotated-for-input/). Like these other .NET frameworks EF also validates data against any data annotations before it commits that data to the database.  

Below is a simple example of a database to hold blog posts where we have a Tag class.. Tags are often added to Posts to indicate which topics the post is about, e.g. ‘SQL Server’ or ‘EF6’. I have also included a property called Slug, which in this context is the part of a URL which identifies a page using human-readable keywords.

The data annotations are the items in square brackets above the properties it relates to. [MaxLength] is fairly obvious, in that it sets the maximum length of the string (there is also a MinLength too). [Required] means that the property cannot be blank and [RegularExpression] allows more complex tests on strings.

I should also point out that in EF Code First, where you define the database through code, some of these attributes also define the table structure, e.g. [Required] means the column will not be NULL and [MaxLength] sets the size of the string in the database.

IValidatableObject

IValidatableObject is an interface that can be applied to any class. You then add a ‘Validate' method which then allows more complex tests to be written than can be achieved the data annotations. Again, the same validation stages in MVC, EF etc. that check data annotations will also validate any classes that have the IValidatableObject interface. Let’s look at a simple example by adding a Validate method to our Tag class:

As you can see. you have access to the properties of the class instance you are checking. In this sample test I check that the slug relates to the Name of the tag, so a Tag with a Name of “News HeadLines” could have a Slug of “news”, but not of “ReadAllAboutIt”.

As you can imagine, you can write almost any test you like as long as it only needs to access members of the class. I have personally used it to check some complex data where the state of one property in a class depends on the state of another.

The really nice thing about IValidatableObject is because many other .NET frameworks use this, such as ASP.NET MVC then any tests you write here are also checked closer to the user interface, so feedback is quicker.

ValidateEntity in EF’s DbContext

ValidateEntity is more complex than the previous validation procedures, but it can check things that nothing else can. Therefore it is a good tool to have in your armoury. If you are using Code First EF then you will be defining your own DbContext to put your DbSets in. It is in your DbContext that you can override the ValidateEntity method and add your own tests. The powerful thing about ValidateEntity is that you can access the current DbContext, which means you can look at the database when doing your validation.

Below is a check to see if the Slug in our Tag class is unique, which is actually something you would want to do on a real Tag if you were using the slug to make a unique URL.

If you look at the test below you can see that I have to check that the entity is a Tag and that it is being added or updated. If it is then I can check that no other tags, other than itself, have that slug.

One of our readers has pointed out that this check is insufficient in a multi-user environment because the test and commit are not simultaneous. This is quite true and the database would need some form of UNIQUE constraint to ensure the database was correct. In the next section I show how database generated errors can also be captured, although the error messages may not be quite as clear as the one above.

DbUpdateException during SaveChanges

EF works on top of a database and that database will have its own data integrity rules, which are the ultimate arbitrator of whether the database is changed. When EF’s SaveChanges() or SaveChangesAsync() methods are called if the database encounters any errors then the whole commit is rolled back and the errors are passed back via an exception. By capturing this exception we can gain access to the database specific errors. However it is quite complex to unravel this error information so let me describe the parts first, using SQL Server as our target database provider.

Because EF is designed to work with multiple database providers then the structure of the System.Data.Entity.Infrastructure.DbUpdateException is layered.

  1. DbUpdateException: This says that saving changes to the database failed. It contains a list of all entries that were in this failed commit. Its inner exception is System.Data.UpdateException.
  2. UpdateException: This says that the modifications could not be persisted to the database. Its inner exception is dependent on type of database it is connected to. In this example SQL Server returns a System.Data.SqlClient.SqlException. A different exception type would be returned for other database providers such as MySQL.
  3. SqlException contains information on all the errors that occurred inside the SQL database. There are a number of ways of accessing this information via the SqlException's Number, Errors properties or the more general exception Data property.

There are various problems with decoding these errors. First and foremost these are internal database errors and therefore are a) not user friendly and b) may contain information on the internals of the system which we do not want to reveal. For this reason we need to interpret them and turn them into useful messages. However this can be a challenge as, unlike the three previous approaches, here we cannot be sure exactly what entity or property each error is linked to.

Below is an example that traps two types SQL server error. The first, 516, is a general constraint error that can be triggered by a number of issues, for instance when the user tries to delete a row that another entity links to by a non-nullable foreign key. Because it could happen in a number of ways it is difficult to come up with a good user error message. The second, 2601, is a much more specific error about a duplicate key so we can do a much better job with the error message.

These two sql errors show some of the challenges of decoding database errors and turning them into useful user feedback. There are multiple ways we could improve the code above to provide better error messages but as good project leaders or developers we need to look for the best effort/reward. I would suggest that sometime it would be easier to add duplicate EF checks before we commit, like the test for uniqueness of the Slug in the ValidateEntity section, as these have more information with which to produce a useful message.

Why should I bother passing EF errors back to the UI?

Most of the examples I see of using EF with ASP.NET MVC normally just lets DbContext‘s SaveChanges() throw an exception and show a generic ‘there was an error’ message. However in my applications I catch the EF errors and show them to the user. That way the user gets more meaningful error message that they can act on. This is more work for the developer, so let me make the case as to why you should bother.

Firstly, if you add tests using DbContext‘s ValidateEntity then your specific error messages will be fed back to the user. In our case a message saying “The Slug on tag ‘xxx’ must be unique.” is a pretty important message for the user to see.

However the main reason I pass back EF error messages is because I often have to use Data Transfer Objects (DTO) to ‘reshape’ data between the database and the user. The side effect  of using DTOs is that some errors are only found when EF tries to update the database. In that case I need to make sure the useful error messages are not lost, but passed to the user.

DTOs are a common way of solving the mismatch between the database orbusiness objects and the user interface. This isn’t the place for a full explanation of system design patterns and DTOs (see Dino Esposito article on DTOs by Dino Esposito http://msdn.microsoft.com/en-us/magazine/ee236638.aspx). Instead I will give you an example that I hope shows you why DTOs are useful and why they mean some errors are only caught by EF.

Below is a class definition of the Post database entry. This post class has links to two other data classes: The Blogger of the post, which is the Author, and the Tags which have been assigned to the Post.

Now if we want to allow an authorized user to edit the Blogger and Tags properties of a Post then we need to add some more information, and maybe hide the Blogger and Tags properties from the UI as they can’t be displayed. We therefore  create another class like I have listed below. This type of class is often called a DTO.

As you can see, we have taken the original Post class and added two new properties, Bloggers and UserChosenTags, and removed the Post’s Blogger and Tags properties. The Bloggers list is filled in with all the possible authors and the UserChosenTags list is filled in with all the possible tags. Once the user has chosen the Blogger and Tags the DTO converts these back to the Post’s BlogId and Tags properties and then EF can save the changed Post entry.

This is a very typical operation in any application which needs a DTOs or similar adapter to link the user interface to what is in the database. However the effect is that the actual Post class is not seen by the user interface so any data errors inside the Post class, like ensuring there is a valid author attached, will only be caught by EF when it write the data to the database.

In fact for the analytical applications I work on I would say 95% of my data goes through DTOs. This very often means there are properties that exist in the data classes that are never seen by the user interface layer, so they cannot be validated at the user level. Therefore the first check on those properties is done by EF, and it might find a problem. This is why I always pass EF validation errors back to the user interface so that the user gets a meaningful error message they can act on.

Passing EF errors back to the user interface

Having made the case for passing EF errors back up to the user I will now describe one way this can be implemented.  BeBelow I show a new method inside DbContext called SaveChangesWithValidation, which returns a status. This needs two methods, the SaveChangesWithValidation and a supporting class I have called EfStatus. The code is as follows.

And a the new method SaveChangesWithValidation inside your DbContext

As you can see, the EfStatus stores the errors in the standard ValidationResult format. I use this format because data annotations and IValidateableObject produces ValidationResult, so your application most likely uses this type already. In my ASP.NET MVC applications I add  a method that copies the errors to the model state of the class to be reshown to the user so that the EF errors appear on their screen.

Summary

It is important to check that the data that you want to write to the database is correct. EF provides a number of independent features for making comprehensive checks on data before it is written to the database. Now you know what options you have for data validation in EF, you might like to consider what data would be a problem if it was written to the database. When you’ve determined what should be prevented from reaching the database, you can choose the appropriate approach to stop that vulnerability. A properly-written SQL Server database will have its own system of constraints as well, but this would never be a substitute for what I’ve described. Both together provide defence in depth against bad data.

I believe I have made a case for passing the error messages that are generated by EF’s validation back up to the user interface. If you are working on an existing project, then  that might be hard to retrofit: On the other hand a little thought should allow you to design this into  new projects using methods like those listed in this article.

You may be interested in visiting a live ASP.NET MVC5 web site at http://samplemvcwebapp.net/ from which most of these examples were taken. It contains some fictitious blog posts you can edit and includes links to the application code on GitHub.  This is part of an open-source project that I am currently building to help developers quickly build applications with good error checking and user feedback.

Keep up to date with Simple-Talk

For more articles like this delivered fortnightly, sign up to the Simple-Talk newsletter

This post has been viewed 54438 times – thanks for reading.

  • Rate
    [Total: 37    Average: 4.4/5]
  • Share

Jon Smith

View all articles by Jon Smith

Related articles

Also in .NET

Posting Form Content via JavaScript

Web-based applications run smoother if instead of using the traditional form method, they use JavaScript to post data to the server and to update the user interface after posting data: It also makes it easier to keep POST and GET actions separated. SignalR makes it even slicker; it can even update multiple pages at the same time. Is it time to use JavaScript to post data rather than posting via the browser the traditional way?… Read more

Also in .NET Framework

What's New in C# 6

The C# language itself has changed little in version 6, the main importance of the release being the introduction of the Roslyn .NET Compiler Platform. However the New features and improvements that have been made to C# are welcome because they are aimed at aiding productivity. Paulo Morgado explains what they are, and how to use them.… Read more

Also in BI

Relational Algebra and its implications for NoSQL databases

With the rise of NoSQL databases that are exploiting aspects of SQL for querying, and are embracing full transactionality, is there a danger of the data-document model's hierarchical nature causing a fundamental conflict with relational theory? We asked our relational expert, Hugh Bin-Haad to expound a difficult area for database theorists.… Read more

Also in Database

SQL Server System Functions: The Basics

Every SQL Server Database programmer needs to be familiar with the System Functions. These range from the sublime (such as @@rowcount or @@identity) to the ridiculous (IsNumeric()) Robert Sheldon provides an overview of the most commonly used of them.… Read more
  • vsoares

    Localization
    I find your article to be great but everytime a read an article about this subject the same question pops up into my mind: what about string localization? How do return error messages from EF to UI with string localization. Should we return errorcodes and the description of the error in the default language? Return only the errorcode?

  • vsoares

    Localization
    I find your article to be great but everytime a read an article about this subject the same question pops up into my mind: what about string localization? How do return error messages from EF to UI with string localization. Should we return errorcodes and the description of the error in the default language? Return only the errorcode?

  • JoNSmith

    Localization
    Hi vsoares,

    Good question. Being Microsoft they have thought long and hard about localization. You might like to look at a recent post by Dino Esposito on localization https://www.simple-talk.com/dotnet/asp.net/shouldnt-your-asp.net-mvc-apps-support-localization/ which covers some of this.

    For the three methods I mention then IValidatableObject and the DbContext Validate Methods are code so can all read strings from a language resource file, or whatever you want. The DataAnnotations need special handling and have localization controls build into them using two parameters, ErrorMessageResourceName and ErrorMessageResourceType. These allow you link to resource file for text.

    I hope that helps.

    Jon Smith

  • sdmcnitt

    Concurrent Users
    Putting unique key validation logic in the application ignores the activity of concurrent users.

    It would be cool if a standard way to surface the errors that the database constraints will raise when violated could be demonstrated.

    This would also allow for any Business Validation rules in database triggers to be shown to the user (custom errors with messages that the user must see such as “SAP Integration error: Field XXX is required for link to HR.”)

    Ed: Jon Smith has now added to the article to cover this point. Many thanks for your suggestion!

  • JoNSmith

    RE: Concurrent Users
    Hi sdmcnitt,

    Yes, you are right that the example I picked for using ValidateEntity in the DbContext should have been handled by database constraints. I was looking for something that was simple and obvious to people, but on reflection it was not the best example.

    However all is not lost, as your question spurred me into revising the article to explain how you can catch database errors and feed then back to the UI by extending the other approaches I showed in the article.

    I hope this helps.

  • AlwaysLearning

    RE: Concurrent Users
    Hi Jon,

    Thank you for your article, though I have found some errors:
    public void SetErrors(IEnumerable<DbEntityValidationResult> errors)
    should be:
    public EfStatus SetErrors(IEnumerable<DbEntityValidationResult> errors)

    And in public EfStatus SetErrors(IEnumerable<ValidationResult> errors):
    errors = errors.ToList();
    should be:
    _errors = errors.ToList();

    I am new to ASP.NET MVC+EF, so please consider me a noob here. Do you have a usage example for SaveChangesWithValidation() in a View Controller? I’m assuming that there’s more to it than the below since I’m not seeing database error messages that have been captured and raised through EfStatus…
    //db.SaveChanges();
    EfStatus efStatus = db.SaveChangesWithValidation();
    if (efStatus.EfErrors != null && efStatus.EfErrors.Count > 0)
    {
    //Do something here?
    return View(model);
    }
    return RedirectToAction("Index");

    Thanks and kind regards,
    Ant.

  • AlwaysLearning

    RE: Concurrent Users
    Derp, I feel so silly now… The database validation errors are now popping out in the "@Html.ValidationSummary(true)" area after I replaced my do something with:

    //using System.ComponentModel.DataAnnotations;
    //using System.Linq;
    efStatus.EfErrors.ToList().ForEach(validationResult => ModelState.AddModelError(String.Empty, validationResult.ErrorMessage));

    It seems that the ValidationSummary area only displays messages for String.Empty – otherwise the messages should appear on specific fields.

    Thanks and kind regards,
    Ant.

  • JoNSmith

    RE: AlwaysLearning
    Hi Ant,

    Glad you worked things out. You might like to look at an open-source project called GenericServices (https://github.com/JonPSmith/GenericServices) which uses SaveChangesWithValidation() inside it. GenericServices is a library for making for data access easier in ASP.NET MVC web applications. There are two example sites, one basic and one quite complex, which might give you some ideas.

    I have also written quite widely about GenericServices, with a couple of articles in simple-talk, see https://www.simple-talk.com/dotnet/asp.net/using-entity-framework-with-an-existing-database–user-interface/ for instance. The GenericServices Wiki has more information too.

  • syamdavid

    Nice Article
    It is one of the excellent article I find. I am novice to entityframework. I have a question. In your article you return an object of type EfStatus from the SaveChangesWithValidation method. For save method, usually there is no return value apart from the EfStatus . How do I implement this for the methods that return a list or a integer? Please let me know. Thank you.

  • JoNSmith

    Re: Nice Article
    Hi syamdavid,

    Glad you found the article useful. Obviously the SaveChanges method doesn’t need to return a value so for this article I used a simple EfStatus. However in my open-source project GenericServices (see https://github.com/JonPSmith/GenericServices) I often want to return a status with a value.

    I have therefore created two version of an EFStatus-like class called SuccessOrErrors. The ordinary SuccessOrErrors does not return a value but the SuccessOrErrors<T> generic class can return any type of value you want. I use these two classes throughout my applications to return useful feedback to the user. You can find these classes at https://github.com/JonPSmith/GenericServices/tree/master/GenericLibsBase/Core .

    I hope that helps.

  • syamdavid

    Re: Nice Article
    Thank you for the quick response. I got the classes and update them. Can you help me how to update the return type object ‘Result’ of ISuccessOrErrors<T> in my business layer? Thanks

Join Simple Talk

Join over 200,000 Microsoft professionals, and get full, free access to technical articles, our twice-monthly Simple Talk newsletter, and free SQL tools.

Sign up

See what's happening behind the scenes

Take a peek at the bowels of the ship – the lower decks – the actual servers of SQL Server Central itself.

See what's happening