Tuesday, August 19, 2008

Initial Observations on the Entity Framework

After spending a few hours getting acquainted with the Entity Framework (EF) in VS/C# SP1, I've got a few things that I'd like to discuss. In all, I find EF to be a very robust, mostly flexible framework that allows developers to create an ORM data layer very quickly with what seems to be little overhead.

To give you an idea of my tests, I built a simple yet modular db for multiple types of related web applications. For instance, if you are looking at a blog, a forum/announcement board, or Flickr, you're looking at three seemingly disparate applications that have a lot of the same elements in implementation (comments, keywords/tags, and categories, for instance).

With this in mind, consider the following structure:

Yes, it's very basic, but we're not going overboard on this example. I can take this database, and create an EntitySet similar to the following using the built in wizard:

This was extremely easy, and took a few short steps:

  1. From your project, Add >New Item > ADO.Net Entity Diagram. You will only see this option once SP1 is installed if you haven't installed any of the EF Betas or CTFs.

  2. In the Entity Data Model Wizard, pick the Add >New Item >Generate from database option.

  3. A choose your data connection option will appear: either add the db to your project, or add a standard db connection (I'll revise this later if you need to learn the steps to connect to a db, just ask).

  4. Choose all the Tables, Views, and Stored Procedures that you wish to add.
  5. Click Finish, and enjoy your new Entity Model (ModelName.edmx in your project).
So far, things are pretty clean, but I noticed a few gotchas while working with a slightly more complex model:

If you've used any of the EF betas and are looking for the CSDL, MSL, and SSDL files, they are all parts of the .edmx structure. The all in one format is quite a bit easier to manage file-wise (and it makes sense), but it is a little more cumbersome to edit by hand (the corresponding portions of the XML are identified with comments, so it's not a big deal).

Entities have trouble being created without keys, and a simple db design error can set you back quickly. If I were to add a BlogCommentID column as an identity column for the BlogComment table (a so- so- practice, but not uncommon one), it would have to exist in the created Entity (If I want to be able to access the Entity directly).

In one of my initial passes with EF, I had incorrectly mapped the FK from Blog-BlogComment (I went from BlogID->CommentID, not BlogID->BlogID). The EF Wizard allowed this relationship, but didn't allow the valid one to process. Manually adjusting the edmx was an experience, but it worked, at least until I tried to add any new db elements. The automated update Entities from Database process overwrote my changes each time, and it was a real mess.

Of course, the error was my own as far as the relationship in the database, but I have a feeling that purist db design is unfortunately the exception rather than the norm in companies that have been out there for quite a while, and the second that you figure a workaround, it's potentially a breaking change every time you update your schema.

Stored Procedure support is really nice, but can be cumbersome, as you have to create Entities for all stored procedures that return column subsets, unless you start to add null support for all non-important columns in your entities. For instance, if I want BlogTitles for category, I'd usually add a stored procedure that did the dirty work, then make a SQLcommand call that returns an untyped DataSet with my values. The Entity Framework does provide Entity versions of the same classes, but we're using the Entity Framework for data abstraction, so I'd rather have a GetBlogTitlesByCategory that returns a strongly typed class structure built for me. These strongly typed classes need to be added as Entities to the edmx so that they can be effectively mapped to the stored procedure.

It makes sense why this step is necessary (you can't determine data types for columns by looking directly at a stored procedure), but the frustrating portion is that these procedures might be added as function calls into the edmx file (and visible in your schema), but no actual method is created because the Entity return values are not created by the Wizard. You will not see the actual method definitions in the generated classes until you add the Entity(-ies) into your edmx, or you map the return value of the stored procedure to existing Entities (hence the need for nullable support, as the blog example would return blogid, title, and the create date, but not necessarily the body of the blog). A code generation tool for this would be nice, but converting a pre-existing data structure over to EF could turn into a painful exercise (but may help you refactor out similar procedures).

I'll give you another example of the problem: an account table stores a password salt, username, and password hash values. The user, when logging in, passes in the username and password. The business logic of the app takes the password and the salt, and creates the password hash value, and the Account Entity is returned if the username, hash, and salt values all match. I want the actual hashing algorithm independent of the db, but the db will store the salt, which is generated when the account is created. I don't want the Account Entity to contain the Hash or Salt values when the Account is returned, but I also want them to be required on the DB side. All I have to do on the Entity side is delete the autogenerated hash and salt columns, and they are not available for use.

The problem comes when we need to get the salt from the db in order to hash up the password when the user logs in. I have a stored procedure that will return the appropriate salt value in a one row, one column select statement, but I have to create an entity to get the value, and I get an entity back, when all I need is a single string! I tried mapping the function to a scalar, but that required re-writing the stored procedure to fit (not a good option for pre-existing sp's). I've tried setting ReturnType="Collection(String)" in the FunctionImport tag, but the entity designer doesn't create the method for me.

Of course, there's always LINQ to Entities, but I'm not sure that the LINQ query:

string passSalt = (from a in entity.Account
where a.Username == username
select a.PasswordSalt).First();

Or similar is more efficient(Keep in mind LINQ queries are designed to return a collection, not a scalar, so there's an additional level of processing to get the required value in this instance - the First method), and it forces me to keep at least the PasswordSalt in the return set.

I'm sure that I've missed something in my searching, or am simply expecting too much from the first production iteration of EF, but it would make sense that the Stored Procedure return values conundrum be just a little better documented in MSDN.

I'll be doing quite a bit more with EF over the next few weeks, and despite the hurdles, I can't help but like the abstraction that is created with EF, and I feel that it will make app scaling and changing schemas MUCH nicer (in relation to straight DB mapping, not other ORM tools), as long as the EF is flexible enough to fit the DB in its current state without too much effort.


Griff Townsend said...

From MSDN:

If the Return Type is set to a simple type, Visual Basic or C# is not automatically generated for the Function Import.

I'm going to have to build an extension class that will handle this, or I'll be taking a look at the EFExtensions framework.

Anonymous said...

How would you actually write that code yourself? I can into the problem with a stored procedure that returns int. ExecuteFunction returns an ObjectResult T, where T has to be enumerable; which int isn't. Do I always have to go for e.g. a List int even though I know the result is a scalar?

Griff Townsend said...

Well, the solution I came up with used extension methods. In general, if you are returning a single value in LINQ, the First() or Single() methods work. Single will throw an exception if you're returning a list, however.

Actually, if you take the extension methods, it's also possible to extend them to be of T. I'm actually using a version that is more strongly typed than the one mentioned in the blog.

Anonymous said...

Thank you! Yill try those extensions.