Thursday, March 31, 2011

SQL vs CODE, Where is the balance?

As a CRUD developer one of the trade offs that needs to be made is the decision of how much of the work should be done in SQL on the server, and how much ought to be done on the client end in code.

How do you decide where the fulcrum goes? What factors go into your decisions? What kind of mistakes have you made? What works well?

[EDIT] I am kind of surprised in the low volume of responses to this question. I think this is a basic issue for all CRUD programming. Where the balance is set is a trade off between performance and maintainability.

The other issue I didn't really consider is the fact that the domains vary enough, that the answers probably need to be expressed in sections for each domain. That's kind of what I tried to do in my answer.

From stackoverflow
  • For me databases are a necessity due to a technological limitation: main memory is small and does not persist. So for me anything that is not related to persistence is in code, only when I need to get data back and forth I resort to SQL (actually not even that, my DAL does that for me).

  • Depends of the kind of the application. If application is the only one that is using the database, then the code is the master. If tha DB is likely to outlive the application then use DB as much as possible.

  • My decision of code vs. sql typically depends on which seems the easiest/quickest to implement. Once it's there, I move on. Later, if it seems like it should get moved, it gets moved. <shrug> I try not to sweat it too much early on, though.

    If it's something heavily tied to the database -- especially something involving lots of working with database records -- it just makes sense to go into sql. If it's something that's tied closely to the application, it goes there.

  • I try to do databasey things in the database. so, ordering, grouping, filtering. that's all done in the query (usually a sproc). I use code to execute the query and then do whatever we need to the code. I also try to employ the "get the data once, and only once" mentality when it's prudent in order to cut down the round trips to the server.

  • A database is a place to hold multi-million row tables, to read and write effectively against those tables.

    A database is not a place to perform string manipulation or date logic. It is not a place to calculate balances or to generate mailings. These concerns are better addressed with other tools (such as code).

    rmz : Agreed completely :)
    annakata : But nor is "CODE" the place to do data aggregation or querying. Right tool for the right job.
    David B : If "read and write effectively against those tables" covers "data aggregation or querying", we're on the same page.
    Nathan Koop : @David B. When you state "not place to perform... date logic" do you mean filtering? For instance, I'd certainly use SQL to determine all orders placed in Feb. Do you agree?
    David B : That (orders in Feb) is clearly a filtering task. Find for me the age of a person, given the birth date. Find for me, the number (and date values) of the sundays in a date range. Yes a database can do these things, but that is not the core competency of the database.
  • Typically on projects I've been involved with, most of the data is handled by the Database Server, and most of the presentation is done with the code.

    The exception is this: SELECT col1, col2, col3 FROM table

    The presentation is: col1, col2, col3, col1/col2, col1/col3, ...

    If I get all the operands in an equation, I'll do it in the code, if not, I'll do it as part of the query (select col1, col2, col1/col4...)

  • In my opinion, if it deals directly with data then it should be done in the database; if it deals with business logic it should be done in code.

    For example filtering data by a specific field should be up to the database. Aggregating totals for display on a report should be done in the database. Minor data-related logic (e.g. triggers that enforce data integrity) should be done in both (don't shoot me for that comment!) to make sure your application is secure.

    Actual business logic, such as applying a discount if the order total is over $500, should be done in code and have the final value only stored in the database. If the amount of the discount varies based on the customer type, however, then the amount of the discount should be stored in the database but not the logic that does the calculation.

  • My priorities:

    1. Minimize database trips. The code should do most of the work, if possible, and only visit the database when it has to. When it does, it should get as much needed for the current operation as possible.

    2. Minimize SQL complexity. Even if we should visit the database less, this doesn't mean building an overly complex SQL query that is non-performant and does too much. The query will still need to be maintained, and if two simpler queries will save the headache of developing and maintaining one mega-query, then extra code should be used rather than more database work.

    3. Minimize code list iteration Code is great. A nice place to put the business logic, and lots of neat library functions. Code is awesome. But if using code means having to iterate over and over again through lists returned from the database where some simple database where clauses or joins could have eliminated that work, then the SQL needs to be improved, and the code minimized.

    It's generally a case by case basis. For long-lived applications, maintenance is a huge cost and simplicity will often dictate my design.

  • Anything that relates to the integrity of the data should be done in the database. The reason for this is that data can be affected by mulitple sources not just the user interface. So databases should store key relationships, constraints on the data that is allowed in a field, default values etc. Triggers are must if the constraints are too complex for a regular constraint. Datatypes should be carefully thought out and serve in some ways as constraints. If the data is intended to be used in math calulations some type of numeric datatype should be used (not float or real), it the data is a date, always use a datetime data type to prevnt non dates from being stored. If the data is numeric but not intended for math calculations (SSN for example), the store it in a string type of data but put a constraint on it to make sure all the values stored are numbers.

    This doesn't mean that you shouldn't check to see if it is a valid date in the GUI before sending it. You should perform checks on the data being sent to the database by the GUI to avoid sending data that won't be inserted, but the database should always be set up to prevent incorrect data from being inserted no matter how it is sent. No sense wasting valuable netwrok and database resources processing known bad data.

    Report aggregation can often be done faster at the GUI, but if too many records need to be returned, it may be better to do at the database side or you should create an OLAP reporting database to imporve speed.

    StuffMaster : I agree, although I would say the importance of this advice scales with the size of the project.
  • In my experience, it is easier to perform complex validation of the data in code as opposed to SQL, so the answer lies in the type of application & data I'm dealing with. If the validation necessary is fairly benign, I have no problem putting this logic in the database tier in the form of stored procedures.

    One definite drawback to this approach, however, occurs when upgrades to your application are necessary. Changes to the schema can be more challenging to implement than changes to client side code, especially if you are shipping code to clients as opposed to developing apps for in-house use.

    Additionally, it generally requires more work to put together change scripts for database schema changes (e.g. identifying changed objects, making the change scripts idempotent) as compared to performing a build of your application code. This is largely due to the tooling (or lack thereof) available on the database development side of things.

    Tom H. : Although it may be easier to perform complex validation in code (and I don't think that's always the case - just the case for front-end coders) it's not as secure. Your application may not be the only thing accessing the DB
    Tim Lentine : I disagree with the notion that this is any more secure (if we're talking sprocs vs middle tier or client tier code). A sproc (by itself) won't force another developer or app to use it.
    Tim Lentine : Maybe easier was a poor choice of words. In many cases your code is more expressive in terms of what it is doing that the equivalent SQL statement is. I've seen some pretty nasty SQL where the same sentiment could have been expressed much cleaner\neater in code.
  • Within our group there are a couple of cases.

    Processing that can be satisfied by a INSERT/SELECT

    In general we have the program write the insert select, based on user supplied parameters, which are then executed on the server

    Control Break processing

    The program builds the select, with the needed order by, and executes it in the context of the server. The result set is sucked back to the program, and processed as a control break, row by row.

    Complex Processing

    The program builds the select, and executes it in the context of the server. The result set is sucked back to the program, row by row, where code intensive processing occurs. The result goes to a file, or an UPSERT back into the database.

    Performance Critical

    The program builds the select, executes it in the context of the server. The complete result set is sucked back into memory, where code intensive processing occurs. The resulting processed information goes into a file, or UPSERTED back into the database.

  • Because my code runs on my clients computers and I don't want to need database admin access just to update my code, I put almost nothing in the database except data.

  • SQL is specialized code well-suited to manipulating sets of data, such as sorting, filtering, and joining. I think "SQL vs. CODE" is a misleading question. Instead, I suggest looking at your problem as a choice between multiple specialized languages. For this answer, I'll use the term "server-side" to refer to SQL code and "client-side" to refer to everything written in a different, more general purpose, language.

    In my experience, client-side code is easier to write and maintain because (1) I tend to be more familiar with the client-side language and tools, (2) client-side development tends to have better tools, editors, and source-control integration, and (3) maintenance is easier because I only need to look for logic in one place. So, I prefer to write client-side code rather than server-side code unless I have a compelling reason to do otherwise.

    But, server-side code, in the form of SQL statements, is essential for adequate performance. Joining, filtering, and sorting data almost always belong in SQL because the database will do that faster and easier than you can do on the client side. As a rule of thumb, if I'm writing code that uses SQL cursors to filter or sort recordsets, I'm probably doing something wrong and should move that logic into SQL, where it will be faster, easier to write, and easier to understand.

  • If doing it in SQL makes it more manageable, do it in SQL. If doing it in code makes it more manageable, do it in code.

  • We're using .NET 3.5 and LINQ to SQL - so we are 100% code and no hand-written SQL at all. It's bliss!

  • Here are some of the trade offs I would consider:

    • Set vs individual data manipulation
    • Large vs small amounts of data
    • Secret vs non-secret bits (open code)
    • Single vs multiple db platform targets
    • Possibly reused library code or project specific
    • SQL vs code developer skillsets

    ... I'll let you know if I think of anything else.

  • My rules for when to use sql:

    Don't make a database call when looping through a record set.

    Keep your logic together.

    Think of the other developers who will come behind you. Write it in the place it will be the clearest.

  • This is less of a SQL vs. Code question, and more of a Client vs. Server question. The whole idea of Client/Server is to let the server provide a service to the clients in a centralized manner. In practice this means:

    • If the server can do it quickly then do it on the server
    • If the server can return a small result then do it on the server
    • If you can hide complexity from the client then do it on the server

    Many of the issues involved are found in other design areas such as Object Oriented design. A class providing services to other classes should encapsulate the functionality and only expose what's needed by its clients. The same goes for a SQL server. If you're thinking of a SQL server as a mere data retrieval system then you're not taking proper advantage.

    Practical savings include: minimizing data sent over the network; centralizing/standardizing processing; allowing changes without redeploying client software; improved performance.

    As for where exactly to draw the line... there's no magic answer. The real answer is that the database schema + stored procedures + views define an API, and good API design is not trivial but it's worth the hassle. Use principles such as information hiding, encapsulation, cohesion, and all those other things you use in the rest of your programming and design work. They are appropriate here.

    EvilTeach : Yes, your observation is correct. I have changed the nature of the question to make it clearer that it is server vs client.
  • I'm not sure my experiences answer your question, but here's what we do in case helpful.

    We have a Stored Procedure for UPDATE. It has a parameter for Must exist, Must not exist or Don't care. So if an operator is using a CREATE process then the record must be new; if using a MODIFY process then the account must exist; if doing some sort of import maybe you want an UpSert, in which case you don't care.

    Our applications are Web based, so data is coming from Web Forms. Form fields will either contain data or be empty strings. There may be other fields, in the record, which are not on the form.

    Our update procedure has parameters for every field in the table, the parameters default to NULL - except the PK fields which must be provided :)

    An empty string field will therefore be provided to the procedure, and fields which are not on the form will not, and will default to NULL. It is not permitted to pass a form field as NULL - all fields that were on the form must pass a value.

    For existing records, being updated, we treat NULL as "no change", and a value as causing a change. However, we do not allow storage of empty-string, so passing a parameter of an empty string causes the field in that record to be changed to NULL. (Particularly relevant to Date types, for example, as only valid dates can be stored, not empty strings; and empty string for a number would probably be treated as Zero if we did not apply this rule)

    This has worked well for us and allowed us to mechanically generate such database procedures and the relevant Forms associated with them.

    Additionally we allow the Update Procedure to return a Validation Error code. In general validation of form fields is carried out by the application - firstly using Javascript at the Client to enforce dates as being valid, and so on; secondly the application can apply the same (for safety) and additional validations. And finally the database procedure can perform further validations - such as to check that an Order Item has a corresponding Order Header record; whilst this is also enforced by a Foreign Key constraint, we do not like having to catch (and interpret) the database error that results, so the Update Procedure checks, and returns an error number / message which is useful to the application (and corresponds to a message, maintained in an Errors Table, which can be displayed to the Operator so they know what the problem was). The Foreign key constraint is still in place as a long-stop catcher.

    We also mechanical generate the Get/Read procedure. This returns all fields for a single row. Parameter to the procedure is all PK fields. This can be used by CRUD forms to get any existing data. We use this, rather than a specific procedure for each form, in the knowledge that it retrieves all columns and some of them may not be required on the specific form. It is only a single row retrieve, and the likelihood is that all fields will be represented on the CRUD maintenance form. However, based on 80:20 ri;rule we are more circumspect on forms for records that have Text columns that are not on the form - clearly, retrieving many K of data which is not on the form is not good. For the remainder we feel that the consistency of the programming, and having relatively few exceptions, reduces errors and the cost saving outweighs any extra data retrieved.

    This approach also means that as the database schema changes the Procedures are in-step with the changes, and will bring to light problems that this now causes with existing forms. For example if a column is renamed the form will immediately fail - by trying to call the procedure with a parameter that is no longer valid, or by using a column that is no longer retrieved. We can insulate for this where necessary for backwards compatibility - allow the Update procedure to have an additional parameter with the original name and COALESCE that with the parameter representing the column's new name, and have the Get/Read return duplicate columns in the resultset with both New and Old names.

    Similarly for Delete. The procedure is mechanically generated, takes the PK as a parameter, and has the ability to return a Validation message (e.g. when trying to delete the Order Header if it still has child Order Item records).

    All our tables have fields for Creator, Create date, Updater, Update date, and also a EditNo - which is incremented whenever the record is saved. The EditNo value is is given to the Form (in a Hidden Input field) and thus passed to either Update or Delete procedure. It must match the exist record's value, otherwise the record has been changed by another operator and the later update is rejected (again, Update procedures provide a useful message to the operator - including who the other updater was, and at what time).

    For most data tables we also have an Archive table. This stores a copy of the "old" record when an update is made. We do not store the New record - because that is in the main table, and thus reduce the amount of data we store in the Archive.

    This has all columns in the main record, plus an Action - Update or Delete - and the Audit Date. Records are inserted into the Archive by a Update/Delete Trigger on the main table.

    We also have mechanically generated Find procedures. These have parameters which match fields in the table, but can be for Start/End points (e.g. an order date range), an Exact match, or a "contains" match - such as "Name like XXXX". These return only the specific fields which are actually used by the display, and the mechanically generated procedure has a suitable WHERE clause using the defined parameters. In practice these procedures get changed manually to be hand optimised etc. but are useful when first making an application to provide a flying-start for user query of data.

  • The balance is the point you get highest maintainability with acceptable performance.

    All business logic should be done within application. You may start with Sql Server, but if policy change to Oracle. You just migrate your data to Oracle, and no need to rewrite any codes.

    Or some applications, must support multiple db, so you just done all codes in app. When you have issues, you just fix issues only in application. No need to fix issues in multiple db.

    To do this, all CRUD codes must be generated within application. That means you need a good ORM framework (Linq, Hibernate) to do the job.

  • The only sensible answer to this question is, it depends.

    1. Access the database as few times as possible.

    2. If you end up discarding data retrieved from the database, your query needs to be reworked and made more specific.

    3. The script is responsible for manipulating the data, unless an index exists that makes it easier for the database.

    4. The database should be able to exist without a script. Cron jobs are nice, but a database should be able to survive if a script doesn't execute at a specific time.

    5. Never put a query in a loop. There is always a way to produce the same result in one query (aka join).

    6. The database just exists to hold and retrieve raw data as fast as possible. Leave any formatting or math for the script.

    To conclude, stress the database as few times as possible. Use as few queries as possible related only to data insertion and retrieval. Make the script carry more of the load, and make sure that the database isn't the first to go down.

  • Which do you have:

    • Great DB developers that can crank out complex and efficient stored procs and views and maintain the integraty of data with nicely designed foreign keys and properly normalized tables.

    OR

    • Great coders that have mastered the language and frameworks used to develop the app.

    If the answer to the above is DB developers are your strong point then do the work in the DB. A really good DB guy can provide the functionality so your code can be drop dead simple. Might be a few more round trips (and that can be killer in some situations) but performance can be very good. The big problem is there are not many truly talented DB people around.

    If coding is your strong point keep the DB simple and use super simple parametrized queries to get and update your db. Only use things like triggers and stored procedures when you hit a bottle neck. You do have to have some skill in DB management. Basic things like deciding on index and the maintenance of those indexes. Also knowing when to normalize and when NOT to is a necessary.

  • I've done CRUD programming as consultant for 10+ years in the business world. I can tell you that I put as much business logic in sprocs & views as I can (and makes sense). Requirements tend to change every time the winds blows and having logic in the database makes it easy to change and (with decent comments) it's self documenting. Plus sprocs make for good security and for easy code reuse. In business, sprocs ARE the application.

    Rich Andrews : I agree - I'm also a programming consultant in the business world and under fast paced business logic change, with app deployment handled by a third party the db is the most sensible place to keep the business logic. I'm interested in the opinions of whoever downvoted?
    Booji Boy : Thanks. I was disappointed that I got down-voted because I thought I made a good point backed my own experience.
    le dorfier : There's a certain group here that reflexively downvotes any suggestion in favor of stored procedures. Unfortunately, we live with it.
  • The answer is driven by the strength of the developers involved. In the end, you need to be productive, so you have to balance solving today's issues with long term maintenance considerations AND while taking into account that your team may be all app developers and not SQL heavies.

    How are you going to communicate changes to the team, and what can everyone understand versus the guy in the corner you'll always run to who will play genius for the next few years? If your code comes back with errors, how quickly can each team member get the root of the issue?

    All that "management speak" said, there are some basics that others have detailed here that I'll echo as well:

    • Lack of good DB design can not be overcome cool complexity in code
    • Data Integrity is the job of the database
    • Solve complex problems with procedural code - there's nothing like reading well commented, structure code.
    • Reporting from domain objects can be extremely difficult, and the skill set for writing reports generally coincides with greater DB experience, not DDD.
  • Here's a bit of advice I heard along these lines: if a project has a budget of less than $1 million, put SQL statements in the client code. If the budget is greater than $1 million, create accessor classes etc. See this blog post. I don't know whether I agree with the specific dollar amount, but I do agree that budget (and implicitly project size, importance, etc.) should influence how formal you are in your development.

  • I use SQL for queries (declarative), and Python for procedures (step-by-step operations).

0 comments:

Post a Comment