Document Archive

Document Archive was introduced in Agresso 5.5 to replace the document store in previous versions which was handled by Compello. Document Archive was a great improvement over the 5.4 version because it was far more robust, feature rich and gave users options around where to store their documents. This article will concentrate on the storage of documents directly within the database as well as on shared storage i.e. outside the database.
Document Archive – Database Storage

When Agresso 5.5 was released the advice was to store images in the database. All example “Document Types” were set up this way and it meant that there was zero configuration required because all the data was stored in one place.

The Advantages:

  1. Zero Configuration – It is incredibly easy to get working, which is especially advantageous when you are in the middle of an implementation or an upgrade.
  2. Everything is in One Place – At the beginning this is fantastic for many reasons; it is easier to backup and when you restore to a test server all the images carry over. However, this will likely turn into a disadvantage later down the line.
  3. Security – Some would argue that storing images in the database is more secure. Indeed, this was certainly the case before Agresso 5.6.3 was released which featured a new Document Archive service.

The Disadvantages:

There is only really one disadvantage to storing images in the database but it is a big one!

Storing images in the database will be your single biggest use of database space within Agresso and will only grow over time. It will usually dwarf all other Agresso tables put together and in some cases this can include your reports and logs stored as BLOBS. Over time the table storing the images as blobs grows in size, especially if you do not keep tight control of how large your scanned or uploaded images are.

  1. Disk Space – Database files especially in production systems will be hosted on relatively speaking expensive high performance disks either locally within the server or via a Storage Area Network (SAN). The retrieval of images does not benefit from this level of performance and is an expense that is not required.
  2. Local Backup – Over time as your database grows so will your backups, QuickThink Cloud recommend holding around 3 days backup locally usually for easy restore to Test or to quickly recover a production database.

Having a large database means having large backups, SQL Server does have backup compression built in (from SQL Server 2008 R2 Standard onwards) and you should use it, however it is unable to compress images stored as BLOBS. Therefore if you have a 200 GB database with 100 GB being transactional data and the other 100 GB being images then you can expect the data part to compress to around 10-15 GB and the image part to have very little compression. This means you will have a backup file of 115 GB approximately.

This obviously has consequences to how many backups you can host locally for restoration to test systems and possibly to production without having to restore from an off site source and also the cost of storage.

A question to ask your business is, if disaster struck and you needed to restore data fast, is having your Document Archive available immediately when the system is restored critical? Would you rather the database is restored in 5 hours with images or less than 1 hour for just data and then in the next 4 hours images will come online?

  1. Off Site Backup: Off site backups share many of the same issues as local backup, however it is more expensive per GB to store data off site due to paying for a service or software solution to do this.

The biggest barrier when restoring from an off site backup is the how long it takes, a tape needs to be requested and read, a download over the internet happens or it could be over a WAN connection if you host your own backup storage. So the question is how quickly can you restore your system to a known good place with images in the database vs not? Lets take a look at some examples from an example cloud based service that we use in our QTEC for Agresso solution.

This following table gives you an example of the best case scenario for downloading your off site backup to either your production or DR site in the event you need to restore a database. As you can see the download alone will set you back a working day just to retrieve the data if you have a very steady 50Mbps Internet connection. After the download is complete then the process of restoring and preparing the database for use will be set in motion and before you know it, 2 working days have been lost.

In comparison if the images were stored outside of the database then the full process could take as little as 2 hours.

downloadtime

Some SQL

If you are currently using the database to store your document archive then below is some SQL to help you understand how much space is being used to store these.

The results will give you a detailed breakdown by document type and allow you to see the average image size you are uploading for each document type.

At the very least this may be able to help you curb the growth of your database by trying to reduce the image quality of certain scanned documents for example.

SELECT CASE 
         WHEN ( Grouping(d .doc_library) = 1 ) THEN 'Total' 
         ELSE d. doc_library 
       END 
       AS doc_library, 
       CASE 
         WHEN ( Grouping(d .doc_type) = 1 ) THEN 'Total' 
         ELSE d. doc_type 
       END 
       AS doc_type, 
       Round(( Sum (Cast(f.file_size AS FLOAT)) / 1024 ), 2) 
       AS total_size_kb, 
       Round(( Sum (Cast(f.file_size AS FLOAT)) / 1048576 ), 2) 
       AS total_size_mb, 
       Round(( Sum (Cast(f.file_size AS FLOAT)) / 1073741824 ), 2) 
       AS total_size_gb, 
       Count(d .doc_guid) 
       AS number, 
       Round(( Sum (Cast(f.file_size AS FLOAT)) / 1024 ) / Count (d. doc_guid), 2 
       ) AS 
       average_size_kb 
FROM   adsdocument d 
       JOIN adspage p 
         ON d.doc_guid = p. doc_guid 
       JOIN adsfileblob f 
         ON p.file_guid = f. file_guid 
GROUP  BY d.doc_library, 
          d. doc_type WITH rollup 
ORDER  BY d.doc_library, 
          d. doc_type 

Summary

Hopefully this has blog post has given you some insight into the Pro and Cons of storing images in the database. It weighs heavily on the disadvantages however that does not mean storing images in the database is bad for everyone.

If you do not scan high volumes of images and do not upload many items to master files then this maybe the most straight forward method for you with little impact caused on disk space or backups. For those with large processing requirements then I would recommend looking into implementing File Storage for Document Archive.

Part 2 of this blog post will cover the File System approach to Document Archiving and the pro’s and con’s involved.