Block files by MIME content type during Sitecore upload

As per web security best practices, we should block uploads of EXE, DLL, BAT, ASPX, ASP and similar files to the server during media upload in Sitecore. Is that enough? I think not.

We should also block files by checking their MIME content types, because someone can upload EXE/DLL files by renaming them to JPG or any other allowed extension. This can be a serious threat too.

So, checking the MIME content type is as important as checking the file extension.

Why is checking only the file extension not enough?

We implemented a module to restrict certain extensions, Prevent files from being uploaded, provided by Yuriy Yurkovsky from Sitecore Support, and it works absolutely fine. Michael Reynolds also nicely presented restricting file extensions in his post Restrict Certain Extensions From Being Uploaded.

Later on, while testing for security threats, we found two issues with blocking by extension. Thanks to our QA Analyst Chirag Patel for finding these scenarios and showing us how harmful they are.
  1. What if I upload a file as "setup. EXE" instead of "setup.EXE"? (Just add a space after the dot.)
  2. What if I upload my EXE file after renaming it to JPG? (Setup.JPG instead of Setup.EXE)
Yes, in both cases we were able to upload EXE content that we should have blocked. The image below shows how an EXE file uploaded as JPG behaves when a client requests it. This can be a serious threat to our application.

EXE file uploaded as JPG - Security Threat

For case 1, we updated the code in the above module to remove the space between the dot and the file extension.
For case 2, we can use the approach below.
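
The fix for case 1 can be sketched as below. This is only a minimal illustration, not the code of the module mentioned above: the class name ExtensionCheck, its methods and the hard-coded block list are all assumptions made for the example. It strips whitespace from the extension before comparing it, so "setup. EXE" is treated like "setup.EXE".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExtensionCheck
{
    // Hypothetical block list; a real module would read it from configuration.
    private static readonly HashSet<string> Blocked =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "exe", "dll", "bat", "aspx", "asp" };

    // Returns the text after the last dot with all whitespace removed,
    // so "setup. EXE" and "setup.EXE " both yield "EXE".
    public static string NormalizedExtension(string fileName)
    {
        if (string.IsNullOrEmpty(fileName)) return string.Empty;
        int dot = fileName.LastIndexOf('.');
        if (dot < 0) return string.Empty;
        string raw = fileName.Substring(dot + 1);
        return new string(raw.Where(c => !char.IsWhiteSpace(c)).ToArray());
    }

    // True when the normalized extension is on the block list (case-insensitive).
    public static bool IsBlocked(string fileName)
    {
        return Blocked.Contains(NormalizedExtension(fileName));
    }
}
```

Calling ExtensionCheck.IsBlocked with "setup. EXE" or "setup.EXE" reports the file as blocked, while "photo.jpg" passes.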

How to restrict upload of certain MIME content types

As per case 2, users can upload EXE files by renaming them to JPG. So, we can block them by their content type. Let's see how to block content types, which is as important as blocking files by extension.

Below is a possible patch configuration file. For easier comparison, I used the same format as Michael Reynolds' post on restricting extensions:

Here, two kinds of content types are blocked:
- application/octet-stream (Used for bin, dms, lha, lzh, exe, dll contents)
- application/zip (Used for zip content)
 
<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <processors>
      <uiUpload>
        <processor mode="on" type="SitecoreTactics.Pipelines.Upload.CheckForRestrictedContentType, SitecoreTactics" patch:before="processor[@type='Sitecore.Pipelines.Upload.CheckSize, Sitecore.Kernel']">
          <restrictedcontentTypes hint="raw:AddRestrictedContentType">
            <!-- content types to restrict -->
            <contentType>application/octet-stream</contentType>
            <contentType>application/zip</contentType>
          </restrictedcontentTypes>
        </processor>
      </uiUpload>
    </processors>
  </sitecore>
</configuration>

You can get more content types from:
http://www.freeformatter.com/mime-types-list.html
http://www.dailycoding.com/Posts/mime_contenttypes_with_file_extension.aspx


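Below is source code that blocks the content types defined in the config file above.
using System;
using System.Collections.Generic;
using System.Xml;
using Sitecore.Diagnostics;
using Sitecore.Globalization;
using Sitecore.Pipelines.Upload;

namespace SitecoreTactics.Pipelines.Upload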
{
    public class CheckForRestrictedContentType : UploadProcessor
    {
        private List<string> _RestrictedContentType;
        private List<string> RestrictedContentType
        {
            get
            {
                if (_RestrictedContentType == null)
                {
                    _RestrictedContentType = new List<string>();
                }

                return _RestrictedContentType;
            }
        }

        public void Process(UploadArgs args)
        {
            foreach (string fileKey in args.Files)
            {
                string fileName = args.Files[fileKey].FileName;
                string contentType = args.Files[fileKey].ContentType;

                if (IsRestrictedContentType(contentType))
                {
                    args.ErrorText = Translate.Text(string.Format("The file \"{0}\" cannot be uploaded. Files with a content type of {1} are not allowed.", fileName, contentType));
                    Log.Warn(args.ErrorText, this);
                    args.AbortPipeline();
                    return; // the upload is aborted, so there is no need to check further files
                }
            }
        }


        private bool IsRestrictedContentType(string contentType)
        {
            // MIME types are not culture-sensitive, so compare them ordinally
            return RestrictedContentType.Exists(restrictedContentType => string.Equals(restrictedContentType, contentType, StringComparison.OrdinalIgnoreCase));
        }

        protected virtual void AddRestrictedContentType(XmlNode configNode)
        {
            if (configNode == null || string.IsNullOrEmpty(configNode.InnerText))
            {
                return;
            }

            RestrictedContentType.Add(configNode.InnerText);
        }
    }
}
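
One caveat: the ContentType of a posted file is the MIME type reported by the client, so it may be worth also looking at the first bytes of the uploaded stream. Below is a minimal sketch of such a check, not part of the processor above. The class name FileSignature and its methods are assumptions for illustration. It relies on two well-known signatures: Windows executables (EXE, DLL) start with the bytes "MZ", and ordinary ZIP archives start with "PK" followed by 0x03 0x04.

```csharp
using System;
using System.IO;

public static class FileSignature
{
    // Windows executables (EXE, DLL) start with the two bytes "MZ".
    private static readonly byte[] Mz = { 0x4D, 0x5A };
    // Ordinary ZIP archives start with "PK" followed by 0x03 0x04.
    private static readonly byte[] Zip = { 0x50, 0x4B, 0x03, 0x04 };

    public static bool LooksLikeExecutable(Stream stream) { return StartsWith(stream, Mz); }
    public static bool LooksLikeZip(Stream stream) { return StartsWith(stream, Zip); }

    // Reads the first bytes of a seekable stream and restores its position afterwards.
    private static bool StartsWith(Stream stream, byte[] signature)
    {
        if (stream == null || !stream.CanSeek) return false;
        long original = stream.Position;
        try
        {
            stream.Position = 0;
            byte[] head = new byte[signature.Length];
            int read = 0;
            while (read < head.Length)
            {
                int n = stream.Read(head, read, head.Length - read);
                if (n == 0) return false; // stream is shorter than the signature
                read += n;
            }
            for (int i = 0; i < signature.Length; i++)
            {
                if (head[i] != signature[i]) return false;
            }
            return true;
        }
        finally
        {
            stream.Position = original;
        }
    }
}
```

In the processor, such a check could be applied to the InputStream of each posted file in addition to the content type comparison; a signature check like this only covers the formats it knows about.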

I feel my Sitecore application is more secure now!

Show PDF Thumbnail as Icon in Content Editor

Sitecore shows a generic PDF icon as the thumbnail, so it becomes very difficult to find a particular PDF file in a big list of uploaded files. Just imagine how much easier life would be if Sitecore showed PDF thumbnails as icons, just like images!

It is quite possible and easy to show PDF thumbnails in different dimensions just by overriding Sitecore's MediaRequestHandler. See my earlier post, PDF Thumbnail Handler. You can also find PDF Thumbnail Handler on the Sitecore Marketplace.

Use of PDF Thumbnails Handler

Once the concept of the PDF Thumbnail Handler is understood, we can achieve this easily. Do the following:
  1. Install PDF Thumbnail Handler in your Sitecore instance and get it up and running.
  2. Update the PDF item's Icon field, replacing ~/media with ~/mediathumb.
  3. Check that the Sitecore Content Editor now shows PDF thumbnails as icons.
By default, PDF icons appear as in the image below:

Sitecore shows PDF icon as thumbnail
The Icon field has the value ~/media/36C02213E38441D9BA1AA82DB86A80E0.ashx?h=16&thn=1&w=16, which loads the PDF icon defined in Sitecore itself.

With the PDF Thumbnail Creation Handler, we switch to the ~/mediathumb handler by updating the value to ~/mediathumb/36C02213E38441D9BA1AA82DB86A80E0.ashx?h=16&thn=1&w=16. The image below shows the PDF thumbnail displayed as the icon.


We can show PDF thumbnail as icon like this



Let's make PDF thumbnails work in Content Editor

Our requirement is to show thumbnails as in the image below:

Show PDF thumbnails by overriding MediaProvider


Override Sitecore's MediaProvider; for that, you need to change the web.config file.
   <!-- override Sitecore MediaProvider -->
   <mediaProvider type="SitecoreTactics.MediaProvider, SitecoreTactics"/>

Below is the code required in the MediaProvider class. In the GetMediaUrl function, when a thumbnail of a PDF file is requested, replace the existing ~/media/ handler with ~/mediathumb/.
using System;
using Sitecore.Data.Items;
using Sitecore.Resources.Media;

namespace SitecoreTactics
{
    public class MediaProvider: Sitecore.Resources.Media.MediaProvider
    {
        public override string GetMediaUrl(MediaItem item, MediaUrlOptions options)
        {
            string mediaUrl;
            mediaUrl = base.GetMediaUrl(item, options);

            // When item is PDF and Thumbnail is requested
            if (string.Equals(item.Extension, "pdf", System.StringComparison.OrdinalIgnoreCase) && options.Thumbnail)
                mediaUrl = mediaUrl.Replace(Config.MediaLinkPrefix, "~/mediathumb/");

            return mediaUrl;
        }
    }
}



Wow, let's enjoy easier life with PDF thumbnails in Content Editor!!

Related Posts:
- PDF Thumbnail Handler
- Sitecore HTTP Custom Handler

PDF Thumbnail Creation Handler in Sitecore

I recently published a Sitecore Marketplace module, PDF Thumbnail Creation Handler. It generates thumbnails on the fly (dynamically) for PDFs uploaded in Sitecore, based on a width and/or height passed in the URL. Users can request a thumbnail of any height or width, and the thumbnail is stored in the Sitecore media cache.

Requirement

Suppose a user has uploaded a PDF in Sitecore.
  • The user wants to generate thumbnails of the first page of the uploaded PDF.
  • The user can choose the height and/or width of thumbnails without any configuration.
  • If the user replaces that PDF with a new file, the thumbnail of the newly uploaded PDF should be served.
  • Similarly, the thumbnail URL should keep working if the user moves/copies/deletes the PDF item.
  • Finally, the conversion process should be scalable and quick enough not to affect performance, and the thumbnails should be stored in the media cache too.

Below are the outputs for the same PDF with different thumbnail sizes:

PDF Thumb Original
http://sitecore/~/mediathumb/Files/MyPDF.pdf
PDF Thumb Width=300
http://sitecore/~/mediathumb/Files/MyPDF.pdf?w=300
PDF Thumb Width=150
http://sitecore/~/mediathumb/Files/MyPDF.pdf?w=150

Theoretical Concept

I decided to achieve this by overriding Sitecore's MediaRequestHandler. Sitecore can generate different-sized thumbnails of uploaded image files (see how), so why not use the same concept to generate thumbnails from PDFs? My only concern was converting PDF to JPG, which was not that easy.

So, what I wanted to achieve:
 
| PDF / Thumbnail Details | PDF / Thumbnail Path/URL |
|---|---|
| PDF Sitecore Path | /sitecore/media library/Files/MyPDF |
| PDF URL | http://sitecore/~/media/Files/MyPDF.pdf |
| PDF Thumbnail URL | http://sitecore/~/mediathumb/Files/MyPDF.pdf or http://sitecore/~/mediathumb/Files/MyPDF.jpg |
| PDF Thumbnail URL, Width = 100 px | http://sitecore/~/mediathumb/Files/MyPDF.pdf?w=100 |
| PDF Thumbnail URL, Height = 200 px | http://sitecore/~/mediathumb/Files/MyPDF.pdf?h=200 |
| PDF Thumbnail URL, Width = 100 px, Height = 200 px | http://sitecore/~/mediathumb/Files/MyPDF.pdf?w=100&h=200 |
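
To illustrate how the w and h values in the URLs above could translate into a thumbnail size, here is a small sketch. It is an assumption about reasonable behavior, not the resizing code used by Sitecore or by this module: when only one dimension is given, the other is derived from the aspect ratio of the original, and when both are given they are used as they are. The class name ThumbnailSize is made up for the example.

```csharp
using System;

public static class ThumbnailSize
{
    // Computes the target size from the original size and the optional
    // w/h query values (0 means "not supplied"). When only one value is
    // supplied, the other keeps the aspect ratio of the original.
    public static void Compute(int originalWidth, int originalHeight,
                               int w, int h, out int width, out int height)
    {
        if (originalWidth <= 0 || originalHeight <= 0)
            throw new ArgumentException("Original dimensions must be positive.");

        if (w <= 0 && h <= 0)
        {
            // No size requested: keep the original dimensions
            width = originalWidth;
            height = originalHeight;
        }
        else if (h <= 0)
        {
            width = w;
            height = Math.Max(1, (int)Math.Round((double)originalHeight * w / originalWidth));
        }
        else if (w <= 0)
        {
            height = h;
            width = Math.Max(1, (int)Math.Round((double)originalWidth * h / originalHeight));
        }
        else
        {
            // Both requested: use them as given in this sketch
            width = w;
            height = h;
        }
    }
}
```

For a 600 x 800 first page, a request with w=300 yields 300 x 400, and a request with h=200 yields 150 x 200.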

How this is achieved

PDF to JPG conversion can be done using Ghostscript (available for free under the GPL license), which is very efficient and offers flexibility through many other options.

You can read my older post on the Sitecore Custom HTTP Handler, where I have described it in detail.

I created my own Sitecore custom handler (SitecoreTactics.ThumbnailManager.PDFThumbnailRequestHandler) to generate thumbnails of media (PDF) items. The config changes it requires are below:
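
As a rough idea of what a Ghostscript call for the first page could look like, here is a hedged sketch that runs the Ghostscript console executable as a separate process. It is not the module's MediaConverter.ConvertPDFtoJPG, whose source is not shown here. The class name GhostscriptSketch, the resolution of 72 dpi and the executable path passed in by the caller are assumptions.

```csharp
using System;
using System.Diagnostics;

public static class GhostscriptSketch
{
    // Builds Ghostscript arguments that render a single page of a PDF to a JPEG file.
    public static string BuildArguments(string pdfPath, int page, string jpgPath)
    {
        return string.Format(
            "-dNOPAUSE -dBATCH -dSAFER -sDEVICE=jpeg -dFirstPage={0} -dLastPage={0} -r72 \"-sOutputFile={1}\" \"{2}\"",
            page, jpgPath, pdfPath);
    }

    // Runs the Ghostscript console executable (its path is supplied by the caller)
    // and returns the process exit code; 0 normally indicates success.
    public static int Convert(string ghostscriptExePath, string pdfPath, int page, string jpgPath)
    {
        var info = new ProcessStartInfo(ghostscriptExePath, BuildArguments(pdfPath, page, jpgPath))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(info))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

Running the executable out of process keeps the sketch free of wrapper libraries; the module itself may well call Ghostscript differently.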
    
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>

    <!-- Define Custom Handler -->
    <customHandlers>
      <handler trigger="~/mediathumb/" handler="sitecore_media_thumb.ashx"  />
    </customHandlers>

    <!-- Define Media Prefix -->
    <mediaLibrary>
      <mediaPrefixes>
        <prefix value="~/mediathumb" />
      </mediaPrefixes>
    </mediaLibrary>
  </sitecore>


  <!-- Define Web Handler -->
  <system.webServer>
    <handlers>
      <add verb="*" path="sitecore_media_thumb.ashx" type="SitecoreTactics.ThumbnailManager.PDFThumbnailRequestHandler, SitecoreTactics.ThumbnailManager" name="SitecoreTactics.PDFThumbnailRequestHandler"/>
    </handlers>
  </system.webServer>
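</configuration>
The handler's source code, which generates the thumbnail and uses the media cache, is below.
using System;
using System.IO;
using System.Web;
using Sitecore;
using Sitecore.Configuration;
using Sitecore.Diagnostics;
using Sitecore.Events;
using Sitecore.Resources.Media;
using Sitecore.Web;

namespace SitecoreTactics.ThumbnailManager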
{
    public class PDFThumbnailRequestHandler : Sitecore.Resources.Media.MediaRequestHandler
    {
        protected override bool DoProcessRequest(HttpContext context)
        {
            Assert.ArgumentNotNull(context, "context");
            MediaRequest request = MediaManager.ParseMediaRequest(context.Request);

            if (request == null)
                return false;

            Sitecore.Resources.Media.Media media = null;
            try
            {
                media = MediaManager.GetMedia(request.MediaUri);
                if (media != null)
                    return this.DoProcessRequest(context, request, media);
            }
            catch (Exception ex)
            {
                Log.Error("PDF Thumbnail Generator error - URL:" + context.Request.Url.ToString() + ". Exception:" + ex.ToString(), this);
            }

            if (media == null)
            {
                context.Response.Write("404 - File not found");
                context.Response.End();
            }
            else
            {
                string itemNotFoundUrl = (Context.Site.LoginPage != string.Empty) ? Context.Site.LoginPage : Settings.NoAccessUrl;

                if (Settings.RequestErrors.UseServerSideRedirect)
                    HttpContext.Current.Server.Transfer(itemNotFoundUrl);
                else
                    HttpContext.Current.Response.Redirect(itemNotFoundUrl);
            }
            return true;
        }

        protected override bool DoProcessRequest(HttpContext context, MediaRequest request, Sitecore.Resources.Media.Media media)
        {
            Assert.ArgumentNotNull(context, "context");
            Assert.ArgumentNotNull(request, "request");
            Assert.ArgumentNotNull(media, "media");

            if (this.Modified(context, media, request.Options) == Sitecore.Tristate.False)
            {
                Event.RaiseEvent("media:request", new object[] { request });
                this.SendMediaHeaders(media, context);
                context.Response.StatusCode = 304; // Not Modified
                return true;
            }

            // Gets media stream for the requested media item thumbnail
            MediaStream stream = ProcessThumbnail(request, media);
            if (stream == null)
            {
                return false;
            }
            Event.RaiseEvent("media:request", new object[] { request });
            this.SendMediaHeaders(media, context);
            this.SendStreamHeaders(stream, context);
            using (stream)
            {
                context.Response.AddHeader("Content-Length", stream.Stream.Length.ToString());
                WebUtil.TransmitStream(stream.Stream, context.Response, Settings.Media.StreamBufferSize);
            }
            return true;
        }

        private MediaStream ProcessThumbnail(MediaRequest request, Sitecore.Resources.Media.Media media)
        {
            MediaStream mStream = null;
            
            ParseQueryString(request);

            mStream = MediaManager.Cache.GetStream(media, request.Options);

            if (mStream == null)
            {
                string tempPath = Settings.TempFolderPath + "/PDF-Thumbnails/";

                tempPath = MainUtil.MapPath(tempPath);

                if (!Directory.Exists(tempPath))
                    Directory.CreateDirectory(tempPath);

                // Prepare filenames
                string pdfFile = tempPath + media.MediaData.MediaId + ".pdf";
                string jpgFile = tempPath + media.MediaData.MediaId + ".jpg";

                string resizedJpgFile = tempPath + media.MediaData.MediaId + "_" + request.Options.Width.ToString() + "_" + request.Options.Height.ToString();

                if (!File.Exists(jpgFile))
                {
                    // Save BLOB media file to disk
                    MediaConverter.ConvertMediaItemToFile(media.MediaData.MediaItem, pdfFile);

                    // Convert the first page of the PDF to JPEG
                    MediaConverter.ConvertPDFtoJPG(pdfFile, 1, jpgFile);

                }

                // Resize Image
                MediaConverter.ReSizeJPG(jpgFile, resizedJpgFile, request.Options.Width, request.Options.Height, true);

                // Convert resized image to stream
                MediaStream resizedStream = new MediaStream(File.Open(resizedJpgFile, FileMode.Open, FileAccess.Read, FileShare.Read), "jpg", media.MediaData.MediaItem);

                // Add the requested thumbnail to Media Cache
                MediaStream outStream = null;
                MediaManager.Cache.AddStream(media, request.Options, resizedStream, out outStream);

                if (outStream != null)
                {
                    // If Media cache is enabled
                    return outStream;
                }

            }

            // If Media cache is disabled
            return mStream;
        }

        public void ParseQueryString(MediaRequest mediaRequest)
        {
            HttpRequest httpRequest = mediaRequest.InnerRequest;

            Assert.ArgumentNotNull((object)httpRequest, "httpRequest");
            string str1 = httpRequest.QueryString["as"];
            if (!string.IsNullOrEmpty(str1))
                mediaRequest.Options.AllowStretch = MainUtil.GetBool(str1, false);
            string color = httpRequest.QueryString["bc"];
            if (!string.IsNullOrEmpty(color))
                mediaRequest.Options.BackgroundColor = MainUtil.StringToColor(color);

            string str2 = httpRequest.QueryString["dmc"];

            mediaRequest.Options.Height = MainUtil.GetInt(httpRequest.QueryString["h"], 0);
            string str3 = httpRequest.QueryString["iar"];
            if (!string.IsNullOrEmpty(str3))
                mediaRequest.Options.IgnoreAspectRatio = MainUtil.GetBool(str3, false);

            mediaRequest.Options.MaxHeight = MainUtil.GetInt(httpRequest.QueryString["mh"], 0);
            mediaRequest.Options.MaxWidth = MainUtil.GetInt(httpRequest.QueryString["mw"], 0);
            mediaRequest.Options.Scale = MainUtil.GetFloat(httpRequest.QueryString["sc"], 0.0f);
            string str4 = httpRequest.QueryString["thn"];
            if (!string.IsNullOrEmpty(str4))
                mediaRequest.Options.Thumbnail = MainUtil.GetBool(str4, false);

            mediaRequest.Options.Width = MainUtil.GetInt(httpRequest.QueryString["w"], 0);
        }
    }
}

You can get the full source code (of the older version) of this module from the Sitecore Marketplace.
Update:
The module available on the Sitecore Marketplace contains older code with a media cache bug: when someone overwrote a media file (using detach/attach), it kept serving the older thumbnail. This bug has been fixed in the code above, and the fix will be available on the Marketplace very soon. Meanwhile, you can download the source code (excluding DLLs) from https://drive.google.com/file/d/0B1otw7vE3rGTQmU1U1l2TTJHQTA/view?usp=sharing

Benefits of this approach

  1. Dynamic conversion of PDF to thumbnail when requested.
  2. Allows thumbnails of different sizes to be generated.
  3. Repeated requests for a thumbnail are served from the media cache.
  4. Conversion is fast using Ghostscript, and the media cache speeds it up further.


Related Posts:
- Show PDF Thumbnail Icons in Content Editor
- Sitecore HTTP Custom Handler

Render Sitecore Content Item with name of language

An issue was reported in the Sitecore SDN forum about rendering a content item under Home that has the name of a language. That is, there is an item /sitecore/content/Home/uk. Sitecore treats /uk/ as a language, so it renders the Home item in the uk (Ukrainian) language (even if the uk language is not added to Sitecore Languages). The expected behavior is to render the /Home/uk item in the default language.

See what's the issue

We have an item named uk under Home.




Now, see what happens when we access http://patelyogesh.in/uk

Sitecore content item with language name

Why this happens

The StripLanguage processor in the preprocessRequest pipeline checks the FilePath and extracts the expected language name from it. If the FilePath contains a valid language, it assigns it to the Context.

So, for our URL http://patelyogesh.in/uk, it gets the FilePath as uk. It therefore treats uk as a language rather than an item, sets it as the Context Language, and sets Home as the Context item.

How to solve this

One solution that came to my mind was to write a custom processor in httpRequestBegin that overwrites both the Context Language and the Context Item. But, as John West suggested in the forum, this might be possible using StripLanguage. So I tried that, and yes, it solves the issue and is easy too. We can solve this by overriding the StripLanguage processor in preprocessRequest.
<!--<processor type="Sitecore.Pipelines.PreprocessRequest.StripLanguage, Sitecore.Kernel"/>-->
<processor type="SitecoreTactics.StripLanguage, SitecoreTactics">
  <ignoreLanguages hint="list">
    <language>uk</language>
    <language>en-gb</language>
    …
  </ignoreLanguages>
</processor>
To override StripLanguage, we need to modify only one function, ExtractLanguage, which extracts the language. As per our requirement, we need to ignore the uk and en-gb languages, so we will not treat the extracted value as a language when it is uk or en-gb.

The code below implements this, but it does not contain code to read the ignoreLanguages setting above. Here I have hard-coded these languages for better understanding:
public class StripLanguage : PreprocessRequestProcessor
{

    // ... other methods omitted ...

    private static Language ExtractLanguage(HttpRequest request)
    {
        Language language;
        Assert.ArgumentNotNull(request, "request");
        string str = WebUtil.ExtractLanguageName(request.FilePath);

        // Our code starts here
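        // If the found language is uk or en-gb, set it to Empty,
        // so Sitecore will treat the URL as having no language.
        // The comparison ignores case so that /UK/ is handled like /uk/.
        if (string.Equals(str, "uk", System.StringComparison.OrdinalIgnoreCase) || string.Equals(str, "en-gb", System.StringComparison.OrdinalIgnoreCase))
            str = string.Empty;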
        // Our code ends here

        if (string.IsNullOrEmpty(str))
        {
            return null;
        }
        if (!Language.TryParse(str, out language))
        {
            return null;
        }
        return language;
    }

    // ... other methods omitted ...
}
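
Instead of hard-coding the names, the ignored languages could be kept in a case-insensitive set. The sketch below shows only that lookup in plain C#. The class name LanguageIgnoreList and its methods are hypothetical, and how the set gets filled from the ignoreLanguages configuration node is left out, since that wiring is an assumption I have not shown here.

```csharp
using System;
using System.Collections.Generic;

public class LanguageIgnoreList
{
    // Case-insensitive so that "UK", "uk" and "en-GB" all match
    private readonly HashSet<string> _ignored =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Intended to be called once per configured <language> value
    // (the wiring to Sitecore configuration is not shown).
    public void Add(string languageName)
    {
        if (!string.IsNullOrWhiteSpace(languageName))
            _ignored.Add(languageName.Trim());
    }

    // Returns an empty string for ignored names, so the caller treats the
    // URL segment as an item name; otherwise returns the name unchanged.
    public string Filter(string extractedName)
    {
        if (string.IsNullOrEmpty(extractedName)) return string.Empty;
        return _ignored.Contains(extractedName) ? string.Empty : extractedName;
    }
}
```

In ExtractLanguage, the hard-coded comparison could then be replaced by passing the extracted name through Filter before the IsNullOrEmpty check.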
Now, as the screen below shows, requesting /Home/uk renders the uk page itself in the en language instead of the Home page in the uk language.



Finally, StripLanguage worked! We might need some extra code when implementing this in a multisite environment.

Save Sitecore Media Item to Disk file

We once needed to convert a Sitecore media item to a disk file (that is, save the media item as a physical file on the server). Sitecore does not provide any API to do this directly.

Below is the code to do it; I am posting it in case it helps others. It needs the System, System.IO, Sitecore.Data.Items and Sitecore.Diagnostics namespaces.
    string mediaItemPath = "/sitecore/media library/Images/myimage";
    string diskFolderPath = @"D:\Sitecore-Media\";

    MediaItem mediaItem = (MediaItem)Sitecore.Context.Database.GetItem(mediaItemPath);
    ConvertMediaItemToFile(mediaItem, diskFolderPath);


    public static void ConvertMediaItemToFile(MediaItem mediaItem, string folderName)
    {
        // Skip media that has a non-empty "file path" field (no BLOB to export)
        if (mediaItem.InnerItem["file path"].Length > 0)
            return;

        string fileName = Path.Combine(folderName, mediaItem.Name + "." + mediaItem.Extension);

        var blobField = mediaItem.InnerItem.Fields["blob"];
        try
        {
            // The using block closes the BLOB stream even when saving fails
            using (Stream stream = blobField.GetBlobStream())
            {
                if (stream == null)
                    return;

                SaveToFile(stream, fileName);
            }
        }
        catch (Exception ex)
        {
            // Pass the exception along so the log shows the actual cause
            Log.Error(string.Format("Cannot convert BLOB stream of '{0}' media item to '{1}' file", mediaItem.MediaPath, fileName), ex, typeof(MediaItem));
        }
    }

    private static void SaveToFile(Stream stream, string fileName)
    {
        byte[] buffer = new byte[8192];
        using (FileStream fs = File.Create(fileName))
        {
            int length;
            // Copy until Read reports the end of the stream
            while ((length = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                fs.Write(buffer, 0, length);
            }
        }
    }


Sitecore media and browser cache

Have you ever faced issues where updated media items are not reflected on your page, or you still see older media files after a media publish? Or your media files are not cached when accessed through a reverse proxy? Or your media files are not cached at the browser level? The solution is in Sitecore itself: Media Response Cacheability.

Media response cacheability is delivered through the Cache-Control header; read more in section 14.9 of RFC 2616 (HTTP/1.1), which covers Cache-Control.

In web.config, you can define the media response cacheability option in the settings section, as below:
    <!--  MEDIA RESPONSE - CACHEABILITY
    The HttpCacheability is used to set media response headers.
    Possible values: NoCache, Private, Public, Server, ServerAndNoCache, ServerAndPrivate
    Default value: public-->

    <setting name="MediaResponse.Cacheability" value="public" />

There are six different settings for the media response headers, through which Sitecore controls client- or browser-level media caching:


| Media Cacheability Option | Description |
|---|---|
| NoCache | No browser cache is created with this option, so the media is served from the server to the device every time. This is not a good idea when you want to improve performance by serving media files faster; it will slow down page speed. |
| Private | This option allows browsers to cache media. However, the response is cacheable only on the client and not by shared (proxy server) caches. If the ISP has an invisible proxy between the user and the internet, the user cannot benefit from media caching there. |
| Public | One step ahead of Private: with this option, the response is cacheable by clients and by shared (proxy) caches, so anybody can use its caching mechanism. This option is mostly preferred for optimum performance gain. |
| Server | The response is cached only at the origin server. Similar to the NoCache option. Clients receive a Cache-Control: no-cache directive but the document is cached on the origin server. Equivalent to ServerAndNoCache. |
| ServerAndNoCache | Applies the settings of both Server and NoCache to indicate that the content is cached at the server but all others are explicitly denied the ability to cache the response. |
| ServerAndPrivate | Indicates that the response is cached at the server and at the client but nowhere else. Proxy servers are not allowed to cache the response. |


Let's go back and solve these problems.
1. Media files are not getting cached?
    - Use Public or Private depending on your need, as described above.

2. Caching is enabled in Sitecore, but the cache is still not created on the device or browser.
    - Chances are that a proxy sits between you and the server. Check whether your setting is Private; if so, set it to Public.
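
The options described above can be summarized in a few lines of code. This is only an illustrative sketch: the enum MediaCacheability and the class CacheControlSketch are made-up names that mirror the option names, and the returned directives follow the descriptions given above rather than any Sitecore source code.

```csharp
using System;

// Hypothetical enum mirroring the option names of the MediaResponse.Cacheability setting
public enum MediaCacheability { NoCache, Private, Public, Server, ServerAndNoCache, ServerAndPrivate }

public static class CacheControlSketch
{
    // Cache-Control directive that a client would be expected to see for each option
    public static string ClientDirective(MediaCacheability option)
    {
        switch (option)
        {
            case MediaCacheability.Public:
                return "public";
            case MediaCacheability.Private:
            case MediaCacheability.ServerAndPrivate:
                return "private";
            default:
                // NoCache, Server and ServerAndNoCache
                return "no-cache";
        }
    }

    // Only Public lets shared (proxy) caches store the response
    public static bool SharedProxyMayCache(MediaCacheability option)
    {
        return option == MediaCacheability.Public;
    }
}
```

This makes the second problem easier to see: with Private, SharedProxyMayCache is false, so a proxy in between will not cache the media.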

Sitecore Intelligent Publish - The most optimized Approach


Sitecore publishing becomes a headache when we face any of the situations below:

- The Sitecore application slows down due to frequent publishing or frequent cache clearing.
- Publishing gets queued for users due to slow and repetitive publishing.
- It becomes difficult to monitor publishing and its consequences.

There are FIVE rules of thumb for optimized publishing, which solve all the above problems.
  • Stop frequent publishing
  • Publish only those items which are actually modified
  • Optimize publishing operations
  • Distribute load of publishing
  • Cache Tuning

The approach is described theoretically below. I'll be posting its practical implementation soon!

1. Use Intelligent Publish

Smart Publish and Republish each have their own pros and cons. Can't we build an intelligent publish mechanism that combines the pros of both approaches without their cons? I named this approach Intelligent Publish.

Smart Publish checks the publish status of each individual item at publish time, which plays a big role in slowing down publishing; only modified items get published. In Intelligent Publish, we first list the items to be published and then send only those items to publishing, which saves much time at publish time. These items are sent as Republish.

Intelligent Publish shares the pros of both approaches without their cons. But yes, it is not easy to implement. The table below shows the differences between them.

| Actions | Republish | Smart Publish | Intelligent Publish |
|---|---|---|---|
| Operations on UI Site | | | |
| 1. Collect items | Collect items | Collect items | Collect items with references |
| 2. Filter items | NA | NA | Filter items which are to be published, excluding all items already published |
| Actual Publish Started and added to Publish Queue | | | |
| 3. Invoke Publish | Invoke publish for all items | Invoke publish for all items | Invoke publish for all filtered items from Step 2 |
| 4. Check Publish Status | NA | Yes | NA (already done in Step 2) |
| 5. Publish items | Publishes all items | Publishes only modified items checked in Step 4 | Publishes only filtered items from Step 2 |
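
The filtering in Step 2 could be sketched as below. This is a theoretical illustration in plain C#, not an implementation against the Sitecore API: ItemSnapshot and IntelligentPublishSketch are hypothetical names, and comparing revision values between the source and target databases is my assumption about how "already published" might be detected.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical snapshot of an item in the source database
public class ItemSnapshot
{
    public string Id;
    public string Revision;

    public ItemSnapshot(string id, string revision)
    {
        Id = id;
        Revision = revision;
    }
}

public static class IntelligentPublishSketch
{
    // Step 2: keep only the candidates that are missing from the target
    // or whose revision differs from the one already published there.
    public static List<string> FilterItemsToPublish(
        IEnumerable<ItemSnapshot> candidates,
        IDictionary<string, string> targetRevisions)
    {
        var result = new List<string>();
        foreach (ItemSnapshot candidate in candidates)
        {
            string targetRevision;
            bool exists = targetRevisions.TryGetValue(candidate.Id, out targetRevision);
            if (!exists || !string.Equals(targetRevision, candidate.Revision, StringComparison.Ordinal))
                result.Add(candidate.Id);
        }
        return result;
    }
}
```

Only the IDs returned by this filter would then be sent to the publish queue as Republish, so Step 4 has nothing left to check.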

2. Use Publish Basket

It gets worse when users need to select items and publish them one by one. Don't you think this increases client clicks, wastes time and raises the frequency of publishing? Smart-publishing the whole site is not a solution here either.

To prevent this situation, we can let users use a Publish Basket. Users can add any number of items to the basket and send them to publish in one go. We might need an external DB to store basket items and send them to publish. The snapshot below shows a mockup of the Publish Basket.

Sitecore Publish Basket - Sitecore Tactics

3. Publish items with reference items

We can allow referenced items to be published along with the selected item. The references can be all referenced media as well as all items selected in fields like Multilist, subtree, etc.

This reduces the frequency of publishing, which in turn reduces the frequency of clearing the HTML cache.

4. Schedule Publish

Suppose your client has to publish many items a few hours from now. Is it reasonable to expect your client to remember the exact time of publishing, say at midnight?

Now you can see how important a role Scheduled Publish can play together with the Publish Basket. We can let users publish at a specific date and time. The snapshot below shows a mockup for scheduling items.

Sitecore Scheduled Publish - Sitecore Tactics

5. Use Separate Publish Instance

Using a separate publish instance (PI) brings many benefits if the CM server slows down during publishing. In this case, the whole load of heavy publishing is taken by the PI, and the CM can keep working without being affected much by the publishing in progress.

The snapshot below shows how publishing works with a separate Publish Instance.




Read How to setup Sitecore Publish Instance.

6. Use Multiple Publish Instances

Do your clients complain about the publish queue getting stuck? You may have faced situations where many users have started publishing, so an important publish stays queued for a long time.

To prevent this kind of situation, we can share the publishing load across multiple publish instances. We have successfully implemented multiple PIs, and they have been working without any problems since July 2012.

Read more about Multiple Publish Instances or Parallel Publish in Sitecore.

Must read posts for Sitecore Publish

- Sitecore Publishing Facts
- Setup Publish Instance
- Sitecore Parallel Publishing using Multiple Publish Instances

Some Sitecore Publishing Findings

Well, I do not want to waste your time explaining how Sitecore publishing works; there are many blogs explaining what publishing is, publishing modes, ways to publish content, versions, target databases, etc. Instead, I would like to share some hidden secrets of the Sitecore publishing mechanism that I learned while exploring it.


Below are the questions most Sitecore developers are eager to have answered:

Why does my publish fail, or when is the publish:fail event called?

If any runtime error occurs while publishing, such as SQL connectivity loss/timeout or any SQL exception, the publish fails. Items already published cannot be rolled back, and items still pending need to be published again. In other words, Sitecore publishing does not maintain transactions.

Check the publishing code in Sitecore.Publishing.Publisher.Publish. The code below is annotated with comments that explain how publishing works.
public virtual void Publish()
{
    object obj3;
    Monitor.Enter(obj3 = this.GetPublishLock()); // Locks the publishing. That's why publishing is a sequential process.
    try
    {
        using (new SecurityDisabler())
        {
            this.AssertState();
            this.NotifyBegin(); // Raises publish:begin event
            this.PerformPublish(); // Sends the current publish job to start publish
            this.NotifyEnd(); // Raises publish:end event
            this.UpdateLastPublish(); // Updates last publish date in source database
        }
    }
    catch (Exception exception)
    {
        // This function raises publish:fail event.
        this.NotifyFailure(exception);
        throw;
    }
    finally
    {
        Monitor.Exit(obj3);
    }
}
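
Since the catch block above raises publish:fail, one way to find out why a publish failed is to subscribe a handler to that event and log the exception. Below is a minimal patch file sketch; the handler class MyProject.Events.PublishFailHandler, its assembly MyProject and the method name OnPublishFail are hypothetical names used only for illustration. The method is assumed to follow the usual Sitecore event handler shape, (object sender, EventArgs args).

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <events>
      <event name="publish:fail">
        <!-- Hypothetical handler type and method; replace with your own class. -->
        <handler type="MyProject.Events.PublishFailHandler, MyProject" method="OnPublishFail" />
      </event>
    </events>
  </sitecore>
</configuration>
```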

Why does my publish expire, or when is the publish:expire event raised?

You will find the following setting in the web.config file.
     <setting name="Publishing.TimeBeforeStatusExpires" value="02:00:00"/>
If a publish takes longer than the time specified in this setting, the publishing job expires. With the value above, a job that runs for more than 2 hours expires. If you have to publish thousands of items or heavy media items in one go, increasing the value to 5 or 10 hours can solve the expiration problem.
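
Rather than editing web.config directly, the value can be overridden from a patch file in App_Config/Include. The sketch below raises the limit to 5 hours; the value 05:00:00 is only an example, so choose one that matches your longest expected publish.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <setting name="Publishing.TimeBeforeStatusExpires">
        <!-- Example value only: 5 hours instead of the 2 hours shown above. -->
        <patch:attribute name="value">05:00:00</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```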

Why are my items not getting published?

There can be many reasons behind it:
  • The user publishing the item does not have the required rights on it. Granting those rights allows the publish.
  • The default/anonymous user's access rights have been removed for the item or its parent. Restoring those rights allows the publish.
  • If you still can't find the issue, set the event timing level to high. Then check the logs on the Publish Instance while publishing; they record all events in detail and help identify the cause of the problem.
       <events timingLevel="high">
    
  • If you have set up a Publish Instance, check whether it is running. It should be up and running.
  • If none of the above applies and publishing still does not happen, enable the event queue on the CM as well as the PI.
  • If this is the first publish after a restart, it can be slow too. Sitecore has to rebuild its caches, which takes extra time. Once the caches are populated, publishing runs at normal speed.
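
The two configuration changes from the list above can be sketched as one patch file. This assumes that "enable eventqueue" refers to the EnableEventQueues setting available in Sitecore 6.3 and later; verify the setting name against your version. Deploy the file on the CM and on the PI.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <!-- Detailed event timing in the logs; switch back after troubleshooting. -->
    <events>
      <patch:attribute name="timingLevel">high</patch:attribute>
    </events>
    <settings>
      <!-- Assumed setting name for the event queue. -->
      <setting name="EnableEventQueues">
        <patch:attribute name="value">true</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```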

What are the reasons for slower publishing?


  • Publishing is a slow process by nature. It needs a lot of processing time, so it consumes a lot of CPU on the Publish Instance. It constantly updates the web database, so many insert, update and delete queries are executed while publishing, and many caches are cleared.
  • You are publishing many items, or heavy items (large items such as media files).
  • Have you recently updated the Access Viewer to give rights to any role?
  • Is your target database server performing well? Have you checked its IOPS (Input/Output Operations Per Second)?
  • Many publishing or other jobs are in the queue. Check the setting below in the web.config on your PI.
    <setting name="MaxWorkerThreads" value="20" />
    
    This setting determines how many worker threads can run simultaneously. If publishing or other jobs occupy all of these threads (20 in the example), the next queued job or publish has to wait until a thread is free. In this situation your publishing may also appear stuck.

    You can increase the value as needed. Remember that a greater value lets more jobs execute at once, which may slow down the whole instance.

    You can use the Publish Queue Viewer to see how many jobs are running on your instance:
    - Sitecore Publish Queue Viewer - 1
    - Sitecore Publish Queue Viewer - 2

What can be the optimized publishing approach?

To get the best publishing performance, you may need to take care of the following:
  • Proper Cache Tuning on PI as per need
  • Prevent frequent and long publishes
  • Provide a Publish Basket facility, that is, a way to add media and other referenced items to a publish
  • Allow scheduled publishing
I'll be posting about this very soon!

How can I set up and use a Sitecore Publish Instance?

Refer to my earlier post on Publishing Scalability or Setting Publish Instance. It describes how to set up a separate Publish Instance and how it works.
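
As a rough sketch of what that setup involves, the patch below uses Sitecore's scalability settings InstanceName and Publishing.PublishingInstance. The instance name PI-01 is a hypothetical example, and the exact steps for your version are in the post mentioned above.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- On the Publish Instance only: give the instance a fixed name.
           "PI-01" is a hypothetical example. -->
      <setting name="InstanceName">
        <patch:attribute name="value">PI-01</patch:attribute>
      </setting>
      <!-- On the CM and the PI: name the instance that executes publishing jobs. -->
      <setting name="Publishing.PublishingInstance">
        <patch:attribute name="value">PI-01</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```

The event queue must also be enabled on both servers so that publish requests from the CM reach the PI.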

Can I use multiple publish instances to support parallel publishing?

Many people have asked me whether it is really possible to create multiple instances that publish in parallel. The answer is YES. Sitecore does not recommend this approach for its architecture, but the architecture is scalable enough that we can still achieve it.

Refer to my earlier post on Multiple Publish Instances or Parallel Publishing.

Must read posts for Sitecore Publish

- Intelligent Publish in Sitecore - The most optimized approach
- Setup Publish Instance
- Sitecore Parallel Publishing using Multiple Publish Instances