Thursday, 20 October 2011

Fixing a hanging author server

So, we had a problem where the author server would hang.

It would hang for 2-3 hours after startup ("Loading" would be displayed in the main content area). Then it would work for up to 24 hours. Then the problem would come back and stay until the next restart.

The resolution was to apply hotfix 36021 - which had just a few pre-requisites. Ones in bold were already installed but needed re-installing in the correct order.

Do not implement the FineGrainedISMLocking performance change on CRX v2.x!

Installation instructions for the 27 Adobe hotfixes

After each hotfix, slowly and carefully:

  • Check whether any bundles have stopped. If they have, wait 10 minutes for them all to re-resolve & restart.
  • Look at which jar versions have been installed and see if the new version number is now listed in the bundles list.
  • Check that the Author application displays lists of web pages as you browse in the navigation tree.
  • Check that the DAM tab displays lists of assets as you browse in the navigation tree.
  • Check that a web page (e.g. the homepage) displays OK in authoring mode.
  • Check the log file for errors.
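The bundle and log checks above can be partly scripted. What follows is only a minimal sketch, not something we ran at the time. It assumes the Felix web console exposes the bundle list as JSON at /system/console/bundles.json with `symbolicName`, `version` and `state` fields (true of recent Felix consoles; check it on your CQ version), and that the log marks errors with `*ERROR*` as the standard Sling log format does. The function names are made up for illustration.

```python
import base64
import json
import urllib.request


def fetch_bundles(base_url, user, password):
    """Fetch the bundle list as parsed JSON (assumed endpoint, see above)."""
    url = base_url.rstrip("/") + "/system/console/bundles.json"
    req = urllib.request.Request(url)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def not_active(bundles_json):
    """Return (symbolicName, version, state) for every bundle that is
    neither Active nor a Fragment (fragments never become Active)."""
    return [
        (b.get("symbolicName"), b.get("version"), b.get("state"))
        for b in bundles_json.get("data", [])
        if b.get("state") not in ("Active", "Fragment")
    ]


def bundle_version(bundles_json, symbolic_name):
    """Return the installed version of one bundle, or None if it is absent."""
    for b in bundles_json.get("data", []):
        if b.get("symbolicName") == symbolic_name:
            return b.get("version")
    return None


def error_lines(log_lines):
    """Return the log lines flagged at ERROR level."""
    return [line.rstrip("\n") for line in log_lines if "*ERROR*" in line]
```

Typical use would be `not_active(fetch_bundles("http://localhost:4502", user, password))` after each package install, where the host and port are assumptions for a default author instance. The page, DAM and homepage checks still need a human looking at the browser.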

NB: A hotfix number in bold means that it is probably already installed & just needs re-installing.

i. Pre-requisite: Journal & Bundle Cache configuration changes are applied.

ii. Initial Step: Turn off the replication agents.

1. HF 28211 (12.03.10)

2. HF 29626 (08.06.10)

3. FP 28358-1.2 (06.07.10)

Add -/tmp as per package description.

4. FP 30015-1.0 (24.07.10)

5. HF 30084 (29.07.10)

6. FP 29944-1.0.1 (03.08.10)

Restart required after this one. The shutdown always hangs.

Check that all the bundles start when the server comes back up. You might need to start these bundles manually:

org.apache.sling.api, org.apache.sling.commons.osgi, org.apache.sling.jcr.resource, com.day.cq.workflow.cq-workflow-console

7. FP 30397 (17.08.10)

8. HF 30518 (23.08.10)

9. FP 30553 (27.08.10)

Restart required after this one.

NB: this was not required on Test1 but we did it anyway.

10. FP 31852 (14.10.10)

11. FP 30035-2.0 (15.10.10)

A restart was required here (on Test2) to pick up the new versions of the bundles.

12. FP 29995-1.1 (18.10.10)

This stops 3 bundles (compat). This is expected.

13. FP 30532-2 (17.11.10)

This one takes ~10 minutes to recover the bundles.

14. FP 31905 (18.11.10)

15. FP 32186 (29.11.10)

16. HF 32460 (02.12.10)

Restart required after this one. Check whether workflow-impl is at 5.3.26 20 minutes after the first restart. Restart again 30 minutes after the server has come back from the first restart.

There was no need to do the 2nd restart on Test1.

The “workflow-impl” jar needed starting manually after the restart.

17. FP 31902-1.0 (09.12.10)

This one takes ~10 minutes to recover the bundles.

18. FP 30249-3.0 (10.02.11)

A number of tagging & personalisation bundles stop here & then resolve themselves. There was also a “Zip file closed” error in the logs.

19. FP 31033-2.1 (24.03.11)

20. HF 34460-1.0 (25.03.11)

This causes a problem where the main content area in the author is not populated. It gets fixed by 34697 (step 22).

21. FP 30815 (05.04.11)

22. FP 34697-4.0 (28.04.11)

Many bundles stop & restart themselves here. Let it settle for 10-20 minutes.

23. FP 34334-2.0 (13.05.11)

cq-dam-core did not update to its new version number. Restart required.

It needed starting manually after the restart but it had picked up the right version.

24. STOP & TAKE A BACKUP!

25. FP 34901-3.0 (13.05.11)

There is no need to implement the ‘eventadmin.jar’ workaround anymore.

wcm-core was at 5.3.68 and should go to 5.3.72.

As of 34901-3 (3 Oct 2011), there is no need to restart.

26. FP 34071 (19.05.11)

On Perf, we had to wait 10 minutes for the “Loading” symptom to disappear.

On Test1, 2 bundles stopped and took 5 minutes to resolve themselves.

27. FP 36021

Got a 500 error (TopLevelComponentContextImpl) when viewing web pages.

A restart fixes this.

28. 33200-9.0

29. Restart the server for good measure!

30. Final Step: Turn on the replication agents.
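Several of the steps above involve waiting 5 to 20 minutes for bundles to re-resolve. Rather than watching the console, that wait could be a polling loop. This is a hedged sketch only: `wait_until_settled` and its parameters are hypothetical names, and `list_stopped` stands for whatever function you use to list the bundles that are not currently active.

```python
import time


def wait_until_settled(list_stopped, timeout_s=1200, interval_s=30,
                       expected_stopped=frozenset(),
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until no unexpected bundles are stopped, or until the timeout.

    list_stopped     -- callable returning names of bundles that are not active
    expected_stopped -- names that are allowed to stay stopped
                        (e.g. the 3 compat bundles after 29995-1.1)
    Returns (True, set()) once settled, or (False, still_stopped) on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        stopped = set(list_stopped()) - set(expected_stopped)
        if not stopped:
            return True, set()
        if clock() >= deadline:
            return False, stopped
        sleep(interval_s)
```

The default timeout of 1200 seconds matches the upper end of the "10-20 minutes" noted for 34697-4.0. If the loop times out, the returned set tells you which bundles to start by hand or whether a restart is due.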