Joseph Ryan Glover

Random stuff.



I found the following poem in my grandfather’s papers. His name isn’t on it but I’ve searched excerpts on Google and received no results, so I think it’s safe to say he wrote it. I like it because it’s an authorial effort beyond his typical mountaineering work and therefore seems more personal. I also appreciate any work thematically dedicated to the premise that time destroys everything. I did some research and was able to identify the sports stars he name-checks in the second verse – some known to me and some not – and what they all have in common is that they were active in the early 1930s. Unfortunately, I’m not able to identify the sports hero who let him down by getting old. Structurally, while I’m not quite as taken with alliteration as Joey, I applaud his flair for word selection, “bepaunched” being the standout.

Was he not ten feet tall?
What deft Apollo of some forty years flown by
Whose feats the frenzied fans in terraced roar acclaimed
In long gone golden days by Summer suns sustained

His peers were Louis, Lindrum, Owens, Bradman, Budge:
And other giants whose names in banner headlines blazed
Ousting from pride of place – dirt – death – and musty politics.
HE was the grand occasion – HIS the immortal hour.

But who claims this vacant visage, glum and grizzled-grim,
That now empanelled, prisoner in this soulless box,
Mutters and mumbles, tepid, tame, ineptitudes?
Is this the foot, the canny hand, the cunning eye
That once wrote living history in the record books?
Records ‘tis true, now long since dead, deserted dust;
Bests bettered by a lesser breed of men propped up
By scientific guile and gimmicks, more advanced techniques.
His were the noble arts of nature, native born;
Practiced perfection gained by simply playing the game.

He rambles round in tangents; fumbling, out of touch
With every question – answer: with future, present, path.
His mind’s a blank whereon vague ghosts of by-gone years
Bring back to life their half-forgotten phantom rivalries.

He falters, pauses, brooding, gnaws his pallid lip – and then
In one sad, senile, hotchpotch introspection
Entangles cricket, football, tennis, boxing, golf.
Has Well at Wembley, Lynch at Lords,
And Lovelock race ‘gainst Snead;
And Perry partner Peterson – in Wightman Cup!
His “In my day” and “When I was a boy”
Grate on the slate of schoolboy recollection.

Where has this shadow been since nineteen-thirty-nine
When heroes to the greater contest lent their lives?
The name’s the same – but nothing else, alas, survives.

Would that I had not switched my channel choice,
For ‘Frisco cop, or wrestling, kitchensink,
Were preferable to this bepaunched, lamenting goat
Who bleats and natters now in resurrected fame.
This whining, whinging, wan, would-be conquistador
Now tilts at tinsel windmills, splintered lanes askew.
Here is no hero – gone is the golden knight I knew.

Oh what a sorry, somber senseless sight is here;
My graceless, grumbling, greybeard God of yesteryear.

Analysis of Ontario Primary School Class Sizes Part One: Verifying Ministry Claims


By Guinness323 (Own work) [CC-BY-SA-3.0], via Wikimedia Commons

To practice my skills as a data analyst I decided to look into some of Ontario’s Open Data data sets. The primary school class size data set interested me because I have children in the primary grades and the topic is often in the news. 

Summary of Analytical Findings

  • The Ministry of Education website claims all 2013-14 Grade 1, 2 and 3 classes have 23 or fewer students. The data reveals 7 classes in the 2013-14 school year with more than 23 students.
  • The Ministry of Education website claims that 90% of all 2013-14 Grade 1, 2, and 3 classes have 20 or fewer students. An analysis of the data reveals that 89.89% of classes have 20 or fewer but this result requires the exclusion of 412 JK/K/Grade 1+ split classes and 2,213 Primary/Grade 4+ split classes. Including the Primary/Grade 4+ classes reveals that 83.71% of all classes have 20 or fewer students.

Introduction to the Analysis

Like many parents of children in the primary grades (JK, K, Grades 1, 2 & 3) I am familiar with the Ontario Government’s 2003 class size reduction initiative (PDF) which sought to have 20 or fewer students in 90% of all primary classes and 23 or fewer students in every primary class by the 2008-09 school year. To monitor their success the Ontario Ministry of Education maintains a Class Size Tracker website for visitors to browse class size data by school and school board. On the home page of the website (as of August 20, 2014) is the following claim for the 2013-14 school year:

[Screenshot: Class Size Tracker homepage claim, captured 2014-08-20]

The Ontario Government also supports an Open Data program from which they serve over 1000 data sets related to Government operations. On February 1, 2014 they published the latest Primary Class Size data set in the form of an Excel spreadsheet that lists the class enrolment details for every primary class in Ontario going back to the 2007-08 school year. I decided to check if the published data supported the claims on the homepage.

All primary classes have 23 or fewer students

The first claim, that all 2013-14 classes have 23 or fewer students, was straightforward to investigate because each class in the Open Data Excel file indicates how many students of each grade are in the class. Class size is calculated by summing these numbers. After sorting on the new class size column I found 5,438 classes with more than 23 students.
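The class-size calculation above can be sketched as follows. This is a minimal illustration, not the Ministry's actual schema: the per-grade field names (JK, K, G1, G2, G3, G4To8) are my assumptions modelled loosely on the spreadsheet's layout, and the rows are toy data.

```python
# Sketch of the class-size calculation. Field names (JK, K, G1, G2, G3,
# G4To8) are hypothetical stand-ins for the spreadsheet's per-grade columns.
GRADE_COLUMNS = ["JK", "K", "G1", "G2", "G3", "G4To8"]

def class_size(row):
    """Total enrolment of one class: the sum of its per-grade student counts."""
    return sum(row.get(col, 0) for col in GRADE_COLUMNS)

# Toy rows standing in for the real data set.
classes = [
    {"G1": 10, "G2": 12},   # Grade 1/2 split: 22 students
    {"G2": 15, "G3": 10},   # Grade 2/3 split: 25 students
    {"JK": 13, "K": 14},    # full-day kindergarten: 27 students
]
oversized = [c for c in classes if class_size(c) > 23]
print(len(oversized))  # 2 of the 3 sample classes exceed 23
```

Sorting or filtering on this computed size is then enough to count the classes over any threshold.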

I was confused until I noticed that the Government excludes full-day kindergarten classes from the analysis because these classes have both a teacher and an early childhood educator and are permitted to go over 23. To account for this I excluded every class that had a JK or K student and in doing so also excluded 412 Grade 1+ split classes that had JK or K students.

Further close reading of the Ministry website indicated that only JK through Grade 3 classes are considered primary grades. The Open Data data set includes a column labelled G4To8, which is the number of Grade 4 and up students that are split with primary students. By excluding the 2,213 classes that include at least one Grade 4+ student I was left with a final count of 19,586 classes, down from the original 33,523.
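The two exclusion rules can be expressed as a single filter. Again a sketch under assumed field names (JK, K, G4To8), not the data set's actual schema:

```python
# Sketch of the exclusion rules used to match the Ministry's definition of a
# tracked primary class. Field names (JK, K, G4To8) are hypothetical.
def is_tracked_primary(row):
    """Keep a class only if it has no JK/K students (full-day kindergarten
    classes are exempt from the cap) and no Grade 4+ students."""
    has_jk_or_k = row.get("JK", 0) > 0 or row.get("K", 0) > 0
    has_grade4_plus = row.get("G4To8", 0) > 0
    return not has_jk_or_k and not has_grade4_plus

classes = [
    {"JK": 10, "K": 12, "G1": 3},  # JK/K/Grade 1+ split -> excluded
    {"G3": 18, "G4To8": 4},        # Primary/Grade 4+ split -> excluded
    {"G1": 20},                    # straight Grade 1 -> kept
]
kept = [c for c in classes if is_tracked_primary(c)]
print(len(kept))  # only the straight Grade 1 class survives
```

Dropping the `has_grade4_plus` test from the filter is what re-admits the 2,213 Primary/Grade 4+ splits in the alternative runs discussed below.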

Sorting these remaining classes by class size I found 7 classes with more than 23 students, which means that 99.96% of the 2013-14 classes defined as primary classes by the Province have 23 or fewer students. Re-running the analysis and including the Primary/Grade 4+ split classes resulted in 14 classes having more than 23 students and 99.94% of classes having 23 or fewer students.

Using all the data yields the aforementioned 5,438 classes with more than 23 students, which translates to 83.78% of classes with 23 or fewer students. That is actually a better percentage than the one published in the Ministry’s FAQ, where they claim the number is 73.1%.

[Screenshot: Ministry of Education FAQ class size figures, captured 2014-08-20]

I understand the reasoning behind excluding JK and K classes from the calculation; two supervising adults should allow the number to be higher. I can also understand the exclusion of the JK/K/Grade 1+ splits if those splits also have an early childhood educator. I don’t think it is reasonable to exclude a primary class (typically a Grade 3 class) from the calculation because it happens to have some Grade 4 students in it. Admittedly, the difference between the two results is tiny for this claim but it plays a larger role in the next one.

90.0% [of primary classes] have 20 or fewer [students]

The second claim required a frequency analysis. After first eliminating the JK and K classes, the JK/K/Grade 1+ splits and the Primary/Grade 4+ splits, I created a histogram of class sizes. From that histogram I created a cumulative distribution function chart of class sizes.


The labelled data point is for a class size of 20 and the chart states that, for 2013-14, 89.89% of classes have 20 or fewer students. I suspect that the Ministry analysts rounded this 89.89% up to 90% (never round up to important breakpoints) to support their claim.
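The histogram-to-CDF step is simple enough to sketch. The class sizes below are toy values standing in for the 19,586 filtered classes, chosen only to show the mechanics:

```python
# Sketch of the frequency analysis: histogram of class sizes, then the
# cumulative percentage of classes at or below each size. Toy data only.
from collections import Counter

sizes = [18, 19, 20, 20, 21, 22, 23, 24, 20, 17]

# Histogram: number of classes at each size.
histogram = Counter(sizes)

# Cumulative distribution: percentage of classes at or below each size.
total = len(sizes)
running = 0
cdf = {}
for size in sorted(histogram):
    running += histogram[size]
    cdf[size] = 100.0 * running / total

print(cdf[20])  # 60.0 on this toy sample
```

Reading `cdf[20]` off the real data is what produces the 89.89% (with the splits excluded) or 83.71% (with them included) figures.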

Re-running the frequency analysis but including the Primary/Grade 4+ split classes results in the following chart:


Here the class size of 20 occurs at 83.71%, which is meaningfully different from 90%. The percentage point drop occurs because of the inclusion of all 2,213 Primary/Grade 4+ splits but even if I only include those Primary/Grade 4+ split classes with as many or more primary students than Grade 4s — 1,120 of the 2,213 — the 20 or fewer breakpoint occurs at 86.66%. It seems that only by excluding primary classes split with Grade 4s can the province make its 90% number.

This analysis just scratched the surface of the data set and revealed that you have to look closely at the assumptions behind the numbers to really understand what’s going on. I’m planning a second analysis that looks at the prevalence of split classes in the coming weeks. 


"BarnesmoreGap" by Detroit Photographic Company - Library of Congress Prints and Photographs Division. Via Wikipedia -


I found the following story in my grandfather’s climbing papers. The David in the story is David Ballantine Glover, my late father. The Alyson is Alyson Glover née Bolton, my late grandmother. Kim is my grandfather’s faithful dog and climbing companion. Also, I assume, late. Editorially, I have added some paragraph breaks to the first part of the story to make it read better; I believe my grandfather would forgive me this indulgence as I have not otherwise changed the text. I find this story amusing because, at the time of this blog post, I have a 7-year-old son. I suspect that my city boy would not fare as well as my father did in this story, if I could get him off his Nintendo and out to a mountain in the first place. Also: ‘Consternation!!’ is my new curse word in polite company.

SATURDAY, 28th July, 1956.



Shortly after lunchtime, Barbara, Liz, Edgar and Joey set out for a climbing expedition in Barnesmore Gap. A last minute addition to the party was Joey’s offspring David, aged seven. The A.50 proceeded uneventfully via Strabane and Ballybofey and came to rest as the west end of the Gap and in a short time all five were straggled out on the slopes of Croagconnellagh.

Before long Barbara stopped at a climb which she and Denis had tried out unsuccessfully last year. Joey went to the top to bring her up on the rope but first he instructed David to head on quietly upwards and to stop at the cairn if he was first to arrive at the summit. With Edgar’s help Joey then proceeded to bring Barbara up but the climb proved a tough nut to crack and Barbara was eventually hauled up by the combined efforts of J. E. and L.

Joey then decided it was high time to get a sight of David and dropping the rope he headed off the top at high speed. He was horrified to see how much mountain there was but relieved to see young David going over the skyline accompanied by Kim – though he was not heading for the cairn but heading N.W. along the top. Soon J. passed over the skyline also and strolled over to the cairn. No. David! Consternation!!

Shouts were unavailing as the high and cold wind carried them away. Joey now sped off N.W. along the top but saw nothing. After half an hour or so he dropped down along the S. side to contact the others. Barbara promptly organized the quartet into a search party but two hours of assiduous searching produced no David – or Kim. It now being past 6.00 p.m. it was resolved to organize a proper search party so Barbara and Liz descended to the A.50 (and Joey also for his boots – he had been in tennis shoes) and these two sped into Dongeal town to contact the Garda.

Meantime Joey climbed up once again to the rejoin Edgar and continue the fruitless search. The clouds had now descended and the weather was steadily deteriorating and about 9.30 p.m. they were about to give up when long blasts of the horn on the A.50 from the Gap attracted their attention. On descending they heard that David was safe and well and was in the Central Hotel. The search now appeared to be for Joey and Edgar!

Later, the details of this horrible afternoon were pieced together and it appeared that David had reached the cairn but fearing the clouds were about to come down he had headed off, not towards the Gap and the main party, but northwards towards Croaghaniwore. Descending to the Barnesmore River he had eventually followed Kim when he turned left and reached a rough bog road which led to a cottage. Here he accosted a Mrs. Martin and explained that he was lost and that lady and her husband drove him into Donegal town where he was delivered to the Garda Barracks.

While all this was going on, and while Joey and Edgar were floundering about Croagconnellagh, Barbara and Liz had contacted the Garda after some difficulty. Sergeant Martin Hughes took complete charge and in a very short time had got together several carloads of searchers. Almost immediately after these had set out David arrived and the search was called off. Those heading for the north and west of Croaghconnellagh searched for some time before they were contacted but eventually the question was “are Joey and Edgar now lost?”. However, all ended well and about 10.00 p.m. Joey and David were reunited and the whole party headed for Derry where they arrived at 12.50 a.m. on Sunday to find that Alyson had been busy ringing up various hospitals!

N.B. The Club Members may be in need of practice at searching for “lost” members but Joey feels that realism is not called for to this extent! However, this will teach him to keep an eye on his offspring!!


Across the Nephin Beg Range


This story was found in my grandfather’s documents on 13 pages of photostatted onionskin paper. The first page has a handwritten note that says “3,300 words uncut — too long!” which leads me to believe he wrote it for publication and needed to edit it for length. Throughout the document long passages have been crossed out in pencil but I have included the entire text here since electrons, unlike newspaper inches, are free. The type-written pages have several spelling corrections pencilled in and I made those corrections in the text below. I endeavoured to keep original spellings intact (e.g. pharaphanelia) for authenticity. The story was unfortunately undated but he does mention “ten years of hill walking” so this possibly places it in the early 1960s. Unusual aside: on the reverse of the last page of the story is written “9 eggs”. Hungry man’s breakfast?


Several years ago when perusing Bartholemew’s ¼” map of Galway and Mayo I noticed that a track was marked from Bangor Erris through the Nephin Beg range to Newport. My experience of these ¼” maps had taught me that these dotted lines represented anything from quite driveable roads to fantasies of the Batholemew mind – so beyond a thought that it should provide a good tramp if it existed I paid no further attention to the track. More recently I was examining the 1” O.S. Map of the Ballyoroy area and was most interested to see that the track existed “officially” – fifty years ago anyway. Later still I bought the ½” O.S. Map of North Mayo and here again the track was shown. This fine map showed the contours of the mountain range very clearly and as I studied the map a wild plan began to take shape in my mind. Would it be possible to walk all the way from Bangor Erris to Mulrany along the mountain tops – had it ever been attempted or accomplished?

Last Autumn I started to study the possibilities and difficulties of this project and to plan a route. At this time, in some ten years of hill walking, the longest route I had ever covered had been about fifteen miles – through semi-civilized mountain country – that is, country where the possibilities of “escape” routes existed at several points on the main route. The country in central Mayo however, is probably the wildest and loneliest in all Ireland and Nephin Beg range itself appeared to be hemmed in on all sides with the worst type of bogland and no roads existed for miles around.

The first thing to decide on was the starting and finishing points. I decided that for the benefit of the future climbers who might tackle the same project it would be better to establish definite, easily-found places. I felt that something like “about three hundred yards west of the second bridge on the main road to Ballina” would be unsatisfactory so I decided quite early on that Bangor Erris and Mulrany were the obvious starting and finishing points.

The next question was “which way round?”. This did not take long to decide after a brief study of the ½” map. North to south was the better way because (1) the dullest part of the route would be covered first (2) the views ahead would definitely be better going from North to South.

The total distance to be covered appeared to be about 27 miles with some 20,000 feet of ascent and descent and it would be necessary to make three major descents and re-ascents. It was also obvious, in my case anyway, that I would not accomplish the task with certainty, in daylight. I therefore decided that the best plan was to start late in the evening, walk all night and as much of the next day as was necessary. The alternative was to start at dawn and take a chance of finishing during the following night. I did not fancy that as I preferred no sleep at all to a short spell before starting.

The next problem was transport to the starting point and from the finishing point and I did not anticipate much difficulty about this as I felt that other members of the N.W.M.C. would willingly co-operate and indeed I expected that my trip would not be a solo one but that there might be three of us with a car-driver co-operating. Indeed this part of the plan did not take long to finalise and early in 1959 four of us were planning to tackle the route at the full moon in min-June, my companions being Barbara and Denis Helliwell.

Long before this I had been in touch with the Hon. Sec of the Dublin Section, to see if I could be put in touch with anyone who knew the area. I was given two contacts and one of these, Pat McMahon of Tuam subsequently gave me invaluable advice. He had, in fact already accomplished the feat in a south to north direction camping overnight midway along the route. IT differed very slightly from the reverse of the one I was planning, omitting only one top – Knocklettercuss, just south of Bangof Erris. I also received advice about bridges and other matters from Sergeant Hogan of the Gardai Barracks at Bangor so as May grew near I felt that I was as fully equipped with information as I could ever expect to be. I should mention, by the way, that it had never been my custom to plan very far ahead on my mountain outings but I felt that this was one where it might be dangerous to set out all “airy-fairy-like”!

The morning of Friday, May 22nd dawned very dull and threatening in Derry. I rang the Barracks at Bangor and found that the weather there was uncertain. I felt a fit as I had ever been and all keyed up but I realizes that unless the weather was settled and the night to come clear and moonlit tackling this route would be foolhardy. At 11.45 a.m. we four had a consultation and somewhat reluctantly agreed that it would be best to postpone our journey west until mid-June. As it happened the weekend that followed was as good as any this year! – but then that is typical of Irish weather is it not?

The next month crawled past and during it our driver dropped out of the project. We could not raise another one at short notice and since Barbara did not feel experienced enough a driver to handle a strange car it looked as if Dennis would have to tackle the trip from south to north while I tramped the other way. He was quite resolute in his intention of camping overnight before starting at 6.00 a.m. but this did not worry me as I knew he could comfortably cover three yards to my two and the time discrepancy would soon be gobbled up. Unfortunately Denis was off colour when June 19th dawned but he and Barbara insisted that I should go ahead as planned as they were quite willing to drive to west May and back to help me – such is the true mountaineering spirit! We had one other passenger as we set out from Derry at 2.30 p.m. – my little eleven-year old fox-terrier Kim, a faithful mountaineering companion for several years in many parts of Ireland.

I had planned a 9.00 p.m. start from Bangor and estimated I should arrive at Mulrany about 3.00 p.m. on Saturday. I should still have eight hour of daylight if I needed it. However, we rather ambled across Ireland, making business calls here and there and stopping for a meal twice. As a result it was nearer 10.00 p.m. than 9.00 when we finally drove into Bangor Erris. The weather when we left Derry had been a trifle dull and unreliable but as we drove westward it gradually grew brighter and more settled. At Bangor it was near perfect for such an expedition though perhaps a trifle on the cloudy side. So, at 9.50 p.m. precisely my dog Kim and I set off from the middle of Bangor Erris as the light began to fade and as Barabar and Denis made their plans to camp about a mile back down the Crossmolina road.

I had made a very exact plans of various parts of my route, basing these on the contours of the ½” O.S. Map and during the next five hours I was to learn how every accurate these maps are and the folly of heading over what I THOUGHT looked the best route (in moonlight) instead of sticking to the one planned.

Leaving Bangor I crossed the stone bridge over the Owenmore river and immediately found Kim was missing – maybe he had a premonition of what lay ahead! After he rejoined me I bore left across a playing field and very soon joined the rough track (Bartholemew’s!) climbing steadily across the west slopes of Knocklettercuss. The path was VERY rough and made heavy going so I took to the hillside rather sooner than intended, heading for the broad ridge-top. After some fifty minutes I was within sight of the top and here turned to look north to where in the gathering dusk I thought I could discern the spot where D. & B. were setting up their tent. I flashed our Club “call-up” signal and sure enough the car headlights flashed in reply. Somehow it made me feel less lonely and cut-off from civilization as I headed for the summit of Knocklettercuss (1,208’).

At 10.50 I arrived at the summit and sat for a minute trying to pick out my planned route across the bog to the east towards Maumykelly; this was to be probably the worst and dullest part of the route. I also took a last look at the sunset (indifferent I’m afraid) and at distant Carrowmore Lough before setting off on a more direct approach to Maumykelly than the one based on the contours. From where I stood I could see the great crescent of the hills I was to traverse and far away in the gathering dusk I could see Slievemore and the other heights of Achill.

Off I headed to the west and far the next hour I was time and again deceived in the half light and features (such as they were) that looked quite near proved to be disconcertingly far off. Land that looked like fairly level bog proved to be full of little dips with streams in them. I should have kept considerably father to the left than I did for the temptation to head straight for Maumykelly led me into trouble several times. However, after about 45 minutes I arrived at the bottom of this outpost of the Slieve Car group and crossed the last stream I was to encounter for hours. It was now quite dark and the moon being still quite low to the south-west and being hidden by the surprisingly step slope of the MaumyKelly (1,205’). This was one of those annoyingly hills which seemed to have an unattainable top – at least, I imagined on numerous occasions that “another few yards would do it” but I was wrong on all but the last occasion.

Arriving at the top (slightly to the east side) just before midnight I paused to watch various cars with headlights ablaze heading along the Ballina – Bangor road about four miles away. Behind me now lay several small loughs, just visible in the gloom but not at all visible when I had been amongst them half an hour before. The moon had gone behind a cloud, it was more than cool and fresh breeze was blowing. I shivered and not altogether from cold! I decided that this was my point of no return, early and all as it was on my route, because from here escape, although an irksome undertaking, was fairly straightforward. However I hadn’t come from Derry to give up quite so soon although I knew that what lay behind was simple compared to what lay ahead. I paused to debate my position for a moment. Night walking in bare open country, even by bright moonlight is quite a different matter from jaunting along in the bright sunlit day.

Slieve Car loomed far ahead and as it seemed a quite straightforward tramp to the huge cairn on top I set off. I was somewhat upset when I found myself descending into a distinctly dampish bogland and every step forward confirmed my opinion that I should have kept left. It was now about 1.00 a.m. and the wind sweeping over this lonely upland was getting colder and colder. Then Kim left me again and I stood whistling for him and wishing myself anywhere but where I was. Was it too late to turn back? Yes; Kim suddenly appeared from nowhere and off I set again hopping up and down through big – not quite of the jig-saw variety but tending that way. I kept to the east site of the broad top ridge overlooking a lough far below, and then another. Suddenly I felt very tired and dispiritied and I sat down for a good rest. 2.00 a.m. – would I never get to this so-and-so cairn. Up and on again – odd lights twinkling here and there, miles and miles away. Suddenly the slope steepened, the drop off to the left became quite abrupt and then the immense cairn appeared and at the same time the clouds cleared away and the moon shone bright and clear.

I scrambled to the top of the cairn (a burial mound I should imagine) on top of Slieve Car (2,369’) at 2.40 a.m. but did not stay long. Ahead to the south was a long stretch of high-level big and stoney ground which, according to my maps dropped fairly steeply off at the edges. The next forty minutes I spent circumventing myriad small pools and bogy patches but even now a pale, pale light began to appear in the east. Far ahead I could see the Nephin Beg and away behind and to the right I could just see the rolling mass of hills forming the latter part of my route. Achill was even visible as I pushed steadily forward across the plateau.

Eventually I reached the southern edge at Corsleeve (1,785’) and there was revealed one of the most strange and memorable sights I have seen in fifteen years of hill walking. A thousand feet below and stretching for miles to the south west and west were hundreds of little bog pools and streams with the bright moonlight using them to cut a glittering path through the black bogland. It was a sight never to be forgotten and one I had not expected.

Ahead and below me lay the twin pools of Lough Scardaun and beyond I could see the Nephin Beg wreathed in early morning clouds. Hoping these would soon disperse I pressed on and soon arrived at the west end of the Lough, forgetting Pat McMahon’s advise that I should keep well to the left. I was to regret this as there was an obvious ridge of higher ground leading on to Nephin Beg. However, I pushed on steadily. N.B. was another of these tiresome hills on which the top never seemed to get any nearer. Several times I saw sheep “on the skyline” but when I reached them I had just as far to go as below. I was disconcerted to se that the clouds had increased and they seemed to be blowing over top at a prodigious rate. The light was quite good now and I could see a great deal of the country around as I finally came up on to the large level top of the Nephin Beg (2,065’). Here I encountered the full force of the wind and found it a half gale. I certainly wasn’t going to be able to face into this sort of thing indefinitely and I became more than a trifle worried; I was not yet half-way. However I knew that just beyond the half-way mark the old track crossed my path and that this would serve me if I had to abandon the project.

By the time I had located the true top of the Nehphin Beg (a cairn of about five small stones!) I was well and truly off line so – out with the compass and map. Off I headed and after covering about three hundred yards I came down below the clouds. Nothing made sense – where there should have been mountains ahead there were valleys. No feature was recognizable and I knew that to make a major mistake now could be serious. Back I went to the cairn again – and the clouds! This time I headed slightly right (i.e. S.W.) instead of going S.S.W as I had first time. When I came out of the clouds again I seemed at first to be in as much trouble as ever but it was soon easy to distinguish one lough by its shape and then all at once everything made sense – moreover, the clouds suddenly lifted quite a bit and away to the east Nephin itself was plainly visible. The truly lonely majesty of these hills was most impressive.

Now I went down slightly and across to a subsidiary top at 1,356 feet. There was bright sunlight and the wind had vanished. Everything for miles was now clearly defined and as I sped down towards the old track now clear-cut in the valley below I suddenly felt exhilarated for the first time. Half way just passed, feeling fit though hungry and glorious hours of sunshine ahead – no need to hurry. The scenery was grand and in front to the S.W. lay Glenamong with its top wreathed in a small dark and persistent cloud.

It was 6.45 when I crossed the old track and paused for breakfast on the bank of the Bawnduff river. I took time to survey the scenery round me before setting off again at 7.00 a.m. on the long, long ascent to the top of the Glenmong (2,067’). The little cloud very slowly faded away and after heading slowly upward for about 1½ hours I finally reached the surprisingly broad top and had a chance to see what lay ahead.

There was a brief drop and then a rise to higher ground again surmounted by a very well erected and preserved cairn. Then there was a long gradual drop and a steady climb to the top of Cushcumcarragh (2,343’) from which a grand lateral ridge ran miles to the S.E. over Bengorm and the other tops terminating near Lough Furnace. There appeared to be climbing possibilities on the N.E. side of this ridge.

For some time now I had been keeping an eye open for Denis and Barbara. By prior arrangement they were bringing the car round to Mulrany and were to come eastwards along the ridge to meet me. As it was not yet 11.00 a.m. I was perhaps being a trifle optimistic! Doubtless I should see them from the next top. This was about 2,230 feet in height and lay less than a mile away. However, in between lay the most interesting obstacle I had yet encountered. A fine arête about 100/200 yards long. Although I had been feeling very jaded I decided this was too good to miss and my passage along the sides and top of this made the next fifteen minutes most exhilarating.

This ridge led up to the nameless top which we were later to call “the Green Monster” – a very apt name if one climbed it from the west as Denis and Barbara did. From the summit I was able to see most of the remainder of my route and I sat thinking deeply. It was obvious to me that one of two things would happen; either I would go too fast so as to finish in reasonable time and thus become completely exhausted or I would go slowly, not over-exert myself – and then fall asleep because I had been too long without any!

Just before mid-day I hurried off down the long western slope of the Green Monster searching ahead all the time for Denis and Barbara. Suddenly I saw them just west of the 1,446’ top above Glen Thomas. About mid-day we made contact and the feeling of relief heartened me no end. It was wonderful to have company again! For the last few hours Kim had been taking a poor view of the proceedings but the sight of a tin of dog-food, produced by Denis, heartened him too!

We sat and chatted about our experiences for fifteen minutes and then Denis and Barbara decided to take full advantage of the possibilities offered. They had intended to accompany me back to the car but we all felt it would be a pity if they had to retrace their steps when so much fine country offered itself. Off they set for the top of the Green Monster, their intention being to push on to Cushcumcarragh and set off along the Bengorm ridge to the main Newport road about 2½ miles west of that town. Off I staggered, Westward Ho, finding as I started uphill again that I was only just going to be able to make it.

The next top, 1,646’ seemed attainable. I would pause and consider carefully the alternatives of climbing down ten feet into a dip and up the far side, or walking thirty yards on the level to avoid this. The temptation to lie down and have a good long rest was almost irresistible – Kim succumbed to it several times!

The day was now brilliantly fine, there was no wind and it grew hotter and hotter. Stripped to shorts and encompassed all around by my paraphernalia I struggled on. Going downhill even required stern resolution while each step upwards was torture.

Presently I reached Glebbanaddy Lough having covered 1½ miles in as many hours. The last top, Claggan Mountain, lay ahead and it seemed to have several tops in fact. I was mentally very weary too and changed my mind three times. First I decided to stick strictly to the main ridge right down to Mulrany – then I thought it would be more interesting to go out to the point overlooking Bellacragher Bay and come down to the main road. Having back-tracked for a time and reached this point I then decided I was better off where I had been so – back to the main ridge again! This became so irksome and bitty that I finally decided to take the shortest possible route to the road to the west. This was another mistake! – briars and later, man-made obstacles, such as barbed wire, made misery of this decision. Eventually however, I reached the road and in a heatwave set off on the last two miles to the car. How I made this I do not know – it was the worst part of the journey.

At probably 4.07 p.m. I arrived at the car half dead from exhaustion and lack of sleep – and elated with the realization that I had achieved my ambition – and with the thought that I had to drive the car eight miles towards Newport!

I fell asleep innumerable times during that journey and eventually reached a stage when I had to stop the car every time traffic approached me from the opposite direction! And when I reached my “contact” point there was no-one there! I turned off the engine, settled down – and found that now I could not sleep at all. Denis and Barbara arrived about 5.30 p.m. having themselves covered about 15 miles in 8 or 9 hours.

Off we all set for Westport where I indulged in a hot bath and followed this with a steak and early bed. Surprisingly I felt fine now and not a bit sleepy but soon I was “off” and into the depths of twelve hours slumber.

Next day instead of setting off direct for Derry nothing would do us but a detour to Achill and Slievemore – just to taper off nicely.

A long story about a longish walk – try it sometime, I can guarantee it is not overcrowded!

J.B. Glover
36 Strand Road

Joseph “Joey” Ballantine Glover

“… maybe someday they will see, they slay the land they strive to free”
Alan Tees

Joseph “Joey” Ballantine Glover (30 JUN 1916 – 23 NOV 1976) was my paternal grandfather. I never met him because he was killed by two teenaged gunmen on November 23rd, 1976 at his place of work, the Ballintine Timber Company, in Londonderry. He was likely killed by the Provisional IRA in retaliation for the shooting of a Catholic business owner the day before by the Ulster Freedom Fighters. He was shot nine times in the neck and chest. He was 60.

I started this post with Joey’s assassination to get it out of the way. His death was a brutal tragedy but it does not eclipse his legacy. Joey was a musician, a sportsman, a businessman and a leader in the community. He gave of himself for the public good, serving as the President of the Londonderry Chamber of Commerce and the Treasurer of the City of Londonderry. He was an accomplished organist, composer and arranger and his skills as an accompanist were sought throughout the North-West of Ireland. He was an early member of City of Derry Drama Club and the week of his death he was to be in a production of Antigone. He was a cultured man with many and varied interests.

Above all (quite literally) he was a mountaineer. He loved to climb the mountains of Ireland, Scotland, England and beyond and was a founding member of the North-West Mountaineering Club (NWMC). From its formation in 1955 Joey was the spiritual leader of the club, pushing his fellow members with his passion for the sport, informed by his “intimate knowledge of the terrain of Donegal” and his irrepressible zeal. In his wonderful book, From High Places: A Journey Through Ireland’s Great Mountains, Adrian Hendroff writes that Joey was “an eccentric man of charisma and tenacity” with a “profuse enthusiasm and unflagging vitality for the hills”. These statements certainly characterize the man captured in the photos found on the NWMC website, which depict Joey as a quintessential outdoorsman, at home in his element and content with the world. I have gratefully reproduced a few of them below, sourced from their photo gallery titled Times Past. The family has been so pleased to find these images as there are so few extant photos of granddad.

Joey 1955



The “brick-red sweater” that Hendroff describes in his book can be seen in the following photo, where Joey stands front and centre for the group shot. This photo is also precious to me because the first woman in from the left, the one wearing the brown coat, is my late grandmother Alyson Glover and the young woman in front of her is my dear aunt Lorna.


Joey was a fastidious recorder of his achievements, as befits a man who chose accounting as a profession (as has my sister!). In his journals he dutifully recorded each of his ascents, the mountain, the approach and date. When he was featured in a Sportsman of the Week article in the Londonderry Sentinel he concluded that Errigal must be his favourite mountain as he had climbed it 82 times. That data-driven answer belies my grandfather’s love for that mountain and it is on its peak that his cairn and grave marker were placed and his ashes were spread.


I have lifted this photo of Errigal (it is the centre peak) from Simon Stewart’s website about the Glover Highlander Walk. The Glover is a memorial trek first organized by the North-West Mountaineering Club that consists of a 20km walk from Muckish Mountain to Errigal over 8 peaks and with 2000 meters of combined ascent. In recent times the NWMC has been forced to curtail the Walk over concerns for erosion and other environmental impacts of having 300+ people tromp the course. While it seems that the walk hasn’t been officially run in a few years I did find a bulletin for the 2014 edition of the Walk so it still remains a popular challenge for ambitious hikers. I hope to one day visit Errigal and take the Walk but looking at the official walk profile (also sourced from Simon’s site) I’ll need to do some serious training before I attempt it.


I have written this blog post because I never knew the man whose name I carry and as I have gotten older, and had sons of my own, I’ve become more reflective. My father seldom spoke of Joey while he was alive and I was too young to understand why I should have pressed the issue. Thankfully, I have in my attic an old suitcase that is full of documents outlining my grandfather’s life. I intend to scan or transcribe some of them to share with the family (and whoever else may be interested) and when I do I will link them below for posterity.

Joseph “Joey” Ballantine Glover Fonds

ACROSS THE NEPHIN BEG RANGE – A transcribed first-hand account of a gruelling night-time walk of 27 miles written by Joey.

THE CHRONICLES OF BARNESMORE GAP or “THE DAY DAVID GOT LOST” – A transcribed short story about the time Joey’s son David (my father) got lost on a hike.

ON SEEING A BOYHOOD HERO ON T.V. – A transcribed poem in which Joey laments the corrosive effects of time.

Performing a Chain Analysis in Excel

Chain Analysis is the name I’ve given to the act of finding sequential, potentially related phone calls in phone records. The job is to identify whether a particular number called a second number and whether that second number called a third number within a specific time frame, say 10 minutes. I call such a sequence of calls a chain, and chains can be used to identify whether someone might be issuing orders or instructions via third parties. The analysis can be done in Excel and, while it involves some moderately complicated Excel functions, once the spreadsheet is set up subsequent analyses can be done easily.

Getting Started

To start, the first thing you need to do is arrange your data properly. In the screen shot below I have arranged some sample data where column A lists the calling number, column B lists the called number and column C lists the date and time of the call. In order for this technique to work properly it is important that the data set be sorted on the date time column.

For simplicity’s sake (and ease of reading) I’ve replaced the phone numbers with letters but really any string will work in the analysis. Likewise, I only have 7 rows of data but you can have a whole bunch more and the analysis will work just fine.

While we’ll need a few extra columns for functions there is no reason you can’t keep additional data in the rest of the columns. As long as you sort everything properly the rest of your information will tag along as you perform the analysis.


Notice in the list that there are some chains that are easily identified without resorting to a function. Notice how, on row 3, AA calls CC and then CC calls DD a minute later. Similarly, on row 6, GG calls AA and then a minute later GG calls BB. This is also a chain to be identified. The trick now is to write a function that is able to flag these rows so that we don’t have to rely on picking them out manually.

The Look-Ahead Column

To flag the row we’re first going to create a new column that, for each row, “looks ahead” at the rows coming up. What we want the function to do is look ahead to see how many subsequent rows fall within the time frame window we’re interested in. In the discussion above I mentioned 10 minutes so, for each row, we want to count how many subsequent rows occur within 10 minutes of the date and time of the current row. The screen shot below captures the function that I wrote for cell D2.


The very first thing to notice is that the function is wrapped in curly braces like this: {<function>}. This means that the function is what’s known as an array formula rather than a traditional scalar formula. To enter an array formula you click in the cell like you normally would but instead of hitting enter when you’re done you need to hit ctrl-shift-enter (at the same time) to tell Excel that it should be considered an array formula.

This particular function needs to be used as an array formula because we are asking the function to compare each subsequent cell to the current one in the IF statement, so we’re comparing an array of values rather than just a single one. Also, when you’re typing the function into the cell, don’t include the curly braces, Excel will add those after when you correctly use ctrl-shift-enter to complete the array function.

Let’s examine the IF statement next. The innermost part of the function is a request to check and see if the difference between cell C3 and cell C2 is less than 600 seconds (our 10-minute cut-off). Since the date and time values in the C column are in Excel time we need to convert them to seconds before we do the comparison, which is why I multiply them by 24 (hours) and then 3600 (seconds) to get the difference into seconds. In the IF statement, if the difference between C3 and C2 is less than 600, then we return a 1 and if not we return a 0. Now, the tricky part is to wrap your head around the idea that, because this is an array function, we can use the expression C3:$C$8-C2 to do all of the comparisons at the same time. This means that we are doing C3-C2, C4-C2, C5-C2, etc., all the way to C8-C2. For each one of them we do the x24 x3600 conversion and comparison against 600 seconds, and for each one we get either a 1 or a 0. The power of the array function method is that we can do all of those calculations at the same time, in the same cell.

Have a look at the SUM function that wraps the IF statement. Knowing that the IF statement is multivalued makes the role of the SUM clearer: it adds up all of the 1s returned by the individual comparisons. Have a look at the value of D2: it’s 3 because the date/times in C3, C4 and C5 are all within 10 minutes (600 seconds) of the value in C2.

Array formulas can be filled down just like regular functions so I have filled down the rest of the rows with this function. The key thing to remember is that you only need to compare the current row to the all the subsequent rows, and so the important bit of the function, the C3:$C$8-C2 bit, will always have the current row in the C2 position and the next row in the C3 position. As an example, in cell D4 the function is =SUM(IF((C5:$C$8-C4)*24*3600<600,1,0)) . Note that for row 4, C4 is used to subtract and the range of cells to check starts at C5.
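For readers who prefer to prototype the logic outside Excel, the look-ahead count is easy to sketch in a few lines of Python. This is a minimal, hypothetical equivalent of the array formula; the call records and the function name are my own stand-ins, not the data from the screen shots:

```python
from datetime import datetime, timedelta

# Made-up sample records: (caller, callee, timestamp), sorted by timestamp,
# mirroring the layout of columns A, B and C in the spreadsheet.
calls = [
    ("AA", "BB", datetime(2015, 1, 1, 9, 0)),
    ("AA", "CC", datetime(2015, 1, 1, 9, 2)),
    ("CC", "DD", datetime(2015, 1, 1, 9, 3)),
    ("EE", "FF", datetime(2015, 1, 1, 9, 30)),
]

def look_ahead_counts(calls, window=timedelta(minutes=10)):
    """For each row, count how many subsequent rows fall within the
    time window, just as the filled-down array formula does in column D."""
    counts = []
    for i, (_, _, t) in enumerate(calls):
        counts.append(sum(1 for _, _, t2 in calls[i + 1:] if t2 - t < window))
    return counts

print(look_ahead_counts(calls))  # [2, 1, 0, 0]
```

As in the spreadsheet, the records must be sorted by timestamp before the look-ahead makes any sense.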

Figuring Out the Look Ahead Row Number

The next step in the analysis is create a new column to hold the end row value for the look ahead. If the values in column D tell us how many rows to look ahead then the values in column E tell us what row that actually is. Have a look at the function in cell E2 in the screen grab below.


This function uses the ROW() function to first tell us what row we are currently in. Then the value of D2 is added to that to arrive at the final value. For cell E2 the value is 5 since the ROW() function returns 2 and the value of D2 is 3. What this means is that we only need to look ahead to row 5, from row 2, to find all the phone calls that occurred within the 10 minute window. Anything beyond row 5 is more than 10 minutes after the call in row 2. The end row is an important component of the next part of the analysis as we need to be able to specify exactly what cell range we should be looking at for determining chains.

Flagging the Chained Calls

Have a look at the screen grab for the final step of the analysis, the step where each row is flagged as part of a chain or not.


The heart of the function is the use of the MATCH function to compare the value in column B, the Callee column, with the values in the Caller column for all the rows up to the end row identified in column E. In order to do that we need to use the INDIRECT function, which is used here to assemble a cell range using the ROW() function and the value from column E. Inside the INDIRECT is (“A” & (ROW()+1) & “:A” & E3). If you evaluate this we can see that ROW()+1 becomes 4 (since ROW() is 3 and 3+1 is 4) and the value in E3 is 5, which leaves (“A” & 4 & “:A” & 5). If you slam all that together (which is what the &s are for) you end up with A4:A5, which is the range of cells we want to test our MATCH against.

Something else you need to know: when MATCH can’t find a match it returns NA, which is why the entire MATCH call is wrapped in an ISNA function call. If MATCH returns an NA value, then ISNA will return true, otherwise it returns false. I then feed that true or false into an IF statement (which is only interested in evaluating things to true or false anyway). If the value in the IF is true, that is, MATCH returned NA, then IF prints nothing, “”, to the cell. However, if MATCH returns a value and therefore ISNA returns false, then the IF statement will print “CHAINED” in the cell. For good measure, I wrap the whole thing in another IF statement to check and see if the look ahead rows in cell D3 are greater than 0. If not, there is no way there can be a chain, so the function just prints the empty string “” and we skip the whole MATCH, ISNA, IF process.

So let’s have a look at the results the function spit out. We can see that two chains have been identified. The first is on row 3 where AA called CC and then CC called DD within 10 minutes. This is what the CHAINED in cell F3 is identifying. The second CHAINED is for the chain that begins when CC calls DD and then DD calls FF (on row 5) within 10 minutes. The take away here is that chains are identified on the row in which they begin so it’s up to you to follow the chains from the starting point where they are identified.
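The whole Caller-Callee test can likewise be sketched in Python. This is a hypothetical equivalent of the MATCH/ISNA/IF combination, flagging a row when its callee shows up as a later caller within the window; the sample rows and function name are mine, not from the screen shots:

```python
from datetime import datetime, timedelta

# Made-up (caller, callee, timestamp) rows, sorted by timestamp.
calls = [
    ("AA", "BB", datetime(2015, 1, 1, 9, 0)),
    ("AA", "CC", datetime(2015, 1, 1, 9, 2)),
    ("CC", "DD", datetime(2015, 1, 1, 9, 3)),
    ("EE", "FF", datetime(2015, 1, 1, 9, 30)),
]

def flag_chains(calls, window=timedelta(minutes=10)):
    """Flag a row as CHAINED when its callee appears as the caller
    of a later call inside the time window (Caller-Callee chains)."""
    flags = []
    for i, (_, callee, t) in enumerate(calls):
        later_callers = [c for c, _, t2 in calls[i + 1:] if t2 - t < window]
        flags.append("CHAINED" if callee in later_callers else "")
    return flags

print(flag_chains(calls))  # ['', 'CHAINED', '', '']
```

As in the spreadsheet version, the flag lands on the row where the chain begins.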

Bonus Chains

The above example looks at Caller-Callee chains but almost the same function can be used to flag Caller-Caller chains, the ones where a caller calls a number and then within the time interval the same caller calls another number. In the screen shot below Caller-Caller chains have been identified for the second row and the sixth row and the only difference between the functions in column F and column G is that the MATCH in the column G version is being done on the values in column A (e.g. A2) rather than column B (e.g. B2). All the same logic applies but the change in the matching column changes the chaining subject.


As can be seen in the chart, row 2 is a Caller-Caller chain because AA calls BB and then within 10 minutes AA calls CC. Similarly, on row 6, GG calls AA and then GG calls BB within 10 minutes. Just like with the Caller-Callee chains above, the CHAINED flag is listed on the row where the chain begins and it’s up to the analyst to follow the chain through.
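In a Python sketch the same one-column change applies: match on the caller instead of the callee. A hypothetical Caller-Caller version, using made-up sample rows:

```python
from datetime import datetime, timedelta

# Made-up (caller, callee, timestamp) rows, sorted by timestamp.
calls = [
    ("AA", "BB", datetime(2015, 1, 1, 9, 0)),
    ("AA", "CC", datetime(2015, 1, 1, 9, 2)),
    ("CC", "DD", datetime(2015, 1, 1, 9, 3)),
    ("EE", "FF", datetime(2015, 1, 1, 9, 30)),
]

def flag_caller_chains(calls, window=timedelta(minutes=10)):
    """Flag a row as CHAINED when the same caller places another
    call inside the time window (Caller-Caller chains)."""
    flags = []
    for i, (caller, _, t) in enumerate(calls):
        later_callers = [c for c, _, t2 in calls[i + 1:] if t2 - t < window]
        flags.append("CHAINED" if caller in later_callers else "")
    return flags

print(flag_caller_chains(calls))  # ['CHAINED', '', '', '']
```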


While the functions in this post can seem a little hairy at first, I think that with a little exploration you can become comfortable enough with them to trust the results. While it’s true that a custom VBA function could be written to perform this analysis, I think it’s important to find solutions that first attempt to use the built-in Excel functions before resorting to custom code. I hope this post has demonstrated how a fairly sophisticated kind of analysis can be performed with just the tools that come in standard Excel. Good luck!

Evaluating the Quality of Sales Leads Generated From an Online Contest

This post has nothing to do with law enforcement analysis but I want to get it up on the Internet for anyone who might be interested. Perhaps it will be helpful to our friends in competitive or business intelligence (yes, I know they are very different things).

I recently completed an analysis project for a client that ran an online contest. The contest was intended to generate leads for sales calls so it made sense to adopt a single-entry model for contest entrants (you only need a lead’s information once, after all). The contest was deliberately simple in scope and implementation running for a single month and eschewing instant win and daily prizes in exchange for a single large prize awarded in a random draw after the contest closed. Because the contest was focused on lead generation I felt it would be interesting to analyze and assess the quality of the leads. These are the findings.

The Entry Profile

The contest ran for 31 days and received over 7,000 entries. The following chart illustrates the daily entry profile for the time frame of the contest.


A few days after the contest launched, entries began to ramp up and peaked on the fifth day with over 1,000 entries. After the initial peak, there was a steady but decreasing cycle of entries until the end of the contest. Given the single-entry model of the contest the decrease in entries after the initial surge was expected.

A notable feature of the entry profile is the resurgent spike of entries beginning on day 23 that carries momentum through to the end of the contest. To understand this behavior it is important to know that the contest was run nationally for a Canadian audience and was therefore conducted in both our official languages: English and French. The following chart splits contest entries by language and reveals that the majority of the entries on days 23 and 24 were made through the French version of the site and that the additional boost of French entrants is responsible for the observed pattern.


While the initial spike in entries can partly be explained by a media buy early in the contest period, that does not explain the later spike in French-language entries. I wanted to determine what other factors might explain this observed behavior. Having developed contests in the past, I speculated that the spikes could be connected with contest clearinghouse websites, where enthusiasts who enter contests as a hobby meet and share information about new contests.

Contest Clearinghouses

“Contest clearinghouse” is the name I have given to community-driven websites that focus on aggregating links and information about contests that are being run online. I have given visitors who frequent contest clearinghouses the name “contest enthusiasts” as these individuals have typically made it their hobby to enter as many contests as possible in the hopes of winning prizes. Contest clearinghouses are characterized by forums where enthusiasts:

  • can swap tips and advice on new contests
  • brag about contest victories and discuss how to maximize their chances of winning
  • use web-based tools to track which contests they have entered and to create reminders for such things as which contests offer once-a-day entries (so that they can remember, for example, to re-enter every day to maximize the likelihood of winning).

To study the impact of contest clearinghouse websites on entries I analyzed traffic patterns from the contest website analytics and discovered that, for the contest entry page, eight of the top ten referring sites, including the top two, were identifiably contest clearinghouse websites (the remaining two referrers were social media links and were likely contest-centric as well). While I expected a certain number of visits and entries to come from contest enthusiasts, what these results indicate is that the contest page views were completely dominated by visitors coming from contest clearinghouses.

At the top of the referrers list was the leading contest clearinghouse, and extracting its daily referrals reveals the following traffic pattern.


It is clear from this chart that on day 5 of the contest this clearinghouse’s community became aware of the contest and sent a spike of traffic our way with over 400 unique visitors. The entry profile above shows the matching peak: on day 5 the site had over 1,100 entries, and an analysis of entrants’ IP addresses indicates that nearly 400 entries came from IP addresses with multiple entries (the only criterion for a “unique” individual was a unique email address, and many entrants entered several family members from the same computer). This implies that site visitors, who are only counted once, were very likely generating multiple entries each.

A similar analysis of website traffic on day 23 reveals that the number one referrer was a French contest clearinghouse, and its referral profile shows the same behaviour. Prior to day 23 of the contest there was no traffic from that site; then came a significant spike that continued to drive traffic until the close of the contest. The likely scenario again is that the clearinghouse first became aware of the contest on day 23 and drove its community to the site so that they could submit their single entry.


The take away from both of these referrer profiles is the realization that the major peaks in contest entries are directly connected to spikes in contest clearinghouse referrals.

More Analytics

The volume of referrals from these two sites, while illustrative, is not unique. Web analytics revealed that fully 90% of the unique page views for the English contest page came from visitors who landed directly on the contest page, in other words visitors who were linked straight to the contest (a common practice for contest clearinghouse sites). And while the volume of traffic on the French contest page was smaller, the pattern of behavior and the proportion of directly linked visitors was essentially the same as on the English site. These results indicate that the vast majority of contest page views were dominated by contest enthusiasts.

Perhaps more damaging than 90% of unique visitors cutting straight to the contest page is the fact that 80% of the visitors to the contest thank you page (reached after a contest entry is submitted) proceeded to exit the site. This reveals that, for the majority, entering the contest was the only purpose for their visit. Furthermore, of the remaining 20% of thank you page visitors who didn’t immediately exit the site 17% returned to the contest page, presumably to attempt to enter the contest again. This means that the rest of the site content promoting the brand and various products was ignored by 97% of the contest page visitors.

Other metrics support this interpretation as they reveal that the visitors who landed on the contest page spent an average of two and a half minutes on the site, which is just about enough time to complete the contest form. The analytics also reveal that those same users visited an average of 2.5 pages per visit (keep in mind that the contest entry page and the thank you page count as two pages), which suggests that the majority of users landed on the contest page, entered the contest, and left, while a smaller core of visitors, the 17% mentioned above, cycled between the contest and thank you pages, effectively driving up the pages-per-visit stat.

Despite the high volume of contest enthusiasts there is one possible saving grace, and that’s if contest entrants honestly used the opt-in/opt-out checkbox for indicating whether they agree to further contact from the manufacturer. While 61% of contest entrants did agree to further contact this number is unfortunately suspect. There is anecdotal evidence to suggest that contest enthusiasts harbor a number of (perhaps founded) superstitions that if they do not “play the game” and agree to further contact they will be (illegally and unofficially) excluded from winning the grand prize. While this was not true for this contest the fact that such a high percentage of entries that clearly originated from contest clearinghouses agreed to further contact suggests that entrants were willing to chance a sales call in exchange for being counted as a compliant entrant. Unfortunately for sales people this practice further dilutes the pool of leads by burying legitimate expressions of interest amongst those entrants who have no intention of purchasing the manufacturer’s product but indicated otherwise.


It seems clear, based on this analysis, that contest visits and entries were completely dominated by contest enthusiasts. Recognizing this, the first question that needs to be asked is: are entries from contestants connected with contest clearinghouses a source of viable leads? The problem as we see it is that hobbyists connected with contest clearinghouses enter contests because they enjoy winning prizes and when a product or service is free, the brand is a secondary concern. Essentially, they want a product when it’s free, but they want everything when it’s free, and their expression of interest should not be taken as particularly sincere.

As a client looking for sales leads these results raise a host of troubling questions:

  • Would the type of people who enter any and all contests in order to win free stuff be amenable to a sales call to purchase your product?
  • How many sales calls do they field as a result of their hobby and would they even remember that they entered a particular contest considering that their MO is to enter as many contests as possible?
  • Finally, if the leads cannot be considered qualified are the names and phone numbers gathered through the contest any more valuable than simply cold calling the phone book?

Based on everything I’ve seen, I’m forced to conclude that the quality of leads from this online contest, and any contest dominated by contest enthusiasts, is extremely poor.

Month-over-Month Crime Stats Aren’t Useful (and two alternatives)

Do you prepare a monthly crime report? Good. Do you break down the number of incidents by crime type, for example assaults, break and enters, robberies, etc.? Good. Once you have this month’s numbers, do you prepare a plus/minus over last month’s numbers, maybe with a splash of red or green to indicate direction? Pointless. I’ve seen too many crime reports, CompStat presentations, PowerPoint slides, what have you, that compare a current month’s crime (or week’s) to the prior month (or week). This is not a helpful exercise because crime has, as any large-scale analysis of your incidents will indicate, an annual cycle.

The number of crimes reported to the police has three distinctive cycles. The first is daily: the volume of reported crime varies with the hour of the day. The second is weekly: some days of the week generate more calls than others. The third and final cycle is annual: you get more calls in the summer and fewer in the winter (in the northern hemisphere anyway). The following diagrams illustrate these cycles.





Now, I realize that day of week cycle doesn’t look like anything legitimate but keep in mind that the volume of incidents is high enough that the difference is statistically significant. I encourage you to perform an analysis on your own data to demonstrate to yourself that what I’m saying is true.

Because crime has an annual cycle, if your crimes bottom out in January and peak in July then every month from January to July, on average, your current month’s stats are going to be higher than the previous month’s. Similarly, after you reach the peak your current month’s stats are going to be less than the previous month’s. The problem, of course, is that when these stats are reviewed each month commanders are going to be chewing out underlings and developing strategies to counteract the perceived rise in incidents during the first half of the year and celebrating a job well done in the second half. Of course, this is an exaggeration; cops know as well as anyone that they get busier in the summer than in the winter, but to the untrained observer (the media, the public, etc.) this kind of stat reporting is not helpful.

Fortunately, there are a couple of alternatives:

The first is to compare this month to the same month from the previous year. This is a solid indicator since, barring drastic changes in the long-term trend (which you should quantify!), the numbers should be viable for comparison. The comparison is viable because it matches two points at the same position in the annual crime cycle.

The second alternative is to compare only a year-to-date (YTD) stat and forget the whole month-over-month nonsense. Like option one above, a YTD stat will provide you with a legitimate yardstick to compare one time period to another. The yardstick is legitimate because it takes into account the annual cycle by summing up everything since the start of the current year.
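Both alternatives amount to a few lines of arithmetic. Here is a minimal Python sketch, assuming monthly incident counts keyed by (year, month); all of the numbers and function names are invented for illustration:

```python
# Invented monthly incident counts keyed by (year, month).
counts = {(2013, m): c for m, c in enumerate(
    [80, 75, 90, 100, 110, 130, 140, 135, 120, 105, 90, 85], start=1)}
counts.update({(2014, m): c for m, c in enumerate(
    [85, 78, 95, 104, 115, 128, 150], start=1)})

def year_over_year(counts, year, month):
    """Alternative 1: compare a month to the same month of the prior
    year, i.e. the same position in the annual crime cycle."""
    return counts[(year, month)] - counts[(year - 1, month)]

def year_to_date(counts, year, month):
    """Alternative 2: sum January through the given month for one year."""
    return sum(counts[(year, m)] for m in range(1, month + 1))

print(year_over_year(counts, 2014, 7))                                # 10
print(year_to_date(counts, 2014, 7) - year_to_date(counts, 2013, 7))  # 30
```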

So there it is: don't perform month-over-month stat comparisons. There are two equally easy-to-calculate stats that will fill the spot in the report and provide actual insight. The key is to remember the natural crime cycles.

Performing a Statistical T-Test in Excel

The value of the t-Test is that it can tell you whether a difference you are seeing between two measurements is legitimate or whether it is likely just the product of randomness. In this blog post I am going to use the t-Test to determine if the number of daily motor vehicle collisions (MVCs) during the winter is meaningfully different from the number during the summer. While this is kind of a no-brainer example, it will help illustrate the value of the t-Test and give you some ideas on how you might apply it to add statistical rigour to your analytical findings.

Before we get into the test, let me provide some background to help explain what it is we're trying to do. In statistics we are always comparing two hypotheses. The first is named the null hypothesis and the second is named the alternate hypothesis. We almost always want to reject the null hypothesis and accept the alternate one, because in that situation something interesting has occurred. For our motor vehicle collision example we want to determine if there is a meaningful difference in the number of daily motor vehicle collisions during the first quarter of the year (i.e. the winter) versus the third quarter of the year (i.e. the summer). Now, you may be thinking "obviously there's a difference, the weather", but let's, for the sake of education, assume that we need to demonstrate statistically that there is a difference. In this study the null hypothesis would be "there is no difference between the quarters; the number of collisions is the same", while the alternate hypothesis would be "yes, there is a statistically significant difference between the two quarters".

The term "statistically significant" needs some explanation. When we want something to be statistically significant we want to be able to say, with a particular level of confidence, that the results we are seeing are not just due to chance. First we pick our "confidence level", or how sure we want to be, and then, through the magic of statistics, we are provided a number that our results have to beat before we can be that sure. In this example we're going with the 95% confidence level, which means that if we find a difference between the average number of MVCs during the winter and summer, we want to be 95% sure it's not due to chance. Or, in other words, we are willing to wrongly reject the null hypothesis only 5% of the time. We can up the confidence level to 99% if we want but, as we'll see, we'll need to adhere to even stricter conditions. Whatever we choose, just remember that the "significant" part in statistically significant doesn't mean "important" (as significant typically means in everyday conversation); it just means "not due to randomness".

How do we go about demonstrating that the two quarters have real, non-random differences in the number of daily motor vehicle collisions? The first thing we do is randomly select 30 different days' worth of MVC counts from each of the first and third quarters. The screen grab below shows my randomly selected MVC data.


Notice that there are two columns of data, one marked Winter and one marked Summer, and each has 30 entries. Also notice that at the bottom of each column is a number labeled "Mean". In those cells I have used Excel's AVERAGE function to find the mean of all the numbers (add them all up, divide by 30). And behold, the means are different. That proves the quarters are different, right? Not quite. We still have to deal with the peskiness that is statistical significance. It's possible, after all, that those means are different just through chance. As discussed above, we want to be 95% sure that they aren't.

To prolong your agony I’m actually going to show you two ways to perform the t-Test: the (relatively) quick way using Microsoft’s Data Analysis ToolPak Add-in and the longer, manual way that uses built in Excel functions.

First, using the Add-In. 

The Analysis ToolPak is a free Add-In provided by Microsoft as part of the default installation of Excel. Add-Ins are a kind of Excel file, typically with a .xlam extension, that package together a bunch of related functionality. To get started with the Analysis ToolPak Add-In we need to check if it is installed. To do that, click the "Data" tab in Excel and look at the far right. If you see "Data Analysis" as an option, it's already active (see screen shot).


If you don't see it, that means we need to turn it on. First, click the green "File" tab (or the Excel jewel if you're in 2007) to call up the Save/Open/Close menu. Look for the option named "Options" and click it. This will open a window with a menu on the left. Click the menu option named "Add-Ins" on the left hand side. The screen will change and at the bottom, beside the word "Manage", will be a select box and a button labeled "Go". Click the "Go" button to open yet another window and you will see a series of checkboxes. Fingers crossed that one of them says "Analysis ToolPak" and is unchecked. Click the checkbox and click OK. Now go back to the Data tab, look at the far right and you'll see "Data Analysis". You're set.

Now that the ToolPak is installed, click the "Data Analysis" button and a modal window will open up. Click "t-Test: Two-Sample Assuming Unequal Variances" in the list and click the "OK" button; this will open up a new window (check the screen shot below) with some options. For the "Variable 1 Range" click the little arrow button and select all of the "Winter" data, which is in B2:B31. For "Variable 2 Range" select all the "Summer" data from C2:C31. For the "Hypothesized Mean Difference" enter 0 (since the null hypothesis is that the means are the same, their difference should be 0) and everything else can stay the same. However, notice that there is a field named "Alpha" that has the value 0.05. It's not a coincidence that 0.05 = 1 - 0.95. Alpha is another way of asking how sure we want to be.


Click the “OK” button and Excel will open up a new worksheet populated with a bunch of labels and values just like the one in the screen shot below.


So what are we looking at? Well, there are the means we saw. Next come the variances, which are measures of how spread out the values are, and they are quite different (which is why we needed to use the test that assumes unequal variances). A little further down we see our "t Stat", which has a value of about 4.35. Skip down two lines to the value labeled "t Critical one-tail" and note that it is about 1.68. That's good news for us, because the rule is that if your "t Stat" is larger than your "t Critical" value then the null hypothesis can be rejected and our results are therefore statistically significant. This means that we can now state, with the power of math backing us up, that we are 95% sure the daily average number of MVCs in the winter is greater than the daily average number of MVCs in the summer.
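If you'd rather skip the spreadsheet entirely, the same test is available in Python via SciPy's `ttest_ind` with `equal_var=False` (Welch's t-test, the same "unequal variances" test the ToolPak runs). The winter and summer counts below are simulated stand-ins, not the numbers from my screen shots.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated stand-in data: 30 daily MVC counts per season (illustrative only)
winter = rng.poisson(lam=24, size=30)
summer = rng.poisson(lam=17, size=30)

# Welch's t-test: "Two-Sample Assuming Unequal Variances" in ToolPak terms
t_stat, p_two_tail = stats.ttest_ind(winter, summer, equal_var=False)
p_one_tail = p_two_tail / 2  # one-tailed, since we expect winter > summer

print(f"t = {t_stat:.2f}, one-tailed p = {p_one_tail:.4f}")
if t_stat > 0 and p_one_tail < 0.05:
    print("Reject the null hypothesis at the 95% confidence level")
```

A p-value below alpha (0.05) is the same verdict as the t Stat beating t Critical in the ToolPak output.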

Second, the manual way. 

So we used a built-in tool to do a bunch of heavy lifting for us (as tools should) and we read an answer off a table. It works, but how did it work? How can we calculate these numbers if the ToolPak isn’t around but we still, for some reason, have access to Excel?

First, have a look at this screen shot because it will help to explain each of the steps I used to perform a manual unequal variance t-Test.


On the left are the two original columns of data, one for winter and one for summer. Notice that there are 30 entries for each season but that I used "freeze panes" so that I can show the bottom of the data (starting in row 32). For both seasons I calculated three additional variables: the mean (using Excel's AVERAGE function), the standard deviation (using Excel's STDEV function) and the count of the number of data points (using Excel's COUNT function). These will all be needed in the coming calculations.

On the right hand side of the screen shot I produced my own little table and I’m going to walk through each of these variables and explain the thinking behind them.

The first variable needed is the Sum of the Squared Deviations, for which I used Excel's DEVSQ function as a shortcut. Think of the variable like this: you have a mean value for a season and you have 30 different data points that are each going to be a little bit more or a little bit less than that mean (as data points usually are). For each data point, subtract the mean from its value and square the result. Now sum all those squared differences up, and that's the sum of the squared deviations (or differences). Notice that I added the sums of the squared deviations for the winter and summer seasons together into a single value.

Why did we need the sum of the squared deviations? Because we want to calculate the pooled sample variance. Why pooled? Because we put the winter and summer values together. But what's variance? The average of the squared differences from the mean. This is easy to calculate: just divide the sum of the squared deviations by the number of points we have in our two seasons, minus 2 (1 for each season). The minus 2 is Bessel's correction, which accounts for the bias in estimating the population variance from a sample (just trust me, subtract 2; Excel does).

The next variable is the Standard Error of the Difference in Means; to calculate it we take the square root of the pooled sample variance times the sum of one over the counts of the winter and summer data points (i.e. 1/30 + 1/30). But what is the standard error of the difference in means? It's the average difference we would expect between the means of two samples of this size. We're trying to determine, after all, whether two means are significantly different from one another and, given the size of the two sample groups, we should expect some difference just by chance. Imagine that we ran this little study over and over again with other randomly selected groups of MVCs; each pair of samples would have its own difference between the means. That sounds like a lot of work, so instead of rerunning the study over and over we can use the formula for the standard error of the difference to answer the question: "what's the expected average of that difference?"

But what do we need that for? To calculate our t-statistic. That calculation is simple: divide the actual difference in the means (winter mean minus summer mean) by the standard error of the difference in means, and we end up with approximately 4.35, just like the ToolPak did.
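The whole manual chain, from sum of squared deviations through to the t-statistic, can be sketched in a few lines of Python. The sample data here is simulated for illustration; the steps mirror the spreadsheet ones above.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated stand-in samples; substitute your own 30-day counts
winter = rng.poisson(lam=24, size=30).astype(float)
summer = rng.poisson(lam=17, size=30).astype(float)
n1, n2 = len(winter), len(summer)

# Sum of squared deviations (Excel's DEVSQ), pooled across both seasons
devsq = ((winter - winter.mean()) ** 2).sum() + ((summer - summer.mean()) ** 2).sum()

# Pooled sample variance, with Bessel's correction (divide by n1 + n2 - 2)
pooled_var = devsq / (n1 + n2 - 2)

# Standard error of the difference in means
se_diff = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# The t-statistic: actual difference in means over its standard error
t_stat = (winter.mean() - summer.mean()) / se_diff
print(round(t_stat, 2))
```

With equal sample sizes (30 and 30), this pooled-variance t-statistic comes out identical to the unequal-variance one, which is why the manual method matches the ToolPak here.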

But we're not done; we still need to determine our critical t-score at the 95% confidence level. To do that we cheat a bit and use Excel's built-in TINV function. The first input is the probability we want to test, in this case 0.05 (i.e. 1 - 0.95), the alpha from the ToolPak. The eagle-eyed will note that in the calculation in the screen shot, 0.05 is actually multiplied by 2. This is because we're interested in replicating the "one-tailed" result from the ToolPak, and since TINV is "two-tailed" we need to double the probability to account for that.

TINV also takes a second parameter, degrees of freedom. Usually degrees of freedom is pretty straightforward (it's just the sum of the two counts, minus 2, as for the pooled variance) except when we're dealing with an unequal variance scenario. To handle this scenario we need to break out the more elaborate calculation defined by Moser and Stevens (1992) and spelled out in this ugly Excel formula:

=(1/B34 + F8/C34)^2/(1/(B34^2*(B34-1)) + F8^2/(C34^2*(C34-1)))

Where the value in cell F8 is calculated as =C32^2/B32^2, the ratio of the squared standard deviations of the two seasons. God, it's a mess, I know. But dissect the formula and you'll see that it's just the two seasons' counts and that ratio used in multiple places. One thing you may notice is that the resulting degrees of freedom is not an integer: we get approximately 48.58, while the ToolPak results above show a degrees of freedom of 49. This is because Microsoft rounds. You can choose to round or not; the results are pretty much the same.

With degrees of freedom in hand we can finally run the TINV function, and when we do we get a critical t-stat of 1.68, just like the ToolPak (although the ToolPak number differs slightly further into the decimals because I didn't round my degrees of freedom). And just as with the ToolPak, our t-stat is way bigger than that critical t-value, so we can reject the null hypothesis and conclude that we have statistically significant results at the 95% confidence level. Yes!
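For the curious, here's a sketch of the same degrees-of-freedom calculation and critical value in Python, on simulated data. The formula below is the standard Welch–Satterthwaite form, which is algebraically the same thing the ugly Excel formula computes, and `stats.t.ppf(1 - 0.05, df)` plays the role of Excel's one-tailed `=TINV(2*0.05, df)`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated stand-in samples (illustrative only)
winter = rng.poisson(lam=24, size=30).astype(float)
summer = rng.poisson(lam=17, size=30).astype(float)
n1, n2 = len(winter), len(summer)
v1, v2 = winter.var(ddof=1), summer.var(ddof=1)  # sample variances

# Welch-Satterthwaite degrees of freedom for the unequal-variance test
df = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)

# One-tailed critical t at the 95% confidence level
t_crit = stats.t.ppf(1 - 0.05, df)
print(round(df, 2), round(t_crit, 2))
```

Note that df comes out non-integer here too, just like the 48.58 in the manual spreadsheet.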

And with that we are finally done with our odyssey of manually calculating the unequal variance t-test. We've calculated the same results as the ToolPak and we got right into the nitty-gritty of it. Well done.


If you made it this far (and didn’t just skip the manual section) then bravo, that was a real grind. Either way, I hope you learned something about the t-statistic and how it can be used to add some rigour to pronouncements about means. I do, somewhat unfortunately, find that a lot of the time the t-stat plays spoiler in that decision makers see a difference in averages and conclude that something is working when the reality is that, statistically speaking, there is no significant difference. Nevertheless, it’s important to be able to back up your analysis with some math and t-stats are a great way to provide that oomph. Good luck.


A Quick and Dirty Phone Toll Technique Using Excel Pivot Tables

Analysts are often asked to perform phone toll analysis, which is the act of analyzing call records to see who called whom, when and how frequently. There are commercial tools for performing such an analysis but I’ve found Excel can easily get the smaller jobs done. In a typical phone toll analysis you have some numbers of interest, suspect numbers or the numbers of known associates, that you want to study in depth. The technique I am going to demonstrate below is not applicable to that kind of analysis. Rather, I am going to detail how to use Excel pivot tables to perform a different kind of phone toll analysis useful for when you are interested in number discovery.

The scenario is as follows: there have been three incidents with the same MO and it is likely that the same suspects are committing the crimes. You don’t have any suspect descriptions but you do have the exact location and time of each of the incidents. Working on a hunch that the suspects are using their phones during or shortly after the crimes your detectives secure a warrant for all cell activity in each of the three incident locations for each of the three occurrence intervals from all the major cell carriers. They then email you this data and ask you to do an analysis.

The first step is to sort out the data. Every carrier organizes its data differently, but the common elements between them all are the called number, the calling number and the date and time of the call (sometimes combined as a single date-time). To normalize the data, first create a new spreadsheet with columns for date, called number, calling number, carrier and location. The last two will need to be manually added to the spreadsheet based on data from the carrier and are used to differentiate the data between carriers. The intention of this step is to create a single 'super sheet' that combines all the calls made across all the carriers into one big list.

The stripped down super sheet in the screen grab below lists only 12 calls across three carriers but it’s quite possible that in a real life example you could be looking at thousands upon thousands of calls (depending on the size of your time interval and the size of your cell phone market).


After normalizing the data, the second step is to make two copies of the super sheet (leaving the original as a source file). In the first copy, delete the 'Called' column and in the second copy delete the 'Calling' column. Finally, copy all the data from the second sheet, paste it at the bottom of the data in the first copy, and rename the column containing the phone numbers to 'Number'. The point of this step is to wipe out the distinction between called and calling numbers: when we're fishing for common numbers, we can't rely on just one or the other, since during one incident the suspect may have been the caller while in another the suspect was the called party.
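For analysts comfortable with Python, pandas can do the copy-paste-and-rename step in one go with `melt`. The numbers, carriers and addresses below are invented for illustration.

```python
import pandas as pd

# Tiny invented super sheet (columns match the normalized layout above)
super_sheet = pd.DataFrame({
    "Date":     pd.to_datetime(["2013-01-05 21:10", "2013-01-05 21:12",
                                "2013-02-11 23:40", "2013-03-02 22:05"]),
    "Called":   ["555-0101", "555-0199", "555-0101", "555-0142"],
    "Calling":  ["555-0142", "555-0142", "555-0177", "555-0101"],
    "Carrier":  ["CarrierA", "CarrierB", "CarrierA", "CarrierC"],
    "Location": ["123 Main St", "123 Main St", "456 Oak Ave", "789 Elm Rd"],
})

# Collapse 'Called' and 'Calling' into a single 'Number' column, so each
# call contributes two rows: one for each end of the conversation
stacked = super_sheet.melt(
    id_vars=["Date", "Carrier", "Location"],
    value_vars=["Called", "Calling"],
    value_name="Number",
).drop(columns="variable")

print(stacked)
```

Each original call now appears twice in `stacked`, once per participant, which is exactly what the two-copies-pasted-together sheet achieves.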

In the screen shot below I pasted the content from the 'Calling' sheet to the bottom of the data in the 'Called' sheet and, even though you can't see the column names in the shot, I renamed column 2 to 'Number'.


The third step is to create a pivot table from the combined work sheet with the single ‘Number’ column. Highlight the whole thing and insert a pivot table using the PivotTable button under the ‘Insert’ tab (see screen shot below).


In the new pivot table work sheet, drag the 'Number' column to the 'Row Labels' box and the 'Location' field to the 'Column Labels' box. Finally, drag the 'Date' field to the 'Values' box to get something similar to the screen shot below.


What you'll likely see (depending on how many source numbers you have) is a very long pivot table with a lot of blank spaces. What you need to do is copy the entire table and paste its values into a new worksheet so that it can be manipulated. After pasting, highlight the data and apply a data filter so that each column has the drop down arrows for filtering out values.
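The Excel pivot has a direct pandas equivalent in `pivot_table`, if you'd rather script this step. Again, the numbers and addresses are invented.

```python
import pandas as pd

# Stacked call list after collapsing called/calling into one 'Number' column
stacked = pd.DataFrame({
    "Number":   ["555-0101", "555-0142", "555-0101", "555-0177",
                 "555-0142", "555-0101", "555-0199", "555-0101"],
    "Location": ["123 Main St", "123 Main St", "456 Oak Ave", "456 Oak Ave",
                 "789 Elm Rd", "789 Elm Rd", "123 Main St", "123 Main St"],
    "Date":     pd.to_datetime(["2013-01-05"] * 4 + ["2013-02-11"] * 4),
})

# Equivalent of the Excel pivot: count of calls per number per location,
# with NaN (Excel's blanks) where a number never appeared at a location
pivot = stacked.pivot_table(
    index="Number", columns="Location", values="Date", aggfunc="count"
)
print(pivot)
```

The NaN cells play the role of the pivot table's blanks in the filtering step that follows.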


Now it's time to search for common numbers. If, as we hope, the same crew committed all the crimes and they used their phones during the different crimes, then there should be numbers that have non-zero values in more than one column. In order to identify those numbers more easily, use the filter drop down on the first location column (123 Main Street in the example) and deselect the (Blank) option. This should shrink the list considerably. Now do the same thing on the second column by deselecting the (Blank) option. Any phone numbers that remain were used in both locations during the incidents.


By re-selecting (Blank) for the second column and de-selecting (Blank) in the third column you can check for common numbers between the first and third locations as well. And finally, by re-selecting (Blank) in all the columns and then de-selecting it for two and three you can find common numbers for those two addresses. You only need to do those three combinations as all other combinations (e.g. three then two, three then one, etc) are just reversals of what’s already been done and you’ll get the same results.
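The whole filter dance can also be collapsed into a couple of lines of pandas: count how many location columns are non-blank for each number and keep the numbers that appear at two or more locations. The little pivot below is invented for illustration.

```python
import pandas as pd

# Count-of-calls pivot: rows are numbers, columns are incident locations,
# None marks the blanks (number never seen at that location)
pivot = pd.DataFrame(
    {"123 Main St": [2, 1, None],
     "456 Oak Ave": [1, None, 1],
     "789 Elm Rd":  [1, 1, None]},
    index=pd.Index(["555-0101", "555-0142", "555-0177"], name="Number"),
)

# A number is of interest if it shows up at two or more incident locations;
# this is the programmatic equivalent of toggling the (Blank) filters by hand
common = pivot.index[pivot.notna().sum(axis=1) >= 2]
print(list(common))
```

This checks every pairing of locations at once, so there's no need to work through the filter combinations one at a time.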

By noting down what numbers are common and then applying a filter to the original super sheet that maintained the called/calling split it is possible to find the carrier. You can then examine the original carrier documents to get more details and provide your detectives with a list of numbers they can follow up on.


One note, try googling the common numbers that you find before you send them off to your detectives. Often common numbers are things like the carrier’s voice mail number or a public service phone number for the municipality. The good thing is that if the number is one that the general public might use it will likely be heavily promoted on websites and show up near the top of search results. By pre-identifying common numbers you can save your detectives the potential embarrassment of asking a carrier for details on their voice mail service.