Tuesday, December 7, 2010

Psigh... A Response to Daryl Bem's Work Regarding the Existence of Psi

Clippy Versus the Spoon Benders:
An Analysis of Bem’s Retroactive Facilitation of Recall Using Microsoft Excel
J. Bruce Launey, Jr.
December 7, 2010
I recently read part of Daryl J. Bem’s 2010 paper, Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect.  After reading the portion of Bem’s paper analyzing the results of his experiments on the retroactive facilitation of recall, particularly experiment 8, I had some doubts about the statistical significance of his data.  I was not entirely convinced that a t-test was the appropriate tool for analyzing it.  It seemed unlikely to me that a DR% of 2.27% (using Bem’s terminology) would truly represent an event associated with a p value of .029.  In other words, I had a gut feeling that Bem’s results were simply the product of random chance and that it would not be all that unlikely to find results that appear even more significant purely by chance.
In the interests of full disclosure, I do not believe in psi, precognition, clairvoyance, ESP, or anything similar.  As I understand it, nothing in science has ever supported the presence of psi.  By all accounts, the only experiments that have supported psi have not been repeatable.  As such, these experiments have not produced anything that could be called good science.
Further in the interests of full disclosure, I am neither a scientist nor a mathematician.  I have a bachelor’s degree in mathematics and am currently studying to be a high school math teacher, but I would be the first to say that does not make me a mathematician.
One last personal note – I respect the efforts by Dr. Bem in establishing good protocols for performing experiments on the possible existence of psi.  It appears he has created an opportunity for repeated experimentation on the topic.  He has been willing to step forward with controversial results while providing the method to test those results.  He, no doubt, expected a critical response from many in the scientific community but came forward nonetheless.
In order to test the significance of Bem’s results, I attempted to generate some data using a purely randomized approach through the use of pseudorandom number generators (PRNGs).  To do so, I created an Excel spreadsheet and relied on Excel’s built-in RANDBETWEEN function to generate pseudorandom numbers representing the number of practice words a theoretical participant might have correctly recalled (or, in Bem’s framing, foreseen through precognition) as well as the number of control words the participant might have recalled. 
Because Bem’s experiment 8 involved 100 actual participants, I created 100 columns of data in Excel, each column representing a single randomly generated participant.   For each randomly generated participant, a random number was generated to represent the number of practice words the participant recalled and another random number was generated to represent the number of control words the participant recalled.  The precise formula I used was RANDBETWEEN (0,24) for each of these.  This formula generates a random number between 0 and 24, inclusive, which simulates the possible range of outcomes for the number of practice words and control words that may be recalled in Bem’s experiment.  Where P represents the number of practice words recalled and C represents the number of control words recalled as in Bem’s paper, I found both P-C and P+C for each randomly generated participant as Bem did for each actual participant in his study.  I also applied Bem’s formula of (P-C)*(P+C)/576 to find Bem’s DR% for each randomly generated participant.  Finally, I averaged all of the DR% values obtained as Bem did during his experiment.
In addition to those calculations, I also performed an additional calculation.  I determined the net difference of P-C for all participants by simply summing these differences for all of the columns.  This would indicate whether the theoretical participants performed better or worse than chance as a group.  Since the expected value for each individual theoretical participant of P-C is 0 in this purely randomized (or pseudo-randomized) approach, the sum of all individual P-C values would also have an expected value of 0.  Thus, if the sum of these values for all participants was positive, it would indicate that the theoretical participants performed better than expected as a whole, and, of course, if the sum was negative, it would show they performed worse than expected as a whole.  The spreadsheet I used to perform the calculations noted above is Bem Data Tests.xls.
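For readers who would rather not reconstruct the spreadsheet, the per-participant calculations above can be sketched in a few lines of Python.  This is a minimal stand-in for the Excel workbook, assuming the same uniform RANDBETWEEN(0,24) bounds described above; the factor of 100 simply expresses the (P-C)*(P+C)/576 ratio as a percentage.

```python
import random

random.seed(0)  # fixed seed only so the example is reproducible

def simulate_group(n=100):
    """Simulate one group of n chance-only participants."""
    dr_total = 0.0
    net_p_minus_c = 0
    for _ in range(n):
        p = random.randint(0, 24)  # practice words "recalled" (RANDBETWEEN(0,24))
        c = random.randint(0, 24)  # control words "recalled"
        dr_total += 100 * (p - c) * (p + c) / 576  # Bem's weighted DR%, as a percent
        net_p_minus_c += p - c  # running net sum of P-C across the group
    return dr_total / n, net_p_minus_c

mean_dr, net = simulate_group()
print(mean_dr, net)
```

Because every draw is pure chance, both the mean DR% and the net P-C sum should hover around 0 in any given run.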
I ran 100 trials of the 100 randomly generated participants.  The overall results of these 100 trials are contained in the file entitled Bem Data Results – Final.xls.  In short, the results of these 100 trials show that randomly generated groups of participants in this purely mathematical version of Bem’s study achieved DR% scores greater than the 2.27% score of Bem’s actual participants in 34 of the 100 trials.  In other words, in a run of 100 trials of randomly generated numbers, over one third of the trials yielded results that appear more significant than Bem’s results. 
It is also worth noting that Bem’s weighted DR% calculation method produced 8 instances in which the overall net sum of P-C values had a different sign than the DR% value.  In 3 of these instances, the DR% was positive while the net sum of P-C values was negative; in the other 5, the DR% was negative while the net sum of P-C was positive.  These trials indicate that the net effect of retroactive facilitation for the whole group of participants can be in direct contrast with the DR% value Bem calculated.
Even more significantly, one of the occasions in which the DR% was negative while the net sum of P-C was positive involved a DR% of -2.20%.  In all fairness to Bem, the net sum of P-C for all 100 participants in this trial was only 1.  Nonetheless, this trial demonstrates that a group of participants in the experiment could show little to no net effect from retroactive facilitation beyond what chance would be expected to yield (as previously noted, the net sum of P-C for all participants has an expected value of 0) and yet still generate a DR% very similar in magnitude to that of Bem’s experiment.  I would be interested in seeing the actual net sum of the P-C values from Bem’s experimental data to see whether his results are similar to those of the randomly generated trial that yielded the DR% of -2.20%, i.e., whether, taken as a whole, the participants in Bem’s experiment performed almost exactly as chance would predict.
Observations Regarding the Results
It is worth noting that my assumptions regarding the bounds of the randomly generated numbers are somewhat flawed, though I don’t believe these flaws bias the results either toward validating or invalidating Bem’s findings.  In randomly selecting values between 0 and 24 for both the P and C values, it was possible to generate instances in which both P and C were 0.  In the actual experiment this would occur only if a participant was unable to recall even a single word from the list of 48 words, which seems incredibly unlikely; I would be quite shocked if this actually occurred during Bem’s experiment.  In fact, it is probably reasonable to assume that every participant recalled at least 4 or 5 words, but that is pure speculation on my part.  It may have been more appropriate to set a lower bound of 1 or 2 for the randomly generated numbers; however, it would not be impossible, although fairly unlikely, for a participant to have recalled 5 or 6 words from one list while recalling none from the other, so any lower bound greater than 0 would be somewhat arbitrary.  In any event, leaving the lower bound for both P and C at 0 did not unfairly skew the results of either value.
On the opposite end, I also chose not to limit the upper bound of the possible values, although doing so may be reasonable.  It is very unlikely that a participant in this experiment would remember all 24 words from either the practice list or the control list, and setting an upper bound somewhere in the range of 12 to 20 may have resulted in a better representation of the actual experimental results.  I chose not to do so in part because any other bound would have been entirely arbitrary, but also because I felt that leaving both at the same value and running 100 trials of 100 randomly generated participants (10,000 randomly generated participants in all) offered enough data that this problem would not be all that significant.  Furthermore, as with the lower bound, leaving both upper bounds at 24 did not unfairly skew the results of either value.
After running 100 randomly generated trials of 100 hypothetical participants, I have been able to demonstrate that in more than one third of these trials the resulting data would have led Bem to conclude that psi or precognition contributed to the performance of the participants.  I believe this is an indication that the p value of .029 for Bem’s participant group is far too low.  Regardless, my findings strongly suggest that it is much more likely that simple chance led to Bem’s results rather than the influence of psi, precognition, etc.
In the end, I believe with Clippy’s help through the use of Microsoft Excel and a few hours of work, I have demonstrated that the results of Bem’s experiment could have been easily obtained solely through the operation of chance.  This, of course, does not prove the non-existence of psi, but that is not my burden to carry.  It simply demonstrates that Bem’s experimental findings do not necessarily point to his conclusions.
A Note About the Title
For anyone lucky enough to be unfamiliar with Clippy and spoon bending, a brief explanation of my paper’s title is warranted.  Clippy, a slightly anthropomorphized paper clip, would appear suddenly and without warning while one was using elements of the Microsoft Office suite such as Microsoft Word, Microsoft Excel, etc.  Clippy’s assistance was rarely desired and his suggestions were infrequently helpful.
Spoon bending, which was all the rage in the 1970s along with disco and 8-tracks, was a skill supposedly relying on psi or some similar parapsychological force or power.  James Randi and many others have demonstrated many times over that this trick can be performed without relying on such forces or powers.  Once again, in fairness to Bem, I do not know whether he ever researched whether psi or some other force of the mind played a role in spoon bending, and I am not attempting to unfairly lump him in with those who have suggested spoon bending is the product of parapsychological powers if he never made such a claim.  It is fair to say, though, that some spoon benders have claimed that psi or a similar power is the force behind their spoon bending.
I think my work has demonstrated that while psi may be strong enough to bend mere spoons, Clippy turned out to be a more formidable foe.  What are the odds that Clippy will be crushed by the power of the mind?  I think it fair to say that they are less than the odds that practicing for an event that has already occurred will somehow retroactively improve your performance in that event.


  1. Your assumptions about the bounds of the random function do in fact bias your results. Bem used the t-distribution, which is bell-shaped. Your random function results in a large variance, which is to say that your bell shape is wide, so finding a result greater than 2.27 is not so unusual. If the bell is narrower (i.e., lower variance), then the same result will be more unlikely. This is bad news for the "common sense" view of the world - maybe the future can affect the present after all!

  2. Aidey is absolutely right about the effect of the range of possible values on the variance of the distribution. I have begun to revise the assumed range of possible values to see what effect it would have on the resulting DR% values that Bem generated. The narrower the possible range of values, the smaller the variance, and the smaller the average DR%. I would be interested to see how many words the participants in Bem's study actually recalled to see what assumed values for my randomized approach would be valid. I could then rerun my randomized analysis to see how it compares to Bem's results when the bounds are more accurate.

    Thanks Aidey! I appreciate your insight. I'm a little embarrassed that I didn't think about the fact that my assumptions would impact the variance as significantly as they did.
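    The range-drives-variance point is easy to demonstrate directly. The hypothetical sketch below compares the spread of trial-mean DR% values under the original 0-24 bounds against an arbitrary narrower 8-16 range; the narrower range produces a visibly smaller standard deviation, which is exactly why extreme-looking trial means become rarer.

    ```python
    import random
    import statistics

    random.seed(42)  # fixed only for reproducibility

    def mean_dr(lo, hi, n=100):
        # Average weighted DR% for one group of n participants whose
        # recall counts are drawn uniformly from [lo, hi].
        total = 0.0
        for _ in range(n):
            p = random.randint(lo, hi)
            c = random.randint(lo, hi)
            total += 100 * (p - c) * (p + c) / 576
        return total / n

    wide = [mean_dr(0, 24) for _ in range(200)]    # original assumption
    narrow = [mean_dr(8, 16) for _ in range(200)]  # hypothetical tighter bounds
    print(statistics.stdev(wide), statistics.stdev(narrow))
    ```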

  3. You could probably get something more realistic than your uniform random from 0 to 24 by using a random binomial instead: in other words, assume that they have a fixed probability p of recalling each of the 24 words in each case. This assumes the recollections are independent, which is a bit unrealistic of course, but not as unrealistic as the uniform distribution. It's probably also easier to assume that all the subjects have the same p, but you could also code some variation in there if you wanted to.

    Excel doesn't have a random binomial function, apparently, but you can make a list of 24 random numbers and count how many of them are less than p easily enough.
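    For what it's worth, the count-the-uniforms trick translates directly into code. A minimal Python sketch, assuming an arbitrary illustrative recall probability of p = 0.3 applied identically to every word:

    ```python
    import random

    random.seed(7)  # fixed only so the example is repeatable

    def recall_count(p=0.3, words=24):
        # Binomial draw built the way described above: one uniform
        # number per word, counting those that fall below p.
        return sum(1 for _ in range(words) if random.random() < p)

    # One simulated participant: practice and control recalls with the same p
    practice = recall_count()
    control = recall_count()
    print(practice, control)
    ```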