RISING STAR! The ultimate source to ace your NYPD Sergeant, Lieutenant, and Captain Exam. Visit www.RisingStarPromotion.com to subscribe to our mailing list and get info on the next Sgt., Lt., or Captain exam!

TOPIC: Statistics 101


Newbie

Status: Offline
Posts: 1
Date:
Statistics 101


There is a fundamental flaw in the scaling of the exam that any B student who took a statistics class in high school should be able to identify. I will use plain language rather than terms like ANOVA so that this all makes sense.

DCAS is attempting to compare two populations (original and makeup), find the mean score of each, and then scale up the scores of the population with the lower mean to match the population with the higher mean. The theory is that the mean score directly indicates the difficulty of the test. The first problem here is that DCAS assumes both populations would achieve the exact same mean score if they sat for the exact same test on the exact same day.

Case in point: on the original test day, what was the mean score of those who took the test in Queens compared to those who took it in Brooklyn? For DCAS's theory to work, both locations would have to have the exact same mean score, with similar standard deviations given how many took the test at each site. If Queens scored, on average, 2 points higher than Brooklyn, was the Brooklyn test harder? Of course not; the variation comes from the random capabilities of the individuals who made up each group, since they took the exact same test under similar conditions.

Switch gears to comparing the original test takers to the makeups. How can DCAS compare a sample of, say, 800 original-test-day takers to a smaller group of, say, 25 test takers? That smaller group CANNOT be used as a reference, because its small size means random variation among its members has a much larger effect on its mean score. Do you think it is a coincidence that all three scaled exams (Sgt, Lt, and Captain) ended with the makeup having the higher mean score, which scaled up the original test scores?
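Here is a quick Python sketch of that sample-size problem (all numbers made up; this is just an illustration, not DCAS data). Drawing groups of 25 and 800 from the exact same population, the 25-person group's mean wanders far more from one hypothetical exam day to the next:

```python
import random
import statistics

random.seed(42)

# Hypothetical population of test scores: mean 75, sd 8 (made-up numbers).
def draw_mean(n):
    """Mean score of a random group of n test takers."""
    return statistics.mean(random.gauss(75, 8) for _ in range(n))

# Repeat the "exam day" 1000 times and see how much each group's mean wanders.
means_800 = [draw_mean(800) for _ in range(1000)]
means_25 = [draw_mean(25) for _ in range(1000)]

print(statistics.stdev(means_800))  # roughly 8/sqrt(800), about 0.28 points
print(statistics.stdev(means_25))   # roughly 8/sqrt(25), about 1.6 points
```

The spread of a group's mean shrinks like one over the square root of the group size, so a 25-person makeup mean is roughly sqrt(800/25) ≈ 5.7 times noisier than the 800-person original mean, even when both groups are equally capable.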

The idea behind what DCAS is doing is nice, but it cannot be done accurately this way and therefore should not be done this way.



__________________


Senior Member

Status: Offline
Posts: 143
Date:

Nerd

__________________


Veteran Member

Status: Offline
Posts: 58
Date:

RS2323 wrote:

Nerd

Hilarious. Read 3 lines and was like, why am I doing this?

 



__________________


Veteran Member

Status: Offline
Posts: 47
Date:

I get what you are saying. However, the scaling methodology hasn't been made public, so there isn't enough information to support your assumptions. I don't know if we will ever find out the actual methodology used to compare the original exam and the makeup exams. I am sure that, with your Statistics 101 background, you will be able to figure out the fraction or whole number used to scale the two exams. I do believe they hired an outside firm for this purpose. I am hoping the outside firm is a "professional company" that will take these variables into consideration.

On a different note, why don't you suggest a better method to compare the original exam and the makeup exam? I think offering two different exams solves more problems than it creates.



__________________


Senior Member

Status: Offline
Posts: 305
Date:

www.ets.org/Media/Research/pdf/RD_Connections16.pdf

The technique of scaling is used on all standardized tests. It's not just DCAS making something up.
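For a rough idea of what one standard scaling technique (linear equating) looks like: map each makeup score so that the makeup group's mean and spread line up with the original group's. This is an illustrative sketch with made-up scores, not DCAS's actual formula:

```python
import statistics

def linear_equate(score, makeup_scores, original_scores):
    """Map a makeup-exam score onto the original exam's scale by
    matching the two groups' means and standard deviations."""
    m_mean, m_sd = statistics.mean(makeup_scores), statistics.stdev(makeup_scores)
    o_mean, o_sd = statistics.mean(original_scores), statistics.stdev(original_scores)
    return o_mean + (score - m_mean) * (o_sd / m_sd)

# Made-up example: the makeup group averaged 80, the original group 71.
makeup = [74, 78, 80, 82, 86]
original = [63, 67, 71, 75, 79]
print(linear_equate(80, makeup, original))  # the makeup mean maps to the original mean (71)
```

Note that whether this adjustment helps or hurts depends entirely on how reliably the two groups' means and spreads are estimated, which is exactly where group size comes in.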

__________________


Guru

Status: Offline
Posts: 625
Date:

KingOfTheNorth wrote:

There is a fundamental flaw in the scaling of the exam that any B student who took a statistics class in high school should be able to identify. I will use plain language rather than terms like ANOVA so that this all makes sense.

DCAS is attempting to compare two populations (original and makeup), find the mean score of each, and then scale up the scores of the population with the lower mean to match the population with the higher mean. The theory is that the mean score directly indicates the difficulty of the test. The first problem here is that DCAS assumes both populations would achieve the exact same mean score if they sat for the exact same test on the exact same day.

Case in point: on the original test day, what was the mean score of those who took the test in Queens compared to those who took it in Brooklyn? For DCAS's theory to work, both locations would have to have the exact same mean score, with similar standard deviations given how many took the test at each site. If Queens scored, on average, 2 points higher than Brooklyn, was the Brooklyn test harder? Of course not; the variation comes from the random capabilities of the individuals who made up each group, since they took the exact same test under similar conditions.

Switch gears to comparing the original test takers to the makeups. How can DCAS compare a sample of, say, 800 original-test-day takers to a smaller group of, say, 25 test takers? That smaller group CANNOT be used as a reference, because its small size means random variation among its members has a much larger effect on its mean score. Do you think it is a coincidence that all three scaled exams (Sgt, Lt, and Captain) ended with the makeup having the higher mean score, which scaled up the original test scores?

The idea behind what DCAS is doing is nice, but it cannot be done accurately this way and therefore should not be done this way.


I totally agree and have been saying this all along. But nutjobs like centurion don't want to face reality... so sad...



__________________


Newbie

Status: Offline
Posts: 1
Date:

TransitJoe wrote:

www.ets.org/Media/Research/pdf/RD_Connections16.pdf

The technique of scaling is used on all standardized tests. It's not just DCAS making something up.


I think that KingOfTheNorth still has a point. I read the link that TransitJoe posted, and it says they compare difficulty by putting "anchor" questions on both tests. But King is asking: how can you get an accurate idea of how hard the makeup test is if only 25 people took it? Twenty-five people is so small compared to how many took the original test that chance alone can warp the results of even these anchor questions. If 500 people took the makeup and 800 took the original, that would give a good cross section of people to test the anchor questions on. But with only 25 people, they could by chance be 25 smarter people who did better on the anchor questions.

And how the heck does the sergeant test get scaled by what seems to be 9 full points? Was the makeup test that much easier? That's insane and hard to believe! Also, the makeup people had what this R&D paper calls exposure, so they probably already had an idea of where the test writers were going. Even though the makeup used different questions (apart from the anchor questions), so they didn't know the actual questions, they may have had a feel for which areas to focus on during the extra time they had to study.
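To see how much a 25-person group can warp even the anchor-question comparison, here's a quick Python simulation (made-up numbers; by construction both groups are equally able, so any gap in their anchor means is pure chance):

```python
import random
import statistics

random.seed(7)

# Hypothetical setup: both groups are drawn from the SAME population of
# ability on the anchor questions (true mean 7.0 correct, sd 1.5), so the
# true difficulty gap is exactly zero.  Any gap we measure is chance.
def anchor_gap(n_makeup, n_original=800, trials=1000):
    """Gap between makeup and original anchor-question means, over many repeats."""
    gaps = []
    for _ in range(trials):
        original = [random.gauss(7.0, 1.5) for _ in range(n_original)]
        makeup = [random.gauss(7.0, 1.5) for _ in range(n_makeup)]
        gaps.append(statistics.mean(makeup) - statistics.mean(original))
    return gaps

gap25 = anchor_gap(25)    # 25-person makeup group
gap500 = anchor_gap(500)  # 500-person makeup group

print(statistics.stdev(gap25))   # typical chance gap around 0.30 points
print(statistics.stdev(gap500))  # around 0.09 points -- far more stable
```

Even with identical groups, the 25-person comparison's chance error is several times larger than the 500-person one, and any such error gets baked straight into the scaling.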

 

The solution is to evenly break up everyone who signed up for the exam across three different test dates, so each test has the same number of people taking it, with dates that also work as makeups. You can't fricken compare 25 to 800. King is right!



__________________