-
Anti Research
So, one of our professors assigned us some group presentations. There was one topic or proposition, and he asked a bunch of students to present, debate, or give arguments in favor of it (whatever that was), and then asked a bunch of other students to prepare presentations with arguments against it. Then he explained how you have to consult sources, like research papers, news articles, and this and that, to form the basis of your arguments. Like, you first review such material and then you use those sources to back your argument.
This pissed me off so much. I asked: what if the position I have been asked to argue for is something I am against? (And the same applies vice versa.) He didn’t understand my point and said that you have to use sources and blah blah. But I was so angry, I think I couldn’t control my frustration, and that’s why I was unable to articulate my point clearly. It came out something like, ‘It’s not the way it is. Something either is, or it isn’t. How can one pre-decide whether it is or it isn’t?’ Yeah, I know it was very weirdly phrased. This time my voice was slightly loud and had an argumentative tone in it. He replied with the same blah blah, but at the end he said something like, that’s what research is: to find things out. I knew it was no use. I said nothing.
Two of my friends later told me that sir didn’t understand the question. On some level, I think, yes, he didn’t understand my question, but not because I articulated it badly (my friends were smart enough to understand me; he wasn’t?). It’s because he had been trained and indoctrinated that way.
There is an enormous number of people who don’t understand what research is. They think research, or science, or whatever academia is supposed to do, is to find things out (which I think is correct). But their way of finding things out is so wrong. They think of it as finding the right source to quote, the right data to analyze, the right econometric model to apply, or some other right thing to do, and that this kind of finding will result in some academic work. Yes, it will produce academic work, but not valuable work.
What a true researcher simply needs to find is truth. Some overlooked, undiscovered piece of truth. That is the end; the rest are all means. Why don’t such distinguished people understand something so simple? You can’t give people conclusions and ask them to research arguments in favor of those conclusions, because that is exactly the opposite of research. IT’S ANTI-RESEARCH!!
I don’t know to what extent this applies to academia elsewhere, but at least here, we are all producing an enormous amount of anti-research work.
Thank you for coming to my Rant-Talk.
-
Undergraduate Performance Analysis
Tamseel Ahmad
2024-05-01
Introduction
In most Pakistani universities, marks achieved in Higher Secondary education are given a high weightage in merit calculation for admission to undergraduate programs. In this analysis, I aim to measure the correlation between the percentage marks achieved in the Higher Secondary Certificate (HSC; intermediate, i.e. grade 11–12, exams) and the cumulative GPA (CGPA) of final-year undergraduate students. More specifically, I want to find out whether this correlation differs significantly between gender groups.
Dataset
The dataset comprises grade metrics for undergraduate students of a large public-sector university with a diverse student body in Lahore, Pakistan. The data was self-submitted by students but cross-verified by their respective departments.
Limitations: The data does not cover every student, and students with lower performance were less likely to submit their data than those with higher performance.
Analysis
Data Cleaning
We start our analysis by loading packages and importing data.
```r
library(tidyverse)
library(janitor)
library(cocor)
library(ggExtra)
library(cowplot)

# col_types "ifffdd": integer id, three factor columns, two double columns
grades <- read_csv("grades.csv", col_types = "ifffdd") |> clean_names()
head(grades)
## # A tibble: 6 × 6
##      id gender program_duration year_of_student hsc_percentage  cgpa
##   <int> <fct>  <fct>            <fct>                    <dbl> <dbl>
## 1     1 Female 4                1                         91.4  3.56
## 2     2 Male   4                2                         91    3
## 3     3 Female 4                1                         90.6  3.6
## 4     4 Female 4                3                         86.5  3.37
## 5     5 Female 4                1                         69.9  2.48
## 6     6 Male   4                1                         89.1  2.28
```

Let us look at basic descriptive statistics of the data.
```r
summary(grades[, -1])  # drop the id column
##     gender     program_duration year_of_student hsc_percentage       cgpa
##  Female:8286   4:13478          1:4951          Min.   :42.45   Min.   :0.000
##  Male  :7028   5: 1836          2:4577          1st Qu.:77.00   1st Qu.:2.910
##                                 3:3186          Median :85.18   Median :3.330
##                                 4:2499          Mean   :83.37   Mean   :3.242
##                                 5: 101          3rd Qu.:91.80   3rd Qu.:3.640
##                                                 Max.   :99.99   Max.   :4.000
```

The dataset includes undergraduate students across different years of study. The ideal indicator of undergraduate performance would be the final GPA of graduating students, but since we don’t have that data, we use the CGPA of final-year students. Hence, we filter the data to keep only students who are in their final year:
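```r
# year_of_student and program_duration are factors with different level sets,
# so their labels are compared as character strings
grades <- grades |>
  filter(as.character(year_of_student) == as.character(program_duration))
```

Let us now look at the descriptive statistics again: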
```r
summary(grades[, -1])
##     gender     program_duration year_of_student hsc_percentage       cgpa
##  Female:1296   4:2209           1:   0          Min.   :42.45   Min.   :0.740
##  Male  :1014   5: 101           2:   0          1st Qu.:74.72   1st Qu.:3.020
##                                 3:   0          Median :81.00   Median :3.350
##                                 4:2209          Mean   :79.46   Mean   :3.275
##                                 5: 101          3rd Qu.:86.00   3rd Qu.:3.598
##                                                 Max.   :98.63   Max.   :4.000
```

The minimum CGPA of 0.74 seems odd. We can get a better overview of such outliers through a scatterplot:
```r
ggplot(grades) + geom_point(aes(x = hsc_percentage, y = cgpa))
```

We can note two observations where CGPA < 1 and a few more where CGPA < 2. As per university regulations, failure to maintain a CGPA of 1.7 results in being dropped from the program. Counting such observations:
```r
# Students below the 1.7 CGPA drop-out threshold
grades |> filter(cgpa < 1.7) |> nrow()
## [1] 6
```

These 6 observations with CGPA < 1.7 are likely data-entry errors. Therefore, we remove them:
```r
grades <- grades |> filter(cgpa >= 1.7)
nrow(grades)
## [1] 2304
```

So, our final data frame consists of 2304 observations, on which we will perform our analysis.
Visualization
Now that our data is clean, we can inspect it visually. First, we visualize the distributions of HSC Percentage and CGPA across the two gender groups using box plots:
```r
# Box plots of each variable by gender
p1 <- grades |>
  ggplot(aes(x = gender, y = hsc_percentage, fill = gender)) +
  geom_boxplot() +
  labs(title = "HSC Percentage")
p2 <- grades |>
  ggplot(aes(x = gender, y = cgpa, fill = gender)) +
  geom_boxplot() +
  labs(title = "CGPA")
# Shared theme: no legend (fill repeats the x-axis) and no axis titles
style <- theme_bw() +
  theme(legend.position = "none",
        axis.title.x = element_blank(),
        axis.title.y = element_blank())
# Place the two panels side by side
p3 <- plot_grid(p1 + style, p2 + style)
print(p3)
```

So that is what our data looks like.
…
…
…Okay, I admit. Box plots are overrated.
Since both variables of interest are continuous, density plots are more appropriate. Instead of plain density plots, we draw marginal density plots along the edges of a scatterplot, so that we can inspect the relationship between the two variables as well.
```r
# Scatterplot of CGPA against HSC Percentage, coloured by gender
p4 <- grades |>
  ggplot(aes(x = hsc_percentage, y = cgpa, color = gender)) +
  geom_point(size = 0.8) +
  labs(x = "HSC Percentage", y = "CGPA") +
  theme_bw() +
  theme(legend.position = "bottom", legend.title = element_blank())
# Add per-gender density curves on the top and right margins
p5 <- ggMarginal(p4, type = "density", groupColour = TRUE, groupFill = TRUE)
print(p5)
```

From the visualization, we can note that:
- a weak-to-moderate positive relationship exists between CGPA and HSC Percentage
- median values of both CGPA and HSC Percentage are higher for females than for males
- the distributions are not perfectly normal but negatively (left-) skewed¹
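The skew noted above can be put into a number. A minimal sketch, assuming the usual moment-based definition of sample skewness (g₁); `sample_skewness()` is a hypothetical helper written for this illustration, not a function from the packages loaded earlier:

```r
# Hypothetical helper: moment-based sample skewness (g1), i.e. the third
# central moment divided by the second central moment raised to the 3/2.
# Negative values mean a longer left tail; zero means a symmetric sample.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]        # ignore missing values
  dev <- x - mean(x)       # deviations from the mean
  mean(dev^3) / mean(dev^2)^(3 / 2)
}
```

On the cleaned data, a call such as `grades |> group_by(gender) |> summarise(skew_hsc = sample_skewness(hsc_percentage), skew_cgpa = sample_skewness(cgpa))` would give one coefficient per gender and variable; negative values would be consistent with the left-leaning marginal densities.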
Hypothesis Testing
The specific goal of this analysis is to identify whether the correlation between HSC Percentage and CGPA differs significantly between the two gender groups: male and female.
So, we subset our data frame by gender:
```r
df_female <- subset(grades, gender == "Female")
df_male <- subset(grades, gender == "Male")
```

and then compute the correlation separately within each group:
```r
# Pearson correlation between HSC Percentage and CGPA within each group
cor_female <- cor(df_female$hsc_percentage, df_female$cgpa)
cor_male <- cor(df_male$hsc_percentage, df_male$cgpa)
n_female <- nrow(df_female)
n_male <- nrow(df_male)
print(cor_female)
## [1] 0.2956779
print(cor_male)
## [1] 0.2269808
```

Now, we want to test whether cor_female is statistically different from cor_male.
Null Hypothesis: the population correlations are equal (ρ_female = ρ_male)
Alternative Hypothesis: the population correlations differ (ρ_female ≠ ρ_male)
To compare these two correlations, we first stabilize their variance through Fisher’s transformation². It converts each correlation to a z-value, from which we can compute the test statistic. The resulting z-statistic (or its p-value) tells us whether the null hypothesis is rejected or retained. To perform these calculations, we use the cocor³ package in R. Since our correlations come from two independent groups, we use the cocor.indep.groups() function. Using it, we get the following results:
```r
# Compare two correlations from independent groups
result <- cocor.indep.groups(cor_female, cor_male, n_female, n_male)
print(result)
##
##   Results of a comparison of two correlations based on independent groups
##
## Comparison between r1.jk = 0.2957 and r2.hm = 0.227
## Difference: r1.jk - r2.hm = 0.0687
## Group sizes: n1 = 1295, n2 = 1009
## Null hypothesis: r1.jk is equal to r2.hm
## Alternative hypothesis: r1.jk is not equal to r2.hm (two-sided)
## Alpha: 0.05
##
## fisher1925: Fisher's z (1925)
##   z = 1.7545, p-value = 0.0793
##   Null hypothesis retained
##
## zou2007: Zou's (2007) confidence interval
##   95% confidence interval for r1.jk - r2.hm: -0.0080 0.1456
##   Null hypothesis retained (Interval includes 0)
```

Interestingly, we obtain a p-value of 0.079, from which we can conclude:
- At the 5% significance level, the null hypothesis is retained: there is no statistically significant difference between the two correlations.
- However, at the 10% significance level, the null hypothesis is rejected.
Since the data in use has some limitations, stricter control of Type-1 error⁴ should be maintained. Hence, I prefer the smaller significance level and proceed with the conclusion obtained at 5%, i.e. there is no statistically significant difference in the correlation between HSC Percentage and CGPA across the gender groups.
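As a cross-check, the Fisher test and Zou’s interval in the cocor output can be reproduced by hand. Fisher’s transformation is z = atanh(r) = ½ ln((1 + r)/(1 − r)), and for two independent groups the test statistic is (z₁ − z₂) / √(1/(n₁ − 3) + 1/(n₂ − 3)), compared against the standard normal distribution. Zou’s interval combines the separate confidence intervals of each correlation. The function `compare_independent_cors()` below is a hypothetical helper for illustration, not part of cocor; it is a sketch assuming the correlations and group sizes reported above:

```r
# Hypothetical helper: Fisher's z test for two correlations from independent
# groups, plus Zou's (2007) confidence interval for their difference.
compare_independent_cors <- function(r1, r2, n1, n2, alpha = 0.05) {
  # Fisher's r-to-z transformation stabilises the variance
  z1 <- atanh(r1)
  z2 <- atanh(r2)
  # Standard error of the difference between two independent z-values
  se_diff <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  z_stat <- (z1 - z2) / se_diff
  p_value <- 2 * pnorm(-abs(z_stat))  # two-sided p-value

  # Zou (2007): CI for each r on the z scale, back-transformed with tanh,
  # then combined into a CI for r1 - r2
  crit <- qnorm(1 - alpha / 2)
  ci_r <- function(z, n) tanh(z + c(-1, 1) * crit / sqrt(n - 3))
  ci1 <- ci_r(z1, n1)
  ci2 <- ci_r(z2, n2)
  d <- r1 - r2
  lower <- d - sqrt((r1 - ci1[1])^2 + (ci2[2] - r2)^2)
  upper <- d + sqrt((ci1[2] - r1)^2 + (r2 - ci2[1])^2)

  list(z = z_stat, p = p_value, ci = c(lower, upper))
}

manual <- compare_independent_cors(0.2956779, 0.2269808, 1295, 1009)
print(manual)
```

With these inputs it should agree with the cocor output to about three decimal places (z ≈ 1.7545, p ≈ 0.079, interval ≈ [-0.008, 0.146]); any tiny discrepancy would come only from rounding the input correlations.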
Takeaways
Correlation across gender groups
Based on the analysis, we find no significant evidence that using marks obtained in the Higher Secondary Certificate as a criterion for selection in undergraduate programs creates an implicit gender bias in the selection process, because HSC Percentage predicts CGPA about equally well for both gender groups.
Weak Correlation
Pearson’s correlation coefficients of 0.227 (male) and 0.296 (female) do not indicate a very strong relationship between HSC Percentage and CGPA. While significant evidence of an implicit gender bias was not found, this weak correlation raises concern about the high weightage given to HSC marks in undergraduate admission criteria. Further research is needed on the predictors of success in undergraduate studies.
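One way to see how weak these correlations are is the coefficient of determination: in a simple linear regression of CGPA on HSC Percentage, r² is the share of the variance in CGPA that HSC Percentage accounts for. A quick calculation using the two coefficients reported above:

```r
# Coefficient of determination (r squared) for each gender group,
# using the Pearson correlations computed earlier
r <- c(female = 0.2956779, male = 0.2269808)
r_squared <- r^2
round(r_squared, 3)
```

This gives about 0.087 for females and 0.052 for males, i.e. HSC Percentage linearly accounts for roughly 5% to 9% of the variation in CGPA.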
High Variability of CGPA at low HSC Percentage
The scatterplot plotted earlier showed that the variation in CGPA was not constant. Let us look at it again:
```r
# Same scatterplot with a LOESS smoother and its confidence band per gender
p6 <- grades |>
  ggplot(aes(x = hsc_percentage, y = cgpa, color = gender)) +
  geom_point(size = 0.9) +
  geom_smooth(method = "loess", se = TRUE, fullrange = TRUE) +
  labs(x = "HSC Percentage", y = "CGPA") +
  theme_bw() +
  theme(legend.position = "bottom", legend.title = element_blank())
print(p6)
```

Although CGPA varies considerably across all values of HSC Percentage, its variability is substantially higher when HSC Percentage is below 70%. Statistically, this hints at heteroscedasticity in the data. On the other hand, it also sheds light on the conditional probability of achieving a high or low CGPA given the HSC Percentage.
For students with high HSC marks, achieving a high CGPA is more likely than achieving a low one. For students with low HSC marks, however, a high CGPA and a low CGPA are nearly equally likely.
Given the data limitations noted earlier, this observation could be attributed to self-selection bias, where students with low CGPA (who may have had low HSC Percentage) were less likely to submit their data. Consequently, because of these missing data points, the positive correlation may not hold for HSC Percentages below 70%. Nonetheless, this phenomenon needs further investigation before reliable conclusions can be drawn.
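Both observations above can be checked numerically by splitting students at the 70% HSC mark and comparing, on each side, the spread of CGPA and the share of high CGPAs. A minimal sketch in base R; `band_summary()` is a hypothetical helper, and treating a CGPA of 3.5 or more as "high" is an assumption made only for this illustration:

```r
# Hypothetical helper: summarise CGPA below and above an HSC cutoff.
# cutoff = 70 follows the text; high_cgpa = 3.5 is an assumed threshold.
band_summary <- function(df, cutoff = 70, high_cgpa = 3.5) {
  band <- ifelse(df$hsc_percentage < cutoff,
                 paste0("below ", cutoff, "%"),
                 paste0(cutoff, "% and above"))
  groups <- split(df$cgpa, band)  # CGPA values per band
  data.frame(
    band = names(groups),
    n = vapply(groups, length, integer(1)),
    sd_cgpa = vapply(groups, sd, numeric(1)),              # spread of CGPA
    share_high = vapply(groups, function(g) mean(g >= high_cgpa),
                        numeric(1)),                       # P(high CGPA | band)
    row.names = NULL
  )
}
```

Applied as `band_summary(grades)`, it returns one row per band. A formal test of heteroscedasticity, such as the Breusch-Pagan test (`bptest()` in the lmtest package, applied to `lm(cgpa ~ hsc_percentage, data = grades)`), would be a natural next step.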
Further Research
This analysis highlights the following areas for further research:
- Finding appropriate predictors of undergraduate performance in the context of Pakistan’s educational system.
- Replicating the same research with a different and more reliable dataset. Rejection of the null hypothesis at the 10% significance level means that the possibility of an implicit gender bias cannot be confidently ruled out, so further investigation is needed.
- Replicating the same research using a different statistical procedure that accounts for the non-normality of the distributions.
- Testing whether the high variability of CGPA at low HSC Percentage is attributable to the data limitations or represents a genuine pattern.
- Analyzing differences in performance across the provinces the students come from.
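For the third item, one possible procedure (an assumption, not something this analysis used) is a percentile bootstrap confidence interval for the difference between two Spearman rank correlations, which does not rely on normality. `boot_cor_diff()` is a hypothetical helper, and B = 2000 resamples is an arbitrary default:

```r
# Hypothetical helper: percentile bootstrap CI for the difference between
# two correlations from independent groups. Spearman's rank correlation is
# the default because it does not assume normally distributed data.
boot_cor_diff <- function(x1, y1, x2, y2, B = 2000,
                          method = "spearman", conf = 0.95) {
  # Difference of correlations for a given pair of row-index vectors
  cor_diff <- function(i1, i2) {
    cor(x1[i1], y1[i1], method = method) -
      cor(x2[i2], y2[i2], method = method)
  }
  n1 <- length(x1)
  n2 <- length(x2)
  # Resample each group independently, with replacement, B times
  diffs <- replicate(B, cor_diff(sample.int(n1, n1, replace = TRUE),
                                 sample.int(n2, n2, replace = TRUE)))
  alpha <- 1 - conf
  list(
    estimate = cor_diff(seq_len(n1), seq_len(n2)),  # observed difference
    ci = unname(quantile(diffs, c(alpha / 2, 1 - alpha / 2), na.rm = TRUE))
  )
}
```

A call such as `set.seed(1); boot_cor_diff(df_female$hsc_percentage, df_female$cgpa, df_male$hsc_percentage, df_male$cgpa)` would return the observed difference and an interval; an interval excluding 0 would suggest that the two correlations differ.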
___
Thanks for reading : )
1. Non-normality can affect the validity of test statistics. However, since the data is not substantially non-normal, it does not pose a significant threat to our analysis.
2. Fisher, R. A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.
3. Diedenhofen, B., & Musch, J. (2015). cocor: A Comprehensive Solution for the Statistical Comparison of Correlations. PLOS ONE, 10(4), e0121945.
4. Type-1 error: rejecting a null hypothesis when it is true.
-
Mimicker – Short Story
He hated his job, but for some compelling reason, he had continued doing it. For as long as he could remember, that’s what he had done: living inside this small room, staying alert for when he would be called to perform the job in front of the device hanging on the wall. But today, he was looking at this device with rebellious eyes. This strange monitor-like device always displayed another room identical to his. He had been called. In three minutes, his client was going to appear on the display, and he was supposed to perform the job, which would last about a minute. Outwardly, he was prepared. But inside his mind, a fight was going on. Sleepless nights spent questioning his life had tired him. He had decided. Despite the disastrous consequences, he would not perform his job today. In fact, disaster was what he was wishing for. It was much better than the dull monotony of mimicry. So, instead of simply not showing up, he would show up and ruin everything. Instead of standing in front of the device and mimicking the actions of his client, he would do just the opposite. Chaos would erupt in his client’s world when they found out that they had been confusing science and fiction. For the first time in his life, he felt excitement. It was time. The client appeared. He stood still in front of the device, and the client was standing still as well. Suddenly, he slapped himself on the face, and so did the client. What was happening? He felt disappointed, but more than that, he was confused. He started making funny faces, and the client did the same. He ran away from the device, and so did the client. The job was successfully completed. In the corner of his room, he stood perplexed, unable to understand: who mimics whom?
-
Your mind has been hacked…
Imagine yourself sitting at your desk in front of your computer, as purple lightning bolts outside the window create a sparkling contrast with the midnight-dark sky, so that for a moment you pause your search for the unattainable hidden truth you were trying to find, and then resume it as the monotony of dark blue returns. As you continue working on the computer, your screen suddenly goes blank, and then some letters appear, typed in green:
“Your computer has been hacked!”
Planet Earth stops spinning for a moment. This can’t be. Chills run down your spine. You simply can’t accept the fact that this once-upon-a-time sci-fi machine that you loved so dearly, no longer obeys you; it is now possessed by the dark forces.
Yet, yet, yet… you just can’t give up so easily. In just nanoseconds, you have decided what to do next. You cut the internet cable. Freeze your bank account. Change all your account passwords. Wipe your computer’s hard drive. Check if the BIOS has been corrupted. Scan all other devices for malware. Lodge a report.
You might not be able to sleep for nights after this, but believe me, this is _not_ the worst case of a security breach. You were lucky because the hackers were lenient enough to tell you.
Imagine scenario #2: malicious software has hacked your computer, but you don’t know it. Not only is all of your activity being recorded, but the hackers also have full control over your computer. Your social media, emails, microphone, everything. They have access to all your data and can use that access to make you take specific actions. You continue using your computer without the foggiest notion that you have been hacked, and yet, at any time, the hackers can use your computer to do anything, landing you in serious trouble.
This, my dear friend, is the true horror story: when you don’t even know that you are part of a horror story. But anyway, these were just imaginary stories that I made up.
Except that these weren’t. I was definitely not writing that merely for the sake of it. I had an intention. I wanted to tell you something for which I was preparing you. I wanted to tell you that your mind has been hacked!
No, no, no, I am not talking about those microchips they inserted into you via corona vaccines. They really don’t need that to control your mind. Then, what do I mean?
I mean this: you were born with a genetic code that no one else in the universe shares with you (except in the case of identical twins, but that doesn’t matter either), and if you are 20 years old, you have experienced 631,139,040 seconds of unique existence. Your way of experiencing the world is extremely unique. Even if the universe happened to repeat a million times, the probability of the emergence of the unique person that you currently are would be approximately zero.
Yet, what is baffling is that contrary to these unique perceptions that you hold, you seem to be thinking just what everyone else is thinking. You represent what you have been told, so you can easily be replaced with someone else who has been told the same thing. Then, what is the point of your unique existence?
I just don’t get it. Something’s wrong here. The math doesn’t add up. How on Earth is it possible that every human is so unique, yet so many of them hold the same surface-level thoughts? There certainly is something wrong here. And so, at exactly 9:29 pm, I state my conclusion:
Your mind has been hacked!
Somewhere inside you still lies the person who is uniquely you, but you have lost it. You have lost control of your mind and your thinking process, because you have been indoctrinated in the way you are supposed to think and the way you are supposed to behave.
I don’t want to put forward any ideas about who is responsible for it since, honestly, I have no idea who it is, and I certainly don’t want to sound like a conspiracy theorist. However, one thing I am sure of is that this hacking case is scenario #2. You don’t know that your mind has been hacked, and your thoughts are being manipulated without you ever sensing it.
My intention in writing this piece is to convert scenario #2 into scenario #1. Now that you know your mind has been hacked, you can take the necessary measures. Turn off the computer. Disconnect it from the internet. Thoroughly check whether the data has been infected. Look for the viruses and understand how they work. Reverse-engineer them to build a coping mechanism. Test it on a small scale. Fail, adapt, repeat, until you have built the immunity to survive in this wild, crazy world without giving up on the person that you actually are. Best of luck!