Due to the nature of the data this project used, generic visuals will be utilized. This project consisted of a series of four Python scripts that were custom designed to read millions of lines of data, clean the data, measure particular metrics related to password strength, and create visualizations related to comparisons. Four different scripts were used in order to make error-catching and trial runs faster.
This project was challenging, and was completed over the course of my Spring 2025 semester, my 2025 summer break, and my Fall 2025 semester. A full rewrite was required in Fall due to a restriction involving data collection that was unknown to the class in Spring, but I quickly adapted and created something I am very proud of.
The hardest part of this project was the very first requirement: create something that hasn't been done before. From the beginning, I was interested in a project related to cryptography, and I eventually landed on password strength. Separate studies had examined mnemonics in passwords, but none had measured the effect of mnemonics on general password strength.
Once the concept of my project was solidified, the passion came easily. I chose my degree focus of Information Systems because I am deeply interested in the side of Computer Science that mixes the equipment with a human element.
Passwords are a sore spot for many older members of my family, and admittedly sometimes myself. I realized that while there are systems that lessen the need for intense password security, many systems could still benefit from a method that created strong, memorable passwords without needing to use risky password managers.
As we neared our final hurdle before earning our degree, our professors reminded us that our capstone project needed to contribute something to our field, even if it was in a minor way.
This project helped me to understand the importance of professional research. My findings are only a small stepping stone for the field. However, the results made me feel proud and as if I had truly contributed to the field of computer science.
The first script I wrote was the most challenging. Through many iterations, I was able to create a script that accepted only passwords utilizing ASCII, excluded forbidden characters, and limited chosen passwords to 64-character strings.
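The filtering rules described above can be sketched roughly as follows. This is a minimal illustration, not the original script: the function names, the `FORBIDDEN` set, and the reading of "64-character" as a maximum length are all assumptions, since the exact forbidden characters are not listed here.

```python
MAX_LENGTH = 64  # assumed to be an upper bound on password length

# Hypothetical exclusion set; the project's real forbidden characters are not listed.
FORBIDDEN = set(' ,"\\')


def is_usable(password: str) -> bool:
    """Return True if the password passes the three cleaning rules."""
    # Rule 1: reject empty lines and anything longer than the limit.
    if not password or len(password) > MAX_LENGTH:
        return False
    # Rule 2: accept only printable ASCII (code points 32 through 126).
    if not all(32 <= ord(ch) <= 126 for ch in password):
        return False
    # Rule 3: reject any password containing a forbidden character.
    return not any(ch in FORBIDDEN for ch in password)


def clean_lines(lines):
    """Yield usable passwords from an iterable of raw lines, one at a time."""
    for line in lines:
        candidate = line.rstrip("\r\n")
        if is_usable(candidate):
            yield candidate
```

Because `clean_lines` is a generator, it can be fed an open file handle and will stream through millions of lines without loading them all into memory.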
Although many might find this process tedious, I enjoyed the incremental improvements to the cleaned dataset, and eventually landed on 30 million usable passwords.
A custom list of just over 200 password mutators was combined with the cleaned passwords to create mutated passwords. The mutators were based on sayings and statistics that might be memorable to the user, such as "17% of women" becoming "17%oW" or "#1 album on Spotify" becoming "#1aos".
Mutators were randomly capitalized and inserted at the beginning, middle, or end of each password at random, joined with one of three separator characters (think hyphens or underscores). This created a dataset that could be compared against the original data to measure the increase in strength gained by adding simple mnemonic additions to existing passwords.
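A rough sketch of that mutation step is shown below. It is illustrative only: the two mutators are the examples quoted above (the full list held just over 200), the third separator is a guess, and the choice to split the password at its midpoint and place a separator on both sides of a middle insertion is an assumption.

```python
import random

# The two mutators quoted in the text; the real list had just over 200 entries.
MUTATORS = ["17%oW", "#1aos"]

# Hyphen and underscore are mentioned in the text; the period is an assumed third.
SEPARATORS = ["-", "_", "."]


def random_capitalize(text: str, rng: random.Random) -> str:
    """Randomly switch each letter to upper or lower case (non-letters are unchanged)."""
    return "".join(ch.upper() if rng.random() < 0.5 else ch.lower() for ch in text)


def mutate(password: str, rng: random.Random) -> str:
    """Attach one randomly capitalized mutator at the start, middle, or end."""
    mutator = random_capitalize(rng.choice(MUTATORS), rng)
    sep = rng.choice(SEPARATORS)
    position = rng.choice(("start", "middle", "end"))
    if position == "start":
        return mutator + sep + password
    if position == "end":
        return password + sep + mutator
    # Middle: split at the midpoint and separate the mutator on both sides (assumed).
    mid = len(password) // 2
    return password[:mid] + sep + mutator + sep + password[mid:]
```

Passing in a seeded `random.Random` instance, for example `mutate("password", random.Random(1))`, keeps trial runs repeatable.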
Several metrics were recorded from both the cleaned dataset and the mutated dataset. These metrics included average entropy, which measured the average Shannon Entropy score of each dataset. Average complexity was also recorded, which compared how many character types (lowercase, uppercase, digit, and symbol) appeared in a password on average in each dataset.
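For readers unfamiliar with these two metrics, here is a minimal sketch of how they might be computed. The exact formulas used in the project are not given here, so this assumes per-character Shannon entropy over the characters within each password, and a complexity score that counts how many of the four character types are present. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(password: str) -> float:
    """Per-character Shannon entropy in bits, from character frequencies in the password."""
    if not password:
        return 0.0
    total = len(password)
    counts = Counter(password)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def complexity(password: str) -> int:
    """Number of character types present: lowercase, uppercase, digit, symbol (0 to 4)."""
    return sum([
        any(ch.islower() for ch in password),
        any(ch.isupper() for ch in password),
        any(ch.isdigit() for ch in password),
        any(not ch.isalnum() for ch in password),
    ])


def average(values) -> float:
    """Arithmetic mean of an iterable, or 0.0 if it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0
```

Running `average(shannon_entropy(p) for p in passwords)` over each dataset gives the kind of per-dataset figure that can be compared side by side.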
The two other comparisons were average password length and average character distribution (how often each character type appears in a password). Length and character distribution contribute heavily to Shannon Entropy, so appending mnemonic mutations to each password improved the scores immensely.
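The length and distribution comparisons could be sketched along these lines. This is an assumption-laden illustration: it treats character distribution as each character type's share of all characters in a dataset, which may differ from how the project actually aggregated it, and the names are hypothetical.

```python
from collections import Counter

CLASSES = ("lowercase", "uppercase", "digit", "symbol")


def char_class(ch: str) -> str:
    """Classify a single character into one of the four character types."""
    if ch.islower():
        return "lowercase"
    if ch.isupper():
        return "uppercase"
    if ch.isdigit():
        return "digit"
    return "symbol"


def average_length(passwords) -> float:
    """Mean password length across a dataset, or 0.0 if it is empty."""
    lengths = [len(pw) for pw in passwords]
    return sum(lengths) / len(lengths) if lengths else 0.0


def class_distribution(passwords) -> dict:
    """Share of all characters in the dataset that fall into each character type."""
    counts = Counter()
    for pw in passwords:
        counts.update(char_class(ch) for ch in pw)
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0) for name in CLASSES}
```

Comparing `class_distribution(["password"])` with `class_distribution(["password-17%oW"])` shows how a single mutator introduces uppercase, digit, and symbol characters into an all-lowercase password.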
Finally, the comparisons made in the previous script were compiled into visuals that helped the audience to understand the significance of the changes made by the mnemonic mutations. Python is an excellent data visualization tool and turned this step into a very easy process.
Due to ethical concerns, all password data needed to be deleted after completion of the project, but the audience was pleased with the visualizations provided.
Approximately 30 million lines of sanitized passwords were utilized after cleaning the main sets. The vast majority of these passwords fell well short of what is considered secure, showing a clear gap between secure passwords and those created by the average user.
Strength was measured using Shannon Entropy, and the initial passwords had very low entropy scores on average.
After applying mnemonic mutators, password entropy scores improved dramatically. Many measurements for important metrics doubled and passwords became far harder to simply brute force.
This project scratched the surface of what may lead to interesting results. However, several limitations imposed on us as students hindered higher quality results, and I believe there are several areas that could be improved through further research.
Qualitative data from surveys, the ability to test the mnemonic passwords against modern security tools, and fully developed mnemonic systems could drastically change how the results of this project are interpreted.
Because the password data needed to be wiped after academic use, my project felt much lighter before presenting. However, the impact I made during my presentation packed the expected punch.
My professors were pleased, and my project earned an A grade. As with my other academic projects, I feel that the value of learning to research independently on a professional level, critically examining my own work, and contributing to the world of science was far greater than the value of the grade.