Cheating in student programming tasks manifests in various forms; sharing the same piece of code is one of them. While there are a number of ways of detecting source code similarity, using Moss (Measure of Software Similarity) [1]—a plagiarism detection SaaS system—is one potentially viable and effective method of doing so. It is “fast, easy to use, and free” [1, 2].
Moss presently supports 25 programming languages—“c”, “cc”, “java”, “ml”, “pascal”, “ada”, “lisp”, “scheme”, “haskell”, “fortran”, “ascii”, “vhdl”, “perl”, “matlab”, “python”, “mips”, “prolog”, “spice”, “vb”, “csharp”, “modula2”, “a8086”, “javascript”, “plsql”, “verilog” [3].
Using Moss
Setup
Setting up Moss is a two-step process:
1) Registration
- A valid Moss account is required
- Registration is email-based—please see the detailed instructions on the project page [1]
2) Submission script
- Once the registration request has been completed, an email will be sent with the most current submission script
- The most current submission script is also accessible here [3]
Running Moss
Submission requests are sent to the central Moss server using a Moss client submission script.
- The official Moss submission script is a simple Perl script
- The only requirement—when using the official submission script—is to have Perl installed on the client machine
- Executing the client Perl script is relatively easy; you simply run it with the appropriate/desired parameters (see the sketch below)
- The submission process has three phases; once the client Perl script is executed:
- The payload is checked and uploaded to the central server
- Once all the files are uploaded, a session query is submitted
- The central server returns the results in the form of a URL (e.g. http://moss.stanford.edu/results/XXXXXXXX)
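As a minimal sketch of what this looks like in practice (moss.pl is assumed to be in the current directory, and submissions/ is a hypothetical directory of per-student folders):

# phases 1 and 2: check the payload, upload it, and submit the query
perl moss.pl -l python -d submissions/*/*.py
# phase 3: the script finishes by printing the results URL, e.g.
# http://moss.stanford.edu/results/XXXXXXXX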
Submission Script Usage
usage: moss [-x] [-l language] [-d] [-b basefile1] ... [-b basefilen] [-m #] [-c "string"] file1 file2 file3
- -b Used to specify a reference/base source code file; the base file would normally contain code snippets common to the source code being compared, and so is ignored by Moss
- -c Used to specify a comment string when submitting a request
- -d Used to tell Moss that the source code being compared is arranged in directories; supports wildcards
- -x Enables you to use the experimental server instead of the production server
- -l Used to specify the programming language used in the source code to be processed
- -m Specifies the maximum number of occurrences of code before Moss ignores it—the default is 10
- -n Used to specify the total number of file recordsets to show on results page—the default is 250
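For illustration, here is a hypothetical invocation combining several of these options; the base file skeleton.py, the comment string, and the submissions/ directory are made-up names, while the flags themselves are as documented above:

perl moss.pl -l python -d -b skeleton.py -m 5 -c "CSC1010H Tutorial 3" submissions/*/*.py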
Important Notes
- The results are only available for 14 days
- A query can of course be re-issued if need be
Case Study: First Year Python Programming Course
I am a Teaching Assistant for a first-year Python programming course; one of my assigned tasks is to detect plagiarism by running student programming assignment submissions through Moss. The submissions are made via an institution-wide Learning Management System—Vula [4], a Sakai instance [7]. Incidentally, the programming assignment submissions are automatically marked by the ‘Automatic Marker’ [5, 6]—a robust tool integrated with Sakai.
- A ZIP archive containing all student submissions for a specific tutorial is downloaded from Vula
- Each of the 77 student submissions is contained in a directory corresponding to the student's ID
Extracting and Preprocessing the Payload
- The extracted submissions are preprocessed to ensure that the code is located at the same directory level, since I run Moss with the ‘-d’ parameter
- Remove resource forks [8]
- There are normally cases where ‘some’ students place their code in nested directories; this process ensures that nested code is moved to the root student directory
Sample Directory Structure Mismatch
./xxxxxx001
./xxxxxx001/match_mod.py
./xxxxxx001/mongoose_mod.py
./xxxxxx001/fixed_mod.py
./xxxxxx001/sum_mod.py
./yyyyyy001
./yyyyyy001/New folder (4)
./yyyyyy001/New folder (4)/Assignment6.pdf
./yyyyyy001/New folder (4)/mongoose_mod.py
./yyyyyy001/New folder (4)/test_mongoose_mod.py
./yyyyyy001/New folder (4)/test_equals.py
./yyyyyy001/New folder (4)/broken_mod.py
./yyyyyy001/New folder (4)/test_match.py
./yyyyyy001/New folder (4)/test_fixed_mod.py
./yyyyyy001/New folder (4)/test_sum_mod.py
./yyyyyy001/New folder (4)/test_query.py
Bash Script for Cleaning Payload
#!/usr/bin/bash
# Run from the root of the extracted submissions (the directory containing the per-student folders).
# Split only on newlines so paths containing spaces (e.g. "New folder (4)") are handled correctly.
IFS=$'\n'

# clean up resource forks
for tmp in `find . -iname "__MACOSX"`; do echo "Removing... $tmp"; done
for tmp in `find . -iname "__MACOSX"`; do rm -rf "$tmp"; done

# move source code to the root student directories
for student in `ls`; do
    for files in "$student"/*; do
        if [ -d "$files" ]; then
            # move any regular files found in nested directories up to the student's root directory
            for subfiles in `find "$files"`; do
                if [ -f "$subfiles" ]; then
                    mv "$subfiles" "$student"/
                fi
            done
            rm -rf "$files"
        fi
    done
done
Running the Moss Submission Script
- This is the easy part for me: I just run the submission script with the appropriate parameters and pipe the output to a log file
ubuntu@ip-XXX-XX-XX-XXX:~$ perl moss.pl -l python -d /home/ubuntu/Projects/work/2015/uct-csc1010h/tutorials/3/raw/*/*py > csc1010h-tutorial3-moss_results.log
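Because the output is piped to a log file, the results URL can be pulled back out of the log afterwards; a small sketch, assuming the log file produced by the command above:

# print any results URL(s) recorded in the log
grep -o 'http://moss.stanford.edu/results/[^ ]*' csc1010h-tutorial3-moss_results.log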
Sample Case with Highest Detection
ubuntu@ip-XXX-XX-XX-XXX:~$ perl moss.pl -l python -d /home/ubuntu/Projects/work/2015/uct-csc1010h/tutorials/3/raw/*/*py
Checking files . . .
OK
Uploading /home/ubuntu/Projects/work/2015/uct-csc1010h/tutorials/3/raw/xxxxxx001/TwentyFour.py ...done.
Uploading /home/ubuntu/Projects/work/2015/uct-csc1010h/tutorials/3/raw/xxxxxx001/VehicleClassifier.py ...done.
Query submitted.  Waiting for the server's response.
http://moss.stanford.edu/results/XXXXXXXX
ubuntu@ip-XXX-XX-XX-XXX:~$
- Accessing the results URL yields the sample output below

- Drilling down to suspicious cases

Musings
Effectiveness
While I personally do not have a frame of reference against which to compare Moss, when used appropriately it certainly does the job. I find it especially useful that it picks up common student tricks, such as changing variable names.
Efficiency
- Processing times
- With a total of 77 students and an average of 3 Python scripts per assignment:
- It takes an average of 915.25 msec to submit
- It takes an average of 37.17 sec to process the query and get results
- Of course other variables, such as program size (lines of code), probably influence processing times
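For anyone wanting comparable figures, a rough sketch is to wrap the client script in time(1); note that this measures the overall wall-clock time (upload plus waiting for the results) rather than the two phases separately, and the paths and log file name here are illustrative:

# overall wall-clock time for one submission run
time perl moss.pl -l python -d raw/*/*py > moss_results.log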
Miscellaneous
- Timeout issues behind proxy
- Requests sent from within my university network time out; as a workaround, I issue my requests from an EC2 instance
- I have also experimented with simply re-running the script when it terminates without returning results; I have found that re-running it an average of five times yields the desired results
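The re-running is currently a manual exercise; below is a sketch of how it could be automated. The five-attempt cap simply mirrors the observation above, and the paths and log file name are illustrative:

#!/usr/bin/bash
# retry the Moss submission until a results URL comes back, up to five attempts
for attempt in 1 2 3 4 5; do
    echo "Attempt $attempt..."
    perl moss.pl -l python -d raw/*/*py > moss_results.log
    if grep -q 'http://moss.stanford.edu/results/' moss_results.log; then
        echo "Results URL received:"
        grep -o 'http://moss.stanford.edu/results/[^ ]*' moss_results.log
        break
    fi
done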
Summary
Overall, my experience using Moss has been good. Using it is certainly relatively easy. I have yet to experiment with other implementations of the submission script—there are presently Java [9] and PHP [10] implementations.
As a student, if you don’t want to be accused of plagiarism, or if you don’t want to have to resort to plagiarism just to pass your course, you must learn to code on your own.
Bibliography
[1] http://theory.stanford.edu/~aiken/moss
[2] http://www3.nd.edu/~kwb/nsf-ufe/1110.pdf
[3] http://moss.stanford.edu/general/scripts/mossnet
[4] https://vula.uct.ac.za
[5] http://dl.cs.uct.ac.za/projects/automark
[6] http://pubs.cs.uct.ac.za:1081/archive/00000465
[7] https://sakaiproject.org
[8] http://superuser.com/q/104500
[9] https://github.com/nordicway/moji
[10] https://github.com/Phhere/Moss-PHP