Student Programming Plagiarism Detection Using Moss

Cheating in student programming tasks manifests in various forms; sharing the same piece of code is one of them. While there are a number of ways of detecting source code similarity, using Moss (Measure of Software Similarity) [1]—a plagiarism detection SaaS system—is one viable and effective option. It is “fast, easy to use, and free” [1, 2].

Moss presently supports 25 programming languages—“c”, “cc”, “java”, “ml”, “pascal”, “ada”, “lisp”, “scheme”, “haskell”, “fortran”, “ascii”, “vhdl”, “perl”, “matlab”, “python”, “mips”, “prolog”, “spice”, “vb”, “csharp”, “modula2”, “a8086”, “javascript”, “plsql”, “verilog” [3].

Using Moss

Setup
Setting up Moss is a two-step process:

1) Registration

  • A valid Moss account is required
  • Registration is email-based—please see the detailed instructions on the project page [1]

2) Submission script

  • Once the registration request has been processed, an email containing the most current submission script will be sent back
  • The most current submission script is also accessible here [3]

Running Moss
Submission requests are sent to the central Moss server using a Moss client submission script.

  • The official Moss submission script is a simple Perl script
    • The only requirement—when using the official submission script—is to have Perl installed on the client machine
    • Running the client Perl script is straightforward; you simply execute it with the appropriate parameters
  • The submission process has three phases; once the client Perl script is executed:
    • The payload is checked and uploaded to the central server
    • Once all the files are uploaded, a session query is submitted
    • The central server returns the results in form of a URL string (e.g. http://moss.stanford.edu/results/XXXXXXXX)

Submission Script Usage

usage: moss [-x] [-l language] [-d] [-b basefile1] ... [-b basefilen] [-m #] [-c "string"] file1 file2 file3


  • -b Used to specify a reference/base source code file; base source code normally contains code snippets common to all the submissions being compared, and so is ignored by Moss
  • -c Used to specify a comment string when submitting request
  • -d Used to tell Moss that submissions are organised by directory (all files in a directory are treated as one program); supports wildcards
  • -x Enables you to use the experimental server in preference to the production server
  • -l Used to specify the programming language used in the source code to be processed
  • -m Specifies the maximum number of occurrences of a passage of code before Moss ignores it—the default is 10
  • -n Used to specify the number of matching files to show on the results page—the default is 250
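
For instance, a typical invocation combining several of these options might look like the following; the base (skeleton) file, comment string, and submission paths are purely illustrative:

# Illustrative example: compare per-student Python submissions in directory mode,
# ignoring code that also appears in a provided skeleton (base) file.
perl moss.pl -l python -d -m 15 -c "CSC1010H Tutorial 3" -b skeleton/tutorial3_skeleton.py raw/*/*.py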

Important Notes

  • The results are only available for 14 days (one way to keep a local copy is sketched below)
  • A query can of course be re-issued if need be
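
Since the report disappears after two weeks, one option is to mirror the results pages locally with a standard tool such as wget; this is only a sketch, and the results URL below is a placeholder:

# Sketch: archive a Moss results page locally before the 14-day expiry (placeholder URL).
wget --mirror --convert-links --page-requisites --no-parent http://moss.stanford.edu/results/XXXXXXXX/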

Case Study: First Year Python Programming Course

I am a Teaching Assistant for a first-year Python programming course; one of my assigned tasks is to detect plagiarism by running student programming assignment submissions through Moss. The submissions are made via an institution-wide Learning Management System—Vula [4], a Sakai instance [5]. Incidentally, the programming assignment submissions are automatically marked by the ‘Automatic Marker’ [6, 7]—a robust tool integrated with Sakai.

  • A ZIP archive containing all student submissions for a specific tutorial is downloaded from Vula (the extraction command is shown after this list)
  • Each of the 77 student submissions is contained in a directory corresponding to the student’s ID
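
Unpacking the downloaded archive is a single step; the archive name below is illustrative:

# Sketch: extract the Vula bulk-download archive into a working directory (illustrative archive name).
unzip -q csc1010h-tutorial3-submissions.zip -d raw/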

Extracting and Preprocessing the Payload

  • The extracted submissions are preprocessed to ensure that the code is located at the same directory level, since I run Moss with the ‘-d’ parameter
  • Resource forks are removed [8]
  • There are normally cases where ‘some’ students place their code into nested directories; preprocessing ensures that nested code is moved up to the root of the student’s directory

Sample Directory Structure Mismatch

./xxxxxx001
./xxxxxx001/match_mod.py
./xxxxxx001/mongoose_mod.py
./xxxxxx001/fixed_mod.py
./xxxxxx001/sum_mod.py
./yyyyyy001
./yyyyyy001/New folder (4)
./yyyyyy001/New folder (4)/Assignment6.pdf
./yyyyyy001/New folder (4)/mongoose_mod.py
./yyyyyy001/New folder (4)/test_mongoose_mod.py
./yyyyyy001/New folder (4)/test_equals.py
./yyyyyy001/New folder (4)/broken_mod.py
./yyyyyy001/New folder (4)/test_match.py
./yyyyyy001/New folder (4)/test_fixed_mod.py
./yyyyyy001/New folder (4)/test_sum_mod.py
./yyyyyy001/New folder (4)/test_query.py

Bash Script for Cleaning Payload

#!/usr/bin/env bash
IFS=$'\n'  # split command substitutions on newlines only, since paths contain spaces (e.g. "New folder (4)")
# Clean up macOS resource forks (__MACOSX directories)
for tmp in $(find . -type d -iname "__MACOSX"); do echo "Removing... $tmp"; rm -rf "$tmp"; done
# Move source code from nested directories up to each student's root directory
for student in $(ls); do
  for files in "$student"/*; do
    if [ -d "$files" ]; then
      for subfiles in $(find "$files" -type f); do mv "$subfiles" "$student/"; done
      rm -rf "$files"
    fi
  done
done

Running Moss submission script

  • This is the easy part for me: I just run the submission script with the appropriate parameters and pipe the output to a log file
    ubuntu@ip-XXX-XX-XX-XXX:~$ perl moss.pl -l python -d /home/ubuntu/Projects/work/2015/uct-csc1010h/tutorials/3/raw/*/*py > csc1010h-tutorial3-moss_results.log

Sample case with highest detection

ubuntu@ip-XXX-XX-XX-XXX:~$ perl moss.pl -l python -d /home/ubuntu/Projects/work/2015/uct-csc1010h/tutorials/3/raw/*/*py
Checking files . . .
OK
:
Uploading /home/ubuntu/Projects/work/2015/uct-csc1010h/tutorials/3/raw/xxxxxx001/TwentyFour.py ...done.
Uploading /home/ubuntu/Projects/work/2015/uct-csc1010h/tutorials/3/raw/xxxxxx001/VehicleClassifier.py ...done.
Query submitted. Waiting for the server's response.
http://moss.stanford.edu/results/XXXXXXXX
ubuntu@ip-XXX-XX-XX-XXX:~$
  • Accessing the results URL yields the sample output below
Moss Sample Initial Result Page

  • Drilling down to suspicious cases
Moss Sample One-on-One Comparison Page

Musings

Effectiveness
While I personally do not have a frame of reference against which to compare Moss, when used appropriately it certainly does the job. I find it especially useful that it picks up common student tricks such as changing variable names.

Efficiency

  • Processing times
    • With a total of 77 students and an average of 3 Python scripts per assignment
      • It takes an average of 915.25 msec to submit
      • It takes an average of 37.17 sec to process the query and get results
      • Of course, other variables, such as program size—lines of code—probably influence processing times; a rough sketch of how such timings can be gathered follows this list
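
One simple way to gather comparable end-to-end timings is to wrap the submission in the shell’s time builtin; note that this measures the upload and the wait for results together, and the paths below are illustrative:

# Sketch: time a complete Moss submission end to end (illustrative paths).
time perl moss.pl -l python -d raw/*/*.py > csc1010h-tutorial3-moss_results.log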

Miscellaneous

  • Timeout issues behind proxy
    • Requests sent from within my university network time out; as a workaround, I issue my requests from an EC2 instance
    • I have also experimented with re-running the script once it terminates; I have found that re-running it an average of five times yields the desired results (a simple retry wrapper is sketched below)
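
A minimal retry wrapper along these lines can automate the re-running; it is only a sketch, and the paths and log file name are illustrative:

# Sketch: re-run the Moss submission until a results URL is returned, giving up after five attempts.
for attempt in 1 2 3 4 5; do
    perl moss.pl -l python -d raw/*/*.py > moss_results.log 2>&1
    if grep -q "^http://moss.stanford.edu/results/" moss_results.log; then
        echo "Results URL obtained on attempt $attempt"
        break
    fi
    echo "Attempt $attempt did not return a results URL; retrying..."
done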

Summary
Overall, my experience using Moss has been good. Using it is certainly relatively easy. I have yet to experiment with other implementations of submission scripts—there are presently Java [9] and PHP [10] implementations.

Bibliography

[1] http://theory.stanford.edu/~aiken/moss
[2] http://www3.nd.edu/~kwb/nsf-ufe/1110.pdf
[3] http://moss.stanford.edu/general/scripts/mossnet
[4] https://vula.uct.ac.za
[5] https://sakaiproject.org
[6] http://dl.cs.uct.ac.za/projects/automark
[7] http://pubs.cs.uct.ac.za:1081/archive/00000465
[8] http://superuser.com/q/104500
[9] https://github.com/nordicway/moji
[10] https://github.com/Phhere/Moss-PHP
