Programming Language Classifier

An example of how to use Create ML and Xcode 10 to train a Core ML model that the Natural Language framework uses to classify the programming language of source code.

import NaturalLanguage

let code = """
struct Plane: Codable {
    var manufacturer: String
    var model: String
    var seats: Int
}
"""

let url = Bundle.main.url(forResource: "Classifier",
                          withExtension: "mlmodelc")!
let model = try! NLModel(contentsOf: url)
model.predictedLabel(for: code) // Swift

Requirements

  • macOS Mojave Beta
  • Xcode 10 Beta

These are available for Apple Developer account members to download
at https://developer.apple.com/download/

Usage

This project includes a pre-trained programming language classifier model.
To see it in action, open Classifier Demo.playground,
run the playground with the Assistant editor showing the Live View,
and then drag and drop a source code file.
The model will predict the language of the file based on its contents.
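
The drop-and-classify step can be sketched roughly as follows. This is not the playground's actual code: the function name `predictedLanguage(forFileAt:classify:)` is hypothetical, and the sketch assumes the dropped file is UTF-8 text. The classifier is passed in as a closure so the file handling stays independent of how the model is loaded.

```swift
import Foundation

/// Reads the file at `fileURL` and passes its contents to `classify`.
/// Returns nil if the file cannot be read as UTF-8 text, or if the
/// classifier itself returns no label.
/// (Hypothetical helper for illustration; not part of the project.)
func predictedLanguage(forFileAt fileURL: URL,
                       classify: (String) -> String?) -> String? {
    guard let contents = try? String(contentsOf: fileURL, encoding: .utf8) else {
        return nil
    }
    return classify(contents)
}
```

With an `NLModel` loaded as in the snippet at the top of this document, the call would look something like `predictedLanguage(forFileAt: fileURL, classify: model.predictedLabel(for:))`.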

Screenshot of Classifier Example

Training Instructions

  • Clone and set up the repository by running the following commands:
$ git clone https://github.com/flight-school/Programming-Language-Classifier.git
$ cd Programming-Language-Classifier
$ git submodule update --init
  • Open Trainer.swift in an editor and fill in the placeholder values
    for destinationPath and corpusPath:
$ open ./Trainer.swift
  • Run Trainer.swift and wait for the model to be trained
    (on a 2017 MacBook Pro, this took a few minutes):
$ swift ./Trainer.swift
  • Compile the generated .mlmodel file using the following command:
$ xcrun coremlc compile path/to/ProgrammingLanguageClassifier.mlmodel .
  • Move the compiled .mlmodelc bundle into the Resources directory
    of Classifier Demo.playground, replacing any existing resource.
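
For orientation, a training script along these lines might look like the sketch below. It is not the contents of the repository's Trainer.swift. It assumes a hypothetical corpus layout of one subdirectory per language (`corpusPath/<Language>/<source files>`), and the helper names `loadCorpus(at:)` and `train(corpusPath:destinationPath:)` are made up for illustration. The Create ML portion is guarded so that the file-gathering part can be read and run on its own.

```swift
import Foundation
#if canImport(CreateML)
import CreateML
#endif

/// Collects (text, label) pairs from a corpus directory.
/// Assumed layout (hypothetical): each subdirectory is named after a
/// language and contains source files written in that language.
func loadCorpus(at corpusURL: URL) -> (texts: [String], labels: [String]) {
    let fm = FileManager.default
    var texts: [String] = []
    var labels: [String] = []
    let entries = (try? fm.contentsOfDirectory(at: corpusURL,
                                               includingPropertiesForKeys: nil)) ?? []
    for entry in entries {
        // Entries that are not directories cannot be listed and are skipped.
        guard let files = try? fm.contentsOfDirectory(at: entry,
                                                      includingPropertiesForKeys: nil) else {
            continue
        }
        for file in files {
            // Skip anything that cannot be read as UTF-8 text.
            guard let text = try? String(contentsOf: file, encoding: .utf8) else {
                continue
            }
            texts.append(text)
            labels.append(entry.lastPathComponent)
        }
    }
    return (texts, labels)
}

#if canImport(CreateML)
/// Trains a text classifier on the corpus and writes the .mlmodel file
/// into the destination directory.
func train(corpusPath: String, destinationPath: String) throws {
    let corpus = loadCorpus(at: URL(fileURLWithPath: corpusPath))
    let table = try MLDataTable(dictionary: ["text": corpus.texts,
                                             "label": corpus.labels])
    let classifier = try MLTextClassifier(trainingData: table,
                                          textColumn: "text",
                                          labelColumn: "label")
    let output = URL(fileURLWithPath: destinationPath)
        .appendingPathComponent("ProgrammingLanguageClassifier.mlmodel")
    try classifier.write(to: output)
}
#endif
```

On macOS, calling `try train(corpusPath:destinationPath:)` with the two paths filled in would produce the .mlmodel file that the compile step above expects.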
