






What is Alligator?

Alligator is free (libre) software for data classification.
Alligator stands for AnaLyzing maLware wIth partitioninG and probAbiliTy-based algORithms.




How does it work?



Alligator implements two phases: (i) a learning/training phase and (ii) a guessing/classification phase.

(i) In the learning/training phase, Alligator automatically selects the best combination of weights to apply to clustering algorithms, in order to better distinguish data of group 1 from data of group 2. This combination is computed from data already classified into the two groups. The selected algorithms and weights are described in a file written in an ad hoc Alligator script language.

(ii) The guessing/classification phase applies the script generated during the learning phase to the list of samples to partition (called the "guess cluster"). Alligator runs each clustering algorithm indicated in the script, with the appropriate weights, and outputs two scores for each sample: a score of resemblance to group 1, and a score of resemblance to group 2.
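The two phases can be illustrated with a minimal Python sketch. This is not Alligator's actual implementation or script language: the two scoring functions, the grid of candidate weights, and the names learn and guess are all assumptions chosen for illustration only.

```python
from itertools import product

def centroid_score(sample, group):
    """Resemblance in (0, 1], from Euclidean distance to the group's centroid."""
    n = len(group)
    center = [sum(row[i] for row in group) / n for i in range(len(sample))]
    dist = sum((a - b) ** 2 for a, b in zip(sample, center)) ** 0.5
    return 1.0 / (1.0 + dist)

def nearest_score(sample, group):
    """Resemblance in (0, 1], from Manhattan distance to the closest member."""
    dist = min(sum(abs(a - b) for a, b in zip(sample, row)) for row in group)
    return 1.0 / (1.0 + dist)

# Hypothetical stand-ins for the clustering algorithms Alligator chooses from.
ALGORITHMS = [centroid_score, nearest_score]

def guess(sample, group1, group2, weights):
    """Guessing phase: return (resemblance to group 1, resemblance to group 2)."""
    s1 = sum(w * alg(sample, group1) for w, alg in zip(weights, ALGORITHMS))
    s2 = sum(w * alg(sample, group2) for w, alg in zip(weights, ALGORITHMS))
    return s1, s2

def learn(group1, group2, grid=(0.0, 0.5, 1.0)):
    """Learning phase: keep the weight combination that ranks the most
    already-classified samples on the correct side."""
    best_weights, best_hits = None, -1
    for weights in product(grid, repeat=len(ALGORITHMS)):
        hits = 0
        for s in group1:
            s1, s2 = guess(s, group1, group2, weights)
            hits += s1 > s2
        for s in group2:
            s1, s2 = guess(s, group1, group2, weights)
            hits += s2 > s1
        if hits > best_hits:
            best_weights, best_hits = weights, hits
    return best_weights

if __name__ == "__main__":
    group1 = [[0, 0], [1, 0], [0, 1]]
    group2 = [[10, 10], [9, 10], [10, 9]]
    weights = learn(group1, group2)
    print(weights, guess([0.5, 0.5], group1, group2, weights))
```

In this sketch the training samples are scored against groups that contain them, which flatters the nearest-member score; a more careful evaluation would hold samples out before choosing the weights.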


Is Alligator efficient?

We have applied Alligator to various data types, e.g., the classification of Android applications and the classification of images (male/female, makeup/no makeup, etc.), including publicly available datasets. On all the datasets we have tried, Alligator performed at least as well as, and most of the time much better than, the other techniques it was compared with.



Documentation

Manuals

Presentations and papers

  • Ludovic Apvrille, Axelle Apvrille, "Pre-filtering Mobile Malware with Heuristic Techniques", Proceedings of GreHack'2013, Grenoble, Nov. 2013. paper bibtex slides

  • Ludovic Apvrille, Detecting Mobile Malware with Classification Techniques, Labex Sophia@UCN security day, Dec. 18th, 2013.

  • Axelle Apvrille, Ludovic Apvrille, "SherlockDroid, an Inspector for Android Marketplaces", Proceedings of 10th edition of Hack.lu 2014, Luxembourg, Oct. 2014. paper bibtex slides.

  • Ludovic Apvrille, Pitch and Poster presented at Bourse aux technologies - Big Data, Rennes, April 2015.

  • Neslihan Kose, Ludovic Apvrille, Jean-Luc Dugelay, "Facial Makeup Detection Technique Based on Texture and Shape Analysis", Proceedings of the Eleventh IEEE International Conference on Automatic Face and Gesture Recognition (FG 2015), May 2015, Slovenia. paper slides

  • A. Apvrille, L. Apvrille, "SherlockDroid: a research assistant to spot unknown malware in Android marketplaces", Journal of Computer Virology and Hacking Techniques, vol. 11, No. 39, pages 1-11, pub. Springer, July 2015. paper bibtex online

  • Ludovic Apvrille, Axelle Apvrille, "Identifying Unknown Android Malware with Feature Extractions and Classification Techniques", The 14th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (IEEE TrustCom-15), Helsinki, Finland, 20-22 August, 2015. paper bibtex slides

  • Ludovic Apvrille, "Identifying Unknown Android Malware with Feature Extractions and Classification Techniques", Seminar at SAP Research, Nov. 2015. slides

  • Ludovic Apvrille, Lutte automatisée contre les virus au royaume d’Android (Automated fight against Android malware), blog post of Institut Mines-Telecom, April 2016. (In French).

Success stories



Download and installation

Alligator is distributed under the CeCILL license with which you have to agree before using Alligator.

Once you have carefully read the CeCILL license, click on the following picture to download the latest release of Alligator (Current release: 1.0 of Oct. 16th, 2014):


Once downloaded, simply uncompress the archive. A README file should help you to start. In particular, an easy-to-use example is provided on both the learning and the guessing phases.

FAQ

  • Can Alligator be used for something other than computer viruses? Yes.
  • Alligator does not know anything about malware or viruses. It can be used to partition elements between any two distinct sets. For instance, if we divide the world between smart people and lame ones, Alligator could help us decide whether person p is smart or lame (provided the smart and lame clusters do not intersect).

  • Does Alligator have access to mobile virus samples? No.
  • Alligator only reads, as input, files of properties that describe a given (clean or malicious) sample. The values of those properties are typically booleans, strings or numeric values. So, basically, Alligator just analyzes a bunch of data and tries to make sense of it in statistical terms.

  • Does Alligator understand this or that feature/property? No.
  • Alligator does not have this level of understanding. It does not matter to Alligator whether the 3rd property is a file size or a number of classes. Alligator only understands columns and expects the 3rd property to have the same type (e.g., boolean, string, ...) across all clusters.

  • Can Alligator be used for real-time sensitive applications? Yes ... and No.
  • It all depends on your constraints, your features, the performance of your computer, the classification algorithms selected by Alligator during the training stage, etc. In a real case study with 50k elements in the learning clusters, for the classification of clean/malware applications, classifying one unknown application takes a few milliseconds on average.
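
The FAQ answer on features/properties says that a given column must keep the same type across all clusters. A minimal Python sketch of such a check is given here; it assumes each cluster is already loaded as a list of rows, and the function names and sample values are hypothetical. It does not reproduce Alligator's input file format.

```python
def column_types(cluster):
    """Return the tuple of value types, one per column, for a cluster.

    Raises ValueError if two rows of the same cluster disagree.
    """
    first = tuple(type(value) for value in cluster[0])
    for row in cluster[1:]:
        if tuple(type(value) for value in row) != first:
            raise ValueError(f"inconsistent row within cluster: {row!r}")
    return first

def same_columns(*clusters):
    """True when every cluster has the same column types in the same order."""
    signatures = [column_types(cluster) for cluster in clusters]
    return all(sig == signatures[0] for sig in signatures)

if __name__ == "__main__":
    # Hypothetical properties: a boolean flag, a name, and a file size.
    group1 = [[True, "a.apk", 1200], [False, "b.apk", 800]]
    group2 = [[False, "c.apk", 3400]]
    guess_cluster = [[True, "d.apk", 10]]
    print(same_columns(group1, group2, guess_cluster))
```

The check compares exact types, so a boolean column and an integer column are treated as different even though Python's bool is a subclass of int.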


Support

Simply send an email to: ludovic.apvrille AT telecom-paristech.fr