Introduction to Weka Python
Weka Python lets you use Weka from within Python. It relies on the javabridge library to communicate with the JVM and to start up and shut down the Java Virtual Machine in which the Weka processes run. The classpath can include all of your installed Weka packages. Weka is open-source software that provides tools for a range of machine learning algorithms, data pre-processing, and visualization.
What is Weka Python?
Weka is open-source software that offers tools for running various machine learning algorithms, together with data pre-processing and visualization tools, so you can build your machine learning skills and apply them to real-world data mining problems. The Python wrapper for Weka (python-weka-wrapper) is a thin wrapper around Weka's essential non-GUI functionality, and it can include all of your Weka packages on the classpath.
Using Weka from Python
Using Weka from Python lets you combine it with the many libraries that Python offers. To do so, you need to install Python and the python-weka-wrapper library. The python-weka-wrapper3 library for Python 3 gives access to most of Weka's non-GUI functionality.
Python and Weka are both widely used in data analytics. From Python we can read the evaluation results of a model, such as the number of correctly and incorrectly classified instances, recall, and precision.
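To make the evaluation terms concrete, here is a minimal sketch of how precision and recall follow from raw counts for a single class. The function name `precision_recall` and the counts are made up for illustration; they do not come from Weka or from any real dataset.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for one class from raw counts."""
    predicted_pos = true_pos + false_pos  # everything predicted as the class
    actual_pos = true_pos + false_neg     # everything that really is the class
    # Guard against division by zero when a class is never predicted or never occurs.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Illustrative counts only.
precision, recall = precision_recall(true_pos=40, false_pos=10, false_neg=20)
print(round(precision, 3), round(recall, 3))  # 0.8 0.667
```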
Weka Python Example code
The following examples show how to use python-weka-wrapper from Python.
The library needs a running JVM (Java Virtual Machine). Start it with the following code:
>>> import weka.core.jvm as jvm
>>> jvm.start()
To use the system CLASSPATH environment variable and the packages installed in Weka, start the JVM like this:
>>> jvm.start(system_cp=True, packages=True)
If the Weka home directory is not the wekafiles directory in your home directory, there are two ways to specify the alternative location: set the WEKA_HOME environment variable, or supply the directory through the packages parameter:
>>> jvm.start(packages="/my/packages/are/somewhere/else")
Often you will want to increase the maximum heap size of the JVM. The following call reserves 512 MB:
>>> jvm.start(max_heap_size="512m")
Finally, stop the JVM when you are done:
>>> jvm.stop()
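Because the JVM has to be stopped even when something goes wrong in between, it can help to wrap the start and stop calls in a context manager. The following is a minimal sketch under stated assumptions: `jvm_session` is a hypothetical helper, not part of python-weka-wrapper, and it simply receives the start and stop functions as arguments.

```python
from contextlib import contextmanager


@contextmanager
def jvm_session(start, stop, **start_kwargs):
    """Start the JVM, run the body, and always stop the JVM afterwards.

    Hypothetical helper: `start` and `stop` are expected to be
    jvm.start and jvm.stop from weka.core.jvm.
    """
    start(**start_kwargs)
    try:
        yield
    finally:
        # Runs even if the body raises, so the JVM is not left running.
        stop()


# Intended usage (requires python-weka-wrapper3 and a Java installation):
#
#   import weka.core.jvm as jvm
#   with jvm_session(jvm.start, jvm.stop, max_heap_size="512m"):
#       ...  # load data, train classifiers, and so on
```

A block like this is best placed once around the whole program, since a JVM that has been stopped typically cannot be started again in the same Python process.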
Option handling: every class derived from OptionHandler (in the weka.core.classes module) lets you get and set its options through the options property. The two examples below instantiate a J48 classifier, the first by setting the options property and the second by using the constructor shortcut:
>>> from weka.classifiers import Classifier
>>> cls = Classifier(classname="weka.classifiers.trees.J48")
>>> cls.options = ["-C", "0.3"]
>>> # the same, using the constructor shortcut
>>> from weka.classifiers import Classifier
>>> cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
The options property also returns the options that are currently set:
>>> from weka.classifiers import Classifier
>>> cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
>>> print(cls.options)
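Options are passed as a list of separate tokens, not as one string. If you have a Weka command line as a single string, a small standard-library sketch like the one below can split it into such a list. The helper name `to_options` is hypothetical and is not part of python-weka-wrapper.

```python
import shlex


def to_options(command_line):
    """Split a Weka-style option string into a list of option tokens.

    Hypothetical helper built on shlex; quoted arguments stay together.
    """
    return shlex.split(command_line)


print(to_options("-C 0.3"))      # ['-C', '0.3']
print(to_options("-B -P 0.05"))  # ['-B', '-P', '0.05']
```

The resulting list can then be passed as the options argument, in the same way as the hand-written lists above.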
Weka's data generators can produce artificial data, for example with the Agrawal classification generator:
>>> from weka.datagenerators import DataGenerator
>>> generator = DataGenerator(classname="weka.datagenerators.classifiers.classification.Agrawal", options=["-B", "-P", "0.05"])
>>> DataGenerator.make_data(generator, ["-o", "/some/where/outputfile.arff"])
Loaders and Savers
The Loader and Saver classes load and save datasets in different data formats. The following code loads an ARFF file and saves it as CSV:
>>> from weka.core.converters import Loader, Saver
>>> loader = Loader(classname="weka.core.converters.ArffLoader")
>>> data = loader.load_file("/some/where/iris.arff")
>>> print(data)
>>> saver = Saver(classname="weka.core.converters.CSVSaver")
>>> saver.save_file(data, "/some/where/iris.csv")
The weka.core.converters module also has convenience methods for loading and saving datasets, called load_any_file and save_any_file. These methods choose the loader and saver based on the file extension:
>>> import weka.core.converters as converters
>>> data = converters.load_any_file("/some/where/iris.arff")
>>> converters.save_any_file(data, "/some/where/else/iris.csv")
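To illustrate what choosing a loader by file extension means, here is a simplified sketch. It is not the library's implementation: the `LOADERS` table and the `loader_classname_for` function are assumptions made for this example, and the table covers only two formats.

```python
import os

# Assumed mapping for illustration only; Weka itself supports more formats.
LOADERS = {
    ".arff": "weka.core.converters.ArffLoader",
    ".csv": "weka.core.converters.CSVLoader",
}


def loader_classname_for(path):
    """Return the loader class name that matches the file extension of `path`."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in LOADERS:
        raise ValueError("no loader known for extension %r" % extension)
    return LOADERS[extension]


print(loader_classname_for("/some/where/iris.arff"))  # weka.core.converters.ArffLoader
```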
The Filter class from the weka.filters module lets you filter datasets, for example to remove the last attribute with the Remove filter:
>>> from weka.filters import Filter
>>> data = ...  # previously loaded data
>>> remove = Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "last"])
>>> remove.inputformat(data)
>>> filtered = remove.filter(data)
>>> print(filtered)
The following example cross-validates a J48 classifier on a dataset and outputs the summary along with some specific statistics:
>>> from weka.classifiers import Classifier, Evaluation
>>> from weka.core.classes import Random
>>> data = ...  # previously loaded data
>>> data.class_is_last()
>>> classifier = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
>>> evaluation = Evaluation(data)
>>> evaluation.crossvalidate_model(classifier, data, 10, Random(42))
>>> print(evaluation.summary())
>>> print("pctCorrect: " + str(evaluation.percent_correct))
>>> print("incorrect: " + str(evaluation.incorrect))
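The 10 and Random(42) arguments above are the number of folds and the random seed. As a rough illustration of what k-fold cross-validation does with them, the sketch below splits instance indices into folds in plain Python. It is a simplification under stated assumptions: the `kfold_indices` function is hypothetical, it uses Python's own random module instead of Weka's Random, and it does not stratify by class the way Weka does for nominal class attributes.

```python
import random


def kfold_indices(n_instances, n_folds, seed):
    """Yield (train, test) index lists for a simple, non-stratified k-fold split."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # same seed, same shuffle
    # Deal the shuffled indices into n_folds groups, round-robin.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        # Every fold except the current one forms the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 150 instances (the size of the iris dataset) in 10 folds.
for train, test in kfold_indices(150, 10, seed=42):
    print(len(train), len(test))  # 135 15 on every fold
```

Each instance ends up in exactly one test fold, so every instance is used for testing once and for training in the remaining folds.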
The next example builds a SimpleKMeans clusterer with three clusters on a previously loaded dataset that has no class attribute:
>>> from weka.clusterers import Clusterer
>>> data = ...  # previously loaded dataset
>>> clusterer = Clusterer(classname="weka.clusterers.SimpleKMeans", options=["-N", "3"])
>>> clusterer.build_clusterer(data)
>>> print(clusterer)
Once the clusterer is built, it can be used to cluster Instance objects:
>>> for inst in data:
...     cl = clusterer.cluster_instance(inst)
...     dist = clusterer.distribution_for_instance(inst)
...     print("cluster=" + str(cl) + ", distribution=" + str(dist))
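The cluster index and the distribution are related: the distribution holds one membership value per cluster, and the assigned cluster is normally the one with the highest value. The sketch below shows that relationship on a made-up distribution. The function name `most_likely_cluster` and the values are assumptions for illustration and are not produced by Weka.

```python
def most_likely_cluster(distribution):
    """Return the index of the largest membership value (first one on ties)."""
    return max(range(len(distribution)), key=lambda i: distribution[i])


# Made-up membership values for three clusters.
print(most_likely_cluster([0.1, 0.7, 0.2]))  # 1
```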
Associators, such as Apriori, can be built and output like this:
>>> from weka.associations import Associator
>>> data = ...  # previously loaded dataset
>>> associator = Associator(classname="weka.associations.Apriori", options=["-N", "9", "-I"])
>>> associator.build_associations(data)
>>> print(associator)
Objects can also be serialized and de-serialized easily. The code below saves a trained classifier to a file, loads it again from disk, and outputs the model:
>>> from weka.classifiers import Classifier
>>> classifier = ...  # previously built classifier
>>> classifier.serialize("/some/where/out.model")
>>> ...
>>> classifier2, _ = Classifier.deserialize("/some/where/out.model")
>>> print(classifier2)
Clusterers and filters offer the serialize and deserialize methods as well. All other serialization and de-serialization tasks are handled by the functions of the weka.core.serialization module:
- write(file, object)
- write_all(file, [obj1, obj2, ...])
- read(file)
- read_all(file)
The library has the following requirements:
- Python 2.7 for the original python-weka-wrapper library (python-weka-wrapper3 is the Python 3 counterpart)
- javabridge, version 1.0.14 or later. The library uses javabridge for starting up, communicating with, and shutting down the Java Virtual Machine in which the Weka processes get executed.
- pygraphviz (optional)
- PIL (optional)
- matplotlib (optional)
- Oracle JDK 1.8 or later
- Weka 3.9.3
Conclusion – Weka Python
In this article, we covered the basic concepts of Weka Python. We hope it helps you build your knowledge of machine learning techniques.