Introduction to Weka Python
Weka Python lets you use Weka from within Python. The python-weka-wrapper library uses the javabridge library to start up, communicate with, and shut down the Java Virtual Machine (JVM) in which the Weka processes run. You can also add all installed Weka packages to the classpath. Weka itself is open-source software that provides tools for a wide range of machine-learning algorithms, data pre-processing, and visualization.
What is Weka Python?
Weka is open-source software that offers tools for running various machine-learning algorithms, along with data pre-processing and visualization tools, so you can extend your machine-learning skills and apply them to real-world data-mining problems. Weka Python (the python-weka-wrapper library) is a thin wrapper around Weka's essential non-GUI functionality, and it can include all installed Weka packages in the classpath.
Using weka from Python
Using Weka from Python gives you access to the many libraries that Python offers alongside Weka. To do so, you need to install Python and the python-weka-wrapper library. For Python 3, the python-weka-wrapper3 library gives access to most of Weka's non-GUI functionality.
Python and Weka are both widely used in data analytics. From Python you can retrieve Weka's evaluation results, such as the number of correctly and incorrectly classified instances, recall, and precision, and process them further.
Weka Python Example code
The following examples show how to use python-weka-wrapper from Python.
To use the library, you have to manage the JVM (Java Virtual Machine). First, start it with the following code:
import weka.core.jvm as jvm
jvm.start()
To also add the system CLASSPATH to the JVM and load the packages installed in Weka, start it like this:
jvm.start(system_cp=True, packages=True)
If your Weka packages are not in the default Weka home directory (wekafiles), you can specify an alternative location in one of two ways: set the WEKA_HOME environment variable, or pass the directory via the packages parameter:
jvm.start(packages="/my/packages/are/somewhere/else")
You will often need to increase the JVM's maximum heap size, for example to 512 MB:
jvm.start(max_heap_size="512m")
When you are done, stop the JVM:
jvm.stop()
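Because the JVM should be stopped even if an error occurs between start and stop, the calls above can be wrapped in a context manager. The following is a minimal sketch; `jvm_session` is a hypothetical helper (not part of python-weka-wrapper) that receives the start and stop functions as arguments, so it can be exercised without a JVM.

```python
from contextlib import contextmanager


@contextmanager
def jvm_session(start, stop, **start_kwargs):
    """Hypothetical helper: start the JVM, run the body, always stop the JVM.

    `start` and `stop` are passed in (e.g. jvm.start and jvm.stop from
    weka.core.jvm) so the helper itself has no Weka dependency.
    """
    start(**start_kwargs)
    try:
        yield
    finally:
        stop()  # runs even if the body raises an exception


# Usage with python-weka-wrapper (requires Java and the library):
# import weka.core.jvm as jvm
# with jvm_session(jvm.start, jvm.stop, packages=True, max_heap_size="512m"):
#     ...  # Weka calls go here
```

Any keyword arguments, such as packages or max_heap_size, are forwarded unchanged to the start function.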
Classes derived from OptionHandler (in the weka.core.classes module) let you get and set options through the options property. The following code shows two ways to instantiate a J48 classifier: the first sets the options through the property, the second passes them to the constructor as a shortcut:
from weka.classifiers import Classifier

# Option 1: set the options through the options property
cls = Classifier(classname="weka.classifiers.trees.J48")
cls.options = ["-C", "0.3"]

# Option 2: pass the options to the constructor (shortcut)
cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
The options property also returns the currently set options:
from weka.classifiers import Classifier
cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
print(cls.options)
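The options property works with a flat list of strings, just as the options would appear on the Weka command line. The sketch below shows two hypothetical helpers (not part of python-weka-wrapper) for building such lists: one flattens a dictionary of flags, the other splits a command-line style string with Python's standard shlex module. It assumes shell-style double quoting matches how nested options are quoted in Weka option strings, which holds for simple cases only.

```python
import shlex


def options_from_dict(settings):
    """Hypothetical helper: flatten {"-C": 0.3, "-B": True} into a list of strings."""
    options = []
    for flag, value in settings.items():
        if value is True:
            options.append(flag)  # boolean switch: flag without an argument
        elif value is not False and value is not None:
            options.extend([flag, str(value)])  # flag followed by its value
    return options


def options_from_string(option_string):
    """Hypothetical helper: split a command-line style option string."""
    return shlex.split(option_string)


# Either result could be assigned to cls.options, e.g.
# cls.options = options_from_dict({"-C": 0.3, "-M": 2})
```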
Weka's data generators can produce artificial data, for example with the Agrawal classification generator:
from weka.datagenerators import DataGenerator
generator = DataGenerator(classname="weka.datagenerators.classifiers.classification.Agrawal", options=["-B", "-P", "0.05"])
DataGenerator.make_data(generator, ["-o", "/some/where/outputfile.arff"])
Loaders and Savers
The Loader and Saver classes load and save datasets in different formats. The following code loads an ARFF file and saves it as CSV:
from weka.core.converters import Loader, Saver
loader = Loader(classname="weka.core.converters.ArffLoader")
data = loader.load_file("/some/where/iris.arff")
print(data)
saver = Saver(classname="weka.core.converters.CSVSaver")
saver.save_file(data, "/some/where/iris.csv")
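To show what the ARFF-to-CSV round trip produces, here is a standard-library-only sketch that converts a tiny dense ARFF document to CSV text: the names from the @attribute lines become the CSV header, and each @data row becomes a CSV row. `arff_to_csv` is a hypothetical, deliberately simplified helper (it assumes no sparse rows and no quoted names or values); use Weka's converters for real files.

```python
import csv
import io


def arff_to_csv(arff_text):
    """Hypothetical helper: convert a minimal dense ARFF document to CSV text."""
    header = []
    rows = []
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if not in_data and lower.startswith("@attribute"):
            header.append(line.split()[1])  # second token is the attribute name
        elif lower.startswith("@data"):
            in_data = True  # everything after @data is an instance
        elif in_data:
            rows.append([value.strip() for value in line.split(",")])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


sample = """% tiny sample
@relation iris
@attribute sepallength numeric
@attribute class {Iris-setosa,Iris-versicolor}
@data
5.1,Iris-setosa
7.0,Iris-versicolor
"""
print(arff_to_csv(sample))
```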
The weka.core.converters module also provides the convenience functions load_any_file and save_any_file, which choose the loader or saver based on the file extension:
import weka.core.converters as converters
data = converters.load_any_file("/some/where/iris.arff")
converters.save_any_file(data, "/some/where/else/iris.csv")
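The idea of choosing a converter by file extension can be sketched with a lookup table. This is only an illustration under an assumption: Weka determines the converter itself, whereas `converter_for` below is a hypothetical helper with a hard-coded table covering just ARFF and CSV. The class names are Weka's standard ArffLoader, CSVLoader, ArffSaver, and CSVSaver converters.

```python
import os

# Illustrative tables (an assumption, not Weka's actual lookup logic).
LOADERS = {
    ".arff": "weka.core.converters.ArffLoader",
    ".csv": "weka.core.converters.CSVLoader",
}
SAVERS = {
    ".arff": "weka.core.converters.ArffSaver",
    ".csv": "weka.core.converters.CSVSaver",
}


def converter_for(path, saving=False):
    """Hypothetical helper: return a converter class name for a file path."""
    extension = os.path.splitext(path)[1].lower()  # ".ARFF" and ".arff" match alike
    table = SAVERS if saving else LOADERS
    try:
        return table[extension]
    except KeyError:
        raise ValueError(f"no converter registered for {extension!r}") from None
```

The returned name is the kind of string you would pass as classname to Loader or Saver.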
The Filter class from the weka.filters module lets you filter datasets. For example, the Remove filter deletes the last attribute:
import weka.core.converters as converters
from weka.filters import Filter
data = converters.load_any_file("/some/where/iris.arff")  # or any dataset loaded earlier
remove = Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "last"])
remove.inputformat(data)  # let the filter inspect the dataset structure
filtered = remove.filter(data)
print(filtered)
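The "-R last" option uses Weka's attribute range syntax: 1-based indices, the keywords first and last, hyphenated spans, and comma-separated lists. The sketch below interprets such ranges and drops the selected columns from plain Python rows, to show what the Remove filter does to a dataset. `parse_range` and `remove_attributes` are hypothetical helpers; Weka's own range handling supports more (for example inverting the selection), which this sketch omits.

```python
def parse_range(spec, num_attributes):
    """Hypothetical helper: turn a range like "first-2,last" into 0-based indices."""

    def to_index(token):
        token = token.strip().lower()
        if token == "first":
            return 0
        if token == "last":
            return num_attributes - 1
        return int(token) - 1  # Weka ranges are 1-based

    indices = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-", 1)
            indices.update(range(to_index(start), to_index(end) + 1))
        else:
            indices.add(to_index(part))
    return sorted(indices)


def remove_attributes(header, rows, spec):
    """Hypothetical helper: drop the columns selected by a Weka-style range."""
    drop = set(parse_range(spec, len(header)))
    keep = [i for i in range(len(header)) if i not in drop]
    return [header[i] for i in keep], [[row[i] for i in keep] for row in rows]
```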
The following example cross-validates a J48 classifier on a dataset and prints some statistics:
import weka.core.converters as converters
from weka.classifiers import Classifier, Evaluation
from weka.core.classes import Random
data = converters.load_any_file("/some/where/iris.arff")  # or any dataset loaded earlier
data.class_is_last()  # use the last attribute as the class
classifier = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
evaluation = Evaluation(data)
evaluation.crossvalidate_model(classifier, data, 10, Random(42))  # 10 folds, seed 42
print(evaluation.summary())
print("pctCorrect: " + str(evaluation.percent_correct))
print("incorrect: " + str(evaluation.incorrect))
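Conceptually, 10-fold cross-validation splits the data into 10 folds, trains on nine of them, tests on the held-out fold, and accumulates the statistics over all folds. The sketch below reproduces that loop in plain Python with a trivial majority-class learner (similar in spirit to Weka's ZeroR). All names are hypothetical; Python's random module will not produce the same folds as Weka's Random(42), and unlike Weka this sketch does not stratify the folds by class.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed):
    """Shuffle 0..n-1 with a fixed seed and deal the indices into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]


def cross_validate(xs, ys, k, seed, fit, predict):
    """Return (percent correct, number incorrect) over k held-out folds."""
    correct = 0
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)  # train only on the other folds
        correct += sum(predict(model, xs[i]) == ys[i] for i in fold)
    return 100.0 * correct / len(xs), len(xs) - correct


def fit_majority(train_x, train_y):
    """Trivial learner: remember the most frequent class label."""
    return Counter(train_y).most_common(1)[0][0]


def predict_majority(model, x):
    return model  # always predict the majority label
```

With 8 instances of class "a" and 2 of class "b" and 10 folds, each "b" instance is predicted as "a", so the sketch reports 80.0 percent correct and 2 incorrect.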
This example builds SimpleKMeans with three clusters on a previously loaded dataset that has no class attribute set:
from weka.clusterers import Clusterer
# data: a previously loaded dataset with no class attribute set
clusterer = Clusterer(classname="weka.clusterers.SimpleKMeans", options=["-N", "3"])
clusterer.build_clusterer(data)
print(clusterer)
Once built, the clusterer can be used to cluster individual Instance objects:
for inst in data:
    cl = clusterer.cluster_instance(inst)  # index of the assigned cluster
    dist = clusterer.distribution_for_instance(inst)  # cluster membership distribution
    print("cluster=" + str(cl) + ", distribution=" + str(dist))
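The k-means idea behind SimpleKMeans can be sketched in plain Python: repeatedly assign each point to its nearest centroid and move each centroid to the mean of its points until nothing changes. The functions below are hypothetical and simplified. For deterministic results this sketch seeds the centroids with farthest-first traversal, whereas Weka's SimpleKMeans by default picks random initial centroids and also normalizes numeric attributes, neither of which is done here. `nearest` plays the role of cluster_instance.

```python
import math


def farthest_first(points, k):
    """Seed centroids: start at points[0], then repeatedly add the point
    whose distance to its nearest already-chosen centroid is largest."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def nearest(centroids, point):
    """Index of the closest centroid (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: assign points to the nearest centroid, recompute means."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        # Mean of each cluster; keep the old centroid if a cluster is empty.
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: recomputing the means no longer moves any centroid
        centroids = updated
    return centroids
```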
Associators such as Apriori can be built and their output printed as follows:
from weka.associations import Associator
# data: a previously loaded dataset (Apriori requires nominal attributes)
associator = Associator(classname="weka.associations.Apriori", options=["-N", "9", "-I"])
associator.build_associations(data)
print(associator)
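Apriori searches for itemsets whose support (the fraction of transactions containing them) reaches a threshold, growing candidates level by level and pruning any candidate that has an infrequent subset; rules are then scored by confidence. The sketch below shows this textbook search over Python sets. The functions are hypothetical, and Weka's implementation differs in details: for example, it works on nominal attributes and, by default, lowers the minimum support step by step until it finds the number of rules requested with -N.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {itemset: support}."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if support(frozenset([i]), transactions) >= min_support]
    result = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = support(itemset, transactions)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in result for s in combinations(c, k - 1))]
        level = [c for c in candidates if support(c, transactions) >= min_support]
    return result


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


transactions = [frozenset(t) for t in (
    ["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"])]
```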
Models can easily be serialized and deserialized. The following code saves a trained classifier to a file, loads it back from disk, and outputs the model:
from weka.classifiers import Classifier
# classifier: a previously built classifier
classifier.serialize("/some/where/out.model")
...
classifier2, _ = Classifier.deserialize("/some/where/out.model")
print(classifier2)
Clusterers and filters also offer serialize and deserialize methods. The underlying serialization functionality comes from the weka.core.serialization module, which provides:
- write(file, object)
- write_all(file, [obj1, obj2, …])
- read(file)
- read_all(file)
The python-weka-wrapper library has the following requirements:
- Python 2.7
- javabridge 1.0.14 or later, which the library uses to start up, communicate with, and shut down the Java Virtual Machine in which the Weka processes are executed
- pygraphviz (optional)
- PIL (optional)
- matplotlib (optional)
- Oracle JDK 1.8 or later
- Weka 3.9.3
Conclusion – Weka Python
In this article, we covered the concepts of Weka Python. We hope it helps you deepen your knowledge of machine-learning techniques.