Updated July 1, 2023

Introduction to Weka Python

Weka Python lets you use Weka from within Python. The python-weka-wrapper library relies on the javabridge library to start up, communicate with, and shut down the Java Virtual Machine (JVM) in which the Weka processes run. Optionally, all installed Weka packages can be added to the classpath. Weka is open-source software that provides tools for machine learning algorithms, data pre-processing, and visualization.

What is Weka Python?

Weka is open-source software that offers tools for running machine learning algorithms, pre-processing data, and visualizing results, so you can build your machine learning skills and apply them to real-world data mining problems. Weka Python provides a thin wrapper around the essential functionality of Weka and can optionally include all installed Weka packages in the classpath.

Using Weka from Python

Using Weka from Python gives you access to the many libraries that Python offers. To get started, install Python and the python-weka-wrapper library. This article uses python-weka-wrapper3, the Python 3 version of the library, which gives access to most of the non-GUI functionality of Weka.

Python and Weka are both widely used in data analytics. From Python, we can evaluate a model and report measures such as the number of correctly and incorrectly classified instances, recall, and precision.

Weka Python Example code

The following examples show how to use python-weka-wrapper from Python.

The library needs a running JVM (Java Virtual Machine), so we have to manage its lifecycle ourselves. To start the JVM, use the following code:

import weka.core.jvm as jvm
jvm.start()

To use the system classpath (the CLASSPATH environment variable) and the packages installed in Weka, start the JVM as follows:

jvm.start(system_cp=True, packages=True)

If the Weka home directory is not in the default wekafiles location, there are two ways to specify an alternative location: set the WEKA_HOME environment variable, or pass the directory through the packages parameter:

jvm.start(packages="/my/packages/are/somewhere/else")

Most of the time, you will want to increase the maximum heap size available to the JVM. The following example reserves 512 MB:

jvm.start(max_heap_size="512m")

Finally, stop the JVM when you are done:

jvm.stop()
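
Because the JVM should be stopped even when an error occurs in between, the start/stop pair can be wrapped in a context manager. The following is a minimal sketch and not part of the library: the name weka_jvm and its jvm_module parameter are hypothetical, and the parameter exists only so that the helper can be tried out without a real JVM.

```python
from contextlib import contextmanager


@contextmanager
def weka_jvm(jvm_module=None, **start_kwargs):
    """Start the JVM on entry and always stop it on exit.

    jvm_module is a hypothetical hook for testing; by default the real
    weka.core.jvm module is imported (requires python-weka-wrapper3).
    """
    if jvm_module is None:
        import weka.core.jvm as jvm_module
    jvm_module.start(**start_kwargs)
    try:
        yield jvm_module
    finally:
        # Runs on normal exit and when the body raises an exception.
        jvm_module.stop()


# Usage (needs Weka and a Java installation):
# with weka_jvm(packages=True, max_heap_size="512m"):
#     ...  # load data, train classifiers, etc.
```

Keep in mind that the JVM can usually be started only once per Python process, so a block like this would typically wrap the whole program rather than individual steps.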

Option Handling

Any class derived from OptionHandler (module weka.core.classes) lets you get and set its options through the options property. The following two examples instantiate a J48 classifier: the first sets the options property, the second passes the options to the constructor as a shortcut:

from weka.classifiers import Classifier
# variant 1: set the options property after construction
cls = Classifier(classname="weka.classifiers.trees.J48")
cls.options = ["-C", "0.3"]

# variant 2: pass the options directly to the constructor
cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])

The options property also returns the options that are currently set:

from weka.classifiers import Classifier
cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
print(cls.options)
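
Since options are passed as a flat list of strings, it can be handy to build that list from a dictionary. The helper below is a hypothetical convenience function, not something the library provides; the flags -M and -U are J48 options chosen purely for illustration.

```python
def build_options(settings):
    """Turn a dict of Weka flags into the flat list of strings Weka expects.

    True        -> bare flag (e.g. "-U")
    False/None  -> flag is omitted
    other value -> flag followed by str(value)
    """
    options = []
    for flag, value in settings.items():
        if value is True:
            options.append(flag)
        elif value is False or value is None:
            continue
        else:
            options.extend([flag, str(value)])
    return options


print(build_options({"-C": 0.3, "-M": 2}))  # ['-C', '0.3', '-M', '2']
```

The resulting list can be passed as the options argument of the Classifier constructor.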

Data Generators

The data generators in Weka produce artificial data. The following example uses the Agrawal classification generator:

from weka.datagenerators import DataGenerator
generator = DataGenerator(classname="weka.datagenerators.classifiers.classification.Agrawal", options=["-B", "-P", "0.05"])
# write the generated data to an ARFF file
DataGenerator.make_data(generator, ["-o", "/some/where/outputfile.arff"])

Loaders and Savers

The Loader and Saver classes load and save datasets in different data formats. The following code loads an ARFF file and saves it as CSV:

from weka.core.converters import Loader, Saver
loader = Loader(classname="weka.core.converters.ArffLoader")
data = loader.load_file("/some/where/iris.arff")
print(data)
saver = Saver(classname="weka.core.converters.CSVSaver")
saver.save_file(data, "/some/where/iris.csv")

The weka.core.converters module also has two convenience methods for loading and saving datasets, load_any_file and save_any_file. These methods choose the loader or saver based on the file extension:

import weka.core.converters as converters
data = converters.load_any_file("/some/where/iris.arff")
converters.save_any_file(data, "/some/where/else/iris.csv")
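
To make the idea of extension-based selection concrete, here is a rough sketch of such a lookup. It is an illustration only: the mapping and the function name guess_loader_classname are assumptions, and the library itself relies on Weka's own converter lookup rather than a hand-written table.

```python
import os

# Illustrative mapping only; the real lookup is delegated to Weka.
LOADERS_BY_EXTENSION = {
    ".arff": "weka.core.converters.ArffLoader",
    ".csv": "weka.core.converters.CSVLoader",
    ".json": "weka.core.converters.JSONLoader",
    ".libsvm": "weka.core.converters.LibSVMLoader",
}


def guess_loader_classname(path):
    """Return a loader class name for the file extension, or None if unknown."""
    extension = os.path.splitext(path)[1].lower()
    return LOADERS_BY_EXTENSION.get(extension)


print(guess_loader_classname("/some/where/iris.arff"))
```

A class name found this way could then be passed to Loader(classname=...) as in the earlier example.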

Filters

The Filter class from the weka.filters module lets you filter datasets. The following example removes the last attribute using the Remove filter:

from weka.filters import Filter
data = ...  # already loaded data
remove = Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "last"])
remove.inputformat(data)  # tell the filter about the structure of the data
filtered = remove.filter(data)
print(filtered)
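
The -R option takes a 1-based attribute range in which the keywords first and last may be used. The sketch below is a simplified model of that notation, written in plain Python to show which columns such a range selects; the function parse_range is hypothetical, and the real range handling in Weka supports more than this.

```python
def parse_range(spec, num_attributes):
    """Expand a Weka-style, 1-based range string into sorted 0-based indices."""

    def position(token):
        token = token.strip()
        if token == "first":
            return 1
        if token == "last":
            return num_attributes
        return int(token)

    indices = set()
    for part in spec.split(","):
        start, _, end = part.partition("-")
        lo = position(start)
        hi = position(end) if end else lo
        if not 1 <= lo <= hi <= num_attributes:
            raise ValueError("range out of bounds: " + part)
        indices.update(range(lo - 1, hi))
    return sorted(indices)


# With 5 attributes, "last" selects index 4, so Remove would keep 0..3.
removed = parse_range("last", 5)
kept = [i for i in range(5) if i not in removed]
print(removed, kept)  # [4] [0, 1, 2, 3]
```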

Classifiers

The following example cross-validates a J48 classifier on a dataset and outputs the evaluation summary along with some specific statistics:

from weka.classifiers import Classifier, Evaluation
from weka.core.classes import Random
data = ...  # already loaded data
data.class_is_last()  # use the last attribute as the class
classifier = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
evaluation = Evaluation(data)
# 10-fold cross-validation with a fixed random seed
evaluation.crossvalidate_model(classifier, data, 10, Random(42))
print(evaluation.summary())
print("pctCorrect: " + str(evaluation.percent_correct))
print("incorrect: " + str(evaluation.incorrect))
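
To show what statistics such as percent correct, precision, and recall mean, the following sketch computes them by hand from a confusion matrix (rows are actual classes, columns are predicted classes). The matrix values are invented for illustration and are not the result of any run, and the function names are our own.

```python
def percent_correct(matrix):
    """Share of instances on the diagonal, as a percentage."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


def precision(matrix, cls):
    """Of all instances predicted as cls, the fraction that really are cls."""
    predicted = sum(row[cls] for row in matrix)
    return matrix[cls][cls] / predicted if predicted else 0.0


def recall(matrix, cls):
    """Of all instances that really are cls, the fraction predicted as cls."""
    actual = sum(matrix[cls])
    return matrix[cls][cls] / actual if actual else 0.0


# Made-up 3-class confusion matrix (150 instances).
matrix = [
    [50, 0, 0],
    [0, 47, 3],
    [0, 2, 48],
]
print(percent_correct(matrix))
print(precision(matrix, 1), recall(matrix, 1))
```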

Clusterers

The following example builds a SimpleKMeans clusterer with three clusters on a previously loaded dataset that has no class attribute:

from weka.clusterers import Clusterer
data = ...  # already loaded dataset
clusterer = Clusterer(classname="weka.clusterers.SimpleKMeans", options=["-N", "3"])
clusterer.build_clusterer(data)
print(clusterer)

Once the clusterer is built, it can be used to cluster Instance objects:

for inst in data:
    cl = clusterer.cluster_instance(inst)
    dist = clusterer.distribution_for_instance(inst)
    print("cluster=" + str(cl) + ", distribution=" + str(dist))
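
Building on the loop above, a small helper can tally how many instances end up in each cluster. This is a sketch under the assumption that cluster_instance returns a cluster index; the helper name cluster_sizes is our own.

```python
from collections import Counter


def cluster_sizes(clusterer, instances):
    """Count how many instances fall into each cluster.

    Works with any object that offers cluster_instance(inst).
    """
    return Counter(clusterer.cluster_instance(inst) for inst in instances)


# Usage with a built clusterer and a loaded dataset:
# for cluster, size in sorted(cluster_sizes(clusterer, data).items()):
#     print("cluster " + str(cluster) + ": " + str(size) + " instances")
```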

Associators

Associators, such as Apriori, can be built and output in a similar way:

from weka.associations import Associator
data = ...  # already loaded dataset
associator = Associator(classname="weka.associations.Apriori", options=["-N", "9", "-I"])
associator.build_associations(data)
print(associator)

Serialization

The serialize and deserialize methods make it easy to save and restore objects. The following code serializes a trained classifier to a file, then loads it again from disk and outputs the model:

from weka.classifiers import Classifier
classifier = ...  # already created and trained classifier
classifier.serialize("/some/where/out.model")
...
classifier2, _ = Classifier.deserialize("/some/where/out.model")
print(classifier2)

Clusterers and filters offer serialize and deserialize methods as well. All serialization and deserialization functionality lives in the weka.core.serialization module:

write(file, object)
write_all(file, [obj1, obj2, ...])
read(file)
read_all(file)

Requirements

The original python-weka-wrapper library requires Python 2.7; python-weka-wrapper3 is its Python 3 counterpart

javabridge, version >= 1.0.14

The library uses the javabridge library for starting up, communicating with and shutting down the Java Virtual Machine in which the Weka processes get executed.

pygraphviz (optional)

PIL (optional)

matplotlib (optional)

Oracle JDK 1.8+

Use Weka version 3.9.3
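
Whether the optional packages from the list above are available can be checked with the standard library before running any Weka code. The function check_environment is a hypothetical helper written for this article; it only reports what can be imported and does not verify versions.

```python
import importlib.util
import sys


def check_environment(optional=("pygraphviz", "PIL", "matplotlib")):
    """Report the Python version and which optional modules can be imported."""
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in optional:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in check_environment().items():
    print(key, value)
```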

Conclusion – Weka Python

In this article, we covered the basic concepts of Weka Python. We hope it helps you strengthen your machine learning skills.
