Updated July 1, 2023
Introduction to Weka Python
Weka Python lets you use Weka from within Python. It relies on the javabridge library to start up, communicate with, and shut down the Java Virtual Machine (JVM) in which the Weka processes run. Installed Weka packages can also be added to the classpath. Weka is open-source software that provides tools for machine learning algorithms, data pre-processing, and visualization.
What is Weka Python?
Weka is open-source software that offers tools for running machine learning algorithms, pre-processing data, and visualizing results, so you can extend your machine learning skills and apply them to real-world data mining problems. Weka Python is a thin wrapper around the essential non-GUI functionality of Weka, and it can include the installed Weka packages in the classpath.
Using Weka from Python
Using Weka from Python gives you access to the many libraries that Python offers. To do so, we need to install Python and the python-weka-wrapper library. Here we use python-weka-wrapper3, the Python 3 version of the library, to access most of the non-GUI functions of Weka.
Python and Weka are both widely used in data analytics. From Python we can retrieve evaluation results such as the number of correctly and incorrectly classified instances, recall, and precision.
Weka Python Example code
Let’s see how to use python-weka-wrapper from Python in the following examples.
The library needs a running JVM (Java Virtual Machine), so we have to manage it ourselves. To start the JVM, use the following code:
import weka.core.jvm as jvm
jvm.start()
To use the system classpath (the CLASSPATH environment variable) and the packages installed in Weka, start the JVM as follows:
jvm.start(system_cp=True, packages=True)
If the Weka home directory is not in the default location (wekafiles), there are two ways to specify an alternative location: set the WEKA_HOME environment variable, or pass the directory through the packages parameter, as shown below:
jvm.start(packages="/my/packages/are/somewhere/else")
Often we also need to increase the maximum heap size of the JVM. For example, to reserve 512 MB:
jvm.start(max_heap_size="512m")
Finally, when we are done, we have to stop the JVM with the following code:
jvm.stop()
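Because the JVM has to be stopped even when something goes wrong in between, the start and stop calls above can be wrapped in a context manager. The following is only a minimal sketch: `weka_jvm` and its `jvm_module` parameter are hypothetical names introduced here (the parameter exists only so the helper can be tried out with a stand-in object), and the real `weka.core.jvm` module is imported lazily, so python-weka-wrapper3 is needed only when the default is used.

```python
from contextlib import contextmanager


@contextmanager
def weka_jvm(jvm_module=None, **start_kwargs):
    """Start the JVM, hand control to the caller, and always stop it again.

    jvm_module is a hypothetical hook for testing; when it is None the
    wrapper's own weka.core.jvm module is imported and used.
    """
    if jvm_module is None:
        # Requires python-weka-wrapper3 to be installed.
        import weka.core.jvm as jvm_module
    jvm_module.start(**start_kwargs)
    try:
        yield jvm_module
    finally:
        # Runs on normal exit and when an exception is raised.
        jvm_module.stop()


# Intended usage (not executed here, since it needs Weka and a JVM):
# with weka_jvm(max_heap_size="512m", packages=True):
#     ...  # all Weka calls go inside this block
```

As far as we know, javabridge cannot restart a JVM in the same Python process once it has been stopped, so it is safest to wrap the whole workload in a single block like this.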
Option Handling
Classes derived from OptionHandler (in the weka.core.classes module) allow getting and setting options via the options property. The two examples below both instantiate a J48 classifier: the first sets the options property, and the second uses the constructor shortcut.
# Example 1: set the options property after construction
from weka.classifiers import Classifier
cls = Classifier(classname="weka.classifiers.trees.J48")
cls.options = ["-C", "0.3"]
# Example 2: pass the options to the constructor
from weka.classifiers import Classifier
cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
The options property can also be used to read the currently set options:
from weka.classifiers import Classifier
cls = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
print(cls.options)
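Since options are passed as a flat list of strings, it can be convenient to keep them in a dictionary and flatten them just before handing them to the wrapper. The helper below is a hypothetical convenience function, not part of python-weka-wrapper; it only builds the list and does not check whether the flags are valid for a given Weka class.

```python
def to_weka_options(settings):
    """Flatten a {flag: value} mapping into a Weka-style option list.

    True         -> the bare flag is emitted (for example "-U")
    False / None -> the flag is skipped
    anything else -> the flag followed by str(value)
    """
    options = []
    for flag, value in settings.items():
        if value is True:
            options.append(flag)
        elif value is False or value is None:
            continue
        else:
            options.extend([flag, str(value)])
    return options


print(to_weka_options({"-C": 0.3, "-M": 2}))  # prints ['-C', '0.3', '-M', '2']
```

The resulting list could then be passed as the options argument shown above, for example `Classifier(classname="weka.classifiers.trees.J48", options=to_weka_options({"-C": 0.3}))`.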
Data Generators
Weka’s data generators can be used to generate artificial data, for example with the Agrawal classification generator:
from weka.datagenerators import DataGenerator
generator = DataGenerator(classname="weka.datagenerators.classifiers.classification.Agrawal", options=["-B", "-P", "0.05"])
DataGenerator.make_data(generator, ["-o", "/some/where/outputfile.arff"])
Loaders and Savers
To load and save datasets in different data formats, we can use the Loader and Saver classes. The following code loads an ARFF file and saves it as CSV:
from weka.core.converters import Loader, Saver
loader = Loader(classname="weka.core.converters.ArffLoader")
data = loader.load_file("/some/where/iris.arff")
print(data)
saver = Saver(classname="weka.core.converters.CSVSaver")
saver.save_file(data, "/some/where/iris.csv")
The weka.core.converters module also has convenience methods for loading and saving datasets, called load_any_file and save_any_file. These methods determine the loader and saver from the file extension:
import weka.core.converters as converters
data = converters.load_any_file("/some/where/iris.arff")
converters.save_any_file(data, "/some/where/else/iris.csv")
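The idea of choosing a converter from the file extension can be illustrated in plain Python. This is only a sketch of the principle, not how python-weka-wrapper implements it: the mapping below is an assumed, deliberately small subset of Weka's loader classes, and `guess_loader_classname` is a hypothetical helper.

```python
import os

# Assumed subset for illustration; Weka supports more formats than these.
LOADERS_BY_EXTENSION = {
    ".arff": "weka.core.converters.ArffLoader",
    ".csv": "weka.core.converters.CSVLoader",
    ".json": "weka.core.converters.JSONLoader",
    ".libsvm": "weka.core.converters.LibSVMLoader",
}


def guess_loader_classname(path):
    """Return the loader class name matching the file extension of path."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in LOADERS_BY_EXTENSION:
        raise ValueError("No loader known for extension: " + repr(extension))
    return LOADERS_BY_EXTENSION[extension]


print(guess_loader_classname("/some/where/iris.arff"))
```

The returned name could be passed to `Loader(classname=...)` when the format has to be chosen explicitly.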
Filters
The Filter class from the weka.filters module allows filtering datasets, for example removing the last attribute with the Remove filter:
from weka.filters import Filter
# data is the previously loaded dataset
remove = Filter(classname="weka.filters.unsupervised.attribute.Remove", options=["-R", "last"])
remove.inputformat(data)
filtered = remove.filter(data)
print(filtered)
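The -R option takes a Weka range string with 1-based positions, where "first" and "last" are keywords and several parts can be combined with commas (for example "first-3,last"). The sketch below is a simplified pure-Python reading of that syntax, written to show which attributes a given range selects; `parse_weka_range` is a hypothetical helper and does not cover everything Weka's own range parsing supports.

```python
def parse_weka_range(spec, num_attributes):
    """Return the sorted 0-based indices selected by a Weka-style range."""

    def position(token):
        # Translate one token into a 1-based position.
        token = token.strip().lower()
        if token == "first":
            return 1
        if token == "last":
            return num_attributes
        return int(token)

    selected = set()
    for part in spec.split(","):
        start, _, end = part.partition("-")
        low = position(start)
        high = position(end) if end else low
        if not 1 <= low <= high <= num_attributes:
            raise ValueError("Range part out of bounds: " + part)
        # Convert the inclusive 1-based range to 0-based indices.
        selected.update(range(low - 1, high))
    return sorted(selected)


removed = parse_weka_range("last", 5)
kept = [i for i in range(5) if i not in removed]
print(removed, kept)  # prints [4] [0, 1, 2, 3]
```

For a dataset with five attributes, "-R last" therefore removes the attribute at index 4 and keeps the first four.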
Classifiers
Let’s see an example that cross-validates the J48 classifier on a dataset with 10 folds and a random seed of 42, then outputs the summary and some specific statistics:
from weka.classifiers import Classifier, Evaluation
from weka.core.classes import Random
# data is the previously loaded dataset
data.class_is_last()
classifier = Classifier(classname="weka.classifiers.trees.J48", options=["-C", "0.3"])
evaluation = Evaluation(data)
evaluation.crossvalidate_model(classifier, data, 10, Random(42))
print(evaluation.summary())
print("pctCorrect: " + str(evaluation.percent_correct))
print("incorrect: " + str(evaluation.incorrect))
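To make the two arguments 10 and Random(42) more concrete, here is a plain-Python sketch of k-fold splitting and of the percent-correct statistic. It is an illustration of the general technique only: Weka uses Java's random number generator and its own fold construction, so the actual folds will differ from the ones produced here. Both function names are hypothetical.

```python
import random


def kfold_indices(num_instances, num_folds, seed):
    """Shuffle instance indices with a fixed seed and deal them into folds."""
    indices = list(range(num_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every num_folds-th shuffled index, starting at offset i.
    return [indices[i::num_folds] for i in range(num_folds)]


def percent_correct(actual, predicted):
    """Percentage of predictions that match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * correct / len(actual)


folds = kfold_indices(150, 10, 42)
print([len(fold) for fold in folds])
print(percent_correct(["a", "b", "a", "b"], ["a", "b", "b", "b"]))  # prints 75.0
```

In each of the 10 rounds, one fold serves as the test set and the remaining nine as training data, so every instance is tested exactly once.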
Clusterers
The following example builds a SimpleKMeans clusterer with three clusters on a previously loaded dataset that has no class attribute:
from weka.clusterers import Clusterer
# data is the previously loaded dataset
clusterer = Clusterer(classname="weka.clusterers.SimpleKMeans", options=["-N", "3"])
clusterer.build_clusterer(data)
print(clusterer)
Once the clusterer is built, it can be used to cluster Instance objects, as follows:
for inst in data:
    cl = clusterer.cluster_instance(inst)
    dist = clusterer.distribution_for_instance(inst)
    print("cluster=" + str(cl) + ", distribution=" + str(dist))
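Printing every instance gets noisy on larger datasets, so a summary of cluster sizes is often more useful. The sketch below assumes only that the clusterer object has the `cluster_instance` method used above; `cluster_sizes` and `most_likely_cluster` are hypothetical helper names, and the second one simply picks the index of the largest value in a membership distribution.

```python
from collections import Counter


def cluster_sizes(clusterer, data):
    """Count how many instances are assigned to each cluster index."""
    return Counter(int(clusterer.cluster_instance(inst)) for inst in data)


def most_likely_cluster(distribution):
    """Return the index of the highest value in a cluster distribution."""
    values = list(distribution)
    return max(range(len(values)), key=values.__getitem__)


print(most_likely_cluster([0.1, 0.7, 0.2]))  # prints 1
```

With a built clusterer, `cluster_sizes(clusterer, data)` returns a mapping from cluster index to instance count.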
Associators
Associators, such as Apriori, can be built and output like this:
from weka.associations import Associator
# data is the previously loaded dataset
associator = Associator(classname="weka.associations.Apriori", options=["-N", "9", "-I"])
associator.build_associations(data)
print(associator)
Serialization
Serialization makes it simple to save and restore objects. The code below serializes a trained classifier to a file, then loads it again from disk and outputs the model:
from weka.classifiers import Classifier
# classifier is a previously built classifier
classifier.serialize("/some/where/out.model")
...
classifier2, _ = Classifier.deserialize("/some/where/out.model")
print(classifier2)
Clusterers and filters offer the same serialize and deserialize methods. The serialization and deserialization functionality itself comes from the weka.core.serialization module:
- write(file, object)
- write_all(file, [obj1, obj2, ...])
- read(file)
- read_all(file)
Requirements
- Python 2.7 for the original python-weka-wrapper library (python-weka-wrapper3, used above, targets Python 3)
- javabridge version 1.0.14 or later (>= 1.0.14). The library uses javabridge for starting up, communicating with, and shutting down the Java Virtual Machine in which the Weka processes get executed.
- pygraphviz (optional)
- PIL (optional)
- matplotlib (optional)
- Oracle JDK 1.8 or later
- Weka 3.9.3
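A quick way to see which of these Python dependencies are present is to ask the import system without actually importing anything. The following is a small sketch for Python 3; `check_environment` is a hypothetical helper, the module names are the usual import names of the packages listed above, and the check covers neither the JDK nor the Weka version.

```python
import importlib.util
import sys

REQUIRED_MODULES = ["javabridge"]
OPTIONAL_MODULES = ["pygraphviz", "PIL", "matplotlib"]


def check_environment():
    """Report the Python version and which listed modules can be found."""
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in REQUIRED_MODULES + OPTIONAL_MODULES:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for name, status in check_environment().items():
    print(name, status)
```

A False entry for an optional module only means that the related plotting or graph features are unavailable.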
Conclusion – Weka Python
In this article, we covered the basics of Weka Python. We hope it helps you deepen your knowledge of machine learning techniques.