Getting started in Python

Every data type object in libscientific is stored on the heap and therefore supports dynamic memory allocation.

In Python, there is no need to allocate or deallocate vectors, matrices, tensors, or models manually, because the Python binding handles this automatically.
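
As an illustration of what automatic handling can look like, here is a minimal sketch of a wrapper that releases a native allocation when the Python object goes away. It is not libscientific's actual code: the `NativeBuffer` class, the `_allocate` and `_release` helpers, and the `live_handles` registry are hypothetical stand-ins for calls into a C library. Only `weakref.finalize` is a real standard-library feature.

```python
import weakref

# Hypothetical registry that stands in for memory owned by a C library.
live_handles = set()
_next_handle = 0


def _allocate():
    """Pretend to allocate a native object and return its handle."""
    global _next_handle
    _next_handle += 1
    live_handles.add(_next_handle)
    return _next_handle


def _release(handle):
    """Pretend to free the native object behind the handle."""
    live_handles.discard(handle)


class NativeBuffer:
    """Hypothetical wrapper that owns one native allocation."""

    def __init__(self):
        self.handle = _allocate()
        # Run _release(handle) when this object is garbage collected.
        self._finalizer = weakref.finalize(self, _release, self.handle)

    def close(self):
        """Release the allocation early; calling it twice is harmless."""
        self._finalizer()


buf = NativeBuffer()
handle = buf.handle
print(handle in live_handles)  # True while the allocation is live
buf.close()
print(handle in live_handles)  # False once it has been released
```

The finalizer runs at most once, so an explicit `close()` followed by garbage collection does not free the same allocation twice.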

Use libscientific in Python

First, you need to install the C library and the Python package. Please follow the process described here.

A program that uses libscientific must import the Python binding as follows:

import libscientific
...

Vector operations

Create a vector in Python

There are four different types of vectors:

Double vector: dvector
Integer vector: ivector
Unsigned integer vector: uivector
String vector: strvector

The example below shows how to create a double vector, read and modify one of its values, and convert it back to a list.

#!/usr/bin/env python3
import libscientific
from random import random

# Create a list of values that you want to convert to a double vector
a = [random() for j in range(5)]

# Transform the list a into a double vector d
d = libscientific.vector.DVector(a)

# Print the content of vector d to the screen
d.debug()

# If you want to catch the value in position 1
print(d[1])

# If you want to modify the value in position 1
d[1] = -2

# If you want to get the result back as a list
dlst = d.tolist()

for item in dlst:
    print(item)

Append a value to a given vector

The example below shows how to append a value to a vector and how to extend a vector with the values of a list.

#!/usr/bin/env python3
import libscientific
from random import random

# Create a list of values that you want to convert to a double vector
a = [random() for j in range(5)]
d = libscientific.vector.DVector(a)
# print the output of the double vector d
print("orig vector")
d.debug()


# append the value 0.98765 at the end of d
d.append(0.98765)
print("append 0.98765 at the end")
d.debug()

# extend the vector d with the values from a list
d.extend([0.4362, 0.34529, 0.99862])
print("extend the vector with 3 more values")
d.debug()

Matrix operations

Matrix is a user-defined data type that contains the following information:

the number of rows
the number of columns
the 2D data array which defines the matrix

The data array in Python uses the C language implementation. However, memory allocation and destruction are carried out directly by the Python class, so there is no need to free the memory manually.

Create a matrix in Python

In this example, we show how to create a matrix from a list of lists (or a NumPy array), modify its content, and convert it back to a list of lists.

#!/usr/bin/env python3
import libscientific
from random import random

# Create a random list of lists (10 rows, 2 columns)
a = [[random() for j in range(2)] for i in range(10)]

# Convert the list of lists into a libscientific matrix
m = libscientific.matrix.Matrix(a)

# Get the value at row 1, column 1
print("Get value example")
print(m[1, 1])

# Modify the value at row 1, column 1
print("Set value example")
m[1, 1] = -2.
m.debug()


# Convert the matrix back to a list of lists
mlst = m.tolist()
for row in mlst:
    print(row)
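
Since a matrix is defined by its number of rows, its number of columns, and its 2D data array, it can help to check that a list of lists is rectangular before converting it. The helper below is a hypothetical convenience written in plain Python; `matrix_shape` is not part of libscientific, and whether the library performs a similar check itself is not stated here.

```python
def matrix_shape(rows):
    """Return (n_rows, n_cols) for a rectangular list of lists."""
    if not rows:
        return (0, 0)
    n_cols = len(rows[0])
    for index, row in enumerate(rows):
        # Every row must have as many values as the first one.
        if len(row) != n_cols:
            raise ValueError(
                "row %d has %d values, expected %d" % (index, len(row), n_cols)
            )
    return (len(rows), n_cols)


a = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(matrix_shape(a))  # (3, 2)
```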

Tensor operations

Tensor is a user-defined data type that contains:

order: the number of matrices
m: the array of 2D data arrays, which defines the tensor itself

The data array in Python uses the C language implementation. However, memory allocation and destruction are carried out directly by the Python class, so there is no need to free the memory manually.

Create a tensor in Python

In this example, we show how to create a tensor from a list of lists of lists (or a NumPy array), modify its content, and convert it back to a list of lists of lists.

#!/usr/bin/env python3
import libscientific
from random import random

# Create a random list of lists of lists (3 blocks of 10 rows and 2 columns)
a = [[[random() for j in range(2)] for i in range(10)] for k in range(3)]

# Convert the list of list of lists into a libscientific tensor
t = libscientific.tensor.Tensor(a)

# Get the value at block 1, row 1, column 1
print("Get value example")
print(t[1, 1, 1])

# Modify the value at block 1, row 1, column 1
print("Set value example")
t[1, 1, 1] = -2.
t.debug()


# Convert the tensor back to a list of lists of lists
tlst = t.tolist()
for i, block in enumerate(tlst, start=1):
    print("Block %d" % i)
    for row in block:
        print(row)

Multivariate analysis algorithms

In this section, you will find examples of running multivariate analysis algorithms. In particular, the algorithms described here are taken from official libscientific publications and adapted to run with multithreading to speed up the calculation.

PCA and PLS implement the NIPALS algorithm described in the following publication:

P. Geladi, B.R. Kowalski
Partial least-squares regression: a tutorial
Analytica Chimica Acta Volume 185, 1986, Pages 1–17
DOI:10.1016/0003-2670(86)80028-9
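
To give an idea of the iteration behind NIPALS, here is a minimal pure-Python sketch that extracts principal components one at a time from a mean-centered matrix. It follows the textbook scheme (loadings from scores, normalisation, scores from loadings, deflation) and is not libscientific's C implementation: the function names `column_center`, `nipals_component`, and `deflate` are hypothetical, and scaling, missing values, and multithreading are left out.

```python
import math


def column_center(x):
    """Subtract the column mean from every value of a list-of-lists matrix."""
    n_rows, n_cols = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in x]


def nipals_component(x, max_iter=500, tol=1e-12):
    """Extract one principal component (scores t, loadings p) from x."""
    n_rows, n_cols = len(x), len(x[0])
    # Start from the column with the largest sum of squares.
    start = max(range(n_cols), key=lambda j: sum(row[j] ** 2 for row in x))
    t = [row[start] for row in x]
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Loadings: p = X't / t't, then normalise p to unit length.
        p = [sum(x[i][j] * t[i] for i in range(n_rows)) / tt
             for j in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: t = Xp (p has unit length, so no division is needed).
        t_new = [sum(x[i][j] * p[j] for j in range(n_cols))
                 for i in range(n_rows)]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        # Stop when the scores no longer change between iterations.
        if change < tol * sum(v * v for v in t):
            break
    return t, p


def deflate(x, t, p):
    """Remove the component t p' from x and return the residual matrix."""
    return [[x[i][j] - t[i] * p[j] for j in range(len(p))]
            for i in range(len(t))]


x = column_center([[2.0, 1.0, 0.5], [4.0, 2.5, 1.0],
                   [6.0, 2.9, 1.4], [8.0, 4.1, 2.2]])
t1, p1 = nipals_component(x)
residual = deflate(x, t1, p1)
t2, p2 = nipals_component(residual)
print("First loadings:", p1)
print("Second loadings:", p2)
```

Each further component is obtained by running the same iteration on the residual matrix, which is why the loadings of successive components come out orthogonal to each other.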

CPCA implements the NIPALS algorithm described in the following publication:

Johan A. Westerhuis, Theodora Kourti, John F. MacGregor
Analysis of multiblock and hierarchical PCA and PLS models
Journal of Chemometrics Volume 12, 1998, Pages 301–321
DOI:10.1002/(SICI)1099-128X(199809/10)12:5<301::AID-CEM515>3.0.CO;2-S

Principal Component Analysis (PCA)

Here is an example that shows how to compute a principal component analysis on a matrix.

#!/usr/bin/env python3

import libscientific
import random

def mx_to_video(m, decimals=5):
    for row in m:
        print("\t".join([str(round(x, decimals)) for x in row]))

random.seed(123456)

# Create a random matrix of 10 objects and 4 features
a = [[random.random() for j in range(4)] for i in range(10)]
print("Original Matrix")
mx_to_video(a)

# Compute 2 Principal components using the UV scaling (unit variance scaling)
model = libscientific.pca.PCA(scaling=1, npc=2)
# Fit the model
model.fit(a)

# Show the scores
print("Showing the PCA scores")
scores = model.get_scores()
mx_to_video(scores, 3)

# Show the loadings
print("Showing the PCA loadings")
loadings = model.get_loadings()
mx_to_video(loadings, 3)

# Show the explained variance
print(model.get_exp_variance())

# Project new data into the PCA model
print("Predict/Project new data into the PCA model")
p_scores = model.predict(a)
mx_to_video(p_scores)

# Reconstruct the original PCA matrix from the 2 principal components
print("Reconstruct the original PCA matrix using the PCA Model")
ra = model.reconstruct_original_matrix()
mx_to_video(ra)

# Save model
model.save("mymodel.sqlite3")

# Load model
model2 = libscientific.pca.PCA()
model2.load("mymodel.sqlite3")
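
The PCA example above passes `scaling=1`, which its comment describes as unit variance scaling. The sketch below shows what that preprocessing usually means: each column is centered and divided by its standard deviation. The `uv_scale` helper is hypothetical and written in plain Python; it assumes the sample standard deviation (n - 1 denominator), which may differ from what libscientific does internally.

```python
import statistics


def uv_scale(x):
    """Center each column and divide it by its sample standard deviation."""
    columns = list(zip(*x))
    means = [statistics.mean(col) for col in columns]
    # Assumption: sample standard deviation (n - 1 denominator).
    sdevs = [statistics.stdev(col) for col in columns]
    return [
        [(value - means[j]) / sdevs[j] for j, value in enumerate(row)]
        for row in x
    ]


x = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0], [4.0, 40.0]]
scaled = uv_scale(x)
for row in scaled:
    print("\t".join(str(round(value, 3)) for value in row))
```

After scaling, every column has mean 0 and standard deviation 1, so features measured on different scales contribute equally to the model. A constant column has zero standard deviation and would raise a `ZeroDivisionError` in this sketch.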

Consensus Principal Component Analysis (CPCA)

Here is an example that shows how to compute a consensus principal component analysis on a tensor.

#!/usr/bin/env python3

import libscientific
import random

def mx_to_video(m, decimals=5):
    for row in m:
        print("\t".join([str(round(x, decimals)) for x in row]))

def t_to_video(t):
    i = 1
    for m in t:
        print("Block: %d" % (i))
        mx_to_video(m, 3)
        i+=1

random.seed(123456)

# Create a random tensor of 4 blocks, each with 10 objects and 4 features
a = [[[random.random() for j in range(4)] for i in range(10)] for k in range(4)]

print("Original Tensor")
t_to_video(a)

# Compute 2 Principal components using the UV scaling (unit variance scaling)
model = libscientific.cpca.CPCA(scaling=1, npc=2)
# Fit the model
model.fit(a)

# Show the super scores
print("Showing the CPCA super scores")
sscores = model.get_super_scores()
mx_to_video(sscores, 3)

# Show the super weights
print("Showing the CPCA super weights")
sweights = model.get_super_weights()
mx_to_video(sweights, 3)

# Show the block scores
print("Showing the CPCA block scores")
block_scores = model.get_block_scores()
t_to_video(block_scores)

# Show the block loadings
print("Showing the CPCA block loadings")
block_loadings = model.get_block_loadings()
t_to_video(block_loadings)

# Show the total variance explained by the super scores
print("Showing the CPCA total variance explained")
print(model.get_total_exp_variance())

# Predict/Project new data into the model
print("Project/Predict new data into the CPCA model")
p_ss, p_bs = model.predict(a)
print("Showing the predicted super scores")
mx_to_video(p_ss, 3)
print("Showing the predicted block scores")
t_to_video(p_bs)

# Save model
model.save("mymodel.sqlite3")

# Load model
model2 = libscientific.cpca.CPCA()
model2.load("mymodel.sqlite3")

Partial Least Squares (PLS)

To calculate a PLS model, a matrix of features (independent variables) and a matrix of targets (dependent variables) are required.

Here is a simple example that shows how to calculate a PLS model.

#!/usr/bin/env python3

import libscientific
import random

def mx_to_video(m, decimals=5):
    for row in m:
        print("\t".join([str(round(x, decimals)) for x in row]))

random.seed(123456)
x = [[random.random() for j in range(4)] for i in range(10)]
y = [[random.random() for j in range(1)] for i in range(10)]
xp = [[random.random() for j in range(4)] for i in range(10)]

print("Original Matrix")
print("X")
mx_to_video(x)
print("Y")
mx_to_video(y)
print("XP")
mx_to_video(xp)
print("Computing PLS ...")
model = libscientific.pls.PLS(nlv=2, xscaling=1, yscaling=0)
model.fit(x, y)
print("Showing the PLS T scores")
tscores = model.get_tscores()
mx_to_video(tscores, 3)

print("Showing the PLS U scores")
uscores = model.get_uscores()
mx_to_video(uscores, 3)

print("Showing the PLS P loadings")
ploadings = model.get_ploadings()
mx_to_video(ploadings, 3)

print("Showing the X Variance")
print(model.get_exp_variance())


print("Predict XP")
py, pscores = model.predict(xp)
print("Predicted Y for all LVs")
mx_to_video(py, 3)
print("Predicted Scores")
mx_to_video(pscores, 3)

# Save model
model.save("mymodel.sqlite3")

# Load model
model2 = libscientific.pls.PLS()
model2.load("mymodel.sqlite3")
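
To show how the X and Y matrices interact in a PLS model, here is a minimal pure-Python sketch of the first latent variable for a single target column, in the style of NIPALS. It assumes that both `x` and `y` are already mean-centered. The `pls1_first_lv` function is hypothetical and is not libscientific's implementation. With one target column the weights are obtained in a single pass, so no iteration is needed.

```python
import math


def pls1_first_lv(x, y):
    """Return weights w, scores t, loadings p and inner coefficient b
    for the first latent variable of centered x (matrix) and y (vector)."""
    n_rows, n_cols = len(x), len(x[0])
    # Weights: w = X'y, normalised to unit length.
    w = [sum(x[i][j] * y[i] for i in range(n_rows)) for j in range(n_cols)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: t = Xw.
    t = [sum(x[i][j] * w[j] for j in range(n_cols)) for i in range(n_rows)]
    tt = sum(v * v for v in t)
    # X loadings: p = X't / t't.
    p = [sum(x[i][j] * t[i] for i in range(n_rows)) / tt for j in range(n_cols)]
    # Inner relation: b = y't / t't, so that y is approximated by b * t.
    b = sum(y[i] * t[i] for i in range(n_rows)) / tt
    return w, t, p, b


# Toy data whose columns already have zero mean.
x = [[-1.0, -0.5], [0.0, 0.5], [1.0, 0.0]]
y = [-2.0, 0.5, 1.5]

w, t, p, b = pls1_first_lv(x, y)

# Predict y for a new (centered) sample from its score on the first LV.
x_new = [0.5, 0.25]
t_new = sum(x_new[j] * w[j] for j in range(len(w)))
print("Predicted y:", round(b * t_new, 3))
```

Further latent variables would be computed the same way after subtracting `t p'` from `x` and `b * t` from `y`.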