Getting started in Python

Every libscientific data object is stored on the heap and therefore supports dynamic memory allocation.

In Python, there is no need to allocate or deallocate matrices, vectors, tensors, or models manually, because the Python binding handles this automatically.
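To illustrate what "handled automatically" can mean in practice, here is a minimal sketch of a wrapper class that acquires a resource when it is created and releases it when the Python object is destroyed. `FakeHeap` and `OwnedVector` are hypothetical names invented for this illustration. They are not part of libscientific, and the real binding may manage its memory differently.

```python
import gc


class FakeHeap:
    """Hypothetical stand-in for a C allocator; it only tracks live blocks."""

    def __init__(self):
        self.live = {}   # handle -> stored values
        self._next = 1   # next handle to hand out

    def alloc(self, values):
        handle = self._next
        self._next += 1
        self.live[handle] = [float(v) for v in values]
        return handle

    def free(self, handle):
        self.live.pop(handle, None)


class OwnedVector:
    """Hypothetical wrapper: allocates on creation, frees on destruction."""

    def __init__(self, heap, values):
        self._heap = heap
        self._handle = heap.alloc(values)

    def __getitem__(self, index):
        return self._heap.live[self._handle][index]

    def __setitem__(self, index, value):
        self._heap.live[self._handle][index] = float(value)

    def __del__(self):
        # Release the underlying block when the wrapper is garbage collected.
        self._heap.free(self._handle)


heap = FakeHeap()
v = OwnedVector(heap, [1, 2, 3])
v[1] = -2
blocks_while_alive = len(heap.live)
second_value = v[1]

del v          # drop the only reference to the wrapper
gc.collect()   # make collection explicit for interpreters without refcounting
blocks_after_delete = len(heap.live)
```

The caller never calls `free` directly: the block disappears from the fake heap once the wrapper goes away, which is the behaviour the paragraph above describes from the user's point of view.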

Use libscientific in Python

First, you need to install the C library and the Python package. Please follow the process described here.

A program that uses libscientific needs to import the Python binding as follows:

import libscientific
...

Vector operations

Create a vector in Python

There are four different types of vectors:

  • Double vector: dvector

  • Integer vector: ivector

  • Unsigned integer vector: uivector

  • String vector: strvector

Here is an example showing how to create a double vector, read and modify one of its values, and convert it back to a list.

#!/usr/bin/env python3
import libscientific
from random import random

# Create a list of values that you want to convert to a double vector
a = [random() for j in range(5)]

# Transform the list a into a double vector d
d = libscientific.vector.DVector(a)

# Print the content of vector d to the screen
d.debug()

# Read the value at position 1
print(d[1])

# Modify the value at position 1
d[1] = -2

# Get the result back as a list
dlst = d.tolist()

for item in dlst:
    print(item)

Append a value to a given vector

Here is an example showing how to append a single value to a vector and how to extend it with several values at once.

#!/usr/bin/env python3
import libscientific
from random import random

# Create a list of values that you want to convert to a double vector
a = [random() for j in range(5)]
d = libscientific.vector.DVector(a)

# Print the content of the double vector d
print("orig vector")
d.debug()

# Append the value 0.98765 at the end of d
d.append(0.98765)
print("append 0.98765 at the end")
d.debug()

# Extend the vector d with more values taken from a list
d.extend([0.4362, 0.34529, 0.99862])
print("extend the vector with 3 more values")
d.debug()

Matrix operations

A matrix is a user-defined data type that stores:

  • the number of rows

  • the number of columns

  • the 2D data array that defines the matrix

The data array in Python uses the C language implementation. However, memory allocation and destruction are carried out directly by the Python class, so there is no need to free the memory manually.
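As a rough picture of the data type described above, the following sketch mirrors the row count, the column count, and the 2D data array in a plain Python dataclass. `MatrixSketch`, its field names, and `from_rows` are hypothetical and exist only for this illustration; the actual libscientific matrix lives in C and is accessed through `libscientific.matrix.Matrix`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MatrixSketch:
    """Hypothetical mirror of the fields described above (not the real type)."""

    row: int                   # number of rows
    col: int                   # number of columns
    data: List[List[float]]    # 2D data array, stored row by row

    @classmethod
    def from_rows(cls, rows):
        """Build the structure from a list of lists, rejecting ragged input."""
        ncol = len(rows[0]) if rows else 0
        if any(len(r) != ncol for r in rows):
            raise ValueError("all rows must have the same number of columns")
        return cls(len(rows), ncol, [[float(x) for x in r] for r in rows])


m = MatrixSketch.from_rows([[1, 2], [3, 4], [5, 6]])
shape = (m.row, m.col)
last_value = m.data[2][1]
```

Checking that every row has the same length before conversion is a design choice of this sketch: a rectangular layout is what makes a single (rows, columns) pair sufficient to describe the data.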

Create a matrix in Python

In this example, we show how to create a matrix from a list of lists (or NumPy array), modify its content, and convert it back to a list of lists.

#!/usr/bin/env python3
import libscientific
from random import random

# Create a random list of lists (10 rows, 2 columns)
a = [[random() for j in range(2)] for i in range(10)]

# Convert the list of lists into a libscientific matrix
m = libscientific.matrix.Matrix(a)

# Get the value at row 1, column 1
print("Get value example")
print(m[1, 1])

# Modify the value at row 1, column 1
print("Set value example")
m[1, 1] = -2.
m.debug()

# Convert the matrix back to a list of lists
mlst = m.tolist()
for row in mlst:
    print(row)

Tensor operations

A tensor is a user-defined data type that contains:

  • order: the number of matrices

  • m: the array of 2D data arrays (matrices) that defines the tensor itself

The data array in Python uses the C language implementation. However, memory allocation and destruction are carried out directly by the Python class, so there is no need to free the memory manually.

Create a tensor in Python

In this example, we show how to create a tensor from a list of lists of lists (or NumPy array), modify its content, and convert it back to a list of lists of lists.

#!/usr/bin/env python3
import libscientific
from random import random

# Create a random list of lists of lists (3 blocks, 10 rows, 2 columns)
a = [[[random() for j in range(2)] for i in range(10)] for k in range(3)]

# Convert the list of lists of lists into a libscientific tensor
t = libscientific.tensor.Tensor(a)

# Get the value at block 1, row 1, column 1
print("Get value example")
print(t[1, 1, 1])

# Modify the value at block 1, row 1, column 1
print("Set value example")
t[1, 1, 1] = -2.
t.debug()

# Convert the tensor back to a list of lists of lists
tlst = t.tolist()
for i, block in enumerate(tlst, start=1):
    print("Block %d" % (i))
    for row in block:
        print(row)

Multivariate analysis algorithms

In this section, you will find examples of running multivariate analysis algorithms. The algorithms described here are taken from the publications cited below and are adapted to run with multithreading to speed up the calculation.

  • PCA and PLS implement the NIPALS algorithm described in the following publication:

P. Geladi, B.R. Kowalski
Partial least-squares regression: a tutorial
Analytica Chimica Acta 185 (1986) 1–17
DOI:10.1016/0003-2670(86)80028-9

  • CPCA implements the NIPALS algorithm described in the following publication:

J.A. Westerhuis, T. Kourti, J.F. MacGregor
Analysis of multiblock and hierarchical PCA and PLS models
Journal of Chemometrics 12 (1998) 301–321
DOI:10.1002/(SICI)1099-128X(199809/10)12:5<301::AID-CEM515>3.0.CO;2-S
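To give a feel for the NIPALS idea in the PCA case, here is a minimal pure-Python sketch that extracts one component at a time by alternating between loadings and scores and then deflating the data. It is not libscientific's implementation: the function names, the start vector, the tolerance, and the sample data are all assumptions made for this illustration, and it runs single-threaded.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def center_columns(x):
    """Subtract the column mean from every column of x (a list of rows)."""
    nrow, ncol = len(x), len(x[0])
    means = [sum(row[j] for row in x) / nrow for j in range(ncol)]
    return [[row[j] - means[j] for j in range(ncol)] for row in x]


def nipals_pca(x, npc, tol=1e-18, max_iter=500):
    """Return (scores, loadings), one list per component, for centered x."""
    nrow, ncol = len(x), len(x[0])
    e = [row[:] for row in x]  # residual matrix, deflated after each component
    scores, loadings = [], []
    for _ in range(npc):
        # Start from the residual column with the largest sum of squares.
        start = max(range(ncol), key=lambda j: sum(row[j] ** 2 for row in e))
        t = [row[start] for row in e]
        for _ in range(max_iter):
            tt = dot(t, t)
            # Loadings: p = E^T t / (t^T t), then scaled to unit length.
            p = [sum(e[i][j] * t[i] for i in range(nrow)) / tt
                 for j in range(ncol)]
            p_norm = dot(p, p) ** 0.5
            p = [v / p_norm for v in p]
            # Scores: t = E p (p has unit length, so no division is needed).
            t_new = [dot(row, p) for row in e]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * dot(t, t):
                break
        # Deflation: remove the extracted component, E = E - t p^T.
        for i in range(nrow):
            for j in range(ncol):
                e[i][j] -= t[i] * p[j]
        scores.append(t)
        loadings.append(p)
    return scores, loadings


data = [
    [2.0, 1.0, 0.5],
    [0.5, 2.5, 1.0],
    [1.5, 0.0, 2.0],
    [3.0, 1.5, 0.0],
    [1.0, 3.0, 1.5],
]
xc = center_columns(data)
scores, loadings = nipals_pca(xc, npc=2)

# Fraction of the total sum of squares captured by each component.
total_ss = sum(v * v for row in xc for v in row)
explained = [dot(t, t) / total_ss for t in scores]
print(explained)
```

Each component is computed on the residual left by the previous one, which is why the resulting score vectors are mutually orthogonal and the explained fractions can be added up.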

Principal Component Analysis (PCA)

Here is an example that shows how to compute a principal component analysis on a matrix.

#!/usr/bin/env python3

import libscientific
import random

def mx_to_video(m, decimals=5):
    # Print a matrix (list of lists) row by row, tab separated
    for row in m:
        print("\t".join([str(round(x, decimals)) for x in row]))

random.seed(123456)

# Create a random matrix of 10 objects and 4 features
a = [[random.random() for j in range(4)] for i in range(10)]
print("Original Matrix")
mx_to_video(a)

# Compute 2 principal components using UV scaling (unit variance scaling)
model = libscientific.pca.PCA(scaling=1, npc=2)
# Fit the model
model.fit(a)

# Show the scores
print("Showing the PCA scores")
scores = model.get_scores()
mx_to_video(scores, 3)

# Show the loadings
print("Showing the PCA loadings")
loadings = model.get_loadings()
mx_to_video(loadings, 3)

# Show the explained variance
print(model.get_exp_variance())

# Project new data into the PCA model
print("Predict/Project new data into the PCA model")
p_scores = model.predict(a)
mx_to_video(p_scores)

# Reconstruct the original matrix from the 2 principal components
print("Reconstruct the original PCA matrix using the PCA Model")
ra = model.reconstruct_original_matrix()
mx_to_video(ra)

# Save model
model.save("mymodel.sqlite3")

# Load model (the constructor arguments repeat those used above)
model2 = libscientific.pca.PCA(scaling=1, npc=2)
model2.load("mymodel.sqlite3")
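The example above passes `scaling=1`, which its comment describes as UV (unit variance) scaling. As a rough illustration of what that term usually means, the sketch below centers each column and divides it by its standard deviation. The helper names `autoscale` and `apply_scaling` are hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption; libscientific may compute it differently.

```python
def autoscale(x):
    """Center each column of x and scale it to unit sample variance.

    Returns the scaled rows together with the column means and standard
    deviations, so the same transformation can be applied to new data.
    A constant column (zero standard deviation) is not handled here.
    """
    nrow, ncol = len(x), len(x[0])
    means = [sum(row[j] for row in x) / nrow for j in range(ncol)]
    stdevs = [
        (sum((row[j] - means[j]) ** 2 for row in x) / (nrow - 1)) ** 0.5
        for j in range(ncol)
    ]
    scaled = [[(row[j] - means[j]) / stdevs[j] for j in range(ncol)]
              for row in x]
    return scaled, means, stdevs


def apply_scaling(x_new, means, stdevs):
    """Scale new rows with the means and deviations of the training data."""
    return [[(row[j] - means[j]) / stdevs[j] for j in range(len(means))]
            for row in x_new]


data = [
    [2.0, 1.0, 0.5],
    [0.5, 2.5, 1.0],
    [1.5, 0.0, 2.0],
    [3.0, 1.5, 0.0],
    [1.0, 3.0, 1.5],
]
scaled, means, stdevs = autoscale(data)

# A new row equal to the training means maps onto the origin.
new_scaled = apply_scaling([means], means, stdevs)
```

Keeping the training means and deviations matters when projecting new objects: they have to be scaled with the parameters of the data the model was fitted on, not with their own.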

Consensus Principal Component Analysis (CPCA)

Here is an example that shows how to compute a consensus principal component analysis on a tensor.

#!/usr/bin/env python3

import libscientific
import random

def mx_to_video(m, decimals=5):
    # Print a matrix (list of lists) row by row, tab separated
    for row in m:
        print("\t".join([str(round(x, decimals)) for x in row]))

def t_to_video(t):
    # Print every block (matrix) of a tensor
    for i, m in enumerate(t, start=1):
        print("Block: %d" % (i))
        mx_to_video(m, 3)

random.seed(123456)

# Create a random tensor of 4 blocks, each with 10 objects and 4 features
a = [[[random.random() for j in range(4)] for i in range(10)] for k in range(4)]

print("Original Tensor")
t_to_video(a)

# Compute 2 principal components using UV scaling (unit variance scaling)
model = libscientific.cpca.CPCA(scaling=1, npc=2)
# Fit the model
model.fit(a)

# Show the super scores
print("Showing the CPCA super scores")
sscores = model.get_super_scores()
mx_to_video(sscores, 3)

# Show the super weights
print("Showing the CPCA super weights")
sweights = model.get_super_weights()
mx_to_video(sweights, 3)

# Show the block scores
print("Showing the CPCA block scores")
block_scores = model.get_block_scores()
t_to_video(block_scores)

# Show the block loadings
print("Showing the CPCA block loadings")
block_loadings = model.get_block_loadings()
t_to_video(block_loadings)

# Show the total variance explained by the super scores
print("Showing the CPCA total variance explained")
print(model.get_total_exp_variance())

# Predict/Project new data into the model
print("Project/Predict new data into the CPCA model")
p_ss, p_bs = model.predict(a)
print("Showing the predicted super scores")
mx_to_video(p_ss, 3)
print("Showing the predicted block scores")
t_to_video(p_bs)

# Save model
model.save("mymodel.sqlite3")

# Load model (the constructor arguments repeat those used above)
model2 = libscientific.cpca.CPCA(scaling=1, npc=2)
model2.load("mymodel.sqlite3")
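The example above returns super scores, super weights, block scores, and block loadings. To show how these four quantities relate to each other, here is a pure-Python sketch of the first component of a consensus PCA iteration in the spirit of the Westerhuis et al. paper cited earlier. It is an illustration under assumptions, not libscientific's code: the function names, start vector, tolerance, and data are invented here, block scaling is omitted, and only one component is computed (further components would require deflating each block with the super score).

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def center_columns(x):
    """Subtract the column mean from every column of x (a list of rows)."""
    nrow, ncol = len(x), len(x[0])
    means = [sum(row[j] for row in x) / nrow for j in range(ncol)]
    return [[row[j] - means[j] for j in range(ncol)] for row in x]


def cpca_first_component(blocks, tol=1e-18, max_iter=1000):
    """Return (super_scores, super_weights, block_scores, block_loadings)."""
    nrow = len(blocks[0])
    # Assumed start: the first column of the first block.
    t_sup = [row[0] for row in blocks[0]]
    for _ in range(max_iter):
        tt = dot(t_sup, t_sup)
        block_scores, block_loadings = [], []
        for xb in blocks:
            ncol = len(xb[0])
            # Block loadings: p_b = X_b^T t / (t^T t), scaled to unit length.
            p = [sum(xb[i][j] * t_sup[i] for i in range(nrow)) / tt
                 for j in range(ncol)]
            p_norm = dot(p, p) ** 0.5
            p = [v / p_norm for v in p]
            block_loadings.append(p)
            # Block scores: t_b = X_b p_b.
            block_scores.append([dot(row, p) for row in xb])
        # Super weights: regress the block scores on the super score.
        w = [dot(t_b, t_sup) / tt for t_b in block_scores]
        w_norm = dot(w, w) ** 0.5
        w = [v / w_norm for v in w]
        # New super score: weighted combination of the block scores.
        t_new = [sum(w[b] * block_scores[b][i] for b in range(len(blocks)))
                 for i in range(nrow)]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t_sup))
        t_sup = t_new
        if change <= tol * dot(t_sup, t_sup):
            break
    return t_sup, w, block_scores, block_loadings


# Two blocks that describe the same five objects with different features.
block_a = center_columns(
    [[2.0, 1.0], [0.5, 2.5], [1.5, 0.0], [3.0, 1.5], [1.0, 3.0]])
block_b = center_columns(
    [[0.5, 1.0], [1.0, 0.0], [2.0, 2.5], [0.0, 0.5], [1.5, 1.0]])

t_sup, w_sup, b_scores, b_loadings = cpca_first_component([block_a, block_b])
print(w_sup)
```

In this formulation the super score is a consensus direction shared by all blocks, while the super weights indicate how strongly each block contributes to it.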

Partial Least Squares (PLS)

A matrix of features (independent variables) and a matrix of targets (dependent variables) are required to calculate a PLS model.

Here is a simple example that shows how to calculate a PLS model.

#!/usr/bin/env python3

import libscientific
import random

def mx_to_video(m, decimals=5):
    # Print a matrix (list of lists) row by row, tab separated
    for row in m:
        print("\t".join([str(round(x, decimals)) for x in row]))

random.seed(123456)
# x: features, y: targets, xp: new objects to predict
x = [[random.random() for j in range(4)] for i in range(10)]
y = [[random.random() for j in range(1)] for i in range(10)]
xp = [[random.random() for j in range(4)] for i in range(10)]

print("Original Matrix")
print("X")
mx_to_video(x)
print("Y")
mx_to_video(y)
print("XP")
mx_to_video(xp)

# Compute a PLS model with 2 latent variables
print("Computing PLS ...")
model = libscientific.pls.PLS(nlv=2, xscaling=1, yscaling=0)
model.fit(x, y)

print("Showing the PLS T scores")
tscores = model.get_tscores()
mx_to_video(tscores, 3)

print("Showing the PLS U scores")
uscores = model.get_uscores()
mx_to_video(uscores, 3)

print("Showing the PLS P loadings")
ploadings = model.get_ploadings()
mx_to_video(ploadings, 3)

print("Showing the X Variance")
print(model.get_exp_variance())

# Predict the targets and the scores for the new objects
print("Predict XP")
py, pscores = model.predict(xp)
print("Predicted Y for all LVs")
mx_to_video(py, 3)
print("Predicted Scores")
mx_to_video(pscores, 3)

# Save model
model.save("mymodel.sqlite3")

# Load model (the constructor arguments repeat those used above)
model2 = libscientific.pls.PLS(nlv=2, xscaling=1, yscaling=0)
model2.load("mymodel.sqlite3")
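To show where the T scores and P loadings printed above come from, here is a minimal pure-Python sketch of a NIPALS-style PLS fit for a single target column (often called PLS1). It is a simplified illustration, not libscientific's implementation: the function names and data are assumptions, and normalization conventions may differ from the Geladi and Kowalski tutorial. With one target column no inner iteration is needed, because the target itself plays the role of the U score.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def pls1_nipals(x, y, nlv):
    """Fit nlv latent variables on centered x (rows) and centered y (list).

    Returns T scores, weights, P loadings, q coefficients, and the residual
    sum of squares of y before the fit and after each latent variable.
    """
    nrow, ncol = len(x), len(x[0])
    e = [row[:] for row in x]   # X residuals
    f = list(y)                 # y residuals
    t_scores, weights, p_loadings, q_coeffs = [], [], [], []
    rss = [dot(f, f)]
    for _ in range(nlv):
        # Weights: direction in X with maximal covariance with y.
        w = [sum(e[i][j] * f[i] for i in range(nrow)) for j in range(ncol)]
        w_norm = dot(w, w) ** 0.5
        w = [v / w_norm for v in w]
        # Scores, loadings, and the regression coefficient of y on t.
        t = [dot(row, w) for row in e]
        tt = dot(t, t)
        p = [sum(e[i][j] * t[i] for i in range(nrow)) / tt
             for j in range(ncol)]
        q = dot(f, t) / tt
        # Deflate both X and y before the next latent variable.
        for i in range(nrow):
            for j in range(ncol):
                e[i][j] -= t[i] * p[j]
            f[i] -= q * t[i]
        t_scores.append(t)
        weights.append(w)
        p_loadings.append(p)
        q_coeffs.append(q)
        rss.append(dot(f, f))
    return t_scores, weights, p_loadings, q_coeffs, rss


def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


raw_x = [
    [2.0, 1.0, 0.5],
    [0.5, 2.5, 1.0],
    [1.5, 0.0, 2.0],
    [3.0, 1.5, 0.0],
    [1.0, 3.0, 1.5],
]
raw_y = [1.0, 2.0, 0.5, 1.5, 2.5]

# Center X column by column and y as a whole.
columns = [center([row[j] for row in raw_x]) for j in range(3)]
xc = [[columns[j][i] for j in range(3)] for i in range(len(raw_x))]
yc = center(raw_y)

t_scores, weights, p_loadings, q_coeffs, rss = pls1_nipals(xc, yc, nlv=2)
print(rss)
```

The residual sum of squares of y shrinks with every latent variable, which is one simple way to judge how many latent variables are worth keeping.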