Sunday, December 8, 2024

What is the difference between ESXi inbox driver and ESXi async driver

VMware ESXi is a hypervisor that lets users create virtual machines (VMs) on physical hardware.

An ESXi driver is a software component that enables the use of a specific device or resource for VMware ESXi.

The ESXi inbox drivers are shipped with ESXi.

ESXi async drivers are developed by hardware vendors or third-party vendors.  They are not bundled with the ESXi software.  They are usually downloaded from the Broadcom website and then installed.

When an async driver is installed, the inbox driver is not removed.  This results in more than one driver being installed for the same device.

When both inbox and async drivers are present, both are displayed as installed.  However, only one of them is loaded and is in use.

How to determine which drivers are installed

# esxcli software vib list | less

# esxcli software vib list | egrep <driver_name>

Note: If both the inbox and async drivers are installed, the egrep command above lists both.
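
If this check needs to be scripted, the filtering step could be sketched in Python as follows. This is only an illustration: find_vibs is a hypothetical helper, it works on text captured from esxcli software vib list, and the sample rows (including the driver names) are made up for the example and are not real output.

```python
def find_vibs(listing, driver_name):
    """Return the lines of a VIB listing whose first column contains driver_name."""
    matches = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines; the first whitespace-separated field is the VIB name.
        if fields and driver_name in fields[0]:
            matches.append(line)
    return matches


# Made-up sample rows for illustration only (not real esxcli output).
sample = """\
Name            Version  Vendor  Acceptance Level  Install Date
examplenic      1.0.0-1  VMW     VMwareCertified   2024-01-01
net-examplenic  2.0.0-1  ACME    VMwareCertified   2024-06-01
otherdrv        3.1.0-1  VMW     VMwareCertified   2024-01-01
"""

for row in find_vibs(sample, "examplenic"):
    print(row)
```

Like the egrep command, this is a substring match, so it reports every installed VIB whose name contains the given string.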


How to determine which drivers are in use

# esxcfg-info | less

In case of network drivers :

# ethtool -i vmnicX

To identify the vmnic number of a NIC :

# esxcfg-nics -l

# esxcli network nic get -n vmnicX


Should you use ESXi inbox driver or the ESXi async driver

There is no simple answer to this question.  A careful review of the situation should help you come to a decision.

Factors to be considered could be :

  • Is the hardware vendor doing active development and fixing many issues in the async driver?
  • Do you want to use a native driver or a vmklinux driver?  And why?
  • Comparing the release notes of the inbox driver and async driver, do you see a pattern?
  • Which one seems to be more stable?
  • Which one seems to be more actively maintained?

Although this review may take a long time, making this informed decision may avoid an outage or an unpleasant situation in the future.


Thursday, June 6, 2024

Python hands-on self learning

Classroom training may not be sufficient to learn Python. You have to get your hands dirty. Here are some assignments that you could do, to learn Python the self-help way.

Exercise 1. In a Python script, accept command line arguments. Display all the arguments and also the number of arguments.


Expected output:

$ ex_01.py here there
arguments:
here
there
Number of arguments: 2
$

Exercise 2. Accept a filename as command line argument. Display the contents of that file.

Expected output:

$ cat > ringfile.txt
Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them
In the Land of Mordor where the Shadows lie.
(Ctrl+D)$
$
$ ex_02.py ringfile.txt
Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them
In the Land of Mordor where the Shadows lie.

$

This is similar to displaying the contents of a file using cat command.


Exercise 3. Accept a filename as command line argument. Display the lines of that file in reverse order, that is, the last line first and the first line last.

Expected output:

$ ex_03.py ringfile.txt
In the Land of Mordor where the Shadows lie.
One Ring to bring them all and in the darkness bind them
One Ring to rule them all, One Ring to find them,
In the Land of Mordor where the Shadows lie.
One for the Dark Lord on his dark throne
Nine for Mortal Men doomed to die,
Seven for the Dwarf-lords in their halls of stone,
Three Rings for the Elven-kings under the sky,

$


Exercise 4. Accept a string as first argument and a filename as second argument. Display all lines of that file which contain the given string.

Expected output:

$ ex_04.py them ringfile.txt
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them

$


Exercise 5. Accept a username as command line argument. Display whether that user is currently logged in. Run the who or w command and use its output to determine this.

Expected output:

$ ex_05.py yogesh
yogesh is not logged in
$
$ ex_05.py root
root is logged in
$
$ ex_05.py roo
roo is not logged in

$
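
A hint for Exercises 5 and 6, in case the "roo" versus "root" case trips you up: compare whole user names rather than searching the raw text. The sketch below is one possible approach, not the only one. The function names are my own, and it assumes that the user name is the first column of each line printed by who.

```python
import subprocess


def logged_in_users(who_output):
    """Return the set of user names found in the first column of who output."""
    users = set()
    for line in who_output.splitlines():
        fields = line.split()
        if fields:
            users.add(fields[0])
    return users


def is_logged_in(user, who_output):
    # Compare whole names so that "roo" does not match "root".
    return user in logged_in_users(who_output)


def run_who():
    """Run the who command and return its output as text."""
    result = subprocess.run(["who"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling is_logged_in(name, run_who()) for each command line argument covers Exercise 6 as well.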


Exercise 6. Accept command line arguments. Consider all command line arguments as user names and for all the given user names, display if they are logged in or not.

Expected output:

$ ex_06.py yogesh pts roo root
yogesh is not logged in
pts is not logged in
roo is not logged in
root is logged in

$


Exercise 7. Display today's date in the format: 22-Aug-2008

Exercise 8. Display yesterday's date in the format: Thu Aug 21 16:58:10 2008

Exercise 8.1. Display tomorrow's date in the format: 21-Aug-2008
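
For Exercises 7, 8 and 8.1, the datetime module is one possible route. The sketch below assumes an English locale, because the %a and %b format codes produce locale-dependent day and month names. The helper names are illustrative.

```python
from datetime import datetime, timedelta


def short_date(moment):
    """Format as day-month-year, for example 22-Aug-2008."""
    return moment.strftime("%d-%b-%Y")


def long_date(moment):
    """Format as weekday, month, day, time, year, for example Thu Aug 21 16:58:10 2008."""
    return moment.strftime("%a %b %d %H:%M:%S %Y")


now = datetime.now()
one_day = timedelta(days=1)

print(short_date(now))            # today
print(long_date(now - one_day))   # yesterday
print(short_date(now + one_day))  # tomorrow
```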


Exercise 9. Declare a list as follows:

nums = [5, 10, 15, 23, 20, 24, 30, 33, 40, 13]

Sort this list numerically in ascending order.

Exercise 9.1. Sort this list numerically in descending order.


Exercise 10. Declare a list as follows:

nums = [5, 10, 20, 23, 15, 23, 20, 5, 24, 30, 33, 40, 3, 13]

Find the duplicate elements from this list


Exercise 11. Declare a list as follows:

nums = [5, 10, 20, 23, 15, 23, 20, 5, 24, 30, 33, 40, 3, 13]

Find unique elements from this list
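
One way to approach Exercises 10 and 11 is to count how often each value appears. The sketch below uses collections.Counter. Note that "unique" can be read in two ways (every distinct value, or only the values that appear exactly once), so both are shown; which one the exercise intends is left to you.

```python
from collections import Counter

nums = [5, 10, 20, 23, 15, 23, 20, 5, 24, 30, 33, 40, 3, 13]

# Map each value to the number of times it appears, in first-seen order.
counts = Counter(nums)

duplicates = [n for n, c in counts.items() if c > 1]   # values seen more than once
distinct = list(counts)                                # every value, listed once
seen_once = [n for n, c in counts.items() if c == 1]   # values that never repeat

print("duplicates:", duplicates)
print("distinct:", distinct)
print("seen once:", seen_once)
```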


Exercise 12. Accept a string as command line argument. Declare a dictionary as follows:

food={'apple':'red', 'banana':'yellow', 'tomato':'red', 'spinach':'green', 'lemon':'yellow',}

Check whether the given string is present as a key in this dictionary.

Expected output:

$ ex_12.py apple
We've got apple
$
$ ex_12.py orange
We've got no orange

$


Exercise 13. Declare a dictionary as follows:

data = {'d':12, 'b':20, 'g':2, 'a':6, 'e':1, 'h':13, 'c':8, 'f':5}

Display keys and values from this dictionary sorted by keys.

Expected output:

$ ex_13.py
key = a value = 6
key = b value = 20
key = c value = 8
key = d value = 12
key = e value = 1
key = f value = 5
key = g value = 2
key = h value = 13

$


Exercise 14. Declare a dictionary as follows:

data = {'d':12, 'b':20, 'g':2, 'a':6, 'e':1, 'h':13, 'c':8, 'f':5}

Display keys and values from this dictionary numerically sorted by values.

Expected output:

$ ex_14.py
key = e value = 1
key = g value = 2
key = f value = 5
key = a value = 6
key = c value = 8
key = d value = 12
key = h value = 13
key = b value = 20

$
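
Exercises 13 and 14 both come down to sorting the (key, value) pairs of a dictionary. Here is a minimal sketch of that idea; the variable and function names are my own, and the keys are written as quoted strings.

```python
scores = {'d': 12, 'b': 20, 'g': 2, 'a': 6, 'e': 1, 'h': 13, 'c': 8, 'f': 5}


def by_key(mapping):
    """Return (key, value) pairs sorted by key."""
    return sorted(mapping.items())


def by_value(mapping):
    """Return (key, value) pairs sorted numerically by value."""
    return sorted(mapping.items(), key=lambda item: item[1])


for key, value in by_value(scores):
    print(f"key = {key} value = {value}")
```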


Exercise 15. Declare two dictionaries as follows:

food={'apple':'red', 'rice':'white', 'banana':'yellow', 'tomato':'red', 'spinach':'green'}

fruits={'plum':'red', 'banana':'yellow', 'blueberry':'blue', 'mulberry':'black', 'apple':'red', 'pear':'green'}

Find common keys in these dictionaries.

Expected output:

$ ex_15.py
apple
banana

$


Exercise 16. Declare two dictionaries as follows:

food={'apple':'red', 'rice':'white', 'banana':'yellow', 'tomato':'red', 'spinach':'green'}

fruits={'plum':'red', 'banana':'yellow', 'blueberry':'blue', 'mulberry':'black', 'apple':'red', 'pear':'green'}

Find keys that are present in dictionary food and not present in dictionary fruits.

Expected output:

$ ex_16.py
rice
tomato
spinach

$
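
For Exercises 15 and 16, membership tests on the dictionaries are enough. The sketch below keeps the keys in the order in which they appear in food, which happens to match the expected output shown above. The set operations food.keys() & fruits.keys() and food.keys() - fruits.keys() give the same keys, but without a guaranteed order.

```python
food = {'apple': 'red', 'rice': 'white', 'banana': 'yellow',
        'tomato': 'red', 'spinach': 'green'}
fruits = {'plum': 'red', 'banana': 'yellow', 'blueberry': 'blue',
          'mulberry': 'black', 'apple': 'red', 'pear': 'green'}

# Keys present in both dictionaries, in the order they appear in food.
common = [key for key in food if key in fruits]

# Keys present in food but missing from fruits.
only_in_food = [key for key in food if key not in fruits]

print("\n".join(common))
print("\n".join(only_in_food))
```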


Exercise 17. Write a program to read file /etc/passwd and prepare a dictionary as follows:

(a) keys of the dictionary would be the user names

(b) corresponding value would be the home directory of that user

Also, display contents of this dictionary.


Exercise 18. Write a program to read file /etc/passwd and prepare a dictionary as follows:

(a) keys of the dictionary would be the user names

(b) corresponding value would be another dictionary as follows:

i. key = uid value = user id of that user

ii. key = homedir value = home directory of that user

iii. key = shell value = default shell of that user

Also, display contents of this dictionary.
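
A possible starting point for Exercises 17 and 18 is shown below. It assumes the usual seven colon-separated fields of /etc/passwd (name, password, uid, gid, comment, home directory, shell). The function name parse_passwd is my own, and lines that do not have seven fields are simply skipped.

```python
def parse_passwd(text):
    """Build {user name: {'uid': ..., 'homedir': ..., 'shell': ...}} from passwd text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        fields = line.split(":")
        if len(fields) < 7:
            continue  # ignore malformed lines
        name, _password, uid, _gid, _comment, homedir, shell = fields[:7]
        users[name] = {"uid": int(uid), "homedir": homedir, "shell": shell}
    return users


def home_directories(users):
    """Reduce the nested dictionary to {user name: home directory} (Exercise 17)."""
    return {name: info["homedir"] for name, info in users.items()}
```

To use it on a real system, read the file first, for example with open("/etc/passwd") as handle: users = parse_passwd(handle.read()).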


Exercise 19. Write a program to find and print the longest word in a text file. If several words tie for the longest, print all of them.

Expected output:

$ python find_longest_word.py ringfile.txt
Elven-kings
Dwarf-lords

$
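
The expected output above lists two words because they are equally long. One way to handle such ties is sketched below. It makes two assumptions of my own: words are separated by whitespace, and punctuation at the start or end of a word (such as a trailing comma) does not count towards its length.

```python
import string


def longest_words(text):
    """Return all words that share the maximum length, in first-seen order."""
    words = []
    for raw in text.split():
        word = raw.strip(string.punctuation)  # drop leading/trailing punctuation
        if word and word not in words:
            words.append(word)
    if not words:
        return []
    longest = max(len(word) for word in words)
    return [word for word in words if len(word) == longest]
```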


Exercise 20. Write a function that simulates the roll of a die. That is, it generates a random integer from 1 to 6, both included.


Exercise 21. Accept a filename as command line argument. Display the word that occurs most in it. Also display the number of occurrences of that word.

Expected output:

$ python ex_21.py ringfile.txt
highest occurrence: the (9)

$
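
Counting words is a natural fit for collections.Counter. The sketch below is one possible approach for Exercise 21. Lower-casing the words and stripping surrounding punctuation are my own assumptions about what should count as the same word.

```python
from collections import Counter
import string


def most_common_word(text):
    """Return (word, count) for the most frequent word, or None for empty text."""
    words = [raw.strip(string.punctuation).lower() for raw in text.split()]
    counts = Counter(word for word in words if word)
    if not counts:
        return None
    return counts.most_common(1)[0]
```

If two words are equally frequent, most_common returns the one that was seen first.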


Exercise 22. Accept a directory path as command line argument. In the given directory, find the file with the oldest modification time. Display the file name along with how many days ago it was modified.

Expected output:

$ ex_22.py /foo/
/foo/ does not exist
$
$ ex_22.py /etc/passwd
/etc/passwd is not a directory
$
$ ex_22.py /etc/
motd was modified 167 days ago

$


Exercise 23. Write a Python program to validate PAN codes.

Get the PAN codes to be validated on the command line. More than one PAN code may be provided.

$ ./pan_code_validator.py BETPK1234M

$ ./pan_code_validator.py CRLCT3456G H2YPJ5678L

Q. What is a PAN number?

A. Please see https://en.wikipedia.org/wiki/Permanent_account_number

No bonus points if you could write a solution within five minutes. A comprehensive solution is better than a fast and clumsy one.
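
As a first step only, the overall shape of a PAN (five uppercase letters, four digits, one uppercase letter) can be checked with a regular expression, as sketched below. This covers the structure and nothing more; a comprehensive solution would also look at the rules for individual characters described on the Wikipedia page. The function name is my own.

```python
import re

# Five uppercase letters, four digits, one uppercase letter.
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def has_pan_structure(code):
    """Return True if code has the basic ten-character PAN layout."""
    return PAN_PATTERN.fullmatch(code) is not None


for candidate in ["BETPK1234M", "CRLCT3456G", "H2YPJ5678L"]:
    verdict = "looks like" if has_pan_structure(candidate) else "does not look like"
    print(f"{candidate} {verdict} a PAN")
```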


Exercise 24. Accept a filename as command line argument. Assuming that the file passed is a .json file, read its contents and display them.

Before reading the file, make sure that:

(a) it exists

(b) it is readable

(c) it is a simple file (not a directory or a device file)

(d) if the file is not a well formatted .json, display an appropriate error and exit.
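
The checks listed above could be arranged as in the sketch below. It is one possible layout, not a model answer: the function name and the error messages are my own, and it returns the error as text instead of exiting, so that the caller decides what to print.

```python
import json
import os


def load_json_file(path):
    """Return (data, None) on success or (None, error message) on failure."""
    if not os.path.exists(path):
        return None, f"{path} does not exist"
    if not os.path.isfile(path):
        return None, f"{path} is not a regular file"
    if not os.access(path, os.R_OK):
        return None, f"{path} is not readable"
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle), None
    except json.JSONDecodeError as err:
        return None, f"{path} is not well-formed JSON: {err}"
```

Here the existence check comes first so that the later messages can assume the path is present.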


Exercise 25. Write a python program to connect to a remote host using the SSH protocol, run ls command, and show the output.


Exercise 26. Write a Python program to create and delete directories repeatedly, starting from a given directory path. Obviously, the program should create directories before they are deleted. This program would be useful for checking certain features of a file system, and also of NFS and CIFS shared drives.

Arguments to this program should be as listed below.

1. directory path - starting point for creating / deleting directories

2. (optional) duration - run for how much duration, in seconds. Default : 300 seconds (5 minutes)

3. (optional) dontstop - run continuously till you stop the script by sending a signal using ctrl + c. This option should override duration.

4. (optional) dirprefix - string to be used while naming directories

Sub-directories in each directory should be named as dir1, dir2, dir3 and so on. If an optional argument dirprefix is mentioned, the string supplied along with it should be used while naming sub-directories.

Obviously, this program should not delete a directory that does not exist. Also, a thread / process should not create a directory which is already created / being created by another thread / process.
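
One way to satisfy the last two requirements is to let the operating system be the referee: attempt the operation and treat "already exists" or "not found" as a normal outcome, instead of checking first and acting later. The sketch below shows that idea with hypothetical helper names. It assumes a file system on which mkdir fails when the directory already exists; how shared drives behave here is exactly the kind of thing the exercise is meant to test.

```python
import os


def try_create(path):
    """Create path; return True if this caller created it, False if it already existed."""
    try:
        os.mkdir(path)
        return True
    except FileExistsError:
        return False


def try_delete(path):
    """Remove the empty directory path; return False if it was already gone."""
    try:
        os.rmdir(path)
        return True
    except FileNotFoundError:
        return False
```

Other errors, such as trying to remove a directory that is not empty, are deliberately not caught here and still raise an exception.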


Exercise 27. Write a python program to convert English text to Morse code and vice versa. Or better yet, write two separate python programs, one to convert English text to its equivalent Morse code, and another to convert Morse code to its equivalent English text.

Get names of two files on command line, one for input and one for output, as shown below.

$ ./english_to_morse.py --infile notes.txt --outfile notes_in_morse_code.txt

$ ./morse_to_english.py --infile notes_in_morse_code.txt --outfile notes.txt

Read the English text / Morse code from the input file and write the converted Morse code / English text to the output file.

Q. What is Morse code?

A. See http://en.wikipedia.org/wiki/Morse_Code

Q. Which ASCII characters are considered valid for converting to Morse code?

A. Let's refer to http://en.wikipedia.org/wiki/Morse_Code#Letters.2C_numbers.2C_punctuation

Q. Do you know a web site that does this conversion?

A. There are many. Here's one - http://www.onlineconversion.com/morse_code.htm

Q. What would be the criteria to evaluate my solution?

A. No bonus points if you could write a solution within 10 minutes. A comprehensive solution is better than a fast and clumsy one.

You'll surely get bonus points if you could use some python package rather than writing all of the code yourself from scratch.


Tuesday, February 20, 2024

The fsck of IBM Storage Scale

What is fsck
Wikipedia says : The system utility fsck (file system consistency check) is a tool for checking the consistency of a file system in Unix and Unix-like operating systems, such as Linux, macOS, and FreeBSD.


A file system is a method of storing, organizing, and managing the data in the available storage medium.
File system consistency refers to the correctness and validity of a file system.  Faults in a file system are usually caused by power failures, hardware failures, or improper shutdown of the system.

Every file system has its own way of laying out its data structures on the storage medium, and fsck works directly with those on-disk data structures.  Thus an fsck program is always written for a specific file system.  In other words, every file system must have its own fsck program.

Before running fsck to check the health of a file system, the file system must be unmounted.


Why is fsck needed

Considering the reliability of the hardware available these days and the robustness of the software, it is rare for a file system to have data corruption. However, if data corruption does happen in a file system, it needs to be detected and repaired.

In case of a power failure, the server may not shut down correctly.  In this case, the data in memory may not get written to the disk.  This creates inconsistency.  Such inconsistencies, if not corrected, may create further trouble later.

In rare cases, disks have bad sectors. The disks with bad sectors need to be replaced. Then fsck needs to be run to ensure data integrity.

If the applications are reporting input/output errors when accessing or storing data, the file system may have inconsistencies and running fsck is needed in such cases.



What is IBM Storage Scale
IBM's clustered file system that provides parallel data access from multiple nodes is branded as Storage Scale.

To know more about Storage Scale, please visit https://www.ibm.com/docs/en/storage-scale/5.1.8?topic=overview-storage-scale

Here is a brief overview of IBM Storage Scale.
Storage Scale is a file system, but not an ordinary one that runs locally on a single computer. Storage Scale runs on multiple computers. These computers together make a cluster. They are known as the nodes of the cluster.  Some of the nodes are given access to storage that is present on the network. The storage is available to the nodes in the form of Network Shared Disks (NSDs). The available NSDs are used to create file systems.  Customers use a file system to store and access data via NFS, CIFS, or object protocols.

Storage Scale provides concurrent high-speed file access. Applications that access the data may be running on multiple systems and accessing the data in parallel.

A Storage Scale cluster may consist of 1 to 56 nodes.  The nodes could be assigned the roles of quorum nodes and data nodes.  Rather than entering into the details of Storage Scale, let's focus on our agenda - fsck.



The fsck of IBM Storage Scale
As mentioned earlier, the fsck tool is always specific to a particular file system. For Storage Scale, IBM's engineers have written their own fsck program.  It is named mmfsck.  All commands of Storage Scale begin with mm.  If you wonder why, mm stands for multimedia.  This file system was developed 25 years ago.  In those days, having large storage was a luxury. Multimedia was an emerging technology, and it was expected to need storage in huge quantities. Following that trend, all commands of the new file system were named with the mm prefix.

When IBM engineers developed this new file system, they developed the mmfsck tool as well. The file system has to be unmounted before running mmfsck on it. Are you thinking "this is a limitation"?  Well, although unmounting the file system has been a necessity for running fsck, some of us think differently.  Why must I have downtime to run mmfsck? Why can't it be done while the file system is online and in use?  We took this thought forward and developed another variant of fsck that does not require the file system to be unmounted. It works while the file system is mounted and in use. We named it mmfsckx.  The x stands for eXtended.

So Storage Scale now has two variants of fsck: mmfsck, which requires the file system to be unmounted, and mmfsckx, which works while the file system is mounted and in use.  Typically we refer to mmfsck as offline fsck and to mmfsckx as online fsck.

Although the user may not be aware of this, the fsck is a separate program from the file system kernel code. Their intentions are also different. The fsck analyzes the file system’s metadata for the purpose of performing repairs. The kernel code manages the file system’s operations during normal usage.


Is Storage Scale the only file system that has online fsck?  Obviously not.  For example, see XFS Online Fsck Design and BeeGFS File System Check.


What next?
Okay, you have a niche file system that allows running fsck without unmounting, which means without any downtime. So can you improve what you already have?  Can you go beyond?  When this question came to my mind, here are the thoughts that followed.

1. fsck for filesets
Say we have a huge file system that is being used in a multi-tenant cloud environment.  In some cases, we have a separate fileset for every customer, or "fileset-based multi-tenancy".  Filesets are a way of dividing the file system so that each part can have separate access control and administrative operations.

Running online fsck on a huge file system would take a long time. It could take a few days, depending on the amount of data present in the file system. If a problem is reported by a particular customer, then we would know the fileset in question.  A fileset is a part of the file system that could be used separately.  So why not run online fsck on a single fileset or multiple filesets, rather than on the entire file system?  Of course, running online fsck on a fileset may not detect all issues. But it is a good start. If we could detect and repair all issues in the fileset, then we would avoid running online fsck on the entire file system.  That would be a big bonus. If we could detect some of the issues in a fileset, I'd say that is still a battle half won. So running online fsck on filesets is a useful functionality we'd want to have.
To know more about filesets, please visit https://www.ibm.com/docs/en/storage-scale/5.1.8?topic=scale-filesets

2. Self healing filesystems
If some part of a disk goes bad, the data corruption may not be noticed immediately.  Detecting such data corruption much later may lead to unwanted consequences. So what if these situations could be avoided proactively?  Is there a way?  What if we have a program that identifies that the system is now idle and runs online fsck during the idle interval?  So the data corruptions are detected and repaired before they are noticed by the users of the data.  A self healing file system is the true masterpiece we'd want to have.

3. Self healing filesets
As mentioned earlier, running online fsck on a huge file system would take a long time, depending on how much data is stored in the file system.  So if online fsck could run on a single fileset or multiple filesets, then we could use that feature during the idle intervals of the Storage Scale cluster.  By combining the two features, we would have self healing filesets.  Is there a cloud that has already implemented this?  If you know, you tell me  :-)

4. Performance
When we think about performance, there are two aspects to consider :
(a) For a given file system, how much time is taken for running offline fsck vs online fsck
(b) What can be done to improve the performance of running online fsck
Let's consider these one at a time.

(a) For a given file system, how much time is taken for running offline fsck vs online fsck
When we run offline fsck, the file system is unmounted, so there is no IO workload and the load on the worker nodes is low.  While online fsck runs, the file system is mounted and IO workload is in progress, so the load on the worker nodes may be high, depending on the IO workload.  Considering these situations, the time required for running offline fsck is usually much less than the time required for running online fsck.  In theory, we would all agree with this.  The results obtained during functional testing indicate that the time difference is small when the file system holds a small amount of data.  The more data there is in the file system, the bigger the time difference.  When the file system contains an enormous amount of data, say petabytes, the time difference is huge.  In one particular case, online fsck took 10 times as long as offline fsck.

(b) What can be done to improve the performance of running online fsck
By default, online fsck uses all available nodes of the cluster to do the work.  The total work is divided into portions. Each node does some part of the work.  The file system manager node manages the distribution of work.
Not all nodes of the cluster would have the same amount of memory and processing power.  Also, the IO workload may not be the same on all nodes of the cluster.  So if there are, say, 15 worker nodes, making 15 equal portions of the entire work and allocating one portion to each node may not be the most efficient strategy.

Instead, the amount of work allocated to each node could depend on its available memory, available processing power, and current load, so that all nodes are expected to finish at about the same time.  Moreover, if the execution takes a long time, the IO workload on the nodes may change while it runs.  So the original allocation may no longer be the most efficient at a later time.  In such cases, recalculating how the remaining work is distributed between the nodes would be beneficial.
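
To make the idea of capacity-based allocation concrete, here is a small Python sketch. It is a toy model and not how mmfsckx divides its work: the function name, the node names, and the notion of a single integer "capacity score" per node (which would somehow combine memory, processing power, and load) are all assumptions made for illustration.

```python
def split_by_capacity(total_units, capacities):
    """Split total_units of work across nodes in proportion to integer capacity scores."""
    total_capacity = sum(capacities.values())
    if total_capacity <= 0:
        raise ValueError("at least one node must have a positive capacity")
    shares = {node: total_units * capacity // total_capacity
              for node, capacity in capacities.items()}
    # Integer division can leave a few units unassigned; give one each to the strongest nodes.
    leftover = total_units - sum(shares.values())
    for node in sorted(capacities, key=capacities.get, reverse=True)[:leftover]:
        shares[node] += 1
    return shares


# Hypothetical cluster: node3 is rated twice as capable as the other two.
print(split_by_capacity(100, {"node1": 1, "node2": 1, "node3": 2}))
```

Recalculating later simply means calling the same function again with the remaining work and refreshed capacity scores.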

Another, simpler strategy is also possible. Not all nodes would finish their portion of the work at the same time.  Some may finish earlier than others, depending on the IO workload. A node that completes its portion of the work may be allocated a smaller portion of the remaining work. This redistribution of work would help to complete the entire work in less time.
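
The effect of handing out work as nodes become free can be illustrated with a small simulation. Note the simplification: instead of redistributing leftover portions, the sketch uses a plain work queue in which every node takes one small chunk at a time whenever it is free, which is a related but not identical technique. The node speeds (chunks per unit of time) and the function names are made up, and nothing here describes the actual behaviour of mmfsckx.

```python
import heapq


def static_makespan(total_chunks, speeds):
    """Finish time when every node gets an equal share of the chunks up front."""
    per_node = total_chunks / len(speeds)
    return max(per_node / speed for speed in speeds)


def dynamic_makespan(total_chunks, speeds):
    """Finish time when each node takes the next chunk as soon as it is free."""
    # Heap of (time at which the node becomes free, node index).
    free_at = [(0.0, index) for index in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(total_chunks):
        now, node = heapq.heappop(free_at)
        done = now + 1.0 / speeds[node]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, node))
    return finish


speeds = [1.0, 2.0, 4.0]  # hypothetical: one slow, one medium, one fast node
print("static :", static_makespan(70, speeds))
print("dynamic:", dynamic_makespan(70, speeds))
```

In this toy model the static split is held back by the slowest node, while the work queue lets the faster nodes absorb most of the chunks.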

There could be more ways to improve performance of online fsck that I could not list here.  Some food for thought for the reader.



References
1. https://en.wikipedia.org/wiki/Fsck
2. https://lwn.net/Articles/248180/
3. https://www.adminschoice.com/repairing-unix-file-system-fsck
4. https://linux.die.net/man/8/fsck
5. https://www.ibm.com/docs/en/aix/7.3?topic=f-fsck-command
6. https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=reference-mmfsckx-command