Dr. Greg Bernstein

October 12th, 2021

From NCyTE

- Students will be able to explain the use of hash functions in securing information.
- Students will be able to describe the basic requirements for a cryptographic hash function.
- Students will be able to identify commonly used algorithms for hashing.
- Students will be able to illustrate how hashing works using online tools.

- Understand the difference between cryptographic and “regular” hash functions
- Understand HMACs and HOTPs
- Understand hash functions use key derivation functions and the appropriate types of different algorithms.
- Understand the difference between the various types of random numbers

From Stallings, *Cryptography and Network Security*, 2003

**Content modifications**: changes to contents including*insertions*,*deletion*,*transposition*,*modification***Sequence Modification**: changes to a sequence of messages including**reordering**, insertion and deletion.**Timing Modification**:*delay*or**replay**of messages.

From NCyTE

**Confidentiality**:*Encryption algorithms***Integrity**: (Cryptographic)*Hash functions***Authenticity**:*Digital signatures*,*digital certificates***Non-repudiation**:*Digital signatures*,*digital certificates*

From NCyTE

From NCyTE

Why not use encryption (symmetric or asymmetric)?

- If I encrypt my message and someone tries to modify it, it won’t decrypt properly!
- To prevent sequence modification put a sequence number in the plaintext
- To prevent replay (timing) based attacks put a timestamp in the plaintext

Encryption can be costly in terms of computational power, particularly public key based encryption

Not all data is “confidential”, for example open source software, but integrity is critical, i.e., executable files!!!

Some data doesn’t have sufficient structure to tell if decrypted data is “incorrect”

However if you need to encrypt there are special block cipher modes

Authenticated encryption with additional data (AEAD) modes “A number of modes of operation have been designed to combine secrecy and authentication in a single cryptographic primitive.”

**Galois/counter mode (GCM)**combines the well-known counter mode of encryption with the new Galois mode of authentication.**Counter with cipher block chaining message authentication code (CCM)**is an authenticated encryption algorithm.

Both these modes are used in TLS 1.3

From NCyTE

A (cryptographic)

**hash function**maps digital data of arbitrary size to digital data of fixed size. The hash is sometimes called a**message digest**.A cryptographic hash function is a hash function that is considered practically

*impossible to invert*(one-wayness) or*find collisions*(i.e. two messages with the same hash value).

From NCyTE

From NCyTE

**Variable input/ Fixed output size**: Can be applied to data of practically any size but the output is always a fixed number of bits.**Pre-image resistant**: Computationally infeasible to find the input value for a given hash value (one-wayness).

From NCyTE

**Collision resistant**: Computationally infeasible to find two inputs that have the same hash value.**Efficiency**: Easy (fast) to compute so practical on hardware and software**Pseudorandomness**: The outputs pass tests designed to detect pseudorandomness.

From NCyTE

**Digital Signatures**: When you sign messages digitally, the hash value of the message is encrypted instead of the message itself. This allows messages of arbitrary lengths to be signed.**Password Files**: Hashes of passwords are stored, not the passwords themselves. Because no one can see the plain password, this provides an extra layer of security in case the password file is stolen.

From NCyTE

**Intrusion/Virus Detection**: A change in the hash value of a file may indicate an intrusion or a virus.**Construction of a pseudorandom number generator**: One of the required properties of a cryptographic function is that the output has to pass pseudorandomness tests.**File synchronization**: Whether to upload a file or not for synchronization (for example with the cloud storage) can be determined by checking the hash value of the file has changed or not since the last update

Is an important application that you use missing from the list?

Can you name a file synchronization program that uses (optionally) hashes?

From NCyTE

**SHA-1**– Designed by NSA- Published by NIST in 1993 as Federal Info. Processing Standard
- Theoretical attacks developed for SHA1 in 2005 suggested the algorithm may not be secure enough for ongoing use.

**SHA-2**– Also designed by NSA- Published by NIST in 2001 as Federal Info. Processing Standard
- Includes six hash functions with digests that are 224, 256, 384 or 512 bits.

From NCyTE

**SHA-3**– Designed by Bertoni, Daemen,Peeters, Van Assche- Published by NIST in 2015 as the new standard
- Not meant to replace SHA-2 as SHA-2 has not been broken

**MD4**– Designed by Rivest in 1990- 128 bit digests; used in TLS certificates
- Practical collision attacks were developed against it

**MD5**– Similar to MD4;- security severely compromised, so not suitable for cryptographic use.

From Wikipedia: Crypto Hash: A cryptographic hash function (specifically SHA-1) at work. A small change in the input (in the word “over”) drastically changes the output (digest).

Small change to a short message

```
from cryptography.hazmat.primitives import hashes
digest = hashes.Hash(hashes.SHA256())
digest.update(b"This is my message. It's not fancy, but it is only a test.")
digest_bytes1 = digest.finalize()
print(digest_bytes1.hex())
# d7bd857b3044c451e9492388c09c713fb131c2fcd58541376e220dc913e6d3dc
digest = hashes.Hash(hashes.SHA256())
digest.update(b"This is my message. It's not fancy, but it is only a test!")
digest_bytes2 = digest.finalize()
print(digest_bytes2.hex())
# 6373b945dc1d5616d1d06193752594a89e4a8b7b020f783b6d8b7c04f9d4571d
```

Hash a long book

```
sherlock = open("sherlockHolmes.txt", "r", encoding="utf8")
book = sherlock.read()
print(f"This book has {len(book)} characters")
digest = hashes.Hash(hashes.SHA256())
digest.update(book.encode("utf-8")) # encode turns the characters into bytes
digest_bytes_book = digest.finalize()
print(f"Hash: {digest_bytes_book.hex()}")
```

Small change to a long book

```
import random
index = random.randint(0, len(book)) # Choose a character to change
mod_book = book[0:index-1] + "x" + book[index:] # change to an "x"
print(f"The modified book has {len(mod_book)} characters")
digest = hashes.Hash(hashes.SHA256())
digest.update(mod_book.encode("utf-8"))
digest_bytes_book2 = digest.finalize()
print(f"Hash: {digest_bytes_book2.hex()}")
```

Python Cryptography supports the following modern hashes

- SHA-2 family
- SHA-3 family
- BLAKE2 avoids length extension attacks and is used in a number of other algorithms including
*Argon2*.

From CryptToy

- Jupyter Notebook (Python) RFC2104 implementing RFC 2104, HMAC Algorithm.
- Jupyter Notebook (Python) RFC4226 implementing RFC 4226, HOTP Algorithm
- Jupyter Notebook (Python) RFC6238 implementing RFC 6238, TOTP Algorithm

From Wikipedia: HMAC

HMAC (hash-based message authentication code) is a specific type of message authentication code (MAC) involving a cryptographic hash function and a secret cryptographic key. …it may be used to simultaneously verify both the data integrity and the authenticity of a message. …HMAC can provide message authentication using a shared secret *instead* of using digital signatures

From Wikipedia: HMAC

- H is a cryptographic hash function
- m is the message to be authenticated

From Wikipedia: HMAC

*K*is the secret key*K’*is a block-sized key derived from the secret key, K; either by padding to the right with 0s up to the block size, or by hashing down to less than or equal to the block size first and then padding to the right with zeros- ‖ denotes concatenation

From Wikipedia: HMAC

- ⊕ denotes bitwise exclusive or (XOR)
*opad*is the block-sized outer padding, consisting of repeated bytes valued 0x5c (“01011100”)*ipad*is the block-sized inner padding, consisting of repeated bytes valued 0x36 (“00110110”)

- You can use any different hash function
- The hash function gets applied twice (inner and outer)
- The key gets mixed with the
*ipad*and*opad*for each use - The message gets concatenated with a key mix, then the hash of the message gets concatenated with a key mix.

**Something you have**

- We’ve shared a long, complicated secret key!
- Prove it to me!
- Here it is…
**whoops**, now its not secret anymore…

- Maybe I should just share a hash of the key, but that doesn’t change…
- How about I apply an HMAC with the key to a counter? (HOTP)
- How about I apply an HMAC with the key to the current time? (TOTP)
- These always change and protocol doesn’t allow “token” to be used again to prevent
**replay**attacks.

From Wikipedia: HMAC-based one-time password (HOTP)

HMAC-based one-time password (HOTP) is a one-time password (OTP) algorithm based on hash-based message authentication codes (HMAC). It is a cornerstone of the Initiative for Open Authentication (OATH). HOTP was published as an informational IETF RFC 4226 in December 2005… The HOTP algorithm is a freely available open standard.

From Wikipedia: HMAC-based one-time password (HOTP)

- A cryptographic hash method,
*H*(default is SHA-1) - A secret key,
*K*, which is an arbitrary byte string, and must remain private - A counter,
*C*, which counts the number of iterations - A HOTP value length,
*d*(6–10, default is 6, and 6–8 is recommended)*

From Wikipedia: HMAC-based one-time password (HOTP)

Both parties compute the HOTP value derived from the secret key *K* and the counter *C*. Then the **authenticator** checks its locally-generated value against the value supplied by the **authenticated**.

Contains the URI: otpauth://totp/Secure%20App:alice%40google.com?secret=JBSWY3DPEHPK3PXP&issuer=Secure%20App

- Format Key Uri Format, key is in Base32 encoding
- Wikipedia:QR Codes for the special URI
- The Base16, Base32, and Base64 Data Encodings

From Wikipedia: Time-based One-Time Password

Time-based One-time Password (TOTP) is a computer algorithm that generates a one-time password (OTP) which uses the **current time** as a source of uniqueness. As an extension of the HMAC-based One-time Password algorithm (HOTP), it has been adopted as Internet Engineering Task Force (IETF) standard RFC 6238.

From OpenSSL

OpenSSL is a robust, commercial-grade, and full-featured toolkit for the Transport Layer Security (TLS) and Secure Sockets Layer (SSL) protocols. It is also a general-purpose cryptography library.

Documentation: OpenSSL man pagess not very readable.

From LibreSSL

LibreSSL is a version of the TLS/crypto stack forked from OpenSSL in 2014, with goals of modernizing the codebase, improving security, and applying best practice development processes.

From LibreSSL – LibreSSL releases contain several parts:

*libcrypto*: a library of cryptography fundamentals*libssl*: a TLS library*libtls*: a new TLS library, designed to make it easier to write foolproof applications- Various utilities such as
*openssl(1)*,*nc(1)*, and*ocspcheck(8)*.

Isn’t

*SSL*out of date and superceded by TLS?**Yes**Why are do we care about

*OpenSSL*and*LibreSSL*? These are just the names of the projects they both support the latest TLS1.3 standards and**modern crypto algorithms**.Both include the

*openssl***command**which gets us direct access to many cypto algorithmsOpenSSH uses

*LibreSSL*so you probably have the*openssl*command on your system.

```
openssl help
--- A bunch of other info but this is what we are interested in
Message Digest commands (see the `dgst' command for more details)
blake2b512 blake2s256 gost md4
md5 mdc2 rmd160 sha1
sha224 sha256 sha3-224 sha3-256
sha3-384 sha3-512 sha384 sha512
sha512-224 sha512-256 shake128 shake256
sm3
```

Compute SHA3-256 and SHA512 Hashes

```
greg@DESKTOP-71U86TK MINGW64 /c/Greg/Customers/CSUEB/CS671/Crypto
$ openssl sha3-256 IntroToCrypto.pdf
SHA3-256(IntroToCrypto.pdf)= e63298e99eb4fe0f99009b9ef5d885cf310bd62f9772e29844b3b76d22e9a86c
(base)
greg@DESKTOP-71U86TK MINGW64 /c/Greg/Customers/CSUEB/CS671/Crypto
$ openssl sha512 IntroToCrypto.pdf
SHA512(IntroToCrypto.pdf)= c13b1fd7c5072a92cb14295988cdbf08abe6d58f6e80987d5f853b7ef463e04aee5f19337c00a19487b143128515ac55c1dc54fb082b666c05a9aa315a00c59f
(base)
```

Pseudo Random Number Generators (PRNGs) – Used for simulation

Cryptographically Secure Psuedo Random Numbr Generators (CSPRNGs) –

True random numbers

From wikipedia: PRNG

A pseudorandom number generator (PRNG) is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers. The PRNG-generated sequence is not truly random, because it is completely determined by an initial value, called the PRNG’s seed.

From wikipedia: PRNG

- Shorter-than-expected periods for some seed states (such seed states may be called “weak” in this context);
- Lack of uniformity of distribution for large quantities of generated numbers;
- Correlation of successive values;
- Poor dimensional distribution of the output sequence;
- Distances between where certain values occur are distributed differently from those in a random sequence distribution.

Linear Congruential Generators (LCGs) Class of algorithms used extensively in computing, but have issues.

Mersenne Twister 1997 designed to correct many of the flaws in previous generators. Almost all modern scientific software/libraries support this.

List of random number generators Good resource

From Python Random Module

A cryptographically secure pseudorandom number generator (CSPRNG) is a pseudorandom number generator (PRNG) with properties that make it suitable for use in cryptography.

CSPRNG requirements fall into two groups: first, that they pass statistical randomness tests; and secondly, that they hold up well under serious attack, even when part of their initial or running state becomes available to an attacker.

“A secure block cipher can be converted into a CSPRNG by running it in counter mode”

NSA kleptographic backdoor in the Dual_EC_DRBG PRNG NSA got caught trying to insert a backdoor into a security standard.

From Hardware random number generator

In computing, a hardware random number generator (HRNG) or true random number generator (TRNG) is a device that generates random numbers from a physical process, rather than by means of an algorithm.

From Wikipedia: TPM

These are usually CSPRNGs that are started with some *hopefully* true random numbers “entropy”

Linux Myths about /dev/urandom Good explanations!

Python secrets, os.urandom

*High-speed harvesting of random numbers*, Ingo Fischer, Daniel J. Gauthier, Science 26 Feb 2021: Vol. 371, Issue 6532, pp. 889-890G. M. Bernstein and M. A. Lieberman,

*Secure Random Number Generation Using Chaotic Circuits*, IEEE Transactions on Circuits and Systems, Vol. 37, No. 9, September 1990.US Patent 5,007,087:

*Method and apparatus for generating secure random numbers using chaos*, Bernstein; Greg M., Lieberman; Michael A., April 9, 1991.

In cryptography, key stretching techniques are used to make a possibly weak key, typically a password or passphrase, more secure against a brute-force attack by increasing the resources (time and possibly space) it takes to test each possible key.

Make password encryption harder

Adaptive one-way functions compute a one-way (irreversible) transform. Each function allows configuration of ‘work factor’. Underlying mechanisms used to achieve irreversibility and govern work factors (such as time, space, and parallelism) vary between functions

**Argon2**is the winner of the password hashing competition and should be considered as your first choice for new applications**PBKDF2**when FIPS certification or enterprise support on many platforms is required;**scrypt**where resisting any/all hardware accelerated attacks is necessary but support isn’t.**bcrypt**where PBKDF2 or scrypt support is not available.

From Wikipedia: Key derivation function

In cryptography, a key derivation function (KDF) is a cryptographic hash function that derives one or more secret keys from a secret value such as a main key, a password, or a passphrase using a pseudorandom function.

The most common is HKDF. Not for passwords!