Nooooootes: Difference between revisions
Chen Junyu (talk | contribs) (Created page with "==generate sound by python == <source lang="python"> import wave, struct filename = "output.wav" nframes=0 nchannels=1 sampwidth=2 # in bytes so 2=16bit, 1=8bit framerate=441...") |
Chen Junyu (talk | contribs) No edit summary |
Revision as of 12:05, 14 October 2013
BIT
A bit is a binary digit: "binary" means two-valued (base two) and "digit" means a numeral. Bits are what Python ultimately handles when it opens files, counts...
Bit shift
The simple act of opening an audio file in a text editor (or the reverse, opening a text as digital audio) raises many questions. What exactly is a digital representation of a text, or of a sound? The fact that, once digital, bits can be as easily interpreted as text, sound, image, or another kind of data is at once fascinating, but it also deceptively reinforces an idea of multimedia as inherently bridging and mixing together media. Media formats, and their underlying digital representations, are highly specialized codes (algorithmic, legal) involved in the related processes of encoding and decoding.
A bit shift is the process by which the bits of some data value are shifted in position either "left" or "right" (that is, away from or towards the "least significant" bit position). In a simple numerical representation of integers, such a shift corresponds to multiplication or division by a power of two. This is a result of the design and working of the system of binary representation, with each column defined to represent a power of 2. The same operation performed on the characters of a text represented as ASCII code values would produce a much different result, as the system of representation is structured very differently (with groupings of characters organized not so much by numeric relations, but by clusters of associated symbols and the conventions of the alphabet).
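As a small illustration of the paragraph above, here is a sketch in Python using the built-in shift operators. The variable names are only for this example; the point is that shifting an integer behaves like arithmetic, while shifting ASCII codes lands on unrelated characters.

```python
# Shifting an integer left or right by n bits multiplies or divides it by 2**n.
value = 12                      # binary 00001100
left = value << 2               # binary 00110000 = 48 (12 * 4)
right = value >> 2              # binary 00000011 = 3  (12 // 4)

# The same shift applied to the ASCII codes of a text gives no such tidy result.
text = "abc"
shifted_codes = [ord(ch) >> 1 for ch in text]      # 97, 98, 99 -> 48, 49, 49
shifted_text = "".join(chr(code) for code in shifted_codes)

print(format(value, "08b"), format(left, "08b"), format(right, "08b"))
print(repr(shifted_text))
```

Here the letters "abc" turn into digit characters, because the ASCII table groups letters and digits in different ranges.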
Raw media
We can use Python to generate the "raw" bits of, say, an audio waveform or a bitmap image. Certain formats, such as audio WAV files or bitmap formats like TGA and BMP (when uncompressed), are easy to generate and manipulate with generic tools like Python because the formats are often mostly "raw" sample or pixel data with a short preceding "header" that declares some key properties about the file (such as sampling rate or image size).
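To make the "header plus raw data" idea concrete, the sketch below writes a tiny WAV into memory with Python's standard wave module and then looks at the bytes. It assumes the plain 44-byte PCM header that the wave module writes; WAV files produced by other tools may carry extra chunks, so the fixed offsets here are an assumption, not a general parser.

```python
import io
import struct
import wave

# Write four 16-bit mono samples into an in-memory WAV file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 2 bytes = 16 bits per sample
    w.setframerate(44100)
    w.writeframes(struct.pack("<4h", 0, 1000, 0, -1000))

data = buffer.getvalue()
header, samples = data[:44], data[44:]   # assumed 44-byte PCM header

# Key properties live at fixed offsets in this simple header layout.
channels, rate = struct.unpack("<HI", header[22:28])
bits = struct.unpack("<H", header[34:36])[0]

print(header[:4], channels, rate, bits, len(samples))
```

Everything after the header is just the sample data, exactly as it was packed.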
Generating "Raw" Audio
RAW Audio is an audio file format for storing uncompressed audio in raw form. Comparable to WAV or AIFF in size, a RAW Audio file does not include any header information (sampling rate, bit depth, endianness, or number of channels).
convert an mp3 to raw
Using
- ffmpeg: " A complete, cross-platform solution to record, convert and stream audio and video."
- Sox: the Swiss Army knife of sound processing programs
Find out the properties of the audio file with ffmpeg
ffmpeg -i file.mp3
sox -V file.wav
SoX may have no handler for the MP3 format (this depends on how it was built), so you need to convert from MP3 to WAV first; ffmpeg does this quickly and well
ffmpeg -i file.mp3 -acodec pcm_s16le -ar 44100 file.wav
-acodec pcm_s16le
: audio codec is PCM signed 16-bit little-endian
-ar 44100
: audio rate 44.1 kHz
Sox: display information on the wav file
sox -V file.wav
Sox: convert the wav file to raw
sox file.wav -b 16 -c 1 -r 44100 -t raw file.raw
-b 16
: bit depth (bits per sample)
-c 1
: number of channels
-r 44100
: sampling rate
-t raw
: output file type: raw
play (sox): play back the raw file at the same bit depth and sample rate
play -b 16 --endian little -e signed -r 44100 -c 1 Zong3.raw
--endian little
: "specify whether the byte-order of the audio data is, respectively, `little endian'... Endianness applies only to data encoded as floating-point, or as signed or unsigned integers of 16 or more bits. It is often necessary to specify one of these options for headerless files"
-e signed
: encoding. Signed is commonly used with a 16- or 24-bit encoding size.
Same thing but with aplay:
aplay -f S16_LE -c 1 -r 44100 Zong3.raw
-f
: sample format + bit depth + endianness:
- 8bit: S8 U8
- 16bit: S16_LE S16_BE U16_LE U16_BE
- 24bit: S24_LE S24_BE U24_LE U24_BE
- 32bit: S32_LE S32_BE U32_LE U32_BE
-c
: channel number
-r
: sample rate. 4000 Hz is the minimum allowed by aplay. 44100 Hz is the standard sample rate for audio CDs
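Because a raw file has no header, the same bytes give different sample values depending on the format you declare. A minimal sketch with Python's struct module, using four made-up bytes, shows what S16_LE, S16_BE and U16_LE mean in practice:

```python
import struct

raw = bytes([0x00, 0x80, 0xFF, 0x7F])   # four bytes = two 16-bit samples

s16_le = struct.unpack("<2h", raw)      # signed 16-bit, little endian
s16_be = struct.unpack(">2h", raw)      # signed 16-bit, big endian
u16_le = struct.unpack("<2H", raw)      # unsigned 16-bit, little endian

print(s16_le, s16_be, u16_le)
```

Read as little endian the two samples sit at the extremes of the signed range; read as big endian they are small values near zero, which is why a wrong endianness setting usually sounds like noise.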
convert image to raw audio
copy the image to another file, with the extension raw
cp img.png img.raw
play: use the sox play command from before to play the file as audio
play -b 16 --endian big -e signed -r 44100 -c 1 img.raw
- lowering the sampling rate (-r) will slow down the playback
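The slow-down follows directly from the arithmetic: the file holds a fixed number of samples, and the sample rate says how many of them are consumed per second. A rough sketch (the function name duration_seconds is made up for this example):

```python
def duration_seconds(nbytes, rate, channels=1, bits=16):
    """Playback time of headerless audio data under the declared parameters."""
    bytes_per_frame = channels * bits // 8
    return nbytes / (bytes_per_frame * rate)

# The same 88200 bytes, declared at two different sample rates.
at_44100 = duration_seconds(88200, 44100)
at_22050 = duration_seconds(88200, 22050)
print(at_44100, at_22050)
```

Halving the declared rate doubles the playing time of the same bytes.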
Save the result of playing the raw file into an audio file with a header (WAV)
sox -r 44100 -b 16 -c 1 -e signed img.raw img.wav
Notice that the same audio parameters need to be declared: sample rate, bit depth, channel number, encoding
play: without having to declare any parameters
play img.wav
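The same wrapping step can be sketched in Python with the standard wave module. The function name wrap_raw_as_wav and the stand-in bytes are hypothetical; with a real file you would read img.raw in "rb" mode instead. Note that the wave module always stores samples as little endian, so this does not reproduce the --endian big playback used above.

```python
import io
import wave

def wrap_raw_as_wav(raw_bytes, target, rate=44100, channels=1, sampwidth=2):
    """Put a WAV header in front of arbitrary bytes; returns the frame count."""
    frame_size = channels * sampwidth
    # Drop trailing bytes that do not fill a whole frame.
    usable = len(raw_bytes) - len(raw_bytes) % frame_size
    with wave.open(target, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sampwidth)
        w.setframerate(rate)
        w.writeframes(raw_bytes[:usable])
    return usable // frame_size

# 1025 made-up bytes standing in for the contents of img.raw.
fake_image = bytes(range(256)) * 4 + b"\x00"
out = io.BytesIO()
frames = wrap_raw_as_wav(fake_image, out)
print(frames, len(out.getvalue()))
```

Once the header is there, a player no longer needs to be told the sample rate, bit depth or channel count.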
Links
- Raw Audio File Formats Information (at the Wayback Machine)
- SOX Conversions, Raw Files, Splitting And Merging Channels
- Audio Conversion Cheat Sheet
- [SoX] in the Cookbook
generate sound with Python
import wave, struct

filename = "output.wav"
nframes = 0
nchannels = 1
sampwidth = 2  # in bytes, so 2 = 16-bit, 1 = 8-bit
framerate = 44100
bufsize = 2048

w = wave.open(filename, 'w')
w.setparams((nchannels, sampwidth, framerate, nframes, 'NONE', 'not compressed'))
# largest positive value of a signed sample (32767 for 16-bit)
max_amplitude = float(int((2 ** (sampwidth * 8)) / 2) - 1)

# split the samples into chunks (to reduce memory consumption and improve performance)
#for chunk in grouper(bufsize, samples):
#    frames = b''.join(b''.join(struct.pack('<h', int(max_amplitude * sample)) for sample in channels) for channels in chunk if channels is not None)
#    w.writeframesraw(frames)

freq = 440
# this means that FREQ times a second, we need to complete a cycle
# there are FRAMERATE samples per second
# so FRAMERATE / FREQ = CYCLE LENGTH
cycle = framerate // freq  # computed for reference; not used below

data = b''  # sample data is bytes, not text
for i in range(10):  # number of times the positive/negative wave repeats
    for x in range(100):
        data += struct.pack('<h', int(0.5 * max_amplitude))
    for x in range(100):
        data += struct.pack('<h', int(-0.5 * max_amplitude))

w.writeframesraw(data)
w.close()
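The script above computes cycle but does not use it: its fixed runs of 100 positive and 100 negative samples give a period of 200 samples, which is about 220 Hz at 44100 samples per second. The following sketch is one possible way to tie the wave to freq instead. The function name square_wave is made up here, and integer division makes the pitch only approximately 440 Hz.

```python
import struct

framerate = 44100
freq = 440
max_amplitude = 32767              # largest signed 16-bit value

cycle = framerate // freq          # samples per full cycle (100 here)
half = cycle // 2                  # samples in the positive half

def square_wave(n_samples):
    """Return n_samples of a square wave as signed 16-bit little-endian bytes."""
    frames = bytearray()
    for n in range(n_samples):
        level = 0.5 if (n % cycle) < half else -0.5
        frames += struct.pack("<h", int(level * max_amplitude))
    return bytes(frames)

data = square_wave(400)            # four full cycles
print(cycle, len(data))
```

The resulting bytes can be passed to writeframesraw in the same way as data in the script above.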
Generating "Raw" Images
Raw: This is the simplest of all ways to store images, just as "raw" bytes. For example one byte per pixel for grey scale or 3 bytes per pixel for RGB colour. There is no standard header and so even the size of the image needs to be specified for programs that might read the image. [1]
ImageMagick can convert a bitmap file to a raw .dat file, a generic extension that does not name a specific file type
convert image.png image.dat
And, it can also convert the .dat back into a bit-map
Andre Castro (talk) 10:14, 29 January 2018 (CET) if you run:
identify image.dat
img.dat PNG 2102x2799 2102x2799+0+0 8-bit RGB 256c 33.6KB 0.000u 0:00.000
Hence the image still has PNG headers.
If you stick to the size, bit-depth and color profile of the original, the conversion will "preserve" the original
convert -depth 8 -size 2102x2799 rgb:image.dat image2.png
If you change those parameters, it is likely that the resulting bitmap will be something else!
convert -depth 8 -size 1000x1000 rgb:image.dat image2.png
convert -depth 24 -size 1000x1000 rgb:image.dat image2.png
convert -depth 8 -size 1000x1000 gray:image.dat image2.png
What we are doing here is not unlike playing back audio files at a bit depth and sample rate different from the ones they were encoded in.
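A rough way to see why the result changes is to count how the same bytes divide into pixels under each declared size and depth. This sketch only does the arithmetic and assumes a truly raw, headerless 8-bit RGB dump of the 2102x2799 image; how ImageMagick treats extra or missing bytes is not covered here. The function name describe is made up.

```python
def describe(nbytes, width, height, channels, depth):
    """Return (full frames, leftover bytes) for a declared image geometry."""
    bytes_per_pixel = channels * depth // 8
    frame_bytes = width * height * bytes_per_pixel
    return nbytes // frame_bytes, nbytes % frame_bytes

raw_size = 2102 * 2799 * 3          # 8-bit RGB, no header

original = describe(raw_size, 2102, 2799, 3, 8)
as_rgb_1000 = describe(raw_size, 1000, 1000, 3, 8)
as_gray_1000 = describe(raw_size, 1000, 1000, 1, 8)
print(original, as_rgb_1000, as_gray_1000)
```

Only the original geometry uses up the data exactly; every other declaration cuts the byte stream at different places, so rows and colour channels no longer line up.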
But we can convert from a raw audio to an image file with
convert -depth 8 -size 1000x1000 rgb:Zong3.raw Zong3.jpg
or even an animated gif
convert -depth 8 -size 1000x1000 rgb:Zong3.raw Zong3.gif
What gets lost in a conversion ??
...
code experiments with Raw images
With a small amount of code, it's easy to dump out a stream of data as bytes:
# raw.py
import struct, sys

out = open("image.data", "wb")
for x in range(100):
    # one pixel = four bytes
    out.write(struct.pack('B', 255))
    out.write(struct.pack('B', 0))
    out.write(struct.pack('B', 0))
    out.write(struct.pack('B', 128))
out.close()
The "wb" option in the open command means that we want to "write" "binary" data (by default, open would read a text file).
When you run the script:
python raw.py
In Python, struct.pack with the 'B' format is a way of converting a number (from 0 to 255) into its corresponding binary representation as a "byte" or 8 bits (where 0 is all bits off, 00000000, and 255 is all bits on, 11111111). The code above loops 100 times, outputting the bits of the sequence 255, 0, 0, 128.
255 0 0 128 255 0 0 128 255 0 0 128 255 0 0 128...
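To see the bits themselves, a short sketch can pack the same four numbers and print each resulting byte in binary:

```python
import struct

sequence = [255, 0, 0, 128]

# Pack each number into a single unsigned byte, as raw.py does.
packed = b"".join(struct.pack("B", n) for n in sequence)

# Show every byte as eight binary digits.
bits = [format(byte, "08b") for byte in packed]
print(packed, bits)
```

Values outside 0 to 255 do not fit in one byte, and struct.pack raises an error for them.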
We can then use an image application like the GIMP to interpret this raw data as an image. In this case we tell the GIMP that it should interpret the bytes as RGBA-formatted pixels at a size of 10 x 10.
It then interprets the numbers as a stream of pixels in the form:
RED GREEN BLUE ALPHA, RED GREEN BLUE ALPHA, ...
So for the code above:
255 (full) RED, 0 GREEN, 0 BLUE, 128 (half) ALPHA, ...
The result is an image 10 pixels wide by 10 pixels tall, where every pixel is red with 50% transparency.
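The interpretation step can also be sketched in Python, without the GIMP. The example rebuilds the same 400 bytes in memory rather than reading image.data, groups them four at a time as RGBA, and arranges the pixels in rows of 10. The width is our own declaration, just as it is in the GIMP dialog.

```python
import struct

# The same bytes raw.py writes: 255, 0, 0, 128 repeated 100 times.
data = bytes([255, 0, 0, 128]) * 100

# Read the stream four bytes at a time: one (R, G, B, A) tuple per pixel.
pixels = list(struct.iter_unpack("4B", data))

width = 10                                   # declared, not stored in the data
rows = [pixels[i:i + width] for i in range(0, len(pixels), width)]

r, g, b, a = rows[0][0]
opacity = a / 255                            # 128 is roughly half opaque
print(len(rows), len(rows[0]), (r, g, b, a), round(opacity, 2))
```

Declaring a width of 20 instead would give 5 rows from exactly the same bytes.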
Put a header on it
Working with raw data files can be inconvenient, however, since every time you want to view the data as an image you need to explicitly tell an application (such as GIMP) the size and format of the image. We can improve the situation by preceding the raw data with a simple "header" that follows the guidelines (which specify the order of the bytes) of a specific simple image format.
Targa (TGA) is an early, very simple format for images. It comes from Truevision, an early manufacturer of video display cards, which created a minimal format for files to display on its TARGA hardware. The format is still popular today in game development and other communities for whom its simplicity is useful for "wrapping" raw image data with a header, so that the file is self-contained and directly loadable by different programs without needing to explicitly specify information such as width, height and bit depth.
import struct, sys

out = open("image.tga", "wb")
width = 320
height = 240
# 18-byte TGA header: uncompressed true-colour (type 2), 32 bits per pixel,
# bit 5 of the last byte set so that rows are stored top to bottom
header = struct.pack("<BBBHHBHHHHBB",0,0,2,0,0,8,0,0,width,height,32,1<<5)
out.write(header)
for y in range(height):
    for x in range(width):
        r = 0
        g = 0
        b = 0
        a = 128
        if y < 32:
            r = 255
            a = 255
        if x > 64 and x < 256:
            g = 255
        if y > 120:
            r = 128
        # TGA stores each pixel as blue, green, red, alpha
        out.write(struct.pack('B', b))
        out.write(struct.pack('B', g))
        out.write(struct.pack('B', r))
        out.write(struct.pack('B', a))
out.close()
This script outputs a TGA file, which has been opened in the GIMP and exported to PNG.
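To check what the header declares, the same struct format string can be used in reverse. The field names below are our own labels for this sketch, following the usual description of the 18-byte TGA header; they are not identifiers from any library.

```python
import struct

HEADER_FORMAT = "<BBBHHBHHHHBB"     # little endian, 18 bytes in total
header = struct.pack(HEADER_FORMAT, 0, 0, 2, 0, 0, 8, 0, 0, 320, 240, 32, 1 << 5)

names = (
    "id_length", "color_map_type", "image_type",
    "color_map_first_index", "color_map_length", "color_map_entry_size",
    "x_origin", "y_origin", "width", "height",
    "pixel_depth", "image_descriptor",
)
info = dict(zip(names, struct.unpack(HEADER_FORMAT, header)))

# Header size plus width * height pixels of pixel_depth bits each.
expected_size = len(header) + info["width"] * info["height"] * info["pixel_depth"] // 8
top_to_bottom = bool(info["image_descriptor"] & (1 << 5))
print(info["width"], info["height"], info["pixel_depth"], expected_size, top_to_bottom)
```

The expected size is a quick sanity check: if image.tga on disk has a different length, the pixel loop and the header disagree.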
Next: Raw image sequence
Resources
- This example is based on this Raw image example (in Portuguese)
references
- ↑ Bourke, Paul. n.d. ‘A Beginners Guide to Bitmaps’. Accessed 2 January 2018. http://paulbourke.net/dataformats/bitmaps/.