· 3 min read

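Notes from my recent study of virtualization technology.

I've recently had to write something related to QEMU, so I took the chance to learn about it.

Before getting started, let's go over some terminology.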

Hypervisor ( virtual machine monitor )

Software that manages and runs virtual machines

Host machine

The machine that runs the hypervisor

Guest machine

A machine (virtual machine) managed by the hypervisor

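For example, suppose you use VirtualBox on your computer to create two virtual machines, one running Windows 10 and one running Ubuntu.

Then VirtualBox is the hypervisor, your computer is the host machine, and Windows 10 and Ubuntu are the guest machines.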

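A guest machine is usually called an instance, and different instances can run different operating systems.

In contrast, with operating-system-level virtualization the guest is usually called a container. Containers share the host's kernel, so they must all run the same operating system, but their user space can differ; for example, different Linux distributions can run on top of the same kernel.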

Now let's look at the types of hypervisors.

Hypervisor Type

  • Type-1, native or bare-metal hypervisor

This type of hypervisor runs directly on the host's hardware.

Citrix XenServer, Microsoft Hyper-V, and VMware ESX/ESXi belong to this type.

  • Type-2 or hosted hypervisors

The hypervisor runs as a process on the host machine.

VMware Workstation, VMware Player, VirtualBox, Parallels Desktop for Mac, and QEMU belong to this type.

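The line between Type-1 and Type-2 is becoming increasingly blurred, because hypervisors that were originally Type-2 have been moving toward Type-1 for performance reasons. For example, Linux's KVM can be classified as either type.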

Virtual machine

What we call virtual machines fall mainly into the following three categories.

Virtual machine type

  • System virtual machines ( Full virtualization )
  • Hardware-assisted virtualization
  • Operating-system-level virtualization

System virtual machines ( Full virtualization )

Virtualizes a complete system, so that an entire operating system can run inside the virtual machine.

Parallels Workstation, Parallels Desktop for Mac, VirtualBox, Virtual Iron, Oracle VM, Virtual PC, Virtual Server, Hyper-V, VMware Workstation, VMware Server (discontinued, formerly called GSX Server), VMware ESXi, QEMU, Adeos, Mac-on-Linux

Hardware-assisted virtualization

Uses hardware support (such as Intel VT-x or AMD-V) to improve efficiency.

KVM, VMware Workstation, VMware Fusion, Hyper-V, Windows Virtual PC, Xen, Parallels Desktop for Mac, Oracle VM Server for SPARC, VirtualBox and Parallels Workstation

Operating-system-level virtualization

This refers to virtualization features provided by the operating system kernel itself, such as Linux Containers (LXC). Before version 0.9, Docker was essentially a front end for LXC.


· One min read

function f(s) { var a = (s == undefined) ? 'qq' : s; }

It can be rewritten as the following code:

function f(s)
{
var a = s || 'qq';
}

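The || expression first evaluates s.

If s is truthy, the expression's value is s, and s is assigned to a.

Otherwise, the expression evaluates to the next operand, 'qq',

and 'qq' is assigned to a.

Note that the two versions are not exactly equivalent: s == undefined matches only undefined and null, while || also falls back to 'qq' for any other falsy value, such as '', 0, false, or NaN.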

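Going further, ES6 lets you write a default parameter directly:

function f(s = 'qq')
{
var a = s;
}

The default is used only when s is undefined (for example, when the argument is omitted); passing null leaves s as null.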

· 2 min read

Allocating a string with an array or a pointer

What's the difference between them?

#include<stdio.h>

int main()
{
char a[] = "apple";
char *b = "apple";
}

Answer

    char a[] = "apple";

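When a string is stored in an array, all the characters (including the terminating '\0') are copied into the array itself. For a local array like this one, that storage is on the stack, and the contents can be modified.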

    char *b = "apple";

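When a string is accessed through a pointer, only the pointer is stored on the stack. It points to the string literal, which is typically placed in a read-only section (such as .rodata), so modifying the string through b is undefined behavior.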

Function arguments: array or pointer

What's the difference between them?

#include<stdio.h>

void func1(char * s)
{
printf("%s",s);
}
void func2(char s[])
{
printf("%s",s);
}
int main()
{
func1("apple");
func2("apple");
return 0;
}

In the book "The C Programming Language" (2nd edition):

As formal parameters in a function definition,
char s[];
and
char *s;
are equivalent; we prefer the latter because it says more explicitly that the
parameter is a pointer

So in which situations would we prefer the array form? A mail on the Linux kernel mailing list mentions:

https://lkml.org/lkml/2015/9/3/499

The "array as function argument" syntax is occasionally useful
(particularly for the multi-dimensional array case), so I very much
understand why it exists, I just think that in the kernel we'd be
better off with the rule that it's against our coding practices.

sizeof is an operator

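sizeof is normally evaluated at compile time; the exception is a C99 variable-length array, whose size is only known at run time.

sizeof yields the size in bytes of its operand, which is either an expression (such as a variable) or a parenthesized type name. Two cases are worth distinguishing:

array (the result is the size of the whole array)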

char s[10];
printf("%zu",sizeof(s));
// 10

pointer (the result is the size of the pointer type, not the length of the string it points to)

char* s="hello";
printf("%zu",sizeof(s));
printf("%zu",sizeof(char *));
/*
8
8
(on a typical 64-bit platform; 4 and 4 on 32-bit)
*/
// sizeof(variable) == sizeof(variable type)

https://en.wikipedia.org/wiki/Sizeof


· One min read

While tracing glibc's malloc.c, I came across an interesting bitwise operation.

#define MALLOC_ALIGN_MASK      (MALLOC_ALIGNMENT - 1)
#define MINSIZE  \
  (unsigned long)(((MIN_CHUNK_SIZE+MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK))

I found the answer on Stack Overflow:

http://stackoverflow.com/questions/14561402/how-is-this-size-alignment-working

When the alignment is a power of two (1, 2, 4, 8, 16, 32, ...), a size can be aligned with a simple bitwise AND, because alignment - 1 is then a mask of all the low-order bits.

This gives the size rounded down:

size &= ~(alignment - 1);

or if you want to round up:

size = (size + alignment-1) & ~(alignment-1);

The MINSIZE macro rounds MIN_CHUNK_SIZE up to the nearest multiple of MALLOC_ALIGNMENT, i.e. the smallest aligned size that is at least MIN_CHUNK_SIZE.

For example, with illustrative values:

MIN_CHUNK_SIZE    = 25
MALLOC_ALIGNMENT  = 8
MALLOC_ALIGN_MASK = 8 - 1 = 7        (binary 0b111)
MINSIZE           = (25 + 7) & ~7 = 32 & ~7 = 32

More bit tricks:

https://graphics.stanford.edu/~seander/bithacks.html

· One min read

TODO list:

  • [O] K&R The C Programming Language (Second Edition)
  • Robert Love, Linux Kernel Development (Third Edition)
  • [O] sqlab website: http://sqlab.github.io/

· 3 min read

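Many explanations online frame this as call by value vs. call by reference, but Python actually passes arguments by assignment (call by assignment). This post mainly introduces Python objects. Consider the following line: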

a = 1

Here a is a reference to a PyIntObject whose value is 1, just like a pointer in C pointing to a variable. Let's look at another snippet:

def fun(a):
    print(id(a))

a = 1
print(id(a))
fun(a)
"""
11211096
11211096
"""

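So what actually happens internally? What is passed into fun is a copy of the reference to the PyIntObject: it and the a outside the function are two different references pointing to the same object. In CPython, id returns the object's address, which is why both values are the same. If you're interested, take a look at the implementation of id in CPython: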

static PyObject *
builtin_id(PyModuleDef *self, PyObject *v)
/*[clinic end generated code: output=0aa640785f697f65 input=5a534136419631f4]*/
{
return PyLong_FromVoidPtr(v);
}

Let's modify the code a little:

def fun(a):
    print(id(a))
    a += 1
    print(id(a))

a = 1
print(id(a))
fun(a)
print(id(a))
"""
41632088
41632088
41632064
41632088
"""

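Notice that id(a) outside fun is the same before and after the call. That's because what is passed into the function is a copy of the reference to the object, so the original reference is unaffected. Inside fun, however, the id changes after a += 1. This is because int is an immutable type: when you try to modify a variable of an immutable type, Python first creates a new object and then points your reference at that new object. What about a mutable variable?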

def fun(a):
    a.append(1)
    print(a)
    print(id(a))

a = []
print(a)
print(id(a))
fun(a)
print(a)
print(id(a))
"""
[]
140347177386784
[1]
140347177386784
[1]
140347177386784
"""

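Running it, you'll see that the id never changes. That's because mutable objects are handled differently from immutable ones: append extends the object the reference points to in place, instead of assigning a new object. What about assigning to a mutable variable?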

def fun(a):
    a = []
    a.append(1)
    print(a)
    print(id(a))

a = []
print(a)
print(id(a))
fun(a)
print(a)
print(id(a))
"""
[]
140318457238304
[1]
140318457251672
[]
140318457238304
"""

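Here the mutable variable behaves just like the immutable one: the assignment abandons the original object and points the reference at a new object. Since what was passed in was only a copy of the reference to the object, the original a is not affected.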


· 2 min read

[Figure: image from the qira.me website]


  • QIRA is a timeless debugger
  • Its full name is QEMU Interactive Runtime Analyser
  • QIRA was initially developed at Google by George Hotz. Work continues at CMU.

qira website

http://qira.me/

qira github repository

https://github.com/BinaryAnalysisPlatform/qira

Installation

cd ~/
git clone https://github.com/BinaryAnalysisPlatform/qira.git
cd qira/
./install.sh

If you want to run binaries for other architectures, run the following command,

which fetches the libraries for those architectures:

./fetchlib.sh

Usage

cd ~/
wget http://train.cs.nctu.edu.tw/files/magic
chmod +x ./magic
qira -s ./magic

Open another terminal and type

nc 0 4000

and use that terminal to interact with the program.

You can trace the instructions in a web browser at http://localhost:3002/

Keyboard shortcuts (defined in web/client/controls.js)

j -- next invocation of instruction
k -- prev invocation of instruction

shift-j -- next toucher of data
shift-k -- prev toucher of data

m -- go to return from current function
, -- go to start of current function

z -- zoom out max on vtimeline

l -- set iaddr to instruction at current clnum

left -- -1 fork
right -- +1 fork
up -- -1 clnum
down -- +1 clnum

esc -- back

shift-c -- clear all forks

n -- rename instruction
shift-n -- rename data
; -- add comment at instruction
shift-; -- add comment at data

g -- go to change, address, or name
space -- toggle flat/function view

p -- analyze function at iaddr
c -- make code at iaddr, one instruction
a -- make ascii at iaddr
d -- make data at iaddr
u -- make undefined at iaddr

Further

QIRA is made up of the following components

  • qemu
  • flask
  • python
  • qiradb

QEMU is used to emulate other architectures

Flask is a microframework for Python based on Werkzeug, Jinja 2 and good intentions

Most of QIRA's code is written in Python

qiradb is a Python package that handles the instruction trace

Working with the IDA plugin

Testing environment

  • Windows 10
  • Vmware workstation Pro 12 with Ubuntu 15.10

Install QIRA 1.2 on Ubuntu 15.10 and forward port 3002

Copy qira_ida66_windows.p64 and qira_ida66_windows.plw from qira/ida/bin/ to IDA Pro's plugins/ directory

Open Chrome and IDA Pro on Windows 10

It should work like this:

[Figure: IDA plugin]

· 2 min read

hackmd

https://hackmd.io/

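I really recommend hackmd.

I first came across it because classmates in my lab were using it.

One of its most useful features is image upload:

it uploads the image straight to http://imgur.com/

and then inserts it with Markdown syntax:

![](image URL)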

hackpad

https://hackpad.com/

This is probably the best-known collaborative platform that supports Markdown syntax,

so I won't introduce it here.

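StackEdit

https://stackedit.io/editor#

An online Markdown editor.

It works with Google Drive and Dropbox.

If you write on Google Blogger, you can use it:

once you finish writing in StackEdit, you can publish directly, which is a nice feature.

The downside is that you can't co-edit with other people.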

Code Formatter

  • python: yapf

https://github.com/google/yapf

  • C/C++: astyle

http://astyle.sourceforge.net/

CTF-related

Kali Linux

https://www.kali.org/

gdb

gdb peda

https://github.com/longld/peda

Makes your gdb interface look nicer

and adds some handy functions.

qira

strace

ltrace

ODA

https://www.onlinedisassembler.com/odaweb/

· 4 min read

Definition

Symbolic execution (also symbolic evaluation) is a means of analyzing a program to determine what inputs cause each part of a program to execute.

An interpreter follows the program, assuming symbolic values for inputs rather than obtaining actual inputs as normal execution of the program would, a case of abstract interpretation.

It thus arrives at expressions in terms of those symbols for expressions and variables in the program, and constraints in terms of those symbols for the possible outcomes of each conditional branch.

Example

int f() {
...
y = read();
z = y * 2;
if (z == 12) {
fail();
} else {
printf("OK");
}
}

During "concrete" execution, the program would read a concrete input value (e.g., 5) and assign it to y.

During symbolic execution, the program reads a symbolic value (e.g., λ) and assigns it to y.

The program would then proceed with the multiplication and assign λ * 2 to z. When reaching the if statement, it would evaluate λ * 2 == 12.

At this point of the program, λ could take any value, and symbolic execution can therefore proceed along both branches, by "forking" two paths.

Each path gets assigned a copy of the program state at the branch instruction as well as a path constraint.

When paths terminate (e.g., as a result of executing fail() or simply exiting), symbolic execution computes a concrete value for λ by solving the accumulated path constraints on each path. For example, on the path that reaches fail(), the constraint λ * 2 == 12 gives λ = 6.

Limitations

Path Explosion

Symbolically executing all feasible program paths does not scale to large programs.

Program-Dependent Efficacy

Symbolic execution is used to reason about a program path-by-path, which is an advantage over reasoning about a program input-by-input as other testing paradigms do (e.g., dynamic program analysis).

However, if few inputs take the same path through the program, there is little savings over testing each of the inputs separately.

Environment Interactions

Programs interact with their environment by performing system calls, receiving signals, etc. Consistency problems may arise when execution reaches components that are not under control of the symbolic execution tool (e.g., kernel or libraries).

int main()
{
FILE *fp = fopen("doc.txt", "w+");
...
if (condition) {
fputs("some data", fp);
} else {
fputs("some other data", fp);
}
...
data = fgets(..., fp);
}

This program opens a file and, based on some condition, writes different kinds of data to the file. It then later reads back the written data. In theory, symbolic execution would fork two paths at line 5 and each path from there on would have its own copy of the file. The statement at line 11 would therefore return data that is consistent with the value of "condition" at line 5. In practice, file operations are implemented as system calls in the kernel, and are outside the control of the symbolic execution tool. The main approaches to address this challenge are:

Executing calls to the environment directly. The advantage of this approach is that it is simple to implement. The disadvantage is that the side effects of such calls will clobber all states managed by the symbolic execution engine. In the example above, the instruction at line 11 would return "some datasome other data" or "some other datasome data" depending on the sequential ordering of the states.

Modeling the environment. In this case, the engine instruments the system calls with a model that simulates their effects and that keeps all the side effects in per-state storage. The advantage is that one would get correct results when symbolically executing programs that interact with the environment. The disadvantage is that one needs to implement and maintain many potentially complex models of system calls. Tools such as KLEE[5] and Cloud9 take this approach by implementing models for file system operations, sockets, IPC, etc.

Forking the entire system state. Symbolic execution tools based on virtual machines solve the environment problem by forking the entire VM state. For example, in S2E[6] each state is an independent VM snapshot that can be executed separately. This approach alleviates the need for writing and maintaining complex models and allows virtually any program binary to be executed symbolically. However, it has higher memory usage overheads (VM snapshots may be large).

Tools

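Tool   | Can analyze (arch / lang)                              | URL
-------|--------------------------------------------------------|------------------------------
KLEE   | LLVM                                                   | http://klee.github.io/
S2E    | x86, x86-64, ARM / user- and kernel-mode binaries      | http://s2e.epfl.ch
Triton | x86 and x86-64                                         | http://triton.quarkslab.com
angr   | libVEX based                                           | http://angr.io/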

Reference